I have the following function from the PHP.net site to determine the number of bytes in an ASCII or UTF-8 string:
<?php
/**
 * Count the number of bytes of a given string.
 * Input string is expected to be ASCII or UTF-8 encoded.
 * Warning: the function doesn't return the number of chars
 * in the string, but the number of bytes.
 *
 * @param string $str The string to compute number of bytes
 *
 * @return The length in bytes of the given string.
 */
function strBytes($str)
{
    // STRINGS ARE EXPECTED TO BE IN ASCII OR UTF-8 FORMAT

    // Number of characters in string
    $strlen_var = strlen($str);

    // string bytes counter
    $d = 0;

    /*
     * Iterate over every character in the string,
     * escaping with a slash or encoding to UTF-8 where necessary
     */
    for ($c = 0; $c < $strlen_var; ++$c) {
        $ord_var_c = ord($str{$d});
        switch (true) {
            case (($ord_var_c >= 0x20) && ($ord_var_c <= 0x7F)):
                // characters U-00000000 - U-0000007F (same as ASCII)
                $d++;
                break;
            case (($ord_var_c & 0xE0) == 0xC0):
                // characters U-00000080 - U-000007FF, mask 110XXXXX
                // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 2;
                break;
            case (($ord_var_c & 0xF0) == 0xE0):
                // characters U-00000800 - U-0000FFFF, mask 1110XXXX
                // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 3;
                break;
            case (($ord_var_c & 0xF8) == 0xF0):
                // characters U-00010000 - U-001FFFFF, mask 11110XXX
                // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 4;
                break;
            case (($ord_var_c & 0xFC) == 0xF8):
                // characters U-00200000 - U-03FFFFFF, mask 111110XX
                // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 5;
                break;
            case (($ord_var_c & 0xFE) == 0xFC):
                // characters U-04000000 - U-7FFFFFFF, mask 1111110X
                // see http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
                $d += 6;
                break;
            default:
                $d++;
        }
    }
    return $d;
}
?>
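However, when I try this with Russian text (e.g. По своей природе компьютеры могут работать лишь с числами. И для того, чтобы они могли хранить в памяти буквы или другие символы, каждому такому символу должно быть поставлено в соответствие число), it does not seem to return the correct number of bytes.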
The switch statement ends up in the default case. Any ideas why Russian characters don't work as expected? Or is there a better alternative?
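I am asking this as I need to shorten a UTF-8 string to a certain number of bytes. That is, in my case I can send at most 169 bytes of JSON data to the iPhone APNS (not counting the other packet data).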
I am asking this as I need to shorten
a utf-8 string to a certain number of
bytes.
mb_strcut() does exactly this, although you might not be able to tell from its nearly incomprehensible documentation.
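As a rough illustration, here is a minimal sketch of trimming a string to a byte budget with mb_strcut(). It assumes the mbstring extension is loaded; the helper name truncateUtf8Bytes is made up for this example, and the 20-byte limit is only for demonstration (the question's limit would be 169). Note that strlen() in PHP already returns the length in bytes rather than characters (assuming mbstring function overloading is not enabled), so it can be used to check the result.

```php
<?php
// Hypothetical helper (not part of PHP): cut a UTF-8 string down to at most
// $maxBytes bytes without splitting a multi-byte character.
function truncateUtf8Bytes(string $text, int $maxBytes): string
{
    // mb_strcut() takes its start and length in BYTES (unlike mb_substr(),
    // which counts characters) and moves the cut back to a character boundary.
    return mb_strcut($text, 0, $maxBytes, 'UTF-8');
}

$text  = 'По своей природе компьютеры могут работать лишь с числами.';
$short = truncateUtf8Bytes($text, 20);

echo strlen($text), "\n";   // byte length of the full string
echo strlen($short), "\n";  // never more than 20
echo $short, "\n";          // still valid UTF-8
```

For the APNS case, the same call with a length of 169 should keep the payload text within the limit, provided the limit is applied to the string before any JSON escaping changes its size.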