Problem description
A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules: For a 1-byte character, the first bit is 0, followed by its Unicode code. For an n-byte character, the first n bits are all ones, the (n+1)th bit is 0, followed by n-1 bytes whose most significant 2 bits are 10.
| Char. number range (hexadecimal) | UTF-8 octet sequence (binary) |
| --- | --- |
| 0000 0000-0000 007F | 0xxxxxxx |
| 0000 0080-0000 07FF | 110xxxxx 10xxxxxx |
| 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
| 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
The table above shows how the UTF-8 encoding would work.
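To make the table concrete, here is a small sketch that asks Java's standard UTF-8 encoder for the bytes of one sample character per row and prints each byte in binary. The class name `Utf8BitsDemo`, the helper `utf8Bits`, and the sample code points are chosen here for illustration and are not part of the original problem.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BitsDemo {
    // Encodes a single code point as UTF-8 and returns its bytes
    // as space-separated 8-bit binary strings.
    public static String utf8Bits(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255, then left-pad the binary string to 8 digits.
            String bits = Integer.toBinaryString(b & 0xFF);
            sb.append(String.format("%8s", bits).replace(' ', '0'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Bits(0x41));    // 01000001
        System.out.println(utf8Bits(0xE9));    // 11000011 10101001
        System.out.println(utf8Bits(0x20AC));  // 11100010 10000010 10101100
        System.out.println(utf8Bits(0x1F600)); // 11110000 10011111 10011000 10000000
    }
}
```

Each output line matches one row of the table: the number of leading ones in the first byte gives the total byte count, and every following byte starts with 10.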
Examples
Example 1:
Example 2:
Solution
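I stared at this problem for a long time and could not make sense of it. I understood how UTF-8 encoding works, but the given examples did not seem to line up with it. Only after stepping through a reference solution line by line did the meaning become clear. Each value x in the input array stands for one byte. If the binary form of x starts with 0, x is a complete Unicode character on its own. If it starts with 110, the next number must start with 10. If it starts with 1110, the next two numbers must both start with 10. If it starts with 11110, the next three numbers must all start with 10.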
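The rules above can be sketched as a single pass that tracks how many continuation bytes are still expected. This is a minimal sketch, not necessarily the solution the post originally walked through: the class name `Utf8Validator` and the sample inputs in `main` are illustrative, and it assumes only the lowest 8 bits of each integer carry data.

```java
public class Utf8Validator {
    // Returns true if the array, read as a sequence of bytes,
    // follows the leading-bit rules described above.
    public static boolean validUtf8(int[] data) {
        int remaining = 0; // continuation bytes still expected
        for (int value : data) {
            int b = value & 0xFF; // assumption: only the lowest 8 bits matter
            if (remaining == 0) {
                if ((b >> 7) == 0b0) {
                    continue;               // 0xxxxxxx: 1-byte character
                } else if ((b >> 5) == 0b110) {
                    remaining = 1;          // 110xxxxx: one more byte follows
                } else if ((b >> 4) == 0b1110) {
                    remaining = 2;          // 1110xxxx: two more bytes follow
                } else if ((b >> 3) == 0b11110) {
                    remaining = 3;          // 11110xxx: three more bytes follow
                } else {
                    return false;           // 10xxxxxx or 11111xxx cannot start a character
                }
            } else {
                if ((b >> 6) != 0b10) {
                    return false;           // continuation byte must start with 10
                }
                remaining--;
            }
        }
        // A character that was started must also be finished.
        return remaining == 0;
    }

    public static void main(String[] args) {
        // 11000101 10000010 00000001: a 2-byte character, then a 1-byte character
        System.out.println(validUtf8(new int[]{197, 130, 1})); // true
        // 11101011 10001100 00000100: third byte does not start with 10
        System.out.println(validUtf8(new int[]{235, 140, 4})); // false
    }
}
```

The sketch checks only the bit patterns listed in the rules; it does not reject overlong encodings or code points outside the ranges in the table.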
Code
public boolean validUtf8(int[] data) {