let uint8Array = new Uint8Array([228, 189, 160, 229, 165, 189]);
alert( new TextDecoder().decode(uint8Array) ); // 你好

How does the decoding of this end up being Asian characters?
As far as I know, UTF-8 is an 8-bit encoding, so if I look at a UTF-8 charset map I don't see any Asian characters up to 255.
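For reference, a quick check (assuming a standard JavaScript environment) shows that the code points of the two decoded characters sit well above 255, so neither can fit in a single byte:

// Code points of the decoded characters; both are far above 255.
console.log('你'.codePointAt(0).toString(16)); // "4f60" (20320)
console.log('好'.codePointAt(0).toString(16)); // "597d" (22909)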
On investigating the bits:

- Finding the bits of the input:
  [228, 189, 160, 229, 165, 189].map(i => i.toString(2))
  // ["11100100", "10111101", "10100000", "11100101", "10100101", "10111101"]
- Finding the bits of the output:
  '你好'.split('').map((e, index) => '你好'.charCodeAt(index).toString(2))
  // ["100111101100000", "101100101111101"]

Things that are a mystery to me:
- The total number of bits in the input is 48, while the total number of bits in the output is 30. Why?
- Also, the bit patterns match in some places but not as a whole. For example, the 3rd and 6th elements of the input bit array partially match the output bit strings.
Is there something I am missing? Feel free to correct me.
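For anyone digging into the same bytes, here is a minimal sketch (assuming the standard UTF-8 three-byte layout 1110xxxx 10xxxxxx 10xxxxxx) of how the marker bits are stripped and the remaining payload bits are reassembled into one code point; the helper name decode3ByteUtf8 is made up for illustration:

// Minimal sketch: decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
// The marker bits are dropped and the payload bits are concatenated.
function decode3ByteUtf8(b1, b2, b3) {
  const codePoint =
    ((b1 & 0b00001111) << 12) | // 4 payload bits from the leading byte
    ((b2 & 0b00111111) << 6) |  // 6 payload bits from the first continuation byte
    (b3 & 0b00111111);          // 6 payload bits from the second continuation byte
  return String.fromCodePoint(codePoint);
}

console.log(decode3ByteUtf8(228, 189, 160)); // "你" (code point 0x4F60)
console.log(decode3ByteUtf8(229, 165, 189)); // "好" (code point 0x597D)

This also suggests why the bit counts differ: each 3-byte sequence carries 24 input bits but only 16 payload bits, and toString(2) drops leading zeros, leaving the two 15-bit strings seen above.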