Tuesday, January 5, 2021

How does decoding work in JavaScript's TextDecoder with Asian characters?

    let uint8Array = new Uint8Array([228, 189, 160, 229, 165, 189]);
    // TextDecoder defaults to the 'utf-8' encoding when no label is passed
    alert( new TextDecoder().decode(uint8Array) ); // 你好

How did this encoding end up as Asian characters?

As far as I know, UTF-8 is 8-bit, so when I look at a UTF-8 charset map I don't see any Asian characters up to 255.
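A quick round-trip check with TextEncoder (the counterpart of TextDecoder, which always produces UTF-8) shows that each character takes three bytes, so something multi-byte must be going on:

    // encode the decoded string back to bytes
    let encoded = new TextEncoder().encode('你好');
    console.log(Array.from(encoded)); // [228, 189, 160, 229, 165, 189]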

On investigating the bits:

  1. finding the bits of the input
    [228, 189, 160, 229, 165, 189].map(i => i.toString(2))      // ["11100100", "10111101", "10100000", "11100101", "10100101", "10111101"]
  2. finding the bits of the output
    '你好'.split('').map(c => c.charCodeAt(0).toString(2))      // ["100111101100000", "101100101111101"]
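For reference, here is a minimal sketch of the UTF-8 rule I suspect is at work: a 3-byte sequence has the shape 1110xxxx 10xxxxxx 10xxxxxx, and only the x bits carry the code point (the helper name decode3 is just for illustration):

    // strip the UTF-8 marker bits and concatenate the payloads:
    // the lead byte 1110xxxx contributes 4 bits, each continuation
    // byte 10xxxxxx contributes 6, so 4 + 6 + 6 = 16 payload bits
    function decode3(b1, b2, b3) {
      return ((b1 & 0b1111) << 12) | ((b2 & 0b111111) << 6) | (b3 & 0b111111);
    }
    console.log(String.fromCharCode(decode3(228, 189, 160))); // 你 (U+4F60)
    console.log(String.fromCharCode(decode3(229, 165, 189))); // 好 (U+597D)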

Things that are a mystery to me:

  1. The input has 48 bits in total while the output has only 30. Why?
  2. Also, the bit patterns match in some places but not as a whole. For example, the 3rd and 6th elements of the input bit array partly match the output bit array (see the sketch after this list).
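A sketch of the bit accounting that would explain both points, assuming the 3-byte layout above:

    // input: 2 characters x 3 bytes x 8 bits = 48 bits, but each
    // sequence spends 4 + 2 + 2 = 8 bits on markers, leaving 16
    // payload bits; toString(2) also drops the leading zero, so
    // only 15 bits per character are shown
    console.log((0x4F60).toString(2).padStart(16, '0')); // "0100111101100000"
    console.log((0x4F60).toString(2).length);            // 15

    // the low 6 bits of a continuation byte are pure payload, which
    // is why byte 3 matches the tail of 你 and byte 6 the tail of 好
    console.log((160 & 0b111111).toString(2));    // "100000"
    console.log((0x4F60 & 0b111111).toString(2)); // "100000"
    console.log((189 & 0b111111).toString(2));    // "111101"
    console.log((0x597D & 0b111111).toString(2)); // "111101"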

Is there something I am missing? Feel free to correct me.

https://stackoverflow.com/questions/65585672/how-does-the-decoding-works-in-javascript-textdecoder-with-asian-charaters January 06, 2021 at 03:54AM
