Wednesday, March 24, 2021

Java - character encoding confusion on Windows

I have a simple Java program that takes in hex and converts it to ASCII. Using JDK 8, I compiled the following:

    import java.nio.charset.Charset;
    import java.util.Scanner;

    public class Main
    {
        public static void main(String[] args)
        {
            System.out.println("Charset: " + Charset.defaultCharset());
            Scanner in = new Scanner(System.in);
            System.out.print("Type a HEX string: ");
            String s = in.nextLine();
            String asciiStr = new String();

            //  Split the string into an array
            String[] hexes = s.split(":");

            //  For each hex
            for (String hex : hexes) {
                //  Translate the hex to ASCII
                System.out.print(" " + Integer.parseInt(hex, 16) + "|" + (char)Integer.parseInt(hex, 16));
                asciiStr += ((char) Integer.parseInt(hex, 16));
            }

            System.out.println("\nthe ASCII string is " + asciiStr);

            in.close();
        }
    }

I am passing the hex string C0:A8:96:FE to the program. My main concern is the 0x96 value, because it falls in the range 128 - 159, which ISO-8859-1 defines as control characters.

The output when I run the program without any JVM flags is the following:

    Charset: windows-1252
    Type a HEX string: C0:A8:96:FE
     192|À 168|¨ 150|? 254|þ
    the ASCII string is À¨?þ

The output when I use the JVM flag -Dfile.encoding=ISO-8859-1 to set the character encoding appears to be the following:

    Charset: ISO-8859-1
    Type a HEX string: C0:A8:96:FE
     192|À 168|¨ 150|– 254|þ
    the ASCII string is À¨–þ

I'm wondering why, when the character encoding is set to ISO-8859-1, I get the extra Windows-1252 characters for the range 128 - 159. These characters should not be defined in ISO-8859-1 but should be defined in Windows-1252, yet the behavior appears to be backwards. With ISO-8859-1, I would expect the 0x96 character to come out as a blank (non-printable) character, but that is not the case. Instead, it is the Windows-1252 run that fails to print it, when it should properly encode it as a – (en dash). Any help here?
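One way to probe the outputs above is to separate the three steps involved: the cast `(char) 0x96`, the encoding of that char into bytes for output, and the decoding of those bytes by whatever displays them. The sketch below is only an illustration under assumptions, not a confirmed diagnosis: the class name `CharsetRoundTrip` and the helpers `decodeByte` and `encodeChar` are hypothetical, and it assumes the console interprets the bytes it receives as Windows-1252 regardless of `-Dfile.encoding`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original program.
public class CharsetRoundTrip {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode one byte under the given charset and return the resulting code point.
    static int decodeByte(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    // Encode one char under the given charset and return the first output byte (unsigned).
    static int encodeChar(char c, Charset cs) {
        return String.valueOf(c).getBytes(cs)[0] & 0xFF;
    }

    public static void main(String[] args) {
        // The cast does not consult any charset: it yields the code point U+0096.
        char c = (char) Integer.parseInt("96", 16);
        System.out.printf("cast gives U+%04X, ISO control: %b%n",
                (int) c, Character.isISOControl(c));

        // Encoding U+0096 for output: Windows-1252 has no byte for it, so the
        // encoder substitutes '?' (0x3F); ISO-8859-1 maps it to the raw byte 0x96.
        System.out.printf("encode as windows-1252: 0x%02X%n", encodeChar(c, CP1252));
        System.out.printf("encode as ISO-8859-1:   0x%02X%n",
                encodeChar(c, StandardCharsets.ISO_8859_1));

        // Decoding the raw byte 0x96: Windows-1252 reads it as U+2013 (en dash),
        // ISO-8859-1 reads it as the control character U+0096.
        System.out.printf("decode 0x96 as windows-1252: U+%04X%n", decodeByte(0x96, CP1252));
        System.out.printf("decode 0x96 as ISO-8859-1:   U+%04X%n",
                decodeByte(0x96, StandardCharsets.ISO_8859_1));
    }
}
```

If the console assumption holds, this would be consistent with both runs: with the Windows-1252 default, U+0096 cannot be encoded and is written as `?`, while with ISO-8859-1 the raw byte 0x96 is written and the console itself renders it as an en dash.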

https://stackoverflow.com/questions/66790488/java-character-encoding-confusion-on-windows March 25, 2021 at 06:45AM
