Friday, March 5, 2021

Removing a control character using Python

I have a script that processes the output of a command (the AWS CLI help command).

I step through the output line by line and don't start the real parsing until I encounter the text "AVAILABLE COMMANDS", at which point I set a flag to true and start processing each subsequent line.
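The flag-based scan described above might look roughly like the sketch below. The function name lines_after_marker and the sample lines are hypothetical and are not taken from the original script.

```python
def lines_after_marker(lines, marker="AVAILABLE COMMANDS"):
    """Yield every line that follows the first line containing `marker`."""
    found = False  # flips to True once the marker line has been seen
    for line in lines:
        if found:
            yield line
        elif marker in line:
            found = True


# Hypothetical sample input standing in for the real help output.
sample = ["NAME", "DESCRIPTION", "AVAILABLE COMMANDS", "first-command", "second-command"]
print(list(lines_after_marker(sample)))
```

A plain substring test like this only works if the marker line contains the literal text, which is exactly what breaks once the highlighting described below is present.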

This has been working fine, but on Ubuntu I hit a problem: the CLI highlights the text in a way I have not seen before.

The output is very long, so I've grepped for the particular line in question; see below:

    aws ec2 help | egrep '^A'
    AVAILABLE COMMANDS
    ubuntu@ip-172-31-35-188:~/bin$ aws ec2 help | egrep '^A' | cat -vet
    A^HAV^HVA^HAI^HIL^HLA^HAB^HBL^HLE^HE C^HCO^HOM^HMM^HMA^HAN^HND^HDS^HS$

What I haven't seen before is that each highlighted letter is in the format X^HX. I'd like to apply a simple transformation of the type X^HX --> X (for all a-zA-Z).
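In cat -v output, ^H is caret notation for the single backspace byte (0x08), so X^HX is really the three-character sequence X, backspace, X. This is the overstrike convention that man-page style formatters use for bold text. One possible way to express the X^HX --> X transformation, using only the standard re module, is to delete every character that is immediately followed by a backspace, along with the backspace itself. This is a minimal sketch; the name strip_overstrike is hypothetical.

```python
import re

# A character followed by a backspace (0x08) is overstruck by whatever
# comes next, so drop that pair and keep the character that follows.
_OVERSTRIKE = re.compile(r".\x08")


def strip_overstrike(text):
    """Collapse overstrike sequences such as 'A\\x08A' down to 'A'."""
    return _OVERSTRIKE.sub("", text)


# Rebuild the highlighted heading from the question: every letter is doubled
# around a backspace, while the space is left alone.
raw = "".join(c if c == " " else c + "\x08" + c for c in "AVAILABLE COMMANDS")
print(strip_overstrike(raw))  # AVAILABLE COMMANDS
```

Because the pattern does not care which character precedes the backspace, it is not limited to a-zA-Z, and it should also reduce the underline form (underscore, backspace, letter) to the plain letter. Genuine doubles such as the MM in COMMANDS are left intact.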

What I have tried so far: my workaround is to first remove control characters like this: String = re.sub(r'[\x00-\x1f\x7f-\x9f]','',String) but I then still have to search for 'AAVVAAIILLAABBLLEE', which is ugly. I considered using a further regex to turn doubles into singles, but that would also catch genuine doubles and get messy.

I started writing a function that iterates over a constructed list of alphabetic characters to do the translation described above, and I used hexdump to try to figure out the exact \x code of the control characters in question, but could not get it working: I could remove the H but not the ^.
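Being able to remove the H but not the ^ is consistent with ^H being a display notation and not two literal characters: cat -v prints a control byte as ^ followed by the character 64 code points higher, so the backspace byte 0x08 shows up as ^H. The sketch below mimics that rendering in Python to make the relationship visible. The helper name caret_notation is hypothetical, and it only covers ASCII control characters, not the M- notation cat -v uses for high bytes.

```python
def caret_notation(text):
    """Render ASCII control characters roughly the way `cat -v` does."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20 and ch not in "\t\n":
            # Control byte: show ^ plus the character 0x40 higher (0x08 -> ^H).
            out.append("^" + chr(code + 0x40))
        elif code == 0x7F:
            out.append("^?")  # DEL
        else:
            out.append(ch)
    return "".join(out)


highlighted = "A\x08AV\x08V"
print(caret_notation(highlighted))  # A^HAV^HV
print(repr(highlighted))            # 'A\x08AV\x08V'
```

In other words, the character to target in a Python pattern is \x08 (also written \b inside a character class or a normal string literal), not a caret followed by H.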

I really don't want to use any additional modules because I want to make this available to people without them having to install extras. In conclusion, I have a workaround that is quite ugly, but I'm sure someone must know a quick and easy way to do this translation. It's odd that it only seems to show up on Ubuntu.

Any help greatly appreciated.

Peter.

https://stackoverflow.com/questions/66501394/removing-a-control-character-using-python March 06, 2021 at 09:06AM
