I have a large number of bookmarks that I want to export and share with a group I work with. The problem is that the export from the browser (Firefox) adds ADD_DATE, LAST_MODIFIED, ICON, and ICON_URI fields to each entry. To make it worse, only some entries have the ICON and ICON_URI fields. I was hoping to use cut or awk to pull out the fields I want, but the lack of a space before the >(website_name) makes that difficult, and my regex skills are weak. I figure I need to remove the ICON fields first, then the ADD_DATE and LAST_MODIFIED fields. The few other parts of the file are easy enough to cut and paste or edit manually.
So I need help with two things. First, how do I delete the ICON and ICON_URI fields from a line when they are present?
Second, how do I add a single space before the second-to-last > on each line, so that I can use cut or awk to pull the fields I want into a new file?
Ex: 123456">SecurityTrails would become 123456 >SecurityTrails
Please see below for examples of what I'm working with. Any help is greatly appreciated!
<DT><A HREF="https://securitytrails.com/" ADD_DATE="1586881447" LAST_MODIFIED="1612650221">SecurityTrails</A>
<DT><A HREF="https://domainbigdata.com/" ADD_DATE="1586880165" LAST_MODIFIED="1612650221" ICON_URI="https://domainbigdata.com/img/favicons/favicon.png" ICON="data:image/png;base64,iVBORw0KGgoA-truncated-">DomainBigData</A>

https://stackoverflow.com/questions/66681906/parsing-bookmarks-file
March 18, 2021 at 06:04AM
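For what it's worth, the two steps asked about (dropping ICON and ICON_URI when present, then putting a space before the > that precedes the site name) could be sketched in Python with the standard re module. This is only a rough sketch, not a tested solution for a full export: the function name clean_line and the two pattern names are made up for illustration, it assumes each bookmark sits on its own line, and it assumes attribute values never contain a double quote. It also keeps the closing quote and inserts the space after it (123456" >SecurityTrails), which is an assumption about what the example in the question intends.

```python
import re

# Hypothetical pattern: whitespace followed by ICON="..." or ICON_URI="...".
# Assumes attribute values contain no double quotes.
ICON_ATTR = re.compile(r'\s+ICON(?:_URI)?="[^"]*"')

# Hypothetical pattern: the '">' that ends the opening <A ...> tag, i.e. the
# one followed only by link text and then the closing </A>.
TAG_END = re.compile(r'">(?=[^<>]*</A>)')


def clean_line(line: str) -> str:
    """Remove ICON/ICON_URI attributes, then add a space before the '>'
    that precedes the link text."""
    line = ICON_ATTR.sub("", line)
    return TAG_END.sub('" >', line, count=1)


if __name__ == "__main__":
    samples = [
        '<DT><A HREF="https://securitytrails.com/" ADD_DATE="1586881447" '
        'LAST_MODIFIED="1612650221">SecurityTrails</A>',
        '<DT><A HREF="https://domainbigdata.com/" ADD_DATE="1586880165" '
        'LAST_MODIFIED="1612650221" '
        'ICON_URI="https://domainbigdata.com/img/favicons/favicon.png" '
        'ICON="data:image/png;base64,iVBORw0KGgoA-truncated-">DomainBigData</A>',
    ]
    for sample in samples:
        print(clean_line(sample))
```

Lines without ICON fields pass through the first substitution unchanged, so the same function can be applied to every line of the file. After this step the remaining fields are separated by spaces, which should make cut or awk usable for pulling out the parts that are wanted.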