Friday, April 23, 2021

How to group text by its format in Python

Hi, my grandfather has spent the last decade of his life translating an ancient biblical text (ancient Hebrew to English) and writing his commentary on it. He stores his work in about 1000 Google Docs documents. Each page has about 10-15 different types of formatted text (e.g. original Hebrew text, English translation, summary of translation, his commentary, footnotes, etc.), and these formats have been used uniformly throughout all 1000 pages. I am trying to organize his text so that he can build a dynamic and intuitive webpage to show his work, and to do this I want to group the text on each page by the format it uses. For example, if something is written in Times New Roman, bold, size 12, I want to classify that portion of words as 'Title'. I am pretty sure I need to export the documents as HTML, but I am not sure where to go from there. I am pretty savvy with Python but don't have much experience parsing text data or HTML files. Thanks!
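To make the goal concrete, here is a minimal sketch of the grouping step using only the Python standard library. It rests on assumptions that would need checking against a real export: that each formatted run is a `<span>` carrying its font properties in an inline `style` attribute, and that the entries in `FORMAT_LABELS` (a hypothetical mapping invented for this illustration) match the actual fonts, sizes, and weights. If the exported HTML puts its styles in CSS classes instead, those classes would first have to be resolved to their properties.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical mapping from a (font-family, font-size, font-weight) signature
# to a label. The real signatures must be read off the exported documents.
FORMAT_LABELS = {
    ("Times New Roman", "12pt", "700"): "Title",
    ("Arial", "11pt", "400"): "Commentary",
}


def parse_style(style):
    """Turn 'font-size:12pt;font-weight:700' into a dict of CSS properties."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip().strip("'\"")
    return props


class FormatGrouper(HTMLParser):
    """Collect the text inside each <span>, keyed by its format label."""

    def __init__(self):
        super().__init__()
        self.groups = defaultdict(list)
        self._stack = []  # labels of the currently open <span> elements

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # Assumption: formatting lives in the inline style attribute.
        props = parse_style(dict(attrs).get("style") or "")
        signature = (
            props.get("font-family", ""),
            props.get("font-size", ""),
            props.get("font-weight", "400"),  # treat a missing weight as normal
        )
        self._stack.append(FORMAT_LABELS.get(signature, "Unclassified"))

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            # Text is filed under the innermost open span's label.
            self.groups[self._stack[-1]].append(text)


def group_by_format(html):
    """Return {label: [text, ...]} for one exported HTML page."""
    parser = FormatGrouper()
    parser.feed(html)
    parser.close()
    return dict(parser.groups)
```

Calling `group_by_format(html_text)` on each exported page would give one dictionary per page. Anything filed under "Unclassified" points at a format signature that is still missing from the mapping.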

https://stackoverflow.com/questions/67238445/how-to-group-text-by-its-format-in-python April 24, 2021 at 09:04AM
