Sunday, March 7, 2021

How to extract HTML text which has no tags using Beautifulsoup?

I want to scrape text from a website. However, the text is not wrapped in an HTML tag of its own, so I do not know how to select it. Here is the HTML code:

<div class="card-body">
    <p><strong>Número de item:</strong> <label id="itemNumber2">46369</label></p>
    0 g de grasas trans. Sin colesterol. 4.73 l.
</div>
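In this markup the target sentence is a bare text node: a direct child of the div that sits after the closing </p>, not inside any element. A quick way to confirm this is to list the div's direct children. This is a minimal sketch assuming the bs4 package with Python's built-in "html.parser"; it also assumes a non-breaking space (&nbsp;) before "l.", which is inferred from the '\xa0l' replacement in the loop below rather than visible in the snippet.

```python
from bs4 import BeautifulSoup, NavigableString

# Snippet from the question; the &nbsp; before "l." is an assumption.
html = (
    '<div class="card-body">    <p><strong>Número de item:</strong> '
    '<label id="itemNumber2">46369</label></p>'
    '       0 g de grasas trans. Sin colesterol. 4.73&nbsp;l.  </div>'
)

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="card-body")

# Direct children only: bare text shows up as NavigableString objects,
# elements such as <p> show up as Tag objects.
for child in div.children:
    kind = "text" if isinstance(child, NavigableString) else "tag <%s>" % child.name
    print(kind, repr(child))
```

The output should show three children: a whitespace-only text node, the <p> tag, and the text node holding the wanted sentence.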

The text I wish to obtain is " 0 g de grasas trans. Sin colesterol. 4.73 l.". So far, I have tried the following with BeautifulSoup:

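# recursive=False: only text nodes that are direct children of the div (text inside <p> is skipped)
for especifica in subsoup.find('div', {'class': 'card-body'}).find_all(text=True, recursive=False):
    # '\xa0l' is a non-breaking space followed by "l", so this removes the unit as well
    esp = especifica.replace('\xa0l', '')
    descripcion_especifica.append(esp.strip())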

The output obtained for the key 'Descripcion especifica' within a dictionary is:

'Descripcion especifica': ['', '0 g de grasas trans. Sin colesterol. 4.73.']  
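Both oddities in that output most likely come from the string handling rather than from the search. The empty string is the whitespace-only text node before <p>, which strip() reduces to '', and replacing '\xa0l' deletes the "l" together with the non-breaking space, which is why "4.73 l." became "4.73.". A stdlib-only sketch of the cleanup, where the raw node values are assumptions modelled on the snippet:

```python
# Assumed raw text nodes, as the loop would collect them
# (with a non-breaking space before the unit "l.").
raw_nodes = ["    ", "       0 g de grasas trans. Sin colesterol. 4.73\xa0l.  "]

# Replace only the non-breaking space with a normal space so the unit survives.
cleaned = [node.replace("\xa0", " ").strip() for node in raw_nodes]

# Drop the empty string left by the whitespace-only node and merge the rest.
descripcion = " ".join(text for text in cleaned if text)
print(descripcion)  # 0 g de grasas trans. Sin colesterol. 4.73 l.
```

Filtering inside the join avoids having to delete list elements by index, which breaks as soon as a page has a different number of text nodes.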

This is as close as I have come to obtaining the actual text. However, whenever I try to get rid of the first element of the list or merge it with the second one, I get further errors. Does anyone know how to scrape this text?
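One possible approach, sketched under the assumption that the wanted text always comes immediately after the <p> element, is to navigate from the <p> to its next sibling instead of collecting every direct text node. It again assumes bs4 with "html.parser" and an &nbsp; before "l."; the empty-string fallback is a hypothetical choice for pages where no text follows the <p>.

```python
from bs4 import BeautifulSoup, NavigableString

# Snippet from the question; the &nbsp; before "l." is an assumption.
html = (
    '<div class="card-body">    <p><strong>Número de item:</strong> '
    '<label id="itemNumber2">46369</label></p>'
    '       0 g de grasas trans. Sin colesterol. 4.73&nbsp;l.  </div>'
)

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="card-body")

# The wanted text is the node right after </p>.
text_node = div.find("p").next_sibling

if isinstance(text_node, NavigableString):
    # Normalise the non-breaking space and trim the surrounding whitespace.
    descripcion = text_node.replace("\xa0", " ").strip()
else:
    descripcion = ""  # hypothetical fallback: nothing (or a tag) follows the <p>

print(descripcion)  # 0 g de grasas trans. Sin colesterol. 4.73 l.
```

If the text might be split across several nodes, joining the non-empty stripped results of find_all(string=True, recursive=False), as in the question's loop, is the more general option; next_sibling is simpler when the layout is fixed.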

https://stackoverflow.com/questions/66523475/how-to-extract-html-text-which-has-no-tags-using-beautifulsoup March 08, 2021 at 10:06AM
