Friday, April 23, 2021

How can we scrape minified HTML with BeautifulSoup and Python?

I have an HTML table like the one below:

<table class="pull-right table table-bordered table-hover table-striped" id="cost-comparison-table">      <tr>          <td> ABC          <td>USD 17000      <tr>          <td> DEF          <td>USD 4000      <tr>          <td> GHI          <td>USD 5000      <tr>          <td> JKL          <td>USD 18000      <tr>          <td> MNO          <td>USD 19000      <tr>          <td> PQR          <td>USD 10500          </td>          </td>      </tr>      </td>      </td>      </tr>      </td>      </td>      </tr>      </td>      </td>      </tr>      </td>      </td>      </tr>      </td>      </td>      </tr>  </table>  

Is there any way to scrape HTML formatted this way? This is actually a minified version of the HTML. Note that in HTML5 the closing tags of elements such as li, tr, and td are optional, and void elements such as br and img have no closing tag at all.

I need to create a dictionary from the table contents. My code so far:

tds = [row.findAll('td') for row in soup.findAll('tr')]
results = { td[0].string: td[1].string for td in tds }
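The difficulty with the snippet above is that its result depends on how the chosen parser repairs the omitted closing tags. A parser that does not close `<td>` and `<tr>` implicitly will nest every later cell inside the first one, so `td.string` may come back as `None`. As a rough sketch of one way around this, the example below does not use BeautifulSoup at all. It swaps in the standard library's `html.parser.HTMLParser` and treats every `<tr>` or `<td>` start tag as the end of the previous row or cell. The names `CellCollector`, `table_to_dict`, and `sample` are made up for illustration, and the sample markup is a shortened copy of the table in the question.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect table cells, treating each <tr>/<td> start tag as a boundary."""

    def __init__(self):
        super().__init__()
        self.rows = []        # list of rows, each a list of cell strings
        self.in_cell = False  # True while incoming text belongs to a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row implicitly ends the previous row and cell.
            self.rows.append([])
            self.in_cell = False
        elif tag == "td" and self.rows:
            # A new cell implicitly ends the previous cell.
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        # The piled-up closing tags at the end only stop text capture.
        if tag in ("td", "tr", "table"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


def table_to_dict(html):
    """Map the first cell of each row to the second (hypothetical helper)."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return {row[0].strip(): row[1].strip()
            for row in parser.rows if len(row) >= 2}


# Shortened version of the table from the question (assumed input).
sample = (
    '<table id="cost-comparison-table"> <tr> <td> ABC <td>USD 17000 '
    '<tr> <td> DEF <td>USD 4000 </td> </td> </tr> </td> </td> </tr> </table>'
)
results = table_to_dict(sample)
print(results)  # {'ABC': 'USD 17000', 'DEF': 'USD 4000'}
```

If staying with BeautifulSoup is preferred, it may be worth trying a browser-like parser such as html5lib (`BeautifulSoup(html, "html5lib")`, which requires the html5lib package to be installed), since it follows the HTML rules for optional closing tags. Using `get_text(strip=True)` in place of `.string` would then also trim the surrounding whitespace.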
https://stackoverflow.com/questions/67213358/how-can-we-scrape-minified-html-with-beautifulsoup-and-python April 22, 2021 at 08:31PM
