Wednesday, January 20, 2021

Extracting a list of URLs from a URL using BeautifulSoup

I would like to extract information about website similarity from this link:

https://www.alexa.com/siteinfo/amazon.com

I am looking at class='site', trying to extract information from

<a href="/siteinfo/ebay.com" class="truncation">ebay.com</a>  

but I only get one value back. Is it possible to extract all four sites together with their overlap scores?

What I am trying to achieve is a table which includes this information

W                amazon.com
eBay.com         70.1
pinterest.com    54.7
wikipedia.org    51.3
facebook.com     50.4

I have tried

from bs4 import BeautifulSoup

# data is assumed to hold the HTML of the page linked above
soup = BeautifulSoup(data, "html.parser")
print([item.get_text(strip=True) for item in soup.select("span.site")])

but this does not seem to be enough to get the information, probably because some of the parameters in my code are wrong.
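
One possible direction, as a minimal sketch rather than a confirmed solution: select the links themselves instead of a specific wrapper tag, then look up the score inside the same row. The sample markup is hypothetical. Only the `<a class="truncation">` tag and the class name `site` come from the question; the `Row` and `overlap` class names, the `SAMPLE_HTML` string, and the `extract_similar_sites` helper are assumptions made for illustration, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration. Only the <a class="truncation"> tag and
# the class name "site" come from the question; "Row" and "overlap" are assumed.
SAMPLE_HTML = """
<div class="Row">
  <div class="site"><a href="/siteinfo/ebay.com" class="truncation">ebay.com</a></div>
  <div class="overlap"><span class="truncation">70.1</span></div>
</div>
<div class="Row">
  <div class="site"><a href="/siteinfo/pinterest.com" class="truncation">pinterest.com</a></div>
  <div class="overlap"><span class="truncation">54.7</span></div>
</div>
<div class="Row">
  <div class="site"><a href="/siteinfo/wikipedia.org" class="truncation">wikipedia.org</a></div>
  <div class="overlap"><span class="truncation">51.3</span></div>
</div>
<div class="Row">
  <div class="site"><a href="/siteinfo/facebook.com" class="truncation">facebook.com</a></div>
  <div class="overlap"><span class="truncation">50.4</span></div>
</div>
"""


def extract_similar_sites(html):
    """Return (site, overlap_score) pairs, one per matching link."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Match the links directly, whatever tag carries the "site" class.
    for link in soup.select('.site a.truncation[href^="/siteinfo/"]'):
        row = link.find_parent(class_="Row")  # assumed row container
        score_tag = row.select_one(".overlap") if row is not None else None
        # Leave the score as None when no score element is found in the row.
        score = (
            float(score_tag.get_text(strip=True))
            if score_tag is not None
            else None
        )
        results.append((link.get_text(strip=True), score))
    return results


if __name__ == "__main__":
    # Print the pairs as a simple two-column table.
    for site, score in extract_similar_sites(SAMPLE_HTML):
        print(f"{site:<20}{score}")
```

If `span.site` matches fewer elements than expected, it may be worth checking in the page source which tag actually carries the `site` class, and whether the rows are present in the downloaded HTML at all rather than being added later by JavaScript.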

https://stackoverflow.com/questions/65820235/extracting-list-of-urls-from-url-using-beautifulsoup January 21, 2021 at 10:07AM
