Based on the code from the answer at this link, I'm able to create a new column:

df['url'] = 'https://www.cspea.com.cn/list/c01/' + df['projectCode']
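For reference, a tiny self-contained sketch of that step; the projectCode value is taken from the example url further down and the one-row frame is made up for illustration:

import pandas as pd

# toy input frame; 'gr2021bj1000186' is the example code from the url below
df = pd.DataFrame({'projectCode': ['gr2021bj1000186']})
df['url'] = 'https://www.cspea.com.cn/list/c01/' + df['projectCode']
print(df['url'][0])  # https://www.cspea.com.cn/list/c01/gr2021bj1000186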
Next, I would like to pass the url column's values one by one to the following code and append all the scraped contents into a single dataframe.
import urllib3
import requests
from bs4 import BeautifulSoup
import pandas as pd

urllib3.disable_warnings()  # suppress the InsecureRequestWarning raised by verify=False

url = "https://www.cspea.com.cn/list/c01/gr2021bj1000186"  # url column's values should be passed here one by one
soup = BeautifulSoup(requests.get(url, verify=False).content, "html.parser")

index, data = [], []
for th in soup.select(".project-detail-left th"):
    h = th.get_text(strip=True)
    t = th.find_next("td").get_text(strip=True)
    index.append(h)
    data.append(t)

df = pd.DataFrame(data, index=index, columns=["value"])
print(df)
How could I do that in Python? Thanks.
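One way this could be done, sketched under the assumption that df is the frame holding the url column (scrape_one is a hypothetical helper, not part of the original code): scrape each page into a {header: value} dict and let pandas turn the list of dicts into one combined frame, aligning columns by header name.

import requests
import pandas as pd
from bs4 import BeautifulSoup

def scrape_one(url):
    # parse one detail page into {header: value} pairs, as in the snippet above
    soup = BeautifulSoup(requests.get(url, verify=False).content, "html.parser")
    return {th.get_text(strip=True): th.find_next("td").get_text(strip=True)
            for th in soup.select(".project-detail-left th")}

# one dict per url becomes one row in the combined frame
result = pd.DataFrame([scrape_one(u) for u in df['url']])
print(result)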
Updated:
import requests
from bs4 import BeautifulSoup
import pandas as pd

df = pd.read_excel('items_scraped.xlsx')

data = []
urls = df.url.tolist()
for url_link in urls:
    url = url_link
    # url = "https://www.cspea.com.cn/list/c01/gr2021bj1000186"
    soup = BeautifulSoup(requests.get(url, verify=False).content, "html.parser")
    index, data = [], []
    for th in soup.select(".project-detail-left th"):
        h = th.get_text(strip=True)
        t = th.find_next("td").get_text(strip=True)
        index.append(h)
        data.append(t)
    df = pd.DataFrame(data, index=index, columns=["value"])
    df = df.T
    df.reset_index(drop=True, inplace=True)
    print(df)

df.to_excel('result.xlsx', index=False)
But it only saved one row to the Excel file.
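The likely cause: df is reassigned to a fresh one-row frame on every pass through the loop, so by the time to_excel runs only the last page is left. A minimal correction under the same assumptions (items_scraped.xlsx holds the url column, result.xlsx is the output): collect the one-row frames in a list and concatenate once before writing.

import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_excel('items_scraped.xlsx')

rows = []
for url in df.url.tolist():
    soup = BeautifulSoup(requests.get(url, verify=False).content, "html.parser")
    index, data = [], []
    for th in soup.select(".project-detail-left th"):
        index.append(th.get_text(strip=True))
        data.append(th.find_next("td").get_text(strip=True))
    # transpose the header/value column into a single row per page
    rows.append(pd.DataFrame(data, index=index, columns=["value"]).T)

result = pd.concat(rows, ignore_index=True)
result.to_excel('result.xlsx', index=False)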