Saturday, January 23, 2021

Using BeautifulSoup to submit button/expand aria-expandable

Hi, I am trying to expand the button on this page so that I can capture the remaining attributes using BeautifulSoup. This is the HTML for the button I am trying to press:

<button class="button__373c0__3lYgT secondary__373c0__1bsQo" aria-expanded="false" aria-controls="expander-link-content-cf6b4b45-8720-4627-96f8-397a766b8ddb" type="submit" value="submit" style="--mousedown-x:30px; --mousedown-y:27.625px; --button-width:113.422px;"><div class=" button-content__373c0__1QNtB border-color--default__373c0__3-ifU"><span class=" text__373c0__2Kxyz button-content-text__373c0__Z-7FO text-color--inherit__373c0__1lczC text-align--center__373c0__3VrfZ text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B text--truncated__373c0__3sLaf"><p class=" text__373c0__2Kxyz text-color--normal__373c0__3xep9 text-align--left__373c0__2XGa- text-weight--semibold__373c0__2l0fe text-size--large__373c0__3t60B">15 More Attributes</p></span></div></button>  

and this is what I have so far:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.yelp.com/biz/lazi-cow-davis?osq=lazi+cow'

# Fetch the business page and parse the returned HTML
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Print the text of every span in the "Amenities and More" section
for item in soup.select('section'):
    heading = item.find('h4')
    if heading and heading.get_text() == "Amenities and More":
        for span in item.find_all('span'):
            print(span.get_text())

I understand that the Yelp API could be used for this instead, but I need to do this for more than 1,000 different Yelp pages, so I was wondering whether there is a workaround. My current approach works (I will add proxies later), just not for all of the attributes.
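
For context, BeautifulSoup only parses the HTML it is handed and cannot press a button, so one thing that can be checked offline is whether the fetched markup still contains the collapsed expander. The following is a minimal sketch, not a solution: SAMPLE_HTML is a trimmed copy of the button quoted above with the class names shortened, and find_collapsed_expanders is a hypothetical helper name introduced only for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the button markup quoted in the question
# (class names shortened; this is an assumption for readability).
SAMPLE_HTML = """
<button class="button secondary" aria-expanded="false"
        aria-controls="expander-link-content-cf6b4b45-8720-4627-96f8-397a766b8ddb"
        type="submit" value="submit">
  <div><span><p>15 More Attributes</p></span></div>
</button>
"""


def find_collapsed_expanders(html):
    """Return (label, controlled_id) for each button with aria-expanded="false"."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for button in soup.find_all("button", attrs={"aria-expanded": "false"}):
        # The visible label, e.g. "15 More Attributes"
        label = button.get_text(strip=True)
        # aria-controls names the id of the element the button expands
        results.append((label, button.get("aria-controls")))
    return results


expanders = find_collapsed_expanders(SAMPLE_HTML)
for label, controlled_id in expanders:
    print(label, "->", controlled_id)
```

If the list is non-empty for a fetched page, the section was served in its collapsed state. Whether the hidden attributes appear elsewhere in the same response is something this sketch does not determine.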

https://stackoverflow.com/questions/65866489/using-beautifulsoup-to-submit-button-expand-aria-expandable January 24, 2021 at 10:01AM
