Thursday, April 29, 2021

Unable to scrape the title of the main book along with the books viewed by customers from a webpage

I've been trying to scrape the title of the book on the landing page, along with the titles of the books that other customers viewed, from the same webpage. To get the titles of all of those books, it is necessary to keep clicking the right arrow button of the carousel.

I've tried with:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

links = [
    "https://www.amazon.com/Keto-Meal-Prep-Cookbook-Beginners/dp/1673455980/",
    "https://www.amazon.com/Keto-Diet-Cookbook-Beginners-Recipes/dp/1792145454/"
]

def fetch_content(link):
    driver.get(link)
    title = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,'h1#title > span#productTitle'))).text
    page_count = wait.until(EC.presence_of_element_located((By.XPATH,'//*[contains(@class,"a-carousel-header-row")][.//h2[contains(@class,"a-carousel-heading")][contains(.,"Customers who")]]//span[@class="a-carousel-page-max"]'))).text

    title_list = []
    for i in range(int(page_count)+1):
        wait.until(EC.presence_of_element_located((By.XPATH,'//*[contains(@class,"a-carousel-header-row")][.//h2[contains(@class,"a-carousel-heading")][contains(.,"Customers who")]]/following-sibling::*[contains(@class,"a-carousel-row")]//a[contains(@class,"a-carousel-goto-nextpage")]'))).click()
        for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"li.a-carousel-card > a.a-link-normal > div[data-rows]"))):
            title_list.append(item.text)
    return title,title_list

if __name__ == '__main__':
    with webdriver.Chrome() as driver:
        wait = WebDriverWait(driver,15)
        for link in links:
            print(fetch_content(link))

When I run the script above (and scroll down a bit manually while it is running), it grabs the first two titles from the "Customers who viewed" container and then throws a stale element reference error pointing at title_list.append(item.text).
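For what it's worth, a stale element reference usually means the page replaced the nodes between the moment they were located and the moment .text was read, which seems plausible here since each click on the arrow redraws the carousel. One common workaround is to look the elements up again and re-read them whenever that happens. The sketch below is only an illustration of that retry idea: read_texts_with_retry, locate, retry_on and attempts are hypothetical names of mine, not part of Selenium, and the helper is kept library-agnostic so it can be tried without a browser.

```python
from typing import Callable, Iterable, List, Tuple, Type


def read_texts_with_retry(
    locate: Callable[[], Iterable],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
) -> List[str]:
    """Hypothetical helper: re-locate elements and re-read `.text` on failure.

    `locate` is a zero-argument callable returning objects with a `.text`
    attribute; `retry_on` lists the exception types that trigger a retry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for _ in range(attempts):
        try:
            # Look the elements up again on every attempt: references from an
            # earlier lookup may point at nodes the page has since replaced.
            return [element.text for element in locate()]
        except retry_on as error:
            last_error = error

    # Every attempt failed, so surface the most recent error to the caller.
    raise last_error
```

With Selenium, I assume the call would look roughly like read_texts_with_retry(lambda: driver.find_elements(By.CSS_SELECTOR, "li.a-carousel-card > a.a-link-normal > div[data-rows]"), (StaleElementReferenceException,)), with StaleElementReferenceException imported from selenium.common.exceptions. Whether three attempts are enough for this particular carousel is something I have not verified.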

How can I scrape the title of the main book along with the books viewed by customers from a webpage?

https://stackoverflow.com/questions/67324375/unable-to-scrape-the-title-of-the-main-book-along-with-the-books-viewed-by-custo April 30, 2021 at 04:06AM
