Friday, January 1, 2021

Infinite scroll down using Selenium always fails because of automatic reload of the page

I want to scroll down this page, https://www.newsnow.com/us/World?type=ln&d=1609455600, by repeatedly clicking the "view more headlines" button so that I can scrape headlines from previous days. However, after a number of loop iterations (a few clicks on "view more headlines"), the page in the driver reloads automatically and jumps back to its initial position. This is the code:

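    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    url = 'https://www.newsnow.com/us/World?type=ln&d=1609455600'

    options = Options()
    options.add_argument('--no-sandbox')
    options.add_argument('--ignore-certificate-errors')

    driver = webdriver.Chrome(executable_path=r"C:/chromedriver.exe", options=options)

    driver.get(url)
    time.sleep(10)  # give the page time to finish loading
    # driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    for i in range(3000):
        try:
            # wait up to 20 s for the "view more headlines" button
            elem = WebDriverWait(driver, 20).until(
                EC.element_to_be_clickable((By.CLASS_NAME, 'btn--primary__label')))
            driver.execute_script("arguments[0].scrollIntoView();", elem)
            elem.click()
            print(f'click {i} done')
            time.sleep(5)
        except Exception:
            # e.g. TimeoutException once the button never becomes clickable
            print('end of the scrolling down')
            break
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # ...
    # working with the soup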
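
The code above parses the page only once, after the loop, so a reload in the middle discards everything loaded so far. One possible mitigation, shown here only as a minimal sketch and not as a confirmed fix for this site, is to record the visible headlines before every click. The names `harvest_headlines`, `read_headlines` and `click_more` are hypothetical, and the page is simulated with plain Python so the idea can be checked without a browser.

```python
from typing import Callable, Iterable, Set


def harvest_headlines(
    read_headlines: Callable[[], Iterable[str]],
    click_more: Callable[[], bool],
    max_clicks: int = 3000,
) -> Set[str]:
    """Accumulate headlines across clicks so a page reload cannot erase them."""
    seen: Set[str] = set()
    for _ in range(max_clicks):
        # Snapshot what is visible before clicking; a reload after the
        # click then cannot erase headlines that were already recorded.
        seen.update(read_headlines())
        if not click_more():  # button missing or not clickable: stop
            break
    seen.update(read_headlines())  # pick up whatever the last click loaded
    return seen


# Simulated page (hypothetical data): each click reveals one more batch,
# and the third click triggers a reload back to the first batch.
batches = [["a1", "a2"], ["b1"], ["c1"]]
state = {"shown": 1, "clicks": 0}


def fake_read():
    # Headlines currently "on the page"
    return [h for batch in batches[: state["shown"]] for h in batch]


def fake_click():
    state["clicks"] += 1
    if state["clicks"] == 3:
        state["shown"] = 1  # simulated automatic reload
        return True
    if state["clicks"] > 4:
        return False  # simulated: button no longer available
    state["shown"] = min(state["shown"] + 1, len(batches))
    return True


result = harvest_headlines(fake_read, fake_click)
print(sorted(result))
```

With Selenium, `read_headlines` could wrap a `driver.find_elements(...)` call using a selector for the headline links (the question does not give one), and `click_more` could wrap the wait-and-click step from the loop above, returning `False` when the wait times out. This does not stop the reload itself; it only limits how much a reload can lose.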
https://stackoverflow.com/questions/65532282/infinite-scroll-down-using-selenium-alwais-fail-because-of-automatic-reload-of-t January 02, 2021 at 02:02AM
