Thursday, January 21, 2021

Capturing info from console using Python

I'm writing a script to rip m4a files from one specific website. I'm currently using BS4 and Selenium for this.

I'm having trouble getting the file link. It is not in the HTML source for the page; I can only find it in the browser console. The link I'm trying to get is shown in this image (https://imgur.com/a/DLwcE0p), labeled "audio_url_m4a:".

Here's some sample code I'm using:

    import time

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    # Ask Chrome to record browser console messages at every level
    d = DesiredCapabilities.CHROME
    d['loggingPrefs'] = {'browser': 'ALL'}
    driver = webdriver.Chrome(r'chromedriver path', desired_capabilities=d)

    # ... lots of code doing other things not relevant to the post ...

    for URL in audm_URL:  # audm_URL is a list of URLs constructed earlier
        driver.get(URL)
        time.sleep(3)  # give the page time to load

        for entry in driver.get_log('browser'):
            print(entry)

Here is the output I get:

    {'level': 'SEVERE', 'message': 'https://audm.herokuapp.com/favicon.ico - Failed to load resource: the server responded with a status of 404 (Not Found)', 'source': 'network', 'timestamp': 1611291689357}
    {'level': 'SEVERE', 'message': 'https://cdn.segment.com/analytics.js/v1/5DOhLj2nIgYtQeSfn9YF5gpAiPqRtWSc/analytics.min.js - Failed to load resource: net::ERR_NAME_NOT_RESOLVED', 'source': 'network', 'timestamp': 1611291689357}

Most questions about grabbing things from the console point me towards grabbing the logs, but none of them seem to explain how to grab those other variables. Any ideas?

Here's a link to a random audio page that I want to grab the file from: https://audm.herokuapp.com/player-embed?pub=newyorker&articleID=5fe0b9b09fabedf20ec1f70c

Thanks everyone!
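
One direction that might be worth trying for the question above is Chrome's performance log rather than the browser log, since it records network requests. This is only a sketch under assumptions, not a confirmed fix. It assumes the player actually requests the .m4a file within the wait period, that chromedriver can be found on the PATH, and that the installed ChromeDriver accepts the "goog:loggingPrefs" capability. The function names extract_m4a_urls and collect_m4a_urls are hypothetical helpers made up for illustration.

```python
import json


def extract_m4a_urls(performance_entries):
    """Return unique .m4a request URLs found in Chrome performance-log entries."""
    urls = []
    for entry in performance_entries:
        try:
            # Each entry's 'message' field is a JSON string wrapping a DevTools event.
            message = json.loads(entry["message"])["message"]
            method = message["method"]
            url = message["params"]["request"]["url"]
        except (KeyError, TypeError, ValueError):
            continue  # not a request event, or not in the expected shape
        if method != "Network.requestWillBeSent":
            continue
        # Ignore any query string when checking the file extension.
        path = url.split("?", 1)[0].lower()
        if path.endswith(".m4a") and url not in urls:
            urls.append(url)
    return urls


def collect_m4a_urls(page_url, wait_seconds=3):
    """Load page_url in Chrome and return the .m4a URLs it requested (hypothetical helper)."""
    import time
    from selenium import webdriver  # imported here so the parser above works without Selenium

    options = webdriver.ChromeOptions()
    # Assumption: this capability name is the one the installed ChromeDriver expects.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        time.sleep(wait_seconds)  # give the player time to start loading audio
        return extract_m4a_urls(driver.get_log("performance"))
    finally:
        driver.quit()
```

Calling collect_m4a_urls with one of the player-embed URLs would return whatever .m4a requests Chrome logged. If the link only ever exists as a JavaScript value and is never requested, this returns an empty list and a different approach would be needed.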

https://stackoverflow.com/questions/65839595/capturing-info-from-console-using-python January 22, 2021 at 01:05PM
