I have a column of URLs and would like to extract the digits that come after "/show/" and before the next "/". I would like these digits as integers.
sn  URL
1   https://tvseries.net/show/51/johnny155
2   https://tvseries.net/show/213/kimble2
3   https://tvseries.net/show/46/forceps
4   https://tvseries.net/show/90/tr9
5   https://tvseries.net/show/22/candlenut
The expected output is:
sn  URL                                        show_id
1   https://tvseries.net/show/51/johnny155     51
2   https://tvseries.net/show/213/kimble2      213
3   https://tvseries.net/show/46/forceps       46
4   https://tvseries.net/show/90/tr9           90
5   https://tvseries.net/show/22/candlenut     22
So far I have tried the code below to retrieve the digits after "show". It produces a column in which each show_id is wrapped in brackets (i.e., [51], [213]), and the column's type is pandas.core.series.Series.
Is there a more efficient way to get the show_id as an integer, without the brackets? I appreciate any help, thank you.
import urllib.parse as urlparse
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(*df['URL'].map(urlparse.urlsplit))
df['UID'] = df['path'].str.findall(r'(?<=show)[^,.\d\n]+?(\d+)')
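One possible approach, offered as a sketch rather than a definitive answer: the brackets appear because str.findall returns a list of matches for every row. Series.str.extract with a single capture group returns the matched text itself, which can then be cast to an integer. The sample DataFrame below is rebuilt from the table in the question, and the pattern assumes every URL contains "/show/<digits>/".

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "sn": [1, 2, 3, 4, 5],
    "URL": [
        "https://tvseries.net/show/51/johnny155",
        "https://tvseries.net/show/213/kimble2",
        "https://tvseries.net/show/46/forceps",
        "https://tvseries.net/show/90/tr9",
        "https://tvseries.net/show/22/candlenut",
    ],
})

# Capture the run of digits between "/show/" and the next "/".
# expand=False returns a Series instead of a one-column DataFrame.
# astype(int) assumes every row matches; a non-matching URL yields NaN
# and would make this cast raise.
df["show_id"] = df["URL"].str.extract(r"/show/(\d+)/", expand=False).astype(int)

print(df)
```

If some URLs might not match, the cast above would fail on the missing values. In that case, wrapping the extracted column in pd.to_numeric(..., errors="coerce") and then calling .astype("Int64") should keep the column integer-typed while allowing missing entries.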
https://stackoverflow.com/questions/66978572/extract-part-of-url-from-column-of-urls-in-python April 07, 2021 at 09:53AM