Tuesday, April 6, 2021

Extract part of URL from column of URLs in Python

I have a column of URLs and would like to retrieve the digits that come after "/show/" but before the next "/". I would like these digits to be integers.

sn    URL
1     https://tvseries.net/show/51/johnny155
2     https://tvseries.net/show/213/kimble2
3     https://tvseries.net/show/46/forceps
4     https://tvseries.net/show/90/tr9
5     https://tvseries.net/show/22/candlenut

The expected output is:

sn    URL                                          show_id
1     https://tvseries.net/show/51/johnny155       51
2     https://tvseries.net/show/213/kimble2        213
3     https://tvseries.net/show/46/forceps         46
4     https://tvseries.net/show/90/tr9             90
5     https://tvseries.net/show/22/candlenut       22

So far, I've tried the code below to retrieve the digits after "show". It produces a column in which each show_id is wrapped in brackets (i.e., [51], [213]), and the column's type is pandas.core.series.Series.

Is there a more efficient way to get the show_id as an integer, without the brackets? I'd appreciate any help, thank you.

import urllib.parse as urlparse

df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(*df['URL'].map(urlparse.urlsplit))
df['UID'] = df['path'].str.findall(r'(?<=show)[^,.\d\n]+?(\d+)')
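
For what it's worth, the brackets most likely come from str.findall, which returns a list of matches for every row. Here is a minimal sketch of one possible alternative, assuming every URL contains "/show/<digits>/": str.extract with a single capture group returns the captured text itself, which can then be cast to int. The DataFrame is rebuilt from the sample data in the question so the snippet runs on its own; the column name show_id is taken from the expected output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "sn": [1, 2, 3, 4, 5],
    "URL": [
        "https://tvseries.net/show/51/johnny155",
        "https://tvseries.net/show/213/kimble2",
        "https://tvseries.net/show/46/forceps",
        "https://tvseries.net/show/90/tr9",
        "https://tvseries.net/show/22/candlenut",
    ],
})

# Capture the digits between "/show/" and the next "/".
# expand=False makes str.extract return a Series of strings
# (one value per row) instead of a one-column DataFrame.
# astype(int) assumes every row matched; a non-matching URL would
# yield NaN and make the cast fail.
df["show_id"] = df["URL"].str.extract(r"/show/(\d+)/", expand=False).astype(int)

print(df[["sn", "show_id"]])
```

If some URLs might not match the pattern, casting with astype("Int64") (pandas' nullable integer type) instead of astype(int) should keep the missing values rather than raising an error.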
https://stackoverflow.com/questions/66978572/extract-part-of-url-from-column-of-urls-in-python April 07, 2021 at 09:53AM
