Saturday, March 6, 2021

Bash regex to grab subdomains from a list of URLs

I have a file that contains a list of URLs, and I want to extract the subdomains from each one.

The URLs are:

https://www.google.com [match www]
https://www.something.random-name.domain.com [match www, something, and random-name]
https://facebook.com [don't match anything]
http://test.prod-op.bpo.yahoo.com [match test, prod-op, and bpo]

I've been using "sed" to strip the "https://" and "http://" prefixes and then "awk" to get the subdomains, but the problem is that I can only match the first subdomain. For example: https://www.something.random-name.domain.com

In the above example, my approach only matches "www", but I want it to match "www" along with "something" and "random-name".

Input would be:

https://www.google.com
https://www.something.random-name.domain.com
https://facebook.com
http://test.prod-op.bpo.yahoo.com

Output would be:

www
www something random-name
null
test prod-op bpo

Please explain what I should do to match and extract all of the subdomains.

Thank you!
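
One possible approach, given as a minimal sketch rather than a definitive answer: keep the "sed" step for stripping the scheme, then have "awk" split the host on dots and print every label except the last two. The function name extract_subdomains is hypothetical, and the sketch assumes the registered domain is always exactly the last two labels (as in google.com or yahoo.com). Hosts under suffixes such as co.uk, or URLs with a user@ part, would need extra handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read URLs on stdin, print the subdomain labels of each.
# Assumption: the registered domain is always the last two dot-separated labels.
extract_subdomains() {
  # 1. trim surrounding whitespace, 2. drop the scheme, 3. drop port/path/query
  sed -E 's#^[[:space:]]+##; s#[[:space:]]+$##; s#^[a-zA-Z]+://##; s#[/:?].*$##' |
  awk -F. '
    NF == 0 { next }                # skip blank lines
    NF <= 2 { print "null"; next }  # nothing in front of the domain
    {
      out = $1                      # first subdomain label
      for (i = 2; i <= NF - 2; i++) # remaining labels before the domain
        out = out " " $i
      print out
    }'
}

# Usage with the sample input from the question
printf '%s\n' \
  'https://www.google.com' \
  'https://www.something.random-name.domain.com' \
  'https://facebook.com' \
  'http://test.prod-op.bpo.yahoo.com' | extract_subdomains
```

For a file, the same function can be fed with "extract_subdomains < urls.txt", where urls.txt is a placeholder name. With the four sample URLs, the sketch should print the four lines shown under "Output would be" above.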

https://stackoverflow.com/questions/66513807/bash-regex-to-grab-subdomain-from-list-of-urls March 07, 2021 at 02:23PM
