Python BeautifulSoup: Extract all the URLs from the webpage python.org that are nested within <li> tags
Write a Python program to extract all the URLs from the webpage python.org that are nested within <li> tags.
Sample Solution:
Python Code:
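import requests
from bs4 import BeautifulSoup

url = 'https://www.python.org/'
reqs = requests.get(url)
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(reqs.text, 'lxml')

urls = []
# Visit every <li> element and take the first link nested inside it
for h in soup.find_all('li'):
    a = h.find('a')
    # Skip <li> elements without a link, and links without an href attribute
    if a is not None and a.has_attr('href'):
        urls.append(a['href'])

print(urls)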
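Note that the solution above keeps only the first <a> found in each <li>. If every link nested inside a list item is wanted, a CSS selector is one possible alternative. The following is a minimal sketch, not the original author's method: the helper name li_links and the SAMPLE_HTML fragment are hypothetical, and the built-in 'html.parser' is used so the example runs without a network connection or the lxml package.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a downloaded page
SAMPLE_HTML = """
<ul>
  <li><a href="/about/">About</a> <a href="/about/help/">Help</a></li>
  <li>No link here</li>
  <li><a name="anchor-only">No href</a></li>
</ul>
<p><a href="/outside/">Not inside a list item</a></p>
"""

def li_links(html):
    """Return the href of every <a> that is a descendant of an <li>."""
    soup = BeautifulSoup(html, 'html.parser')
    # 'li a[href]' matches <a> elements that carry an href attribute
    # and sit anywhere inside an <li> element
    return [a['href'] for a in soup.select('li a[href]')]

print(li_links(SAMPLE_HTML))  # ['/about/', '/about/help/']
```

Because the selector requires an href attribute, list items without a link and anchors without an href are skipped without any explicit check.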
Sample Output:
['/', '/psf-landing/', 'https://docs.python.org', 'https://pypi.python.org/', '/jobs/', '/community/', '#', 'javascript:;', 'javascript:;', 'javascript:;', '#', 'https://www.facebook.com/pythonlang?fref=ts', 'https://twitter.com/ThePSF', '/community/irc/', '/about/', '/about/apps/', '/about/quotes/', '/about/gettingstarted/', '/about/help/', 'http://brochure.getpython.info/', '/downloads/', '/downloads/', '/downloads/source/', '/downloads/windows/', '/downloads/mac-osx/', '/download/other/', 'https://docs.python.org/3/license.html', '/download/alternatives', '/doc/', '/doc/', '/doc/av', 'https://wiki.python.org/moin/BeginnersGuide', 'https://devguide.python.org/', 'https://docs.python.org/faq/', 'http://wiki.python.org/moin/Languages', 'http://python.org/dev/peps/', 'https://wiki.python.org/moin/PythonBooks', '/doc/essays/', '/community/', '/community/survey', '/community/diversity/', '/community/lists/', '/community/irc/', '/community/forums/', '/community/workshops/', '/community/sigs/', '/community/logos/', 'https://wiki.python.org/moin/', '/community/merchandise/', '/community/awards', 'https://www.python.org/psf/codeofconduct/', '/success-stories/', '/success-stories/category/arts/', '/success-stories/category/business/', '/success-stories/category/education/', '/success-stories/category/engineering/', '/success-stories/category/government/', '/success-stories/category/scientific/', '/success-stories/category/software-development/', '/blogs/', '/blogs/', 'http://planetpython.org/', 'http://pyfound.blogspot.com/', 'http://pycon.blogspot.com/', '/events/', '/events/python-events', '/events/python-user-group/', '/events/python-events/past/', '/events/python-user-group/past/', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', '/shell/', '//docs.python.org/3/tutorial/controlflow.html#defining-functions', '//docs.python.org/3/tutorial/introduction.html#lists', 'http://docs.python.org/3/tutorial/introduction.html#using-python-as-a-calculator', 
'//docs.python.org/3/tutorial/', '//docs.python.org/3/tutorial/controlflow.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/NXMcoIchkxY/2018-in-review.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/t_DSEH1vASY/python-core-developer-mentorship.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/v7pD576k9iA/mariatta-wijaya-lets-use-github-issues.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/mnSfdQZDRUM/petr-viktorin-extension-modules-and.html', 'http://feedproxy.google.com/~r/PythonSoftwareFoundationNews/~3/-JcoXQeMgsQ/scott-shawcroft-history-of-circuitpython.html', '/events/python-events/809/', '/events/python-user-group/848/', '/events/python-user-group/838/', '/events/python-events/827/', '/events/python-events/826/', 'http://www.djangoproject.com/', 'http://wiki.python.org/moin/TkInter', 'http://www.scipy.org', 'http://buildbot.net/', 'http://www.ansible.com', '/about/', '/about/apps/', '/about/quotes/', '/about/gettingstarted/', '/about/help/', 'http://brochure.getpython.info/', '/downloads/', '/downloads/', '/downloads/source/', '/downloads/windows/', '/downloads/mac-osx/', '/download/other/', 'https://docs.python.org/3/license.html', '/download/alternatives', '/doc/', '/doc/', '/doc/av', 'https://wiki.python.org/moin/BeginnersGuide', 'https://devguide.python.org/', 'https://docs.python.org/faq/', 'http://wiki.python.org/moin/Languages', 'http://python.org/dev/peps/', 'https://wiki.python.org/moin/PythonBooks', '/doc/essays/', '/community/', '/community/survey', '/community/diversity/', '/community/lists/', '/community/irc/', '/community/forums/', '/community/workshops/', '/community/sigs/', '/community/logos/', 'https://wiki.python.org/moin/', '/community/merchandise/', '/community/awards', 'https://www.python.org/psf/codeofconduct/', '/success-stories/', '/success-stories/category/arts/', '/success-stories/category/business/', '/success-stories/category/education/', 
'/success-stories/category/engineering/', '/success-stories/category/government/', '/success-stories/category/scientific/', '/success-stories/category/software-development/', '/blogs/', '/blogs/', 'http://planetpython.org/', 'http://pyfound.blogspot.com/', 'http://pycon.blogspot.com/', '/events/', '/events/python-events', '/events/python-user-group/', '/events/python-events/past/', '/events/python-user-group/past/', 'https://wiki.python.org/moin/PythonEventsCalendar#Submitting_an_Event', '/dev/', 'https://devguide.python.org/', 'https://bugs.python.org/', 'https://mail.python.org/mailman/listinfo/python-dev', '/dev/core-mentorship/', '/news/security/', '/about/help/', '/community/diversity/', 'https://github.com/python/pythondotorg/issues', 'https://status.python.org/']
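The output mixes site-relative paths ('/jobs/'), protocol-relative URLs ('//docs.python.org/3/tutorial/'), absolute URLs, in-page anchors ('#') and 'javascript:;' pseudo-links, and it contains duplicates. If absolute, unique URLs are needed, one option is to resolve each href against the page URL with the standard library. This is a sketch under stated assumptions: the helper name absolute_links and the filtering rules are illustrative choices, not part of the original solution.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = 'https://www.python.org/'

def absolute_links(hrefs, base=BASE_URL):
    """Resolve hrefs against base; keep unique http(s) links in order."""
    result = []
    for href in hrefs:
        # Skip in-page anchors and javascript: pseudo-links
        if href.startswith('#') or href.lower().startswith('javascript:'):
            continue
        # urljoin handles relative, protocol-relative and absolute hrefs
        full = urljoin(base, href)
        if urlparse(full).scheme in ('http', 'https') and full not in result:
            result.append(full)
    return result

# A few values taken from the sample output above
sample = ['/', '/psf-landing/', 'https://docs.python.org', '#', 'javascript:;',
          '//docs.python.org/3/tutorial/', '/downloads/', '/downloads/']
print(absolute_links(sample))
```

The membership test on a list keeps the original order; for very long link lists, tracking seen URLs in a set alongside the list would be faster.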