w3resource

Python Web Scraping: Retrieve an arbitrary Wikipedia page for "Python" and create a list of links on that page


Write a Python program that retrieves an arbitrary Wikipedia page for "Python" and creates a list of the links on that page.

Sample Solution:

Python Code:

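from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the page; the context manager closes the connection afterwards.
with urlopen("https://en.wikipedia.org/wiki/Python") as response:
    html = response.read()
# Name the parser explicitly so the result does not depend on the system.
soup = BeautifulSoup(html, "html.parser")
# Build the list of links: the href of every <a> tag that has one.
links = [a["href"] for a in soup.find_all("a", href=True)]
for link in links:
    print(link)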

Sample Output:

#mw-head
#p-search
https://en.wiktionary.org/wiki/Python
https://en.wiktionary.org/wiki/python
#Snakes
#Ancient_Greece
#Media_and_entertainment
#Computing
#Engineering
#Roller_coasters
#Vehicles
#Weaponry
#See_also
/w/index.php?title=Python&action=edit&section=1
/wiki/Pythonidae
/wiki/Python_(genus)
/w/index.php?title=Python&action=edit&section=2
/wiki/Python_(mythology)
/wiki/Python_of_Aenus
/wiki/Python_(painter)
/wiki/Python_of_Byzantium
/wiki/Python_of_Catana
/w/index.php?title=Python&action=edit&section=3
/wiki/Python_(film)
/wiki/Pythons_2
/wiki/Monty_Python
/wiki/Python_(Monty)_Pictures
/w/index.php?title=Python&action=edit&section=4
/wiki/Python_(programming_language)
/wiki/CPython
/wiki/CMU_Common_Lisp
/wiki/PERQ#PERQ_3
/w/index.php?title=Python&action=edit&section=5
/w/index.php?title=Python&action=edit&section=6
/wiki/Python_(Busch_Gardens_Tampa_Bay)
/wiki/Python_(Coney_Island,_Cincinnati,_Ohio)
/wiki/Python_(Efteling)
/w/index.php?title=Python&action=edit&section=7
/wiki/Python_(automobile_maker)
/wiki/Python_(Ford_prototype)
/w/index.php?title=Python&action=edit&section=8
/wiki/Colt_Python
/wiki/Python_(missile)
/w/index.php?title=Python&action=edit&section=9
/wiki/Cython
/wiki/Pyton
/wiki/File:Disambig_gray.svg
/wiki/Help:Disambiguation
//en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Python&namespace=0
https://en.wikipedia.org/w/index.php?title=Python&oldid=845762125
/wiki/Help:Category
/wiki/Category:Disambiguation_pages
/wiki/Category:Disambiguation_pages_with_short_description
/wiki/Category:All_article_disambiguation_pages
/wiki/Category:All_disambiguation_pages
/wiki/Category:Animal_common_name_disambiguation_pages
/wiki/Special:MyTalk
/wiki/Special:MyContributions
/w/index.php?title=Special:CreateAccount&returnto=Python
/w/index.php?title=Special:UserLogin&returnto=Python
/wiki/Python
/wiki/Talk:Python
/wiki/Python
/w/index.php?title=Python&action=edit
/w/index.php?title=Python&action=history
/wiki/Main_Page
/wiki/Main_Page
/wiki/Portal:Contents
/wiki/Portal:Featured_content
/wiki/Portal:Current_events
/wiki/Special:Random
https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&utm_medium=sidebar&utm_campaign=C13_en.wikipedia.org&uselang=en
//shop.wikimedia.org
/wiki/Help:Contents
/wiki/Wikipedia:About
/wiki/Wikipedia:Community_portal
/wiki/Special:RecentChanges
//en.wikipedia.org/wiki/Wikipedia:Contact_us
/wiki/Special:WhatLinksHere/Python
/wiki/Special:RecentChangesLinked/Python
/wiki/Wikipedia:File_Upload_Wizard
/wiki/Special:SpecialPages
/w/index.php?title=Python&oldid=845762125
/w/index.php?title=Python&action=info
https://www.wikidata.org/wiki/Special:EntityPage/Q747452
/w/index.php?title=Special:CiteThisPage&page=Python&id=845762125
/w/index.php?title=Special:Book&bookcmd=book_creator&referer=Python
/w/index.php?title=Special:ElectronPdf&page=Python&action=show-download-screen
/w/index.php?title=Python&printable=yes
https://commons.wikimedia.org/wiki/Category:Python
https://af.wikipedia.org/wiki/Python
https://als.wikipedia.org/wiki/Python
https://bn.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8_(%E0%A6%A6%E0%A7%8D%E0%A6%AC%E0%A7%8D%E0%A6%AF%E0%A6%B0%E0%A7%8D%E0%A6%A5%E0%A6%A4%E0%A6%BE_%E0%A6%A8%E0%A6%BF%E0%A6%B0%E0%A6%B8%E0%A6%A8)
https://be.wikipedia.org/wiki/Python
https://bg.wikipedia.org/wiki/%D0%9F%D0%B8%D1%82%D0%BE%D0%BD_(%D0%BF%D0%BE%D1%8F%D1%81%D0%BD%D0%B5%D0%BD%D0%B8%D0%B5)
https://cs.wikipedia.org/wiki/Python_(rozcestn%C3%ADk)
https://da.wikipedia.org/wiki/Python
https://de.wikipedia.org/wiki/Python
https://eo.wikipedia.org/wiki/Pitono_(apartigilo)
https://eu.wikipedia.org/wiki/Python_(argipena)
https://fa.wikipedia.org/wiki/%D9%BE%D8%A7%DB%8C%D8%AA%D9%88%D9%86
https://fr.wikipedia.org/wiki/Python
https://ko.wikipedia.org/wiki/%ED%8C%8C%EC%9D%B4%EC%84%A0
https://hr.wikipedia.org/wiki/Python_(razdvojba)
https://io.wikipedia.org/wiki/Pitono
https://id.wikipedia.org/wiki/Python
https://ia.wikipedia.org/wiki/Python_(disambiguation)
https://is.wikipedia.org/wiki/Python
https://it.wikipedia.org/wiki/Python_(disambigua)
https://he.wikipedia.org/wiki/%D7%A4%D7%99%D7%AA%D7%95%D7%9F
https://ka.wikipedia.org/wiki/%E1%83%9E%E1%83%98%E1%83%97%E1%83%9D%E1%83%9C%E1%83%98_(%E1%83%9B%E1%83%A0%E1%83%90%E1%83%95%E1%83%90%E1%83%9A%E1%83%9B%E1%83%9C%E1%83%98%E1%83%A8%E1%83%95%E1%83%9C%E1%83%94%E1%83%9A%E1%83%9D%E1%83%95%E1%83%90%E1%83%9C%E1%83%98)
https://kg.wikipedia.org/wiki/Mboma_(nyoka)
https://la.wikipedia.org/wiki/Python_(discretiva)
https://lb.wikipedia.org/wiki/Python
https://hu.wikipedia.org/wiki/Python_(egy%C3%A9rtelm%C5%B1s%C3%ADt%C5%91_lap)
https://mr.wikipedia.org/wiki/%E0%A4%AA%E0%A4%BE%E0%A4%AF%E0%A4%A5%E0%A5%89%E0%A4%A8_(%E0%A4%86%E0%A4%9C%E0%A5%8D%E0%A4%9E%E0%A4%BE%E0%A4%B5%E0%A4%B2%E0%A5%80_%E0%A4%AD%E0%A4%BE%E0%A4%B7%E0%A4%BE)
https://nl.wikipedia.org/wiki/Python
https://ja.wikipedia.org/wiki/%E3%83%91%E3%82%A4%E3%82%BD%E3%83%B3
https://no.wikipedia.org/wiki/Pyton
https://pl.wikipedia.org/wiki/Pyton
https://pt.wikipedia.org/wiki/Python_(desambigua%C3%A7%C3%A3o)
https://ru.wikipedia.org/wiki/Python_(%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8%D1%8F)
https://sd.wikipedia.org/wiki/%D8%A7%D8%B1%DA%99
https://sk.wikipedia.org/wiki/Python
https://sh.wikipedia.org/wiki/Python
https://fi.wikipedia.org/wiki/Python
https://sv.wikipedia.org/wiki/Pyton
https://th.wikipedia.org/wiki/%E0%B9%84%E0%B8%9E%E0%B8%97%E0%B8%AD%E0%B8%99
https://tr.wikipedia.org/wiki/Python
https://uk.wikipedia.org/wiki/%D0%9F%D1%96%D1%84%D0%BE%D0%BD
https://ur.wikipedia.org/wiki/%D9%BE%D8%A7%D8%A6%DB%8C%D8%AA%DA%BE%D9%88%D9%86
https://vi.wikipedia.org/wiki/Python
https://zh.wikipedia.org/wiki/Python_(%E6%B6%88%E6%AD%A7%E4%B9%89)
https://www.wikidata.org/wiki/Special:EntityPage/Q747452#sitelinks-wikipedia
//en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License
//creativecommons.org/licenses/by-sa/3.0/
//wikimediafoundation.org/wiki/Terms_of_Use
//wikimediafoundation.org/wiki/Privacy_policy
//www.wikimediafoundation.org/
https://wikimediafoundation.org/wiki/Privacy_policy
/wiki/Wikipedia:About
/wiki/Wikipedia:General_disclaimer
//en.wikipedia.org/wiki/Wikipedia:Contact_us
https://www.mediawiki.org/wiki/Special:MyLanguage/How_to_contribute
https://wikimediafoundation.org/wiki/Cookie_statement
//en.m.wikipedia.org/w/index.php?title=Python&mobileaction=toggle_view_mobile
https://wikimediafoundation.org/
//www.mediawiki.org/
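
The output above mixes several kinds of href values: same-page fragments (#Snakes), site-relative paths (/wiki/Pythonidae), protocol-relative URLs (//shop.wikimedia.org), and absolute URLs. The following is a minimal sketch of one way to turn such a list into absolute URLs with the standard library's urllib.parse.urljoin. The helper names resolve_links and is_article, the BASE_URL constant, and the "no colon in the path" rule for spotting article pages are assumptions made for illustration; they are not part of the original solution.

```python
from urllib.parse import urljoin, urlparse

# Page the hrefs were scraped from (same URL as in the solution above).
BASE_URL = "https://en.wikipedia.org/wiki/Python"


def resolve_links(hrefs, base_url=BASE_URL):
    """Return absolute URLs, skipping same-page fragments and duplicates."""
    seen = set()
    resolved = []
    for href in hrefs:
        if href.startswith("#"):
            continue  # anchor inside the same page, not a separate link
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved


def is_article(url):
    """Heuristic (an assumption): article paths look like /wiki/<title>
    on en.wikipedia.org and contain no namespace colon such as 'Help:'."""
    parts = urlparse(url)
    return (
        parts.netloc == "en.wikipedia.org"
        and parts.path.startswith("/wiki/")
        and ":" not in parts.path
    )


# A few hrefs taken from the sample output, including one duplicate.
sample = [
    "#Snakes",
    "/wiki/Pythonidae",
    "/wiki/Python_(genus)",
    "/wiki/Help:Disambiguation",
    "//shop.wikimedia.org",
    "https://en.wiktionary.org/wiki/Python",
    "/wiki/Pythonidae",
]

absolute_links = resolve_links(sample)
article_links = [url for url in absolute_links if is_article(url)]
for url in article_links:
    print(url)
```

With the sample list, only the two /wiki/ article paths pass the filter; the Help: page, the shop link, and the Wiktionary link are resolved to absolute URLs but excluded by is_article.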

Note: When BeautifulSoup is called without naming a parser, as in BeautifulSoup(html), it emits a UserWarning and falls back to the best HTML parser installed on the system (lxml in the run that produced this output). Another system or virtual environment may pick a different parser and behave differently. Passing the parser name as the second argument, for example BeautifulSoup(html, "lxml"), removes the warning and makes the choice explicit.

Flowchart:

[Flowchart image: retrieving the Wikipedia page for "Python" and listing the links on that page]
