w3resource

Python Web Scraping: Extract and display all the header tags from en.wikipedia.org/wiki/Main_Page

Python Web Scraping: Exercise-7 with Solution

Write a Python program to extract and display all the header tags (h1 through h6) from en.wikipedia.org/wiki/Main_Page.

Sample Solution:

Python Code:

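from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the page; BeautifulSoup reads directly from the response object.
html = urlopen('https://en.wikipedia.org/wiki/Main_Page')
bs = BeautifulSoup(html, "html.parser")
# Passing a list to find_all() matches any of the names, in document order.
titles = bs.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
# Unpack the tags so each one is printed on its own, separated by a blank line.
print('List all the header tags :', *titles, sep='\n\n')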
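
The urlopen call in the solution sends Python's default User-Agent and leaves the response open. Some sites ask automated clients to identify themselves, so the following is a minimal sketch of a more explicit fetch step. The names build_request, fetch_html and USER_AGENT, the User-Agent string itself, and the 10-second timeout are assumptions made for illustration, not part of the original solution.

```python
from urllib.request import Request, urlopen

# Hypothetical identifier; replace it with something that describes your script.
USER_AGENT = "header-tag-exercise/0.1 (contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download the page and return its raw bytes, closing the response afterwards."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

The bytes returned by fetch_html('https://en.wikipedia.org/wiki/Main_Page') can be passed to BeautifulSoup(..., "html.parser") in place of the html object used above.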

Sample Output (the live page changes over time, so your output may differ):

List all the header tags :

<h1 class="firstHeading" id="firstHeading" lang="en">Main Page</h1>

<h2 id="mp-tfa-h2" style="margin:0.5em; background:#cef2e0; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; color:#000; padding:0.2em 0.4em;"><span id="From_today.27s_featured_article"></span><span class="mw-headline" id="From_today's_featured_article">From today's featured article</span></h2>

<h2 id="mp-dyk-h2" style="clear:both; margin:0.5em; background:#cef2e0; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #a3bfb1; color:#000; padding:0.2em 0.4em;"><span class="mw-headline" id="Did_you_know...">Did you know...</span></h2>

<h2 id="mp-itn-h2" style="margin:0.5em; background:#cedff2; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; color:#000; padding:0.2em 0.4em;"><span class="mw-headline" id="In_the_news">In the news</span></h2>

<h2 id="mp-otd-h2" style="clear:both; margin:0.5em; background:#cedff2; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #a3b0bf; color:#000; padding:0.2em 0.4em;"><span class="mw-headline" id="On_this_day">On this day</span></h2>

<h2 id="mp-tfl-h2" style="margin:0.5em; background:#f2cedd; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #bfa3af; color:#000; padding:0.2em 0.4em"><span id="From_today.27s_featured_list"></span><span class="mw-headline" id="From_today's_featured_list">From today's featured list</span></h2>

<h2 id="mp-tfp-h2" style="margin:0.5em; background:#ddcef2; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #afa3bf; color:#000; padding:0.2em 0.4em"><span id="Today.27s_featured_picture"></span><span class="mw-headline" id="Today's_featured_picture">Today's featured picture</span></h2>

<h2 id="mp-other" style="margin:0.5em; background:#eeeeee; border:1px solid #ddd; color:#222; padding:0.2em 0.4em; font-size:120%; font-weight:bold; font-family:inherit;"><span class="mw-headline" id="Other_areas_of_Wikipedia">Other areas of Wikipedia</span></h2>

<h2 id="mp-sister" style="margin:0.5em; background:#eeeeee; border:1px solid #ddd; color:#222; padding:0.2em 0.4em; font-size:120%; font-weight:bold; font-family:inherit;"><span id="Wikipedia.27s_sister_projects"></span><span class="mw-headline" id="Wikipedia's_sister_projects">Wikipedia's sister projects</span></h2>

<h2 id="mp-lang" style="margin:0.5em; background:#efefef; border:1px solid #ddd; color:#222; padding:0.2em 0.4em; font-size:120%; font-weight:bold; font-family:inherit;"><span class="mw-headline" id="Wikipedia_languages">Wikipedia languages</span></h2>

<h2>Navigation menu</h2>

<h3 id="p-personal-label">Personal tools</h3>

<h3 id="p-namespaces-label">Namespaces</h3>

<h3 id="p-variants-label">
<span>Variants</span>
</h3>

<h3 id="p-views-label">Views</h3>

<h3 id="p-cactions-label"><span>More</span></h3>

<h3>
<label for="searchInput">Search</label>
</h3>

<h3 id="p-navigation-label">Navigation</h3>

<h3 id="p-interaction-label">Interaction</h3>

<h3 id="p-tb-label">Tools</h3>

<h3 id="p-coll-print_export-label">Print/export</h3>

<h3 id="p-wikibase-otherprojects-label">In other projects</h3>

<h3 id="p-lang-label">Languages</h3>
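
The output above prints each tag with its full markup, including the long inline style attributes. If only the heading level and its visible text are wanted, a sketch along these lines may be easier to read. The function name heading_outline and the inline sample string (adapted from the output above) are assumptions for illustration; the sample lets the code run without a network connection.

```python
from bs4 import BeautifulSoup

HEADING_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']


def heading_outline(html):
    """Return (tag name, visible text) pairs for every heading, in document order."""
    bs = BeautifulSoup(html, "html.parser")
    return [(tag.name, tag.get_text(strip=True)) for tag in bs.find_all(HEADING_TAGS)]


# Small inline sample so the sketch can be tried offline.
sample = """
<h1 class="firstHeading" id="firstHeading" lang="en">Main Page</h1>
<h2 id="mp-itn-h2"><span class="mw-headline" id="In_the_news">In the news</span></h2>
<h3>
<label for="searchInput">Search</label>
</h3>
"""

for name, text in heading_outline(sample):
    print(f"{name}: {text}")
# h1: Main Page
# h2: In the news
# h3: Search
```

get_text(strip=True) drops the surrounding whitespace and the nested span and label markup, so each heading is reduced to a single line.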
 

[Flowchart: Extract and display all the header tags from en.wikipedia.org/wiki/Main_Page]


