Python Web Scraping - Exercises, Practice, Solution
Web Scraping
Web scraping, or web data extraction, is data scraping used to extract data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol (HTTP) or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered from the web and copied, typically into a central local database or spreadsheet, for later retrieval or analysis.
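As a rough illustration of "gathering specific data and copying it into a spreadsheet", the sketch below parses a small inline HTML snippet with the standard library and writes the matching items out as CSV text. The HTML sample, the `item` class name, and the helper names `ItemCollector` and `scrape_to_csv` are assumptions made up for this example; a real scraper would download the page first.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="item">Alpha</li>
  <li class="item">Beta</li>
  <li class="note">ignored</li>
</ul>
"""


class ItemCollector(HTMLParser):
    """Collect the text of every <li class="item"> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._inside_item = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and dict(attrs).get("class") == "item":
            self._inside_item = True

    def handle_data(self, data):
        if self._inside_item and data.strip():
            self.items.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "li":
            self._inside_item = False


def scrape_to_csv(html):
    """Return CSV text with one row per collected item."""
    collector = ItemCollector()
    collector.feed(html)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name"])  # header row
    for name in collector.items:
        writer.writerow([name])
    return buffer.getvalue()


csv_text = scrape_to_csv(SAMPLE_HTML)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; swapping `io.StringIO()` for an open file would produce a spreadsheet-ready CSV file instead.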
Python requests module:
Requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labor. There’s no need to manually add query strings to your URLs, or to form-encode your POST data.
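A minimal sketch of what that means in practice: the `params` and `data` arguments are plain dictionaries, and Requests builds the query string and the form-encoded body for you. The `example.com` URLs and the field names are placeholders chosen for illustration, and the requests are only prepared, not sent, so the snippet runs without network access.

```python
import requests

# Placeholder endpoints for illustration only; nothing is sent over the network.
search = requests.Request(
    "GET",
    "https://example.com/search",
    params={"q": "web scraping", "page": 2},  # becomes the query string
).prepare()

login = requests.Request(
    "POST",
    "https://example.com/login",
    data={"user": "alice", "lang": "en"},  # becomes a form-encoded body
).prepare()

print(search.url)                     # https://example.com/search?q=web+scraping&page=2
print(login.body)                     # user=alice&lang=en
print(login.headers["Content-Type"])  # application/x-www-form-urlencoded
```

In everyday use the same dictionaries are passed straight to `requests.get(url, params=...)` or `requests.post(url, data=...)`; preparing the request separately is just a convenient way to inspect what would be sent.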
Python Web Scraping [27 exercises with solution]
1. Write a Python program to test if a given page is found or not on the server.
2. Write a Python program to download and display the content of robots.txt for en.wikipedia.org.
3. Write a Python program to get the number of datasets currently listed on data.gov.
4. Write a Python program to convert an address (like "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates (like latitude 37.423021 and longitude -122.083739).
Geocoding: Geocoding is the process of converting addresses (like "1600 Amphitheatre Parkway, Mountain View, CA") into geographic coordinates (like latitude 37.423021 and longitude -122.083739), which you can use to place markers on a map or to position the map.
5. Write a Python program to display the name of the most recently added dataset on data.gov.
6. Write a Python program to extract the h1 tag from example.com.
7. Write a Python program to extract and display all the header tags from en.wikipedia.org/wiki/Main_Page.
8. Write a Python program to extract and display all the image links from en.wikipedia.org/wiki/Peter_Jeffrey_(RAAF_officer).
9. Write a Python program to get 90 days of visits broken down by browser for all sites on data.gov.
10. Write a Python program that retrieves an arbitrary Wikipedia page on "Python" and creates a list of the links on that page.
11. Write a Python program to check whether a page contains a title or not.
12. Write a Python program to list all language names and the number of related articles in the order they appear on wikipedia.org.
13. Write a Python program to get the number of people visiting a U.S. government website right now.
Source: https://analytics.usa.gov/data/live/realtime.json
14. Write a Python program to get the number of security alerts issued by US-CERT in the current year.
Source: https://www.us-cert.gov/ncas/alerts
15. Write a Python program to get the number of Pinterest accounts maintained by U.S. State Department embassies and missions.
Source: https://www.state.gov/r/pa/ode/socialmedia/
16. Write a Python program to get the number of followers of a given Twitter account.
17. Write a Python program to get the number of accounts that a given Twitter account is following.
18. Write a Python program to get the number of posts on Twitter liked by a given account.
19. Write a Python program to count the number of tweets by a given Twitter account.
20. Write a Python program to scrape the number of tweets of a given Twitter account.
21. Write a Python program to find the live weather report (temperature, wind speed, description and weather) of a given city.
22. Write a Python program to display the date, days, title, city, and country of the next 25 Hackevents.
23. Write a Python program to download IMDB's Top 250 data (movie name, initial release, director name, and stars).
24. Write a Python program to get movie name, year and a brief summary of the top 10 random movies.
25. Write a Python program to get the number of magnitude 4.5+ earthquakes detected worldwide by the USGS.
26. Write a Python program to display the contents of different attributes, such as status_code, headers, url, history, encoding, reason, cookies, elapsed, request, and content, of a specified resource.
27. Write a Python program to verify SSL certificates for HTTPS requests using the requests module.
Note: Requests verifies SSL certificates for HTTPS requests, just like a web browser. By default, SSL verification is enabled, and Requests will raise an SSLError if it is unable to verify the certificate.
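The default behaviour described in the note on SSL verification can be sketched as follows. The helper name `fetch_status` and its `timeout` value are assumptions made for this example, not part of the exercise's sample solution; the function is only defined here, so the snippet itself makes no network calls.

```python
import requests


def fetch_status(url, timeout=10):
    """Return the HTTP status code, or None if certificate verification fails."""
    try:
        # verify=True is already the default; it is spelled out for clarity.
        response = requests.get(url, verify=True, timeout=timeout)
    except requests.exceptions.SSLError as exc:
        print(f"Certificate verification failed for {url}: {exc}")
        return None
    return response.status_code


# A fresh Session verifies certificates unless told otherwise.
default_verify = requests.Session().verify
print(default_verify)  # True
```

Calling `fetch_status("https://example.com")` should return a status code when the certificate chain is valid, while a host with an expired or self-signed certificate would be expected to take the `SSLError` branch and return `None`.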
More to come!
Do not submit solutions to the above exercises here; if you want to contribute, go to the appropriate exercise page.