Web Scraping Using Python
Introduction
Web scraping, or web data extraction, is a method used to acquire data from websites around the globe. There are numerous commercial and open-source tools that can interact with websites over the Hypertext Transfer Protocol (HTTP). Extracting data from websites requires an understanding of HTML document structure and other web-based file formats. Web pages can be static or dynamic, and this distinction affects the way a page is scraped.
Here I will show examples of web pages that I have scraped using different Python packages. Commonly used Python scraping packages include Scrapy, Selenium, Beautiful Soup, and Requests. You can read more about the applicability of these packages within the Python framework.
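As a minimal illustration of the Requests and Beautiful Soup combination, the sketch below fetches a page and pulls out its section headings. The target URL and the choice of `<h2>` tags are illustrative assumptions, not a real scraping target:

```python
# Minimal sketch: fetch a page with Requests and parse it with Beautiful Soup.
from bs4 import BeautifulSoup


def extract_headings(html: str) -> list[str]:
    """Return the text of all <h2> headings in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]


if __name__ == "__main__":
    import requests  # only needed for the live request

    resp = requests.get("https://example.com")  # hypothetical target page
    resp.raise_for_status()
    print(extract_headings(resp.text))
```

Keeping the parsing in a separate function that takes raw HTML makes it easy to test offline, independent of the network call.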
Note: Web scraping is not always the preferred method of extracting data from the web. Using an API, if one is available, or requesting the data directly from the owner is preferable. Finally, review a site's terms and conditions before you start scraping it.
Coffee Shops around Chicago
The Selenium and Beautiful Soup Python packages were used to extract the name, location, and rating of coffee shops in Chicago. A Python script extracted the required data and saved the final output to a CSV file.
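The workflow above can be sketched as follows: Selenium renders the JavaScript-driven page, Beautiful Soup parses the resulting HTML, and the `csv` module writes the rows out. The CSS class names and the target URL are assumptions for illustration; a real listing page will use different markup:

```python
# Sketch: scrape coffee-shop name, location, and rating, then save to CSV.
import csv

from bs4 import BeautifulSoup


def parse_shops(html: str) -> list[dict]:
    """Extract name, location, and rating from listing cards."""
    soup = BeautifulSoup(html, "html.parser")
    shops = []
    for card in soup.select("div.shop-card"):  # hypothetical card class
        shops.append({
            "name": card.select_one(".name").get_text(strip=True),
            "location": card.select_one(".address").get_text(strip=True),
            "rating": float(card.select_one(".rating").get_text(strip=True)),
        })
    return shops


def save_csv(shops: list[dict], path: str) -> None:
    """Write the extracted records to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "location", "rating"])
        writer.writeheader()
        writer.writerows(shops)


if __name__ == "__main__":
    # Selenium is only needed to render the live, JavaScript-heavy page.
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://example.com/chicago-coffee-shops")  # hypothetical URL
    html = driver.page_source
    driver.quit()
    save_csv(parse_shops(html), "chicago_coffee_shops.csv")
```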
Weather Data from Dark Sky
The Scrapy Python package was employed to scrape historical minimum and maximum air temperatures and precipitation data from the Dark Sky weather app.
Zillow Property Prices
Property prices were scraped for any location within the US using the Selenium and Beautiful Soup Python packages. Below is a preview of the generated Excel spreadsheet.
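The Zillow workflow can be sketched in the same pattern: Selenium loads the search results for a location, Beautiful Soup extracts the addresses and prices, and pandas writes the Excel file. The selectors, URL, and output schema here are illustrative assumptions rather than Zillow's real markup:

```python
# Sketch: scrape property addresses and prices, then save to an Excel file.
from bs4 import BeautifulSoup


def parse_listings(html: str) -> list[dict]:
    """Extract address and numeric price from property cards."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for card in soup.select("article.property-card"):  # hypothetical card class
        price_text = card.select_one(".price").get_text(strip=True)
        listings.append({
            "address": card.select_one(".address").get_text(strip=True),
            "price": int(price_text.replace("$", "").replace(",", "")),
        })
    return listings


if __name__ == "__main__":
    import pandas as pd
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://www.zillow.com/homes/Chicago,-IL/")  # example location
    html = driver.page_source
    driver.quit()

    # pandas needs the openpyxl package installed to write .xlsx files.
    pd.DataFrame(parse_listings(html)).to_excel("property_prices.xlsx", index=False)
```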