Part 1: Creating a database and a table, and inserting data into the table
Step 1: Import required packages
Selenium, Beautiful Soup, and sqlite3 are the core packages required in this project. You can install the first two using pip install package_name; sqlite3 ships with the Python standard library.
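For example, the two third-party packages can be installed from the command line as follows.
pip install selenium beautifulsoup4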
import time
from selenium import webdriver # Used in data extraction
import os
from bs4 import BeautifulSoup as btsp # Used in data extraction
import sqlite3 # Used in accessing SQLite database
Step 2: Create a SQLite database
There are two ways to create a database with the sqlite3 module: in memory or on disk. We will create a database on disk, inside a selected folder.
# Change to working directory
os.chdir('directory of selected folder')
# Create a database
db = 'Orlando Properties.db' # Name of the database
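For reference, the in-memory option mentioned above is created by connecting to the special name :memory:; it is useful for quick experiments because it leaves no file behind.
# Alternative (not used here): a temporary database in memory,
# which disappears when the connection is closed
mem_conn = sqlite3.connect(':memory:')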
Step 3: Create a table called zillow within the database
After creating the database, you can create a table with all the desired columns. First, we need to open a connection to the database, as illustrated below. The general syntax for creating a table in an SQLite database is as follows.
SQLite statement for creating a table: '''CREATE TABLE table_name ([column1_name] datatype, [column2_name] datatype)'''
# Connect to the created database
conn = sqlite3.connect(db)
# Create a cursor object - Useful when adding contents to the database
c = conn.cursor()
# Create a table named zillow with attributes as described below
c.execute('''CREATE TABLE zillow
([Address] text PRIMARY KEY,
[Price] float,
[Bedrooms] float,
[Bathrooms] float,
[Area_sqft] float)''')
# Save changes to the database
conn.commit()
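As an optional sanity check, you can list the tables in the database by querying SQLite's built-in sqlite_master catalog.
# Optional: confirm that the zillow table now exists
c.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(c.fetchall())  # Should print [('zillow',)]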
Step 4: Extract data - Zillow property prices in Orlando, FL
We will use web scraping techniques with the Python Selenium and Beautiful Soup packages to extract the required information from the Zillow website. Take a look at the property listings here. We will extract the address, price, area, and number of bathrooms and bedrooms for every property listing on the first page. Below is a snippet of the Python code used to extract the required information.
# Use chrome as the web driver
driver = webdriver.Chrome('\\chromedriver_win32\\chromedriver.exe')
# Link to the property listings
b1 = 'https://www.zillow.com/orlando-fl/'
# Load the webpage
driver.get(b1)
# Wait for the page to fully load
time.sleep(10)
# Parse the HTML from Selenium with Beautiful Soup
spu = btsp(driver.page_source, 'html.parser')
# List of all property cards on the first page (class names may change as Zillow updates its markup)
lt = spu.find_all(class_="list-card list-card_not-saved")
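A fixed time.sleep(10) works, but an explicit wait is more robust: it resumes as soon as the listing cards render and raises a TimeoutException if they never do. A minimal sketch that could replace the sleep above, assuming the same list-card class name:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 seconds for at least one listing card to appear
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'list-card')))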
'''
Extract Address, Price, Bedrooms, Bathrooms, Area_sqft of each property listing and save the output to a database
'''
# Collect one tuple of values per property listing
rows = []
# Loop through all the listed properties
for p in lt:
    # Bedrooms, bathrooms, and area, e.g. "3 bds 2 ba 1,500 sqft"
    bb_a = p.find(class_="list-card-details")
    # Extract the number of bedrooms
    bdr = bb_a.text.split('bds')[0]
    bedroom = float(bdr)
    # Extract the bathrooms/area portion
    bthr_area = bb_a.text.split('bds')[1]
    # Number of bathrooms
    bthr = bthr_area.split('ba')[0]
    bathroom = float(bthr)
    # Extract area in sqft
    ar = bthr_area.split('ba')[1]
    # Remove comma and the sqft suffix
    ar_clean = ar.replace(',', '').replace('sqft', '')
    area = float(ar_clean)
    # Get the address
    adr = p.find(class_="list-card-addr")
    address = adr.text.strip()
    # Get the price
    pr = p.find(class_="list-card-price")
    # Clean - remove $ and , from the string
    pr = pr.text.replace('$', '').replace(',', '').strip()
    # Convert string to float
    price = float(pr)
    # Store this listing's values for insertion in Step 5
    rows.append((address, price, bedroom, bathroom, area))
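Note that float() will raise a ValueError for cards that lack one of these fields (for example, land listings without bedroom counts). If that happens, a hedged variant of the loop skips incomplete cards instead of crashing; this is a sketch, not part of the original script:
for p in lt:
    try:
        details = p.find(class_="list-card-details").text
        bedroom = float(details.split('bds')[0])
        bathroom = float(details.split('bds')[1].split('ba')[0])
        area = float(details.split('bds')[1].split('ba')[1]
                     .replace(',', '').replace('sqft', ''))
        address = p.find(class_="list-card-addr").text.strip()
        price = float(p.find(class_="list-card-price").text
                      .replace('$', '').replace(',', '').strip())
        rows.append((address, price, bedroom, bathroom, area))
    except (AttributeError, ValueError, IndexError):
        continue  # Skip cards that do not match the expected layout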
Step 5: Insert data into the created database
At this point we have extracted the property details from Zillow, and all the required data is ready to be loaded into the database and table we created in Steps 2 and 3. To insert rows into the table, follow the syntax below.
SQL statement: '''INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...)'''
# SQL statement - the ? placeholders are filled from each tuple of values
sql_st = '''INSERT INTO zillow (Address,Price,Bedrooms,Bathrooms,Area_sqft)
VALUES(?,?,?,?,?)
'''
# Insert one row per listing collected in the rows list from Step 4
c.executemany(sql_st, rows)
# Save edits to the database
conn.commit()
# Close the connection once done
conn.close()
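Finally, you can read the rows back with a SELECT statement to verify the inserts; a quick sketch using a fresh connection (handy since the one above is already closed):
# Optional: print the first few rows of the zillow table
check = sqlite3.connect(db)
for row in check.execute('SELECT * FROM zillow LIMIT 5'):
    print(row)
check.close()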
Putting it all together - You can download/modify the Python script using the link to my Google Colab Notebook here. Below is a snippet of the final table that was created within the database. You can visualize the database tables using DB Browser for SQLite.