Part 1: Creating database, table, and inserting data into table

Step1: Import required packages

Selenium, beautiful soup, sqlite3 are the core packages required in this project. You can install these packages using pip install package_name.


import time

from selenium import webdriver # Used in data extraction

import os

from bs4 import BeautifulSoup as btsp # Used in data extraction

import sqlite3 # Used in accessing SQLite database


Step2: Create SQLite database

There are two ways to create a database using sqlite3 module - You can create a SQLite database in memory or on the disk. We will create a database in a selected folder .


# Change to working directory

os.chdir('directory of selected folder')


# Create a database

db ='Orlando Properties.db' # Name of the database


Step3: Create a table called zillow within the database

After creating the database, now you can create a table with all the desired columns . First, we need to create a connection to the created database using as illustrated below. Generally, the syntax for creating a table within SQLite database is a follows.

SQLite statement for creating a table : '''CREATE TABLE table_name ([column1_name] datatype, [column2_name] datatype) '''


# Connect to the created database

conn = sqlite3.connect(db)

# Create a cursor object - Useful when adding contents to the database

c = conn.cursor()


# Create a table named zillow with attributes as described below

c.execute('''CREATE TABLE zillow

([Address] text PRIMARY KEY,

[Price] float,

[Bedrooms] float,

[Bathrooms] float,

[Area_sqft] float)''')

# Save changes to the database

conn.commit()


Step4: Extract data - Zillow Property Prices in Orlando FL

We will use web scraping techniques using python selenium and Beautiful packages to extract the required information from the Zillow website. Take a look at the property listings here. We will extract address, price, area, and number of bathrooms and bedrooms information for all property listings on the first page. Below is the snippet of the python code used to extract the required information.


# Use chrome as the web driver

driver = webdriver.Chrome('\\chromedriver_win32\\chromedriver.exe')

# Link to the property listings

b1 = 'https://www.zillow.com/orlando-fl/'

# Load the webpage

driver.get(b1)

# Wait for the page to fully load

time.sleep(10)


# Parse the Html from selenium to beautiful soup

spu = btsp(driver.page_source, 'html.parser')


# list of all properties on the first page

lt = spu.find_all(class_="list-card list-card_not-saved")


'''

Extract Address, Price, Bedrooms, Bathrooms, Area_sqft of each property listing and save the output to a database

'''

# Loop through all the listed properties


for p in lt:

# Bathrooms,Bedrooms and area

bb_a = p.find(class_="list-card-details")

# Extract bedroom

bdr = bb_a.text.split('bds')[0]

bedroom = float(bdr)

# Extract bathroom

bthr_area = bb_a.text.split('bds')[1]

# Number of bathrooms

bthr = bthr_area.split('ba')[0]

bathroom = float(bthr)

# Extract Area -sqft

ar = bthr_area.split('ba')[1]

# Remove comma and sqft

ar_clean= ar.replace(',','').replace('sqft','')

area = float(ar_clean)


# Get Address

adr = p.find(class_="list-card-addr")

address = adr.text.strip()

# Get Price

pr = p.find(class_="list-card-price")

# Clean - remove $ and , in between string

pr = pr.text.replace('$','').replace(',','').strip()

# Convert string to float

price = float(pr)


Step5: Insert data into the created database

At this point we have extracted the property details from Zillow, and all the required is ready to be loaded up into the database and table we created in Steps 1 and 2. To insert rows within the table, follow the syntax below.

Sql statment = '''INSERT INTO Table_name (column1, column2,.....) VALUES( value1, value2,....) '''

# sql statement - we will parse the values a tuple

sql_st = '''INSERT INTO zillow (Address,Price,Bedrooms,Bathrooms,Area_sqft)

VALUES(?,?,?,?,?)

'''

# Items to load up in the database

item_list = (address,price,bedroom,bathroom,area)

# Finally insert contents into the database

c.execute(sql_st,item_list)

# Save edits to the database

conn.commit()


Putting it all together - You can download/modify the python script using the link to my Google Colab Notebook here. Below is the snippet of the final table that was created within the database. You can visualize the database tables using sqlite DB brower.