• Home
  • Data Analytics Projects
    • Data Analytics Using Python
      • Mini Project: AirBnB Recommender App
      • Data Visualization
      • Text Mining (Text Nomalisation)
      • Statistics Analysis
      • A Study of Food Consumption in the World
    • DataCamp Projects
    • Tableau Visualizations
    • Experimenting with Google Teachable Machines
    • Data Analytics Using R
      • Udemy Projects
      • Using Dplyr
      • Hypothesis Testing
      • Linear Regression (Project1)
      • Linear Regression (Project 2)
      • Linear Regressions Exploring Interactions
      • Regression Models
      • Multiple Regression (Project 2)
  • Philanthropy
    • 2018 Water for Life
    • 2019 Habitat for Humanity Global Build
  • My Thoughts
  • Contact
BarbaraYam.com

AirBnB Recommender App

My first Python programming class was very hands-on, and we mainly did a lot of coursework. Here's the final project I completed using the skills learned in the 21-hour mini course.

Barbara_Yam_Hackwagon_DS101_-_Airbnb_Project

Hackwagon Academy - DS101

AirBnB Project

Learning Outcomes:

  • Learn how to translate business requirements into workable applications
  • Declare variables, and manipulate them to perform arithmetic operations
  • Create a list, append new elements to it, remove elements from it, and access elements within it
  • Create a dictionary, access its data, and update information within it
  • Make appropriate use of if and nested if constructs
  • Convert variables between types
  • Produce visualisations
  • Come up with insights based on the data
In [2]:
#Before you start, please perform the following 2 steps:
#1. Rename the file to <First_Name>_<Last_Name>_DS101_Lab_1 e.g. john_doe_DS101_Lab_1

#2. Fill in your details here:
#Name                    :Barbara Yam

#Start of Course Class(Edit accordingly): 15 Jan 2019, 7pm
# FOR TA/INSTRUCTOR 
# Total Marks: 100 / 100
# Part 1: 5 / 5
# Part 2: 25 / 25
# Part 3: 10 / 10
# Part 4: 60 / 60
This project is split into 4 parts:

  1. Data Cleaning (5 marks)
  2. Exploratory Data Analysis (25 marks)
  3. Interpretation (10 marks)
  4. AirBnB Visualisation and Price Recommender App (60 marks)

All code must match the expected output to be awarded full marks, except for the following:

  1. Interpretation
  2. Function creation in Part 4 (every attempt, right or wrong, earns 2 marks per function)

References

Important Collections Functions

Creation

Collection Type | Function | Examples
list | None | new_list = []  or  new_list = [1,2,3,4]
dict | None | new_dict = {}  or  new_dict = {'a': 1, 'b':2}

Add / Appending Data

Collection Type | Function | Example | Resulting Output
list | `.append()` | new_list = [1,2,3]; new_list.append(4) | [1,2,3,4]
list | `.extend()` | new_list = [1,2]; new_list.extend([3,4]) | [1,2,3,4]
dict | None | new_dict = {}; new_dict['a'] = 1; new_dict['b'] = 2 | {'a': 1, 'b':2}

Updating / Changing Data

Collection Type | Function | Example | Resulting Output
list | None | new_list = [1,2,3]; new_list[0] = 5 | [5,2,3]
dict | None | new_dict = {'a': 1, 'b':2}; new_dict['a'] = 10 | {'a': 10, 'b':2}

Accessing / Taking Out Data

Collection Type | Function | x will be | Example
list | None | 3 | new_list = [1,2,3]; x = new_list[2]
list of lists | None | 3 | new_list = [[1,2],[3,4]]; x = new_list[1][0]
list of dicts | None | 2 | new_list = [{'a':1},{'b':2}]; x = new_list[1]['b']
dict | None | 2 | new_dict = {'a': 1, 'b':2}; x = new_dict['b']

CITU Framework & Applied Iterations

  1. What variables do you need to answer this question?
  2. Create the results container
  3. Iterate the input data/list
  4. Take out the variables you needed in step 1
  5. Test conditions of each value
  6. Update the results container when condition is fulfilled
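To make the six CITU steps concrete, here is a minimal sketch that applies them to a question not asked in this project ("how many listings of each room type cost under 100?"). The sample rows are hypothetical and only mimic the shape of the list of dictionaries used later; each comment marks which step a line corresponds to.

```python
# Hypothetical sample rows, shaped like the list of dictionaries used in this project
sample = [
    {'room_type': 'Shared room', 'price': 74.0},
    {'room_type': 'Private room', 'price': 55.0},
    {'room_type': 'Shared room', 'price': 77.0},
    {'room_type': 'Entire home/apt', 'price': 180.0},
]

# Step 1: the variables needed are room_type and price
# Step 2: create the results container
room_type_counts = {}

# Step 3: iterate the input list
for row in sample:
    # Step 4: take out the variables identified in step 1
    room_type = row['room_type']
    price = row['price']
    # Step 5: test the condition on each value
    if price < 100:
        # Step 6: update the results container when the condition holds
        if room_type in room_type_counts:
            room_type_counts[room_type] += 1
        else:
            room_type_counts[room_type] = 1

print(room_type_counts)  # {'Shared room': 2, 'Private room': 1}
```

The nested `if` inside step 6 is the same "counting with dictionaries" pattern used in Q1 and Q4 below.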

Sorting Values

x = [10,20,50,2,4]
x.sort()
print(x) # [2,4,10,20,50]
x.sort(reverse=True)
print(x) # [50,20,10,4,2]

Further explore the .sort() function in the documentation

Search up 'list .sort() python 3.0'
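A related tool worth knowing is the built-in `sorted()`, which returns a new list instead of sorting in place, and accepts a `key` function. The sketch below, using made-up counts, ranks dictionary entries by value, which is handy for questions like "which neighborhood has the most listings?". It also shows that `.sort()` returns `None`.

```python
# Made-up counts for illustration only
counts = {'A': 12, 'B': 40, 'C': 7}

# sorted() works on any iterable, including dict.items(), and returns a new list;
# key= picks what to sort by: here the count at index 1 of each (key, value) tuple
ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('B', 40), ('A', 12), ('C', 7)]

# .sort() sorts the list in place and returns None
scores = [4.5, 3.0, 5.0]
result = scores.sort()
print(scores, result)  # [3.0, 4.5, 5.0] None
```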







Welcome to your final project of Hackwagon Academy DS101! You've come a long way since the start of this course and if you've been on track with our exercises, you should find this doable.

Airbnb is an online marketplace and hospitality service, enabling people to lease or rent short-term lodging including vacation rentals, apartment rentals, homestays, hostel beds, or hotel rooms. The company does not own any lodging; it is merely a broker and receives percentage service fees (commissions) from both guests and hosts in conjunction with every booking. In this project, we aim to use algorithms and libraries to mine the reviews people have submitted on Singapore AirBnB rentals in order to provide descriptive analytics.

Load File

Load the airbnb_data.csv as a list of dictionaries into a new variable called airbnb_data. Once you load the data, you should see something like this:

[
        {
         'listing_id': '1133718',
         'survey_id': '1280',
         'host_id': '6219420',
         'room_type': 'Shared room',
         'country': '',
         'city': 'Singapore',
         'borough': '',
         'neighborhood': 'MK03',
         'reviews': '9',
         'overall_satisfaction': '4.5',
         'accommodates': '12',
         'bedrooms': '1.0',
         'bathrooms': '',
         'price': '74.0',
         'minstay': '',
         'last_modified': '2017-05-17 09:10:25.431659',
         'latitude': '1.293354',
         'longitude': '103.769226',
         'location': '0101000020E6100000E84EB0FF3AF159409C69C2F693B1F43F'
        }
        ...
    ]
In [3]:
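# Read file into a list of dictionaries called airbnb_data
import csv

# newline='' is what the csv module documentation recommends when opening CSV files
with open('airbnb_data.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    airbnb_data = []
    for row in data:
        # Each row is one listing, keyed by the CSV header
        airbnb_data.append(dict(row))

print(airbnb_data[:2])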
[{'listing_id': '1133718', 'survey_id': '1280', 'host_id': '6219420', 'room_type': 'Shared room', 'country': '', 'city': 'Singapore', 'borough': '', 'neighborhood': 'MK03', 'reviews': '9', 'overall_satisfaction': '4.5', 'accommodates': '12', 'bedrooms': '1.0', 'bathrooms': '', 'price': '74.0', 'minstay': '', 'last_modified': '2017-05-17 09:10:25.431659', 'latitude': '1.293354', 'longitude': '103.769226', 'location': '0101000020E6100000E84EB0FF3AF159409C69C2F693B1F43F'}, {'listing_id': '3179080', 'survey_id': '1280', 'host_id': '15295886', 'room_type': 'Shared room', 'country': '', 'city': 'Singapore', 'borough': '', 'neighborhood': 'TS17', 'reviews': '15', 'overall_satisfaction': '5.0', 'accommodates': '12', 'bedrooms': '1.0', 'bathrooms': '', 'price': '77.0', 'minstay': '', 'last_modified': '2017-05-17 09:10:24.216548', 'latitude': '1.310862', 'longitude': '103.858828', 'location': '0101000020E6100000E738B709F7F659403F1BB96E4AF9F43F'}]
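To see what `csv.DictReader` does without needing the real file, the sketch below feeds it a tiny in-memory CSV built from the first two rows shown above (only a subset of the columns). It shows that the header row becomes the dictionary keys and that every value is read as a string, which is why the cleaning step that follows is needed.

```python
import csv
import io

# A tiny in-memory stand-in for airbnb_data.csv, using a subset of the columns
csv_text = """listing_id,neighborhood,reviews,price
1133718,MK03,9,74.0
3179080,TS17,15,77.0
"""

# DictReader uses the header row as the keys of every row dictionary
rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
print(rows[0])  # {'listing_id': '1133718', 'neighborhood': 'MK03', 'reviews': '9', 'price': '74.0'}

# Every value comes back as a string, so numbers must be converted before doing maths
print(type(rows[0]['price']).__name__)  # str
```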

Data Cleaning (5 marks)

Once this is done correctly, you do not need to change the type for the remaining parts of your project.

Preview your data and convert the following columns to the appropriate type:

  1. overall_satisfaction
  2. price
  3. longitude
  4. latitude
  5. reviews

Expected Output:

{
        'listing_id': '1133718',
        'survey_id': '1280',
        'host_id': '6219420',
        'room_type': 'Shared room',
        'country': '',
        'city': 'Singapore',
        'borough': '',
        'neighborhood': 'MK03',
        'reviews': 9.0,
        'overall_satisfaction': 4.5,
        'accommodates': '12',
        'bedrooms': '1.0',
        'bathrooms': '',
        'price': 74.0,
        'minstay': '',
        'last_modified': '2017-05-17 09:10:25.431659',
        'latitude': 1.293354,
        'longitude': 103.769226,
        'location': '0101000020E6100000E84EB0FF3AF159409C69C2F693B1F43F'
    }
In [4]:
#Write code below
for row in airbnb_data:
    row['overall_satisfaction'] = float(row['overall_satisfaction'])
    row['price']= float(row['price'])
    row['longitude'] = float(row['longitude'])
    row['latitude'] = float(row['latitude'])
    row['reviews'] = float(row['reviews'])

print(airbnb_data[:1])

# 5 / 5
[{'listing_id': '1133718', 'survey_id': '1280', 'host_id': '6219420', 'room_type': 'Shared room', 'country': '', 'city': 'Singapore', 'borough': '', 'neighborhood': 'MK03', 'reviews': 9.0, 'overall_satisfaction': 4.5, 'accommodates': '12', 'bedrooms': '1.0', 'bathrooms': '', 'price': 74.0, 'minstay': '', 'last_modified': '2017-05-17 09:10:25.431659', 'latitude': 1.293354, 'longitude': 103.769226, 'location': '0101000020E6100000E84EB0FF3AF159409C69C2F693B1F43F'}]
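The five conversions above follow one pattern, so they can also be driven by a list of column names. This is a minimal sketch with a hypothetical helper `convert_columns` and a hypothetical one-row sample; it assumes, as the cleaning step above does, that none of these columns contain empty strings (`float('')` would raise a `ValueError`).

```python
# The five columns that need converting, as listed above
NUMERIC_COLUMNS = ['overall_satisfaction', 'price', 'longitude', 'latitude', 'reviews']

def convert_columns(rows, columns):
    """Hypothetical helper: convert the given columns of every row to float, in place."""
    for row in rows:
        for column in columns:
            row[column] = float(row[column])
    return rows

# Hypothetical single row, holding strings the way csv.DictReader returns them
sample = [{'listing_id': '1133718', 'reviews': '9', 'overall_satisfaction': '4.5',
           'price': '74.0', 'latitude': '1.293354', 'longitude': '103.769226'}]
convert_columns(sample, NUMERIC_COLUMNS)
print(sample[0]['reviews'], sample[0]['price'])  # 9.0 74.0
```

Adding another numeric column later then only means extending the list, not writing another line of conversion code.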

Exploratory Data Analysis (25 marks)

The data team at AirBnB wishes to find out the answers to a few simple questions on the existing listings in Singapore. Your goal is to manipulate the data you have stored in the list of dictionaries and understand some of the basic statistics of your dataset. The following are some of the common first questions asked.

Q1. List each neighborhood and its number of listings (5 marks)

Hint

  1. Counting with dictionaries

Expected Output:

When you search for ['TS17'], it should give you 342 counts.
In [5]:
#Write code below
neighborhood_listing ={}

for row in airbnb_data:
    neighborhood = row["neighborhood"]
    if neighborhood in neighborhood_listing:
        neighborhood_listing[neighborhood] += 1
    else:
        neighborhood_listing[neighborhood] =1

print("When you search for ['TS17'], it should give you " + str(neighborhood_listing['TS17']) + " counts.")

# 5 / 5
#can use break to stop the iteration after one row
When you search for ['TS17'], it should give you 342 counts.
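The `if key in dict ... else ...` counting pattern is common enough that the standard library covers it. The sketch below, on hypothetical sample rows, shows `collections.Counter` and the plain-dictionary `.get()` shortcut; both give the same counts as the loop above.

```python
from collections import Counter

# Hypothetical sample rows
sample = [{'neighborhood': 'TS17'}, {'neighborhood': 'MK03'}, {'neighborhood': 'TS17'}]

# Counter does the "if key in dict: += 1 else: = 1" bookkeeping internally
neighborhood_listing = Counter(row['neighborhood'] for row in sample)
print(neighborhood_listing['TS17'])  # 2
print(neighborhood_listing['XX99'])  # 0: a missing key counts as zero instead of raising KeyError

# The same counting with a plain dict: .get() returns 0 when the key is not there yet
counts = {}
for row in sample:
    counts[row['neighborhood']] = counts.get(row['neighborhood'], 0) + 1
print(counts)  # {'TS17': 2, 'MK03': 1}
```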

Q2. List each neighborhood and its average overall_satisfaction (5 marks)

Note: You should filter out listings whose reviews are 0.

Hint

  1. Create dictionary where key is the neighborhood_id and value is a list of overall_satisfaction
  2. Create another dictionary to compute the average

Expected Output:

When you search for ['TS17'], it should give you an average score of 2.859447004608295.
In [6]:
#Write code below
satisfaction_dictionary = {}

for row in airbnb_data:
    neighborhood_id = row["neighborhood"]
    satisfaction = row["overall_satisfaction"]
    reviews = row["reviews"]
    if neighborhood_id not in satisfaction_dictionary:
        if reviews != 0.0:
            satisfaction_dictionary[neighborhood_id] = [satisfaction]
    else:
        if reviews !=0.0:
            satisfaction_dictionary[neighborhood_id].append(satisfaction)
    
average_satis_dict = {}
for neighborhood_id, list_satis in satisfaction_dictionary.items():
    ave_satis = sum(list_satis)/len(list_satis)
    if neighborhood_id not in average_satis_dict:
        average_satis_dict[neighborhood_id] = ave_satis
    
print("When you search for ['TS17'], it should give you an average score of",average_satis_dict['TS17'],".")

# 5 / 5

#can filter for reviews != 0 first before creating the list
#for key, value in results.items():
# results[key] = sum(value)/len(value)
When you search for ['TS17'], it should give you an average score of 2.859447004608295 .
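The grader's note above suggests filtering first; here is a minimal sketch of that idea on hypothetical sample rows. `collections.defaultdict(list)` removes the need for the "not in dictionary" branch, and a dictionary comprehension replaces the second loop.

```python
from collections import defaultdict

# Hypothetical sample rows
sample = [
    {'neighborhood': 'TS17', 'reviews': 9.0, 'overall_satisfaction': 4.5},
    {'neighborhood': 'TS17', 'reviews': 0.0, 'overall_satisfaction': 0.0},
    {'neighborhood': 'TS17', 'reviews': 3.0, 'overall_satisfaction': 3.5},
    {'neighborhood': 'MK03', 'reviews': 5.0, 'overall_satisfaction': 5.0},
]

# defaultdict(list) creates an empty list the first time a neighborhood is seen
scores = defaultdict(list)
for row in sample:
    # Filter first: skip listings with no reviews before touching the container
    if row['reviews'] == 0:
        continue
    scores[row['neighborhood']].append(row['overall_satisfaction'])

# One dictionary comprehension replaces the second loop
average_satis_dict = {neighborhood: sum(values) / len(values)
                      for neighborhood, values in scores.items()}
print(average_satis_dict)  # {'TS17': 4.0, 'MK03': 5.0}
```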

Q3. List each neighborhood and its average price (5 marks)

Hint

  1. Similar to previous question

Expected Output:

When you search for ['TS17'], it should give you an average price of 95.5672514619883.
In [7]:
#Write code below
price_dictionary = {}

# Group every listing's price under its neighborhood
for row in airbnb_data:
    neighborhood_id = row["neighborhood"]
    price = row["price"]
    if neighborhood_id not in price_dictionary:
        price_dictionary[neighborhood_id] = [price]
    else:
        price_dictionary[neighborhood_id].append(price)

# Average each neighborhood's list of prices
ave_price_dict = {}
for neighborhood_id, prices in price_dictionary.items():
    ave_price_dict[neighborhood_id] = sum(prices) / len(prices)

print("When you search for ['TS17'], it should give you an average price of",ave_price_dict['TS17'],".")

# 5 / 5
When you search for ['TS17'], it should give you an average price of 95.5672514619883 .
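Q2 and Q3 follow the same "group by neighborhood, then average" shape, which suggests a reusable function. The sketch below uses a hypothetical helper `average_by_neighborhood` that keeps a running sum and count per neighborhood (so no list of values is stored) and takes an optional `keep` filter, which covers the "reviews greater than 0" rule from Q2. The sample rows are hypothetical.

```python
def average_by_neighborhood(data, column, keep=None):
    """Hypothetical helper: average `column` per neighborhood,
    optionally keeping only rows where keep(row) is True."""
    totals = {}
    counts = {}
    for row in data:
        if keep is not None and not keep(row):
            continue
        neighborhood = row['neighborhood']
        # Running sum and count, so no per-neighborhood list has to be stored
        totals[neighborhood] = totals.get(neighborhood, 0.0) + row[column]
        counts[neighborhood] = counts.get(neighborhood, 0) + 1
    return {n: totals[n] / counts[n] for n in totals}

# Hypothetical sample rows
sample = [
    {'neighborhood': 'TS17', 'reviews': 9.0, 'overall_satisfaction': 4.5, 'price': 80.0},
    {'neighborhood': 'TS17', 'reviews': 0.0, 'overall_satisfaction': 0.0, 'price': 120.0},
    {'neighborhood': 'MK03', 'reviews': 5.0, 'overall_satisfaction': 5.0, 'price': 60.0},
]

# Q3-style: average price over all listings
print(average_by_neighborhood(sample, 'price'))  # {'TS17': 100.0, 'MK03': 60.0}
# Q2-style: average satisfaction, ignoring listings with no reviews
print(average_by_neighborhood(sample, 'overall_satisfaction',
                              keep=lambda row: row['reviews'] != 0))  # {'TS17': 4.5, 'MK03': 5.0}
```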

Q4. Plot a distribution of counts of the overall_satisfaction (5 marks)

Note: You should filter out listings whose reviews are 0.

Hint

  1. Counting with dictionaries
  2. Get a list of tuples with .items()
  3. Create 2 lists:
    • 1 for all the scores labels
    • 1 for all the counts
  4. Plot with the 2 lists

Expected Output:

In [8]:
# Remember to import the relevant library/libraries!
# Write code below:

import matplotlib.pyplot as plt

# Count listings per overall_satisfaction score, skipping listings with 0 reviews
satis_count = {}
for row in airbnb_data:
    reviews = row["reviews"]
    satisfaction1 = row["overall_satisfaction"]
    if reviews != 0:
        if satisfaction1 in satis_count:
            satis_count[satisfaction1] += 1
        else:
            satis_count[satisfaction1] = 1

# Split the (score, count) tuples from .items() into one list of labels and one of counts
satis_list = []
count_list = []
for score, count in satis_count.items():
    satis_list.append(score)
    count_list.append(count)

# Scores are 0.5 apart, so a width below 0.5 keeps adjacent bars from overlapping
plt.bar(satis_list, count_list, width=0.4)
plt.title("Distribution of Overall Satisfaction Scores")
plt.xlabel("Overall Satisfaction Scores")
plt.ylabel("Counts")
plt.show()

# 5 / 5
[Figure: bar chart titled "Distribution of Overall Satisfaction Scores"]
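Dictionary order follows the order in which scores were first seen, so the bar labels are not necessarily in score order. A compact way to build both lists sorted by score is `zip(*sorted(...))`, sketched here with hypothetical counts:

```python
# Hypothetical counts of each overall_satisfaction score
satis_count = {4.5: 10, 5.0: 7, 3.5: 2}

# Sorting the (score, count) pairs puts them in score order;
# zip(*pairs) then splits them into one tuple of scores and one tuple of counts
satis_list, count_list = zip(*sorted(satis_count.items()))
print(satis_list)  # (3.5, 4.5, 5.0)
print(count_list)  # (2, 10, 7)
```

The two tuples can be passed straight to `plt.bar(satis_list, count_list, width=0.4)`. Because the scores are 0.5 apart, a width below 0.5 avoids overlapping bars (matplotlib's default bar width is 0.8). Note that unpacking fails if the dictionary is empty.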

Q5. Plot a geographical representation of all of the listings in Singapore (5 marks)

Hint

  1. Create a list for latitude
  2. Create a list for longitude
  3. Append each listing's latitude and longitude to the lists
  4. Plot a scatter plot using both lists

Expected Output:

In [9]:
#Write code below
latitude_list = []
longitude_list = []

for row in airbnb_data:
    latitude_list.append(row["latitude"])
    longitude_list.append(row["longitude"])
    
import matplotlib.pyplot as plt
plt.scatter(longitude_list, latitude_list)
plt.title("Geographical Representation of All Airbnb Listings in Singapore")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

# 5 / 5

Interpretation (10 marks)

Answer the following questions to better understand the Airbnb dataset.

You're free to make some assumptions.

Q1. Why do you think the overall_satisfaction is in intervals of 0.5 and not otherwise? (5 marks)

Answer:

Overall_satisfaction ranges from 0 to 5 in intervals of 0.5, which limits the input to 11 possible values and makes statistical analysis easier. People cannot input numbers like 4.8; instead, they decide whether to give the listing a 4 or a 5 depending on their level of preference, and if they are ambivalent between 4 and 5, they can choose 4.5. This way, each consumer's preference is more distinct.

In [10]:
# 5 / 5
# it also gives you a nice histogram! (=

Q2. Why was there a need to filter for listings with more than 0 reviews in questions 2 and 4? (5 marks)

Answer:

Those with more than 0 reviews are likely to be more reliable listings as there is evidence that there were actual people who have stayed there.

In [11]:
# 5 / 5

AirBnB Visualisation and Price Recommender App (60 marks)

Attempts to create the functions are awarded 2 marks each.

Scenario: The earlier EDA code was neither modular nor scalable, so it does not let the AirBnB team look into each neighborhood. The AirBnB data team has therefore tasked you with building a simple application that improves on the earlier EDA while serving its 2 users, Guests and Hosts.

Your objective: Develop an app which will serve the 2 main users:

  1. Guests - A visualisation tool that recommends the best listings in a neighborhood based on price and overall satisfaction score
  2. Hosts - A recommended price to set for their listing in a given neighborhood, based on better-performing listings
THIS NEXT CELL IS IMPORTANT FOR YOUR APPLICATION. Run it to install a package called mplleaflet. If you face any issues, please contact any of your TAs or Instructors.
In [12]:
!pip install mplleaflet
Requirement already satisfied: mplleaflet in c:\users\yamba\anaconda3\lib\site-packages (0.0.5)
Requirement already satisfied: six in c:\users\yamba\anaconda3\lib\site-packages (from mplleaflet) (1.12.0)
Requirement already satisfied: jinja2 in c:\users\yamba\anaconda3\lib\site-packages (from mplleaflet) (2.10.3)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\yamba\anaconda3\lib\site-packages (from jinja2->mplleaflet) (1.1.1)

How do you know if you installed the library correctly? Try running the next cell; if you don't get an error, you are good to go!

In [13]:
import mplleaflet

Building the App

To begin building the App, there are 2 things to do:

  1. Build the functions
  2. Test the functions

After we are done building the functions in part 1, we will test them in part 2.

Every single function you create must have the airbnb_data variable as the first parameter so that you can use it inside the function.
def example_function_1(data, x, y):
    # data always comes first; x and y stand in for any other parameters
    for i in data:
        print(i)

# when using it.. notice that airbnb_data is placed first, followed by the other arguments
example_function_1(airbnb_data, some_x, some_y)

There are a total of 5 functions:

  1. get_all_latitudes
  2. get_all_longitudes
  3. listings_recommender
  4. price_recommender
  5. visualise_listings

get_all_latitudes() - Function to get all latitudes given a list of listing_ids (2 marks)


Input: airbnb_data as data, a list of listing_ids

Return: A list of latitudes

In [14]:
#Write code below
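def get_all_latitude(data, listing):
    # A set gives exact, fast membership checks
    # (a string `in` test would also match partial IDs such as '1236775' inside '12367758')
    wanted = set(listing)
    latitude_list = []
    for row in data:
        if row['listing_id'] in wanted:
            latitude_list.append(row["latitude"])
    return latitude_list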

# 2 / 2

Tester Cell - To test the above function to see if it's working.

Expected Output:

[1.311147]
In [15]:
get_all_latitude(airbnb_data, ['12367758'])
Out[15]:
[1.311147]

get_all_longitudes() - Function to get all longitudes given a list of listing_ids (2 marks)


Input: airbnb_data as data, a list of listing_ids

Return: A list of longitudes

In [16]:
#Write code below
def get_all_longitude(data, listing):
    # Use the data parameter (not the global airbnb_data) and match IDs exactly via a set
    wanted = set(listing)
    longitude_list = []
    for row in data:
        if row['listing_id'] in wanted:
            longitude_list.append(row["longitude"])
    return longitude_list

# 2 / 2

Tester Cell - To test the above function to see if it's working.

Expected Output:

[103.857933]
In [17]:
get_all_longitude(airbnb_data, ['12367758'])
Out[17]:
[103.857933]
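Two notes on the lookups above. First, `in` between two strings is a substring test, while `in` on a set or list of strings is an exact membership test, so comparing against the whole list of IDs avoids accidental partial matches. Second, latitudes and longitudes can be collected in the same pass. The sketch below uses a hypothetical helper `get_all_coordinates` and two sample rows whose values come from the outputs shown earlier.

```python
# String membership is a substring test, which can match the wrong listing
print('1236775' in '12367758')    # True: substring match
print('1236775' in {'12367758'})  # False: exact membership in a set

def get_all_coordinates(data, listing_ids):
    """Hypothetical helper: return (latitudes, longitudes) for the given listing_ids in one pass."""
    wanted = set(listing_ids)  # exact-match lookups
    latitudes, longitudes = [], []
    for row in data:
        if row['listing_id'] in wanted:
            latitudes.append(row['latitude'])
            longitudes.append(row['longitude'])
    return latitudes, longitudes

sample = [
    {'listing_id': '12367758', 'latitude': 1.311147, 'longitude': 103.857933},
    {'listing_id': '1133718', 'latitude': 1.293354, 'longitude': 103.769226},
]
print(get_all_coordinates(sample, ['12367758']))  # ([1.311147], [103.857933])
```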

listings_recommender() - Function to recommend all listings based on a given price, satisfaction score and neighborhood (2 marks)

Note:

  1. Less than or equal to that price
  2. Greater than or equal to that overall satisfaction score
  3. In that neighborhood

Input: airbnb_data as data, price, overall_satisfaction, neighborhood_id

Return: A list of listing_ids

In [18]:
#Write code below
def listings_recommender(data, price, overall_satisfaction, neighborhood_id):
    list_of_listings =[]
    for row in data:
        data_price = row['price']
        data_satisfaction = row['overall_satisfaction']
        data_neighborhood = row['neighborhood']
        data_listing = row['listing_id']
        if neighborhood_id == data_neighborhood:
            if data_price <= price and data_satisfaction >= overall_satisfaction:
                list_of_listings.append(data_listing)
    return(list_of_listings)

# 2 / 2

Tester Cell - To test the above function to see if it's working.

Expected Output:

['10350448',
 '13507262',
 '13642646',
 '15099645',
 '6451493',
 '4696031',
 '2898794',
 '13181050',
 '9022211',
 '5200263',
 '6529707',
 '14433262']
In [19]:
listings_recommender(airbnb_data, 60, 5, 'MK03')
Out[19]:
['10350448',
 '13507262',
 '13642646',
 '15099645',
 '6451493',
 '4696031',
 '2898794',
 '13181050',
 '9022211',
 '5200263',
 '6529707',
 '14433262']
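The three conditions in `listings_recommender` can also be written as a single list comprehension. This is a sketch of an equivalent variant under a hypothetical name, `listings_recommender_compact`, tested on hypothetical rows rather than the real data:

```python
def listings_recommender_compact(data, price, overall_satisfaction, neighborhood_id):
    """Hypothetical one-expression variant of listings_recommender above."""
    return [
        row['listing_id']
        for row in data
        if row['neighborhood'] == neighborhood_id
        and row['price'] <= price
        and row['overall_satisfaction'] >= overall_satisfaction
    ]

# Hypothetical sample rows
sample = [
    {'listing_id': '1', 'neighborhood': 'MK03', 'price': 55.0, 'overall_satisfaction': 5.0},
    {'listing_id': '2', 'neighborhood': 'MK03', 'price': 75.0, 'overall_satisfaction': 5.0},
    {'listing_id': '3', 'neighborhood': 'MK03', 'price': 40.0, 'overall_satisfaction': 4.0},
    {'listing_id': '4', 'neighborhood': 'TS17', 'price': 50.0, 'overall_satisfaction': 5.0},
]
print(listings_recommender_compact(sample, 60, 5, 'MK03'))  # ['1']
```

Listing 2 fails the price filter, listing 3 the score filter and listing 4 the neighborhood filter. The order of results follows the order of the data, as in the loop version.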

price_recommender() - Function to recommend a price in a neighborhood based on average price and overall satisfaction (2 marks)

For this function, we want to build a simple price recommendation function that will give a potential host a suggested price.

To build this, these are the requirements:

  1. Take all listings in that neighborhood and check for listings with at least 1 review and an overall satisfaction score of 4 or more.
  2. From those filtered listings, calculate the average price and return it as the suggested price, rounded to 2 decimal places.

Input: airbnb_data as data, a neighborhood_id

Return: A float of recommended price

In [20]:
#Write code below
def price_recommender(data, neighborhood_id):
    # Collect the prices of well-rated listings (at least 1 review, score of 4 or more)
    good_prices = []
    for row in data:
        if row['neighborhood'] == neighborhood_id:
            if row['reviews'] != 0 and row['overall_satisfaction'] >= 4.0:
                good_prices.append(row['price'])

    # No qualifying listings: return None instead of failing on an undefined average
    if not good_prices:
        return None
    # Average once, after the loop, rounded to 2 decimal places
    return round(sum(good_prices) / len(good_prices), 2)

# 2 / 2
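The same filter-then-average logic can be written with a list comprehension and `statistics.mean` from the standard library. In this sketch the name `recommend_price` and the `min_score` parameter are hypothetical additions, and returning `None` for a neighborhood with no qualifying listings is a design choice, not part of the original requirements.

```python
from statistics import mean

def recommend_price(data, neighborhood_id, min_score=4.0):
    """Hypothetical variant: average price of listings in the neighborhood with
    at least 1 review and a score of at least min_score, rounded to 2 decimals."""
    prices = [
        row['price']
        for row in data
        if row['neighborhood'] == neighborhood_id
        and row['reviews'] > 0
        and row['overall_satisfaction'] >= min_score
    ]
    # Return None instead of crashing when no listing qualifies
    return round(mean(prices), 2) if prices else None

# Hypothetical sample rows
sample = [
    {'neighborhood': 'TS17', 'reviews': 3.0, 'overall_satisfaction': 4.5, 'price': 60.0},
    {'neighborhood': 'TS17', 'reviews': 8.0, 'overall_satisfaction': 4.0, 'price': 71.0},
    {'neighborhood': 'TS17', 'reviews': 0.0, 'overall_satisfaction': 0.0, 'price': 300.0},
    {'neighborhood': 'TS17', 'reviews': 5.0, 'overall_satisfaction': 3.5, 'price': 40.0},
]
print(recommend_price(sample, 'TS17'))  # 65.5
print(recommend_price(sample, 'MK03'))  # None
```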

visualise_listings() - Function to geographically visualise a given list of listings (2 marks)

Use the mplleaflet library. Normally you would do 'plt.show()' to show your visualisation. Do the same thing, but just do 'mplleaflet.show()', instead of 'plt.show()'.

Input: airbnb_data as data, a list of listing_ids

Output: A visualisation of the listings' locations (nothing to return)

In [21]:
# Remember to import the relevant library/libraries!
import matplotlib.pyplot as plt
import mplleaflet

#Write code below
def visualise_listings(data, listing):
    plt.plot(get_all_longitude(data, listing),get_all_latitude(data, listing),'bs')
    mplleaflet.show()
    
# 2 / 2

Tester Cell - To test the above function to see if it's working.

Expected Output: A visualisation should appear as a new tab in your browser. The listing is between Kitchener Road and Somme Road.

In [22]:
visualise_listings(airbnb_data, ['12367758'])

More functions of your own if you want... (no bonus marks given :'') )


Testing

Here, we will test if your functions are working as they are supposed to.

Your task: Use the functions created above, if necessary interchangeably, to answer the questions below.


User - An Airbnb Host

Imagine now you're an Airbnb host and you are going to use the app you've developed to ask for a recommended price to list your place.

Based on your assigned neighborhood, what is the recommended price for your neighborhood? (15 marks)

Expected output: 66.28

In [23]:
neighborhood_to_test = 'TS17'

#Write code below
price_recommender(airbnb_data, neighborhood_to_test)

# 15 / 15
Out[23]:
66.28

User - An Airbnb Guest

Imagine now you're an Airbnb guest and you are going to use the app to find a list of listings you want based on your search filter/restrictions.

Using the functions created above, find the listings that match your assigned price, overall_satisfaction and neighborhood, and plot them on a map (35 marks)

Expected output: The visualisation should show that the listings are in the Boon Keng / Farrer Park areas.

If it's working, a new tab will pop out. This is normal.

In [24]:
neighborhood_to_test = 'TS17'
price_to_test = 100
overall_satisfaction_to_test = 4

#Write code below
listing_visual = listings_recommender(airbnb_data, price_to_test, overall_satisfaction_to_test, neighborhood_to_test)

# visualise_listings looks up the coordinates itself, so no separate latitude/longitude calls are needed
visualise_listings(airbnb_data, listing_visual)

# 35 / 35

Disclaimer: 

This is a personal website. The opinions expressed here represent my own and not those of my employer. 

In addition, my thoughts and opinions change from time to time; I consider this a necessary consequence of having an open mind.

All rights reserved 2024 

Privacy Policy applies 

Terms and Conditions apply.