Seattle Airbnb Exploratory Analysis

Renalka
5 min read · Feb 9, 2021

Data will talk to you, if you’re willing to listen.

Image courtesy: https://www.travelandleisure.com/hotels-resorts/vacation-rentals/best-airbnb-seattle

I have long been fascinated by the way data that seems obscure and dull can provide us with very meaningful insights. In my quest to learn data analytics and data science, I enrolled in the Udacity Data Science Nanodegree. The very first project is an exercise in conducting a CRISP-DM (Cross Industry Standard Process for Data Mining) analysis of housing properties listed in the Seattle area on Airbnb. The GitHub repository can be found here.

The dataset, sourced from Kaggle, contains Airbnb housing information for Seattle over a period of 13 months, from January ’16 to January ’17.

I will be exploring the questions mentioned below.

  1. Which months have the most Airbnb listings?
  2. Which months fetch the highest prices?
  3. What is the total revenue that can be generated each month?
  4. Which neighborhoods are the most expensive and the cheapest?
  5. How much time do the hosts take in responding?
  6. Which are the top ten neighborhoods in terms of number of listings?
  7. What are the main types of properties available?
  8. Which factors affect ratings and prices the most?

Introduction

In 2016, there were 3,818 Airbnb listings in Seattle. The data available ranges from January 2016 to December 2016. The annual average price charged by a host on a listing in Seattle is $33,765, and the average price charged by a host on a listing is $93.

Q1. Which months have the most Airbnb listings?

Bar graph for months vs number of available listings

December has the most listings, 87,061 (11.7% more than the average), followed by October with 82,438 listings and November with 81,780. It can be concluded that October through December are the peak months.
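A minimal sketch of how such a monthly count could be produced with pandas. It assumes a calendar table with one row per listing per day and an `available` flag stored as "t"/"f". The column names and the handful of rows below are made up for illustration and are not taken from the project repository.

```python
import pandas as pd

# Tiny synthetic stand-in for the calendar table (values are invented).
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 2, 3, 3],
    "date": ["2016-10-01", "2016-12-01", "2016-10-01",
             "2016-12-01", "2016-12-02", "2016-12-03"],
    "available": ["t", "t", "f", "t", "t", "f"],
})

calendar["date"] = pd.to_datetime(calendar["date"])

# Keep only the rows where the listing was available on that day.
available = calendar[calendar["available"] == "t"]

# Count available rows per calendar month.
listings_per_month = (
    available.groupby(available["date"].dt.month)
    .size()
    .rename("available_listings")
)

# Express each month relative to the average month, in percent.
pct_vs_average = (listings_per_month / listings_per_month.mean() - 1) * 100
print(listings_per_month)
print(pct_vs_average.round(1))
```

Counted this way, a listing that is available on several days of a month contributes once per day, which would explain monthly figures far larger than the number of distinct listings.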

Q2. Which months fetch the highest prices?

Bar graph of month vs average price

Based on the bar graph above, June and December are the highest-priced months in Seattle. On average, hosts charged $99 in June and $101 in December, which is approximately 7%-8% higher than the annual average price. A possible reason is that these months fall in holiday seasons: with demand and popularity up, hosts probably raise their prices.

Q3. What is the total revenue that can be generated each month?

Bar graph of month vs total revenue possible

As seen from the bar graph above, December is again the peak revenue-generating month, followed by August. The total possible revenue is $11,949,282 in December, while August can generate up to $11,502,179.
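The monthly average price (Q2) and the total possible revenue (Q3) can both be derived from the same grouped table. The sketch below assumes that prices are stored as strings such as "$1,200.00" and that "possible revenue" means the sum of prices over available days. Both are assumptions, and the data is synthetic.

```python
import pandas as pd

# Synthetic calendar rows; prices are strings with "$" and thousands commas.
calendar = pd.DataFrame({
    "date": ["2016-06-10", "2016-06-11", "2016-12-24", "2016-12-25"],
    "available": ["t", "t", "t", "f"],
    "price": ["$100.00", "$1,200.00", "$150.00", "$150.00"],
})
calendar["date"] = pd.to_datetime(calendar["date"])

# Work only with days on which the listing could actually be booked.
open_days = calendar[calendar["available"] == "t"].copy()

# Strip the currency symbol and commas, then convert to numbers.
open_days["price"] = pd.to_numeric(
    open_days["price"].str.replace(r"[$,]", "", regex=True)
)

# One row per month: mean price and summed price (possible revenue).
monthly = open_days.groupby(open_days["date"].dt.month)["price"].agg(
    average_price="mean",
    possible_revenue="sum",
)
print(monthly)
```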

Q4. Which neighborhoods are the most expensive and the cheapest?

Line graph for neighborhoods vs average price
Bar graph for neighborhoods vs average price

The maximum average price is $231.71, and Southeast Magnolia and Portage Bay have the most expensive listings. Rainier Beach is the cheapest neighborhood, with an average price of $68.56.
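A ranking like this comes down to a group-by on the neighborhood column. The following is a rough sketch on invented data: the column name `neighbourhood_cleansed` and the placeholder neighborhood names are assumptions, not values from the analysis.

```python
import pandas as pd

# Synthetic listings with placeholder neighborhood names.
listings = pd.DataFrame({
    "neighbourhood_cleansed": ["Hood A", "Hood A", "Hood B", "Hood B", "Hood C"],
    "price": [200.0, 260.0, 70.0, 66.0, 120.0],
})

# Average price per neighborhood, most expensive first, rounded to cents.
avg_price = (
    listings.groupby("neighbourhood_cleansed")["price"]
    .mean()
    .sort_values(ascending=False)
    .round(2)
)

most_expensive = avg_price.index[0]
cheapest = avg_price.index[-1]
print(avg_price)
print("most expensive:", most_expensive, "| cheapest:", cheapest)
```

Rounding at the end keeps the reported figures at two decimals instead of the raw floating-point means.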

Q5. How much time do the hosts take in responding?

Bar graph for response time vs % of total hosts

Most hosts respond quickly: 85% of all hosts respond within a day, and approximately 45% respond within an hour.
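Percentages like these can be obtained from normalized value counts plus a cumulative sum over the categories ordered from fastest to slowest. The sketch below assumes a `host_response_time` column with the category labels shown and counts each host once. Those choices, and the sample rows, are illustrative assumptions.

```python
import pandas as pd

# Synthetic rows: host 1 owns two listings, so it appears twice.
hosts = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4],
    "host_response_time": [
        "within an hour", "within an hour", "within a few hours",
        "within a day", "a few days or more",
    ],
})

# Count each host once so multi-listing hosts are not over-weighted.
unique_hosts = hosts.drop_duplicates("host_id")

# Share of hosts per category, in percent.
share = unique_hosts["host_response_time"].value_counts(normalize=True) * 100

# Order categories from fastest to slowest before accumulating.
order = ["within an hour", "within a few hours",
         "within a day", "a few days or more"]
share = share.reindex(order, fill_value=0.0)
cumulative = share.cumsum()
print(cumulative)
```

The same `value_counts(normalize=True)` pattern also yields the share of listings per neighborhood or per property type.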

Q6. Which are the top ten neighborhoods in terms of number of listings?

Bar graph for neighborhoods vs % of number of listings

Broadway has the most listings in Seattle. Of all the listings spread across 87 neighborhoods, more than 10% are in Broadway alone. The second and third most popular neighborhoods are Belltown and Wallingford, respectively.

Q7. What are the main types of properties available?

Bar graph for property type vs % of total number of listings

As is evident from the bar graph, houses make up 45.39% of all listings, followed by apartments with 44.73%. Hence, over 90% of all listed properties are either houses or apartments.

Q8. Which factors affect ratings and prices the most?

Correlation matrix between ratings and quality of rooms

We can observe from the correlation matrix that the property’s cleanliness, value and accuracy seem to be the most influential factors affecting a rating, followed by check-in and communication. Interestingly, location is least correlated with the overall rating. Hence, to fetch good ratings, the owners must keep their properties immaculate, should not mislead the customers and should be good at communication.
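To make the reading of the correlation matrix concrete, here is a small sketch that ranks sub-scores by their correlation with the overall rating. The `review_scores_*` column names are assumed, and the numbers are synthetic, so the ordering it prints says nothing about the real data.

```python
import pandas as pd

# Synthetic review scores (invented values, three columns only).
reviews = pd.DataFrame({
    "review_scores_rating": [100, 95, 90, 80, 70, 98],
    "review_scores_cleanliness": [10, 10, 9, 8, 7, 10],
    "review_scores_location": [9, 10, 10, 9, 10, 9],
})

# Pearson correlation of every sub-score with the overall rating,
# strongest first; the rating's correlation with itself is dropped.
corr_with_rating = (
    reviews.corr()["review_scores_rating"]
    .drop("review_scores_rating")
    .sort_values(ascending=False)
)
print(corr_with_rating.round(2))
```

Correlation only measures linear association, so a high value suggests, but does not prove, that improving a sub-score would raise the overall rating.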

Correlation matrix between price and amenities

Similarly, the number of bedrooms offered by the host seems to be the most influential factor affecting the price, followed by beds and bathrooms. The minimum and maximum nights offered have little effect on the price. Hence, where amenities are concerned, bedrooms affect prices the most.

Predicting Prices using Linear Regression

Linear regression was performed using the numerical columns as features. After some data cleaning, it reached an R² score of about 0.5 (~50%). Ideally, more data engineering should be done to achieve a better fit, or other models should be tried for predicting prices, as price doesn’t seem to have a linear relationship with these features.
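For readers unfamiliar with the R² score, the sketch below fits an ordinary least-squares model with NumPy and computes R² on held-out rows. It is not the code used for the project: the features, coefficients and noise level are invented, and the library the analysis actually used is not stated here.

```python
import numpy as np

# Synthetic data: price depends linearly on two features plus noise.
rng = np.random.default_rng(0)
n = 200
bedrooms = rng.integers(1, 5, size=n)   # values 1..4
bathrooms = rng.integers(1, 3, size=n)  # values 1..2
price = 40 * bedrooms + 15 * bathrooms + rng.normal(0, 25, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), bedrooms, bathrooms])

# Simple split: first 150 rows for fitting, the rest for evaluation.
train, test = slice(0, 150), slice(150, n)

# Ordinary least squares fit on the training rows.
coef, *_ = np.linalg.lstsq(X[train], price[train], rcond=None)

# R^2 = 1 - (residual sum of squares / total sum of squares).
pred = X[test] @ coef
ss_res = np.sum((price[test] - pred) ** 2)
ss_tot = np.sum((price[test] - price[test].mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print("coefficients:", coef.round(2), "| R^2:", round(r2, 3))
```

An R² of 0.5 means the model explains about half of the variance in price on the evaluated rows, leaving the other half to features or non-linear effects the model does not capture.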

Conclusions

The listings dataset is the one most relevant to our analysis. The calendar dataset supplements it for the questions on popular times and availability. A larger dataset covering more years would help determine whether any seasonality exists. The third dataset, with customer reviews, cannot be used for much quantitative analysis, but NLP could be applied to its ‘comments’ column to perform sentiment analysis of the various listings.
