Expedia Hotel Recommendations Data Science Project

In this data science project, you will contextualize customer data and predict the likelihood that a customer will stay at each of 100 different hotel groups.


Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What will you learn

Approach to the problem statement
Data Exploration
Handling missing values
Creating heat maps
Data exploration visualisations
Feature engineering
Visualising the engineered features
Data Cleaning
Baseline accuracy
Implementation using random forests
Implementation using Naive Bayes
Implementation using logistic regression
Implementation using KNN
Hyperparameter tuning
Comparison of algorithms
How to approach and solve a problem statement
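
The baseline and model comparison steps listed above can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn is installed; the data is a synthetic stand-in generated with `make_classification`, not the Expedia logs, and the model settings are arbitrary choices rather than the project's own.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a prepared feature matrix (assumption: not Expedia data).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The baseline always predicts the most frequent class seen in training.
models = {
    "baseline (most frequent class)": DummyClassifier(strategy="most_frequent"),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Print the models from most to least accurate.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s} accuracy = {acc:.3f}")
```

Any model worth keeping should beat the baseline row. If it does not, the features probably carry little signal.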

Project Description

Planning your dream vacation, or even a weekend escape, can be an overwhelming affair. With hundreds, even thousands, of hotels to choose from at every destination, it's difficult to know which will suit your personal preferences. Should you go with an old standby with those pillow mints you like, or risk a new hotel with a trendy pool bar? 

Expedia wants to take the proverbial rabbit hole out of hotel search by providing personalized hotel recommendations to their users. This is no small task for a site with hundreds of millions of visitors every month!

Currently, Expedia uses search parameters to adjust its hotel recommendations, but there isn't enough customer-specific data to personalize them for each user. In this competition, Expedia is challenging you to contextualize customer data and predict the likelihood that a user will stay at each of 100 different hotel groups.

Data Description:

Expedia has provided you with logs of customer behavior. These include what customers searched for, how they interacted with search results (click/book), and whether or not the search result was a travel package. The data in this project is a random selection from Expedia and is not representative of the overall statistics.

Expedia is interested in predicting which hotel group a user is going to book. Expedia has in-house algorithms to form hotel clusters, where similar hotels for a search (based on historical price, customer star ratings, geographical location relative to the city center, etc.) are grouped together. These hotel clusters serve as good identifiers of the types of hotels people are going to book, while avoiding outliers such as new hotels that don't have historical data.

Your goal in this project is to predict the booking outcome (hotel cluster) for a user event, based on the search and other attributes associated with that user event.

The train and test datasets are split based on time: the training data is from 2013 and 2014, while the test data is from 2015. The training data includes all the users in the logs, with both click events and booking events. The test data includes only booking events.
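
A local validation set can mirror this time-based split. The sketch below is one possible approach, assuming pandas is available: it holds out the later training year and keeps only booking events in the held-out part, since the test data contains bookings only. The rows are made up for illustration, and the names `fit_part` and `valid_part` are hypothetical.

```python
import pandas as pd

# Tiny made-up sample using column names from the field description.
logs = pd.DataFrame({
    "date_time": ["2013-03-01 10:00:00", "2013-11-20 08:30:00",
                  "2014-02-14 21:15:00", "2014-07-04 12:00:00",
                  "2014-12-24 18:45:00"],
    "user_id": [1, 2, 1, 3, 2],
    "is_booking": [0, 1, 1, 0, 1],
    "hotel_cluster": [5, 12, 5, 40, 12],
})
logs["date_time"] = pd.to_datetime(logs["date_time"])

# Fit on the earlier year (clicks and bookings); validate on the later
# year using bookings only, to imitate how the real test set is built.
is_late = logs["date_time"].dt.year == 2014
fit_part = logs[~is_late]
valid_part = logs[is_late & (logs["is_booking"] == 1)]

print(len(fit_part), "rows to fit on,", len(valid_part), "rows to validate on")
```

Splitting by time instead of at random keeps future events out of the fitting data, which a random split would not guarantee.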

The destinations.csv file consists of features extracted from hotel review text.

Note that some srch_destination_id values in the train/test files don't exist in the destinations.csv file. This is because some hotels are new and don't have enough features in the latent space. Your algorithm should be able to handle this missing information.
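
One way to handle destinations that are absent from destinations.csv is a left join plus an explicit "missing" flag. The sketch below assumes pandas; the rows and latent values are invented, only two of the latent columns are shown, and filling the gaps with 0.0 is an illustrative choice, not a recommendation from the source.

```python
import pandas as pd

# Made-up event rows; destination 99999 has no entry in the lookup table.
events = pd.DataFrame({
    "srch_destination_id": [8250, 8250, 99999],
    "hotel_cluster": [1, 1, 80],
})
# Made-up latent features (only d1 and d2 shown for brevity).
destinations = pd.DataFrame({
    "srch_destination_id": [8250, 14984],
    "d1": [-2.20, -2.19],
    "d2": [-2.10, -2.30],
})

# A left join keeps every event, even when the destination is unknown.
merged = events.merge(
    destinations, on="srch_destination_id", how="left", indicator=True
)

# Record which rows had no latent features, then fill the gaps.
merged["dest_missing"] = (merged["_merge"] == "left_only").astype(int)
merged = merged.drop(columns="_merge")
latent_cols = ["d1", "d2"]
merged[latent_cols] = merged[latent_cols].fillna(0.0)  # assumption: 0.0 as filler

print(merged)
```

Keeping the `dest_missing` flag lets a model tell a genuine zero apart from a filled-in gap.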

Field Description:

| Column name | Description | Data type |
|---|---|---|
| date_time | Timestamp | string |
| site_name | ID of the Expedia point of sale (e.g. Expedia.com, Expedia.co.uk, Expedia.co.jp, ...) | int |
| posa_continent | ID of the continent associated with site_name | int |
| user_location_country | ID of the country where the customer is located | int |
| user_location_region | ID of the region where the customer is located | int |
| user_location_city | ID of the city where the customer is located | int |
| orig_destination_distance | Physical distance between a hotel and a customer at the time of search. A null means the distance could not be calculated | double |
| user_id | ID of the user | int |
| is_mobile | 1 when a user connected from a mobile device, 0 otherwise | tinyint |
| is_package | 1 if the click/booking was generated as part of a package (i.e. combined with a flight), 0 otherwise | int |
| channel | ID of a marketing channel | int |
| srch_ci | Check-in date | string |
| srch_co | Check-out date | string |
| srch_adults_cnt | Number of adults specified in the hotel room | int |
| srch_children_cnt | Number of (extra occupancy) children specified in the hotel room | int |
| srch_rm_cnt | Number of hotel rooms specified in the search | int |
| srch_destination_id | ID of the destination where the hotel search was performed | int |
| srch_destination_type_id | Type of destination | int |
| hotel_continent | Hotel continent | int |
| hotel_country | Hotel country | int |
| hotel_market | Hotel market | int |
| is_booking | 1 if a booking, 0 if a click | tinyint |
| cnt | Number of similar events in the context of the same user session | bigint |
| hotel_cluster | ID of a hotel cluster | int |
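
Several of these fields lend themselves to simple engineered features. The sketch below, assuming pandas, derives the length of stay and booking lead time from the date columns and flags null distances. The sample rows are invented, and the new column names (`stay_nights`, `lead_days`, `search_month`, `distance_missing`) are hypothetical.

```python
import pandas as pd

# Made-up rows; the last one has no check-in/check-out dates.
df = pd.DataFrame({
    "date_time": ["2014-08-11 07:46:59", "2014-08-11 08:22:12",
                  "2014-12-01 09:00:00"],
    "srch_ci": ["2014-08-27", "2014-08-29", None],
    "srch_co": ["2014-08-31", "2014-09-02", None],
    "orig_destination_distance": [2234.26, None, 913.19],
})

# The date fields arrive as strings; unparseable values become NaT.
for col in ["date_time", "srch_ci", "srch_co"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Nights between check-in and check-out.
df["stay_nights"] = (df["srch_co"] - df["srch_ci"]).dt.days
# Days between the search date and the check-in date.
df["lead_days"] = (df["srch_ci"] - df["date_time"].dt.normalize()).dt.days
# Month of the search, a rough proxy for seasonality.
df["search_month"] = df["date_time"].dt.month
# A null distance means it could not be calculated, so flag it explicitly.
df["distance_missing"] = df["orig_destination_distance"].isna().astype(int)

print(df[["stay_nights", "lead_days", "search_month", "distance_missing"]])
```

Rows with missing dates produce missing `stay_nights` and `lead_days`, which still need handling before modelling.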


Fields in destinations.csv:

| Column name | Description | Data type |
|---|---|---|
| srch_destination_id | ID of the destination where the hotel search was performed | int |
| d1-d149 | Latent description of search regions | double |

Similar Projects

In this data science project with Python, we will complete the analysis of what sorts of people were likely to survive. You will learn to use various machine learning tools to predict which passengers survived the tragedy.

In this machine learning project, you will build a model to predict how much a customer will spend on various products, which will help the company create personalized offers for customers on different products.

Using this Kaggle dataset, you will explore which types of employees earn more or less money, and which employees get normal pay hikes and promotions.

Curriculum For This Mini Project

Problem Statement
Loading the Data
Fixing Missing Values
Visualizations - Heat Map
Visualizations - Basic
Visualizations - Advanced
Feature Engineering
Visualizations on Engineered Features
Data Cleaning
Calculating Baseline Accuracy
Modelling - Random Forest
Hyperparameter Tuning
Modelling - Naive Bayes, Logistic Regression and KNN
Comparing All Algorithms
Solution and Conclusion
Modular Code walk-through
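
The hyperparameter tuning step in the curriculum could look something like the grid search below. This is a minimal sketch, assuming scikit-learn; the data is synthetic, and the parameter grid is an arbitrary small example rather than the grid used in the project videos.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the prepared features (assumption: not Expedia data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Small illustrative grid: 2 x 2 = 4 candidate settings.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

# 3-fold cross-validation scores every combination by accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

On the full dataset a grid this exhaustive can be slow. Tuning on a sample of the rows first is a common compromise.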