AirBnB is one of the fastest-growing companies in the startup space. It took the top spot in Glassdoor’s “Best Places to Work” report for 2016, and demand for data science skills is rising on all sides of the organization, from product to finance to operations, so AirBnB is definitely a name to reckon with in the data science domain. The secret behind AirBnB’s business growth is cultivating trust. Data science is at the core of identifying the drivers of trust, engaging more users and finding novel ways to elevate trust. It is also the key differentiator behind AirBnB’s rapid growth, because it lets the company make better recommendations by matching the right people together. AirBnB data scientists have been at the forefront of developing unique data products and modifying existing open source technologies to suit their needs perfectly.
AirBnB matches people who are looking for accommodation in a particular city (guests) with people who are willing to rent out their place (hosts). Guests connect with hosts through the listings they would like to stay in. A match counts as successful on AirBnB only if the host is willing to accommodate the guest. With over 10 million nights booked, and more than 25 million people across 192 countries and 34,000 cities having used its services, AirBnB has seen its revenue climb steeply and reached a valuation of $25.5 billion as of June 2015.
With 20 TB of data created daily and 1.4 petabytes of archived data, data has become the lifeblood of the business at AirBnB. AirBnB serves approximately 10 million requests a day and processes one million search queries. Data is the voice of the customer at AirBnB and is used to offer personalized services by matching guests and hosts as closely as possible. AirBnB uses host-guest interactions, current events and local market history to provide real-time recommendations, which travellers can accept or reject.
“Data is the voice of your customer. Data is effectively a record of an action someone in your community performed, which represents a decision they made about what to do (or not) with your product. Data scientists can translate those decisions to stories that others can understand.” - Riley Newman, head of the data science team at AirBnB
For a completely online organization like AirBnB, data analytics plays a vital role in providing best-in-class customized services to customers. AirBnB relies on scalable, flexible big data tools and data science techniques to sustain its growth. The data science team at AirBnB believes in using data-driven insights to influence decisions and to verify that those decisions have the intended effect on customers.
Data science at AirBnB helps prioritize product decisions and is the secret behind the tremendous growth of this startup. AirBnB data scientists act as loudhailers that amplify the voice of the customer: they predict what customers want from interaction logs and turn those predictions into actionable decisions for the product, customer support and marketing teams. AirBnB uses several data science techniques to learn more about its users:
A/B testing is a common data science method for finding the best product or market fit. With A/B testing, a data science team tests various designs or configurations of a website or product to understand how users respond to them. The data science team at AirBnB uses A/B testing by exposing users of the website to different recommendation and ranking algorithms. User behaviour is then correlated with the ratings or reviews that users actually leave, which helps the team test the effectiveness of the algorithms. The main objective of A/B testing at AirBnB is to find out whether a change does a better job of matching the right people together.
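As a rough illustration of how the outcome of such a test might be evaluated, the sketch below compares the booking rates of two ranking variants with a two-proportion z-test. The function name, the visitor counts and the choice of test are assumptions made for this example; the article does not say which statistical test AirBnB uses.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: variant A is the current ranking, variant B the new one.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the difference in booking rates between the two variants is unlikely to be due to chance alone.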
Photos serve as the initial contact between AirBnB and its users. Guests are likely to decide whether to go with a particular listing based on what catches their eye. AirBnB analyses photos to find out which ones work best for its users, which features make a photo most sought after and which kinds of photos on the website get the most clicks. AirBnB is still in the initial stages of using machine learning for photo analysis. The aim is to create a feedback loop that helps hosts get best-in-class photos for their listings. The algorithm is expected to automatically recommend AirBnB’s free professional photography service, which connects hosts on AirBnB with professional photographers nearby.
At AirBnB, the host and the guest meet in real life, which sometimes pressures them into leaving better reviews even if the experience was only just satisfactory. Such reviews portray a falsely positive image of the host and the guest, and star ratings are usually inflated. To interpret the true feelings of users, AirBnB uses natural language processing to run sentiment analysis on the review boards and message boards. This helps AirBnB understand the true feeling behind the reviews.
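To make the idea concrete, here is a deliberately tiny lexicon-based sentiment scorer. It is only a sketch: the word lists, the function name and the sample review are invented for illustration, and a production system would use a far richer model than word counting.

```python
import re

# Toy word lists, assumed for this example only.
POSITIVE = {"great", "clean", "lovely", "friendly", "wonderful", "perfect"}
NEGATIVE = {"dirty", "noisy", "rude", "broken", "smelly", "disappointing"}

def sentiment_score(text):
    """Return a score in [-1, 1]; 0.0 when no lexicon word is present."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

# A hypothetical review that might still carry a high star rating.
review = "The host was friendly, but the room was noisy and the shower was broken."
print(sentiment_score(review))
```

A negative score on a review that came with a high star rating is the kind of mismatch this analysis is meant to surface.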
Predictive modelling is an interesting side of data science at AirBnB. It is used to analyse how various markets will perform so that resources can be prioritized. Using predictive modelling, AirBnB can create market-specific forecasts with multiple variables. AirBnB has a dedicated forecasting and reporting team that works to optimize the existing predictive models. Data mining at AirBnB also helps hosts predict the best possible rates for their rentals.
AirBnB uses regression analysis to find out which features of a listing have a major impact on the bookings made. Regression analysis helped AirBnB figure out that the quality of visuals plays a vital role in bookings. To improve the quality of visuals, AirBnB started offering free professional photography to hosts, which led to a definite rise in revenue.
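The simplest form of this analysis is a least-squares fit of bookings against a single listing feature. The sketch below fits such a line in plain Python. The photo-quality scores and booking counts are made-up numbers, and a real analysis would include many features at once.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: photo quality score (1 to 5) and monthly bookings.
photo_quality = [1, 2, 3, 4, 5]
bookings = [2, 5, 6, 9, 11]

slope, intercept = fit_line(photo_quality, bookings)
print(f"each extra quality point is associated with {slope:.1f} more bookings")
```

The slope estimates how strongly the feature is associated with bookings, which is how a feature such as visual quality can be singled out as influential.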
The AirBnB data science team uses collaborative filtering techniques to model host preferences. In this framing the hosts are the users and the trips are the items, and a host’s preference for a trip is estimated by combining that host’s historical ratings with what is statistically learned from related hosts. However, the collaborative filtering framework alone did not completely fit the model for host preferences. The data scientists used the multiple responses to guest-host interactions for the same trip to cut down the noise coming from the latent factors.
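A minimal user-based collaborative filtering sketch, with hosts as users and trip types as items, could look like the following. The host names, trip types and acceptance rates are hypothetical, and cosine similarity is just one common choice of similarity measure; this is not AirBnB’s actual model.

```python
import math

# Hypothetical acceptance rates per host and trip type (0 = never, 1 = always).
ratings = {
    "host_a": {"weekend": 0.9, "week_long": 0.2, "last_minute": 0.8},
    "host_b": {"weekend": 0.8, "week_long": 0.1},
    "host_c": {"weekend": 0.1, "week_long": 0.9, "last_minute": 0.2},
}

def cosine(u, v):
    """Cosine similarity over the trip types that both hosts have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(u[k] ** 2 for k in shared))
    norm_v = math.sqrt(sum(v[k] ** 2 for k in shared))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(host, item):
    """Similarity-weighted average of other hosts' ratings for the item."""
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == host or item not in prefs:
            continue
        sim = cosine(ratings[host], prefs)
        num += sim * prefs[item]
        den += sim
    return num / den if den else None

prediction = predict("host_b", "last_minute")
print(round(prediction, 2))
```

Because host_b behaves much more like host_a than like host_c, the prediction lands close to host_a’s rate for last-minute trips.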
AirBnB is a big user of Hadoop: all the unstructured information about rooms, room owners and room locations is sorted and analysed using this open source framework. The Apache Hive data warehouse runs on top of Hadoop and holds 1.5 petabytes of data. The analysis tools are not limited to the data team; the marketing team and other employees use them as well, which adds to the number of Hadoop jobs AirBnB runs regularly.
AirBnB processes approximately 6,000 Hadoop tasks daily. Using Hadoop alone made it difficult to maintain the order of tasks and coordinate their results, which led AirBnB to develop its own Hadoop workflow system, known as Airflow. Airflow is open source and is already in use at five companies. It is a tool built by data engineers for data people, and it focuses mainly on authoring and monitoring data pipelines.
Airflow is easy to install and exposes a Python interface that lets users define new classes of data, specify the commands that manage those classes, and write “for” loops or any other Python statements that involve repetition.
Airflow is used for the batch processing side of Hadoop, when there are several jobs to be executed. This Hadoop workflow system ensures that resources are assigned correctly, that jobs run in the right order and that completed jobs are not accidentally run again. Airflow also monitors the progress of jobs and passes the results on to various business processes. It can show how many Hadoop jobs are running, which resources those jobs are using, how many jobs have completed and how many failed jobs have disrupted the multi-job workflow.
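The core scheduling idea, running each job exactly once and only after the jobs it depends on, can be sketched with Python’s standard library (Python 3.9 or later). This is not Airflow’s API; the job names and the `run_all` helper are hypothetical and only illustrate dependency ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs; each job maps to the set of jobs it depends on.
dependencies = {
    "load_search_logs": set(),
    "load_bookings": set(),
    "join_search_and_bookings": {"load_search_logs", "load_bookings"},
    "train_ranking_model": {"join_search_and_bookings"},
    "publish_report": {"join_search_and_bookings"},
}

def run_all(deps):
    """Visit every job once, in an order that respects its dependencies."""
    completed = []
    for job in TopologicalSorter(deps).static_order():
        # A real scheduler would launch and monitor the job here.
        completed.append(job)
    return completed

order = run_all(dependencies)
print(order)
```

A workflow system adds retries, resource allocation and monitoring on top of this ordering, but the dependency graph is what guarantees that the join never starts before both loads have finished.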
AirBnB’s matching of hosts and guests is driven by effective search, so careful tuning of the search engine is important for driving growth and delighting customers. In the early days, AirBnB did not have enough data to guide its customers, so search simply returned the highest-quality listings near the location the user searched for.
As the number of users grew, AirBnB acquired more data and replaced its initial search model with one driven by user data. The new model was built from the huge dataset of host and guest interactions and is based on the estimated conditional probability of a booking in a particular location, given where the person searched.
A search for accommodation in San Francisco will therefore also steer the model towards the neighbourhoods where a person who runs that search is likely to make a booking, for example Lower Haight or the Mission District.
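The conditional probability described above can be estimated by simple counting: for each search query, count how often each location was ultimately booked. The sketch below does this on a handful of invented (search, booking) pairs. The data and the function name are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical (searched location, booked neighbourhood) pairs.
history = [
    ("San Francisco", "Mission District"),
    ("San Francisco", "Mission District"),
    ("San Francisco", "Lower Haight"),
    ("San Francisco", "Castro"),
    ("Los Angeles", "Venice"),
]

def booking_probabilities(pairs):
    """Estimate P(booked location | searched location) from raw counts."""
    counts = defaultdict(Counter)
    for searched, booked in pairs:
        counts[searched][booked] += 1
    return {
        searched: {loc: n / sum(c.values()) for loc, n in c.items()}
        for searched, c in counts.items()
    }

probs = booking_probabilities(history)
# Rank neighbourhoods for a San Francisco search by estimated probability.
ranked = sorted(probs["San Francisco"].items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Search results can then favour listings in the neighbourhoods with the highest estimated probability for that query.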
The search model driven by user data led to more bookings and higher customer satisfaction. AirBnB succeeded in delivering a better product to its customers by tapping into big data technologies.
AirBnB’s price tip feature is a continuously updated guide that tells hosts how likely they are to get a booking at the price they have chosen. Hosts can look at the calendar and see on which dates they are likely to be booked at the current price (highlighted in green) and on which dates they are not (highlighted in red). If a host prices a listing within 5% of the price suggested by the price tip feature, the probability of getting a booking increases fourfold.
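The 5% rule is simple arithmetic, sketched below. The function name and the prices are hypothetical, and the check treats “within 5%” as a band relative to the suggested price, which is an assumption about how the rule is defined.

```python
def within_suggested_range(host_price, suggested_price, tolerance=0.05):
    """True if host_price is within the tolerance band around suggested_price."""
    return abs(host_price - suggested_price) <= tolerance * suggested_price

# With a suggested price of 100, the band runs roughly from 95 to 105.
print(within_suggested_range(104, 100))
print(within_suggested_range(110, 100))
```

In this example a price of 104 falls inside the band and a price of 110 falls outside it.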
AirBnB’s price recommendation engine draws on approximately 5 billion training data points. The model is designed to pull together everything that AirBnB’s huge data set can reveal about the best price for a listing, depending on factors such as the size of the listing and the neighbourhood. Aerosolve, an open source machine learning package built by the AirBnB data science team, is the secret behind AirBnB’s price tips for hosts. The package helps AirBnB find more relationships between prices and host listings.
AirBnB is also driving growth by tailoring the experience to different demographics. In 2014, AirBnB found that visitors from particular Asian countries had a higher bounce rate on the home page, and most of them left the website without making a booking. Further data analysis showed that these users were distracted by the “Neighbourhood” link and its photos, and would never return to make a booking after going through the photos.
The data science team at AirBnB redesigned the algorithm and removed the “Neighbourhood” link for visitors from Asian countries. Instead, the page listed top travel destinations in Singapore, China, Japan and Korea. The result was astonishing: conversion rates for Asian visitors increased by 10%.
AirBnB has taught some valuable lessons when it comes to considering big data as the voice of customers. The takeaway from the success of AirBnB for any company is to-
We would love to hear your thoughts on any other company that uses Big Data to increase their profitability and make data driven business decisions to the extent of AirBnB.