With more than 8 million users, 1 billion trips and 160,000+ people driving for Uber across 449 cities in 66 countries, Uber is the fastest-growing startup, standing at the top of its game. By tackling problems like poor transportation infrastructure in some cities, unsatisfactory customer experience, late cars, poor fulfilment and drivers refusing to accept credit cards, Uber has “eaten the world” in less than 5 years and is a name to reckon with when it comes to solving people's transportation problems.
If you have ever booked an Uber, you know how simple the process is: press a button, set the pickup location, request a car, go for a ride and pay with a click of a button. The process is simple, but there is a lot going on behind the scenes. The key driver of growth at the $51 billion start-up is the big data it collects and leverages for intelligent decision making. While Uber moves people around the world without owning any cars, data moves Uber. With the aim of building the most intelligent company on the planet by completely solving problems for riders, Big Data and Data Science are at the heart of everything Uber does: surge pricing, better cars, detecting fake rides, fake cards and fake ratings, estimating fares and rating drivers. Read on to understand how Uber makes clever use of big data and data science to reinvent transportation and logistics globally.
Big Data at Uber
“Uber lives or dies by data. Their overall mission and their sustainability is completely dependent on how good their data is. The more data they can collect, the more information they can derive from patterns and behaviours. Their ability to increase profits is all dependent on that,” said Spencer, a former Uber driver.
There is no need to look for a local taxi or to tip a bellman for the ride; you are just a click away from a high-quality customer experience with Uber’s revolutionary data-driven business model. Data is Uber's biggest asset, and its entire business model is based on the big data principle of crowdsourcing: anybody with a car who is willing to help someone get to a desired location can offer to take them there.
It is tricky to get sufficient detail on Uber’s big data infrastructure, but some interesting information is available. Uber’s data is collected in a Hadoop data lake, and it uses Spark and Hadoop to process the data. The data comes from a range of sources and data types, such as SOA database tables, schemaless data stores and the event messaging system Apache Kafka.
Uber is greedy about the data it collects, and with relatively cheap storage and processing options like Hadoop and Spark, it has data on every single GPS point for every trip taken on Uber. Uber stores historical information about its systems and capabilities to make data science easier for its data scientists down the road. Keeping change logs and versioning database schemas helps data scientists answer the questions at hand. With this data, they can answer questions such as what the Uber system looked like at a particular point in time from a customer perspective, from a supply behaviour perspective, from an inter-server communication perspective, or even down to the state of a database.
With a huge database of drivers, as soon as a user requests a car, Uber's algorithms match the user with the most suitable nearby driver within a 15-second window. Uber stores and analyses data on every single trip its users take, and leverages it to predict the demand for cars, set fares and allocate sufficient resources. The data science team at Uber also performs in-depth analysis of public transport networks across different cities, so that it can focus on cities with poor transportation and make the best use of the data to improve the customer experience.
In fact, Uber drivers continue to generate data even when they are not carrying passengers, because they transmit data back to Uber's central platform, where it is used to draw inferences about traffic patterns. The data is stored in a database for supply and demand analysis. Driver data is used for autonomous car research, surge pricing, tracking drivers' locations, monitoring drivers' speed, motion and acceleration, and identifying whether a driver is also working for a competing cab-sharing company.
Big data analysis spans diverse functions at Uber: machine learning, data science, marketing, fraud detection and more. Uber's data consists of information about trips, billing, the health of the infrastructure and other services behind its app. City operations teams use this data to calculate driver incentive payments and predict many other real-time events. Data streaming runs through a Hadoop Hive-based analytics platform that gives the right people and services the required data at the right time.
“Whether it’s calculating Uber’s ‘surge pricing’, helping drivers to avoid accidents, or finding the optimal positioning of cars to maximize profits, data is central to what Uber does. All these data problems…are really crystalized on this one math with people all over the world trying to get where they want to go. That’s made data extremely exciting here, it’s made engaging with Spark extremely exciting,” said Uber’s Head of Data, Aaron Schildkrout.
Data Science at Uber
Data science is an integral part of Uber’s products and philosophy. Uber does an exceptional job of hiring data-oriented people throughout the company through its exclusive Uber Analytics Test v3.1. Anyone applying for a job at Uber that requires analysing back-end extracts from the application has to take the Uber Analytics Test.
On the product front, Uber’s data team is behind all the predictive models powering the ride-sharing service, from predicting that “Your driver will be here in 3 minutes” to estimating fares and showing drivers surge prices and heat maps that indicate where to position themselves within the city. The business success of Uber depends on its ability to create a positive user experience through statistical data analysis. What makes Uber unique is that its data-science-driven insights don’t just stay within dashboards or company reports; they are put into practice in real time in its products to create a positive experience for customers and drivers.
Data Products at Uber - Surge Pricing
To create the most efficient market and maximize the number of rides it can provide, Uber uses surge pricing. Suppose you are running late and too stressed to take public transport: Uber could come to your rescue, but you soon notice that it will charge you 1.5 times the usual rate.
Sometimes when you try to book an Uber, what you thought would be a $10 ride turns out to cost 2, 3 or even 4 times more. This is due to the surge pricing algorithms that Uber runs behind the scenes. Data science is at the heart of Uber’s surge pricing algorithm, which asks: given a certain demand, what is the right price for a car under the current economic conditions? The king of ride sharing maintains the surge pricing algorithm to ensure that passengers always get a ride when they need one, even if it comes at the cost of an inflated price. Uber has even applied for a patent on big data informed pricing, i.e. surge pricing.
Most of the predictive models at Uber follow the business logic of how pricing decisions are made. For instance, Geosurge (the name for the surge pricing, or dynamic pricing, model at Uber) looks at the data available and then compares theoretical ideals with what is actually implemented in the real world. Uber’s surge pricing model is based on both geo-location and demand for rides, so that drivers are positioned efficiently. Data science methodologies are used extensively to analyse the short-term effects of surge pricing on customer demand and its long-term effects on customer retention. Uber relies on regression analysis to find out which neighbourhoods will be the busiest so that it can activate surge pricing to get more drivers on the road.
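To make the idea of pricing by location and demand concrete, here is a minimal sketch in Python. It is not Uber's Geosurge model, whose details are not public; it swaps in a simple demand-to-supply ratio rule for each neighbourhood. The function name, the example neighbourhoods and the 4x cap are all assumptions made for illustration.

```python
def surge_multiplier(open_requests, available_drivers, cap=4.0):
    """Return a price multiplier for one neighbourhood.

    Hypothetical rule: price scales with the ratio of ride requests to
    available drivers, never drops below 1.0 and never exceeds `cap`.
    """
    if open_requests <= 0:
        return 1.0  # nobody is asking for a ride, so no surge
    if available_drivers <= 0:
        return cap  # demand but no supply: charge the maximum
    ratio = open_requests / available_drivers
    return min(cap, round(max(1.0, ratio), 1))


# Made-up snapshot: neighbourhood -> (open requests, available drivers)
snapshot = {"downtown": (30, 20), "airport": (12, 15), "stadium": (90, 10)}
for name, (requests, drivers) in snapshot.items():
    print(name, surge_multiplier(requests, drivers))
```

A production system would presumably forecast demand ahead of time, for example with the regression analysis mentioned above, rather than react to a single snapshot.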
Uber recently announced that it is going to limit the use of surge pricing through machine learning. The machine learning algorithms will take multiple data inputs and predict where demand is going to be highest, so that Uber drivers can be redirected there. The aim is to close the gap between supply and demand before it opens, so that surge pricing does not have to be applied at all. Uber has not yet confirmed when this new system will be rolled out.
Matching Algorithms at Uber
Timing is everything at Uber. Given a pickup location, a drop-off location and the time of day, predictive models developed at Uber predict how long it will take a driver to cover the distance. Uber has sophisticated routing and matching algorithms that direct cars to people and people to places. From the time you open the Uber app until you reach your destination, Uber’s routing engine and matching algorithms are hard at work.
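As a rough illustration of predicting travel time from a pickup point, a drop-off point and the time of day, the sketch below uses the haversine (great-circle) distance and a made-up table of average speeds. Uber's real models are learned from trip data and are not described in public detail; the speeds, rush-hour windows and detour factor here are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def average_speed_kmh(hour):
    """Hypothetical city speeds: slower in the morning and evening rush."""
    rush_hour = 7 <= hour < 10 or 16 <= hour < 19
    return 18.0 if rush_hour else 30.0


def estimate_trip_minutes(pickup, dropoff, hour, detour_factor=1.3):
    """Estimate driving time; `detour_factor` inflates the straight-line
    distance to approximate real road distance (an assumed value)."""
    road_km = haversine_km(*pickup, *dropoff) * detour_factor
    return road_km / average_speed_kmh(hour) * 60


pickup, dropoff = (37.7749, -122.4194), (37.8044, -122.2712)
print(round(estimate_trip_minutes(pickup, dropoff, hour=8), 1), "min at 8:00")
print(round(estimate_trip_minutes(pickup, dropoff, hour=13), 1), "min at 13:00")
```

The same trip takes longer at 8:00 than at 13:00 only because of the assumed rush-hour speed; a learned model would pick up such patterns from historical trips instead.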
Uber follows a supplier pick map matching algorithm, in which the customer selects the variables associated with a service (in this case, through the Uber app) and a match is made by sending requests to the most suitable service providers. Any Uber ride request is first sent to the nearest available Uber driver (determined by comparing the customer's location with the driver's expected time of arrival). The driver then accepts or rejects the ride request. This matching algorithm works well for Uber since the transaction is highly commoditized, i.e. the number of variables that the customer has to decide on before a match is made is minimal.
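The request flow described above can be sketched in a few lines of Python: offer the ride to drivers in order of expected time of arrival and stop at the first one who accepts. This is a simplified illustration, not Uber's dispatch code; the `match_request` function, the driver names and the `accepts` callback are hypothetical.

```python
def match_request(driver_etas, accepts):
    """Offer a ride to drivers from shortest to longest ETA.

    driver_etas: dict mapping a driver id to its ETA in minutes.
    accepts: callable taking a driver id and returning True if that
             driver takes the ride (stands in for the driver's choice).
    Returns (driver_id, eta) for the first acceptance, or None.
    """
    for driver_id, eta in sorted(driver_etas.items(), key=lambda item: item[1]):
        if accepts(driver_id):
            return driver_id, eta
    return None  # every available driver rejected the request


etas = {"asha": 7.0, "ben": 3.0, "carla": 5.0}
# Ben is nearest but declines, so the request falls through to Carla.
print(match_request(etas, accepts=lambda driver_id: driver_id != "ben"))
```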
Uber uses a mixture of internal and external data to estimate fares. Uber calculates fares automatically using street traffic data, GPS data and its own algorithms that make alterations based on the time of the journey. It also analyses external data like public transport routes to plan various services.
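A fare estimate of this kind can be illustrated with a simple formula: a base charge plus per-kilometre and per-minute components, scaled by any surge multiplier. The rates, the minimum fare and the `estimate_fare` function below are invented for the example and are not Uber's actual tariff.

```python
def estimate_fare(distance_km, duration_min, surge=1.0,
                  base=2.0, per_km=1.1, per_min=0.25, minimum=5.0):
    """Estimate a fare from trip distance and duration.

    All rates are hypothetical. In this sketch the surge multiplier is
    applied to the metered amount, then the minimum fare is enforced.
    """
    metered = base + per_km * distance_km + per_min * duration_min
    return round(max(minimum, metered * surge), 2)


print(estimate_fare(5, 12))             # normal conditions
print(estimate_fare(5, 12, surge=1.5))  # the same trip during a surge
print(estimate_fare(0.5, 2))            # short hop, minimum fare applies
```

The duration input is where traffic and time of day would come in: a longer predicted trip time raises the per-minute component even when the distance is unchanged.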
Uber Data Science Tools
Python is the go-to data science programming language at Uber and is used extensively by the Uber data team. Commonly used third-party modules include NumPy, SciPy, Matplotlib and Pandas. The data team does occasionally use R, Octave or Matlab for prototypes or one-off data science projects, but not in the production stack. D3 is the preferred data visualization tool at Uber, and Postgres the preferred SQL database.
What can you expect in future from Uber’s data driven methodologies?
With initiatives like UberFresh for grocery deliveries, UberRush for package courier service and UberChopper offering helicopter rides to the wealthy, Uber is all set to revolutionize private transportation globally. Uber knows the popular nightclubs in the city and the best-in-class restaurants, and has data about traffic patterns across different regions. Uber’s data would soon be combined with customer-specific personal data in exchange for benefits, making Uber the big Big Data company. Soon, citizens would not mind sharing their SSN with Uber if it used their data to book a restaurant with good live music for a romantic dinner date on Valentine’s Day and arranged a pick-up for them and their spouse in a luxury car.
So the next time you take an Uber ride, do think of the data science going on behind the scenes. The quality of service you are enjoying is due to the big data being analysed and the data science being applied to create a better riding experience for you.