Reader's Choice: The topic for this article has been recommended by one of our Blog subscribers.
PayPal by the numbers:
- 165 million active users as of August 2015
- More than 10 million logins every day
- 13 million transactions
- More than 1.1 PB of data processed
- $250 billion worth of payments processed every year
- 12.5 million worth of payments processed every day across 203 markets
- Funds held in 26 different currencies
- An average of 23 transactions per PayPal account
PayPal is ranked among the top 5 payment companies by market capitalization and is becoming the go-to place for processing payment transactions on the Web. For PayPal, the leading payment network, big data is an asset that feeds directly into business strategy. Big data analytics and data science are at the heart of all this processing at the 17-year-old company. PayPal owes its growing market share to the powerful data technology that drives its innovation and overall business strategy. Here is a look at how PayPal uses big data analytics and data science techniques to enrich customer experiences every day.
PayPal makes shopping comfortable by processing payments for auction websites and vendors in the cloud, safely and securely. For PayPal the real consumers are the merchants, and every customer of a merchant is indirectly a consumer of PayPal. PayPal provides advanced predictive capabilities to help its merchants improve their customer experience.
Suppose Walmart is running a Thanksgiving sale and wants to make sure its marketing dollars are spent effectively by advertising more relevant and appealing offers to its consumers. PayPal’s data science team can help build the target list and reach the right people, thanks to the one component that is the secret to its success: transactional data. Transactional data is the strongest factor data scientists have for predicting people’s buying behaviour. PayPal also has online data, such as how many people looked at a product and which websites they visited, but transactional data remains the strongest indicator of customer behaviour at PayPal.
How PayPal Uses Hadoop
Before the advent of Hadoop, PayPal simply let much of its data go, because traditional databases could not easily accommodate every schema type. Now PayPal processes all of it through Hadoop and HBase, regardless of the data format. PayPal is integrating its traditional databases tightly with Hadoop to become a better service provider for its customers.
Hadoop coexists with traditional data platforms at PayPal to meet business requirements such as customer sentiment analysis, fraud detection and market segmentation. This coexistence lets data scientists run exploratory queries for hypothesis testing and research on the data stored in Hadoop, while BI analysts answer their reporting questions using in-memory systems like SAP HANA. PayPal has also extended its Hadoop usage to HBase, which uses HDFS as its storage layer for reading and writing large unstructured datasets.
PayPal uses Hadoop as a complementary and cost-effective data platform for handling exponentially growing volumes and varieties of data. Because Hadoop lacks advanced security measures, PayPal enforces strong governance and security policies by anonymizing all data before it is stored in Hadoop.
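As a rough illustration of masking data before it is stored, the sketch below replaces sensitive fields with keyed hashes. PayPal's actual method is not described in this article; the field names, the key and the helper functions here are all hypothetical, and keyed hashing is strictly a form of pseudonymization rather than full anonymization.

```python
import hashlib
import hmac

# Hypothetical secret key; a real system would fetch it from a key manager.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical set of fields treated as personally identifying.
SENSITIVE_FIELDS = {"email", "account_id"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a sensitive value with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict) -> dict:
    """Return a copy of the record with sensitive fields replaced by tokens."""
    return {
        field: pseudonymize(str(value)) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }


raw = {"email": "alice@example.com", "account_id": "A-1001", "amount": 42.5}
safe = anonymize_record(raw)
```

Because the same input always maps to the same token, records belonging to one customer can still be joined for analysis without exposing the underlying identifier.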
At PayPal, raw clickstream data first goes through a cleaning phase in Hadoop. PayPal uses semi-structured data in Hadoop for predetermined business intelligence and big data analytics projects and stores it in the cloud so that PayPal employees across the globe can access it. It collects more than 20 terabytes of log data every day for sentiment analysis, event analytics, customer segmentation, recommendation engines and real-time location-based offers.
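To make the idea of a cleaning phase concrete, here is a minimal sketch that drops malformed lines and exact duplicates from a clickstream log. The tab-separated layout and field names are assumptions for illustration, not PayPal's actual log format, and a real Hadoop job would do this work in a distributed fashion rather than in a single process.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical tab-separated log layout: timestamp, session id, URL.
FIELDS = ("timestamp", "session_id", "url")


def parse_event(line: str) -> Optional[dict]:
    """Parse one raw log line; return None when it is malformed."""
    parts = [part.strip() for part in line.split("\t")]
    if len(parts) != len(FIELDS) or not all(parts):
        return None
    return dict(zip(FIELDS, parts))


def clean_stream(lines: Iterable[str]) -> Iterator[dict]:
    """Yield well-formed events, skipping malformed lines and exact repeats."""
    seen = set()
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        key = tuple(event[name] for name in FIELDS)
        if key in seen:
            continue
        seen.add(key)
        yield event


raw_lines = [
    "2015-08-01T10:00:00\ts1\t/checkout\n",
    "2015-08-01T10:00:00\ts1\t/checkout\n",  # exact duplicate
    "broken line without tabs\n",            # malformed
    "2015-08-01T10:00:05\ts2\t/cart\n",
]
cleaned = list(clean_stream(raw_lines))
```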
PayPal’s data mining systems are built on machine learning algorithms that are written in Java and Python and run on top of Hadoop to mine complex data models for valuable insights.
Data Science – Helping Fight Fraud at PayPal
With a 17-year track record of secure online payments, PayPal is continually improving its data infrastructure to identify potential cases of fraud. PayPal is building new fraud analytics systems and modifying existing ones by incorporating open source technologies like Hadoop and Spark and by combining machine learning algorithms, online caching and human detectives. Once the machine learning models identify a possible fraud, human detectives get to work to find out what is real and what is not.
“Many times commercial software doesn’t meet our needs completely, so, in this case, open source really comes in handy. We are able to take them and do all kinds of adjustments ourselves. That really unleashed the power of our data scientists,” said Hui Wang, PayPal’s senior director of global risk sciences.
PayPal uses 3 types of machine learning algorithms: neural networks, deep learning and linear regression. Risk management has to be extremely fast, and the algorithms must decide within milliseconds whether a transaction is legitimate or an attempt at fraud. If a customer is identified as good and trustworthy, the transaction is put into an express lane to keep the experience smooth. If the algorithm flags a transaction as bad, the system slows down to acquire additional data and perform in-depth analysis.
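The express-lane idea can be sketched as a simple routing step applied to a model's risk score. The threshold value, lane names and function below are hypothetical; the article does not say how PayPal's models or cut-offs are actually configured.

```python
EXPRESS_THRESHOLD = 0.2  # hypothetical cut-off, not a PayPal value


def route_transaction(risk_score: float, threshold: float = EXPRESS_THRESHOLD) -> str:
    """Map a model's risk score in [0, 1] to a processing lane."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score < threshold:
        return "express"       # low risk: approve without extra friction
    return "in_depth_review"   # higher risk: gather more data before deciding


lanes = [route_transaction(score) for score in (0.05, 0.2, 0.9)]
```

Keeping the routing rule separate from the scoring model means the threshold can be tuned without retraining anything.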
The data science team at PayPal analyses historical payment data to find features that indicate an attempted scam. Different types of machine learning algorithms analyse thousands of data points in real time, such as buying history, recent activity on the merchant’s website or the PayPal site, and data stored in cookies. For some of the machine learning models, 300 variables are calculated per event to find a potentially fraudulent transaction. The results of the analysis are compared with external data provided by authentication providers. For example, if the analysis shows multiple IP addresses from different locations across the globe for a single account, the account has probably been hacked, and it is flagged for review by human experts.
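The multiple-location signal described above can be illustrated with a small sketch. It assumes each login has already been resolved from an IP address to a country by some geolocation lookup, which is not shown; the function name and the limit of two countries are hypothetical choices.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def flag_accounts(logins: Iterable[Tuple[str, str]], max_countries: int = 2) -> Set[str]:
    """Return accounts seen logging in from more than `max_countries` countries.

    `logins` holds (account_id, country) pairs; the IP-to-country
    resolution is assumed to have happened upstream.
    """
    countries_by_account = defaultdict(set)
    for account_id, country in logins:
        countries_by_account[account_id].add(country)
    return {
        account_id
        for account_id, countries in countries_by_account.items()
        if len(countries) > max_countries
    }


logins = [
    ("acct-1", "US"), ("acct-1", "US"),
    ("acct-2", "US"), ("acct-2", "BR"), ("acct-2", "VN"),
]
flagged = flag_accounts(logins)
```

In this toy data only `acct-2` is flagged, since it appears in three countries; a flagged account would then go to human reviewers rather than being blocked outright.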
Fraud detection is the biggest big data use case for graph processing at PayPal. Data scientists build a graph whose nodes relate to the devices customers use to log in to merchant accounts. If a customer uses a different IP address or a different mobile account, PayPal ensures that money cannot be withdrawn from that account. Fraud rarely shows up at a single node in the graph; when 3 or 4 nodes try to withdraw and deposit money at the same time, all of them are captured.
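A minimal sketch of this kind of graph analysis follows. It assumes a graph with two kinds of nodes, accounts and devices, and uses breadth-first search to find every account linked to a starting account through shared devices. The node layout and function name are illustrative assumptions, not a description of PayPal's graph system.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def linked_accounts(edges: Iterable[Tuple[str, str]], start: str) -> Set[str]:
    """Return all accounts reachable from `start` through shared devices.

    `edges` holds (account, device) pairs, one per observed login.
    """
    graph = defaultdict(set)
    for account, device in edges:
        a, d = ("account", account), ("device", device)
        graph[a].add(d)
        graph[d].add(a)

    first = ("account", start)
    seen = {first}
    queue = deque([first])
    while queue:  # breadth-first search over the account/device graph
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return {name for kind, name in seen if kind == "account"}


edges = [("a1", "d1"), ("a2", "d1"), ("a2", "d2"), ("a3", "d2"), ("a4", "d9")]
ring = linked_accounts(edges, "a1")
```

Here `a1`, `a2` and `a3` end up in one group because they are chained together by devices `d1` and `d2`, while `a4` stays isolated.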
Using Advanced Big Data Analytics to Deliver Relevant Offers and Personalized Ads
PayPal addresses multi-channel shopping on tablets, smartphones, in stores and on websites by enticing customers with location-based advertisements and offers. Because customers today shop in multiple ways (mobile, website and in-store), it is difficult for marketers and advertisers to decide where personalized ads and relevant offers are best placed. PayPal is leveraging big data to send relevant, customized offers and discounts from merchants to customers. The analytic algorithms use past purchase history, broken down by shopping medium (online or in-app), to recommend offers that help customers save money and drive higher transaction volumes for merchants. PayPal’s big data analytics tie together customer preferences and tastes, location, purchase history and user activity across various sites to send relevant offers and discounts along with personalized ads.
PayPal uses data from similar customers to predict the buying behaviour of its customers. The data models look for similar places customers visit. PayPal knows that customers who shop at Home Depot are likely to eat at Subway. Using this analytic insight, PayPal offers discounts on sandwiches at Subway outlets near Home Depot locations.
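One common way to quantify an association like the Home Depot and Subway example is lift, which compares how often two merchants appear together with how often they would if they were independent. The sketch below uses made-up data and a hypothetical `lift` helper; the article does not say which measure PayPal's models use.

```python
from typing import Iterable, Set


def lift(customers: Iterable[Set[str]], a: str, b: str) -> float:
    """Lift of merchants `a` and `b` over customers' sets of visited merchants.

    A value above 1 suggests customers of `a` visit `b` more than chance.
    """
    visits = list(customers)
    total = len(visits)
    count_a = sum(1 for merchants in visits if a in merchants)
    count_b = sum(1 for merchants in visits if b in merchants)
    count_both = sum(1 for merchants in visits if a in merchants and b in merchants)
    if total == 0 or count_a == 0 or count_b == 0:
        return 0.0
    return (count_both / total) / ((count_a / total) * (count_b / total))


# Made-up data: each set lists the merchants one customer paid.
customers = [
    {"Home Depot", "Subway"},
    {"Home Depot", "Subway"},
    {"Bookstore"},
    {"Bookstore"},
]
score = lift(customers, "Home Depot", "Subway")
```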
Transactional data is used to create customer genome sequences, which identify look-alikes and segment customers into groups that provide strong signals for personalization, targeted advertising and recommendations. Predictive data models at PayPal predict with 69% accuracy where customers are likely to spend money.
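The article does not explain how the customer genome sequences are built, so the following is only a stand-in: it represents each customer as a vector of spend per category and ranks look-alikes by cosine similarity. The category names, profiles and helper functions are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # hypothetical: spend per merchant category


def cosine_similarity(u: Profile, v: Profile) -> float:
    """Cosine similarity between two sparse spend vectors."""
    dot = sum(u.get(key, 0.0) * v.get(key, 0.0) for key in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def look_alikes(profiles: Dict[str, Profile], target: str, top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank other customers by similarity to `target`, most similar first."""
    scores = [
        (other, cosine_similarity(profiles[target], vector))
        for other, vector in profiles.items()
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


profiles = {
    "c1": {"hardware": 3.0, "food": 1.0},
    "c2": {"hardware": 6.0, "food": 2.0},
    "c3": {"travel": 5.0},
}
nearest = look_alikes(profiles, "c1", top_n=1)
```

Cosine similarity ignores overall spend volume, so `c2` counts as a close look-alike of `c1` even though it spends twice as much in each category.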
Enriching Customer Experience through Hadoop-Based Text Mining
Natural language processing algorithms work behind the scenes of PayPal’s success in enriching customer experience. Textual data alone cannot provide business insights, but combined with other data it helps extract meaning from the conversations people have online and from the transactions they make. Hadoop-based text mining is an integral component of predictive modelling, clustering, influence scoring, topic modelling and other data science activities at PayPal.
When a customer makes a purchase from a particular merchant, it is hard to tell whether the customer likes the product itself or loves the product’s brand. Hadoop-based text mining on product information helps data scientists understand whether a customer likes a particular brand, and they then use this information to make recommendations.
PayPal is making extensive use of customers’ transactional data and search data to raise its predictive capabilities to surprising levels. If you have come across any other interesting big data use cases at PayPal or other leading payment companies, share them with us in the comments below!