Data Science Project: Movie Review Sentiment Analysis using R

Learn to classify the sentiment of sentences from the Rotten Tomatoes dataset. You will be asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Arvind Sodhi

VP - Data Architect, CDO at Deutsche Bank

I have extensive experience in data management and data processing. Over the past few years I saw the data management technology transition into the Big Data ecosystem and I needed to follow suit. I...

Camille St. Omer

Artificial Intelligence Researcher, Quora 'Most Viewed Writer' in 'Data Mining'

I came to the platform with no experience and now I am knowledgeable in Machine Learning with Python. No easy thing I must say, the sessions are challenging and go to the depths. I looked at graduate...

What will you learn

Understanding the problem statement
Importing the dataset and unzipping a zipped file
Loading all the necessary libraries for NLP
What is a Bag-of-Words model
Tokenization, N-grams and Splitting
Difference between Stemming and Lemmatization
What is part-of-speech tagging
VNegterms, Negterms, and VPOSterms
Installing packages for Naive Bayes and SVM
What is a sparse matrix and what are its applications
Random Sampling
Converting words to vectors for prediction
Applying Naive Bayes to train a model and make predictions
Applying SVM to train a model and make predictions
Making predictions for the test dataset
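
Several of the topics above (tokenization, n-grams, the bag-of-words model, sparse vectors, and Naive Bayes) fit together in a small pipeline. The sketch below is only an illustration of that pipeline and is not the project's own solution: the project itself is taught in R, while this example uses plain Python so that it runs without extra packages. The function and class names (tokenize, ngrams, bag_of_words, NaiveBayes) and the four training phrases are made up for the example, and only two of the five sentiment labels are used to keep it short.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, max_n=2):
    """Sparse bag-of-words vector: only n-grams that occur are stored."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # feature counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            bow = bag_of_words(text)
            self.feature_counts[label].update(bow)
            self.vocab.update(bow)
        return self

    def predict(self, text):
        bow = bag_of_words(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for feature, freq in bow.items():
                if feature in self.vocab:  # features never seen in training are skipped
                    score += freq * math.log((counts[feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up phrases; the real dataset uses five sentiment labels.
train_texts = [
    "a wonderful and moving film",
    "wonderful performances",
    "a dull and boring film",
    "boring plot",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("a wonderful film")
```

Storing each phrase as a Counter keeps only the n-grams that actually occur, which is the same idea as the sparse matrix mentioned in the list: most phrases use a tiny fraction of the vocabulary, so the zeros are never stored.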

Project Description

With the increasing use of social media such as Twitter and review websites like Yelp and Rotten Tomatoes, it has become important to glean insights from huge amounts of subjective, opinionated data. The Rotten Tomatoes movie review dataset is a corpus of movie reviews used for sentiment analysis, originally collected by Pang and Lee. In their work on sentiment treebanks, Socher et al. used Amazon's Mechanical Turk to create fine-grained labels for all parsed phrases in the corpus. In this project you will benchmark your sentiment-analysis ideas on the Rotten Tomatoes dataset by labeling phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this data science project challenging.
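
To see why sentence negation is an obstacle, note that a plain word-count representation gives "good" and "not good" the same feature "good". One common workaround, shown here as a hedged sketch and not as the method this project teaches, is to prefix every token that follows a negation word with a marker until the next punctuation mark. The NEGATORS list, the NOT_ prefix, and the mark_negation function are assumptions made for this example, written in plain Python for illustration only.

```python
import re

# Hypothetical, deliberately short list of negation words.
NEGATORS = {"not", "no", "never"}
PUNCTUATION = set(".,!?;")


def mark_negation(text):
    """Prefix tokens that follow a negation word with NOT_ until punctuation."""
    marked, negating = [], False
    for token in re.findall(r"[a-z']+|[.,!?;]", text.lower()):
        if token in PUNCTUATION:
            negating = False  # punctuation ends the negation scope
        elif token in NEGATORS or token.endswith("n't"):
            negating = True
            marked.append(token)
        else:
            marked.append("NOT_" + token if negating else token)
    return marked


marked_tokens = mark_negation("This film is not good, but the cast is great.")
```

After marking, "good" inside the negated clause becomes the separate feature "NOT_good", so a classifier can learn a different weight for it. Sarcasm and ambiguity are not addressed by this trick.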

Similar Projects

In this machine learning project, we will predict which coupons a customer will buy.

In this machine learning project, you will build predictive models to identify people's wine preferences using physicochemical properties of wines and help restaurants recommend the right quality of wine to a customer.

Given a partial trajectory of a taxi, you will be asked to predict its final destination using the taxi trajectory dataset.

Curriculum For This Mini Project

30-Jan-2016
05h 45m