Data Science Project-Movie Review Sentiment Analysis using R

Learn to classify the sentiment of sentences from the Rotten Tomatoes dataset. You will be asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

Customer Love

Hiren Ahir

Microsoft Azure SQL Server Developer, BI Developer

I'm a Graduate student and came into the job market and found a university degree wasn't sufficient to get a good paying job. I aimed at hottest technology in the market Big Data but the word BigData...

Sujit Singh

Data Engineer, SullivanCotter

This has been a motivating experience. This has helped me execute Pig Latin and Hive commands to solve data problems. They take special care in regards to answering any questions and doubts I had...

What will you learn

Understanding the problem statement
Importing the dataset and unzipping a zipped file
Loading all the necessary libraries for NLP
What is a Bag of Words model
Tokenization, N-grams and Splitting
Difference between Stemming and Lemmatization
What is part of speech tagging
VNegterms, Negterms, and VPOSterms
Installing packages for Naive Bayes and SVM
What is a sparse matrix and its applications
Random Sampling
Converting from Word to Vector for prediction
Applying Naive Bayes for training model and making predictions
Applying SVM for training model and making predictions
Making predictions for test dataset
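
Several of the topics above (tokenization, n-grams, the Bag of Words model, and sparse representations) fit in a few lines of code. The project itself is taught in R; the sketch below uses plain Python only to illustrate the ideas. The function names and the two example phrases are made up for illustration and do not come from the course material or the dataset.

```python
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens, vocabulary):
    """Map tokens to a sparse {column_index: count} dict, skipping unknown words."""
    counts = Counter(tokens)
    return {vocabulary[w]: c for w, c in counts.items() if w in vocabulary}

# Made-up phrases for illustration only; not taken from the dataset.
phrases = ["a good movie", "not a good movie"]
tokenized = [tokenize(p) for p in phrases]

# Vocabulary: each distinct word gets a column index, in sorted order.
all_words = sorted({t for toks in tokenized for t in toks})
vocabulary = {w: i for i, w in enumerate(all_words)}

# One sparse vector per phrase: only non-zero counts are stored.
vectors = [bag_of_words(toks, vocabulary) for toks in tokenized]

# Bigrams keep some word order, e.g. the pair ("not", "a").
bigrams = ngrams(tokenized[1], 2)
```

Storing only the non-zero counts is what makes the representation sparse: a real vocabulary has thousands of columns, but each phrase uses only a handful of them.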

Project Description

With the increasing use of social media such as Twitter and review websites like Yelp and Rotten Tomatoes, it has become important to glean insights from the huge amounts of subjective, opinionated data. The Rotten Tomatoes movie review dataset is a corpus of movie reviews used for sentiment analysis, originally collected by Pang and Lee. In their work on sentiment treebanks, Socher et al. used Amazon's Mechanical Turk to create fine-grained labels for all parsed phrases in the corpus. You will get a chance to benchmark your sentiment-analysis ideas on the Rotten Tomatoes dataset. You are asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this data science project challenging.
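
To make the five-label task concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is written in plain Python for illustration, whereas the project uses R packages. The training phrases are invented, one per label, and the names `train` and `predict` are hypothetical helpers rather than part of any library.

```python
import math
from collections import Counter, defaultdict

# Invented training phrases, one per label; not taken from the dataset.
EXAMPLES = [
    ("terrible boring mess", "negative"),
    ("somewhat dull plot", "somewhat negative"),
    ("the film opens in paris", "neutral"),
    ("fairly enjoyable ride", "somewhat positive"),
    ("brilliant wonderful masterpiece", "positive"),
]

def train(examples):
    """Count how often each label occurs and how often each word occurs per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary

def predict(text, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior of the label.
        score = math.log(label_counts[label] / total)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in text.lower().split():
            if token in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(EXAMPLES)
prediction = predict("wonderful masterpiece", model)
```

Because this model treats a phrase as an unordered bag of words, it cannot tell "good" from "not good" unless negation is handled separately, which is one reason the obstacles listed above matter.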

Similar Projects

In this project, we are going to work on Deep Learning using H2O to predict Census income.

In this machine learning project, you will build predictive models to identify wine preferences of people using physicochemical properties of wines and help restaurants recommend the right quality of wine to a customer.

In this data science project, you will work with the German credit dataset, using classification techniques like Decision Trees, Neural Networks, etc. to classify loan applications in R.

Curriculum For This Mini Project

30-Jan-2016
05h 45m