Data Science Project-Movie Review Sentiment Analysis using R

Learn to classify the sentiment of sentences from the Rotten Tomatoes dataset. You will be asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 50+ solved projects with IPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

Understanding the problem statement
Importing the dataset and unzipping a zipped file
Loading all the necessary libraries for NLP
What a bag-of-words model is
Tokenization, N-grams and Splitting
Difference between Stemming and Lemmatization
What part-of-speech tagging is
VNegterms, Negterms, and VPOSterms
Installing packages for Naive Bayes and SVM
What a sparse matrix is and how it is applied
Random Sampling
Converting words to vectors for prediction
Applying Naive Bayes to train a model and make predictions
Applying SVM to train a model and make predictions
Making predictions on the test dataset

Project Description

With the increasing use of social media such as Twitter and review websites like Yelp and Rotten Tomatoes, it has become important to glean insights from huge amounts of subjective, opinionated data. The Rotten Tomatoes movie review dataset is a corpus of movie reviews used for sentiment analysis, originally collected by Pang and Lee. In their work on sentiment treebanks, Socher et al. used Amazon's Mechanical Turk to create fine-grained labels for all parsed phrases in the corpus. You will get a chance to benchmark your sentiment-analysis ideas on the Rotten Tomatoes dataset. You are asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, and positive. Obstacles such as sentence negation, sarcasm, terseness, and language ambiguity, among many others, make this data science project challenging.

Similar Projects

In this data science project, you will learn to predict churn on a built-in dataset using Ensemble Methods in R.

Data Science Project in R: predict the sales for each department using historical markdown data from the Walmart dataset, which covers 45 Walmart stores.

In this data science project in R, we discuss subjective segmentation, a clustering technique for finding product bundles in sales data.

Curriculum For This Mini Project

30-Jan-2016
05h 45m