How to do text classification in NLP

This recipe helps you do text classification in NLP
Last Updated: 18 Jul 2022


Recipe Objective

How to do text classification?

Text classification is the process of assigning a piece of text to a particular tag or category depending on its content. It is used in real-world problems such as sentiment analysis, spam detection, analyzing customer reviews and many more.

Text classifiers can be used to organize, structure and categorize almost any type of text: documents, medical studies, files, and text from all over the web. For this recipe we are going to use a Naive Bayes classifier, which is considered a good fit for text classification.


Step 1 - Import the necessary libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score

Step 2 - Read the sample data and print it

df = pd.read_csv('/content/Customer_reviews_data.csv', encoding='cp1252')
df.head()

Here we load a sample dataset of customer reviews. The dataset contains only 10 records.

Step 3 - Replace the text with numbers

df['new_Labels'] = df['Labels'].apply(lambda v: 1 if v=='Positive' else 0)

Here we have created a new column, "new_Labels", which holds the integer form of the "Labels" column: "Positive" becomes 1 and "Negative" becomes 0. Note that the lambda maps every value other than "Positive" to 0, so a misspelled or differently cased label would also end up as 0.

df.head()
df.tail()
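As a hedged alternative to the lambda, the sketch below uses an explicit dictionary with `Series.map`, so that an unexpected label raises an error instead of quietly becoming 0. The three rows are hypothetical stand-ins, not the recipe's actual CSV; only the column names are taken from the recipe.

```python
import pandas as pd

# Hypothetical stand-in rows; the recipe's CSV is not reproduced here.
df = pd.DataFrame({
    "Customer_Reviews": ["Good product", "Bad quality", "Best experience"],
    "Labels": ["Positive", "Negative", "Positive"],
})

# Explicit mapping: a label missing from the dictionary becomes NaN
# rather than silently turning into 0.
label_map = {"Positive": 1, "Negative": 0}
mapped = df["Labels"].map(label_map)

# Fail early if the data contains a label we did not expect.
if mapped.isna().any():
    raise ValueError("Unexpected label values found in 'Labels'")

df["new_Labels"] = mapped.astype(int)
print(df)
```

Whether to fail loudly or default to 0 is a design choice; for a 10-row demo either works, but the explicit map tends to surface data problems sooner.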

Step 4 - Split the data into train and test

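# Default test_size is 0.25, so 10 records give 7 training and 3 test rows
X_train, X_test, y_train, y_test = train_test_split(
    df['Customer_Reviews'], df['new_Labels'], random_state=1)

# Keep only tokens that contain at least one letter, lowercase them,
# and drop English stop words
vectorizer = CountVectorizer(
    strip_accents='ascii',
    token_pattern=u'(?ui)\\b\\w*[a-z]+\\w*\\b',
    lowercase=True,
    stop_words='english')

# Learn the vocabulary from the training reviews only, then reuse it for the test reviews
X_train_cv = vectorizer.fit_transform(X_train)
X_test_cv = vectorizer.transform(X_test)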
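The following is a minimal sketch of what this step does, using hypothetical two-word reviews and a default `CountVectorizer` instead of the recipe's data and settings. It illustrates two points: the default split of 10 items yields 7 training and 3 test items, and a word seen only in the test text ("value" here) is dropped because the vocabulary is learned from the training text alone.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Splitting 10 placeholder items with the default test_size of 0.25
train_idx, test_idx = train_test_split(list(range(10)), random_state=1)
print(len(train_idx), len(test_idx))  # 7 3

# Hypothetical mini corpus, not the recipe's reviews
train_texts = ["good product", "bad product"]
test_texts = ["good value"]

vectorizer = CountVectorizer()
train_counts = vectorizer.fit_transform(train_texts)  # learns the vocabulary
test_counts = vectorizer.transform(test_texts)        # reuses it unchanged

vocab = sorted(vectorizer.vocabulary_)
print(vocab)                   # ['bad', 'good', 'product']
print(train_counts.toarray())  # rows: [0 1 1] and [1 0 1]
print(test_counts.toarray())   # [[0 1 0]] since "value" is not in the vocabulary
```

Calling `fit_transform` on the test set instead would build a different vocabulary, and the columns would no longer line up with what the classifier was trained on.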

Step 5 - Convert the Customer_Reviews into word count vectors

# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
Word_frequency = pd.DataFrame(X_train_cv.toarray(), columns=vectorizer.get_feature_names_out())
top_words = pd.DataFrame(Word_frequency.sum()).sort_values(0, ascending=False)
print(Word_frequency, '\n')
print(top_words)

   bad  best  buy  dont  experience  good  money  product  quality  value
0    0     1    0     0           1     0      0        0        0      0
1    0     0    0     0           0     1      0        1        0      0
2    0     0    0     0           0     1      0        0        0      0
3    0     1    0     0           0     0      0        0        1      0
4    1     0    1     1           0     0      0        1        0      0
5    1     0    0     0           0     0      0        0        0      0
6    0     1    0     0           1     0      1        0        0      1 

            0
best        3
bad         2
experience  2
good        2
product     2
buy         1
dont        1
money       1
quality     1
value       1

Above, we converted the reviews into word count vectors, because the Naive Bayes classifier needs to know how many times each word appears in each document and how many times it appears in each category. We used CountVectorizer for the conversion; the output shows the word frequency for each training review, followed by the most frequent words overall.
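To make the "counts per category" idea concrete, here is a small sketch that sums word counts per class by hand, applies add-one (Laplace) smoothing, and checks the result against what `MultinomialNB` stores in `feature_log_prob_`. The four reviews and their labels are hypothetical, and `alpha=1.0` is simply scikit-learn's default smoothing value.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical reviews: 1 = positive, 0 = negative
texts = ["good product", "best experience", "bad product", "bad quality"]
labels = np.array([1, 1, 0, 0])

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts).toarray()

# How many times each word appears in each category (row 0: class 0, row 1: class 1)
class_counts = np.vstack([counts[labels == c].sum(axis=0) for c in (0, 1)])

# Add-one smoothing, then normalise each row into log P(word | class)
alpha = 1.0
smoothed = class_counts + alpha
manual_log_prob = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

# The fitted model holds the same quantities
model = MultinomialNB(alpha=alpha).fit(counts, labels)
matches = np.allclose(manual_log_prob, model.feature_log_prob_)
print(matches)
```

Smoothing matters on tiny datasets like this one: without it, a word that never appeared in a category would get probability zero and veto that category outright.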

Step 6 - Fit the model and make the predictions

naive_bayes = MultinomialNB()
naive_bayes.fit(X_train_cv, y_train)
predictions = naive_bayes.predict(X_test_cv)
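One possible way to package the vectorizer and the classifier together is scikit-learn's `make_pipeline`, which lets raw text go straight into `fit` and `predict`. This is a sketch on hypothetical reviews with a default `CountVectorizer`, not the recipe's exact configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training reviews: 1 = positive, 0 = negative
train_texts = ["good product", "best quality", "bad product", "bad experience"]
train_labels = [1, 1, 0, 0]

# The pipeline vectorizes the text and then fits the classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Unseen reviews are vectorized with the training vocabulary automatically
new_reviews = ["good quality", "bad value"]
predicted = model.predict(new_reviews)
print(list(predicted))  # [1, 0]
```

Keeping both steps in one object avoids accidentally fitting the vectorizer on test data or forgetting to transform new text before predicting.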

Step 7 - Print the results

print('Accuracy score for Customer Reviews model is: ', accuracy_score(y_test, predictions), '\n')
print('Precision score for Customer Reviews model is: ', precision_score(y_test, predictions), '\n')

Accuracy score for Customer Reviews model is:  0.6666666666666666

Precision score for Customer Reviews model is:  0.5

These results come from a sample dataset that has only 10 records. As the word count matrix shows, 7 of them were used for training, which leaves just 3 for testing, so the scores are an illustration rather than a reliable estimate. A larger dataset would typically give better and more stable results. Now let us look at what the accuracy and precision scores tell us:

Accuracy score tells us, out of all the predictions we made, what fraction is correct.

Precision score tells us, out of all the reviews we predicted as positive, what fraction is actually positive.
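The definitions can be checked by hand on a three-item test set. The labels below are hypothetical: they were chosen only because they reproduce an accuracy of 2/3 and a precision of 0.5, and the recipe does not show which test reviews were actually misclassified. Recall is included because `recall_score` is imported in Step 1.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical true labels and predictions (1 = positive, 0 = negative)
y_true = [1, 0, 0]
y_pred = [1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # predicted positive, is positive
fp = sum(t == 0 and p == 1 for t, p in pairs)  # predicted positive, is negative
tn = sum(t == 0 and p == 0 for t, p in pairs)  # predicted negative, is negative
fn = sum(t == 1 and p == 0 for t, p in pairs)  # predicted negative, is positive

accuracy = (tp + tn) / len(y_true)  # correct predictions / all predictions
precision = tp / (tp + fp)          # correct positives / predicted positives
recall = tp / (tp + fn)             # correct positives / actual positives

print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
```

With only three test items, a single changed prediction moves accuracy by a third, which is why scores on such a small split should be treated with caution.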
