How to use count vectorizer?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to use count vectorizer?

How to use count vectorizer?

This recipe helps you use count vectorizer

0

Recipe Objective

How to use count vectorizer? Count Vectorizer is used to convert documents, text into vectors of term or token counts, it involves counting the number of occurences of words appears in a document.

for e.g "I want to go to the park and play the sea-saw".

I - 1

want - 1

to - 2

go - 1

the - 2

park - 1

and - 1

play - 1

sea-saw - 1

So, from the above example we can see it will count the occurences of wordsn appearing in the text. Lets understand with an practical example

Step 1 - Import necessary libraries

import pandas as pd from sklearn.feature_extraction.text import CountVectorizer

Step 2 - Take Sample Data

data1 = "I'm designing a document and don't want to get bogged down in what the text actually says" data2 = "I'm creating a template with various paragraph styles and need to see what they will look like." data3 = "I'm trying to learn more about some feature of Microsoft Word and don't want to practice on a real document."

Step 3 - Convert Sample Data into DataFrame using pandas

df1 = pd.DataFrame({'First_Para': [data1], 'Second_Para': [data2], 'Third_Para': [data2]})

Step 4 - Initialize the Vectorizer

count_vectorizer = CountVectorizer() doc_vec = count_vectorizer.fit_transform(df1.iloc[0])

Here we have initialized the vectorizer and fit & transformed the data

Step 5 - Convert the transformed Data into a DataFrame.

df2 = pd.DataFrame(doc_vec.toarray().transpose(), index=vectorizer.get_feature_names())

Step 6 - Change the Column names and print the result

df2.columns = df1.columns print(df2)
           First_Para  Second_Para  Third_Para
actually            1            0           0
and                 1            1           1
bogged              1            0           0
creating            0            1           1
designing           1            0           0
document            1            0           0
don                 1            0           0
down                1            0           0
get                 1            0           0
in                  1            0           0
like                0            1           1
look                0            1           1
need                0            1           1
paragraph           0            1           1
says                1            0           0
see                 0            1           1
styles              0            1           1
template            0            1           1
text                1            0           0
the                 1            0           0
they                0            1           1
to                  1            1           1
various             0            1           1
want                1            0           0
what                1            1           1
will                0            1           1
with                0            1           1

Relevant Projects

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.