How to use count vectorizer?

This recipe shows how to use scikit-learn's CountVectorizer.

Recipe Objective

CountVectorizer, from scikit-learn, converts a collection of text documents into vectors of term (token) counts. It does this by counting the number of times each word occurs in each document.

For example, take the sentence "I want to go to the park and play the sea-saw". Counting each word gives:

I - 1

want - 1

to - 2

go - 1

the - 2

park - 1

and - 1

play - 1

sea-saw - 1

As the example shows, the idea is simply to count how often each word appears in the text. Let's work through a practical example.
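The hand count above can be reproduced with a few lines of standard-library Python. This is only a sketch of the counting idea, assuming plain whitespace splitting; it is not how CountVectorizer tokenizes text, and the variable names are illustrative.

```python
from collections import Counter

sentence = "I want to go to the park and play the sea-saw"

# Split on whitespace and tally each word, mirroring the hand count.
word_counts = Counter(sentence.split())

for word, count in word_counts.items():
    print(f"{word} - {count}")
```

Note that CountVectorizer with its default settings would treat this sentence slightly differently: it lowercases the text, ignores single-character tokens such as "I", and splits "sea-saw" at the hyphen into "sea" and "saw".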

Step 1 - Import necessary libraries

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

Step 2 - Take Sample Data

data1 = "I'm designing a document and don't want to get bogged down in what the text actually says"
data2 = "I'm creating a template with various paragraph styles and need to see what they will look like."
data3 = "I'm trying to learn more about some feature of Microsoft Word and don't want to practice on a real document."
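The sample text contains contractions such as "I'm" and "don't", so it helps to see which tokens will actually be counted. The sketch below assumes CountVectorizer's documented defaults (lowercase=True and token_pattern=r"(?u)\b\w\w+\b") and imitates them with the standard re module; the tokenize helper is a hypothetical name, not part of scikit-learn.

```python
import re

# Assumed to mirror CountVectorizer's documented defaults:
# lowercase=True and token_pattern=r"(?u)\b\w\w+\b".
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and keep runs of two or more word characters."""
    return TOKEN_PATTERN.findall(text.lower())


sample = "I'm designing a document and don't want to get bogged down in what the text actually says"
tokens = tokenize(sample)
print(tokens)
```

Under these rules the single-letter pieces "i", "m", "a" and "t" are dropped, which is why "don't" contributes the token "don" to the vocabulary.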

Step 3 - Convert Sample Data into DataFrame using pandas

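df1 = pd.DataFrame({'First_Para': [data1], 'Second_Para': [data2], 'Third_Para': [data3]})

Step 4 - Initialize the Vectorizer

count_vectorizer = CountVectorizer()
doc_vec = count_vectorizer.fit_transform(df1.iloc[0])

Here we initialize the vectorizer, then fit it to the three paragraphs and transform them in a single call. df1.iloc[0] is the first (and only) row of df1, so each paragraph is treated as one document. The result, doc_vec, is a sparse matrix with one row per document and one column per term.

Step 5 - Convert the transformed Data into a DataFrame.

df2 = pd.DataFrame(doc_vec.toarray().transpose(), index=count_vectorizer.get_feature_names_out())

Transposing the array puts the terms on the rows and the documents on the columns. get_feature_names_out() requires scikit-learn 1.0 or later; older releases used get_feature_names() instead.

Step 6 - Change the Column names and print the result

df2.columns = df1.columns
print(df2)
           First_Para  Second_Para  Third_Para
about               0            0           1
actually            1            0           0
and                 1            1           1
bogged              1            0           0
creating            0            1           0
designing           1            0           0
document            1            0           1
don                 1            0           1
down                1            0           0
feature             0            0           1
get                 1            0           0
in                  1            0           0
learn               0            0           1
like                0            1           0
look                0            1           0
microsoft           0            0           1
more                0            0           1
need                0            1           0
of                  0            0           1
on                  0            0           1
paragraph           0            1           0
practice            0            0           1
real                0            0           1
says                1            0           0
see                 0            1           0
some                0            0           1
styles              0            1           0
template            0            1           0
text                1            0           0
the                 1            0           0
they                0            1           0
to                  1            1           2
trying              0            0           1
various             0            1           0
want                1            0           1
what                1            1           0
will                0            1           0
with                0            1           0
word                0            0           1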
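
The printed table has one row per term and one column per document. As a rough illustration of how such a table is put together, here is a from-scratch sketch using only the standard library. It assumes the vectorizer's default tokenization (lowercasing, tokens of two or more word characters); count_matrix and docs are hypothetical names, and this is not scikit-learn's implementation, which returns a sparse matrix.

```python
import re
from collections import Counter


def count_matrix(documents):
    """Return (vocabulary, rows); rows[i][j] is the count of term i in document j."""
    # Tokenize each document: lowercase, keep runs of 2+ word characters.
    tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in documents]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    # Count tokens per document, then lay the counts out term by term.
    counts = [Counter(toks) for toks in tokenized]
    rows = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, rows


docs = ["The cat sat", "The cat saw the dog"]
vocabulary, rows = count_matrix(docs)
for term, row in zip(vocabulary, rows):
    print(term, row)
```

Each printed line pairs a term with its counts across the two documents, in the same terms-as-rows layout as df2.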