How to use count vectorizer?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to use count vectorizer?

How to use count vectorizer?

This recipe helps you use count vectorizer

0

Recipe Objective

How to use count vectorizer? Count Vectorizer is used to convert documents, text into vectors of term or token counts, it involves counting the number of occurences of words appears in a document.

for e.g "I want to go to the park and play the sea-saw".

I - 1

want - 1

to - 2

go - 1

the - 2

park - 1

and - 1

play - 1

sea-saw - 1

So, from the above example we can see it will count the occurences of wordsn appearing in the text. Lets understand with an practical example

Step 1 - Import necessary libraries

import pandas as pd from sklearn.feature_extraction.text import CountVectorizer

Step 2 - Take Sample Data

data1 = "I'm designing a document and don't want to get bogged down in what the text actually says" data2 = "I'm creating a template with various paragraph styles and need to see what they will look like." data3 = "I'm trying to learn more about some feature of Microsoft Word and don't want to practice on a real document."

Step 3 - Convert Sample Data into DataFrame using pandas

df1 = pd.DataFrame({'First_Para': [data1], 'Second_Para': [data2], 'Third_Para': [data2]})

Step 4 - Initialize the Vectorizer

count_vectorizer = CountVectorizer() doc_vec = count_vectorizer.fit_transform(df1.iloc[0])

Here we have initialized the vectorizer and fit & transformed the data

Step 5 - Convert the transformed Data into a DataFrame.

df2 = pd.DataFrame(doc_vec.toarray().transpose(), index=vectorizer.get_feature_names())

Step 6 - Change the Column names and print the result

df2.columns = df1.columns print(df2)
           First_Para  Second_Para  Third_Para
actually            1            0           0
and                 1            1           1
bogged              1            0           0
creating            0            1           1
designing           1            0           0
document            1            0           0
don                 1            0           0
down                1            0           0
get                 1            0           0
in                  1            0           0
like                0            1           1
look                0            1           1
need                0            1           1
paragraph           0            1           1
says                1            0           0
see                 0            1           1
styles              0            1           1
template            0            1           1
text                1            0           0
the                 1            0           0
they                0            1           1
to                  1            1           1
various             0            1           1
want                1            0           0
what                1            1           1
will                0            1           1
with                0            1           1

Relevant Projects

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Human Activity Recognition Using Smartphones Data Set
In this deep learning project, you will build a classification system where to precisely identify human fitness activities.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.