How to use count vectorizer in nlp

This recipe helps you use count vectorizer in nlp

Recipe Objective

How to use count vectorizer? Count Vectorizer is used to convert documents, text into vectors of term or token counts, it involves counting the number of occurences of words appears in a document.

for e.g "I want to go to the park and play the sea-saw".

I - 1

want - 1

to - 2

go - 1

the - 2

park - 1

and - 1

play - 1

sea-saw - 1

So, from the above example we can see it will count the occurences of wordsn appearing in the text. Lets understand with an practical example

NLP Techniques to Learn for your Next NLP Project

Step 1 - Import necessary libraries

import pandas as pd from sklearn.feature_extraction.text import CountVectorizer

Step 2 - Take Sample Data

data1 = "I'm designing a document and don't want to get bogged down in what the text actually says" data2 = "I'm creating a template with various paragraph styles and need to see what they will look like." data3 = "I'm trying to learn more about some feature of Microsoft Word and don't want to practice on a real document."

Step 3 - Convert Sample Data into DataFrame using pandas

df1 = pd.DataFrame({'First_Para': [data1], 'Second_Para': [data2], 'Third_Para': [data2]})

Step 4 - Initialize the Vectorizer

count_vectorizer = CountVectorizer() doc_vec = count_vectorizer.fit_transform(df1.iloc[0])

Here we have initialized the vectorizer and fit & transformed the data

Step 5 - Convert the transformed Data into a DataFrame.

df2 = pd.DataFrame(doc_vec.toarray().transpose(), index=vectorizer.get_feature_names())

Step 6 - Change the Column names and print the result

df2.columns = df1.columns print(df2)

           First_Para  Second_Para  Third_Para
actually            1            0           0
and                 1            1           1
bogged              1            0           0
creating            0            1           1
designing           1            0           0
document            1            0           0
don                 1            0           0
down                1            0           0
get                 1            0           0
in                  1            0           0
like                0            1           1
look                0            1           1
need                0            1           1
paragraph           0            1           1
says                1            0           0
see                 0            1           1
styles              0            1           1
template            0            1           1
text                1            0           0
the                 1            0           0
they                0            1           1
to                  1            1           1
various             0            1           1
want                1            0           0
what                1            1           1
will                0            1           1
with                0            1           1

What Users are saying..

profile image

Ray han

Tech Leader | Stanford / Yale University
linkedin profile url

I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More

Relevant Projects

Skip Gram Model Python Implementation for Word Embeddings
Skip-Gram Model word2vec Example -Learn how to implement the skip gram algorithm in NLP for word embeddings on a set of documents.

Avocado Machine Learning Project Python for Price Prediction
In this ML Project, you will use the Avocado dataset to build a machine learning model to predict the average price of avocado which is continuous in nature based on region and varieties of avocado.

Deploy Transformer-BART Model on Paperspace Cloud
In this MLOps Project you will learn how to deploy a Tranaformer BART Model for Abstractive Text Summarization on Paperspace Private Cloud

Locality Sensitive Hashing Python Code for Look-Alike Modelling
In this deep learning project, you will find similar images (lookalikes) using deep learning and locality sensitive hashing to find customers who are most likely to click on an ad.

Deploy Transformer BART Model for Text summarization on GCP
Learn to Deploy a Machine Learning Model for the Abstractive Text Summarization on Google Cloud Platform (GCP)

End-to-End Snowflake Healthcare Analytics Project on AWS-2
In this AWS Snowflake project, you will build an end to end retraining pipeline by checking Data and Model Drift and learn how to redeploy the model if needed

Build a Speech-Text Transcriptor with Nvidia Quartznet Model
In this Deep Learning Project, you will leverage transfer learning from Nvidia QuartzNet pre-trained models to develop a speech-to-text transcriptor.

PyCaret Project to Build and Deploy an ML App using Streamlit
In this PyCaret Project, you will build a customer segmentation model with PyCaret and deploy the machine learning application using Streamlit.

Build a Face Recognition System in Python using FaceNet
In this deep learning project, you will build your own face recognition system in Python using OpenCV and FaceNet by extracting features from an image of a person's face.

Credit Card Default Prediction using Machine learning techniques
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.