What is a vectorizer?

What is a vectorizer?

What is a vectorizer?

This recipe explains what is a vectorizer


Recipe Objective

What is a Vectorizer? Vectorization is the process of converting words into numbers is called Vectorization, It is a methodology in NLP to map words or phrases from vocabulary to a corresponding vector of real numbers which is used to find word predictions, similarities etc.

The vectorization is used in use case like:

Text classification

Compute Similar words

Document Clustering / Grouping

Natural language Processing (NLP)

feature extraction in Text Classification.

lets see a example of vectorizer by using Count Vectorizer

Step 1 - Import the necessary libraries

import pandas as pd from sklearn.feature_extraction.text import CountVectorizer

Step 2 - Store the Count vectorizer in a variable

Count_vect = CountVectorizer()

Step 3 - Take the Sample text

text1 = "jack wants to play football" text2 = "Heena also loves to play football"

Step 4 - fit_transform the text and get the feature names

vectors = Count_vect.fit_transform([text1, text2]) feature_names = Count_vect.get_feature_names()

Step 5 - Convert the vectors to dense

dense = vectors.todense()

Step 6 - Convert the dense to list and then to DataFrame

denselist = dense.tolist() df = pd.DataFrame(denselist, columns=feature_names)

Step 7 - Print the output

	also	football	heena	jack	loves	play	to	wants
0	0	1		0	1	0	1	1	1
1	1	1		1	0	1	1	1	0

Relevant Projects

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Music Recommendation System Project using Python and R
Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Human Activity Recognition Using Smartphones Data Set
In this deep learning project, you will build a classification system where to precisely identify human fitness activities.