What is a vectorizer?

What is a vectorizer?

What is a vectorizer?

This recipe explains what is a vectorizer


Recipe Objective

What is a Vectorizer? Vectorization is the process of converting words into numbers is called Vectorization, It is a methodology in NLP to map words or phrases from vocabulary to a corresponding vector of real numbers which is used to find word predictions, similarities etc.

The vectorization is used in use case like:

Text classification

Compute Similar words

Document Clustering / Grouping

Natural language Processing (NLP)

feature extraction in Text Classification.

lets see a example of vectorizer by using Count Vectorizer

Step 1 - Import the necessary libraries

import pandas as pd from sklearn.feature_extraction.text import CountVectorizer

Step 2 - Store the Count vectorizer in a variable

Count_vect = CountVectorizer()

Step 3 - Take the Sample text

text1 = "jack wants to play football" text2 = "Heena also loves to play football"

Step 4 - fit_transform the text and get the feature names

vectors = Count_vect.fit_transform([text1, text2]) feature_names = Count_vect.get_feature_names()

Step 5 - Convert the vectors to dense

dense = vectors.todense()

Step 6 - Convert the dense to list and then to DataFrame

denselist = dense.tolist() df = pd.DataFrame(denselist, columns=feature_names)

Step 7 - Print the output

	also	football	heena	jack	loves	play	to	wants
0	0	1		0	1	0	1	1	1
1	1	1		1	0	1	1	1	0

Relevant Projects

Topic modelling using Kmeans clustering to group customer reviews
In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Music Recommendation System Project using Python and R
Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.