Explain what is a hashing vectorizer in nlp

This recipe explains what is a hashing vectorizer in nlp

Recipe Objective

what is a hashing vectorizer?

hashing vectorizer is a vectorizer which uses the hashing trick to find the token string name to feature integer index mapping. Conversion of text documents into matrix is done by this vectorizer where it turns the collection of documents into a sparse matrix which are holding the token occurence counts. Advantages for hashing vectorizer are:

As there is no need of storing the vocabulary dictionary in the memory, for large data sets it is very low memory scalable. As there in no state during the fit, it can be used in a streaming or parallel pipeline. And more.

Learn How to use XLNet for Text Classification

Step 1 - Import the necessary libraries

from sklearn.feature_extraction.text import HashingVectorizer

Step 2 - Take a Sample text

Sample_text = ["Jon is playing football.","He loves to play football.","He is just 10 years old.", "His favorite player is Cristiano Ronaldo."] print(Sample_text)

['Jon is playing football.', 'He loves to play football.', 'He is just 10 years old.', 'His favorite player is Cristiano Ronaldo.']

Step 3 - Save the vectorizer in a variable

My_vect = HashingVectorizer(n_features=2**4)

Step 4 - Fit the sample text into vectorizer

Fit_text = vectorizer.fit_transform(Sample_text)

Step 5 - Print the Results

print(Fit_text, '\n') print(Fit_text.shape)

  (0, 1)	0.5
  (0, 10)	0.5
  (0, 13)	0.5
  (0, 15)	-0.5
  (1, 3)	0.5773502691896258
  (1, 7)	0.5773502691896258
  (1, 10)	0.0
  (1, 11)	0.5773502691896258
  (2, 1)	-0.4082482904638631
  (2, 3)	0.4082482904638631
  (2, 4)	-0.4082482904638631
  (2, 5)	-0.4082482904638631
  (2, 8)	-0.4082482904638631
  (2, 13)	0.4082482904638631
  (3, 0)	0.5
  (3, 2)	-0.5
  (3, 9)	-0.5
  (3, 11)	-0.5
  (3, 13)	0.0 

(4, 16)

What Users are saying..

profile image

Ameeruddin Mohammed

ETL (Abintio) developer at IBM
linkedin profile url

I come from a background in Marketing and Analytics and when I developed an interest in Machine Learning algorithms, I did multiple in-class courses from reputed institutions though I got good... Read More

Relevant Projects

NLP Project on LDA Topic Modelling Python using RACE Dataset
Use the RACE dataset to extract a dominant topic from each document and perform LDA topic modeling in python.

AWS MLOps Project to Deploy a Classification Model [Banking]
In this AWS MLOps project, you will learn how to deploy a classification model using Flask on AWS.

Text Classification with Transformers-RoBERTa and XLNet Model
In this machine learning project, you will learn how to load, fine tune and evaluate various transformer models for text classification tasks.

Hands-On Approach to Master PyTorch Tensors with Examples
In this deep learning project, you will learn how to perform various operations on the building block of PyTorch : Tensors.

Build an End-to-End AWS SageMaker Classification Model
MLOps on AWS SageMaker -Learn to Build an End-to-End Classification Model on SageMaker to predict a patient’s cause of death.

Langchain Project for Customer Support App in Python
In this LLM Project, you will learn how to enhance customer support interactions through Large Language Models (LLMs), enabling intelligent, context-aware responses. This Langchain project aims to seamlessly integrate LLM technology with databases, PDF knowledge bases, and audio processing agents to create a comprehensive customer support application.

Build a Speech-Text Transcriptor with Nvidia Quartznet Model
In this Deep Learning Project, you will leverage transfer learning from Nvidia QuartzNet pre-trained models to develop a speech-to-text transcriptor.

Llama2 Project for MetaData Generation using FAISS and RAGs
In this LLM Llama2 Project, you will automate metadata generation using Llama2, RAGs, and AWS to reduce manual efforts.

Forecasting Business KPI's with Tensorflow and Python
In this machine learning project, you will use the video clip of an IPL match played between CSK and RCB to forecast key performance indicators like the number of appearances of a brand logo, the frames, and the shortest and longest area percentage in the video.

Linear Regression Model Project in Python for Beginners Part 1
Machine Learning Linear Regression Project in Python to build a simple linear regression model and master the fundamentals of regression for beginners.