What is tokenization in NLTK?

This recipe explains what tokenization is and how to do it in NLTK.

Recipe Objective

What is tokenization? Tokenization is the task of chopping text into smaller pieces called tokens. Tokens can be words, characters, or subwords. NLTK provides several tokenizers with different functionality; let's look at them one by one.
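To make the three granularities concrete, here is a minimal plain-Python sketch that does not use NLTK. The helper names and the toy vocabulary are hypothetical and chosen only for illustration; real subword tokenizers learn their vocabulary from data, so treat the greedy matching below as a simplification.

```python
def word_tokens(text):
    # Whitespace split: the crudest form of word tokenization.
    return text.split()


def char_tokens(text):
    # Every character (including spaces) becomes its own token.
    return list(text)


def subword_tokens(word, vocab):
    # Greedy longest-match against a toy vocabulary (hypothetical, for illustration).
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No known piece starts here: fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces


toy_vocab = {"token", "ization"}  # assumed vocabulary, not from any real model

print(word_tokens("study about tokenization"))
print(char_tokens("token"))
print(subword_tokens("tokenization", toy_vocab))
```

The same string yields few long tokens at the word level and many short ones at the character level, with subwords sitting in between.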

Step 1 - Sentence tokenization: import sent_tokenize

from nltk.tokenize import sent_tokenize

This tokenizer splits a paragraph of text into its individual sentences.

Step 2 - Take a simple text and apply sentence tokenization on that

My_text = "Hello everyone, Welcome to the session. Now your going to study about tokenization !!"
sent_tokenize(My_text)

['Hello everyone, Welcome to the session.', 'Now your going to study about tokenization !', '!']

Here we can see that the text has been split into individual sentences. Note that the trailing "!!" is broken apart, so the final "!" comes out as its own item.
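For contrast, the sketch below is a naive sentence splitter built on a regular expression. It is not what sent_tokenize does internally; the function name naive_sent_split is hypothetical, and the point is only to show where a simple rule breaks down and why a trained sentence tokenizer is usually preferred.

```python
import re


def naive_sent_split(text):
    # Split on whitespace that directly follows '.', '!' or '?'.
    # This is a simplification, not NLTK's algorithm.
    return re.split(r"(?<=[.!?])\s+", text.strip())


my_text = "Hello everyone, Welcome to the session. Now your going to study about tokenization !!"
print(naive_sent_split(my_text))

# The rule has no notion of abbreviations, so "Dr." is treated as a sentence end.
print(naive_sent_split("Dr. Smith arrived. He sat down."))
```

The second call returns three pieces, with "Dr." wrongly split off on its own. Abbreviations like this are the kind of case a trained sentence tokenizer is designed to handle.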

Step 3 - Word tokenization: import word_tokenize

from nltk.tokenize import word_tokenize

Step 4 - Apply word tokenization on simple text

word_tokenize(My_text)

['Hello',
 'everyone',
 ',',
 'Welcome',
 'to',
 'the',
 'session',
 '.',
 'Now',
 'your',
 'going',
 'to',
 'study',
 'about',
 'tokenization',
 '!',
 '!']

From the above we can see that the text has been split into individual words, with each punctuation mark returned as a separate token.
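Since different tokenizers behave differently, it can help to compare word_tokenize with another one. The sketch below uses wordpunct_tokenize from nltk.tokenize, which is purely rule-based. Unlike sent_tokenize and word_tokenize, it needs no downloaded model data (those two rely on the Punkt resources, typically fetched once with nltk.download; the exact resource name depends on your NLTK version). The variable name my_text is just an illustrative choice.

```python
from nltk.tokenize import wordpunct_tokenize

my_text = "Hello everyone, Welcome to the session. Now your going to study about tokenization !!"

# wordpunct_tokenize matches the pattern \w+|[^\w\s]+ , i.e. runs of
# word characters or runs of punctuation, so "!!" stays together as one token.
tokens = wordpunct_tokenize(my_text)
print(tokens)
```

Here the final token is '!!' rather than two separate '!' tokens, which is one small example of how the choice of tokenizer changes the result.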
