What is tokenization?

This recipe explains what tokenization is.

Recipe Objective

What is Tokenization? Tokenization is the task of chopping text into smaller pieces called tokens. The tokens can be words, characters, or subwords. Different tokenizers serve different purposes; let's go through them one by one.
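To make the idea of token granularity concrete, here is a minimal sketch using only built-in Python, with no NLTK involved. The sample string and variable names are illustrative assumptions. Subword tokenization is not shown, since it normally depends on a vocabulary learned from data.

```python
text = "Hello world"

# Character-level tokens: every character, including the space, is a token.
char_tokens = list(text)

# Word-level tokens: a plain whitespace split (punctuation would stay attached).
word_tokens = text.split()

print(char_tokens)
print(word_tokens)  # ['Hello', 'world']
```

The whitespace split is the crudest form of word tokenization; the NLTK tokenizers in the steps that follow handle punctuation and sentence boundaries more carefully.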

Step 1 - Sentence tokenization: import sent_tokenize

from nltk.tokenize import sent_tokenize

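This tokenizer splits a paragraph into sentences. It relies on NLTK's pretrained Punkt model, which may need to be downloaded once with nltk.download('punkt') (newer NLTK releases name this resource 'punkt_tab').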

Step 2 - Take a simple text and apply sentence tokenization to it

My_text = "Hello everyone, Welcome to the session. Now your going to study about tokenization !!"
sent_tokenize(My_text)

['Hello everyone, Welcome to the session.', 'Now your going to study about tokenization !', '!']

Here we can see that the text has been split into individual sentences.
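For intuition about what sentence tokenization does, the following is a rough sketch that splits on whitespace following a sentence-ending punctuation mark. It is not the algorithm sent_tokenize uses, and the helper name naive_sent_split is hypothetical. A rule this simple will break on abbreviations such as "Dr." or "e.g.", which is one reason a trained tokenizer is preferable.

```python
import re

def naive_sent_split(text):
    """Split text wherever whitespace follows '.', '!' or '?'.

    A simplified illustration only; it does not handle abbreviations,
    decimals, or quotations the way a trained sentence tokenizer does.
    """
    return re.split(r"(?<=[.!?])\s+", text.strip())

sample = "Hello everyone, Welcome to the session. Now your going to study about tokenization !!"
sentences = naive_sent_split(sample)
print(sentences)
# ['Hello everyone, Welcome to the session.', 'Now your going to study about tokenization !!']
```

Note that this sketch keeps the trailing "!!" attached to the second sentence, so its result differs slightly from the sent_tokenize output shown above.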

Step 3 - Word tokenization: import word_tokenize

from nltk.tokenize import word_tokenize

Step 4 - Apply word tokenization to the same text

word_tokenize(My_text)
['Hello',
 'everyone',
 ',',
 'Welcome',
 'to',
 'the',
 'session',
 '.',
 'Now',
 'your',
 'going',
 'to',
 'study',
 'about',
 'tokenization',
 '!',
 '!']

From the above we can see that the text has been split into individual words, with each punctuation mark returned as a separate token.
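The behaviour seen above, where words and punctuation marks become separate tokens, can be approximated with a single regular expression. This is a simplified sketch rather than NLTK's actual implementation, and the helper name simple_word_tokenize is hypothetical. It does not handle contractions such as "don't" the way word_tokenize does.

```python
import re

def simple_word_tokenize(text):
    """Return runs of word characters and single punctuation marks as tokens."""
    # \w+      matches a run of letters, digits, or underscores
    # [^\w\s]  matches one character that is neither a word character nor whitespace
    return re.findall(r"\w+|[^\w\s]", text)

sample = "Hello everyone, Welcome to the session. Now your going to study about tokenization !!"
tokens = simple_word_tokenize(sample)
print(tokens)
```

For this particular sentence the sketch yields the same 17 tokens as the word_tokenize output listed above; on text with contractions or abbreviations the two will differ.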
