What is tokenization?

This recipe explains what tokenization is.

Recipe Objective

What is tokenization? Tokenization is the task of splitting text into smaller pieces called tokens. Depending on the tokenizer, the tokens can be words, characters, or subwords. Different tokenizers work at different levels, so let's look at two of NLTK's, the sentence tokenizer and the word tokenizer, one by one.
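To make the three granularities concrete, here is a small stdlib-only sketch. The word-level rule (a run of word characters or a single punctuation mark) and the tiny subword vocabulary `VOCAB` are assumptions made up for illustration; real subword tokenizers learn their vocabulary from data. The subword step uses greedy longest-match-first lookup, falling back to single characters when nothing matches.

```python
import re

text = "Tokenization is fun"

# Word level: runs of word characters, with each punctuation mark on its own
word_tokens = re.findall(r"\w+|[^\w\s]", text)

# Character level: every character, including spaces, is a token
char_tokens = list(text)

# Subword level: a hypothetical, hand-picked vocabulary (not learned from data)
VOCAB = {"token", "ization", "is", "fun"}

def subword_tokenize(word, vocab):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining piece first, shrinking until a vocabulary hit
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No vocabulary piece matches: fall back to a single character
            tokens.append(word[start])
            start += 1
    return tokens

subword_tokens = [p for w in word_tokens for p in subword_tokenize(w.lower(), VOCAB)]

print(word_tokens)      # ['Tokenization', 'is', 'fun']
print(char_tokens[:5])  # ['T', 'o', 'k', 'e', 'n']
print(subword_tokens)   # ['token', 'ization', 'is', 'fun']
```

An unknown tail such as the "z" in "tokenz" would come back as a single-character token, which is the usual way such toy schemes avoid failing on out-of-vocabulary input.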

Step 1 - Sentence tokenization: import sent_tokenize

from nltk.tokenize import sent_tokenize

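This tokenizer splits a text, such as a paragraph, into sentences. It relies on NLTK's pre-trained Punkt model, which has to be downloaded once with nltk.download("punkt") (recent NLTK releases look for the "punkt_tab" resource instead).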
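To illustrate the basic idea of sentence splitting, here is a naive rule-based splitter. It is only a sketch under one assumption, that a sentence ends at ".", "!" or "?" followed by whitespace; it is not how NLTK's trained Punkt model works, and the function name `naive_sent_split` is hypothetical.

```python
import re

def naive_sent_split(text):
    """Split text after '.', '!' or '?' when followed by whitespace (naive rule)."""
    # The lookbehind keeps the punctuation attached to the sentence it ends
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

print(naive_sent_split("Tokenization is useful. It splits text! Does it work?"))
# ['Tokenization is useful.', 'It splits text!', 'Does it work?']

print(naive_sent_split("Dr. Smith arrived."))
# ['Dr.', 'Smith arrived.']  <- the rule wrongly breaks after an abbreviation
```

The second call shows why a learned model is preferable: Punkt is designed to recognise abbreviations such as "Dr." so that they do not end a sentence.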

Step 2 - Take a simple text and apply sentence tokenization to it

My_text = "Hello everyone, Welcome to the session. Now your going to study about tokenization !!"
sent_tokenize(My_text)

['Hello everyone, Welcome to the session.', 'Now your going to study about tokenization !', '!']

Here we can see that the text has been split into sentences. Note that the second "!" of "!!" ended up as a separate item.
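If stray punctuation-only items like that final "!" are unwanted, one option is a small post-processing pass that folds them into the preceding sentence. This is a sketch of one possible clean-up, not part of NLTK; the helper name `merge_punct_only` is hypothetical, and the input list is the sent_tokenize output shown above.

```python
import string

def merge_punct_only(sentences):
    """Append punctuation-only fragments to the sentence before them."""
    merged = []
    for sent in sentences:
        # A fragment counts as punctuation-only if every character is punctuation or space
        is_punct_only = all(ch in string.punctuation or ch.isspace() for ch in sent)
        if merged and is_punct_only:
            merged[-1] += sent
        else:
            merged.append(sent)
    return merged

sentences = ['Hello everyone, Welcome to the session.',
             'Now your going to study about tokenization !', '!']
print(merge_punct_only(sentences))
# ['Hello everyone, Welcome to the session.', 'Now your going to study about tokenization !!']
```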

Step 3 - Word tokenization: import word_tokenize

from nltk.tokenize import word_tokenize

Step 4 - Apply word tokenization to the same text

word_tokenize(My_text)
['Hello',
 'everyone',
 ',',
 'Welcome',
 'to',
 'the',
 'session',
 '.',
 'Now',
 'your',
 'going',
 'to',
 'study',
 'about',
 'tokenization',
 '!',
 '!']

From the above we can see that the text has been split into word tokens, with each punctuation mark (",", ".", "!") kept as a separate token.
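For comparison, a stdlib-only regex tokenizer reproduces the word_tokenize output above for this particular text. It assumes a token is either a run of word characters or a single punctuation mark; the name `simple_word_tokenize` is hypothetical. It is only an approximation: word_tokenize follows Penn Treebank conventions, so the two differ on contractions, for example.

```python
import re

def simple_word_tokenize(text):
    """Return runs of word characters and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

My_text = ("Hello everyone, Welcome to the session. "
           "Now your going to study about tokenization !!")

print(simple_word_tokenize(My_text))
# ['Hello', 'everyone', ',', 'Welcome', 'to', 'the', 'session', '.', 'Now',
#  'your', 'going', 'to', 'study', 'about', 'tokenization', '!', '!']

print(simple_word_tokenize("don't"))
# ['don', "'", 't']  (word_tokenize gives ['do', "n't"] instead)
```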
