How does BertTokenizer work in transformers?

This recipe explains how BertTokenizer works in transformers.

Recipe Objective - How does BertTokenizer work in transformers?

Subword tokenization methods work on the idea that frequently used words should not be broken down into smaller subwords, while rare words should be decomposed into meaningful subwords. BertTokenizer implements WordPiece, the subword tokenization algorithm used by BERT.
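As a quick sketch of this idea (assuming the bert-base-uncased checkpoint; the exact vocabulary contents depend on the pretrained model), you can check whether a word exists as a whole entry in the tokenizer's WordPiece vocabulary:

# Checking which words are whole entries in the WordPiece vocabulary
from transformers import BertTokenizer
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print("welcome" in bert_tokenizer.vocab)     # common word: present as a single token
print("tutorials" in bert_tokenizer.vocab)   # rarer word: absent, so it gets split into subwords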



Example of BertTokenizer:

# Importing BertTokenizer
from transformers import BertTokenizer
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Passing input
bert_tokenizer.tokenize("Welcome to Transformers tutorials!!!")

Output - 
['welcome', 'to', 'transformers', 'tutor', '##ials', '!', '!', '!']

The sentence is lowercased first because we are using the uncased model. We can see that the words ["welcome", "to", "transformers"] are present in the tokenizer's vocabulary, but the word "tutorials" is not. Consequently, the tokenizer splits "tutorials" into known subwords: ["tutor", "##ials"]. The "##" prefix indicates that the token should be attached to the previous one without a space when the tokenization is reversed (decoded).
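To see how these subword pieces are put back together, here is a minimal sketch (assuming the same bert_tokenizer object as above) that encodes the sentence to vocabulary ids and decodes it again; decode() merges the "##" pieces back into whole words:

# Encoding maps the text to vocabulary ids and adds the special tokens [CLS] and [SEP]
ids = bert_tokenizer.encode("Welcome to Transformers tutorials!!!")
print(ids)

# Decoding merges the "##" subwords back, recovering the lowercased sentence
print(bert_tokenizer.decode(ids, skip_special_tokens=True))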

In this way, we can use BertTokenizer to perform subword tokenization in transformers.

