How does XLNetTokenizer work in transformers?

This recipe explains how XLNetTokenizer works in transformers.

Recipe Objective - How does XLNetTokenizer work in transformers?

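Subword tokenization methods work on the idea that common words should be kept whole, while rare words should be broken down into smaller, meaningful subwords. XLNetTokenizer follows this approach: it is built on SentencePiece and splits text into pieces drawn from a fixed, pretrained vocabulary.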
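To make the idea concrete, here is a minimal sketch of subword splitting in plain Python. It is a toy illustration only: the vocabulary `TOY_VOCAB` and the function `toy_subword_tokenize` are hypothetical, and the greedy longest-match rule is a simplification, not the algorithm SentencePiece or XLNetTokenizer actually uses.

```python
# Hypothetical toy vocabulary: frequent words are stored whole,
# alongside a few smaller pieces. "▁" marks the start of a word.
TOY_VOCAB = {"▁welcome", "▁to", "▁transform", "ers", "▁tutorial", "s", "▁"}


def toy_subword_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split (a simplification for illustration)."""
    tokens = []
    for word in text.lower().split():
        piece = "▁" + word  # prefix each word with the boundary marker
        start = 0
        while start < len(piece):
            # Try the longest remaining substring first, then shrink it.
            for end in range(len(piece), start, -1):
                if piece[start:end] in vocab:
                    tokens.append(piece[start:end])
                    start = end
                    break
            else:
                # Nothing in the vocabulary matched: emit one character.
                tokens.append(piece[start])
                start += 1
    return tokens


print(toy_subword_tokenize("welcome to transformers tutorials"))
# ['▁welcome', '▁to', '▁transform', 'ers', '▁tutorial', 's']
```

The common words "welcome" and "to" come back as single tokens, while "transformers" and "tutorials" are not in the toy vocabulary and are split into known pieces.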

Example of XLNetTokenizer:

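# Importing XLNetTokenizerFast (the "fast" variant of XLNetTokenizer)
from transformers import XLNetTokenizerFast

# Loading the pretrained vocabulary of the xlnet-base-cased checkpoint
xlnet_tokenizer = XLNetTokenizerFast.from_pretrained("xlnet-base-cased")

# Passing input and printing the resulting subword tokens
print(xlnet_tokenizer.tokenize("Welcome to transformers tutorials!!!"))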

Output - 
['_Welcome', '_to', '_transform', 'ers', '_tutorial', 's', '!!!']

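We can see that the words "Welcome" and "to" are present in the tokenizer’s vocabulary as whole tokens, but "transformers" and "tutorials" are not. Consequently, the tokenizer splits them into known subwords: "transformers" becomes ["transform", "ers"] and "tutorials" becomes ["tutorial", "s"]. The leading "_" (in the tokenizer's actual output, the SentencePiece character "▁", U+2581) marks a token that starts a new word, that is, one that was preceded by a space.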
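The role of the space marker is easiest to see by reversing the tokenization. The sketch below is plain Python and does not call the library. The helper name `detokenize` is hypothetical, and the code assumes the marker is the SentencePiece character "▁" (U+2581) and that the token list matches the output shown above.

```python
SPACE_MARKER = "\u2581"  # "▁", which often looks like an underscore


def detokenize(tokens):
    """Join the pieces, then turn each word-boundary marker into a space."""
    return "".join(tokens).replace(SPACE_MARKER, " ").strip()


# Token list assumed to match the output shown above.
tokens = ["▁Welcome", "▁to", "▁transform", "ers", "▁tutorial", "s", "!!!"]
print(detokenize(tokens))
# Welcome to transformers tutorials!!!
```

Tokens without the marker ("ers", "s", "!!!") are glued to the previous piece, which is how the original words are recovered. In practice, the tokenizer's own `convert_tokens_to_string` method should be preferred over a hand-written helper like this one.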

In this way, we can tokenize text with XLNetTokenizer in transformers.
