How does XLNetTokenizer work in transformers?

This recipe explains how XLNetTokenizer works in transformers.
Last Updated: 29 Jun 2022


Recipe Objective - How does XLNetTokenizer work in transformers?

Subword tokenization methods rest on the idea that frequently used words should be kept whole, while rare words should be split into meaningful subwords. XLNetTokenizer is a subword tokenizer of this kind, built on SentencePiece.
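To make the idea concrete, here is a minimal sketch of subword splitting. It is not the algorithm XLNetTokenizer uses: it swaps in a simple greedy longest-match search over a tiny hand-written vocabulary (`TOY_VOCAB`, a hypothetical name), whereas the real tokenizer relies on a trained SentencePiece model. The helper names `tokenize_word` and `toy_tokenize` are also made up for this illustration.

```python
# Toy vocabulary (an assumption for illustration, not the real XLNet vocabulary).
TOY_VOCAB = {"Welcome", "to", "transform", "ers", "tutorial", "s", "!!!"}


def tokenize_word(word, vocab):
    """Split one word into the longest known pieces, scanning left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matched: fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces


def toy_tokenize(text, vocab, marker="_"):
    """Tokenize whitespace-separated text, marking the first piece of each word."""
    tokens = []
    for word in text.split():
        pieces = tokenize_word(word, vocab)
        pieces[0] = marker + pieces[0]  # marker records where a word begins
        tokens.extend(pieces)
    return tokens


tokens = toy_tokenize("Welcome to transformers tutorials!!!", TOY_VOCAB)
print(tokens)
```

Words that appear whole in the toy vocabulary ("Welcome", "to") stay intact, while "transformers" is broken into "transform" and "ers" because only those pieces are known.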


Example of XLNetTokenizer:

# Importing XLNetTokenizerFast
from transformers import XLNetTokenizerFast

# Loading the pretrained tokenizer
xlnet_tokenizer = XLNetTokenizerFast.from_pretrained("xlnet-base-cased")

# Passing input
xlnet_tokenizer.tokenize("Welcome to transformers tutorials!!!")

Output - 
['_Welcome', '_to', '_transform', 'ers', '_tutorial', 's', '!!!']

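We can see that "Welcome", "to", and "tutorial" are present in the tokenizer's vocabulary, but "transformers" is not. Consequently, the tokenizer splits "transformers" into the known subwords "transform" and "ers". In the same way, "tutorials" is split into "tutorial" and "s". The leading "_" marks the start of a word, that is, the position of a space in the original text. In the tokenizer's actual output this marker is the SentencePiece character "▁" (U+2581), which resembles an underscore.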
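Because the word-start marker records where spaces were, the original text can be recovered from the tokens. The sketch below assumes the marker is the SentencePiece character "▁" (U+2581), shown as "_" in the output above; `detokenize` is a hypothetical helper written for this illustration, not part of the transformers library.

```python
def detokenize(tokens, marker="\u2581"):
    """Rebuild text by joining the pieces and turning each marker back into a space."""
    text = "".join(tokens)
    # Every marker stands for one space; the very first one yields a leading space.
    return text.replace(marker, " ").strip()


# Tokens as in the output above, written with the U+2581 marker.
tokens = ["\u2581Welcome", "\u2581to", "\u2581transform", "ers",
          "\u2581tutorial", "s", "!!!"]
print(detokenize(tokens))
```

Pieces without the marker ("ers", "s", "!!!") attach directly to the preceding piece, which is why "transform" and "ers" merge back into a single word.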

In this way, we can tokenize text with XLNetTokenizer in transformers.
