What is Causal Language Modeling in transformers?

This recipe explains what causal language modeling is in transformers.

Recipe Objective - What is Causal Language Modeling in transformers?

Language modeling is the task of fitting a model to a corpus, which can be domain-specific. Popular transformer-based models are trained with a variant of language modeling: for example, BERT with masked language modeling and GPT-2 with causal language modeling.
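
As a rough illustration of what "fitting a model to a corpus" means, the sketch below computes the average negative log-likelihood of the observed tokens, which is the quantity usually minimized during language-model training. The probability values and the helper name average_nll are made up for illustration and do not come from a real model.

```python
import math

def average_nll(token_probs):
    """Mean negative log-likelihood of the observed tokens (lower is better)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical probabilities a model assigns to each token that actually
# appears in the corpus, before and after training (illustrative values only).
probs_before = [0.10, 0.05, 0.20, 0.10]
probs_after = [0.40, 0.30, 0.50, 0.35]

loss_before = average_nll(probs_before)
loss_after = average_nll(probs_after)

print(f"loss before fitting: {loss_before:.3f}")
print(f"loss after fitting:  {loss_after:.3f}")
```

A model that fits the corpus better assigns higher probability to the tokens that really occur, so its loss is lower.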

Language modeling is also useful outside of pre-training, for example to shift the model distribution toward a specific domain: take a language model trained on a very large corpus and then fine-tune it on a dataset of news or scientific articles, such as LysandreJik / arxivnlp.

Causal Language Modeling:

Causal language modeling is the task of predicting the next token given the tokens that precede it. The model therefore attends only to the left context (the tokens to the left of the position being predicted).
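
The idea of "left context only" can be sketched in plain Python. The helper names next_token_examples and causal_mask are hypothetical and only illustrate the concept; real models build these structures as tensors internally.

```python
def next_token_examples(tokens):
    """Return (left_context, target) pairs: each token is predicted
    from the tokens that come before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def causal_mask(n):
    """n x n lower-triangular mask: position i may look at positions 0..i,
    never at positions to its right."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

tokens = ["I", "have", "never", "watched"]

pairs = next_token_examples(tokens)
for context, target in pairs:
    print(context, "->", target)

mask = causal_mask(len(tokens))
for row in mask:
    print(row)
```

Each row of the mask corresponds to one position in the sequence; the zeros to the right of the diagonal are what keep the model from seeing future tokens.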

Example of causal language modeling with GPT-2 (model and tokenizer used directly):

# Importing libraries
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

# Creating tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Creating context for sequence
context_sequence = "I have never watched anything like this, and it was"

# Applying tokenizer on sequence
tokens = tokenizer.encode(context_sequence, return_tensors="pt")

# Extracting logits at the last position (scores for the next token)
with torch.no_grad():
    last_logits = model(tokens).logits[:, -1, :]

# Applying top-k filtering: keep the 50 highest logits, set the rest to -inf
top_k = 50
kth_value = torch.topk(last_logits, top_k).values[:, -1, None]
filtered_logits = last_logits.masked_fill(last_logits < kth_value, float("-inf"))

# Finding probabilities using softmax function
probabilities = nn.functional.softmax(filtered_logits, dim=-1)

# Sampling the next token from the filtered distribution
final_token = torch.multinomial(probabilities, num_samples=1)

# Appending the sampled token to the input tokens
output = torch.cat([tokens, final_token], dim=-1)

# Decoding
answer = tokenizer.decode(output.tolist()[0])

# Printing answer
print(answer)

Output (the last word is sampled, so it can differ between runs) -
I have never watched anything like this, and it was amazing
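
To make the filtering, softmax, and sampling steps easier to follow, here is a minimal pure-Python sketch on a toy five-word vocabulary. The vocabulary, the logit values, and the helper names top_k_filter and softmax are assumptions chosen for illustration; they are not taken from GPT-2.

```python
import math
import random

def top_k_filter(logits, k):
    """Keep the k largest logits and replace the others with -inf.
    Ties at the threshold are kept, so slightly more than k may survive."""
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]

def softmax(logits):
    """Turn logits into probabilities; a logit of -inf gets probability 0."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores (illustrative values only)
vocab = ["amazing", "boring", "great", "the", "zebra"]
logits = [2.0, 0.5, 1.5, -1.0, -3.0]

filtered = top_k_filter(logits, k=3)
probs = softmax(filtered)

# Sample one token in proportion to its probability
rng = random.Random(0)
next_token = rng.choices(vocab, weights=probs, k=1)[0]

for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
print("sampled:", next_token)
```

Only the three highest-scoring words can be sampled here; the filtered-out words end up with probability zero, which is the same effect the top-k step has on GPT-2's full vocabulary.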

In this way, we can perform causal language modeling in transformers.
