What are multimodal models in transformers?

This recipe explains what are multimodal models in transformers.

Recipe Objective - What are multimodal models in transformers?

Multimodal models combine text and other types of input (such as graphics, images etc.) and are more task-specific. One multimodal model in the collection has not been pre-trained in the same self-supervised manner as the others.

Build a Multi Touch Attribution Model in Python with Source Code

Type of multimodal model:
MMBT:

In multimodal environments, a transformers model is used to create predictions by merging text and image. The embeddings of the tokenized text and the final activations of a pretrained on pictures resnet (after the pooling layer) that travels via a linear layer are fed into the transformer model (to go from number of features at the end of the resnet to the hidden state dimension of the transformer). The different inputs are combined, and a segment embedding is added on top of the positional embeddings to tell the model which part of the input vector relates to the text and which to the image. Only classification is possible with the pretrained model.

For more related projects -

/projects/data-science-projects/deep-learning-projects
/projects/data-science-projects/keras-deep-learning-projects

What Users are saying..

profile image

Anand Kumpatla

Sr Data Scientist @ Doubleslash Software Solutions Pvt Ltd
linkedin profile url

ProjectPro is a unique platform and helps many people in the industry to solve real-life problems with a step-by-step walkthrough of projects. A platform with some fantastic resources to gain... Read More

Relevant Projects

Digit Recognition using CNN for MNIST Dataset in Python
In this deep learning project, you will build a convolutional neural network using MNIST dataset for handwritten digit recognition.

Build Customer Propensity to Purchase Model in Python
In this machine learning project, you will learn to build a machine learning model to estimate customer propensity to purchase.

Build an End-to-End AWS SageMaker Classification Model
MLOps on AWS SageMaker -Learn to Build an End-to-End Classification Model on SageMaker to predict a patient’s cause of death.

Build a Hybrid Recommender System in Python using LightFM
In this Recommender System project, you will build a hybrid recommender system in Python using LightFM .

Build Regression (Linear,Ridge,Lasso) Models in NumPy Python
In this machine learning regression project, you will learn to build NumPy Regression Models (Linear Regression, Ridge Regression, Lasso Regression) from Scratch.

Build Time Series Models for Gaussian Processes in Python
Time Series Project - A hands-on approach to Gaussian Processes for Time Series Modelling in Python

End-to-End Speech Emotion Recognition Project using ANN
Speech Emotion Recognition using RAVDESS Audio Dataset - Build an Artificial Neural Network Model to Classify Audio Data into various Emotions like Sad, Happy, Angry, and Neutral

Recommender System Machine Learning Project for Beginners-3
Content Based Recommender System Project - Building a Content-Based Product Recommender App with Streamlit

Build a Face Recognition System in Python using FaceNet
In this deep learning project, you will build your own face recognition system in Python using OpenCV and FaceNet by extracting features from an image of a person's face.

Learn How to Build a Linear Regression Model in PyTorch
In this Machine Learning Project, you will learn how to build a simple linear regression model in PyTorch to predict the number of days subscribed.