What are multimodal models in transformers?

This recipe explains what are multimodal models in transformers.

Recipe Objective - What are multimodal models in transformers?

Multimodal models combine text and other types of input (such as graphics, images etc.) and are more task-specific. One multimodal model in the collection has not been pre-trained in the same self-supervised manner as the others.

Build a Multi Touch Attribution Model in Python with Source Code

Type of multimodal model:
MMBT:

In multimodal environments, a transformers model is used to create predictions by merging text and image. The embeddings of the tokenized text and the final activations of a pretrained on pictures resnet (after the pooling layer) that travels via a linear layer are fed into the transformer model (to go from number of features at the end of the resnet to the hidden state dimension of the transformer). The different inputs are combined, and a segment embedding is added on top of the positional embeddings to tell the model which part of the input vector relates to the text and which to the image. Only classification is possible with the pretrained model.

For more related projects -

/projects/data-science-projects/deep-learning-projects
/projects/data-science-projects/keras-deep-learning-projects

What Users are saying..

profile image

Abhinav Agarwal

Graduate Student at Northwestern University
linkedin profile url

I come from Northwestern University, which is ranked 9th in the US. Although the high-quality academics at school taught me all the basics I needed, obtaining practical experience was a challenge.... Read More

Relevant Projects

NLP Project for Multi Class Text Classification using BERT Model
In this NLP Project, you will learn how to build a multi-class text classification model using using the pre-trained BERT model.

Recommender System Machine Learning Project for Beginners-4
Collaborative Filtering Recommender System Project - Comparison of different model based and memory based methods to build recommendation system using collaborative filtering.

Linear Regression Model Project in Python for Beginners Part 2
Machine Learning Linear Regression Project for Beginners in Python to Build a Multiple Linear Regression Model on Soccer Player Dataset.

Build a Text Classification Model with Attention Mechanism NLP
In this NLP Project, you will learn to build a multi class text classification model with attention mechanism.

Personalized Medicine: Redefining Cancer Treatment
In this Personalized Medicine Machine Learning Project you will learn to classify genetic mutations on the basis of medical literature into 9 classes.

Digit Recognition using CNN for MNIST Dataset in Python
In this deep learning project, you will build a convolutional neural network using MNIST dataset for handwritten digit recognition.

MLOps Project for a Mask R-CNN on GCP using uWSGI Flask
MLOps on GCP - Solved end-to-end MLOps Project to deploy a Mask RCNN Model for Image Segmentation as a Web Application using uWSGI Flask, Docker, and TensorFlow.

Expedia Hotel Recommendations Data Science Project
In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.