What is a skip gram model and when to use it?

What is a skip gram model and when to use it?

What is a skip gram model and when to use it?

This recipe explains what is a skip gram model and when to use it

Recipe Objective

What is a skip gram model and when to use it? As we have discussed earlier only about Word2vec and Skip Gram comes under Word2Vec. Skip Gram which predicts the the surrounding context words within specific window given current word. The input layer contains the current word and the output layer contains the context words. The hidden layer contains the number of dimensions in which we want to represent current word present at the input layer.

Step 1 - Import the necessary libraries

from nltk.tokenize import sent_tokenize, word_tokenize import warnings warnings.filterwarnings(action = 'ignore') import gensim from gensim.models import Word2Vec

Here we have imported the necessary packages along with the warnings and kept it as ignore because we know that there might be some warnings comming up when we run our program, but that can be ignored.

Step 2 - load the sample data

sample = open("/content/alice_in_wonderland.txt", "r") s = sample.read()

Step 3 - Replace the escape character with spaces

f = s.replace("\n", " ")

Step 4 - Iterate and tokenize

import nltk nltk.download('punkt') data = [] for i in sent_tokenize(f): temp = [] for j in word_tokenize(i): temp.append(j.lower()) data.append(temp)

Here we are taking a list as variable named data which is initially empty, after that we are going take a for loop which will iterate through each sentences present in the text file, and the second for loop will tokenize the sentences into words.

Step 5 - Create a Skip Gram model

model2 = gensim.models.Word2Vec(data, min_count = 1, size = 100, window = 5, sg = 1)

Step 6 - Print the result of Skip Gram model

print("Cosine similarity between 'alice' " + "and 'wonderland' - Skip Gram : ", model2.similarity('alice', 'wonderland')) print("Cosine similarity between 'alice' " + "and 'machines' - Skip Gram : ", model2.similarity('alice', 'machines'))
Cosine similarity between 'alice' and 'wonderland' - Skip Gram :  0.9486537
Cosine similarity between 'alice' and 'machines' - Skip Gram :  0.94141114

Relevant Projects

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Locality Sensitive Hashing Python Code for Look-Alike Modelling
In this deep learning project, you will find similar images (lookalikes) using deep learning and locality sensitive hashing to find customers who are most likely to click on an ad.

Multi-Class Text Classification with Deep Learning using BERT
In this project, we will cover in detail the architecture of a transformer used in natural language processing use cases. We will go through the key nlp areas in the pre-transformer stage like bow, word2vec...and then the origin and gradual refinement of transformers. Finally, we will study one of the most popular state of the art transformer models, called BERT and use it for text classification on a large dataset.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Medical Image Segmentation Deep Learning Project
In this deep learning project, you will learn to implement Unet++ models for medical image segmentation to detect and classify colorectal polyps.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Loan Eligibility Prediction in Python using H2O.ai
In this loan prediction project you will build predictive models in Python using H2O.ai to predict if an applicant is able to repay the loan or not.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Machine Learning Project to Forecast Rossmann Store Sales
In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

Build OCR from Scratch Python using YOLO and Tesseract
In this deep learning project, you will learn how to build your custom OCR (optical character recognition) from scratch by using Google Tesseract and YOLO to read the text from any images.