What is a skip gram model and when to use it?

This recipe explains what a skip gram model is and when to use it.


Recipe Objective

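What is a skip gram model and when to use it? Earlier we discussed Word2Vec in general; Skip Gram is one of the model architectures that come under Word2Vec. Skip Gram predicts the surrounding context words within a specific window, given the current word. The input layer contains the current word and the output layer contains the context words. The size of the hidden layer is the number of dimensions in which we want to represent the current word present at the input layer; after training, the weights feeding this hidden layer are the word vectors. Skip Gram is generally the better choice when the training corpus is small and when rare words need good representations, whereas the other Word2Vec architecture, CBOW (Continuous Bag of Words), trains several times faster and is slightly more accurate for frequent words.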
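To make the window and the layers concrete, here is a minimal pure-Python sketch. `skipgram_pairs` lists the (current word, context word) training pairs that Skip Gram learns from, and `context_probabilities` runs one forward pass: the hidden layer is simply the current word's row of the input weight matrix, and every vocabulary word gets a probability of being a context word via softmax. Both function names, the tiny 3-dimensional hidden layer, and the random weights are illustrative assumptions. This sketch uses a full softmax for clarity; gensim's `Word2Vec` by default trains with negative sampling (`negative=5`, `hs=0`) instead, which is a cheaper approximation of the same objective.

```python
import math
import random


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: every word within `window`
    positions of the center word becomes a prediction target."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def context_probabilities(center, vocab, w_in, w_out):
    """Forward pass of a plain skip-gram network with a full softmax.

    The hidden layer is the center word's row of w_in (its embedding);
    each output score is the dot product of that row with a word's row
    of w_out, and softmax turns the scores into probabilities.
    """
    h = w_in[vocab.index(center)]
    scores = [sum(a * b for a, b in zip(h, row)) for row in w_out]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {word: e / total for word, e in zip(vocab, exps)}


# Illustrative sentence, not taken from the recipe's pipeline
sentence = ["alice", "was", "beginning", "to", "get", "very", "tired"]
print(skipgram_pairs(sentence, window=1)[:4])
# [('alice', 'was'), ('was', 'alice'), ('was', 'beginning'), ('beginning', 'was')]

vocab = sorted(set(sentence))
dim = 3  # size of the hidden layer, i.e. the embedding dimension
rng = random.Random(0)
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
probs = context_probabilities("alice", vocab, w_in, w_out)
print(round(sum(probs.values()), 6))  # 1.0
```

Training nudges `w_in` and `w_out` so that the true context words from `skipgram_pairs` receive high probability; the rows of `w_in` then serve as the word vectors.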

Step 1 - Import the necessary libraries

from nltk.tokenize import sent_tokenize, word_tokenize
import warnings

warnings.filterwarnings(action='ignore')

import gensim
from gensim.models import Word2Vec

Here we have imported the necessary packages, along with the warnings module, and set warnings to be ignored, because some warnings may come up when we run the program and they can safely be ignored.

Step 2 - Load the sample data

# Read the whole text file into one string; "with" closes the file afterwards
with open("/content/alice_in_wonderland.txt", "r") as sample:
    s = sample.read()

Step 3 - Replace the newline characters with spaces

f = s.replace("\n", " ")

Step 4 - Iterate and tokenize

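import nltk
nltk.download('punkt')  # sentence tokenizer models used by sent_tokenize
# (newer NLTK releases may also need nltk.download('punkt_tab'))

data = []

# Iterate through each sentence in the text
for i in sent_tokenize(f):
    temp = []
    # Tokenize the sentence into lowercase words
    for j in word_tokenize(i):
        temp.append(j.lower())
    data.append(temp)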

Here we create an empty list named data. The outer for loop iterates through each sentence in the text, and the inner for loop tokenizes that sentence into lowercase words, which are collected in temp and then appended to data, giving one list of words per sentence.
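The two nested loops can also be written as a single nested list comprehension. So that the sketch runs without downloading NLTK models, it swaps in two hypothetical regex helpers, `simple_sent_tokenize` and `simple_word_tokenize`; they only roughly approximate NLTK's `sent_tokenize` and `word_tokenize` (for example, they do not handle abbreviations or contractions the way NLTK does). In the recipe itself you would keep the NLTK functions, as shown in the comment.

```python
import re


def simple_sent_tokenize(text):
    # Hypothetical stand-in for nltk's sent_tokenize: split after . ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def simple_word_tokenize(sentence):
    # Hypothetical stand-in for nltk's word_tokenize: runs of word
    # characters, or single punctuation marks
    return re.findall(r"\w+|[^\w\s]", sentence)


# With NLTK this would be:
# data = [[w.lower() for w in word_tokenize(s)] for s in sent_tokenize(f)]
f = "Alice was tired. She had nothing to do!"
data = [[word.lower() for word in simple_word_tokenize(sent)]
        for sent in simple_sent_tokenize(f)]
print(data)
# [['alice', 'was', 'tired', '.'], ['she', 'had', 'nothing', 'to', 'do', '!']]
```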

Step 5 - Create a Skip Gram model

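# sg=1 selects Skip Gram (sg=0, the default, would train CBOW instead).
# min_count=1 keeps every word, window=5 is the maximum distance between the
# current word and a context word, and vector_size=100 is the hidden layer size.
# (gensim versions before 4.0 call vector_size "size".)
model2 = gensim.models.Word2Vec(data, min_count=1, vector_size=100, window=5, sg=1)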
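The `min_count` parameter decides which words get a vector at all: words that occur fewer than `min_count` times in the corpus are dropped from the vocabulary (gensim's default is 5). The recipe uses `min_count=1` so that every word, even one that appears only once, can be looked up later; looking up a word that was dropped raises a `KeyError`. The hypothetical `build_vocab` helper below is a minimal sketch of that filtering step only, not gensim's actual implementation.

```python
from collections import Counter


def build_vocab(sentences, min_count=1):
    """Hypothetical helper mirroring the effect of Word2Vec's min_count:
    words seen fewer than min_count times are dropped."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word: n for word, n in counts.items() if n >= min_count}


data = [["alice", "was", "tired"], ["alice", "sat", "down"]]
print(build_vocab(data, min_count=1))  # every word is kept
print(build_vocab(data, min_count=2))  # {'alice': 2}
```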

Step 6 - Print the result of Skip Gram model

# In gensim 4+, similarity() lives on the trained word vectors, model2.wv
print("Cosine similarity between 'alice' and 'wonderland' - Skip Gram : ",
      model2.wv.similarity('alice', 'wonderland'))
print("Cosine similarity between 'alice' and 'machines' - Skip Gram : ",
      model2.wv.similarity('alice', 'machines'))
Cosine similarity between 'alice' and 'wonderland' - Skip Gram :  0.9486537
Cosine similarity between 'alice' and 'machines' - Skip Gram :  0.94141114
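The similarity scores are cosine similarities between two word vectors: the dot product divided by the product of their lengths, so 1.0 means the vectors point in the same direction and 0.0 means they are orthogonal. The sketch below computes it by hand; `cosine_similarity` is a hypothetical helper name. Applied to `model2.wv['alice']` and `model2.wv['wonderland']`, it should agree with `model2.wv.similarity` up to floating-point rounding. Your own scores will likely differ from the output above, because Word2Vec training starts from random weights and, with multiple worker threads, is not deterministic.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```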