What is a skip gram model and when to use it?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

What is a skip gram model and when to use it?

What is a skip gram model and when to use it?

This recipe explains what is a skip gram model and when to use it

0

Recipe Objective

What is a skip gram model and when to use it? As we have discussed earlier only about Word2vec and Skip Gram comes under Word2Vec. Skip Gram which predicts the the surrounding context words within specific window given current word. The input layer contains the current word and the output layer contains the context words. The hidden layer contains the number of dimensions in which we want to represent current word present at the input layer.

Step 1 - Import the necessary libraries

from nltk.tokenize import sent_tokenize, word_tokenize import warnings warnings.filterwarnings(action = 'ignore') import gensim from gensim.models import Word2Vec

Here we have imported the necessary packages along with the warnings and kept it as ignore because we know that there might be some warnings comming up when we run our program, but that can be ignored.

Step 2 - load the sample data

sample = open("/content/alice_in_wonderland.txt", "r") s = sample.read()

Step 3 - Replace the escape character with spaces

f = s.replace("\n", " ")

Step 4 - Iterate and tokenize

import nltk nltk.download('punkt') data = [] for i in sent_tokenize(f): temp = [] for j in word_tokenize(i): temp.append(j.lower()) data.append(temp)

Here we are taking a list as variable named data which is initially empty, after that we are going take a for loop which will iterate through each sentences present in the text file, and the second for loop will tokenize the sentences into words.

Step 5 - Create a Skip Gram model

model2 = gensim.models.Word2Vec(data, min_count = 1, size = 100, window = 5, sg = 1)

Step 6 - Print the result of Skip Gram model

print("Cosine similarity between 'alice' " + "and 'wonderland' - Skip Gram : ", model2.similarity('alice', 'wonderland')) print("Cosine similarity between 'alice' " + "and 'machines' - Skip Gram : ", model2.similarity('alice', 'machines'))
Cosine similarity between 'alice' and 'wonderland' - Skip Gram :  0.9486537
Cosine similarity between 'alice' and 'machines' - Skip Gram :  0.94141114

Relevant Projects

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.