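What is a Skip Gram model, and when should you use it? As discussed earlier, Skip Gram is one of the two model architectures in Word2Vec (the other is CBOW, Continuous Bag of Words). Given the current word, Skip Gram predicts the surrounding context words within a specific window. The input layer holds the current word, and the output layer holds the context words. The size of the hidden layer is the number of dimensions in which we want to represent the current word, that is, the length of its word vector. Compared with CBOW, Skip Gram is slower to train but usually learns better vectors for infrequent words, which makes it a good choice for smaller corpora or when rare words matter.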
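To make the prediction task concrete, here is a minimal sketch in plain Python and NumPy. It is our own illustration, not gensim's implementation. skipgram_pairs lists every (current word, context word) training pair inside the window. forward runs one pass through the three layers described above: the one-hot input selects a row of the input weight matrix (the hidden layer, i.e. the word vector), and a softmax over the output layer gives each vocabulary word a probability of being a context word. The function names, the toy sentence, and the random weights are hypothetical. Also note that gensim trains with negative sampling by default (negative=5) rather than this full softmax.

```python
import numpy as np

def skipgram_pairs(tokens, window):
    """Return every (center, context) pair where the context word lies
    at most `window` positions away from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

def forward(center_idx, W_in, W_out):
    """One Skip Gram forward pass with a full softmax.

    W_in  has shape (vocab_size, dim): input -> hidden weights (the word vectors).
    W_out has shape (dim, vocab_size): hidden -> output weights.
    """
    h = W_in[center_idx]                 # one-hot input @ W_in is just a row lookup
    scores = h @ W_out                   # one score per vocabulary word
    exp = np.exp(scores - scores.max())  # subtract max for numerical stability
    return h, exp / exp.sum()            # softmax: P(context word | center word)

tokens = ["alice", "was", "beginning", "to", "get", "very", "tired"]
pairs = skipgram_pairs(tokens, window=2)
print(pairs[:3])  # [('alice', 'was'), ('alice', 'beginning'), ('was', 'alice')]

vocab = {w: i for i, w in enumerate(tokens)}  # toy vocabulary: word -> index
rng = np.random.default_rng(0)
dim = 4                                       # hidden layer size = vector length
W_in = rng.normal(scale=0.1, size=(len(vocab), dim))
W_out = rng.normal(scale=0.1, size=(dim, len(vocab)))

h, probs = forward(vocab["alice"], W_in, W_out)
loss = -np.log(probs[vocab["was"]])  # cross-entropy for the pair ('alice', 'was')
print(h.shape, probs.shape, float(loss))
```

Training repeats this for every pair and adjusts W_in and W_out to lower the loss; the rows of W_in are the word vectors you keep afterwards.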
import warnings
warnings.filterwarnings(action='ignore')
import gensim
from gensim.models import Word2Vec
from nltk.tokenize import sent_tokenize, word_tokenize
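Here we import the necessary packages, including warnings, which we set to ignore: some warnings may appear when the program runs, but they are harmless and can be ignored. The NLTK tokenizers also need the Punkt tokenizer data, which you can download once with nltk.download('punkt') (recent NLTK versions may ask for 'punkt_tab' instead).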
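# Read the book and replace line breaks with spaces
with open("/content/alice_in_wonderland.txt", "r") as sample:
    s = sample.read()
f = s.replace("\n", " ")

data = []
# Split the text into sentences, then each sentence into lowercase words
for i in sent_tokenize(f):
    temp = []
    for j in word_tokenize(i):
        temp.append(j.lower())
    data.append(temp)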
Here we create an initially empty list named data. The outer for loop iterates through each sentence in the text file, and the inner loop tokenizes that sentence into words, lowercasing each one so that "Alice" and "alice" count as the same word. Each sentence becomes a list of words, so data ends up as a list of lists, which is the input format Word2Vec expects.
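Since NLTK's tokenizers need downloaded data, the sketch below uses two hypothetical regex-based stand-ins, naive_sent_tokenize and naive_word_tokenize. They are far cruder than NLTK's tokenizers and are used here only to show the shape of data that the loops produce: one list of lowercase tokens per sentence.

```python
import re

def naive_sent_tokenize(text):
    # Hypothetical stand-in for nltk's sent_tokenize: split after . ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def naive_word_tokenize(sentence):
    # Hypothetical stand-in for nltk's word_tokenize: words and single punctuation marks
    return re.findall(r"\w+|[^\w\s]", sentence)

def build_corpus(text):
    """Same nested-loop structure as above: a list of sentences, each a list of words."""
    data = []
    for sentence in naive_sent_tokenize(text.replace("\n", " ")):
        data.append([w.lower() for w in naive_word_tokenize(sentence)])
    return data

text = "Alice was beginning to get very tired.\nSo she was considering in her own mind."
corpus = build_corpus(text)
print(corpus)
# [['alice', 'was', 'beginning', 'to', 'get', 'very', 'tired', '.'],
#  ['so', 'she', 'was', 'considering', 'in', 'her', 'own', 'mind', '.']]
```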
# sg=1 selects Skip Gram (sg=0, the default, is CBOW).
# gensim 4.x calls the vector dimension vector_size; versions before 4.0 used size.
model2 = Word2Vec(data, min_count=1, vector_size=100, window=5, sg=1)
print("Cosine similarity between 'alice' and 'wonderland' - Skip Gram : ", model2.wv.similarity('alice', 'wonderland'))
print("Cosine similarity between 'alice' and 'machines' - Skip Gram : ", model2.wv.similarity('alice', 'machines'))
Output (exact values differ from run to run, because training starts from random weights):
Cosine similarity between 'alice' and 'wonderland' - Skip Gram : 0.9486537
Cosine similarity between 'alice' and 'machines' - Skip Gram : 0.94141114
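The similarity score above is the cosine of the angle between the two word vectors: cos(u, v) = (u · v) / (||u|| ||v||), which ranges from -1 to 1. A minimal NumPy sketch of that formula, using small hand-picked vectors (hypothetical values, not vectors from the trained model):

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||), a value in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, so cosine is about 1
c = np.array([-3.0, 0.0, 1.0])  # orthogonal to a: 1*(-3) + 2*0 + 3*1 = 0

print(cosine_similarity(a, b))  # approximately 1.0
print(cosine_similarity(a, c))  # 0.0
```

With the trained model, cosine_similarity(model2.wv['alice'], model2.wv['wonderland']) should agree with model2.wv.similarity('alice', 'wonderland') up to floating-point rounding.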