What is a CBOW model and when to use it?

Earlier we discussed Word2Vec; CBOW is one of the model architectures that comes under Word2Vec. CBOW (Continuous Bag of Words) predicts the current word from the context words within a specific window around it. The input layer contains the context words and the output layer contains the current word. Between them is a hidden layer, whose size is the number of dimensions in which we want to represent the current word at the output layer.
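To make the window idea concrete, here is a minimal sketch of how (context words, current word) pairs can be formed from one tokenized sentence. The helper name cbow_pairs and the fixed symmetric window are assumptions made for illustration; this is not how gensim builds its training examples internally.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs for a CBOW-style model."""
    pairs = []
    for idx, target in enumerate(tokens):
        # Words up to `window` positions to the left and right of the target.
        left = tokens[max(0, idx - window):idx]
        right = tokens[idx + 1:idx + 1 + window]
        context = left + right
        if context:  # skip a lone token that has no neighbours
            pairs.append((context, target))
    return pairs


# Hypothetical example sentence, already tokenized and lower-cased.
sentence = ["alice", "was", "beginning", "to", "get", "very", "tired"]
for context, target in cbow_pairs(sentence, window=2):
    print(context, "->", target)
```

Each printed line shows the context words that would sit at the input layer and the current word that the model would try to predict at the output layer.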
import warnings

import gensim
from gensim.models import Word2Vec
from nltk.tokenize import sent_tokenize, word_tokenize  # needs NLTK's tokenizer data to be downloaded
warnings.filterwarnings(action='ignore')
Here we import the necessary packages and set the warnings filter to ignore. Running the program may raise some warnings, but they can safely be ignored.
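# Read the whole book and replace newlines with spaces.
with open("/content/alice_in_wonderland.txt", "r") as sample:
    s = sample.read()
f = s.replace("\n", " ")

data = []
# Split the text into sentences, then each sentence into lowercase word tokens.
for i in sent_tokenize(f):
    temp = []
    for j in word_tokenize(i):
        temp.append(j.lower())
    data.append(temp)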
Here data starts out as an empty list. The outer for loop iterates over each sentence in the text file, and the inner for loop tokenizes that sentence into words. In the end, data is a list of sentences, each of which is a list of word tokens, which is the input format Word2Vec expects.
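The shape of the resulting list is easier to see on a tiny input. The sketch below uses two hypothetical regex-based helpers, simple_sentences and simple_words, as rough stand-ins for sent_tokenize and word_tokenize so that it runs without NLTK; the real tokenizers handle many more cases. The sample text is made up, and tokens are lower-cased in this sketch.

```python
import re


def simple_sentences(text):
    # Hypothetical stand-in for sent_tokenize: split after '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def simple_words(sentence):
    # Hypothetical stand-in for word_tokenize: words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "Alice was tired.\nShe saw a rabbit!"
flat = text.replace("\n", " ")

# One inner list of lower-cased tokens per sentence.
data = [[w.lower() for w in simple_words(s)] for s in simple_sentences(flat)]
print(data)
```

The result is a list with one inner list per sentence, which is the nested structure the training step works on.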
# sg=0 selects CBOW (the default). gensim 4.x uses vector_size; gensim 3.x called it size.
model1 = Word2Vec(data, min_count=1, vector_size=100, window=5, sg=0)

# Similarity queries go through the trained word vectors in model1.wv.
print("Cosine similarity between 'alice' and 'wonderland' - CBOW : ", model1.wv.similarity('alice', 'wonderland'))
print("Cosine similarity between 'alice' and 'machines' - CBOW : ", model1.wv.similarity('alice', 'machines'))
Cosine similarity between 'alice' and 'wonderland' - CBOW : 0.99817955
Cosine similarity between 'alice' and 'machines' - CBOW : 0.9881186
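The scores above are cosine similarities between two word vectors: the dot product of the vectors divided by the product of their lengths. As a minimal sketch of that calculation, the following uses made-up three-dimensional vectors (not actual Word2Vec embeddings) and a hypothetical helper named cosine_similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Illustrative vectors only; real embeddings here would have 100 dimensions.
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [3.0, 0.0, -1.0]  # perpendicular to vec_a

print(cosine_similarity(vec_a, vec_b))  # close to 1: same direction
print(cosine_similarity(vec_a, vec_c))  # close to 0: unrelated directions
```

A value near 1 means the two vectors point in almost the same direction, so the model treats the two words as appearing in similar contexts.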