This recipe explains skip-gram models with subwords, which extend the word2vec skip-gram model (as implemented in fastText).

Explain skip-gram models with subwords from word2vec.

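As discussed earlier, the skip-gram model predicts the surrounding context words within a specific window, given the current word. The input layer contains the current word and the output layer contains the context words. The hidden layer has as many units as the number of dimensions in which we want to represent the current word. Subwords are character n-grams, that is, sequences of consecutive letters taken from a word; for example, "gi" and "rl" are subwords of "girl". The skip-gram model with subwords, used by fastText, represents each word through its subwords (by default, n-grams of 3 to 6 characters, with "<" and ">" marking the word boundaries), so words that share character sequences share parameters, and even words not seen during training can be given a vector. Let's look at skip-gram with subwords in practice.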
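A minimal sketch of how skip-gram turns text into training examples: each word is paired with every word within `window` positions of it, and the model learns to predict the second word of each pair from the first. The function name `skipgram_pairs` is hypothetical, and the window is fixed here; the fastText library instead samples an effective window size between 1 and its `ws` setting for each center word.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) training pairs for a skip-gram model.

    Every token is used as the center word, and each other token at most
    `window` positions away becomes one of its context words.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the boundaries of the token list.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "alice saw the white rabbit".split()
for center, context in skipgram_pairs(sentence, window=1):
    print(center, "->", context)
```

With `window=1`, the five-word sentence yields eight pairs, starting with `alice -> saw`, `saw -> alice` and `saw -> the`.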

Step 1 - Install the required libraries

!pip install cython
!pip install pyfasttext

Step 2 - Import the necessary libraries

from pyfasttext import FastText

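Step 3 - Load the sample dataset

# Read the text for inspection; fastText reads the file itself during training in Step 5
with open("/content/alice_in_wonderland.txt", "r", encoding="utf-8") as sample:
    alice_data = sample.read()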
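Because fastText builds its vocabulary directly from the training file, the loaded text is mostly useful for checking which words will get their own vectors. A minimal sketch, assuming whitespace tokenization (fastText does not lowercase or strip punctuation) and fastText's default minimum word count of 5 for unsupervised training; `tokenize` and `build_vocab` are hypothetical helpers, not part of pyfasttext.

```python
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Split on runs of whitespace; no lowercasing or punctuation stripping,
    # so "Alice" and "Alice," stay different tokens, as they do in fastText.
    return text.split()


def build_vocab(tokens: List[str], min_count: int = 5) -> Dict[str, int]:
    # Keep only words seen at least `min_count` times
    # (fastText's minCount defaults to 5 for unsupervised training).
    counts = Counter(tokens)
    return {word: n for word, n in counts.items() if n >= min_count}


demo = "the rabbit ran and the rabbit hid and the queen shouted"
print(build_vocab(tokenize(demo), min_count=2))
# {'the': 3, 'rabbit': 2, 'and': 2}
```

With the text from Step 3, `build_vocab(tokenize(alice_data))` lists the words frequent enough to enter the vocabulary; rarer words can still be given a vector at lookup time from their subwords.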

Step 4 - Initialize the model

# Create an empty pyfasttext model; it is trained in the next step
model = FastText()

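Step 5 - Train the model using skip-gram

# Train a skip-gram model with subwords; output='model' is the prefix of the saved model files
model.skipgram(input='/content/alice_in_wonderland.txt', output='model', epoch=2, lr=0.7)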
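The skipgram call hands training to the fastText C++ library, so the actual updates are hidden. As a rough illustration only, here is a toy pure-Python sketch of skip-gram with subwords and negative sampling. Assumptions: the class `SubwordSkipGram` and its methods are hypothetical (not pyfasttext's API); a word is represented only by its hashed character n-grams, whereas fastText also keeps a vector for each in-vocabulary word; and the dimension and bucket count are tiny compared with fastText's defaults (100 dimensions, 2,000,000 buckets). It follows the general shape of a fastText update: average the center word's subword vectors, score each target word with a sigmoid, and add the accumulated gradient to every subword vector.

```python
import math
import random
from typing import Dict, List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    # Character n-grams of "<word>" with lengths minn..maxn, left to right.
    w = "<" + word + ">"
    return [w[i:i + n] for i in range(len(w))
            for n in range(minn, maxn + 1) if i + n <= len(w)]


def fnv1a(s: str) -> int:
    # Plain 32-bit FNV-1a over UTF-8 bytes (fastText uses a close variant).
    h = 2166136261
    for b in s.encode("utf-8"):
        h = ((h ^ b) * 16777619) & 0xFFFFFFFF
    return h


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class SubwordSkipGram:
    """Hypothetical toy model: skip-gram over hashed subwords, negative sampling."""

    def __init__(self, dim: int = 10, buckets: int = 1000,
                 lr: float = 0.05, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim, self.buckets, self.lr = dim, buckets, lr
        # Input vectors: one row per hash bucket, small uniform random init.
        self.inp = [[rng.uniform(-1 / dim, 1 / dim) for _ in range(dim)]
                    for _ in range(buckets)]
        # Output (context) vectors: one per target word, created as zeros.
        self.out: Dict[str, List[float]] = {}

    def ids(self, word: str) -> List[int]:
        # Map each subword to a bucket, so unseen words still get rows.
        return [fnv1a(g) % self.buckets for g in char_ngrams(word)]

    def word_vector(self, word: str) -> List[float]:
        # Average of the word's subword vectors.
        rows = [self.inp[i] for i in self.ids(word)]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def out_vec(self, word: str) -> List[float]:
        return self.out.setdefault(word, [0.0] * self.dim)

    def score(self, word: str, context: str) -> float:
        # Sigmoid score for `context` appearing near `word`.
        u, v = self.word_vector(word), self.out_vec(context)
        return sigmoid(sum(a * b for a, b in zip(u, v)))

    def train_pair(self, word: str, context: str, negatives: List[str]) -> None:
        # One SGD step: pull `context` toward `word`, push `negatives` away.
        hidden = self.word_vector(word)
        grad = [0.0] * self.dim
        targets = [(context, 1.0)] + [(neg, 0.0) for neg in negatives]
        for target, label in targets:
            v = self.out_vec(target)
            alpha = self.lr * (label - sigmoid(sum(a * b for a, b in zip(hidden, v))))
            for k in range(self.dim):
                grad[k] += alpha * v[k]      # uses v before it is updated
                v[k] += alpha * hidden[k]    # update the output vector in place
        for i in self.ids(word):             # every subword gets the same gradient
            row = self.inp[i]
            for k in range(self.dim):
                row[k] += grad[k]


model = SubwordSkipGram(dim=10, lr=0.1)
for _ in range(50):
    model.train_pair("alice", "rabbit", negatives=["queen"])
    model.train_pair("rabbit", "alice", negatives=["queen"])
print(model.score("alice", "rabbit"), model.score("alice", "queen"))
# A word never seen in training still gets a vector from its subwords:
print(len(model.word_vector("wonderland")))  # 10
```

Hashing n-grams into a fixed number of buckets keeps memory bounded no matter how many distinct n-grams the corpus contains, at the cost of occasional collisions.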

Step 6 - Get the subwords for some sample words

print("The subwords for boy are:", model.get_all_subwords('boy'), '\n')
print("The subwords for girl are:", model.get_all_subwords('girl'), '\n')
The subwords for boy are: ['boy', '<bo', '<boy', '<boy>', 'boy', 'boy>', 'oy>']
The subwords for girl are: ['girl', '<gi', '<gir', '<girl', '<girl>', 'gir', 'girl', 'girl>', 'irl', 'irl>', 'rl>']
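To see where these subwords come from, here is a minimal re-implementation sketch of the n-gram extraction. It assumes fastText's default n-gram lengths (minn=3, maxn=6, which apply because the training call does not override them) and that get_all_subwords returns the word itself followed by its n-grams; `char_ngrams` is a hypothetical helper, not a pyfasttext function.

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Character n-grams of `word`, in the style of fastText subwords.

    The word is wrapped in the boundary symbols '<' and '>' so that prefixes
    and suffixes get their own n-grams, then every substring whose length is
    between `minn` and `maxn` characters is collected, left to right.
    """
    wrapped = "<" + word + ">"
    ngrams = []
    for start in range(len(wrapped)):
        for n in range(minn, maxn + 1):
            if start + n > len(wrapped):
                break  # longer n-grams would run past the end
            ngrams.append(wrapped[start:start + n])
    return ngrams


# The word itself followed by its n-grams.
print(["boy"] + char_ngrams("boy"))
# ['boy', '<bo', '<boy', '<boy>', 'boy', 'boy>', 'oy>']
```

Note that "boy" appears twice: once as the whole word and once as the inner 3-gram. The boundary markers are what distinguish the prefix "<bo" from a "bo" occurring in the middle of another word.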

