# What is Term frequency?

## Recipe Objective

Term frequency (TF) measures how often a term occurs in a document. The raw count of occurrences is normalized by the total number of terms in the document, so that documents of different lengths can be compared.

TF(A) = (Number of times term A occurs in a document) / (Total number of terms in the document)

For example, if the term "Apple" occurs 10 times in a 100-word document, then the term frequency of "Apple" is 10/100 = 0.1.
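The formula can be checked with a few lines of plain Python. This is only a minimal sketch: the helper name `term_frequency` and the 100-word toy document are made up for illustration and are not part of the recipe's dataset.

```python
from collections import Counter


def term_frequency(tokens):
    """Return {term: count / total number of tokens} for a list of tokens."""
    total = len(tokens)
    if total == 0:
        # An empty document has no terms, so there is nothing to score.
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical 100-word document: "apple" 10 times plus 90 filler words.
doc = ["apple"] * 10 + ["other"] * 90
tf = term_frequency(doc)
print(tf["apple"])  # 0.1
```

Because every count is divided by the same total, the TF values of all distinct terms in a document add up to 1.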

## Step 1 - Import library and read the sample dataset

`import pandas as pd` `df = pd.read_csv("/content/drive/My Drive/Data sets/test.csv")` `df.head()`

Here we use a sample Twitter sentiment analysis dataset from Kaggle, which consists of text data.

## Step 2 - Take only the required text column and store it in another DataFrame

`df2 = df.iloc[:, 1:2]` `df2.head()`

## Step 3 - Import re and replace non-letters with spaces

`import re` `letters_only = re.sub("[^a-zA-Z]", " ", str(df2))`

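We import the `re` module and use `re.sub` to find every character that is not a letter and replace it with a space. Note that `str(df2)` works on the printed representation of the DataFrame, so the column header ends up in the text and pandas may truncate long rows or large frames.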
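The cleaning step can be tried on a small in-memory example. The sketch below is an alternative under stated assumptions, not the recipe's exact code: the column name `tweet` and the two sample rows are hypothetical, and the cell values are joined into one string instead of calling `str()` on the whole DataFrame.

```python
import re

import pandas as pd

# Hypothetical stand-in for the text column; the real dataset's
# column name and contents may differ.
sample = pd.DataFrame({"tweet": ["#model i love u!!", "2 days left @user :)"]})

# Join the raw cell values so that nothing is truncated and the
# column header does not leak into the text.
text = " ".join(sample["tweet"].astype(str))

# Replace every character that is not an ASCII letter with a space.
letters_only = re.sub("[^a-zA-Z]", " ", text)

# Splitting on whitespace drops the runs of spaces left behind.
tokens = letters_only.split()
print(tokens)
```

Digits, punctuation, and symbols such as `#` and `@` are all replaced, so only alphabetic words remain after the split.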

## Step 4 - Import word_tokenize and convert the text data into tokens

`from nltk.tokenize import word_tokenize` `word_tokenize(letters_only)`

## Step 5 - Split the cleaned text into words and store them in a DataFrame

`letters = letters_only.split()` `df3 = pd.DataFrame(letters)` `df3.value_counts()`
```
to         3
right      2
my         2
the        2
..
neverre    1
nephew     1
mindset    1
x          1
a          1
Length: 69, dtype: int64
```

Here we split the cleaned text into words, store them in a DataFrame called `df3`, and use `value_counts()` to count how many times each word occurs.

## Step 6 - Find out TF

`result = df3.value_counts() / len(df3)`

Using the TF formula above, we divide the count of each word by the total number of words (`len(df3)`) to get the term frequency of every word in the processed data.
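As a cross-check, pandas can do the same normalization itself through the `normalize=True` argument of `value_counts`. The sketch below uses a short made-up sentence and a `Series` rather than the recipe's `df3`, so the names `words`, `manual`, and `builtin` are illustrative only.

```python
import pandas as pd

# Toy token list standing in for the processed words.
words = "to be or not to be".split()
s = pd.Series(words)

# Manual TF: count of each word divided by the total number of words.
manual = s.value_counts() / len(s)

# Built-in equivalent: value_counts returns relative frequencies directly.
builtin = s.value_counts(normalize=True)

print(builtin)
```

Both approaches should give the same values; the manual division keeps the formula visible, while `normalize=True` is shorter.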

## Step 7 - Print the result

`print("The TF for each word in the data is:")` `print(result)`
```
The TF for each word in the data is:
to         0.040541
right      0.027027
my         0.027027
the        0.027027
...
neverre    0.013514
nephew     0.013514
mindset    0.013514
x          0.013514
a          0.013514
Length: 69, dtype: float64
```

