What is Term frequency in pandas

This recipe explains what is Term frequency in pandas
Last Updated: 25 Jul 2022

Get access to Data Science projects View all Data Science projects

MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET ALL TAGS

Recipe Objective

What is term frequency ? term frequency is nothing but the number of times a term is occuring in a document is its term frequency.

TF(A) = (Number of times term A occuring in a document) / (Total Number of terms in a Document) For e.g In a 100 words of document the term Apple is occuring 10 times then the term frequency of Apple is = 10/100 i.e 0.1

Build a Chatbot in Python from Scratch!

Step 1 - Import library and read the sample datase

import pandas as pd df = pd.read_csv("/content/drive/My Drive/Data sets/test.csv") df.head()

Here we have taken a Sample dataset from kaggle of twitter Sentimental Analysis which consist of all text data.

Step 2 - Taking only text column which is required and storing it into another DataFrame

df2 = df.iloc[:, 1:2] df2.head()

Step 3 - Import re

import re letters_only = re.sub("[^a-zA-Z]", " ", str(df2))

Now we are importing "re" for all non-letters in the data, It will search for all non letters present into the data and replace that non-letters with spaces

Step 4 - Import word_tokenizer and convert the text data into tokens

from nltk.tokenize import word_tokenize word_tokenize(letters_only)

Step 5 - Split the tokenizer data and store them in a DataFrame

letters = letters_only.split() df3 = pd.DataFrame(letters) df3.value_counts()

to         3
right      2
my         2
the        2
your       1
          ..
neverre    1
nephew     1
mindset    1
x          1
a          1
Length: 69, dtype: int64

Here we have splitted the tokens data and converted them into DataFrame Called df3, then we will see count for each word in the df3 Data like for how many times the word has been repeated.

Step 6 - Find out TF

result = df3.value_counts() / len(df3) Here by using the above formula for Term Frequency (TF), we have find out the TF for the data that we have taken and processed.

Step 7 - Print the result

print("The TF for each word in the data is:") print(result)

The TF for each word in the data is:
to         0.040541
right      0.027027
my         0.027027
the        0.027027
your       0.013514
             ...   
neverre    0.013514
nephew     0.013514
mindset    0.013514
x          0.013514
a          0.013514
Length: 69, dtype: float64

What Users are saying..

Ray han

Tech Leader | Stanford / Yale University

I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More