What is Term frequency?

What is Term frequency?

What is Term frequency?

This recipe explains what is Term frequency


Recipe Objective

What is term frequency ? term frequency is nothing but the number of times a term is occuring in a document is its term frequency.

TF(A) = (Number of times term A occuring in a document) / (Total Number of terms in a Document) For e.g In a 100 words of document the term Apple is occuring 10 times then the term frequency of Apple is = 10/100 i.e 0.1

Step 1 - Import library and read the sample datase

import pandas as pd df = pd.read_csv("/content/drive/My Drive/Data sets/test.csv") df.head()

Here we have taken a Sample dataset from kaggle of twitter Sentimental Analysis which consist of all text data.

Step 2 - Taking only text column which is required and storing it into another DataFrame

df2 = df.iloc[:, 1:2] df2.head()

Step 3 - Import re

import re letters_only = re.sub("[^a-zA-Z]", " ", str(df2))

Now we are importing "re" for all non-letters in the data, It will search for all non letters present into the data and replace that non-letters with spaces

Step 4 - Import word_tokenizer and convert the text data into tokens

from nltk.tokenize import word_tokenize word_tokenize(letters_only)

Step 5 - Split the tokenizer data and store them in a DataFrame

letters = letters_only.split() df3 = pd.DataFrame(letters) df3.value_counts()
to         3
right      2
my         2
the        2
your       1
neverre    1
nephew     1
mindset    1
x          1
a          1
Length: 69, dtype: int64

Here we have splitted the tokens data and converted them into DataFrame Called df3, then we will see count for each word in the df3 Data like for how many times the word has been repeated.

Step 6 - Find out TF

result = df3.value_counts() / len(df3) Here by using the above formula for Term Frequency (TF), we have find out the TF for the data that we have taken and processed.

Step 7 - Print the result

print("The TF for each word in the data is:") print(result)
The TF for each word in the data is:
to         0.040541
right      0.027027
my         0.027027
the        0.027027
your       0.013514
neverre    0.013514
nephew     0.013514
mindset    0.013514
x          0.013514
a          0.013514
Length: 69, dtype: float64

Relevant Projects

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Predict Macro Economic Trends using Kaggle Financial Dataset
In this machine learning project, you will uncover the predictive value in an uncertain world by using various artificial intelligence, machine learning, advanced regression and feature transformation techniques.

Loan Eligibility Prediction using Gradient Boosting Classifier
This data science in python project predicts if a loan should be given to an applicant or not. We predict if the customer is eligible for loan based on several factors like credit score and past history.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.

Sequence Classification with LSTM RNN in Python with Keras
In this project, we are going to work on Sequence to Sequence Prediction using IMDB Movie Review Dataset​ using Keras in Python.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.