What is Term frequency?

This recipe explains what term frequency is.

Recipe Objective

What is term frequency? The term frequency (TF) of a term is the number of times that term occurs in a document, divided by the total number of terms in the document.

TF(A) = (Number of times term A occurs in a document) / (Total number of terms in the document)

For example, if the term "Apple" occurs 10 times in a 100-word document, the term frequency of "Apple" is 10/100 = 0.1.
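
As a minimal sketch of the formula above, the snippet below computes TF for a short made-up sentence using only the Python standard library. The sentence and the helper name `term_frequency` are illustrative assumptions, not part of the recipe's dataset or code.

```python
from collections import Counter


def term_frequency(tokens):
    """Return {term: count / total number of tokens} for a list of tokens."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {term: count / total for term, count in counts.items()}


# Hypothetical example document (not from the recipe's dataset)
doc = "apple banana apple cherry apple banana".split()
tf = term_frequency(doc)
print(tf["apple"])  # 3 occurrences out of 6 tokens -> 0.5
```

Because every count is divided by the same total, the TF values of all terms in a document sum to 1.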

Step 1 - Import library and read the sample dataset

import pandas as pd
# Read the sample CSV file into a DataFrame and preview the first rows
df = pd.read_csv("/content/drive/My Drive/Data sets/test.csv")
df.head()

Here we use a sample Twitter sentiment analysis dataset from Kaggle, which consists of text data.

Step 2 - Take only the required text column and store it in another DataFrame

# Keep only the second column (position 1), which holds the text
df2 = df.iloc[:, 1:2]
df2.head()

Step 3 - Import re

import re
# Replace every character that is not a letter with a space
letters_only = re.sub("[^a-zA-Z]", " ", str(df2))

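Now we import "re" and use it to find every non-letter character in the data and replace it with a space. Note that str(df2) returns the printed representation of the DataFrame, so the column name ends up in the text, and pandas may truncate long cells and large frames. The counts in the following steps therefore reflect only what that printed representation shows.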
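
Because str(df2) works on the DataFrame's printed representation, the column name is included in the text and pandas may truncate the output. If the goal is to count terms across every row, one option is to join the cell values before cleaning. The sketch below is an assumption-laden illustration, not the recipe's method: `sample` is a small made-up one-column DataFrame standing in for df2, and its column name and tweets are hypothetical.

```python
import re

import pandas as pd

# Hypothetical stand-in for df2: one text column with a few made-up tweets
sample = pd.DataFrame(
    {"tweet": ["I love my phone!!", "Worst day ever... #sad", "@user thanks 4 the help"]}
)

# Join the raw cell values so that every row is included and the
# column header and index labels are left out
full_text = " ".join(sample.iloc[:, 0].astype(str))

# Same cleaning rule as in the recipe: non-letters become spaces
letters_only_all = re.sub("[^a-zA-Z]", " ", full_text)
tokens = letters_only_all.split()
print(tokens)
```

With this approach the token list contains only words from the tweets themselves, so the later counts are not affected by display settings.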

Step 4 - Import word_tokenize and convert the text data into tokens

from nltk.tokenize import word_tokenize
# word_tokenize needs NLTK's tokenizer data, installed via nltk.download()
word_tokenize(letters_only)

Step 5 - Split the text into tokens and store them in a DataFrame

letters = letters_only.split()
df3 = pd.DataFrame(letters)
df3.value_counts()
to         3
right      2
my         2
the        2
your       1
          ..
neverre    1
nephew     1
mindset    1
x          1
a          1
Length: 69, dtype: int64

Here we split the cleaned text on whitespace and converted the resulting tokens into a DataFrame called df3. Note that this step splits letters_only directly rather than reusing the word_tokenize output from Step 4. We then count how many times each word occurs in df3.

Step 6 - Compute the TF

result = df3.value_counts() / len(df3)

Here, using the formula for term frequency (TF) given above, we divide the count of each word by the total number of tokens to get the TF of each word in the processed data.
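
As an aside, pandas can perform the division by the total count itself: `value_counts(normalize=True)` returns relative frequencies. The sketch below uses a small made-up token list (an assumption for illustration, not the recipe's data) to show that both approaches give the same values.

```python
import pandas as pd

# Hypothetical token list standing in for the recipe's tokens
tokens = ["to", "my", "to", "right", "my", "to", "the", "a"]
s = pd.Series(tokens)

# Manual TF: count of each term divided by the total number of tokens
tf_manual = s.value_counts() / len(s)

# Same result using pandas' built-in normalization
tf_normalized = s.value_counts(normalize=True)

print(tf_manual["to"])      # 3 of 8 tokens -> 0.375
print(tf_normalized["to"])  # 0.375
```

Either form works; the manual division mirrors the TF formula more directly, while `normalize=True` is shorter.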

Step 7 - Print the result

print("The TF for each word in the data is:")
print(result)
The TF for each word in the data is:
to         0.040541
right      0.027027
my         0.027027
the        0.027027
your       0.013514
             ...   
neverre    0.013514
nephew     0.013514
mindset    0.013514
x          0.013514
a          0.013514
Length: 69, dtype: float64
