What is Information Value in modelling

This recipe explains what Information Value is and how it is used in modelling.

Recipe Objective

What is Information Value in modelling?

Logistic regression is a supervised learning model used for classification. It applies when the independent variables (x) are continuous or categorical and the dependent variable (y) is categorical. Information Value (IV) is a widely used technique for selecting the important variables in such a predictive model: it ranks the independent variables by their predictive power.

For a variable whose values are grouped into bins, the formulas are:

WOE = ln(% of non-events / % of events)
IV = sum over all bins of (% of non-events - % of events) * WOE

where WOE is the Weight of Evidence of a bin, and the percentages are the shares of all non-events (or events) that fall into that bin.

The importance of a variable is judged by rules of thumb such as:

IV less than 0.02: not useful for prediction
IV 0.02 to 0.1: weak predictive power
IV 0.1 to 0.3: medium predictive power
IV 0.3 to 0.5: strong predictive power
IV greater than 0.5: suspicious predictive power

This recipe demonstrates an example of Information Value in modelling in R.
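To make the formulas concrete, here is a minimal base R sketch that computes WOE and IV by hand. The bin names and counts are hypothetical and are not taken from the dataset used later in this recipe. Some implementations use the inverse ratio inside the logarithm, which flips the sign of WOE but leaves IV unchanged.

```r
# Hypothetical event / non-event counts for one variable split into three bins
bins <- data.frame(
  bin        = c("low", "mid", "high"),
  events     = c(10, 20, 70),
  non_events = c(60, 30, 10)
)

# Share of all events (and of all non-events) that falls into each bin
pct_events     <- bins$events / sum(bins$events)
pct_non_events <- bins$non_events / sum(bins$non_events)

# Weight of Evidence per bin, using the formula given in the text
bins$WOE <- log(pct_non_events / pct_events)

# Each bin's contribution to IV; the variable's IV is the sum over bins
bins$IV_part <- (pct_non_events - pct_events) * bins$WOE
IV_total <- sum(bins$IV_part)

print(bins)
print(IV_total)
```

Every bin contributes a non-negative amount, because the difference in shares and the WOE always have the same sign.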

Step 1 - Install necessary libraries

install.packages("Information") # installs the package (needed only once)
library(Information)

Step 2 - Read a dataset

data <- read.csv("https://storage.googleapis.com/dimensionless/Analytics/quality.csv") # reads the dataset
head(data)
summary(data)

Here, the dependent variable (y) is PoorCare, coded as 0 / 1: 1 means the patient received poor care and 0 means the patient received good care. All the remaining variables are independent variables (x1, x2, ...).
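Since the calculation relies on a 0 / 1 target, a quick sanity check before computing IV can be useful. The sketch below uses a hypothetical helper, is_binary_target, and a toy vector standing in for data$PoorCare; neither is part of the Information package or the recipe's dataset.

```r
# Hypothetical helper (not part of the Information package):
# TRUE only for a numeric vector without NAs that contains both 0 and 1
is_binary_target <- function(y) {
  is.numeric(y) && !anyNA(y) && all(y %in% c(0, 1)) && length(unique(y)) == 2
}

# Toy values standing in for data$PoorCare
poor_care <- c(0, 0, 1, 0, 1, 0)

print(is_binary_target(poor_care))            # numeric 0 / 1 target
print(is_binary_target(c("good", "poor")))    # character labels are rejected

# Share of events (1) and non-events (0) in the target
print(prop.table(table(poor_care)))
```

If the check fails on real data, recoding the target to numeric 0 / 1 first is the usual remedy.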

Step 3 - Compute the Information Value

IV <- create_infotables(data=data, y="PoorCare", bins=10, parallel=FALSE) # takes the data and the dependent variable as input
IV_Value <- data.frame(IV$Summary) # summary table of IV values for all the independent variables
IV_Value

Output of the code (IV values for all independent variables):

               Variable         IV
5             Narcotics 1.03685620
4          OfficeVisits 0.98386246
13    AcuteDrugGapSmall 0.90153609
8           TotalVisits 0.80202573
9         ProviderCount 0.70094253
10        MedicalClaims 0.54688735
12 StartedOnCombination 0.40301131
11           ClaimLines 0.37714431
1              MemberID 0.28291661
7                  Pain 0.17571510
6  DaysSinceLastERVisit 0.14998611
3              ERVisits 0.11009380
2         InpatientDays 0.05148669
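The rule-of-thumb thresholds from the introduction can be applied to such a summary with base R's cut(). This is a minimal sketch: the three IV values are copied from the output above, while the Strength column and its labels are names chosen here for illustration.

```r
# A few rows copied from the IV summary printed above
iv_summary <- data.frame(
  Variable = c("Narcotics", "Pain", "InpatientDays"),
  IV       = c(1.03685620, 0.17571510, 0.05148669)
)

# Map each IV to its rule-of-thumb category.
# right = FALSE makes the intervals [lower, upper), so 0.02 counts as "Weak".
iv_summary$Strength <- cut(
  iv_summary$IV,
  breaks = c(-Inf, 0.02, 0.1, 0.3, 0.5, Inf),
  labels = c("Not useful", "Weak", "Medium", "Strong", "Suspicious"),
  right  = FALSE
)

print(iv_summary)
```

With the real result, the same cut() call could be applied to the IV column of IV_Value directly.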

print(IV$Tables$Narcotics, row.names=FALSE) # prints the WOE and IV values for each bin of a chosen independent variable (here, Narcotics)

Output of the code (WOE and IV for one independent variable):
 Narcotics  N    Percent         WOE        IV
     [0,0] 49 0.37404580 -0.88098073 0.2263745
     [1,1] 26 0.19847328 -0.61628818 0.2900233
     [2,2] 16 0.12213740  0.83714549 0.3907189
     [3,3] 13 0.09923664 -0.61628818 0.4225433
    [4,10] 12 0.09160305 -0.01015237 0.4225527
   [11,59] 15 0.11450382  2.10006083 1.0368562
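In this table the IV column appears to be a running total across bins, since the last row (1.0368562) matches the summary IV reported for Narcotics. Under that assumption, the sketch below recovers each bin's own contribution by differencing the column. The values are copied from the output above; the IV_contribution column name is chosen here for illustration.

```r
# Values copied from the Narcotics table printed above
narcotics <- data.frame(
  bin = c("[0,0]", "[1,1]", "[2,2]", "[3,3]", "[4,10]", "[11,59]"),
  WOE = c(-0.88098073, -0.61628818, 0.83714549, -0.61628818, -0.01015237, 2.10006083),
  IV  = c(0.2263745, 0.2900233, 0.3907189, 0.4225433, 0.4225527, 1.0368562)
)

# Assuming IV is cumulative, the per-bin contribution is the row-to-row difference
narcotics$IV_contribution <- diff(c(0, narcotics$IV))

print(narcotics)

# Bin that contributes most to the variable's total IV
print(narcotics$bin[which.max(narcotics$IV_contribution)])
```

Bins with a WOE close to zero, such as [4,10], add almost nothing to the total, while bins with a large absolute WOE dominate it.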
