What is Information Value in modelling

This recipe explains what Information Value is in modelling.
Last Updated: 01 Aug 2022

Recipe Objective

What is Information Value in modelling?

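Logistic regression is a supervised learning model for classification. It is used when the dependent variable (y) is categorical, while the independent variables (x) can be continuous or categorical. Information Value (IV) is a technique for selecting the important variables in such a predictive model: it ranks each independent variable by how well it separates the two classes of a binary dependent variable.

IV is built from the Weight of Evidence (WOE). Each independent variable is split into bins (groups of values), and for every bin i:

$$\mathrm{WOE}_i = \ln\left(\frac{\%\,\text{non-events}_i}{\%\,\text{events}_i}\right)$$

$$\mathrm{IV} = \sum_i \left(\%\,\text{non-events}_i - \%\,\text{events}_i\right) \times \mathrm{WOE}_i$$

Here %events_i is the share of all events (y = 1) that fall in bin i, and %non-events_i is the share of all non-events (y = 0) that fall in bin i. Each term of the sum is non-negative, because the difference and the logarithm always have the same sign. The Information package used below puts events in the numerator, WOE = ln(%events / %non-events), so its WOE values have the opposite sign; the IV is the same under either convention.

The importance of a variable is usually judged with these rules of thumb:
- IV less than 0.02: not useful for prediction
- IV from 0.02 to 0.1: weak predictive power
- IV from 0.1 to 0.3: medium predictive power
- IV from 0.3 to 0.5: strong predictive power
- IV greater than 0.5: suspicious predictive power

This recipe demonstrates how to compute Information Value in R.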
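To make the two formulas concrete, here is a minimal base-R sketch written for illustration, not taken from the Information package. `woe_iv()` works from per-bin counts, and `iv_numeric()` is a hypothetical helper that first cuts a numeric variable into quantile bins. The quantile binning and the 0.5 added to every count (to avoid ln(0) in empty cells) are assumptions of this sketch, so its results will not match `create_infotables()` exactly.

```r
# Minimal base-R sketch of WOE and IV (not the Information package's code).
# Convention from the formulas above: WOE = ln(% of non-events / % of events).

# WOE and IV from per-bin counts of non-events (y = 0) and events (y = 1)
woe_iv <- function(non_events, events) {
  stopifnot(length(non_events) == length(events),
            all(non_events > 0), all(events > 0))  # zero counts would give ln(0)
  pct_non_events <- non_events / sum(non_events)
  pct_events     <- events / sum(events)
  woe    <- log(pct_non_events / pct_events)
  iv_bin <- (pct_non_events - pct_events) * woe    # every term is >= 0
  list(table = data.frame(non_events, events, woe, iv_bin), iv = sum(iv_bin))
}

# Hypothetical helper: split a numeric predictor into quantile bins, then
# apply woe_iv(). Assumption: 0.5 is added to every count so that bins with
# no events or no non-events stay finite.
iv_numeric <- function(x, y, bins = 10) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = bins + 1),
                            na.rm = TRUE, names = FALSE))
  if (length(breaks) < 2) return(0)                  # constant x carries no information
  bin <- droplevels(cut(x, breaks = breaks, include.lowest = TRUE))
  counts <- table(bin, factor(y, levels = c(0, 1)))  # rows: bins, columns: y = 0 / y = 1
  woe_iv(non_events = counts[, "0"] + 0.5, events = counts[, "1"] + 0.5)$iv
}

# Hypothetical counts for a variable split into three bins
res <- woe_iv(non_events = c(50, 30, 20), events = c(10, 20, 20))
res$table
res$iv  # about 0.442

# Synthetic data
iv_numeric(1:100, as.integer(1:100 > 50))  # well above 0.5: y is fully determined by x
iv_numeric(1:100, rep(c(0, 1), 50))        # 0: every bin holds equal shares of both classes
```

Swapping the two count vectors flips the sign of every WOE but leaves the IV unchanged, which is why the choice of numerator does not affect how variables are ranked.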

Step 1 - Install necessary libraries

install.packages("Information") # install once
library(Information)

Step 2 - Read a dataset

data <- read.csv("https://storage.googleapis.com/dimensionless/Analytics/quality.csv") # reads the dataset
head(data)    # first rows of the data
summary(data) # summary statistics for each column

Here, the dependent variable (y) is PoorCare, coded 0/1: 1 means the patient received poor care and 0 means the patient received good care. All the remaining columns are independent variables (x1, x2, ...).
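IV, as defined above, needs a binary dependent variable. A quick check along these lines confirms that PoorCare is coded 0/1 and shows the class balance before any IV is computed. `check_binary_target()` is a hypothetical helper written for this sketch, not part of the Information package.

```r
# Hypothetical helper: stop unless column y is coded 0/1 with no missing
# values, then print the share of each class and return the counts invisibly
check_binary_target <- function(df, y) {
  stopifnot(y %in% names(df))
  values <- df[[y]]
  if (anyNA(values) || !all(values %in% c(0, 1))) {
    stop(sprintf("'%s' must be coded 0/1 with no missing values", y))
  }
  counts <- table(factor(values, levels = c(0, 1)))
  print(prop.table(counts))  # share of good care (0) vs poor care (1)
  invisible(counts)
}

# Toy data; with the quality data loaded above, call
# check_binary_target(data, "PoorCare")
check_binary_target(data.frame(PoorCare = c(0, 0, 1, 0)), "PoorCare")
```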

Step 3 - Compute the Information Value

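# create_infotables() takes the data and the name of the dependent variable;
# bins = 10 sets the number of bins for numeric variables (variables with many
# tied values, such as Narcotics below, can end up with fewer);
# parallel = FALSE runs without parallel processing
IV <- create_infotables(data = data, y = "PoorCare", bins = 10, parallel = FALSE)
IV_Value <- data.frame(IV$Summary) # IV of every independent variable, highest first
IV_Value

Output of the code (IV values for all independent variables):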

               Variable         IV
5             Narcotics 1.03685620
4          OfficeVisits 0.98386246
13    AcuteDrugGapSmall 0.90153609
8           TotalVisits 0.80202573
9         ProviderCount 0.70094253
10        MedicalClaims 0.54688735
12 StartedOnCombination 0.40301131
11           ClaimLines 0.37714431
1              MemberID 0.28291661
7                  Pain 0.17571510
6  DaysSinceLastERVisit 0.14998611
3              ERVisits 0.11009380
2         InpatientDays 0.05148669
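The rules of thumb quoted earlier can be applied to this summary mechanically. The sketch below uses a hypothetical helper, `iv_strength()`, and retypes the IV values printed above so it runs on its own; in a live session `IV_Value` would be used directly. It also drops MemberID: it is a patient identifier rather than a measurement, so its medium IV most likely reflects how the IDs happen to line up with PoorCare in this sample. Six variables land in the suspicious band above 0.5, which is a cue to examine them more closely rather than accept them at face value.

```r
# Rule-of-thumb bands from this recipe; values exactly on a boundary go to the
# higher band (an assumption, since the quoted ranges share their endpoints)
iv_strength <- function(iv) {
  cut(iv,
      breaks = c(-Inf, 0.02, 0.1, 0.3, 0.5, Inf),
      labels = c("Not useful", "Weak", "Medium", "Strong", "Suspicious"),
      right = FALSE)
}

# IV values copied from the summary output above; in a live session,
# use IV_Value instead of retyping them
iv_summary <- data.frame(
  Variable = c("Narcotics", "OfficeVisits", "AcuteDrugGapSmall", "TotalVisits",
               "ProviderCount", "MedicalClaims", "StartedOnCombination",
               "ClaimLines", "MemberID", "Pain", "DaysSinceLastERVisit",
               "ERVisits", "InpatientDays"),
  IV = c(1.03685620, 0.98386246, 0.90153609, 0.80202573, 0.70094253,
         0.54688735, 0.40301131, 0.37714431, 0.28291661, 0.17571510,
         0.14998611, 0.11009380, 0.05148669)
)
iv_summary$Strength <- iv_strength(iv_summary$IV)
print(iv_summary)

# Keep variables with at least weak predictive power, but drop the
# MemberID identifier, which is not a meaningful predictor
candidates <- subset(iv_summary, IV >= 0.02 & Variable != "MemberID")
candidates$Variable
```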

print(IV$Tables$Narcotics, row.names = FALSE) # bins, counts, WOE and IV for one independent variable (here Narcotics)

Output of the code (WOE and cumulative IV for each bin of Narcotics; the last IV value is the variable's total IV):
 Narcotics  N    Percent         WOE        IV
     [0,0] 49 0.37404580 -0.88098073 0.2263745
     [1,1] 26 0.19847328 -0.61628818 0.2900233
     [2,2] 16 0.12213740  0.83714549 0.3907189
     [3,3] 13 0.09923664 -0.61628818 0.4225433
    [4,10] 12 0.09160305 -0.01015237 0.4225527
   [11,59] 15 0.11450382  2.10006083 1.0368562
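Because the IV column in this table is a running total (its last value, 1.0368562, equals the IV of Narcotics in the summary), the contribution of each bin is the difference between consecutive rows. The sketch below recovers those contributions from the printed values. Each should be non-negative, since the percentage difference and its WOE always share a sign. With the package's convention (events in the numerator), the large positive WOE for the [11,59] bin means poor-care cases are over-represented among patients with Narcotics values of 11 to 59, and that bin alone contributes about 0.61 of the total.

```r
# Values copied from the Narcotics table above
narcotics <- data.frame(
  Narcotics = c("[0,0]", "[1,1]", "[2,2]", "[3,3]", "[4,10]", "[11,59]"),
  N   = c(49, 26, 16, 13, 12, 15),
  WOE = c(-0.88098073, -0.61628818, 0.83714549, -0.61628818, -0.01015237, 2.10006083),
  IV  = c(0.2263745, 0.2900233, 0.3907189, 0.4225433, 0.4225527, 1.0368562)
)

# The IV column is cumulative, so per-bin contributions are its first differences
narcotics$IV_bin <- diff(c(0, narcotics$IV))
print(narcotics, row.names = FALSE)

# Adding the contributions back up gives the variable's total IV
sum(narcotics$IV_bin)
```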
