Multiple Imputation with Chained Equations in the StatsModels library

This recipe explains what Multiple Imputation with Chained Equations (MICE) is in the StatsModels library.
Last Updated: 16 Jun 2022


Recipe Objective - What is Multiple Imputation with Chained Equations (MICE) in the StatsModels library?

The MICE module allows most statistical models in StatsModels to be fit to a dataset that has missing values in the independent and/or dependent variables, and it provides rigorous standard errors for the fitted parameters. The basic idea is to treat each variable with missing values as the dependent variable in a regression, with some or all of the remaining variables as its predictors. The MICE procedure cycles through these models, fitting each in turn, and then uses a procedure called predictive mean matching (PMM) to generate random draws from the predictive distributions determined by the fitted models. These random draws become the imputed values for one imputed data set.

By default, each variable with missing values is modeled using a linear regression with main effects for all other variables in the dataset. Note that the PMM procedure preserves the domain of each variable, even when the imputation model is linear. For example, if all observed values of a particular variable are positive, then all imputed values for that variable will also be positive. The user also has the option to specify the model used to generate the imputed values for each variable.
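To make the matching step more concrete, here is a minimal sketch of predictive mean matching for a single variable with one predictor, written in plain Python. It is a simplified illustration, not the StatsModels implementation: the helper names `fit_line` and `pmm_impute`, the toy data, and the choice of `k=3` donors are all assumptions made for this example, and the sketch skips the step where the library accounts for uncertainty in the fitted parameters.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(xs, ys, k=3, rng=None):
    """Fill the None entries of ys by predictive mean matching on xs."""
    rng = rng or random.Random()
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in observed],
                                [y for _, y in observed])
    # Predicted mean of every observed case, paired with its real value.
    donors = [(intercept + slope * x, y) for x, y in observed]
    filled = []
    for x, y in zip(xs, ys):
        if y is not None:
            filled.append(y)
            continue
        target = intercept + slope * x
        # The k observed cases whose predicted mean is closest to the target.
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        # Borrow the observed value of one randomly chosen donor.
        filled.append(rng.choice(nearest)[1])
    return filled


# Toy data (hypothetical): two values of ys are missing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0.4, 0.9, None, 2.1, 2.4, None, 3.6, 3.9]
completed = pmm_impute(xs, ys, k=3, rng=random.Random(0))
observed_values = {y for y in ys if y is not None}
print(completed)
```

Because every imputed value is copied from an observed case, the completed variable can never leave the range of the observed data, which is why a linear imputation model still yields only positive imputations when all observed values are positive.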


MICE:

Multiple Imputation with Chained Equations.

Parameters:

model_formula
The model formula to be fit to the imputed data sets. This formula is for the ‘analysis model.’

model_class
The model to be fit to the imputed data sets. This model class is for the ‘analysis model.’

data
A MICEData object containing the data set for which missing values will be imputed.

Example:

# Importing libraries
import statsmodels.api as sm
from statsmodels.imputation.mice import MICE, MICEData

# Loading the flchain dataset from the R survival package (downloaded on first use)
df = sm.datasets.get_rdataset("flchain", "survival").data

# Wrapping the columns of interest in a MICEData object
imp_data = MICEData(df[["creatinine", "age", "futime"]])

# Setting up the analysis model and fitting it with MICE
mice_model = MICE("creatinine ~ age + futime", sm.OLS, imp_data)
results = mice_model.fit()

# Model summary
print(results.summary())

Output:

Method:              MICE        Sample size:      7874
Model:               OLS         Scale             0.16
Dependent variable:  creatinine  Num. imputations  10

           Coef.    Std.Err.  t         P>|t|   [0.025   0.975]   FMI
Intercept  1.1318   0.0442    25.5785   0.0000  1.0450   1.2185   0.3274
age        0.0020   0.0005    3.8530    0.0001  0.0010   0.0031   0.2427
futime     -0.0000  0.0000    -11.8670  0.0000  -0.0001  -0.0000  0.2832

In this way, we can perform MICE in the StatsModels library.
