HANDS-ON LAB

Fuel Consumption Prediction Machine Learning Project

Problem Statement

Predict the fuel consumption of a vehicle based on its make, engine size, fuel type, number of cylinders, and other attributes.

Dataset

The datasets provide model-specific fuel consumption ratings and estimated carbon dioxide emissions for new light-duty vehicles for retail sale in Canada. The complete data dictionary can be found here.

Kindly download the data from here.

Tasks

  1. Hypothesis-based EDA:

    • Which brand (or make) has the largest share of vehicles with fuel consumption below 8.0 L/100 km?

    • Create a scatter plot of engine size against fuel consumption and check how strongly the two are correlated.

    • Create a pairplot of all the numerical variables and note down your observations and insights.

  2. Create a new column named “Transmission type” from the TRANSMISSION column by separating the letters (the transmission type) from the digits (the number of gears). (Hint: use regex.)

  3. Build a linear regression model with only the numerical features using the Statsmodels library. Note down the R² and adjusted R².

  4. One-hot encode the categorical variables and build a linear regression model using the Statsmodels library. Note down the R² and adjusted R², and compare them with the previous model.
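The regex hint in the “Transmission type” task can be sketched with pandas' `str.extract`. This is a minimal sketch, not the lab's reference solution: the sample values are hypothetical, and it assumes TRANSMISSION codes are a run of letters optionally followed by a gear count (e.g. "AS8"). The extra "Gears" column is also a hypothetical addition.

```python
import pandas as pd

# Hypothetical sample rows: the column name "TRANSMISSION" comes from the task,
# but these values are assumed examples of letter-prefix + gear-count codes.
df = pd.DataFrame({"TRANSMISSION": ["A6", "AS8", "M6", "AV", "AM7"]})

# Group 1 captures the leading letters, group 2 the (possibly empty) trailing digits.
parts = df["TRANSMISSION"].str.extract(r"^([A-Za-z]+)(\d*)$")

df["Transmission type"] = parts[0]
# Hypothetical extra column: empty digit strings (e.g. "AV") coerce to missing values.
df["Gears"] = pd.to_numeric(parts[1], errors="coerce").astype("Int64")
print(df)
```

Anchoring the pattern with `^` and `$` means a code that does not fit the letters-then-digits shape yields missing values instead of a silently wrong split, which makes unexpected codes easy to find with `isna()`.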

 

Discover correlations and build regression models to predict fuel consumption accurately

 

FAQs

1) Which brand has the highest share of vehicles with fuel consumption below 8.0?

For each brand, calculate the proportion of its vehicles with fuel consumption below 8.0, then compare these proportions; the brand with the largest proportion has the highest share.
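The per-brand proportion described above can be sketched in pandas by averaging a boolean flag within each group. The rows below are hypothetical, and the "MAKE" and "FUEL CONSUMPTION" column names are assumptions about the dataset's headers.

```python
import pandas as pd

# Hypothetical sample data; replace with the real dataset and its column names.
df = pd.DataFrame({
    "MAKE": ["HONDA", "HONDA", "HONDA", "FORD", "FORD",
             "TOYOTA", "TOYOTA", "TOYOTA", "TOYOTA"],
    "FUEL CONSUMPTION": [7.2, 7.9, 9.1, 11.5, 7.5, 6.8, 7.1, 8.4, 5.9],
})

# True for each vehicle under the threshold; the mean of a boolean is the share of True.
below_8 = df["FUEL CONSUMPTION"] < 8.0
share_by_make = below_8.groupby(df["MAKE"]).mean().sort_values(ascending=False)
print(share_by_make)
print("Highest share:", share_by_make.idxmax())

# Alternative reading: each brand's share of all vehicles below the threshold.
share_of_low = df.loc[below_8, "MAKE"].value_counts(normalize=True)
print(share_of_low)
```

The two readings can rank brands differently: the first favours brands whose lineup is mostly efficient, while the second favours brands that simply sell many models, so state which one you report.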

 

2) How can I visualize the correlation between engine size and fuel consumption?

Create a scatter plot with engine size on the x-axis and fuel consumption on the y-axis. An upward, roughly linear cloud of points indicates a positive correlation, and the Pearson correlation coefficient quantifies its strength.
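A minimal matplotlib sketch of that scatter plot, with the Pearson coefficient computed by pandas' `Series.corr`. The data points and the "ENGINE SIZE" / "FUEL CONSUMPTION" column names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; replace with the real dataset and its column names.
df = pd.DataFrame({
    "ENGINE SIZE": [1.5, 2.0, 2.5, 3.0, 3.5, 5.0, 6.2],
    "FUEL CONSUMPTION": [6.9, 7.8, 8.6, 10.1, 10.9, 13.8, 15.5],
})

# Pearson correlation (the default method): +1 is a perfect increasing linear relation.
r = df["ENGINE SIZE"].corr(df["FUEL CONSUMPTION"])

fig, ax = plt.subplots()
ax.scatter(df["ENGINE SIZE"], df["FUEL CONSUMPTION"], alpha=0.6)
ax.set_xlabel("Engine size")
ax.set_ylabel("Fuel consumption")
ax.set_title(f"Engine size vs fuel consumption (Pearson r = {r:.2f})")
print(f"Pearson r = {r:.3f}")
# In a notebook the figure renders automatically; in a script, call plt.show().
```

Putting r in the title keeps the visual pattern and its numeric strength together; `alpha` helps when many vehicles share the same engine size and points overlap.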

 

3) What insights can I gain from the pairplot of numerical variables?

A pairplot draws a scatter plot for every pair of numerical variables, with each variable's distribution on the diagonal, so you can spot relationships, correlations, skewed distributions, and outliers at a glance and understand the data better before modelling.