How to Convert Dict to Array in Python using DictVectorizer?

Learn the efficient way to convert a Python dict to array using the powerful DictVectorizer. Explore step-by-step instructions & examples by ProjectPro

Dictionaries are a common way to hold records in Python, but most machine learning and numerical routines expect arrays. The DictVectorizer class from the scikit-learn library bridges that gap. This DictVectorizer tutorial shows how to convert a list of dictionaries to an array in Python, and also covers converting dictionary keys to an array and an array back to a dictionary.

What is DictVectorizer?

DictVectorizer is a scikit-learn class that transforms lists of feature dictionaries into NumPy arrays or SciPy sparse matrices suitable for machine learning algorithms. Numeric values are copied into a column named after their key, while each distinct string value of a categorical feature becomes its own binary column. By default the result is a sparse matrix.

DictVectorizer Tutorial: Converting Dict to Array in Python

DictVectorizer transforms a list of dictionaries into a numeric array, making the data easier to use with machine learning algorithms that require numerical input. The step-by-step guide below shows how to convert a dict to an array in Python using DictVectorizer:

Step 1: Importing Required Libraries

from sklearn.feature_extraction import DictVectorizer

Step 2: Creating a Sample Dictionary

data = [{'feature1': 10, 'feature2': 20},
        {'feature1': 15, 'feature2': 25},
        {'feature1': 18, 'feature2': 30}]

Step 3: Create a DictVectorizer instance

vectorizer = DictVectorizer(sparse=False)

Step 4: Fit and transform the data

array_representation = vectorizer.fit_transform(data)

Step 5: Examining the Result

print(array_representation)


In the above code, each dictionary represents a data point, and DictVectorizer converts it into a numerical array. The sparse=False parameter ensures that the output is a dense array instead of a sparse matrix.
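
To make the effect of the sparse parameter concrete, the sketch below runs the same data through DictVectorizer twice, once with the default setting and once with sparse=False. The variable names sparse_result and dense_result are illustrative and do not come from the original steps.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction import DictVectorizer

data = [{'feature1': 10, 'feature2': 20},
        {'feature1': 15, 'feature2': 25},
        {'feature1': 18, 'feature2': 30}]

# Default (sparse=True): fit_transform returns a SciPy sparse matrix
sparse_result = DictVectorizer().fit_transform(data)
print(issparse(sparse_result))  # True

# sparse=False: fit_transform returns a dense NumPy array
dense_result = DictVectorizer(sparse=False).fit_transform(data)
print(dense_result.tolist())  # [[10.0, 20.0], [15.0, 25.0], [18.0, 30.0]]

# The sparse matrix holds the same values once converted with toarray()
same_values = bool((sparse_result.toarray() == dense_result).all())
print(same_values)  # True
```

A dense array is convenient for small examples like this one; the sparse default tends to matter more when many columns are mostly zero.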

How to Convert Python Dict Keys to Array?

Sometimes, you may only be interested in extracting the keys from a dictionary and converting them into an array. This can be achieved using the keys() method and converting it to a list. 

Here's how you can do it:


In this example, keys_array will contain the keys of the dictionary as elements in the array.
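
The keys conversion described above can be sketched as follows. The dictionary sample_dict and the name keys_ndarray are hypothetical, chosen only for illustration; the NumPy step is an optional addition for code that expects an ndarray rather than a list.

```python
import numpy as np

# Hypothetical sample dictionary (an assumption for this sketch)
sample_dict = {'feature1': 10, 'feature2': 20, 'feature3': 30}

# keys() returns a view; list() materializes it in insertion order
keys_array = list(sample_dict.keys())
print(keys_array)  # ['feature1', 'feature2', 'feature3']

# Optional: wrap the list in a NumPy array if downstream code expects one
keys_ndarray = np.array(keys_array)
print(keys_ndarray.shape)  # (3,)
```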

How to Convert Python Array to Dict?

Converting an array back to a dictionary involves creating a new dictionary and populating it with the array elements. Check out the example below:


The above example involves initializing a new dictionary where each key is taken from the array, and the values are set to None. You can then update the values based on your requirements.
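
A minimal sketch of that conversion, assuming a hypothetical keys_array of feature names. The second half, which pairs keys with a parallel list of values using zip, is an added variant and not part of the original description.

```python
# Hypothetical array of keys (an assumption for this sketch)
keys_array = ['feature1', 'feature2', 'feature3']

# Build a dictionary whose keys come from the array, all values set to None
new_dict = dict.fromkeys(keys_array)
print(new_dict)  # {'feature1': None, 'feature2': None, 'feature3': None}

# Update individual values as needed
new_dict['feature1'] = 10

# Variant: if the values already exist in a parallel sequence, pair them up
values = [10, 20, 30]
paired_dict = dict(zip(keys_array, values))
print(paired_dict)  # {'feature1': 10, 'feature2': 20, 'feature3': 30}
```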

How to Convert a Dictionary into a Matrix? 

Step 1 - Import the library

from sklearn.feature_extraction import DictVectorizer

We only need to import DictVectorizer.

Step 2 - Setting up the Data

We have created a list of dictionaries with three features named 'Pen', 'Pencil' and 'Eraser'. Each dictionary is one sample, and not every sample contains all three features.

    data_dict = [{'Pen': 2, 'Pencil': 4},
                 {'Pen': 4, 'Pencil': 3},
                 {'Pen': 1, 'Eraser': 2},
                 {'Pen': 2, 'Eraser': 2}]
    print(data_dict)

Step 3 - Converting Dictionary into Matrix

Here we use DictVectorizer to convert the list of dictionaries into a matrix in which each column represents a feature and each row represents one sample. Features absent from a sample are set to 0. Finally, we print the feature names using get_feature_names_out (the older get_feature_names method was removed in scikit-learn 1.2).

    dictvectorizer = DictVectorizer(sparse=False)
    features = dictvectorizer.fit_transform(data_dict)
    print(features)
    # get_feature_names_out returns a NumPy array; tolist() gives a plain list
    feature_name = dictvectorizer.get_feature_names_out().tolist()
    print(feature_name)

The output is:

[{'Pen': 2, 'Pencil': 4}, {'Pen': 4, 'Pencil': 3}, {'Pen': 1, 'Eraser': 2}, {'Pen': 2, 'Eraser': 2}]

[[0. 2. 4.]
 [0. 4. 3.]
 [2. 1. 0.]
 [2. 2. 0.]]

['Eraser', 'Pen', 'Pencil']
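
Going the other way, from matrix rows back to dictionaries, can be sketched with DictVectorizer's inverse_transform method. The variable name restored is illustrative. Note that zero entries are dropped on the way back, so a feature that was genuinely 0 cannot be told apart from one that was absent.

```python
from sklearn.feature_extraction import DictVectorizer

data_dict = [{'Pen': 2, 'Pencil': 4},
             {'Pen': 4, 'Pencil': 3},
             {'Pen': 1, 'Eraser': 2},
             {'Pen': 2, 'Eraser': 2}]

dictvectorizer = DictVectorizer(sparse=False)
features = dictvectorizer.fit_transform(data_dict)

# feature_names_ lists the column labels in column order
print(dictvectorizer.feature_names_)  # ['Eraser', 'Pen', 'Pencil']

# inverse_transform rebuilds one dictionary per row, keeping nonzero entries
restored = dictvectorizer.inverse_transform(features)
for row in restored:
    print(row)
```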

DictVectorizer Examples and Use Cases

Here are a few examples to better understand how to use DictVectorizer and why it can be beneficial.

  • Handling Categorical Data

DictVectorizer automatically handles categorical data by converting it into binary features. For example, if a 'city' key holds string values, the vectorizer creates one binary column per distinct city.

  • Dealing with Missing Values

If a key is missing from one of the dictionaries, DictVectorizer fills the corresponding entry in the resulting array with zero.

For example, in records that omit 'city' or 'age', the corresponding columns are filled with zeros.
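
Both behaviors can be seen in one small sketch. The records below are hypothetical (the list name people and its values are assumptions for illustration): 'city' holds strings and 'age' holds numbers, and two records each omit one key.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: 'city' is categorical, 'age' is numeric
people = [{'city': 'London', 'age': 30},
          {'city': 'Paris'},   # 'age' is missing
          {'age': 25}]         # 'city' is missing

vectorizer = DictVectorizer(sparse=False)
encoded = vectorizer.fit_transform(people)

# String values become one binary column per category, named key=value
print(vectorizer.feature_names_)  # ['age', 'city=London', 'city=Paris']

# Missing keys show up as zeros in the corresponding columns
print(encoded.tolist())  # [[30.0, 1.0, 0.0], [0.0, 0.0, 1.0], [25.0, 0.0, 0.0]]
```

Because a missing 'age' is encoded as 0, it looks the same as a real age of 0; if that distinction matters, the gaps may need to be filled before vectorizing.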

DictVectorizer vs. Other Vectorizers 

Let's now briefly compare DictVectorizer with other vectorization tools such as OneHotEncoder and CountVectorizer.

DictVectorizer vs. OneHotEncoder 

Both handle categorical data, but DictVectorizer accepts mixed data in a single pass: string values are one-hot encoded and numeric values pass through unchanged. OneHotEncoder is designed specifically for categorical variables; it treats every input column as categorical, so numeric features usually need to be handled separately.
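
A small side-by-side sketch of that difference, using hypothetical weather records (the names records, cities, dv and ohe are assumptions for illustration). OneHotEncoder is given only the categorical column here, since it would treat a numeric column as categories too.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical records mixing a categorical and a numeric feature
records = [{'city': 'London', 'temp': 12},
           {'city': 'Paris', 'temp': 18}]

# DictVectorizer: strings are one-hot encoded, numbers pass through
dv = DictVectorizer(sparse=False)
dv_out = dv.fit_transform(records)
print(dv.feature_names_)  # ['city=London', 'city=Paris', 'temp']
print(dv_out.tolist())    # [[1.0, 0.0, 12.0], [0.0, 1.0, 18.0]]

# OneHotEncoder: expects a 2D array-like of categorical columns
cities = [['London'], ['Paris']]
ohe = OneHotEncoder()
ohe_out = ohe.fit_transform(cities).toarray()
print(ohe_out.tolist())   # [[1.0, 0.0], [0.0, 1.0]]
```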

DictVectorizer vs. CountVectorizer

CountVectorizer converts raw text documents into token-count features, whereas DictVectorizer is more general-purpose and works with dictionaries of named features, whether their values are strings or numbers.
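
The contrast can be sketched with two tiny hypothetical documents (docs and token_counts are assumptions for illustration). CountVectorizer tokenizes the raw text itself, while DictVectorizer needs the counts to be supplied as dictionaries.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer

# CountVectorizer: takes raw text, tokenizes it, and counts tokens
docs = ["the cat sat", "the cat ran"]
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs).toarray()
print(sorted(count_vec.vocabulary_))  # ['cat', 'ran', 'sat', 'the']
print(counts.tolist())                # [[1, 0, 1, 1], [1, 1, 0, 1]]

# DictVectorizer: takes already-computed counts keyed by token
token_counts = [{'the': 1, 'cat': 1, 'sat': 1},
                {'the': 1, 'cat': 1, 'ran': 1}]
dict_vec = DictVectorizer(sparse=False)
from_dicts = dict_vec.fit_transform(token_counts)
print(dict_vec.feature_names_)        # ['cat', 'ran', 'sat', 'the']
print(from_dicts.tolist())            # [[1.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 1.0]]
```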

DictVectorizer in Action with ProjectPro! 

Converting dictionaries to arrays is a common step in data preparation for machine learning models, and Python's scikit-learn library offers a convenient solution with DictVectorizer. As the examples and comparisons with alternative vectorization techniques show, DictVectorizer is versatile, particularly for handling mixed data types and categorical features during preprocessing. Beyond this, practical experience with real-world projects is essential for true mastery, and ProjectPro can help you achieve it. With a repository of 270+ projects focused on data science and big data, ProjectPro supports the transition from theory to application. Working through these projects reinforces theoretical understanding and builds practical skills, making ProjectPro a useful resource for aspiring data scientists.
