How to convert an Excel document to XML format?

This recipe helps you convert an Excel document to XML format

Recipe Objective

In big data scenarios, converting data from Excel format to XML format using Python can be used to to exchange data between different applications. We process complex XML files into a relational format or a Big Data format such as Parquet/ORC . XML is user-readable and more commonly used in network communication, especially with the rise of REST-based web services. The tools needed to manage big data analytics, often in the form of NoSQL databases, only map over to XML in a partial sense. XML is considered to be at odds with relational schemas, which puts it in a camp with most of NoSQL technology

System requirements :

  • Install Yattag python module as follows: pip install yattag
  • Install OpenPyXL python module as follows:pip install openpyxl
  • The below codes can be run in Jupyter notebook , or any python console
  • In this example we are going to use the excel employee dataset

Step 1: Import the modules

import openpyxl import yattag import load_workbook import Doc, indent

Step 2: Load the Workbook

create a Yattag document using yattag module and load the workbook and the worksheet. #load the excel file along with path where exists wb = load_workbook("empdata.xlsx") ws = wb.worksheets[0] # Create Yattag doc, tag and text objects doc, tag, text = Doc().tagtext()

Step 3: Fill the headers with Yattags

Fill the headers with Yattags asis() method which enables us to input any string as the next line

xml_header = '' xml_schema = '' doc.asis(xml_header) doc.asis(xml_schema)

Step 4: Then create main Tag

create main tag Employees as below:

with tag('Employees'):

Step 5: Now start loop through the sheet

Now start loopthrough the sheet with the iter_rows method. The iter_rows method returns a generator with all the cells. We use a list comprehension to get all the values from the cells.

for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=12): row = [cell.value for cell in row] with tag("Employee"): with tag("FirstName"): text(row[0]) with tag("LastName"): text(row[1]) with tag("Email"): text(row[10]) with tag("Phone"): text(row[8]) with tag("Company"): text(row[2])

Step 6: Adding the Yattag indent

Next, we are adding the Employees. When finished we indent our result with Yattags indent method.

result = indent( doc.getvalue(), indentation = ' ', indent_text = True ) print(result) with open("employee1.xml", "w") as f: f.write(result)

The complete code looks as follows :

import openpyxl, yattag from openpyxl import load_workbook from yattag import Doc, indent wb = load_workbook("empdata.xlsx") ws = wb.worksheets[0] # Create Yattag doc, tag and text objects doc, tag, text = Doc().tagtext() xml_header = '' xml_schema = '' doc.asis(xml_header) doc.asis(xml_schema) with tag('Employees'): # Use ws.max_row for all rows for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=12): row = [cell.value for cell in row] with tag("Employee"): with tag("FirstName"): text(row[0]) with tag("LastName"): text(row[1]) with tag("Email"): text(row[10]) with tag("Phone"): text(row[8]) with tag("Company"): text(row[2]) result = indent( doc.getvalue(), indentation = ' ', indent_text = True ) print(result) with open("employee.xml", "w") as f: #give path where you want to create f.write(result)

Output of the above Code as follows :

What Users are saying..

profile image

Gautam Vermani

Data Consultant at Confidential
linkedin profile url

Having worked in the field of Data Science, I wanted to explore how I can implement projects in other domains, So I thought of connecting with ProjectPro. A project that helped me absorb this topic... Read More

Relevant Projects

Loan Eligibility Prediction Project using Machine learning on GCP
Loan Eligibility Prediction Project - Use SQL and Python to build a predictive model on GCP to determine whether an application requesting loan is eligible or not.

Build a Text Classification Model with Attention Mechanism NLP
In this NLP Project, you will learn to build a multi class text classification model with attention mechanism.

Build a Multi Touch Attribution Machine Learning Model in Python
Identifying the ROI on marketing campaigns is an essential KPI for any business. In this ML project, you will learn to build a Multi Touch Attribution Model in Python to identify the ROI of various marketing efforts and their impact on conversions or sales..

Build a Customer Churn Prediction Model using Decision Trees
Develop a customer churn prediction model using decision tree machine learning algorithms and data science on streaming service data.

MLOps using Azure Devops to Deploy a Classification Model
In this MLOps Azure project, you will learn how to deploy a classification machine learning model to predict the customer's license status on Azure through scalable CI/CD ML pipelines.

Langchain Project for Customer Support App in Python
In this LLM Project, you will learn how to enhance customer support interactions through Large Language Models (LLMs), enabling intelligent, context-aware responses. This Langchain project aims to seamlessly integrate LLM technology with databases, PDF knowledge bases, and audio processing agents to create a comprehensive customer support application.

Locality Sensitive Hashing Python Code for Look-Alike Modelling
In this deep learning project, you will find similar images (lookalikes) using deep learning and locality sensitive hashing to find customers who are most likely to click on an ad.

Stock Price Prediction Project using LSTM and RNN
Learn how to predict stock prices using RNN and LSTM models. Understand deep learning concepts and apply them to real-world financial data for accurate forecasting.

Learn to Build a Polynomial Regression Model from Scratch
In this Machine Learning Regression project, you will learn to build a polynomial regression model to predict points scored by the sports team.

Build a Multi ClassText Classification Model using Naive Bayes
Implement the Naive Bayes Algorithm to build a multi class text classification model in Python.