In big data scenarios, converting data from Excel format to XML format using Python can be used to to exchange data between different applications. We process complex XML files into a relational format or a Big Data format such as Parquet/ORC . XML is user-readable and more commonly used in network communication, especially with the rise of REST-based web services. The tools needed to manage big data analytics, often in the form of NoSQL databases, only map over to XML in a partial sense. XML is considered to be at odds with relational schemas, which puts it in a camp with most of NoSQL technology
pip install yattag
pip install openpyxl
import openpyxl
import yattag
import load_workbook
import Doc, indent
#load the excel file along with path where exists
wb = load_workbook("empdata.xlsx")
ws = wb.worksheets[0]
# Create Yattag doc, tag and text objects
doc, tag, text = Doc().tagtext()
Fill the headers with Yattags asis() method which enables us to input any string as the next line
xml_header = ''
xml_schema = ' '
doc.asis(xml_header)
doc.asis(xml_schema)
create main tag Employees as below:
with tag('Employees'):
Now start loopthrough the sheet with the iter_rows method. The iter_rows method returns a generator with all the cells. We use a list comprehension to get all the values from the cells.
for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=12):
row = [cell.value for cell in row]
with tag("Employee"):
with tag("FirstName"):
text(row[0])
with tag("LastName"):
text(row[1])
with tag("Email"):
text(row[10])
with tag("Phone"):
text(row[8])
with tag("Company"):
text(row[2])
Next, we are adding the Employees. When finished we indent our result with Yattags indent method.
result = indent(
doc.getvalue(),
indentation = ' ',
indent_text = True
)
print(result)
with open("employee1.xml", "w") as f:
f.write(result)
The complete code looks as follows :
import openpyxl, yattag
from openpyxl import load_workbook
from yattag import Doc, indent
wb = load_workbook("empdata.xlsx")
ws = wb.worksheets[0]
# Create Yattag doc, tag and text objects
doc, tag, text = Doc().tagtext()
xml_header = ''
xml_schema = ' '
doc.asis(xml_header)
doc.asis(xml_schema)
with tag('Employees'):
# Use ws.max_row for all rows
for row in ws.iter_rows(min_row=2, max_row=100, min_col=1, max_col=12):
row = [cell.value for cell in row]
with tag("Employee"):
with tag("FirstName"):
text(row[0])
with tag("LastName"):
text(row[1])
with tag("Email"):
text(row[10])
with tag("Phone"):
text(row[8])
with tag("Company"):
text(row[2])
result = indent(
doc.getvalue(),
indentation = ' ',
indent_text = True
)
print(result)
with open("employee.xml", "w") as f: #give path where you want to create
f.write(result)
Output of the above Code as follows :