How to Convert Excel File to XML Format Using Python?

If you work on a data analytics project, you will often need to deliver your data to another system, and an XML file is a common way to do so. The main advantages of XML are that many different programs can read it, provided the file is well-formed, and that the receiver can easily parse the files and store the data in a database. XML is also a popular format for transferring data between applications. As a result, it is often necessary to convert Excel files to XML, and Python makes this conversion straightforward.

This tutorial explains how to convert an Excel (.xlsx) file to XML using Python, with examples.

How to Create an XML file from Excel using Python?

Converting data from Excel to XML with Python is useful in large data applications that need to exchange data between different systems. XML is widely used in network communication, particularly with the rise of REST-based web services. In the other direction, complex XML files are often converted to relational or Big Data formats such as Parquet or ORC, because the tools used for big data analytics, which are frequently NoSQL databases, map only partially to XML. XML is also generally considered a poor fit for relational schemas, which puts it in the same camp as most NoSQL technology.

Why Python?

Python is well suited to converting files from one format to another, because its libraries provide the required functionality with very little code. In this post, let us see how to use Python to convert an Excel file to an Extensible Markup Language (XML) file.

Modules Used

OpenPyXL:

OpenPyXL is used to interact with Excel files. It can read and write .xlsx and .xlsm files. Use the below command for installation:

pip install openpyxl

Yattag:

Yattag is a Python library for generating readable HTML or XML documents. It is simple to use and helpful whenever you need to build such documents programmatically. Use the below command for installation:

pip install yattag

Functions Used:

1)load_workbook(): The load_workbook() function of OpenPyXL is used to load the contents of an Excel file.

2)iter_rows(): The iter_rows() method of a worksheet, called with the proper arguments, is used to iterate through the loaded sheet row by row and read its data.

  • Syntax of iter_rows():

iter_rows(min_row, max_row, min_col, max_col, values_only)

Parameters

  • min_col: It is the minimum/smallest column value(1-based index)
  • min_row: It is the minimum/smallest row value(1-based index)
  • max_col: It is the maximum/largest column value(1-based index)
  • max_row: It is the maximum/largest row value(1-based index)
  • values_only: It is a boolean flag. When it is True, only the cell values are returned; otherwise (the default), cell objects are returned.
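To see the effect of these parameters, here is a minimal sketch that builds a small workbook in memory instead of reading a file. The rows are made-up sample data, and the variable names are only illustrative.

```python
from openpyxl import Workbook

# Build a small in-memory workbook so the example needs no file on disk.
workbook = Workbook()
sheet = workbook.active
sheet.append(["EmployeeId", "EmployeeName", "Salary"])
sheet.append([2121, "Danny", 25000])
sheet.append([2122, "Sindhu", 50000])

# Default behaviour: each row is a tuple of cell objects, so read .value.
cells = [
    [cell.value for cell in row]
    for row in sheet.iter_rows(min_row=2, max_row=3, min_col=1, max_col=2)
]

# values_only=True: each row is already a tuple of plain values.
values = list(
    sheet.iter_rows(min_row=2, max_row=3, min_col=1, max_col=2, values_only=True)
)

print(cells)
print(values)
```

Starting at min_row=2 skips the header row, and max_col=2 leaves out the Salary column.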

3)tagtext() Function:

The tagtext() method is a helper that returns a triplet made up of the Doc instance itself, the tag method of the Doc instance, and the text method of the Doc instance.

4)asis() Function:

The asis() method appends a string to the document without escaping it.

5)tag() Function:

The tag() method accepts any string as a tag name. It is used in a with block, so that everything added inside the block is nested within that tag.

6)indent() Function:

The indent() function accepts a string that represents an XML or HTML document and returns a properly indented version of that document.

We will now work with the following Excel spreadsheet, SampleExcelFile.xlsx:

[Image: SampleExcelFile]

Python Program to Convert Excel File to XML Format

Method #1: Reading an Excel File and printing all the Rows

Approach:

  • Import load_workbook function from openpyxl module using the import keyword
  • Pass the excel file path to the load_workbook() function of openpyxl module to load the excel file and store it in a variable
  • Access Sheet1 of the given Excel File and store it in another variable
  • Loop over the cells of the worksheet by passing the minimum and maximum row and column arguments to iter_rows().
  • Print all the cell values row by row.
  • Exit the program.

Below is the implementation:

# Import load_workbook function from openpyxl module using the import keyword
from openpyxl import load_workbook

# Pass the excel file path to the load_workbook() function of openpyxl 
# module to load the excel file and store it in a variable
workbookObj  = load_workbook("SampleExcelFile.xlsx")

# Access Sheet1 of the given Excel File and  store it in another variable
newWorksheet = workbookObj.worksheets[0]

# Loop over the cells of the worksheet by passing the min/max row and column arguments
for eachrow in newWorksheet.iter_rows(min_row=1, max_row=7, min_col=1, max_col=3):
    print([cell.value for cell in eachrow])

Output:

['EmployeeId', ' EmployeeName', ' Salary']
[2121.0, 'Danny', 25000.0]
[2122.0, 'Sindhu', 50000.0]
[2123.0, 'Steve', 40000.0]
[2124.0, 'Rosy', 80000.0]
[2125.0, 'Alex', 100000.0]
[2126.0, 'Diya', 60000.0]
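The output above shows two quirks of the sample data: some header cells carry a leading space, and whole numbers come back as floats. If that is not wanted in the XML, the rows can be tidied before conversion. The helper below is a hypothetical sketch (clean_row is not part of openpyxl), shown on two rows copied from the output.

```python
def clean_row(row):
    """Strip whitespace from strings and turn whole-number floats into ints."""
    cleaned = []
    for value in row:
        if isinstance(value, str):
            value = value.strip()
        elif isinstance(value, float) and value.is_integer():
            value = int(value)
        cleaned.append(value)
    return cleaned


# Two rows taken from the printed output above.
rows = [
    ["EmployeeId", " EmployeeName", " Salary"],
    [2121.0, "Danny", 25000.0],
]
cleaned_rows = [clean_row(row) for row in rows]
print(cleaned_rows)
# → [['EmployeeId', 'EmployeeName', 'Salary'], [2121, 'Danny', 25000]]
```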

Method #2: Converting Excel File to XML Format

Approach:

  • Import load_workbook from openpyxl module using the import keyword
  • Import Doc, indent functions from yattag module using the import keyword
  • Pass the excel file path to the load_workbook() function of openpyxl module which is the file we want to work with and store it in a variable
  • Get/access the Sheet1 of the above workbookObj and store it in another variable
  • Call tagtext() on a new Doc instance to get the triplet of the document, its tag method, and its text method
  • Give the XML file header and store it in a variable
  • Give the XML file schema and store it in a variable
  • Append the XML file header (string) to the document using the asis() function
  • Append the XML file schema (string) to the document using the asis() function
  • Create a root tag with the tag method and, inside it, loop over the worksheet rows, writing each cell value into its own child tag.
  • Open the output file in write mode.
  • Write the generated document to this file using the write() function.
  • Exit the program.

Below is the implementation:

# Import load_workbook from the openpyxl module
from openpyxl import load_workbook
# Import Doc and indent from the yattag module
from yattag import Doc, indent

# Pass the Excel file path to load_workbook() to load the file
# we want to work with and store it in a variable
workbookObj = load_workbook('SampleExcelFile.xlsx')
# Access Sheet1 of the above workbookObj and store it in another variable
newWorksheet = workbookObj.worksheets[0]

# tagtext() returns a triplet: the Doc instance, its tag method and its text method
document, tag_of_doc, text_of_doc = Doc().tagtext()

# Give the XML file header and store it in a variable
xmlFileHeader = '<?xml version="1.0" encoding="UTF-8"?>'
# Give the XML file schema and store it in a variable
xmlFileSchema = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>'

# Append the XML file header (string) to the document using the asis() function
document.asis(xmlFileHeader)
# Append the XML file schema (string) to the document using the asis() function
document.asis(xmlFileSchema)

# Create the outer tag, then one nested tag per worksheet row
with tag_of_doc('EmployDetails'):
    for row in newWorksheet.iter_rows(min_row=1, max_row=7, min_col=1, max_col=3):
        row = [cell.value for cell in row]
        with tag_of_doc("EmployDetails"):
            with tag_of_doc("EmployeeId"):
                text_of_doc(row[0])
            # Tag names must not contain spaces, so no leading space here
            with tag_of_doc("EmployeeName"):
                text_of_doc(row[1])
            with tag_of_doc("Salary"):
                text_of_doc(row[2])

# Open the output file in write mode
with open("ConvertedtXMLFile.xml", "w") as file:
    # indent() adds line breaks and indentation to the generated document
    # before it is written with the write() function
    file.write(indent(document.getvalue()))

Output (contents of ConvertedtXMLFile.xml, reformatted here for readability):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"></xs:schema>
<EmployDetails>
  <EmployDetails>
    <EmployeeId>EmployeeId</EmployeeId>
    <EmployeeName> EmployeeName</EmployeeName>
    <Salary> Salary</Salary>
  </EmployDetails>
  <EmployDetails>
    <EmployeeId>2121.0</EmployeeId>
    <EmployeeName>Danny</EmployeeName>
    <Salary>25000.0</Salary>
  </EmployDetails>
  <EmployDetails>
    <EmployeeId>2122.0</EmployeeId>
    <EmployeeName>Sindhu</EmployeeName>
    <Salary>50000.0</Salary>
  </EmployDetails>
  <EmployDetails>
    <EmployeeId>2123.0</EmployeeId>
    <EmployeeName>Steve</EmployeeName>
    <Salary>40000.0</Salary>
  </EmployDetails>
  <EmployDetails>
    <EmployeeId>2124.0</EmployeeId>
    <EmployeeName>Rosy</EmployeeName>
    <Salary>80000.0</Salary>
  </EmployDetails>
  <EmployDetails>
    <EmployeeId>2125.0</EmployeeId>
    <EmployeeName>Alex</EmployeeName>
    <Salary>100000.0</Salary>
  </EmployDetails>
  <EmployDetails>
    <EmployeeId>2126.0</EmployeeId>
    <EmployeeName>Diya</EmployeeName>
    <Salary>60000.0</Salary>
  </EmployDetails>
</EmployDetails>
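A well-formed XML document has exactly one root element, whereas the output above has two top-level elements (xs:schema and EmployDetails), so a strict XML parser may reject it. As an alternative, here is a minimal sketch that uses the standard-library xml.etree.ElementTree module rather than Yattag to build a single-root document and then parses it back as a check. The header and records lists are stand-ins for values read from the worksheet, and the Employees and Employee tag names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for the values read from the worksheet.
header = ["EmployeeId", "EmployeeName", "Salary"]
records = [[2121, "Danny", 25000], [2122, "Sindhu", 50000]]

# A single root element holds one child element per record.
root = ET.Element("Employees")
for record in records:
    employee = ET.SubElement(root, "Employee")
    for name, value in zip(header, record):
        ET.SubElement(employee, name).text = str(value)

xml_text = ET.tostring(root, encoding="unicode")

# Parsing the result back raises ParseError if it is not well-formed.
parsed = ET.fromstring(xml_text)
names = [e.findtext("EmployeeName") for e in parsed.findall("Employee")]
print(names)
# → ['Danny', 'Sindhu']
```

Using the header row for the tag names, instead of emitting it as a record, also keeps the column titles out of the data.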

This tutorial has shown one way to convert an Excel file to an XML file using Python. If you know a better method than the ones provided here, do let us know in the comment box below.