How to convert files from XML to CSV format in NiFi

This recipe helps you convert files from XML to CSV format in NiFi

Recipe Objective: How to convert files from XML format to CSV format in NiFi?

In most big data scenarios, Apache NiFi is used as open-source software for automating and managing the data flow between systems. It is a robust and reliable system for processing and distributing data, and it provides a web-based user interface to create, monitor, and control data flows. In big data environments, NiFi is widely used to capture real-time streaming data from sources such as databases and to process and analyze it. Converting XML data to CSV is a common requirement in large-scale big data environments.


System requirements: a running Apache NiFi instance.

Step 1: Configure the GetFile

The GetFile processor creates FlowFiles from files in a directory; NiFi will ignore files it doesn't have at least read permission for. Here we pick up the file from a local directory.

[Image: GetFile processor configuration]

In the SCHEDULING tab, we scheduled this processor with a Run Schedule of 60 sec and Execution set to Primary node. Here we ingest driver data from the drivers.xml file in a local directory; for that, we configured the Input Directory property and provided the file name.
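The recipe does not reproduce the contents of drivers.xml, but for the flow to work the file needs a repeating record element. As an illustration only (the element and field names here are assumptions, not taken from the recipe), the input might look like this:

```xml
<!-- Illustrative sample of drivers.xml; actual element and field names
     must match the schema configured later in the AvroSchemaRegistry. -->
<drivers>
    <driver>
        <driverId>10</driverId>
        <name>George Vetticaden</name>
        <location>Santa Clara</location>
        <certified>N</certified>
        <wagePlan>miles</wagePlan>
    </driver>
    <driver>
        <driverId>11</driverId>
        <name>Jamie Engesser</name>
        <location>San Jose</location>
        <certified>N</certified>
        <wagePlan>hours</wagePlan>
    </driver>
</drivers>
```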

Step 2: Configure the UpdateAttribute

Updates the Attributes for a FlowFile using the Attribute Expression Language and/or deletes the attributes based on a regular expression.

Here we use UpdateAttribute to set the schema name used by the Avro schema registry, as below.

[Image: UpdateAttribute configuration adding the schema.name attribute]

As shown above, we added a new attribute, schema.name, with the value drivers.
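Concretely, this is a single dynamic property on the UpdateAttribute processor; the reader and writer services created in the next step can then resolve the schema by this name (typically through the ${schema.name} Expression Language reference in their Schema Name property):

```
schema.name = drivers
```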

Step 3: Configure the ConvertRecord and Create Controller Services:

Here we use an XMLReader controller service that references a schema in an AvroSchemaRegistry controller service. The AvroSchemaRegistry contains a "drivers" schema that defines information about each record (field names, field IDs, field types). A CSVRecordSetWriter controller service references the same AvroSchemaRegistry schema.

In the ConvertRecord processor's properties tab, open the drop-down in the Record Reader value column, as shown below, then click on "Create new service".

[Image: ConvertRecord properties tab with the Record Reader drop-down]

You will then get a pop-up as below; select XMLReader in the Compatible Controller Services drop-down:

[Image: Add Controller Service pop-up for the XMLReader]

Follow the same steps to create a controller service for the CSVRecordSetWriter, as below:

[Image: Add Controller Service pop-up for the CSVRecordSetWriter]

To enable the controller services, select the gear icon from the Operate Palette:

[Image: Gear icon in the Operate Palette]

This opens the NiFi Flow Configuration window. Select the Controller Services tab:

[Image: NiFi Flow Configuration window, Controller Services tab]

Click on the "+" symbol to add the Avro schema registry; it will be added as shown in the image above. Then click on the gear symbol and configure it as below:

[Image: AvroSchemaRegistry configuration]

In the property name field, we provide the schema name (drivers), and in the value field, the Avro schema. Click OK and enable the AvroSchemaRegistry by selecting the lightning bolt icon. This will then allow you to enable the XMLReader and CSVRecordSetWriter controller services.
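The screenshot carries the actual schema text; as a sketch, an Avro schema matching the illustrative driver records from Step 1 would look like the following (field names and types are assumptions, not taken from the recipe):

```json
{
  "type": "record",
  "name": "drivers",
  "fields": [
    { "name": "driverId",  "type": "int" },
    { "name": "name",      "type": "string" },
    { "name": "location",  "type": ["null", "string"] },
    { "name": "certified", "type": ["null", "string"] },
    { "name": "wagePlan",  "type": ["null", "string"] }
  ]
}
```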

Configure the XMLReader as below:

[Image: XMLReader configuration]

Also, configure the CSVRecordSetWriter as below:

[Image: CSVRecordSetWriter configuration]
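The exact values are in the screenshots above; for a schema-name lookup like this one, the key properties of the two services typically look something like the following sketch (property names and defaults can vary by NiFi version):

```
XMLReader
    Schema Access Strategy  : Use 'Schema Name' Property
    Schema Registry         : AvroSchemaRegistry
    Schema Name             : ${schema.name}
    Expect Records as Array : false

CSVRecordSetWriter
    Schema Access Strategy  : Use 'Schema Name' Property
    Schema Registry         : AvroSchemaRegistry
    Schema Name             : ${schema.name}
    Include Header Line     : true
```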

After that, click Apply. You will then see the XMLReader and CSVRecordSetWriter controller services; select the lightning bolt icon for each of these services to enable them. All the controller services should be enabled at this point:

[Image: Controller services enabled via the lightning bolt icon]

Step 4: Configure the UpdateAttribute to update the filename

Updates the attributes for a FlowFile using the Attribute Expression Language and/or deletes attributes based on a regular expression. Here, we give the FlowFile a name.

[Image: UpdateAttribute configuration setting the filename attribute]
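In property terms, this step overwrites the FlowFile's filename attribute with a dynamic property; the value below is an illustrative choice, not necessarily what the screenshot uses:

```
filename = drivers
```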

The output of the filename:

[Image: Output of the filename attribute]

Step 5: Configure the UpdateAttribute to update file extension

Updates the attributes for a FlowFile using the Attribute Expression Language and/or deletes attributes based on a regular expression.

Configure the UpdateAttribute processor as below; it adds the file name with the .csv extension as an attribute to the FlowFile.

[Image: UpdateAttribute configuration appending the .csv extension]
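Assuming the filename attribute was set in the previous step, the property added here typically uses the NiFi Expression Language to append the extension:

```
filename = ${filename}.csv
```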

The output of the filename:

[Image: Output of the filename with the .csv extension]

Step 6: Configure the PutFile

The PutFile processor writes the contents of a FlowFile to the local file system; in other words, we store the converted CSV content in a local directory. For that, we configured it as below:

[Image: PutFile processor configuration]

As shown in the above image, we provided a directory name to store and access the file.

The output file is stored locally, and the data looks as below:

[Image: Output CSV file stored in the local directory]
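Matching the illustrative input and schema sketched above, the converted CSV written by PutFile would look roughly like this (the header line comes from the CSVRecordSetWriter):

```
driverId,name,location,certified,wagePlan
10,George Vetticaden,Santa Clara,N,miles
11,Jamie Engesser,San Jose,N,hours
```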

Conclusion

Here we learned to convert files from XML format to CSV format in NiFi.
