Re: where to convert .xls file to .csv file inside... - Cloudera Community - 112430

where to convert .xls file to .csv file inside Nifi data flow?

Contributor

I'm very new to NiFi. I have a data flow that moves files from FTP to an HDFS directory. Along the way, any .xls files need to be converted to .csv before they are stored in HDFS.

The big problem is that I am using the Windows version of NiFi (not in a VM).

If anyone has an idea, please share. It will help all beginners.

Thanks all

3 REPLIES

Contributor

You didn't mention whether you already have a mechanism for converting your XLS files into CSV, but here's a way you might want to orchestrate all of this in NiFi.

  1. Use a ListFile processor to list all of the *.xls files in your input directory. It will output a flowfile for each xls file it finds.
  2. Route that to an ExecuteStreamCommand processor that runs a simple program to convert your XLS to CSV. I'd recommend a simple Python script that uses the petl module. Have that script write the output into the same directory.
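To make step 2 concrete, here is a rough sketch of the command line the processor would end up running for each flowfile. This is plain Python standing in for NiFi, not NiFi's own code: `build_command` and `run_conversion` are hypothetical helper names, and the attribute values are only examples. It assumes the interpreter is what gets launched and that the script path and the file path are passed to it as arguments.

```python
import os
import subprocess
import sys


def build_command(script_path, attrs):
    """Return the argv list: interpreter, script, then the file to convert."""
    # absolute.path and filename are the flowfile attributes written by ListFile.
    input_file = os.path.join(attrs["absolute.path"], attrs["filename"])
    return [sys.executable, script_path, input_file]


def run_conversion(script_path, attrs):
    """Run the conversion script once; raises CalledProcessError on failure."""
    return subprocess.run(build_command(script_path, attrs), check=True)


# Example attribute values (hypothetical) for one listed file.
example_attrs = {"absolute.path": "D:/Nifi_Test/", "filename": "book1.xls"}
example_command = build_command("C:/script.py", example_attrs)
```

In ExecuteStreamCommand terms, the first element would go in Command Path and the rest in Command Arguments, separated by the processor's argument delimiter (a semicolon by default).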

Have a separate NiFi flow that:

  1. Uses a GetFile processor to do whatever it is that you want to do with CSV files. Point it at that same input directory. Make sure you configure the File Filter property to only pick up CSV files, though.
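One thing to watch with File Filter: on both ListFile and GetFile it takes a regular expression, not a shell-style glob, so a value like `*.xls` will not work as a filter. The sketch below uses Python's `re` module purely as an illustration. NiFi uses Java regular expressions, but these patterns are simple enough to behave the same way. The helper names are hypothetical, and I'm assuming the pattern has to match the whole filename.

```python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matches_file_filter(pattern, filename):
    """Return True if the pattern matches the entire filename."""
    return re.fullmatch(pattern, filename) is not None


XLS_FILTER = r".*\.xls"  # candidate File Filter for the ListFile flow
CSV_FILTER = r".*\.csv"  # candidate File Filter for the GetFile flow

print(is_valid_regex("*.xls"))                           # glob syntax is rejected
print(matches_file_filter(XLS_FILTER, "book1.xls"))      # picked up by the XLS flow
print(matches_file_filter(XLS_FILTER, "book1.xls.csv"))  # converted output is skipped
print(matches_file_filter(CSV_FILTER, "book1.xls.csv"))  # picked up by the CSV flow
```

With filters like these, the two flows can share one directory without the XLS flow re-listing the converted output.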

You could use different input directories, too, of course. The trick, I think, is using ListFile and a conversion script. Here's a simple outline of a Python script that should work in most cases. It assumes that you always want to convert just the first sheet.

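#!/usr/bin/env python
import sys
import petl as etl
import xlrd  # not called directly; petl's fromxls needs it installed

# Pass ${absolute.path}/${filename} as a command line argument
if len(sys.argv) < 2:
    sys.exit("usage: script.py <path-to-xls-file>")
inputFile = sys.argv[1]

# fromxls reads the first sheet when no sheet is specified
xls = etl.fromxls(inputFile)
# Writes next to the input, e.g. book1.xls -> book1.xls.csv
etl.tocsv(xls, inputFile+".csv", write_header=True)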
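If you would rather have book1.xls become book1.csv instead of book1.xls.csv, a variant along these lines might do it. Treat it as a sketch: `csv_path_for` and `convert` are names I made up for illustration, and it still assumes petl and xlrd are installed and that only the first sheet matters.

```python
#!/usr/bin/env python
import os
import sys


def csv_path_for(xls_path):
    """Return the output path: same directory and base name, .csv extension."""
    base, _ext = os.path.splitext(xls_path)
    return base + ".csv"


def convert(xls_path):
    """Convert the first sheet of xls_path to CSV and return the CSV path."""
    # Imported lazily so the path helper above can be used without petl installed.
    import petl as etl  # reading .xls through petl also needs xlrd

    out_path = csv_path_for(xls_path)
    etl.tocsv(etl.fromxls(xls_path), out_path, write_header=True)
    return out_path


# Only convert when exactly one file path was passed on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(convert(sys.argv[1]))
```

Printing the output path is optional. It just makes it easier to see what the script did when you run it by hand.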

Contributor

I tried listing the files and using an ExecuteStreamCommand processor to convert them and store the result in the same directory, but it doesn't work for me.

I tried to upload a picture here but got an error, so I'll describe the configuration instead.

- First, I use a ListFile processor with these properties:

Input Directory: D:/Nifi_Test (installed Windows NiFi)

Input Directory Location: Local

File Filter: *.xls

- Second, I use an ExecuteStreamCommand processor with these properties:

I saved your Python script as script.py on my local drive, at C:\script.py

Command Path: C:\script.py

Command Arguments: empty

I have a book1.xls file in D:/Nifi_Test, but the ListFile processor doesn't list it.

Please help me sort this out...

Contributor

Just double-checking the obvious: Do you have Python installed on your Windows machine? And have you installed the petl and xlrd modules using pip?

You can test the script by itself just by running it from the command line. Have you tried that?
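If you're not sure whether the modules are installed, a quick check like the one below can tell you before you involve NiFi at all. It's only a sketch, and `missing_modules` is a hypothetical helper name. Run it with the same Python interpreter that NiFi will be launching.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from the list that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["petl", "xlrd"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("petl and xlrd are both installed")
```

Once that reports both modules as installed, run the conversion script by hand against one .xls file and confirm the .csv appears next to it.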