where to convert .xls file to .csv file inside Nifi data flow?
- Labels: Apache NiFi
Created 06-01-2016 03:59 AM
I'm very new to NiFi. I have a data flow that moves files from FTP to an HDFS directory; at the same time, any .xls files need to be converted to .csv before they are stored in HDFS.
The big problem is that I'm using the Windows version of NiFi (not in a VM).
If anyone has an idea, please share... it will help all beginners.
Thanks, all.
Created 06-01-2016 05:55 AM
You didn't mention whether you already have a mechanism for converting your XLS files into CSV, but here's a way you might want to orchestrate all of this in NiFi.
- Use a ListFile processor to list all of the *.xls files in your input directory; it will output a flowfile for each XLS file it finds.
- Route that to an ExecuteStreamCommand processor that runs a simple program to convert your XLS to CSV. I'd recommend a simple Python script that uses the petl module. Have that script write the output into the same directory.
Have a separate NiFi flow that:
- Uses a GetFile processor to do whatever it is that you want to do with the CSV files. Point it at that same input directory, and make sure you configure the File Filter property so that it only picks up CSV files.
You could use different input directories, too, of course. The trick, I think, is using ListFile and a conversion script. Here's a simple outline of a Python script that should work in most cases. It assumes that you always want to convert just the first sheet.
    #!/usr/bin/env python
    import sys
    import petl as etl
    import xlrd

    # Pass ${absolute.path}/${filename} as a command line argument
    inputFile = sys.argv[1]
    xls = etl.fromxls(inputFile)
    etl.tocsv(xls, inputFile+".csv", write_header=True)
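If you want the script to fail loudly when NiFi passes it a bad argument, a slightly more defensive variant might look like the sketch below. It is only an illustration, not a tested NiFi integration: the helper names `csv_path_for`, `convert` and `main` are made up here, and unlike the outline above it writes `book1.csv` rather than `book1.xls.csv`. It still assumes petl and xlrd are installed and that only the first sheet is wanted.

```python
#!/usr/bin/env python
import os
import sys


def csv_path_for(xls_path):
    """Hypothetical helper: swap the file extension for .csv."""
    root, _ext = os.path.splitext(xls_path)
    return root + ".csv"


def convert(xls_path):
    """Convert the first sheet of an .xls file to CSV next to the input."""
    # Imported here so the argument checks below work even without petl.
    import petl as etl  # third-party; petl needs xlrd to read .xls files

    out_path = csv_path_for(xls_path)
    etl.tocsv(etl.fromxls(xls_path), out_path, write_header=True)
    return out_path


def main(args):
    """args is the argument list without the program name."""
    if len(args) != 1:
        sys.stderr.write("usage: script.py <path-to-xls>\n")
        return 2
    if not os.path.isfile(args[0]):
        sys.stderr.write("input file not found: %s\n" % args[0])
        return 1
    convert(args[0])
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

A non-zero exit code and a message on stderr should make it easier to see in NiFi whether the script itself ran and what it objected to.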
Created 06-01-2016 07:16 AM
I tried listing the files and using an ExecuteStreamCommand processor to convert each file and store it in the same directory, but it doesn't work for me.
I tried to upload a picture here, but it shows an error, so I'll describe the configuration instead.
- First, I use a ListFile processor with these properties:
Input Directory: D:/Nifi_Test (installed Windows NiFi)
Input Directory Location: Local
File Filter: *.xls
- Second, I use an ExecuteStreamCommand processor with these properties:
I saved your Python script as script.py in a local directory: C:\script.py
Command Path: C:\script.py
Command Arguments: empty
I have a book1.xls file in D:/Nifi_Test, but the ListFile processor doesn't list it.
Please help me sort this out...
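One thing that may be worth checking in the ListFile settings above: as far as I know, the File Filter property is interpreted as a Java regular expression, not a shell-style glob, and `*.xls` is not a valid regular expression. The sketch below uses Python's `re` module as a stand-in for Java's regex engine (an assumption; the two engines differ in details), and `matches_file_filter` is a hypothetical helper, not part of NiFi. It only illustrates why a pattern such as `.*\.xls` might behave differently from `*.xls`.

```python
import re


def matches_file_filter(pattern, filename):
    """Hypothetical helper: True if the whole filename matches the regex."""
    return re.fullmatch(pattern, filename) is not None


# A glob-style pattern is not a valid regular expression:
# the leading "*" has nothing to repeat.
try:
    re.compile("*.xls")
    glob_is_valid = True
except re.error:
    glob_is_valid = False

# A regex equivalent of the glob "*.xls".
xls_pattern = r".*\.xls"

print("glob '*.xls' compiles as a regex:", glob_is_valid)
print("book1.xls matches:", matches_file_filter(xls_pattern, "book1.xls"))
print("book1.csv matches:", matches_file_filter(xls_pattern, "book1.csv"))
```

If this is the cause, setting File Filter to something like `.*\.xls` would be the thing to try; treat that as a suggestion to verify against the ListFile documentation rather than a confirmed fix.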
Created 06-02-2016 03:50 AM
Just double-checking the obvious: Do you have Python installed on your Windows machine? And have you installed the petl and xlrd modules using pip?
You can test the script by itself just by running it from the command line. Have you tried that?
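A quick way to answer both questions at once is a small check like the one below, run with the same Python interpreter that NiFi would use. It is only a sketch: `missing_modules` and `report` are names invented for this example, and it checks that the modules can be found, not that the conversion works.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=("petl", "xlrd")):
    """Print which interpreter is running and which modules are missing."""
    missing = missing_modules(names)
    print("Python executable:", sys.executable)
    if missing:
        print("Missing modules (try: pip install %s)" % " ".join(missing))
    else:
        print("All required modules found:", ", ".join(names))
    return missing
```

Calling `report()` prints the interpreter path and any missing modules. If nothing is missing, the next step would be running the conversion script directly, for example `python C:\script.py D:\Nifi_Test\book1.xls` with the paths mentioned earlier in this thread.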