Member since: 08-27-2019
Posts: 6
Kudos Received: 0
Solutions: 0
09-27-2019
11:49 AM
I have a flow like this: ListHDFS -> FetchHDFS -> PutFile -> ExecuteStreamCommand. I place 15 files in a folder; the flow copies them to the local FS and calls a Python script that must process all 15 files at once (the data is merged and transformed) and produce a single output file. As I understand it, the flow runs once per flowfile, so the Python script will also run once per file and produce multiple output files. How do I make ExecuteStreamCommand run only once, after all 15 files have been placed in the source folder, so that I get only one output file from the Python script?
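To make the requirement concrete, here is a minimal sketch of a guard the Python script itself could apply, so that only one invocation does the work even if the script is started once per flowfile. It is not a NiFi feature, just plain Python. The marker file name `.batch_done` and the function name `claim_batch` are hypothetical, and the sketch assumes the local folder contains only the 15 input files, fully written.

```python
from pathlib import Path

EXPECTED_COUNT = 15          # from the post: 15 files form one batch
MARKER_NAME = ".batch_done"  # hypothetical marker file name


def claim_batch(folder, expected=EXPECTED_COUNT):
    """Return the batch's files exactly once, when all are present; else None."""
    folder = Path(folder)
    # Ignore dotfiles so the marker itself is never counted as input.
    files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )
    if len(files) < expected:
        return None  # batch incomplete: this invocation does nothing
    try:
        # touch(exist_ok=False) fails if the marker already exists, so only
        # the first invocation that sees a full batch gets to process it.
        (folder / MARKER_NAME).touch(exist_ok=False)
    except FileExistsError:
        return None
    return files
```

The script would call `claim_batch(...)` on the local folder and run the merge only when it gets a list back; every other invocation exits without writing an output file. The marker has to be deleted before the next batch is dropped into the folder.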
Labels: Apache NiFi
08-29-2019
01:56 AM
Alright, got it. Is there a way to access files on HDFS from Python without using pyspark?
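One possibility that avoids pyspark, assuming the `hdfs` command-line client is installed and configured on the machine running the script, is to shell out to `hdfs dfs -cat`. The sketch below is only an illustration; the helper names `hdfs_cat_command` and `read_hdfs_file` are made up for this example.

```python
import subprocess


def hdfs_cat_command(hdfs_path):
    """Build the argument list for `hdfs dfs -cat <path>`."""
    return ["hdfs", "dfs", "-cat", hdfs_path]


def read_hdfs_file(hdfs_path):
    """Return the file's bytes; raises CalledProcessError if the CLI fails."""
    result = subprocess.run(
        hdfs_cat_command(hdfs_path),
        check=True,            # turn a non-zero exit code into an exception
        capture_output=True,   # collect stdout/stderr instead of printing them
    )
    return result.stdout
```

Passing the command as a list rather than a single string means paths containing spaces are handled without shell quoting.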
08-29-2019
01:53 AM
Hi, I'm pretty new to NiFi and trying to understand the difference between the Fetch, Get and List processors. List: as I understand it, this creates flowfiles that contain only metadata, not the data; that information can then be passed downstream to read the file contents. I am confused about Get versus Fetch and which one should be used in which situation.
Labels: Apache NiFi
08-29-2019
12:08 AM
Thanks, I will explore the XLStoCSV processor. Once the files are converted to CSV, I have to do a couple of transformations, for which I am using a Python script. If I place the CSV in HDFS, how do I use a Python script to process data from HDFS? Are you suggesting using ExecuteStream to get the session content and process it, or is there a better way to do it?
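For reference, this is roughly what the script side of the ExecuteStream approach could look like: ExecuteStreamCommand hands the flowfile content to the command on stdin and takes what the command writes to stdout as the resulting content. The sketch assumes the content is already CSV, and the whitespace-stripping `transform` is only a placeholder for the real transformations.

```python
import csv


def transform(rows):
    """Placeholder transformation: strip whitespace from every cell."""
    for row in rows:
        yield [cell.strip() for cell in row]


def run(in_stream, out_stream):
    """Read CSV rows from in_stream, transform them, write CSV to out_stream."""
    reader = csv.reader(in_stream)
    writer = csv.writer(out_stream, lineterminator="\n")
    writer.writerows(transform(reader))
```

In the actual script, the last line would be `run(sys.stdin, sys.stdout)` (with `import sys` at the top), so the same function can also be exercised with in-memory streams while testing.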
08-27-2019
06:59 AM
I'm fairly new to NiFi and trying to execute a Python script stored on the local FS using NiFi. There are a couple of XLSB files stored in HDFS. I want to build a NiFi flow that reads the files from HDFS and passes each filename to the Python script, so that it can convert them to CSV and store the results back in HDFS. What flow should I use to get this working? I tried ListHDFS -> ExecuteStream but don't know if that's correct. Also, how can I inspect the output of ListHDFS to test it?
Labels: Apache Hadoop, Apache NiFi