Member since 08-27-2019 | 6 Posts | 0 Kudos Received | 0 Solutions
08-29-2019 01:56 AM
Alright, got it. Is there a way to access files on HDFS from Python without using PySpark?
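One option that avoids PySpark is the WebHDFS REST API, which only needs HTTP access to the NameNode. The sketch below is a minimal example using just the standard library. It assumes WebHDFS is enabled on the cluster and that no Kerberos authentication is required; the host name, port (9870 is the usual default on Hadoop 3, 50070 on Hadoop 2), path, and user are placeholders, and `webhdfs_url` and `read_hdfs_file` are hypothetical helper names.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, port, hdfs_path, op, user=None):
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation."""
    params = {"op": op}
    if user:
        # Simple (non-Kerberos) authentication passes the user as a query parameter.
        params["user.name"] = user
    return "http://{}:{}/webhdfs/v1{}?{}".format(
        host, port, quote(hdfs_path), urlencode(params)
    )


def read_hdfs_file(host, port, hdfs_path, user=None):
    """Return the file content as bytes.

    The NameNode answers op=OPEN with a redirect to a DataNode,
    which urlopen follows automatically for GET requests.
    """
    with urlopen(webhdfs_url(host, port, hdfs_path, "OPEN", user)) as resp:
        return resp.read()


# Example call (placeholder host and path, needs a reachable cluster):
# data = read_hdfs_file("namenode.example.com", 9870, "/data/in/report.csv", user="nifi")
```

The same URL builder works for other WebHDFS operations such as `LISTSTATUS`. On a secured cluster this would not be enough as written, since the request would also need to authenticate.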
08-29-2019 01:53 AM
Hi, I'm pretty new to NiFi and trying to understand the difference between the Fetch, Get, and List processors. List: as I understand it, this creates flow files that carry only metadata, not the data itself. That information can then be passed downstream to read the file contents. I am confused about Get vs. Fetch and which one should be used in which situation.
Labels: Apache NiFi
08-29-2019 12:08 AM
Thanks, I will explore the XLStoCSV processor. Once the files are converted to CSV, I have to do a couple of transformations, for which I am using a Python script. If I place the CSV in HDFS, how do I use a Python script to process data from HDFS? Are you suggesting using ExecuteStream to get the session content and process it, or is there a better way to do it?
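To make the ExecuteStream idea concrete: as far as I understand it, ExecuteStreamCommand pipes the flow file content to the command's stdin and takes whatever the command writes to stdout as the new content. Under that assumption, the script would not need to read from HDFS at all. A minimal sketch follows; the whitespace-stripping step is only a stand-in for the real transformations, and `transform_rows` and `transform_stream` are hypothetical names.

```python
import csv
import sys


def transform_rows(rows):
    """Placeholder transformation: strip surrounding whitespace from every cell."""
    for row in rows:
        yield [cell.strip() for cell in row]


def transform_stream(src, dst):
    """Read CSV rows from src, transform them, and write CSV rows to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(transform_rows(reader))


# When run by NiFi, the script would end with:
# transform_stream(sys.stdin, sys.stdout)
```

Keeping the logic in a function that takes two streams means it can be tried locally with `python script.py < input.csv` (once the last line is uncommented) before it is wired into a flow.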
08-27-2019 06:59 AM
I'm fairly new to NiFi and trying to execute a Python script stored on the local file system using NiFi. There are a couple of XLSB files stored in HDFS. I want to build a NiFi flow that reads the files from HDFS and passes each filename to the Python script so that it can convert them to CSV and store them back in HDFS. What flow should I use to get this working? I tried ListHDFS -> ExecuteStream but don't know if that's correct. Also, how do I test ListHDFS on its own to see its output?
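For the filename part, here is a rough sketch of the receiving end of the script. It assumes the processor is configured to pass the HDFS path as the single command-line argument (for example built from the `path` and `filename` attributes that ListHDFS sets on each flow file). The script name `convert.py` and the helpers `parse_args` and `csv_name_for` are hypothetical, and the actual XLSB to CSV conversion is left out.

```python
import sys
from pathlib import PurePosixPath


def parse_args(argv):
    """Return the single expected argument: the HDFS path of the XLSB file."""
    if len(argv) != 2:
        raise SystemExit("usage: convert.py <hdfs-file-path>")
    return argv[1]


def csv_name_for(xlsb_path):
    """Derive the output path by swapping the file extension for .csv."""
    return str(PurePosixPath(xlsb_path).with_suffix(".csv"))


# In the real script:
# source = parse_args(sys.argv)
# target = csv_name_for(source)
```

`PurePosixPath` is used because HDFS paths always use forward slashes, whatever operating system the script runs on.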
Labels: Apache Hadoop, Apache NiFi