Member since: 07-28-2017
Posts: 47
Kudos Received: 6
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 16610 | 03-13-2018 12:04 PM
 | 10104 | 10-12-2017 08:52 AM
12-13-2017
12:39 PM
I have a use case where JSON files are read from an API, transformed to CSV and imported into Hive tables. However, my flow fails at the ReplaceText processor. Can you give some advice on the configuration of the processor, or on where my approach fails?

InvokeHTTP --> EvaluateJsonPath --> ReplaceText --> MergeContent --> UpdateAttribute --> PutHDFS

My flow makes several HTTP calls with InvokeHTTP (each call with a different ID), extracts attributes from each JSON that is returned (each JSON is unique) and then creates the CSVs in the ReplaceText processor as follows:

${attribute1},${attribute2},${attribute3},${attribute4},${attribute5},${attribute6},${attribute7}

However, after the MergeContent processor the merged CSV contains a lot of duplicate data, even though all incoming JSONs contain unique data.
Labels: Apache Hive, Apache NiFi
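One possible cause of the duplicates described above, offered only as an assumption since the processor configuration is not shown: ReplaceText has an Evaluation Mode property, and if it runs Line-by-Line against a pretty-printed (multi-line) JSON body, the replacement value is written once per input line. The Python sketch below only simulates that effect; the function names are hypothetical and this is not NiFi code.

```python
import json


def replace_entire_text(content: str, csv_line: str) -> str:
    """Simulate 'Entire text' evaluation: the whole content becomes one CSV line."""
    return csv_line


def replace_line_by_line(content: str, csv_line: str) -> str:
    """Simulate 'Line-by-Line' evaluation: every input line is replaced,
    so a multi-line JSON body yields one CSV line per original line."""
    return "\n".join(csv_line for _ in content.splitlines())


# Hypothetical record standing in for one API response.
record = {"id": 1, "name": "a"}
pretty = json.dumps(record, indent=2)  # 4 lines: "{", two fields, "}"
csv_line = f"{record['id']},{record['name']}"

if __name__ == "__main__":
    print(replace_entire_text(pretty, csv_line))
    print(replace_line_by_line(pretty, csv_line))
```

If this is what is happening, the duplicates would already be visible in the content of a single flowfile queued before MergeContent, which is a quick way to confirm or rule out the assumption.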
12-13-2017
10:36 AM
@Timothy Spann I have a similar use case, however my flow fails at the ReplaceText processor. Can you give some advice on the configuration of the processor?

InvokeHTTP --> EvaluateJsonPath --> ReplaceText --> MergeContent --> UpdateAttribute --> PutHDFS

My flow makes several HTTP calls with InvokeHTTP (each call with a different ID), extracts attributes from each JSON that is returned (each JSON is unique) and then creates the CSVs like in your example. However, after the MergeContent processor the merged CSV contains a lot of duplicate data, even though all incoming JSONs contain unique data.

ReplaceText conf:

MergeContent conf:
11-29-2017
04:19 PM
I need to read an API where my first call returns a JSON with objectIds in this form:

{
  objectIdFieldName: "ID",
  objectIds: [ 64916,
    67266,
    67237,
    64511, .... ..] }

I need to use the objectIds above to send one request per ID to the API, which will return the data that I need. I was thinking of a flow like this:

GetHTTP (get response JSON) --> EvaluateJsonPath (parse only the objectIds field: $.objectIds) --> ? --> InvokeHTTP (new query per ID)

My problem comes after that, because what I get is an objectIds array in the form of:

[64916, 67266, 67237, 64511,...,]

How do I split/parse each ID from this array into a flowfile attribute, so that I can send it along with other data/headers to the InvokeHTTP processor? I thought of using the SplitJson processor, but I am having difficulty understanding its usage in this case. Any help much appreciated!
Labels: Apache NiFi
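The splitting step asked about above can be sketched outside NiFi as follows. In a flow, SplitJson with a JsonPath Expression of `$.objectIds` should, as far as I know, emit one flowfile per array element; the Python below only mirrors that idea. The attribute name `object.id`, the base URL, and the function names are hypothetical, and the sample response uses quoted keys so that it is valid JSON.

```python
import json
from typing import Dict, List


def split_object_ids(response_text: str) -> List[Dict[str, str]]:
    """Return one attribute map per ID, like one flowfile per array element."""
    payload = json.loads(response_text)
    return [{"object.id": str(oid)} for oid in payload["objectIds"]]


def build_request_url(base_url: str, attributes: Dict[str, str]) -> str:
    """Build the per-ID request URL from the attribute (hypothetical URL layout)."""
    return f"{base_url}/{attributes['object.id']}"


# Shortened sample of the first API response.
response = '{"objectIdFieldName": "ID", "objectIds": [64916, 67266, 67237, 64511]}'
flowfiles = split_object_ids(response)

if __name__ == "__main__":
    for attrs in flowfiles:
        print(build_request_url("https://example.invalid/api/objects", attrs))
```

In the flow itself, the per-ID value would then be referenced in the InvokeHTTP Remote URL through an expression such as `${object.id}`, assuming the ID has been placed in an attribute of that name.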
11-02-2017
09:55 AM
1 Kudo
My dataflow takes some files from HDFS, and after some processing I want to send an email to specific accounts as a notification, with the flowfile itself included as an attachment.

This is the configuration of the PutEmail processor:

Let's say I want to use my Gmail account to send the emails. Then the SMTP authorization should be my Gmail credentials, and the sending address should be that of my Gmail account, right?

This is the error message I receive. Any ideas on what is going wrong here?
Labels: Apache NiFi
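To check the SMTP side independently of PutEmail, the same kind of message can be built and sent with a few lines of Python. This is a minimal sketch, not the processor's implementation: the function names and addresses are hypothetical, and it assumes Gmail's SMTP endpoint `smtp.gmail.com` on port 587 with STARTTLS. Depending on the account's security settings, Gmail may reject the regular account password and require an app-specific one.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender: str, recipient: str,
                       filename: str, payload: bytes) -> EmailMessage:
    """Build a notification mail with the file content attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Flow notification: {filename}"
    msg.set_content("The processed file is attached.")
    msg.add_attachment(payload, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg


def send_via_gmail(msg: EmailMessage, username: str, password: str) -> None:
    """Send the message through Gmail over STARTTLS (needs network access)."""
    with smtplib.SMTP("smtp.gmail.com", 587) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)


# Building the message needs no network; sending is left to the caller.
example = build_notification("me@example.com", "ops@example.com",
                             "data.csv", b"1,2,3\n")
```

If this script authenticates and delivers with the same credentials, the problem is more likely in the processor properties than in the account.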
11-01-2017
03:14 PM
Hi @Abdelkrim Hadjidj, for now I will implement it with GetFTP. NiFi is in the service provider's network and I cannot upgrade at will 😞 Do you maybe know a way to tell GetFTP not to download files that have already been downloaded in the past, to avoid unnecessary buffers?
11-01-2017
01:48 PM
Hi @Abdelkrim Hadjidj, thanks for the reply. My NiFi version is 1.1.0, so what you say makes sense. My flows are not so time-sensitive, meaning I can delay the ingestion for a couple of hours, but I want to understand the operations a bit better.

This is the timestamp on the FTP server of the last file transferred by this NiFi flow (via Data Provenance):

Now, if I schedule the ListFTP processor to fire, e.g., today at 15:00, I would expect the file to be parsed with no problem. Does this bug mean that the file would never be parsed as long as it is the last modified file in this location? In other words, does ListFTP/HDFS/whatever perform a listing only if it sees files in the directory with a more recent timestamp than the last one transferred?

You also mention scheduling cron for 3 and a bit later. Is there an option to have two scheduling plans for one processor? As far as I know, with cron you can only say something like: run this every 5 minutes of that hour or so. Thanks in advance!
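The listing behaviour being asked about can be pictured with a small sketch. This is a deliberate simplification under the assumption that the processor remembers the newest modification timestamp it has listed and only emits entries newer than that; it is not NiFi's actual implementation, and the names below are hypothetical.

```python
from typing import Iterable, List, Tuple

Entry = Tuple[str, int]  # (filename, last-modified timestamp)


def list_new_files(entries: Iterable[Entry],
                   last_listed_ts: int) -> Tuple[List[str], int]:
    """Return filenames newer than the stored timestamp, plus the updated state."""
    new = sorted((e for e in entries if e[1] > last_listed_ts),
                 key=lambda e: e[1])
    new_ts = new[-1][1] if new else last_listed_ts
    return [name for name, _ in new], new_ts


# Hypothetical remote directory contents.
remote = [("a.zip", 100), ("b.zip", 200)]
first_run, state = list_new_files(remote, 0)
second_run, state = list_new_files(remote, state)
```

Under this model a second run over an unchanged directory emits nothing, which also explains why deleting the copy in HDFS does not cause the file to be listed again: the stored state, not the destination, decides what counts as new. On the scheduling question, cron-driven scheduling in NiFi uses Quartz-style expressions, and a field can hold a comma-separated list, so something like `0 0,30 3 * * ?` would fire at 03:00 and 03:30 from a single expression.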
11-01-2017
09:44 AM
I am trying to get files from an FTP server with ListFTP/FetchFTP, but these specific processors are so confusing to me. I have scheduled ListFTP to fire both with cron and timer-driven every 10 seconds or so, but even though it shows that a task is executed in the ListFTP processor, no flowfiles come out of it! GetFTP works fine, but I want to implement it with List/Fetch to get only the new files in the directory.

Initially I scheduled the flow with cron at 03:00 last night, when I knew a new file would become available on the FTP server around 1. However, this morning I saw that nothing was transferred to HDFS. So I changed the scheduling to every 60 seconds to test it right away and, guess what, I got the new file!

So I thought cron was the problem, deleted the test file from HDFS and scheduled the flow to run at 10:00 this morning. As expected, a task was created, but no flowfiles were passed to FetchFTP. I switched back to timer-driven scheduling to get the file I deleted from HDFS back from the FTP server, but this time no flowfiles are created even with this scheduling option. What is going on here, guys?
Labels: Apache NiFi
10-12-2017
03:10 PM
SplitText in between did the trick, amazing tip, thanks a lot! By the way, the ls -l output also contains other stuff like permissions etc., so a rule for ExtractText to parse only the zip filenames is also needed, but thanks anyway!
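A rule of the kind mentioned above could look like the following regular expression, shown here in Python for easy testing. It is a sketch that assumes the filename is the last whitespace-delimited field of each ls -l line and contains no spaces; the sample listing and the function name are made up for illustration.

```python
import re
from typing import List

# Capture the last field of the line when it ends in ".zip".
ZIP_NAME = re.compile(r"(\S+\.zip)\s*$")


def zip_filenames(ls_output: str) -> List[str]:
    """Return only the .zip filenames found in 'ls -l' style output."""
    names = []
    for line in ls_output.splitlines():
        match = ZIP_NAME.search(line)
        if match:
            names.append(match.group(1))
    return names


# Hypothetical listing, one entry per line as SplitText would deliver it.
sample = """total 8
-rw-r--r-- 1 nifi nifi 1024 Oct 12 14:00 data_2017.zip
-rw-r--r-- 1 nifi nifi  512 Oct 12 14:01 notes.txt
drwxr-xr-x 2 nifi nifi 4096 Oct 12 14:02 archive
"""
found = zip_filenames(sample)
```

The same pattern, `(\S+\.zip)\s*$`, should be usable as an ExtractText property value, with the filename landing in the first capture group.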
10-12-2017
02:27 PM
Hi @Shu, I've posted a new question here. I hope what I am trying to do in this use case is clear!
10-12-2017
02:25 PM
Hi @Alexandru Anghel, I've uploaded a new question with my whole use case and logic here. Any help really appreciated!