Member since: 10-15-2019
Posts: 12
Kudos Received: 0
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2556 | 10-24-2019 11:17 AM
 | 2761 | 10-24-2019 11:10 AM
 | 1182 | 10-23-2019 07:51 AM
 | 5897 | 10-16-2019 05:07 AM
08-11-2020
03:17 PM
I have a requirement to access an AWS S3 bucket through NiFi and process files into HDFS from specific subfolders. For example, the S3 bucket name is my_bucket, and the folders under my_bucket are ABC, BDE, CEF, XGF, BHG, NHY. I have to process files only from the BDE and CEF subfolders and ignore the others.

I am currently using ListS3 -> FetchS3 -> RouteOnAttribute -> UpdateAttribute -> PutHDFS. I am unable to filter by folder name in ListS3 or FetchS3, so I thought of filtering on absolute.path in RouteOnAttribute, as below. Could you please confirm whether this logic is correct? ${absolute.path:contains('BDE')}
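A minimal sketch of a RouteOnAttribute property covering both folders, assuming ListS3 leaves the key prefix in absolute.path; the property name matched is illustrative:

```
matched : ${absolute.path:contains('BDE'):or(${absolute.path:contains('CEF')})}
```

Flow files matching this property would be routed onward to UpdateAttribute, while everything else goes to the unmatched relationship.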
Labels:
- Apache MiNiFi
- Apache NiFi
- NiFi Registry
10-24-2019
11:17 AM
The current flow worked fine: ListHDFS -> FetchHDFS -> UpdateAttribute -> PutHDFS -> DeleteHDFS. In ListHDFS, set the Minimum File Age as a wait time before consuming; this allows the process to search files recursively. I have completed all the activities except generating a sequence number for each received flow file for the same date. Could you please check and help?
10-24-2019
11:10 AM
I got the issue resolved by adding a new property named filename with the following value: ${filename:substringBeforeLast('.')}_${uuid}.${filename:substringAfter('.')}
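For example, assuming an incoming file named source_es_2019_10_21.jsonl, this would rename it to something like:

```
source_es_2019_10_21_a1b2c3d4-0000-1111-2222-333344445555.jsonl
```

Note that with a single dot in the name, substringAfter('.') and substringAfterLast('.') behave the same; if filenames could ever contain more than one dot, substringAfterLast('.') would be the safer choice for the extension.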
10-23-2019
07:51 AM
This has to be achieved with PutHDFS only, not MoveHDFS. The following NiFi flow worked and the issue is resolved: ListHDFS -> UpdateAttribute -> PutHDFS -> DeleteHDFS.
10-23-2019
07:46 AM
Hi, I am moving data between HDFS directories to pick the latest updated flow file. The flow should check the source HDFS directories for merged JSON files older than 2 hours and process them to the target, creating the subfolders on the target side if they are not already available. It should push the files, append a sequence number to every new file received on the same date, and delete each file from the source directory after processing it. If new files are received, they should be reprocessed with a new sequence number.

Source HDFS path: /data/json/incoming/year=2019/month=10/day=22/$flow-file
Target HDFS path: /data/json/final/$path/$flow-file
Filename (received): source_es_2019_10_21.jsonl
Filenames (required post-processing): source_es_2019_10_21_1.jsonl, source_es_2019_10_21_2.jsonl, source_es_2019_10_21_3.jsonl

I am currently using the NiFi flow ListHDFS -> UpdateAttribute -> PutHDFS -> DeleteHDFS. I have completed all the activities except generating a sequence number for each received flow file for the same date. Could you please check and help?
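A minimal sketch of one way to generate the sequence number, using the stateful mode of UpdateAttribute (Store State set to "Store state locally"). Because properties in a single UpdateAttribute are evaluated against the incoming flow file's attributes, the counter is set in one processor and the filename rebuilt in a second; the property name seq is illustrative, and resetting the counter when the date changes would need extra handling that this sketch does not cover:

```
# UpdateAttribute #1 (Store State: "Store state locally") - stateful counter
seq : ${getStateValue('seq'):replaceNull('0'):plus(1)}

# UpdateAttribute #2 - rebuild the filename from the counter
filename : ${filename:substringBeforeLast('.')}_${seq}.${filename:substringAfterLast('.')}
```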
Labels:
- Apache NiFi
10-21-2019
03:17 PM
Hi All, I am moving data between HDFS directories to pick the latest updated flow file. The requirement is to move the merged JSON files. The flow should check the source HDFS directories, pick the JSON files, and process them to the target, creating the subfolders on the target side if they are not already available.

Source HDFS path: /data/json/incoming/year=1970/month=01/day=18/$flow-file
Target HDFS path: /data/json/final/$path/$flow-file

I am currently using ListHDFS -> UpdateAttribute -> MoveHDFS. In UpdateAttribute I have provided:
filename: ${path}.substringAfterLast("/")
path: ${filename:substringBeforeLast('/')}
The MoveHDFS Output Directory is /data/json/final/${path}.

Could you please verify the flow and help me resolve the issue? I am unable to pull the files into the target using MoveHDFS, or to copy the sub-folders when they are not already present.
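As a side note, the filename expression above mixes Expression Language with Java-style method syntax; in NiFi EL the function chain belongs inside the braces. A hedged sketch of what the two properties may have been intended to read, assuming both values should be derived from the original path attribute:

```
filename : ${path:substringAfterLast('/')}
path     : ${path:substringBeforeLast('/')}
```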
Labels:
- Apache NiFi
10-16-2019
01:58 PM
ListHDFS -> RouteOnAttribute -> MoveHDFS

Can the above flow work by using RouteOnAttribute to split out the previous day's pending files and move them to the defined HDFS path?

Daily created file format:
stream_es_2019_10_14.json1
stream_es_2019_10_15.json1
stream_es_2019_10_16.json1

If the filename date is equal to today's date, ignore the file; if the filename date is earlier than today's date, pick the file and move it to the defined folder path. Please help if this can be achieved.
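A minimal sketch of a RouteOnAttribute property for the date check, assuming the filenames always follow the stream_es_yyyy_MM_dd pattern shown above; the property name previous_day is illustrative:

```
previous_day : ${filename:substringAfter('stream_es_'):substringBefore('.json'):toDate('yyyy_MM_dd'):toNumber():lt(${now():format('yyyy_MM_dd'):toDate('yyyy_MM_dd'):toNumber()})}
```

Both sides are normalized to midnight of their respective dates via toDate/toNumber, so today's file falls to the unmatched relationship and only earlier dates match and continue to MoveHDFS.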
10-16-2019
09:22 AM
I have a requirement to move the previous day's processed and merged JSON files into a new HDFS path. The requirement is to recursively search for unprocessed files and move the pending unprocessed files.
Path 1 → /data/nifi/working/2019/10/source_2019_10_15.json — Daily processed files are merged under this path and get added on a daily basis.
Path 2 → /data/nifi/incoming/ — The flow should create the folders if they don't exist and then move the files, or just move the files if the folders are already present.
Currently, I am using the NiFi flow ListHDFS → MoveHDFS but am unable to achieve this. I need help with how this can be achieved.
Thank you for the help.
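One hedged option, given that MoveHDFS writes into a single configured directory while PutHDFS creates missing directories on write: replace MoveHDFS with FetchHDFS -> PutHDFS and carry the relative path through. A sketch, assuming ListHDFS sets the path attribute relative to the directory being listed:

```
ListHDFS  (Directory: /data/nifi/working, Recurse Subdirectories: true)
FetchHDFS (HDFS Filename: ${path}/${filename})
PutHDFS   (Directory: /data/nifi/incoming/${path})
```

A DeleteHDFS step could follow PutHDFS if the source copies must be removed, as in the related threads.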
Labels:
- Apache NiFi
10-16-2019
05:07 AM
@Shu_ashu Thank you for the solution. I got the issue resolved and it is working as expected.
10-15-2019
01:43 PM
NiFi JSON data using RouteOnAttribute to split a...
I am currently working on consuming Tealium event stream data with NiFi and loading it into HDFS. I need help filtering the data when the source fails to send a value for a JSON attribute.
{"account":"newt","twitter:description":"Discover when your favorite New TV shows and hosts are being shown. ","og:locale":"en_US","dcterms:publisher":"New TV","original-source":"www.newtv.com/","og:url":"www.newtv.com/show/program-guide"}},"post_time":"2019-10-09 11:27:46","useragent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36","event_id":"12345"}
The message above is a sample. I am currently stuck on filtering the data when the source fails to send the event_id attribute.
Current NiFi flow: ConsumeKafka -> EvaluateJsonPath -> JoltTransformJSON -> EvaluateJsonPath -> RouteOnAttribute -> MergeContent -> EvaluateJsonPath -> UpdateAttribute -> PutHDFS -> MoveHDFS
I need help splitting the data with RouteOnAttribute into two different flows, depending on whether event_id is present:
Flow 1 - event_id attribute present: process normally.
Flow 2 - event_id missing or empty: treat as an error and load via a different flow.
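A minimal sketch, assuming an earlier EvaluateJsonPath (Destination: flowfile-attribute) has extracted $.event_id into an event_id attribute; the property name has_event_id is illustrative:

```
# RouteOnAttribute, Routing Strategy: Route to Property name
has_event_id : ${event_id:isEmpty():not()}
```

Flow files matching has_event_id continue to Flow 1, while anything with a missing or empty event_id falls to the unmatched relationship, which can be wired to the error-handling Flow 2.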
Labels:
- Apache NiFi