Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11279 | 04-15-2020 05:01 PM |
| | 7170 | 10-15-2019 08:12 PM |
| | 3156 | 10-12-2019 08:29 PM |
| | 11615 | 09-21-2019 10:04 AM |
| | 4394 | 09-19-2019 07:11 AM |
04-29-2018
12:07 AM
2 Kudos
@Abhinav Joshi
To filter out a specific email, use the RouteOnAttribute processor. Flow: ConsumeIMAP/ConsumePOP3 --> ExtractEmailHeaders --> RouteOnAttribute (filter the required email) --> ExtractEmailAttachments (use the attachments relationship to feed) --> PutHDFS
Example: ConsumePOP3 processor: Configure the username and password, change the required settings in Gmail, and add these properties to the processor:

mail.pop3.socketFactory.class = javax.net.ssl.SSLSocketFactory
mail.pop3.socketFactory.fallback = false
Use this link for more details about the configurations.

ConsumeIMAP processor: Configure the processor as shown above; use this link for more details about the configurations.

ExtractEmailHeaders: Change the configs of this processor if you want to add additional headers. Once the flowfile is processed by this processor, the attributes below are added to it:

| Name | Description |
|---|---|
| email.headers.bcc.* | Each individual BCC recipient (if available) |
| email.headers.cc.* | Each individual CC recipient (if available) |
| email.headers.from.* | Each individual mailbox contained in the From of the email (array as per RFC 2822) |
| email.headers.message-id | The value of the Message-ID header (if available) |
| email.headers.received_date | The Received-Date of the message (if available) |
| email.headers.sent_date | Date the message was sent |
| email.headers.subject | Subject of the message (if available) |
| email.headers.to.* | Each individual TO recipient (if available) |
| email.attachment_count | Number of attachments of the message |

RouteOnAttribute: Add a new property, for example "Required mail":

${anyMatchingAttribute("email.headers.from.*"):contains("test"):and(${email.attachment_count:gt(0)})} // routes if any of the from.* attributes contains the substring "test" and email.attachment_count is greater than 0

(or)

${anyMatchingAttribute("email.headers.from.*"):equals("test <test@gmail.com>"):and(${email.attachment_count:gt(0)})} // compares from.* with equals, i.e. checks for the exact From value, and that the attachment count is greater than 0

What is anyMatchingAttribute("email.headers.from.*")?
It checks whether any of the matching attributes (email.headers.from.0, email.headers.from.1, etc.) satisfies the given condition. As per your requirement, you can add a condition to filter out the specific email. Refer to this link for more details about the NiFi Expression Language.

ExtractEmailAttachments: Use this processor to extract the email attachments; each attachment is split into an individual flowfile. Then use the PutHDFS processor to store the attachments in an HDFS directory. In addition, if you want to filter only the required filenames, use any of the attributes below in a RouteOnAttribute processor. These attributes are added by the ExtractEmailAttachments processor:

| Name | Description |
|---|---|
| filename | The filename of the attachment |
| email.attachment.parent.filename | The filename of the parent FlowFile |
| email.attachment.parent.uuid | The UUID of the original FlowFile |
| mime.type | The MIME type of the attachment |
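For instance, to keep only CSV attachments after ExtractEmailAttachments, you could route on the filename attribute listed above. A minimal sketch (the property name "csv only" is just illustrative):

```
csv only : ${filename:toLower():endsWith('.csv')}
```

Flowfiles matching the expression are routed to the "csv only" relationship, which you can then connect to PutHDFS.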
04-26-2018
11:15 PM
1 Kudo
@Praveen Bathala
Could you try the PublishKafka_0_10 (or) PublishKafka processors, which don't force you to specify a Record Reader/Writer?
04-26-2018
06:35 AM
@Atif Tariq
No probs..!! If the answer addressed your question, click on the Accept button below to accept it. That would be a great help to community users looking for a quick solution to these kinds of issues.
04-26-2018
06:00 AM
1 Kudo
@Atif Tariq
Case 1: If your file is on the local filesystem, you can use the GetFile processor.

Input Directory: /demo/data
File Filter: cdrs.txt
Keep Source File: false // if set to false, the file is deleted after the fetch; if true, it is not deleted

Case 2: If your file is in the Hadoop filesystem (HDFS), you need the GetHDFS processor to pull the file from HDFS into NiFi. If you are using Kerberos, you also need to specify the principal and keytab values in the properties.

Hadoop Configuration Resources: <path to core-site.xml,hdfs-site.xml> // provide the file paths for core-site.xml and hdfs-site.xml here
Directory: /demo/data
Keep Source File: false // if set to false, the file is deleted after the fetch; if true, it is not deleted
File Filter Regex: cdrs.txt

Let us know if you are facing any issues..!!
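As a quick sanity check before wiring up the processors (the paths below match the example configs above), you can confirm the file is visible from the NiFi host:

```bash
# Case 1: local filesystem - the NiFi service user must be able to read the file
ls -l /demo/data/cdrs.txt

# Case 2: HDFS - run as a user (or Kerberos principal) with read access
hdfs dfs -ls /demo/data/cdrs.txt
```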
04-26-2018
05:38 AM
1 Kudo
@Abhinav Joshi Which version of NiFi are you using? In NiFi 1.1 the Columns to Return property doesn't support expression language, which is probably why you are having issues adding the expression '${now():toNumber():format('yyyy-MM-dd HH:mm:ss')}' LOAD_TMS. If your Columns to Return property doesn't support expression language, one possible workaround is to convert to JSON, add the field, and then convert back to Avro. In newer versions (I think from NiFi 1.2+), the Columns to Return property supports expression language, and we also have the UpdateRecord processor, which can add new fields to the flowfile content without any conversions.
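A sketch of the UpdateRecord approach (the strategy name and field name are assumptions to verify against your NiFi version): configure a record reader/writer pair whose write schema includes the new LOAD_TMS field, set Replacement Value Strategy to Literal Value, and add a property whose name is the record path of the field to populate:

```
/LOAD_TMS : ${now():format('yyyy-MM-dd HH:mm:ss')}
```

If the field is missing from the writer's schema, the new value is dropped when the record is written.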
04-25-2018
09:11 AM
1 Kudo
@Mark
Method 1: Add a validation query to your Hive connection pool controller service:

Validation query: select current_timestamp (or) select 1

This validation query is used to validate connections before returning them. When a borrowed connection is invalid, it gets dropped and a new valid connection is returned.
Method 2: Using the REST API we can stop the processor and disable/enable the connection pool controller service, and we can schedule this as a script:

1. Stop the HiveQL processor
2. Disable the Hive connection pool
3. Enable the Hive connection pool
4. Start the HiveQL processor

I have answered a similar issue in this link with a detailed explanation of the REST API commands; please use that link as a reference.

If the answer helped to resolve your issue, click on the Accept button below to accept it. That would be a great help to community users looking for a quick solution to these kinds of issues.
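A minimal sketch of step 1 with curl and jq (the host and processor ID are placeholders, and the entity layout is from the NiFi 1.x REST API, so verify it against your version):

```bash
#!/bin/bash
NIFI=http://localhost:8080/nifi-api   # hypothetical host
PID=<processor-id>                    # replace with your processor's ID

# Fetch the current revision; every state change must echo it back
REV=$(curl -s "$NIFI/processors/$PID" | jq -c '.revision')

# Stop the processor by updating its component state
curl -s -X PUT -H 'Content-Type: application/json' \
  -d "{\"revision\":$REV,\"component\":{\"id\":\"$PID\",\"state\":\"STOPPED\"}}" \
  "$NIFI/processors/$PID"
```

The controller service steps follow the same pattern against /controller-services/{id} with state DISABLED or ENABLED.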
04-25-2018
01:18 AM
1 Kudo
@Raj ji
You can use the ExecuteProcess (or) ExecuteStreamCommand processors to pass arguments to the shell script.

ExecuteProcess processor: This processor doesn't need any upstream connection to trigger the script, i.e. it can run on its own based on the scheduler.

Example: I have a sample script that takes 2 command-line arguments and echoes them:

```bash
bash$ cat sample_script.sh
#!/bin/bash
echo "First arg: $1"
echo "Second arg: $2"
```

Execution in the terminal:

```bash
bash$ ./sample_script.sh hello world
First arg: hello
Second arg: world
```

1. Execution in NiFi using the ExecuteProcess processor:

Command: bash
Command Arguments: /tmp/sample_script.sh hello world // here we trigger the shell script and pass the arguments separated by spaces
Batch Duration: No value set
Redirect Error Stream: false
Argument Delimiter: space // by default the Argument Delimiter is ;, in which case the command arguments would be /tmp/sample_script.sh;hello;world

The success relationship from ExecuteProcess will output the below as the flowfile content:

First arg: hello
Second arg: world

2. Execution in NiFi using the ExecuteStreamCommand processor: This processor needs an upstream connection to trigger the script.

Flow: We use a GenerateFlowFile processor as a trigger for the ExecuteStreamCommand script.

GenerateFlowFile configs: Add two attributes, arg1 and arg2, to the flowfile.

ExecuteStreamCommand processor:

Command Arguments: ${arg1};${arg2}
Command Path: /tmp/sample_script.sh
Argument Delimiter: ;

Now we take the attributes added in the GenerateFlowFile processor and pass them to the script. Use the output stream relationship from ExecuteStreamCommand; the output flowfile content will be the same:

First arg: hello
Second arg: world

By using these processors you can trigger the shell script and pass arguments as well.

If the answer helped to resolve your issue, click on the Accept button below to accept it. That would be a great help to community users looking for a quick solution to these kinds of issues.
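One difference worth noting: ExecuteStreamCommand also pipes the incoming flowfile's content into the command's stdin, so a script can consume both. A minimal sketch, assuming the same two-argument setup as above:

```bash
#!/bin/bash
# Arguments arrive via Command Arguments; flowfile content arrives on stdin
echo "First arg: $1"
echo "Second arg: $2"
cat -   # pass the incoming flowfile content through to the output stream
```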
04-24-2018
01:52 PM
@Prakhar Agrawal "Connect with the "Group1_Port1" input port that is also placed inside my "Group1" process group?" It's not possible, because Group1_Port1 is an input port inside a process group, while Root_port1 is a remote port because it sits at the root canvas level. Root_port1 and Root_port2 are used to distribute data across NiFi instances; that's why you see all the root canvas input ports (Root_port1, Root_port2) in the drop-down list. Group1_Port1 is an input port used to transfer data into the process group. Please refer to this link to understand more about process groups and remote process groups.
04-24-2018
12:04 PM
1 Kudo
@ibrahima diattara
Could you check nifi-app.log to find out which processor is causing the issue?

1. If the issue is due to Java heap space, increase the heap space in the bootstrap.conf file to 8 GB or more (if you have it) and then restart the NiFi server:

java.arg.2=-Xms8g
java.arg.3=-Xmx8g

Please follow this article for more details about the configs.

2. If the issue is caused by some specific processor (like InvokeHTTP, scripted processors, etc.), delete those specific processors in flow.xml.gz, as in the sketch after this list: take a backup of flow.xml.gz, gunzip it, edit flow.xml to delete the processor causing the issue along with all connections to that processor, then gzip it again.

3. If you don't care about any of the processors in your NiFi instance and just need the NiFi UI back to normal, delete (or) rename flow.xml.gz in the conf directory to some name other than flow.xml.gz, then restart NiFi.

If the answer helped to resolve your issue, click on the Accept button below to accept it. That would be a great help to community users looking for a quick solution to these kinds of issues.
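A minimal sketch of step 2 (the conf path is hypothetical; adjust for your install):

```bash
#!/bin/bash
cd /path/to/nifi/conf            # hypothetical install path
cp flow.xml.gz flow.xml.gz.bak   # 1. take a backup first
gunzip flow.xml.gz               # 2. unpack to flow.xml
vi flow.xml                      # 3. delete the offending processor and its connections
gzip flow.xml                    # 4. repack as flow.xml.gz, then restart NiFi
```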
04-24-2018
11:36 AM
1 Kudo
@Prakhar Agrawal "Is there any way to connect the GetFile processor to a remote process group using the input port which I use inside my process group?" Input ports provide a mechanism for transferring data into a process group; if you want to use an input port with a remote process group, you need to keep another process group inside the Group1 process group.

Flow:
1. Group1 process group
1.1 GetFile --> 2. Process group (2.1 Input port --> 2.2 Remote process group)

Now we have configured a remote process group using an input port.

Coming back to your original question: inside the Group1 process group use GetFile processor --> remote process group. Add the myport input port at the root canvas level, and in the remote process group select the remote port as myport. Now we have configured the remote port of the remote process group as myport on the root canvas. To get the data from the myport remote port, use an input port inside the Group1 process group as shown in the first screenshot, and connect that input port to a PutFile processor. Then go to the root canvas and connect the myport remote port to the input port. With these configurations we get the file, feed it to the remote process group, and then, using the input port inside the process group, put the file in the desired location.