Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11124 | 04-15-2020 05:01 PM |
| | 7026 | 10-15-2019 08:12 PM |
| | 3068 | 10-12-2019 08:29 PM |
| | 11254 | 09-21-2019 10:04 AM |
| | 4190 | 09-19-2019 07:11 AM |
07-07-2018
02:36 AM
@Shu Thank you very much
07-05-2018
12:33 PM
@Markus Wilhelm I don't think we can make NiFi read the Kerberos configs by default, but you can make use of Process Group variables in your HDFS processor configs and define the variables at NiFi Flow scope so that the same variables can be used across all the processors in the NiFi instance. You can also copy hdfs-site.xml and core-site.xml to the NiFi lib path and restart NiFi; then you don't have to specify the path, because NiFi will load all the .xml files from the lib path. However, that is not the recommended approach, because if you want to change some configs in either of those two xml files, NiFi has to be restarted for the changes to take effect. Refer to this link regarding Process Group variables in NiFi and this link regarding copying xml files into the nifi lib.
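For illustration, a minimal sketch of how Process Group variables could be referenced in an HDFS processor; the variable names and paths below are assumptions, not taken from the original thread:

```
# Process Group variables defined at NiFi Flow scope (hypothetical names/values):
#   hadoop.conf.dir     = /etc/hadoop/conf
#   kerberos.principal  = nifi@EXAMPLE.COM
#   kerberos.keytab     = /etc/security/keytabs/nifi.keytab

# PutHDFS (or any HDFS processor) properties:
Hadoop Configuration Resources : ${hadoop.conf.dir}/core-site.xml,${hadoop.conf.dir}/hdfs-site.xml
Kerberos Principal             : ${kerberos.principal}
Kerberos Keytab                : ${kerberos.keytab}
```

Changing a variable value then only restarts the components that reference it, which avoids the full NiFi restart required by the lib-path approach.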
05-29-2019
03:18 PM
How do I split a complex JSON array into individual JSON objects with the SplitJson processor in NiFi? I don't know how to configure the relationships original, split, and failure. The JSON array is below:

{
  "scrollId1": "xyz",
  "data": [
    {
      "id": "app-server-dev-glacier",
      "uuid": "a0733c21-6044-11e9-9129-9b2681a9a063",
      "name": "app-server-dev-glacier",
      "type": "archiveStorage",
      "provider": "aws",
      "region": "ap-southeast-1",
      "account": "164110977718"
    },
    {
      "id": "abc.company.archive.mboi",
      "uuid": "95100b11-6044-11e9-977a-f5446bd21d81",
      "name": "abc.company.archive.mboi",
      "type": "archiveStorage",
      "provider": "aws",
      "region": "us-east-1",
      "account": "852631421774"
    }
  ]
}

I need to split it into:

{
  "id": "app-server-dev-glacier",
  "uuid": "a0733c21-6044-11e9-9129-9b2681a9a063",
  "name": "app-server-dev-glacier",
  "type": "archiveStorage",
  "provider": "aws",
  "region": "ap-southeast-1",
  "account": "164110977718"
},
{
  "id": "abc.company.archive.mboi",
  "uuid": "95100b11-6044-11e9-977a-f5446bd21d81",
  "name": "abc.company.archive.mboi",
  "type": "archiveStorage",
  "provider": "aws",
  "region": "us-east-1",
  "account": "852631421774"
}

Next, I need to insert another field "time" in front of "id", the first attribute of each individual object. I used the SplitJson processor with JSON Path Expression $.data.id.*, but the relationship reports an error. I don't know how to configure the relationship branches original, split, and failure. Does anyone have any advice? @Shu
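For context, a SplitJson configuration along these lines is what the question is aiming at; the $.data expression and the relationship routing below are a sketch based on typical usage, not something stated in the post:

```
SplitJson
  JsonPath Expression : $.data    # point at the array itself; one flowfile per array element

# Typical relationship routing:
#   split    -> downstream processors (each flowfile holds one object from "data")
#   original -> auto-terminate, or keep if the unsplit JSON is needed for auditing
#   failure  -> auto-terminate, or route to a retry/error handler
```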
07-09-2018
01:22 PM
1 Kudo
@Derek Calderon Short answer is no. The ExecuteSQL processor is written to write its output to the FlowFile's content.

There is an alternative solution. You have some processor currently feeding FlowFiles to your ExecuteSQL processor via a connection. My suggestion would be to feed that same connection to two different paths. The first connection feeds a MergeContent processor via a funnel, and the second feeds your ExecuteSQL processor. The ExecuteSQL processor performs the query and writes the data you are looking for to the content of the FlowFile. You then use a processor like ExtractText to extract that FlowFile's new content into FlowFile attributes. Next you use a processor like ModifyBytes to remove all content from this FlowFile. Finally you feed this processor to the same funnel as the other path. The MergeContent processor can then merge these two FlowFiles using the "Correlation Attribute Name" property (assuming "filename" is unique, that could be used), min/max entries set to 2, and "Attribute Strategy" set to "Keep All Unique Attributes". The result should be what you are looking for.

The flow would look something like the following:

Having multiple identical connections does not trigger NiFi to write the 200 MB of content twice to the content repository. A new FlowFile is created, but it points to the same content claim. New content is only generated when ExecuteSQL runs against one of the FlowFiles, so this flow does not produce any additional write load on the content repository other than when ExecuteSQL writes its output, which I am assuming is relatively small.

Thank you, Matt
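A rough sketch of the MergeContent settings described above; the values are illustrative and assume "filename" is unique per FlowFile pair:

```
MergeContent
  Merge Strategy             : Bin-Packing Algorithm
  Correlation Attribute Name : filename
  Minimum Number of Entries  : 2
  Maximum Number of Entries  : 2
  Attribute Strategy         : Keep All Unique Attributes
```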
07-04-2018
08:41 PM
@Vengai Magan Please refer to this and this links, which describe how to install NiFi as a service, and this link to set up a high-performance NiFi.
06-27-2018
02:25 PM
Perfect! Thanks! I'll try QueryDatabaseTable for it. It'll be better!
06-27-2018
12:03 PM
1 Kudo
@Vladislav Shcherbakov Before the ReplaceText processor, use an EvaluateJsonPath processor to extract the JSON values and keep them as flowfile attributes. Add all your properties (case sensitive) in this processor and keep the Destination as flowfile-attribute, then feed the success relationship from EvaluateJsonPath to the ReplaceText processor.

Flow:
... other processors
3. SplitJson
5. EvaluateJsonPath
6. ReplaceText
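As a rough sketch of those two steps; the attribute names, JSON paths, and replacement text below are placeholders, not taken from the original flow:

```
EvaluateJsonPath
  Destination : flowfile-attribute
  id          : $.id        # dynamic property: attribute name -> JsonPath
  name        : $.name

ReplaceText
  Replacement Strategy : Always Replace
  Replacement Value    : ${id},${name}    # build whatever text you need from the extracted attributes
```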
06-27-2018
03:56 AM
@Raj ji You can use the ExecuteProcess processor (doesn't allow any incoming connections) or the ExecuteStreamCommand processor to trigger the shell script.

ExecuteProcess configs: as your executable script is on Machine 4 and NiFi is installed on Machine 1, create a shell script on Machine 1 that ssh's into Machine 4 and triggers your Python script. Refer to this and this, which describe how to use a username/password while doing ssh to a remote machine.

As you are going to store the logs in a file, you can use the TailFile processor to tail the log file, check whether there is any ERROR/WARN using the RouteText processor, and then trigger a mail. Alternatively, fetch the application id (or application name) of the process and use the YARN REST API to get the status of the job. Please refer to how to monitor yarn applications using NiFi and Starting Spark jobs directly via YARN REST API; this link also describes the YARN REST API capabilities.
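For example, a minimal ExecuteProcess configuration along these lines could trigger the remote script; the script path, host, and user below are hypothetical:

```
ExecuteProcess
  Command           : /bin/bash
  Command Arguments : /opt/scripts/run_remote.sh

# run_remote.sh on Machine 1 would do something like:
#   ssh user@machine4 'python /path/to/job.py' >> /var/log/job.log 2>&1
```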
06-29-2018
09:29 AM
@Murat Menteşe As your xml doc has an array [] in it, I'm not sure how to write a matching xslt, as the current xslt converts the array xml into an object/element and adds "" for the array []. In case of large data you have to increase the Maximum Buffer Size (default 1 MB; increase it based on your flowfile size), as this processor takes the whole flowfile into memory and does all the replacing based on your configs.
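For example, assuming the processor doing the replacing is ReplaceText, the property could be raised like this; the 10 MB value is only an illustration:

```
ReplaceText
  Maximum Buffer Size : 10 MB    # must be larger than the biggest incoming flowfile
```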
06-29-2018
09:18 AM
@Amira khalifa Use one of the ways from the above shared link to take out only the header from the csv file. Then in ReplaceText, search for (&|\(|\)|\/_|\s) and keep the Replacement Value as an empty string; now we are searching for all the special characters in the header flowfile and replacing them with an empty string. Then add this header flowfile back to the other, non-header flowfile. All the explanation and the template.xml are shared in this link.
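A sketch of that ReplaceText configuration, using the regex from the answer; the strategy and evaluation mode shown are typical choices, not confirmed by the original post:

```
ReplaceText (applied to the header-only flowfile)
  Replacement Strategy : Regex Replace
  Search Value         : (&|\(|\)|\/_|\s)
  Replacement Value    :                    # set to an empty string
  Evaluation Mode      : Line-by-Line
```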