Member since
06-08-2017
1049
Posts
514
Kudos Received
312
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 8437 | 04-15-2020 05:01 PM |
| | 4959 | 10-15-2019 08:12 PM |
| | 1884 | 10-12-2019 08:29 PM |
| | 8165 | 09-21-2019 10:04 AM |
| | 2820 | 09-19-2019 07:11 AM |
06-27-2018
03:56 AM
@Raj ji You can use the ExecuteProcess processor (which doesn't allow any incoming connections) or the ExecuteStreamCommand processor to trigger the shell script. ExecuteProcess configs: since your executable script is on Machine 4 and NiFi is installed on Machine 1, create a shell script on Machine 1 that SSHes into Machine 4 and triggers your Python script. Refer to this link and this link, which describe how to use a username/password when SSHing to a remote machine. Since you are storing the logs in a file, you can use the TailFile processor to tail the log file, check whether there are any ERROR/WARN lines using the RouteText processor, and then trigger a mail. Alternatively, fetch the application id or application name of the process and use the YARN REST API to get the status of the job. Please refer to how to monitor YARN applications using NiFi and Starting Spark jobs directly via YARN REST API; this link describes the YARN REST API capabilities.
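If you go the YARN REST API route, here is a rough Python sketch; the ResourceManager host and the trimmed response shape are my assumptions, so check them against your cluster before relying on this:

```python
import json
from urllib.request import urlopen  # only needed if you actually query the RM

# Hypothetical ResourceManager address -- replace with your own.
RM_URL = "http://resourcemanager-host:8088"

def app_status(payload: dict) -> tuple:
    """Extract (state, finalStatus) for the first application in a
    YARN /ws/v1/cluster/apps response body."""
    app = payload["apps"]["app"][0]
    return app["state"], app["finalStatus"]

def fetch_app_status(app_id: str) -> tuple:
    """Query the YARN REST API for a single application's status."""
    with urlopen(f"{RM_URL}/ws/v1/cluster/apps/{app_id}") as resp:
        body = json.load(resp)
    # The single-app endpoint nests the record directly under "app".
    app = body["app"]
    return app["state"], app["finalStatus"]

# Sample (trimmed) response shape for the cluster apps endpoint:
sample = {"apps": {"app": [{"id": "application_1528000000000_0001",
                            "state": "FINISHED",
                            "finalStatus": "SUCCEEDED"}]}}
print(app_status(sample))  # ('FINISHED', 'SUCCEEDED')
```

You could wire the same check into NiFi with InvokeHTTP plus EvaluateJsonPath instead of standalone Python.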
06-27-2018
03:12 AM
@Cody kamat While running a Hive import, the --target-dir argument controls where the data is stored temporarily before it is loaded into the Hive table; --target-dir does not create the Hive table in that location. If you want to import into a specific directory, use --target-dir without the --hive-import argument and create the Hive table on top of the HDFS directory. Alternatively, create a Hive external table pointing to your target-dir, then in the sqoop import remove the --create-hive-table and --target-dir arguments. For more info, refer to this HCC thread regarding the same issue.
06-27-2018
03:00 AM
@Amira khalifa As suggested by @anarasimham, using the start-of-string anchor (^) in the ReplaceText processor should match only the first line. Make sure your matching regex excludes special characters. Example: in the configs above I'm matching only the first line in the flowfile and prepending "new" to that line only; all the other content is left untouched.
Input:
hi
hello
Output:
newhi
hello
Alternatively, you can use one of the ways I suggested in this link; please refer to the shared link and choose the method that best fits your case. If you are still having issues, please share some sample data with the header and the expected output.
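To see why the ^ anchor touches only the first line, here is a rough Python analogy of that ReplaceText behaviour (the processor itself is configured in NiFi; this is only an illustration):

```python
import re

content = "hi\nhello"

# In MULTILINE mode ^ would anchor every line, but count=1 restricts
# the substitution to the first match, i.e. the first line only.
result = re.sub(r"^", "new", content, count=1, flags=re.MULTILINE)
print(result)
# newhi
# hello
```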
06-26-2018
01:09 PM
@Marco Springer There is a similar HCC thread regarding list processors in NiFi; please refer to this link for more details. Let us know if you have additional questions!
06-26-2018
03:00 AM
Even though you have a reduced JSON, you can still use the MergeRecord processor to merge single JSON messages into an array of JSON messages, using JsonTreeReader/JsonRecordSetWriter controller services. Configure the Min/Max Number of Records per flowfile, and use the Max Bin Age property as a wildcard to make a bin eligible to merge. Then feed the merged relationship to the PutHBaseRecord processor (give the row identifier field name from your JSON message), as the purpose of record-oriented processors is to work with chunks of data for good performance instead of working with one record at a time.
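Conceptually, what MergeRecord produces with those reader/writer services is one flowfile holding an array of the individual records. A minimal Python sketch of that transformation (the sample messages are made up):

```python
import json

# Three flowfiles, each holding a single JSON message.
messages = ['{"id": 1, "val": "a"}',
            '{"id": 2, "val": "b"}',
            '{"id": 3, "val": "c"}']

# MergeRecord with JsonTreeReader/JsonRecordSetWriter effectively
# emits one flowfile whose content is an array of those records.
merged = json.dumps([json.loads(m) for m in messages])
print(merged)  # [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}, {"id": 3, "val": "c"}]
```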
06-26-2018
01:36 AM
@Murat Menteşe You can use the ReplaceText processor after the TransformXml processor, then add a matching regex that strips the "(quotes) before/after the array []. ReplaceText configs: Search Value
(.*)"(\[.*\])"(.*)
Replacement Value
$1$2$3
Character Set
UTF-8
Maximum Buffer Size
1 MB //increase the size according to your flowfile size
Replacement Strategy
Regex Replace
Evaluation Mode
Entire text
Input:
{ "soap:Envelope": { "soap:Body": { "Musteri_Hiyerarsi_TablosuResponse": { "Musteri_Hiyerarsi_TablosuResult": "[ { "UNIQ_KEY": 740281.0, "TTALT": 112.0, "TTAD": "TEST" } ]" } } } }
Output (valid JSON):
{ "soap:Envelope": { "soap:Body": { "Musteri_Hiyerarsi_TablosuResponse": { "Musteri_Hiyerarsi_TablosuResult": [ { "UNIQ_KEY": 740281.0, "TTALT": 112.0, "TTAD": "TEST" } ] } } } }
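You can verify the regex outside NiFi first. A quick Python check with a trimmed version of your payload (re.DOTALL stands in for the Entire text evaluation mode, where . also spans newlines):

```python
import json
import re

# Trimmed version of the TransformXml output: the array is wrapped
# in quotes, which makes the document invalid JSON.
xml_output = ('{ "soap:Envelope": { "soap:Body": { '
              '"Musteri_Hiyerarsi_TablosuResponse": { '
              '"Musteri_Hiyerarsi_TablosuResult": '
              '"[ { "UNIQ_KEY": 740281.0 } ]" } } } }')

# Same Search/Replacement values as the ReplaceText config.
fixed = re.sub(r'(.*)"(\[.*\])"(.*)', r'\1\2\3', xml_output, flags=re.DOTALL)
print(json.loads(fixed))  # parses cleanly once the quotes are gone
```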
06-25-2018
10:09 PM
1 Kudo
@Ferrero Rocher Use the InvokeHTTP processor, which allows incoming connections, instead of the GetHTTP processor. Flow:
1. GetHTTP
2. SplitJson //split the array with $.*
3. EvaluateJsonPath //extract the id value and keep it as a state attribute
4. InvokeHTTP //http://${DOMAIN}/api/states/${state}/municipalities
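The steps above can be sketched in plain Python; the sample payload and domain are placeholders, not your actual API:

```python
import json

# Hypothetical GetHTTP response: an array of state records.
states_json = '[{"id": "NY"}, {"id": "CA"}]'
domain = "example.com"  # stand-in for ${DOMAIN}

# SplitJson ($.*) emits one flowfile per array element; EvaluateJsonPath
# then lifts "id" into a `state` attribute that InvokeHTTP's URL uses.
urls = [f"http://{domain}/api/states/{rec['id']}/municipalities"
        for rec in json.loads(states_json)]
print(urls[0])  # http://example.com/api/states/NY/municipalities
```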
06-25-2018
12:25 PM
3 Kudos
@Faisal Durrani Use the record-oriented PutHBaseRecord processor instead of PutHBaseJson. PutHBaseRecord works with chunks of data based on the specified Record Reader (JsonTreeReader): you can send an array of JSON messages/records to the processor, and based on the record reader controller service it reads the JSON messages/records and puts them into HBase. Adjust the batch size to get good performance. Batch Size 1000: the maximum number of records to be sent to HBase at any one time from the record set. Refer to this link to configure the Record Reader controller service.
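The Batch Size behaviour amounts to chunking the record set; a small sketch of that idea (record counts are arbitrary):

```python
def batches(records, batch_size=1000):
    """Yield successive chunks, mimicking how PutHBaseRecord sends at
    most Batch Size records to HBase per round trip."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

rows = [{"row_id": n} for n in range(2500)]
sizes = [len(b) for b in batches(rows)]
print(sizes)  # [1000, 1000, 500]
```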
06-24-2018
02:20 PM
1 Kudo
@Gourav Bhattacharya Once triggered, the GetFTP processor gets all the files from the directory based on your configurations. To control the rate of fetching files, use the ListFTP processor (which lists 0-byte flowfiles) instead of GetFTP, then use the ControlRate processor to control the rate at which flowfiles pass through, and feed the success relation to the FetchFTP processor. Flow:
1. ListFTP
2. ControlRate //control the rate of flowfiles
3. FetchFTP
In addition, refer to this link to get the data and process one flowfile at a time without using the ControlRate processor.
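As a loose illustration of what ControlRate adds to this flow, here is a sketch that groups listed entries into per-period batches (the file names and rate are invented; the real processor throttles by a configured Maximum Rate over a Time Duration):

```python
from collections import deque

def control_rate(flowfiles, max_per_period):
    """Group queued flowfiles into scheduling periods, roughly what
    ControlRate does when only max_per_period flowfiles may pass
    per time window."""
    queue, periods = deque(flowfiles), []
    while queue:
        n = min(max_per_period, len(queue))
        periods.append([queue.popleft() for _ in range(n)])
    return periods

listings = [f"file_{i}.csv" for i in range(5)]  # ListFTP output (0-byte entries)
print(control_rate(listings, 2))
# [['file_0.csv', 'file_1.csv'], ['file_2.csv', 'file_3.csv'], ['file_4.csv']]
```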
06-21-2018
02:11 PM
1 Kudo
@rajat puchnanda Method 1: You can use the QueryRecord processor and add a new dynamic property whose value is a SQL query like select id,count(*) from flowfile group by id; the processor will run the SQL query on the flowfile content and give the result as the output flowfile. Refer to this link for more details on configuring the QueryRecord processor. Method 2: If you want to group all like records together, you can use the PartitionRecord processor and specify the record path you want to group by; the processor will then group all the like records into individual groups. Refer to this link for more details on configuring the PartitionRecord processor.
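For intuition, the QueryRecord group-by above is equivalent to this small Python computation (the sample records are made up):

```python
from collections import Counter

records = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 1}]

# Equivalent of QueryRecord's: select id, count(*) from flowfile group by id
counts = Counter(r["id"] for r in records)
result = [{"id": k, "count": v} for k, v in counts.items()]
print(result)  # [{'id': 1, 'count': 3}, {'id': 2, 'count': 1}]
```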