Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11280 | 04-15-2020 05:01 PM |
| | 7175 | 10-15-2019 08:12 PM |
| | 3160 | 10-12-2019 08:29 PM |
| | 11619 | 09-21-2019 10:04 AM |
| | 4395 | 09-19-2019 07:11 AM |
04-17-2018
11:31 AM
@Rahoul A

You can use a ControlRate processor before the PutHDFS processor and configure it to release flowfiles at the desired interval (for example, one per minute). If you need to append data to an existing file, make sure every flowfile carries the same filename; use an UpdateAttribute processor to set the filename, and configure the following property on PutHDFS:

Conflict Resolution Strategy append // if the processor finds a file with the same name, it appends the data to that file.

ControlRate processor configs:- With these configurations we release one flowfile per minute, so at any point in time only one node is writing/appending data to the file.

Flow:- other processors --> ControlRate --> PutHDFS
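A minimal ControlRate configuration for releasing one flowfile per minute could look like the lines below (these are standard ControlRate properties; the one-per-minute values are just an example to adjust to your desired interval):

Rate Control Criteria flowfile count
Maximum Rate 1
Time Duration 1 min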
04-14-2018
07:40 PM
@Laurie McIntosh If you have more than one file to process, use a MergeContent processor after the GetFile processor; MergeContent merges multiple files into a single file.

Flow:- GetFile --> MergeContent --> Truncate Table --> insert csv into Table --> clean up

By using the MergeContent processor we process one file at a time even when there is more than one input file (a sample configuration is sketched at the end of this answer). The PutDatabaseRecord processor, like all record-based processors in NiFi, is quite powerful and can handle millions of records.

Please refer to the links below for how to configure the MergeContent processor:
https://community.hortonworks.com/questions/64337/apache-nifi-merge-content.html
https://community.hortonworks.com/questions/161827/mergeprocessor-nifi-using-the-correlation-attribut.html
https://community.hortonworks.com/questions/149047/nifi-how-to-handle-with-mergecontent-processor.html

In addition, NiFi has Wait and Notify processors. The Wait processor routes incoming flowfiles to the 'wait' relationship until a matching release signal is stored in the distributed cache by a corresponding Notify processor; when a matching release signal is identified, a waiting flowfile is routed to the 'success' relationship.
http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/
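A minimal MergeContent configuration for this flow might look like the following (these are standard MergeContent properties; the values are illustrative and should be tuned to your file count and sizes):

Merge Strategy Bin-Packing Algorithm
Merge Format Binary Concatenation
Minimum Number of Entries 1000
Max Bin Age 5 min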
04-14-2018
06:09 PM
1 Kudo
@Mahendra Hegde
Increase the properties below to at least 10 and 15 seconds and run the processor again. The 3 and 5 second values you have configured are quite low for timeouts; when you make a call, the remote service may be busy handling other requests, and if the connection cannot be established within 3 seconds the request is routed to the NoRetry relationship.

Connection Timeout 10 secs Max wait time for connection to remote service.
Read Timeout 15 secs Max wait time for response from remote service.

The flowfile is transferred to the NoRetry relationship on 1xx, 3xx, and 4xx status codes. If you view the attributes of the flowfiles routed to the NoRetry relation, you will find the status code and the error returned when the call was made to the service.

REST API error codes:

| Category | Description |
|---|---|
| 1xx: Informational | Communicates transfer protocol-level information. |
| 2xx: Success | Indicates that the client’s request was accepted successfully. |
| 3xx: Redirection | Indicates that the client must take some additional action to complete the request. |
| 4xx: Client Error | This category of status codes points the finger at the client. |
| 5xx: Server Error | The server takes responsibility for these status codes. |

Please refer to this link for more details about REST API error codes. If you still have issues with the above configs and only get responses from the InvokeHTTP processor after a couple of retries, increase the values further and make the call again.
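As a compact sketch of the suggested InvokeHTTP settings (these are starting points, not hard rules; tune them upward if the remote service is slow to respond):

Connection Timeout 10 secs
Read Timeout 15 secs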
04-14-2018
03:41 AM
1 Kudo
@Laurie McIntosh If you are using NiFi 1.5+, you can use a PutSQL processor with the SQL Statement property set to your TRUNCATE statement. This processor does not change the contents of the flowfile, so you can follow it with a PutDatabaseRecord processor to insert the records and, if needed, a final PutSQL processor for post-insert statements. The Jira addressing the SQL Statement property in PutSQL:
https://issues.apache.org/jira/browse/NIFI-4522

Flow:-

If you are running a version prior to NiFi 1.5, use an ExecuteScript processor to run the truncate statement on the target database and then use the PutDatabaseRecord and PutSQL processors. Please refer to this link for more details.

If the answer helped to resolve your issue, click on the Accept button below to accept it; that helps community users find solutions to these kinds of issues quickly.
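As a sketch of the NiFi 1.5+ approach (the table name staging_table is a hypothetical placeholder for your target table), the flow and the key property could look like:

GetFile --> PutSQL (truncate) --> PutDatabaseRecord --> PutSQL (post-insert)

PutSQL SQL Statement: TRUNCATE TABLE staging_table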
04-14-2018
03:08 AM
@Merlin Sundar One way of doing this is by using the PublishKafkaRecord processor, which can read the incoming flowfile data and write only the required fields to a topic. Fork the success relationship from the GetFile processor and feed it to two PublishKafkaRecord processors to publish messages to Kafka topic1 and topic2.

Flow:-
Configure the first PublishKafkaRecord processor with a CSVReader as the Record Reader to read your incoming file (with 36 fields) and a CSVRecordSetWriter as the Record Writer that writes only the required four fields to KafkaTopic1.
Configure the second PublishKafkaRecord processor with the same CSVReader to read the 36 fields and a CSVRecordSetWriter that writes only the required thirty fields to KafkaTopic2.
This way we split the fields and publish them to the desired Kafka topics.

(or)

If you want to publish all the fields into one topic using the PublishKafka_0_11 processor and then split them in two, use a ConsumeKafka processor to consume the 36-field messages and then two PublishKafkaRecord_0_11 processors with the same CSVReader as the Record Reader and two different CSVRecordSetWriters (one with 4 fields, one with 30 fields). We consume the messages and write them to two different Kafka topics that are configured with different CSVRecordSetWriter controller services.

Flow:-

(or)

Alternatively, once you have published the messages to a Kafka topic using the PublishKafka_0_11 processor, use a ConsumeKafka processor to consume them. Then use two ConvertRecord processors in parallel with the same CSVReader controller service and two different CSVRecordSetWriter controller services (because we need to write 4 fields to the kafka1 topic and 30 fields to the kafka2 topic), and finally use two PublishKafka_0_11 processors in parallel to publish the prepared messages from the two ConvertRecord processors.

Flow:-

Choose whichever of these three approaches best fits your case (a sample writer schema is sketched below). Please refer to the link below to configure/use the Record Reader and Record Writer properties:
https://community.hortonworks.com/articles/115311/convert-csv-to-json-avro-xml-using-convertrecord-p.html
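For illustration, the 4-field CSVRecordSetWriter could use the Schema Text property with an Avro schema such as the one below (the record name and the field names f1..f4 are hypothetical placeholders for your actual columns):

{"type":"record","name":"four_fields","fields":[
  {"name":"f1","type":"string"},
  {"name":"f2","type":"string"},
  {"name":"f3","type":"string"},
  {"name":"f4","type":"string"}
]}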
04-13-2018
07:43 PM
1 Kudo
Thanks @jwitt, I missed the first step of splitting; I thought it was a new flowfile for each message.
04-13-2018
07:23 PM
2 Kudos
@Saikrishna Tarapareddy

Step 1: SplitText processor:-
Configure a SplitText processor with
Line Split Count 1
(or) use a SplitContent processor with
Byte Sequence Format Text
Byte Sequence shift+enter (i.e. a newline)

Feed the splits relation from either processor to the MergeContent processor. The splits relation will have each line of your JSON file as a new flowfile; the MergeContent processor then merges those JSON messages into an array of JSON.

Step 2: MergeContent processor configs:-
Delimiter Strategy Text
Header [
Footer ]
Demarcator ,

Configure the processor with your desired Minimum Number of Entries and use Max Bin Age as a wildcard so that partially filled bins are still flushed. Please refer to the links below for MergeContent processor configs.
https://community.hortonworks.com/questions/64337/apache-nifi-merge-content.html
https://community.hortonworks.com/questions/161827/mergeprocessor-nifi-using-the-correlation-attribut.html
https://community.hortonworks.com/questions/149047/nifi-how-to-handle-with-mergecontent-processor.html
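As a quick illustration of the result (the sample records are hypothetical), three split flowfiles such as

{"id":1,"name":"a"}
{"id":2,"name":"b"}
{"id":3,"name":"c"}

would be merged into a single flowfile containing

[{"id":1,"name":"a"},{"id":2,"name":"b"},{"id":3,"name":"c"}]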
04-12-2018
11:27 AM
@JAy PaTel The issue is that you are storing the SPLIT results into a splitlvl relation. The SPLIT statement itself splits the rawlvl relation into the one and two relations, but you are assigning its result to splitlvl:

grunt> splitlvl = SPLIT rawlvl into one if (no>2 and no<5), two if (no>5);

Storing the result of SPLIT into another relation (splitlvl) is not valid syntax for the SPLIT operator in Pig. Change your script to:

grunt> rawlvl = load '~/file' using PigStorage(',') as (no:int,name:chararray,phno:int,add:chararray);
grunt> SPLIT rawlvl into one if (no>2 and no<5), two if (no>5);
grunt> dump one;
grunt> dump two;

For more details about the SPLIT operator, please refer to the link below.
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#SPLIT

Example:-
Step 1:- Load the input file into Pig:

grunt> rawlvl = load '/t.txt' using PigStorage(',') as (no:int,name:chararray,phno:int,add:chararray);
grunt> dump rawlvl;

(1,aaa,123456,annotation)
(2,bbb,234567,barber)
(4,ddd,456789,federal)
(3,ccc,345678,code)
(4,ddd,456789,definition)
(5,asd,545645,AcsToGlRestServices)
(6,date,58314,filterlevel)
(7,kssa,22334,timefield)
(8,Bhi,2236,context)

The data is now loaded into the rawlvl relation.

Step 2:- Split the rawlvl relation into two relations, one and two:

grunt> SPLIT rawlvl into one if (no>2 and no<5), two if (no>5);

Dump the one relation:

grunt> dump one;
(4,ddd,456789,federal)
(3,ccc,345678,code)
(4,ddd,456789,definition)

Dump the two relation:

grunt> dump two;
(6,date,58314,filterlevel)
(7,kssa,22334,timefield)
(8,Bhi,2236,context)

As you can see, the output of the one and two relations matches the conditions you specified.

If the answer helped to resolve your issue, click on the Accept button below to accept it; that helps community users find solutions to these kinds of issues quickly.
04-10-2018
02:38 PM
1 Kudo
@Nikhil R
{"component":{"id":'controllerID',"name":"DBCPConnectionPool", 'state': 'DISABLED'},"revision":{"clientId":client,"version":11}} Change the quotes in your json payload keep everything in double quotes don't use single quotes. {"revision":{"clientId":"af22b3ab-0162-1000-9b81-cf7b337dfc47","version":1},"component":{"id":"c8b1c581-47f0-3773-e7a5-3eb739246f99",'state':'DISABLED'}}
04-10-2018
11:17 AM
1 Kudo
@Nikhil R
In your payload you have to include the clientId value for the controller service.

Example:- I have a controller service in my NiFi instance and we need to get the clientId for that controller service.

Step 1: How to find the clientId value?
To find the clientId and version number, open the developer tools in your browser (Chrome, Firefox, etc.), perform any action (start, stop, etc.) on the controller service, and look at the calls made for the controller service id. For reference, take a look at the screenshot:
1. Click on Network.
2. In the filter, enter your controller service id.
3. Click on Response; there you can find the clientId and version.

Once you have all the values, prepare your curl command to stop the controller service. As my controller service id is c8b1c581-47f0-3773-e7a5-3eb739246f99, I filtered on that specific id, then opened one of the PUT methods and went to the Params tab on the right side to get the clientId associated with the controller service.

Step 2: Prepare the payload
Now include your clientId value in the payload. From the screenshot above, my clientId value is af22b3ab-0162-1000-9b81-cf7b337dfc47, so I included that clientId in the payload along with the component id, i.e. our controller service id:

{"revision":{"clientId":"af22b3ab-0162-1000-9b81-cf7b337dfc47","version":1},"component":{"id":"c8b1c581-47f0-3773-e7a5-3eb739246f99","state":"DISABLED"}}

Step 3: Prepare the curl API call:-
Now we need to combine the payload and the endpoint in one curl call. Use the PUT method and the state DISABLED (since we need to disable the controller service):

curl -i -X PUT -H 'Content-Type:application/json' -d '{"revision":{"clientId":"af22b3ab-0162-1000-9b81-cf7b337dfc47","version":1},"component":{"id":"c8b1c581-47f0-3773-e7a5-3eb739246f99","state":"DISABLED"}}' http://localhost:9090/nifi-api/controller-services/c8b1c581-47f0-3773-e7a5-3eb739246f99

Once the API call succeeds, you will receive a 200 response code and the controller service will be disabled. For your case, change the clientId and the controller service id and make the REST API call to disable the controller service.

If the answer helped to resolve your issue, click on the Accept button below to accept it; that helps community users find solutions to these kinds of issues quickly.
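As an alternative to the browser developer tools, you can also read the current revision straight from the REST API (a sketch, assuming the same controller service id and an unsecured NiFi listening on localhost:9090): issue a GET against the controller service endpoint and take the version from the revision object in the JSON response, then use it in the PUT payload above.

curl -i -X GET http://localhost:9090/nifi-api/controller-services/c8b1c581-47f0-3773-e7a5-3eb739246f99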