Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11291 | 04-15-2020 05:01 PM |
| | 7189 | 10-15-2019 08:12 PM |
| | 3168 | 10-12-2019 08:29 PM |
| | 11643 | 09-21-2019 10:04 AM |
| | 4400 | 09-19-2019 07:11 AM |
03-24-2018
10:36 PM
@Shantanu kumar 1/1 in the day-of-month field simply means daily; it is equivalent to *, so 0 0-1 10 * * ? * and 0 0-1 10 1/1 * ? * are the same schedule, running daily at 10:00 and 10:01 AM. You can create and check cron expressions at http://www.cronmaker.com/
For more details about cron expressions:
https://community.hortonworks.com/questions/63513/helping-setting-up-cron-based-nifi-processor.html
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html
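To make the schedule easier to read, here is a field-by-field breakdown of that Quartz cron expression (the notation NiFi's CRON-driven scheduling uses; the breakdown is illustrative, not from the original post):

```
#  sec   min   hour   day-of-month   month   day-of-week   year
   0     0-1   10     1/1            *       ?             *
# 1/1 in day-of-month means "starting at day 1, every 1 day", i.e. every day (same as *),
# so the expression fires at 10:00:00 and 10:01:00 every day.
```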
... View more
03-24-2018
04:50 PM
@Mark McGowan You can do the merging by using the Correlation Attribute Name property of the MergeContent processor. To use this property we cannot use the ConvertRecord processor, because each flowfile needs to keep its date and hour combined into a single attribute, and that attribute is then used as the Correlation Attribute Name in MergeContent. MergeContent will then merge flowfiles that share the same attribute value into one flowfile. Please refer to the links below for more details about correlation attribute usage:
https://community.hortonworks.com/questions/161827/mergeprocessor-nifi-using-the-correlation-attribut.html
https://community.hortonworks.com/questions/55926/using-the-mergecontent-processor-in-nifi-can-i-use.html
https://community.hortonworks.com/questions/87178/merge-fileflow-files-based-on-time-rather-than-siz.html
For reference, ConvertRecord XML: 178086-json-to-csv.xml
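As a rough sketch of the correlation idea (the attribute name merge.key and the use of UpdateAttribute are illustrative assumptions, not taken from the original flow):

```
UpdateAttribute                          # stamp each flowfile with one combined attribute
  merge.key = ${date}-${hour}            # assumes 'date' and 'hour' attributes already exist

MergeContent
  Merge Strategy             : Bin-Packing Algorithm
  Correlation Attribute Name : merge.key # flowfiles with the same merge.key are binned and merged together
```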
... View more
03-24-2018
03:43 PM
@Shantanu kumar Yes, if you change the threshold value to 1 min then we need to run the MonitorActivity processor for 2 minutes, i.e. 0 0-1 10 1/1 * ? *: the first minute checks the flow's activity for 1 minute, and the second minute (10:01 AM) triggers an inactivity flowfile. With this configuration the processor runs at 10:00 AM, the window is 1 minute and it checks whether there are any flowfiles; at 10:01 AM it sends a flowfile to the inactive relationship if there were no flowfiles.
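On a timeline (assuming the 0 0-1 10 1/1 * ? * schedule and a 1-minute Threshold Duration):

```
10:00  MonitorActivity runs; the 1-minute window opens and incoming flowfiles are tracked
10:01  MonitorActivity runs again; if nothing arrived since 10:00, a flowfile is routed
       to the 'inactive' relationship
```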
... View more
03-24-2018
02:09 PM
@Shantanu kumar
You have to keep the MonitorActivity processor running until 10:05 AM UTC because the threshold value is set to 5 min. If you run the processor only at 10 AM, it runs at 10:00 AM and not again at 10:01 AM, so no 5-minute window is created for MonitorActivity to know whether there were any flowfiles in the last 5 minutes. Change the schedule of the MonitorActivity processor to 0 0-5 10 1/1 * ? *. The processor now runs at 10:00, 10:01, 10:02, 10:03 and 10:04 AM; after 10:04 AM the processor has been running for 5 minutes, and the 10:05 AM run sends an inactivity flowfile and triggers your ExecuteSQL processor.
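A minimal MonitorActivity configuration sketch matching that description (property names are the standard NiFi ones; values are taken from the text above):

```
MonitorActivity
  Scheduling Strategy : CRON driven
  Run Schedule        : 0 0-5 10 1/1 * ? *
  Threshold Duration  : 5 min

# Runs at 10:00, 10:01, 10:02, 10:03, 10:04 and 10:05.
# If no flowfiles are seen during the 5-minute window, the 10:05 run emits a flowfile
# on the 'inactive' relationship, which can then trigger the ExecuteSQL processor.
```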
... View more
03-22-2018
11:22 PM
@Sami Ahmad The QueryDatabaseTable processor stores the last state value (if you specify Maximum-value Columns) and runs incrementally based on your run schedule. If no columns are provided in Maximum-value Columns, all rows from the table are returned on every run, which can have a performance impact.
NOTE: It is important to use consistent max-value column names for a given table for incremental fetch to work properly.
So let's say you have an incremental column in your source table and you have set it as the Maximum-value Column in the QueryDatabaseTable processor. On the first run the processor pulls all the records and updates its state (suppose the stored state is 2018-03-22 19:11:00). On the next run it only pulls the rows whose incremental column value is greater than the stored state, i.e. 2018-03-22 19:11:00. If no records have a newer incremental column value, the processor won't produce any flowfiles (because nothing was added/updated). For more about QueryDatabaseTable:
https://community.hortonworks.com/articles/51902/incremental-fetch-in-nifi-with-querydatabasetable.html
In your screenshot you have connected both the success and failure relationships back to the PutHDFS processor. Even when a flowfile has been stored successfully into HDFS, it is routed to the success relationship, and because you looped that back to the same processor it will try to write the same file again and again; if the Conflict Resolution Strategy is set to fail, you will end up filling your logs with this error. It's better to use a retry loop for the failure relationship, and for the success relationship either auto-terminate it or drag a funnel and feed the success relationship to the funnel. Retry loop references:
https://community.hortonworks.com/articles/51902/incremental-fetch-in-nifi-with-querydatabasetable.html
Here are some links that give insight into how the PutHiveStreaming processor works:
https://community.hortonworks.com/questions/84948/accelerate-working-processor-puthivestreaming.html
https://issues.apache.org/jira/browse/NIFI-3418
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
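To illustrate the incremental fetch (the table and column names below are made up for the example):

```
QueryDatabaseTable
  Table Name            : source_table      # hypothetical table
  Maximum-value Columns : last_updated      # hypothetical incremental column

# Run 1: effectively SELECT * FROM source_table
#        -> all rows returned, state stored as last_updated = '2018-03-22 19:11:00'
# Run 2: effectively SELECT * FROM source_table WHERE last_updated > '2018-03-22 19:11:00'
#        -> only new/updated rows; if there are none, no flowfile is produced
```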
... View more
03-22-2018
03:26 PM
@Sami Ahmad If you are using NiFi 1.2+, the link below will help you speed up the process:
https://community.hortonworks.com/questions/108049/puthiveql-and-puthivestreaming-processors-in-apach.html
(or) If you are running the QueryDatabaseTable processor very often (every 0 sec, 1 sec, ... < 15 mins), you are going to end up with a lot of small files in the HDFS directory. In that case it's better to use a MergeContent processor with Merge Format set to Avro after QueryDatabaseTable and merge the small files into one big file; keep an age-off duration (Max Bin Age) so the merged files are flushed out after a certain amount of time.
MergeContent configs reference:
https://community.hortonworks.com/questions/149047/nifi-how-to-handle-with-mergecontent-processor.html
Hive to do the conversion from Avro to ORC: you can store the data into a tmp location (in Avro format) with PutHDFS after the QueryDatabaseTable processor, then use a PutHiveQL processor to load the data from the temp location into the final location (ORC format).
https://community.hortonworks.com/questions/135824/in-nifi-the-convertavrotoorc-processor-is-extremel.html
(or) NiFi to do the conversion from Avro to ORC: after the QueryDatabaseTable processor, use a SplitAvro processor if you want to split the data into chunks, then a ConvertAvroToORC processor, then a PutHDFS processor to store the ORC files into an HDFS directory. Create an external table on the HDFS directory.
https://community.hortonworks.com/articles/87632/ingesting-sql-server-tables-into-hive-via-apache-n.html
Use the attached XML for reference for this method: store-hdfs-178391.xml
Let us know if you are facing any issues ..!!
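A rough outline of the NiFi-side Avro-to-ORC flow described above (the directory and Max Bin Age values are examples, not exact settings):

```
QueryDatabaseTable -> MergeContent -> ConvertAvroToORC -> PutHDFS

MergeContent
  Merge Format : Avro
  Max Bin Age  : 10 min        # example age-off so partially filled bins still get flushed

PutHDFS
  Directory    : /data/orc     # example path; create a Hive external table (STORED AS ORC) on it
```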
... View more
03-22-2018
12:20 PM
@umang s That's expected behavior from the HBase table, because we now have a unique row key for each record in the table. The data you are writing to this category1 table is duplicated data, and HBase only overwrites existing data when it finds that the same row key already exists in the table. In your case we are using UUID, i.e. a unique id for each flowfile in NiFi, so every flowfile gets its own unique row key even though the content of the flowfiles is the same.
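To illustrate (the row keys below are made-up example UUIDs), two flowfiles with identical content end up as two separate rows because each gets its own UUID as the row key:

```
ROW                                   VALUE
11111111-aaaa-example-uuid-1          {"id":"1334134","name":"Apparel Fabric", ...}
22222222-bbbb-example-uuid-2          {"id":"1334134","name":"Apparel Fabric", ...}
# same content, different row keys -> HBase keeps both rows instead of overwriting one
```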
... View more
03-22-2018
11:38 AM
@umang s Are you using UUID as the Row Identifier? Could you please share your PutHBaseCell processor configs?
... View more
03-22-2018
11:17 AM
@umang s I think your input JSON messages are enclosed in an array [], like
[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]
In this case use a SplitJson processor before the PutHBaseCell processor with the configs below, and route the split relationship from SplitJson to PutHBaseCell. The SplitJson processor splits the array of JSON messages into individual messages.
Input:
[{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}]
Output:
Flowfile 1:
{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
Flowfile 2:
{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
... View more
03-22-2018
09:42 AM
@umang s
You can use the PutHBaseCell processor for this use case and keep the Row Identifier as a UUID; the JSON message then gets inserted as the value for that UUID.
Example: my input JSON document is
{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
PutHBaseCell configs: as you can see in the screenshot, I'm using ${UUID()} as the Row Identifier. This UUID is unique for each flowfile in NiFi, so we are not overwriting any existing data in the HBase table.
Output:
hbase(main):008:0> scan 'test'
ROW                                     COLUMN+CELL
 c7ca74ad-4933-4340-a9c7-e55370a4501b   column=category:category:details, timestamp=1521711352302, value={"id" : "1334134","name" : "Apparel Fabric","path" : "Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
1 row(s) in 0.1130 seconds
Case 2: If your input JSON document is
{"id":"1334134","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"},
{"id":"412","name":"Apparel Fabric","path":"Arts, Crafts & Sewing/Fabric/Apparel Fabric"}
then in HBase the document looks like
... View more