Member since: 06-08-2017
1049 Posts
518 Kudos Received
312 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11236 | 04-15-2020 05:01 PM |
| | 7142 | 10-15-2019 08:12 PM |
| | 3124 | 10-12-2019 08:29 PM |
| | 11525 | 09-21-2019 10:04 AM |
| | 4351 | 09-19-2019 07:11 AM |
10-08-2018
10:10 PM
1 Kudo
@Suresh Dendukuri EvaluateJsonPath accepts only valid JSON paths (e.g. $.<key>), not regular expressions. Since you are specifying a regex: if you want to extract the values using a regex, use the ExtractText processor instead and add your regex there. Otherwise, configure the EvaluateJsonPath processor as shown below, keep Destination as flowfile-attribute, and add these new properties:

projectid $.[0].projectid
projetname $.[0].projetname
startdate $.[0].startdate

Output: with this configuration the extracted values are added to the flowfile as attributes. Note that this assumes there is only one JSON message in the array; we extract the first message's values and store them as flowfile attributes.

A cleaner way to do this task: since you have an array of JSON messages and want to extract the values of the JSON keys, split the array into individual messages using the SplitJson processor, then use an EvaluateJsonPath processor on each split message, configured as above. This way it also works when there is more than one JSON message in the array.
10-03-2018
01:37 AM
@sadapa Try with \u003F (the Unicode escape for ?):

hive> select('Hi How Are You\u003F');
+------------------+--+
| _c0 |
+------------------+--+
| Hi How Are You? |
+------------------+--+
10-02-2018
01:10 PM
@Pepelu Rico Use MergeContent and RouteOnAttribute processors before the MonitorActivity processor.

Configure the MergeContent processor:

Minimum Number of Entries
5

Now the processor waits for 5 flowfiles, and only when 5 flowfiles are queued does it merge them into one flowfile and transfer it to the merged relationship. Use the Max Bin Age property as a fallback to force the bin to be merged; if this property is not configured, the processor will wait indefinitely until it reaches 5 flowfiles. A compact configuration sketch follows at the end of this post.

Then connect the merged relationship from MergeContent to a RouteOnAttribute processor.

RouteOnAttribute configuration: MergeContent adds a merge.count attribute to the flowfile, so use that attribute to check whether the value is 5, and give that relationship to the MonitorActivity processor (run this processor at 12:00). Add a new property in the RouteOnAttribute processor:

${merge.count:equals(5)}

Flow:
1. other processors
2. MergeContent --> use the merged relationship
3. RouteOnAttribute --> use the new property's relationship
4. MonitorActivity
5. PutEmail

In case you want sequential merging, refer to this link for the flow and configurations.
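A compact sketch of the configuration described above; the Max Bin Age value and the property name five.files are only illustrative choices, not required names:

```
MergeContent
  Minimum Number of Entries : 5
  Max Bin Age               : 30 min        # hypothetical value; forces a partial bin to flush
RouteOnAttribute
  five.files (new property) : ${merge.count:equals(5)}
```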
10-01-2018
01:28 PM
@Pepelu Rico Use the MonitorActivity processor for this case and configure it as Cron driven so it runs at 10:00 AM and 10:01 AM.

Processor configs:

Scheduling Strategy
CRON driven
Run Schedule
* 0,1 10 1/1 * ? * //the fields are broken down at the end of this post
Threshold Duration
1 min //how much time must elapse before the flow is considered inactive
Continually Send Messages
false
Inactivity Message
Lacking activity as of time: ${now():format('yyyy/MM/dd HH:mm:ss')}; flow has been inactive for ${inactivityDurationMillis:toNumber():divide(60000)} minutes //change this message to suit your requirements
Activity Restored Message
Activity restored at time: ${now():format('yyyy/MM/dd HH:mm:ss')} after being inactive for ${inactivityDurationMillis:toNumber():divide(60000)} minutes
Copy Attributes
false //whether to include the flowfile attributes in the generated flowfile
Monitoring Scope
cluster //determines how activity of the flow is judged
Reporting Node
primary //specifies which node sends the mail

Configured this way, MonitorActivity runs at 10:00 AM and 10:01 AM: at 10:00 AM the processor runs every second and checks whether any flowfile has arrived; at 10:01 AM it sends out the mail if the flow was inactive. In the same way, use another MonitorActivity processor scheduled to run at 12:00 to send that alert mail.

Flow: use the inactive relationship to send out the alert email and configure the PutEmail processor.

- If the answer helped to resolve your issue, click the Accept button below to accept the answer; that helps community users find solutions to these kinds of issues quickly.
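For reference, NiFi's Cron driven scheduling uses the Quartz cron format (seconds, minutes, hours, day of month, month, day of week, year), so the schedule above breaks down as:

```
*     every second
0,1   during minutes 0 and 1
10    of hour 10 (10:00-10:01 AM)
1/1   every day of the month
*     every month
?     no specific day of the week
*     every year
```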
09-30-2018
10:12 PM
@Thuy Le Another way is to write a script that extracts all the values and keeps them as attributes on the flowfile. Then use an UpdateAttribute processor to delete all the attributes you don't need (see the sketch at the end of this post for an example):

Delete Attributes Expression
<not_required_attributes>.*|<not_required_attributes>.*

Then use an AttributesToCSV processor with a regex:

Attributes Regular Expression
.*
Destination
flowfile-content

Refer to this link for more details about extracting attributes dynamically from the JSON data.

**Keep in mind that adding a significant number of attributes dynamically to the flowfile can cause performance issues, since attributes are held in memory.
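As a sketch, if the script produced attributes named like temp_* or debug_* that you do not want in the CSV (hypothetical names), the Delete Attributes Expression could look like:

```
temp_.*|debug_.*
```

Any attribute whose name matches either pattern is removed before AttributesToCSV writes the remaining attributes to the flowfile content.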
09-28-2018
05:51 PM
@Suhas Reddy All List processors in NiFi are stateful, i.e. they store the state of the last execution, and on the next run they pull only the delta files (files added after the stored state) from the S3 buckets/directories. To check the state, right-click on the processor and go to View State; there you will find the stored state of the processor. To clear the state, click the Clear State button; the processor will then run from the start (list all the files from the bucket on the first run), and on subsequent runs it pulls only the newly added files in the bucket.

`Configure the ListS3 processor with all the mandatory properties and schedule it to run; the processor will then pick up the files incrementally.`

ListS3 description:
1. Retrieves a listing of objects from an S3 bucket. For each object that is listed, it creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchS3Object.
2. This processor is designed to run on the Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data.

- If the answer helped to resolve your issue, click the Accept button below to accept the answer; that helps community users find solutions to these kinds of issues quickly.
09-28-2018
02:47 PM
@Gayathri Devi Yes, it's possible. Try the query below:

hive> select concat("Week ",weekofyear(current_timestamp)) as weeknumber,date_format(date_add(date_sub(current_timestamp,pmod(datediff(current_timestamp,'1900-01-07'),7)),1),"MMMMM dd,yyyy")startday,date_format(date_add(current_timestamp,7 - pmod(datediff(current_timestamp,"1900-01-07"),7)),"MMMMM dd,yyyy") endday;
+-------------+--------------------+--------------------+--+
| weeknumber | startday | endday |
+-------------+--------------------+--------------------+--+
| Week 39 | September 24,2018 | September 30,2018 |
+-------------+--------------------+--------------------+--+

If you also want the full name of the day:

hive> select concat("Week ",weekofyear(current_timestamp)) as weeknumber,date_format(date_add(date_sub(current_timestamp,pmod(datediff(current_timestamp,'1900-01-07'),7)),1),"MMMMM EEEEE dd,yyyy")startday,date_format(date_add(current_timestamp,7 - pmod(datediff(current_timestamp,"1900-01-07"),7)),"MMMMM EEEEE dd,yyyy") endday;
+-------------+---------------------------+---------------------------+--+
| weeknumber | startday | endday |
+-------------+---------------------------+---------------------------+--+
| Week 39 | September Monday 24,2018 | September Sunday 30,2018 |
+-------------+---------------------------+---------------------------+--+
09-28-2018
02:04 PM
1 Kudo
@Hariprasanth Madhavan There are a lot of ways to insert data into a `Hive ORC` table from NiFi.

Method 1: Using the PutHiveStreaming processor: Create a transactional table and then feed the Avro data to PutHiveStreaming. The HiveStreaming processor converts the Avro-format data into ORC format, and for all the delta files it creates you can run a major compaction to produce one base file.

Method 2: ConvertAvroToORC in NiFi and store into HDFS: Use the ConvertAvroToORC processor to convert the Avro-format data into ORC format, store the data in HDFS, and create an external Hive table pointing to the same HDFS directory.

Method 3: Create an Avro table and load from the Avro table into the ORC table: Based on the Avro file we have in NiFi, we can create Avro tables dynamically from avro.schema. Create an ORC table, and after storing the Avro data into HDFS use the PutHiveQL processor to run insert into <ORC table> select * from <Avro table> (see the HiveQL sketch at the end of this post). Refer to this link for more details about creating Avro tables dynamically.

- If the answer helped to resolve your issue, click the Accept button below to accept the answer; that helps community users find solutions to these kinds of issues quickly.
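A minimal HiveQL sketch of Method 3, assuming hypothetical table names (avro_events as the Avro staging table, orc_events as the target), a hypothetical schema file and HDFS locations; adjust names, columns, and paths to your environment:

```sql
-- Hypothetical staging table over the Avro files NiFi wrote to HDFS
CREATE EXTERNAL TABLE avro_events
STORED AS AVRO
LOCATION '/data/staging/events_avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/events.avsc');

-- Hypothetical target ORC table (columns must mirror the Avro schema)
CREATE TABLE orc_events (
  id        INT,
  name      STRING,
  event_ts  TIMESTAMP
)
STORED AS ORC;

-- Statement issued from the PutHiveQL processor: load the ORC table from the Avro table
INSERT INTO TABLE orc_events
SELECT * FROM avro_events;
```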
09-28-2018
01:08 PM
@Gayathri Devi
Since you have a timestamp column, pass that column into the query below. Hive has a built-in weekofyear function to get the week number, and we can compute the start day and end day of the week for the given timestamp column. Try the query below:

hive> select concat("Week ",weekofyear(current_timestamp)) as weeknumber,date_format(date_sub(current_Timestamp,pmod(datediff(current_timestamp,'1900-01-07'),7)),"MMMMM dd,yyyy") as startday,date_format(date_add(current_timestamp,6 - pmod(datediff(current_timestamp,"1900-01-07"),7)),"MMMMM dd,yyyy") as endday;
+-------------+--------------------+--------------------+--+
| weeknumber | startday | endday |
+-------------+--------------------+--------------------+--+
| Week 39 | September 23,2018 | September 29,2018 |
+-------------+--------------------+--------------------+--+

(or) If you don't need formatting, use the query below:

hive> select weekofyear(current_timestamp) as weeknumber,date_sub(current_Timestamp,pmod(datediff(current_timestamp,'1900-01-07'),7))as startday,date_add(current_timestamp,6 - pmod(datediff(current_timestamp,"1900-01-07"),7))as endday;
+-------------+-------------+-------------+--+
| weeknumber | startday | endday |
+-------------+-------------+-------------+--+
| 39 | 2018-09-23 | 2018-09-29 |
+-------------+-------------+-------------+--+

Just replace current_timestamp with your timestamp column.
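For example, applied to a hypothetical table orders with a timestamp column order_ts, the same query would look like:

```sql
-- Week number plus week start/end dates derived from the order_ts column (hypothetical table/column names)
SELECT weekofyear(order_ts) AS weeknumber,
       date_sub(order_ts, pmod(datediff(order_ts, '1900-01-07'), 7))     AS startday,
       date_add(order_ts, 6 - pmod(datediff(order_ts, '1900-01-07'), 7)) AS endday
FROM orders;
```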
09-28-2018
03:34 AM
@Pepelu Rico You can achieve this by using the ReplaceText processor (or) the UpdateRecord processor.

Method 1: Using the UpdateRecord processor: If you are working with a larger number of rows, use the UpdateRecord processor and define a CSV Reader controller service with ; as the value separator. Define a CSV Writer controller service and add a new property in the processor such as /link3column with the value concat(/col_4,/col5,/col6). Refer to this link for configuration/usage of the UpdateRecord processor.

Method 2: Using the ReplaceText processor: Use this method if you are dealing with a smaller number of records.

Search Value
^((?:[^;]+;\s*){3})(.*)
Replacement Value
$1${'$2':replace(";","")}
Character Set
UTF-8
Maximum Buffer Size
1 MB //change this value according to your flowfile size
Replacement Strategy
Regex Replace
Evaluation Mode
Line-by-Line

In this method the regex matches everything up to the third ; into the first capture group ($1) and the rest of the line into the second capture group ($2). Using NiFi Expression Language we then replace every ";" in $2 with "".

Output flowfile:
column1;column2;column3;20180927
column1_1;column2_1;column3_1;20180927
column1_2;column2_2;column3_2;20180927

- If the answer helped to resolve your issue, click the Accept button below to accept the answer; that helps community users find solutions to these kinds of issues quickly.
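For illustration, a hypothetical input line in which the last three fields hold the year, month and day, and the line the ReplaceText configuration above would produce from it:

```
input : column1;column2;column3;2018;09;27
output: column1;column2;column3;20180927
```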