Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11240 | 04-15-2020 05:01 PM |
| | 7144 | 10-15-2019 08:12 PM |
| | 3126 | 10-12-2019 08:29 PM |
| | 11536 | 09-21-2019 10:04 AM |
| | 4354 | 09-19-2019 07:11 AM |
09-28-2018
01:55 AM
@Thuy Le The AttributesToCSV processor works on attributes associated with the flowfile. We first need to extract the attributes from the content of the flowfile, then use the AttributesToCSV processor to create a CSV file based on the attribute list. As you mentioned in the question, you have a dynamic JSON array, so it will be hard to extract the values of the JSON keys and create the CSV file dynamically. Refer to this link for more details regarding the AttributesToJSON processor; the usage/configuration of the AttributesToCSV processor is much the same. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users looking for a quick solution to these kinds of issues.
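As a rough configuration sketch of the two-step flow (the JSON keys id and name are hypothetical, and the exact property names may differ slightly by NiFi version):

EvaluateJsonPath                          # extract values from the flowfile content
  Destination             : flowfile-attribute
  id                      : $.id          # dynamic property, hypothetical key
  name                    : $.name        # dynamic property, hypothetical key

AttributesToCSV                           # turn the extracted attributes into CSV content
  Attribute List          : id,name
  Destination             : flowfile-content
  Include Core Attributes : false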
09-27-2018
01:17 PM
1 Kudo
@A C To change the format of the input time we need to use unix_timestamp along with the from_unixtime function. Try the below syntax:
hive> select from_unixtime(unix_timestamp("Tue Sep 26 22:02:11 CDT 2018",'EEE MMM dd HH:mm:ss z yyyy'),'yyyy-MM-dd HH:mm:ss');
+----------------------+--+
| _c0 |
+----------------------+--+
| 2018-09-26 22:02:11 |
+----------------------+--+
- If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users looking for a quick solution to these kinds of issues.
09-27-2018
01:06 PM
1 Kudo
@Ingrid Justen
1. After 1 day, when ExecuteSQL is still active, will it be triggered again? I will have to test it.
-> Yes, the processor will trigger after 1 day and will then execute the SQL statement again.
2. The option "Execution" (value "Primary node") does not appear in my NiFi. Is this a version thing (working with 1.6.0)?
-> Stop the ExecuteSQL processor -> right click on the processor -> go to Configure -> click on the Scheduling tab -> select Primary node in the Execution dropdown (this property is only applicable if your NiFi instance has more than one node).
1. Primary node // the processor is scheduled to run only on the primary node
2. All nodes // if you have a 3-node NiFi cluster, the processor is scheduled to run on all the nodes, i.e. instead of one output flowfile you get 3 flowfiles of the same size (all nodes do the same work, which results in data duplication)
This and this links explain more regarding the execution modes in NiFi.
- For this case, keep the blue-outlined processors in one process group and use the NiFi REST API to start the process group. Once the processing is completed, stop the process group using the REST API (a sketch of the calls follows below). Following this approach, ExecuteSQL is triggered only once, and the group is stopped as soon as all the processing is done. Refer to this, this, this links regarding stopping/starting a process group using the NiFi REST API. Refer to this link regarding stopping a process group once the execution is completed.
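As a minimal sketch of those REST calls, assuming an unsecured NiFi instance on localhost:8080; the process group id is a placeholder and the call goes to the flow/process-groups scheduling endpoint:

# start every component in the process group (replace the placeholder id)
curl -X PUT -H 'Content-Type: application/json' \
  -d '{"id":"<process-group-id>","state":"RUNNING"}' \
  http://localhost:8080/nifi-api/flow/process-groups/<process-group-id>

# stop the group again once all processing has finished
curl -X PUT -H 'Content-Type: application/json' \
  -d '{"id":"<process-group-id>","state":"STOPPED"}' \
  http://localhost:8080/nifi-api/flow/process-groups/<process-group-id>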
09-26-2018
10:44 PM
@Ram G In NiFi we have the PartitionRecord processor: based on the content of the flowfile, the processor creates dynamic partitions and adds the partition field name and value as attributes to the flowfile. By using these attributes we can store the data into HDFS directories dynamically. To read the content of the flowfile you need to define a Record Reader controller service as CSVReader with the value separator set to \t (as you have a tab-delimited file), and define a Record Writer controller service as per your requirements (Avro, JSON, etc.). But keep in mind that, as you mentioned, you have a file of more than 100 GB and are thinking of splitting it; for this case I believe Hive will work much better to create dynamic partitions. Store the file into HDFS, create a Hive external table with a tab delimiter, create a partitioned table, and insert into the partitioned table by selecting from the non-partitioned table (a sketch follows below). However, if you want to do this in NiFi, make sure you have sufficient memory in your NiFi instance; once you pull the file into NiFi, use the SplitRecord processor to split the huge file into reasonably small chunks, then feed the split flowfiles to the PartitionRecord processor. Once you have created the partitions, store the flowfiles into HDFS. Refer to this link for more details regarding PartitionRecord processor usage/configuration. Refer to this link for JVM OutOfMemory issues in NiFi. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users looking for a quick solution to these kinds of issues.
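As a minimal HiveQL sketch of that approach (table, column, and path names are hypothetical, and dynamic partitioning has to be enabled before the insert):

-- external table over the raw tab-delimited file already stored in HDFS
CREATE EXTERNAL TABLE staging_tbl (
  id INT,
  name STRING,
  event_dt STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/data/staging';

-- partitioned target table
CREATE TABLE partitioned_tbl (
  id INT,
  name STRING
)
PARTITIONED BY (event_dt STRING)
STORED AS ORC;

-- enable dynamic partitions, then load from the non-partitioned table
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE partitioned_tbl PARTITION (event_dt)
SELECT id, name, event_dt FROM staging_tbl;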
09-26-2018
10:18 PM
@Ingrid Justen In your flow, ExecuteSQL is the trigger processor, so if you want to run the processor only once, schedule the ExecuteSQL processor accordingly. Right now the processor's Run Schedule is 1 day, so it will run as soon as you start it for the first time and then trigger again after 1 day. All the other processors except ExecuteSQL (the trigger) can be scheduled at 0 sec (the default), so that whenever there is data, the processors are triggered to process it (a sketch of these settings follows below). Refer to this, this and this links for more details regarding the scheduling strategies of NiFi processors. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users looking for a quick solution to these kinds of issues.
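As a rough sketch of the Scheduling tab settings described above (the CRON expression is only an illustrative assumption):

ExecuteSQL (trigger processor)
  Scheduling Strategy : Timer driven
  Run Schedule        : 1 day            # fires once on start, then again every day
  # or: CRON driven with e.g. 0 0 0 * * ? to fire once at midnight

All downstream processors
  Scheduling Strategy : Timer driven
  Run Schedule        : 0 sec            # run whenever flowfiles are queued upstream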
09-25-2018
10:00 AM
1 Kudo
@Pepelu Rico Yes, we can validate the name of the file and the size of the file before ingesting into HDFS by using the RouteOnAttribute processor. In NiFi the flowfile carries attributes such as:
1. ${filename} // gets the filename of the flowfile
2. ${fileSize} // gets the size in bytes of the flowfile
In the RouteOnAttribute processor add a new property named valid_files with the value:
${filename:equals("<required_file_name>"):and(${fileSize:gt(0)})}
In the above expression we are using NiFi Expression Language to check that the filename value equals (an Expression Language function) <required_file_name> and that the fileSize value is greater than 0; only then will the flowfiles be transferred to the valid_files relationship. Feed only the valid_files relationship to the PutHDFS processor; this way, only the files that satisfy the condition will be stored into HDFS. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users looking for a quick solution to these kinds of issues.
09-25-2018
09:49 AM
@yazeed salem The ExtractText processor is able to work only on flowfile content. Use the UpdateAttribute processor instead, as it is meant for updating the values of existing attributes (or) adding new attributes based on existing/new values. We can also use NiFi Expression Language in the UpdateAttribute processor; for regex, use replaceAll and the other string manipulation functions and apply your regex to the attributes of the flowfile. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer. That would be a great help to community users looking for a quick solution to these kinds of issues.
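As a small sketch, assuming a hypothetical attribute named raw_value from which you want to keep only the digits, add a dynamic property to UpdateAttribute such as:

clean_value : ${raw_value:replaceAll('[^0-9]', '')}    # hypothetical property and attribute names

The flowfile then carries a new clean_value attribute holding the regex-cleaned value, which downstream processors can reference.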
09-22-2018
10:03 AM
@Oliver Queen Glad to hear that. 🙂 Could you click on the Accept button below to accept the answer and close the thread as accepted? That would be a great help to community users looking for a quick solution to these kinds of issues.
09-22-2018
02:51 AM
@Ahmar Khan Update <table_name> Set <col_name>=(<Query>) is not possible with Hive. Update <table_name> Set <col_name>="<value>" is possible with Hive. You could try using the Hive ACID MERGE approach to update the values, but I am not sure whether MERGE will work for setting values based on a query.
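As a minimal sketch of the MERGE idea (hypothetical table and column names; the target must be a transactional ACID table on Hive 2.2+):

MERGE INTO target_tbl AS t               -- ACID (transactional) target table
USING source_tbl AS s                    -- the table/query supplying the new values
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET col_name = s.col_name;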
09-22-2018
02:22 AM
@Krishna Sreenivas Could you add more details regarding your use case for extracting 100 field values? Are you extracting the values, preparing a CSV file, and then sending the CSV file to downstream applications? If that is the case, then if you are using NiFi 1.2+, use the ConvertRecord processor to convert the JSON-format data into CSV format and then send it to your downstream systems.
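As a rough ConvertRecord configuration sketch for that case (controller-service names are from NiFi 1.2+, and the schema handling shown here is just one assumed setup):

ConvertRecord
  Record Reader : JsonTreeReader          # reads the incoming JSON records
  Record Writer : CSVRecordSetWriter      # writes the same records out as CSV
    Schema Access Strategy : Inherit Record Schema
    Include Header Line    : true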