Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11227 | 04-15-2020 05:01 PM |
| | 7131 | 10-15-2019 08:12 PM |
| | 3114 | 10-12-2019 08:29 PM |
| | 11497 | 09-21-2019 10:04 AM |
| | 4343 | 09-19-2019 07:11 AM |
12-13-2018
01:19 PM
@Praveenesh Kumar It's possible with the YARN REST API (the MapReduce JobHistory server's jobs API). The response includes submitTime and startTime for each job; the difference between these two Unix timestamps (in milliseconds) is the wait time. API: GET http://<history server http address:port>/ws/v1/history/mapreduce/jobs Refer to this link for more details regarding the API documentation.
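As a quick, hedged sketch of computing the wait time from that endpoint (the host/port and the jq filter are assumptions; field names can vary slightly between Hadoop versions):
# List jobs from the JobHistory server and compute wait = startTime - submitTime (epoch milliseconds)
curl -s "http://historyserver:19888/ws/v1/history/mapreduce/jobs" \
  | jq '.jobs.job[] | {id: .id, waitMillis: (.startTime - .submitTime)}'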
12-13-2018
02:54 AM
@Lior Sela If you use the "MergeRecord" processor, then define the RecordReader/Writer controller services with an Avro schema that has 2 columns (i.e. col1, col2). Avro Schema: {
"type": "record",
"name": "path_sch",
"fields":
[
{ "name": "col1", "type": ["null","string"]},
{ "name": "col2", "type": ["null","string"]}
]
} I tried the same case as mentioned in the question and it worked as expected. Please find the attached template here: mergerecord-avro.xml
12-13-2018
12:51 AM
@Julio Gazeta Weird, I'm able to get the state value if I keep Store State set to "Store state locally". Regarding the GetMongo processor, the flowfile attributes issue was resolved in NiFi 1.8 (NIFI-5334 addresses this issue). As a workaround to get the required attribute, refer to this link.
12-12-2018
02:44 AM
@Julio Gazeta The reason you are not getting the state value is that in the UpdateAttribute processor you have set the Store State property to "Do not store state"; in that case the processor doesn't keep any state. To resolve this issue, change the UpdateAttribute processor's Store State property to
Store state locally Then auto-terminate (or connect) the "set state fail" relationship to get notified in case any failures happen. Run it again and check whether you are able to get the state value. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer; that would be a great help to community users trying to find solutions quickly for these kinds of issues.
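For reference, once Store State is set to "Store state locally", UpdateAttribute exposes the getStateValue expression language function, so a stateful counter can be built like this (a sketch; the property name theCounter is an assumption):
theCounter ${getStateValue('theCounter'):plus(1)} //reads the previously stored value and adds 1 for each flowfile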
12-11-2018
12:33 AM
2 Kudos
@Saurav Ranjit
If your table is in text format then the table won't have any delete/update capabilities. The workaround for this case is as follows. If your table is partitioned:
1. Select the partition that you want to delete rows from and make sure no new data is being written into this partition.
2. Copy the specific partition's data into a temp table:
hive> create table <db_name>.<temp_table_name> as select * from <db_name>.<partition_table_name> where <partition_field_name>="<desired_partition_value>";
3. Overwrite the same partition, excluding the unwanted rows:
hive> insert overwrite table <db_name>.<partition_table_name> partition(<partition_field_name>) select * from <db_name>.<temp_table_name> where <field_name> not in (<values_to_exclude>);
4. Once you have made sure that the data is correct, drop the temp table:
hive> drop table <db_name>.<temp_table_name>;
These are the steps to follow for deleting specific rows from a non-transactional table; a concrete worked example follows below. In addition, if you have a non-partitioned table, you need to take a full dump of the existing (target) table into a temp table and overwrite the target table excluding the unwanted rows from the temp table; most importantly, make sure you are not writing any new data into the target table until this process is finished. - Hive does support selecting from and overwriting the same table at the same time, but any wrong query will lose the data completely, so it's better to use a temp table and drop it once you are sure the data is correct. Example:
insert overwrite table <db_name>.<partition_table_name> partition(<partition_field_name>) select * from <db_name>.<partition_table_name> where <field_name> not in (<values_to_exclude>);
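As that concrete illustration (a minimal sketch with hypothetical names: database sales_db, table orders partitioned by dt, deleting rows whose order_id is 101 or 102 from the 2018-12-01 partition):
hive> create table sales_db.orders_tmp as select * from sales_db.orders where dt='2018-12-01'; //copy the affected partition into a temp table
hive> set hive.exec.dynamic.partition.mode=nonstrict; //allow the dynamic-partition overwrite below
hive> insert overwrite table sales_db.orders partition(dt) select * from sales_db.orders_tmp where order_id not in (101,102); //rewrite the partition without the unwanted rows
hive> drop table sales_db.orders_tmp; //clean up after verifying the data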
12-01-2018
08:38 PM
@Hemanth Vakacharla I think for this case we need to split the records to one line each by using a SplitRecord/SplitText processor. Then, using a MergeContent processor, we can build 500 MB bundles; this way we are not going to split a record in the middle. Flow: 1.SplitRecord/SplitText //split the flowfile to 1 line each
2.MergeRecord/MergeContent //to get a 500 MB file size To force-merge flowfiles, use the Max Bin Age property (e.g. 30 mins); a configuration sketch follows below. In case you are using record-oriented processors, you need to define a Record Writer/Reader with an Avro schema to read/write the flowfile. Refer to this link for more details regarding the MergeContent processor.
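For reference, a hedged MergeContent configuration sketch for ~500 MB bundles (the property names exist in MergeContent; the values themselves are assumptions):
Merge Strategy Bin-Packing Algorithm
Minimum Group Size 500 MB //don't merge a bin until at least 500 MB is queued
Maximum Group Size 550 MB //upper bound so bins don't grow unbounded
Max Bin Age 30 min //force out a partial bin after 30 minutes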
12-01-2018
03:01 PM
@n c To define a variable in Hive we need to use hivevar; hiveconf is used to set Hive configuration properties. Please follow the steps below:
hive> set hivevar:id=1; //define id variable with value 1
hive> create view testview as select * from test1 where id = ${hivevar:id}; //create the view
hive> select * from testview; //select from the view
For more details regarding hivevar vs hiveconf, refer to this link.
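For contrast, a quick hedged sketch of the two namespaces side by side (hive.exec.parallel is a standard Hive configuration property; the variable name tbl is an assumption):
hive> set hiveconf:hive.exec.parallel=true; //hiveconf sets Hive/Hadoop configuration properties
hive> set hivevar:tbl=test1; //hivevar defines substitution variables for use inside queries
hive> select * from ${hivevar:tbl} limit 10; //the variable is substituted before the query runs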
11-27-2018
03:37 AM
@Julio Gazeta I don't think NiFi stores the reference once we clear the state in the processor. With only one file in your "d:\\tmp\\input" directory, clear all state in the ListFile processor, then: 1. start the processor once and then stop it, and 2. start the ListFile processor again; then you are going to list the file from the directory. - If the answer helped to resolve your issue, click on the Accept button below to accept the answer; that would be a great help to community users trying to find solutions quickly for these kinds of issues.
11-27-2018
03:25 AM
@Nawnath Hande You can use the split (or) regexp_extract Hive functions for this case. 1. regexp_extract function: hive> select trim(regexp_extract('string("Room no 601, Sayali Nivas , MG Road Delhi")', ',(.*?)(Nivas)', 1));
+---------+--+
| _c0 |
+---------+--+
| Sayali |
+---------+--+
2. split function: hive> select trim(split(split(string("Room no 601, Sayali Nivas , MG Road Delhi"),",")[1],"Nivas")[0]);
+---------+--+
| _c0 |
+---------+--+
| Sayali |
+---------+--+
- If the answer helped to resolve your issue, click on the Accept button below to accept the answer; that would be a great help to community users trying to find solutions quickly for these kinds of issues.
11-25-2018
07:14 PM
1 Kudo
@Sudhakar Reddy
Thanks for updating all the details regarding the flow :). Configure the ReplaceText processor as:
Search Value (?s)(^.*$)
Replacement Value Insert into test_schema.table1(topic, partition, offset, key, value) values ('${topic}','${partition}','${offset}','${key}','${'$1':replace("'","\"")}')
Character Set UTF-8
Maximum Buffer Size 10 MB //change this value as per your flowfile size
Replacement Strategy Regex Replace
Evaluation Mode Entire text
In the Replacement Value we replace all single quotes (') with double quotes (") in the captured group $1, so they don't break the generated SQL string; an example follows below.
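As an illustration (a hedged sketch; the attribute values and flowfile content are assumptions): with topic=t1, partition=0, offset=5, key=k1 and flowfile content it's raw, the processor would emit:
Insert into test_schema.table1(topic, partition, offset, key, value) values ('t1','0','5','k1','it"s raw')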