Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11141 | 04-15-2020 05:01 PM |
| | 7041 | 10-15-2019 08:12 PM |
| | 3076 | 10-12-2019 08:29 PM |
| | 11282 | 09-21-2019 10:04 AM |
| | 4201 | 09-19-2019 07:11 AM |
11-22-2017
03:04 PM
@Michael LY I don't think there is a processor that can merge Parquet files into one, but we can achieve this by using the PutHiveQL processor.

Flow:-
ListFile > FetchFile > MergeContent > ConvertCSVToAvro > UpdateAttribute > PutParquet (success relation) > ReplaceText (success) > PutHiveQL

PutParquet:- Store the Parquet files into a temporary HDFS directory and create a table on top of this temp directory. Use the success relation of the PutParquet processor to feed ReplaceText.

ReplaceText:- Create another final (or target) table in Hive, then set the Replacement Value property to
insert overwrite table <final-table-name> select * from <temp-table-name>
The insert overwrite statement creates one Parquet file in the final table by selecting the N files from the temp table. Connect the success relation from ReplaceText to the PutHiveQL processor.
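If you want to sanity-check the compaction outside NiFi first, here is a minimal plain-Python sketch using the PyHive client (assuming PyHive is installed; the host and table names are hypothetical):

from pyhive import hive

# hypothetical connection details -- adjust to your HiveServer2
conn = hive.connect(host='hiveserver2-host', port=10000, username='nifi')
cursor = conn.cursor()

# the same statement PutHiveQL will execute: selecting from the temp
# table rewrites its N small Parquet files into one file in the final table
cursor.execute('INSERT OVERWRITE TABLE final_table SELECT * FROM temp_table')
cursor.close()
conn.close()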
11-22-2017
08:21 AM
@Michael LY Before the PutParquet processor, use an UpdateAttribute processor and add a new property:
filename as ${UUID()} //changes the filename to a UUID value
Every FlowFile already has an attribute named uuid, which is a unique identifier for that FlowFile. In this processor we change the filename to a UUID, which keeps you from overwriting existing files.
Flow:-
ListFile > FetchFile > MergeContent > ConvertCSVToAvro > UpdateAttribute > PutParquet
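NiFi computes ${UUID()} for you; as a quick plain-Python illustration of why UUID filenames avoid collisions (the .parquet suffix is just for the example):

import uuid

# each call yields a practically unique identifier, so renamed
# flow files can never collide with files already in the directory
for i in range(3):
    print str(uuid.uuid4()) + '.parquet'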
11-20-2017
04:16 PM
2 Kudos
@Paramesh malla Try adding the dynamic property
initial.maxvalue.<max-value-column-name> = 3600000
Example I tried below: I want an initial value of 524 and my maximum-value column name is dmacode, so the dynamic property I added is
Property name: initial.maxvalue.dmacode
Value: 524
When we run the processor it will fetch only the records after 524.
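For illustration only, this plain-Python sketch shows roughly the query the processor ends up issuing once its state holds 524 (the table name is hypothetical; the real processor builds the statement internally):

state = {'dmacode': 524}   # seeded by the initial.maxvalue.dmacode property
table = 'my_table'         # hypothetical table name

query = 'SELECT * FROM %s WHERE dmacode > %d' % (table, state['dmacode'])
print query   # SELECT * FROM my_table WHERE dmacode > 524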
11-17-2017
06:41 PM
@Sanaz Janbakhsh, use the script below to add the encoded payload as an attribute on the flow file.

Script:-

import base64
flowFile = session.get()
if flowFile is not None:
    myAttr = flowFile.getAttribute('payload')    # read the base64 payload attribute
    out = base64.b64decode(myAttr)               # decode from base64
    out1 = base64.b16encode(out)                 # re-encode as base16 (hex)
    flowFile = session.putAttribute(flowFile, 'attr', out1)
    session.transfer(flowFile, REL_SUCCESS)

Note the difference between 'out1' and out1: putAttribute(flowFile, 'attr', 'out1') adds the literal string out1 as the value, because out1 is in single quotes. If you want to add the actual encoded value, you need to pass out1 without the quotes.

Here is an example I tried with a payload attribute value of asdf.

Python shell output:-

>>> import base64
>>> out = base64.b64decode('asdf')
>>> out1 = base64.b16encode(out)
>>> print out1
6AC75F

As you can see, the payload attribute value is asdf and the script sets attr to 6AC75F; the Python shell output and the script output match.
11-17-2017
03:35 PM
@Sanaz Janbakhsh
Try running the script below:

flowFile = session.get()
if flowFile is not None:
    myAttr = flowFile.getAttribute('payload')   # read the payload attribute
    session.transfer(flowFile, REL_SUCCESS)

In addition, if you want to add an attribute, the script below adds an attribute named attr with value 1 to the flow file:

flowFile = session.get()
if flowFile is not None:
    myAttr = flowFile.getAttribute('payload')
    flowFile = session.putAttribute(flowFile, 'attr', '1')   # add attribute attr with value 1
    session.transfer(flowFile, REL_SUCCESS)
11-17-2017
02:26 PM
@rakesh chow Use the PutSQL processor instead of ExecuteSQL; PutSQL performs inserts and updates against MySQL. You can refer to the links below:
https://community.hortonworks.com/articles/91849/design-nifi-flow-for-using-putsql-processor-to-per.html
https://community.hortonworks.com/questions/54538/nifi-ingesting-a-file-from-sftp-and-insert-into-my.html
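As a rough sketch of my own (not taken from the linked articles), an ExecuteScript processor in front of PutSQL could write the statement into the flow file content, which PutSQL then executes; the table and column names here are hypothetical:

from org.apache.nifi.processor.io import OutputStreamCallback

class WriteSQL(OutputStreamCallback):
    def process(self, outputStream):
        # hypothetical statement; PutSQL executes whatever SQL is in the content
        sql = "INSERT INTO users (id, name) VALUES (1, 'alice')"
        outputStream.write(bytearray(sql.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, WriteSQL())
    session.transfer(flowFile, REL_SUCCESS)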
11-16-2017
01:41 PM
1 Kudo
@Gayathri Devi Instead of cast you need to use from_unixtime. The queries below return the outputs you are expecting:

hive# select from_unixtime(unix_timestamp('161223000001', 'yyMMddhhmmss'), 'yyyy-MM-dd HH:mm:ss');
+----------------------+--+
| _c0 |
+----------------------+--+
| 2016-12-23 00:00:01 |
+----------------------+--+
hive# select from_unixtime(unix_timestamp('161223000001' ,'yyMMddhhmmss'), 'yyyy-MM-dd hh:mm:ss');
+----------------------+--+
| _c0 |
+----------------------+--+
| 2016-12-23 12:00:01 |
+----------------------+--+
In the output pattern, HH is the 24-hour field (midnight prints as 00) while hh is the 12-hour field (midnight prints as 12), which is why the two queries differ.
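The same 24-hour vs 12-hour distinction exists in plain Python's strftime (%H vs %I), so you can sanity-check the behaviour locally:

from datetime import datetime

t = datetime(2016, 12, 23, 0, 0, 1)      # midnight plus one second, as above
print t.strftime('%Y-%m-%d %H:%M:%S')    # 2016-12-23 00:00:01 (24-hour field)
print t.strftime('%Y-%m-%d %I:%M:%S')    # 2016-12-23 12:00:01 (12-hour field)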
11-16-2017
12:24 AM
@dhieru singh
Failure relation messages won't have any write attribute (failure attribute) added to the flow files when they are transferred to the failure queue. But you can use any of the methods below to check the length or size of the flow files and route them to the SplitText processor (a small regex sketch for method 3 follows the list):

1. Check the size of the flow file and route to SplitText.
2. Check the length of the message and route to SplitText.
3. Use the RouteOnContent processor.

1. Checking the size of the flow file:-
If you know the size of the flow files whose message is too long, use a RouteOnAttribute processor and add a new property
size more than 1 byte as ${fileSize:gt(1)} //checks whether the flow file size is greater than 1 byte
This way you can filter flow files by size, but you first need to know the size of the flow files that are routing to failure.
Flow:- Failure relation --> RouteOnAttribute (check fileSize) --> SplitText

2. Checking the length of the message:-
In this method we feed the same failure relation to an ExtractText processor. Add a new property that extracts the entire content of the flow file into an attribute:
Content as (.*)
Because (.*) captures everything, you need to adjust two properties to your flow file size:
1. Max Capture Group Length specifies the maximum number of characters a capture group value can have; characters beyond the max are truncated.
2. Maximum Buffer Size specifies the maximum amount of data to buffer (per file) when applying the regular expressions; files larger than the maximum will not be fully evaluated.
Once this step is done, the whole content of the flow file is available as the content attribute. Then use a RouteOnAttribute processor and check the attribute's length with the NiFi expression language:
length more than 1 as ${content:length():gt(1)} //uses the content attribute from ExtractText and checks whether its length is greater than 1
Example:- if your content attribute value is hi hello, the length is 8.
Flow:- Failure relation --> ExtractText (capture all content; adjust buffer size and capture length to your flow file size) --> RouteOnAttribute (check the attribute length) --> SplitText

3. Using the RouteOnContent processor:-
You can use RouteOnContent to check the content of the flow file directly:
1. Change the Match Requirement property to "content must contain match".
2. Set the Buffer Size according to your flow file size.
3. Add a property, e.g.
more than 1 length as [a-zA-Z0-9\s+]{1} //matches a run of exactly 1 character, including spaces
If you want to route messages longer than 1000 characters, change the regex to
[a-zA-Z0-9\s+]{1000} //matches if the flow file content contains a run of 1000 characters, including spaces
Flow:- Failure relation --> RouteOnContent --> SplitText

These are the ways to filter the failure relation messages; choose whichever method best fits your case.
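Here is the promised sketch for method 3: a quick plain-Python check of the RouteOnContent regex (the sample content is made up):

import re

pattern = r'[a-zA-Z0-9\s+]{1000}'    # same character class as the property above
short_msg = 'hi hello'               # made-up short message
long_msg = 'x' * 1500                # made-up long message

print bool(re.search(pattern, short_msg))   # False -- no 1000-character run
print bool(re.search(pattern, long_msg))    # True  -- contains a 1000-character run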
11-14-2017
07:41 PM
@dhieru singh
In Maximum-value Columns, use cast or some other function that can change the format of 2017-11-11 20:30:32.0 to 2017-11-11 20:30:32. In the screenshot I cast the field to date, and when I view the state, the processor holds just the date as the last value. This way you need to prepare the max-value column so that it is stored as 2017-11-11 20:30:32, without the milliseconds.
11-14-2017
01:58 PM
3 Kudos
@Salda Murrah
If you want to filter out 2016 and 2017 records, then in the RouteOnContent processor change the properties below.

RouteOnContent with Contains as the Matching Strategy:-
Keep the Matching Strategy as Contains and the Route Strategy as Route to each matching Property Name, then add new properties:
1. 2016 as 2016 //if the content contains 2016, route to this relation
2. 2017 as 2017 //if the content contains 2017, route to this relation

(or)

RouteOnContent with Matches Regular Expression as the Matching Strategy:-
If you want to check the contents of the flow file with regular expressions, keep the Matching Strategy as Matches Regular Expression and the Route Strategy as Route to each matching Property Name, then add new properties:
1. 2016 as ^.*;.*2016.*;.*$ //if the content matches, route to this relation
2. 2017 as ^.*;.*2017.*;.*$ //if the content matches, route to this relation

Flow:-
ListFile --> FetchFile --> SplitText //split into 1 line --> RouteOnContent //use either Contains or Matches Regular Expression as the Matching Strategy --> .... --> PutCassandraQL
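A quick plain-Python test of the regular-expression variant (the semicolon-delimited sample lines are made up):

import re

pattern = re.compile(r'^.*;.*2016.*;.*$')

for line in ['a;2016-01-02;x', 'b;2017-05-06;y']:   # made-up records
    print line, '->', bool(pattern.match(line))
# prints:
# a;2016-01-02;x -> True
# b;2017-05-06;y -> False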