Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11193 | 04-15-2020 05:01 PM |
| | 7091 | 10-15-2019 08:12 PM |
| | 3088 | 10-12-2019 08:29 PM |
| | 11399 | 09-21-2019 10:04 AM |
| | 4297 | 09-19-2019 07:11 AM |
01-03-2018
02:59 PM
2 Kudos
@Simon Jespersen If you know the processing time for one flowfile, try the ControlRate processor with the properties below:

Rate Control Criteria: flowfile count
Maximum Rate: 1
Time Duration: 1 min

With the above values, the ControlRate processor releases one flowfile per minute. Change the Time Duration value to match the processing time of one flowfile.

Alternatively, you can use the Wait and Notify processors to release flowfiles once a flowfile reaches a certain processor. Refer to the link below on how to use the Wait and Notify processors:
http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/
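As a rough illustration, if each of your flowfiles takes about 30 seconds to process (an assumed figure, not from the question), the same pattern would be:

Rate Control Criteria: flowfile count
Maximum Rate: 1
Time Duration: 30 sec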
12-30-2017
06:09 AM
@Gonzalo Salvia
Scenario 1: In your PutFTP processor, change the Last Modified Time property to ${file.lastModifiedTime}. With this value in the PutFTP configuration, the new file (in the push test directory) will have the same last modified time as the source file.
Scenario 2: If you leave the Last Modified Time property blank (the same configuration as in your question), the PutFTP processor sets the file's last modified time to a new time, not the source file's last modified time. When we pull the file from the destination directory (push test), we see a different last modified time compared to the source file (pull test).
I think you are facing Scenario 2. To get the same last modified time as the source file, change the property as suggested in Scenario 1, then compare the last modified times between the source file (pull test) and the destination file (push test); they will be the same.
If the answer helped to resolve your issue, click the Accept button below to accept the answer; that helps community users find solutions for these kinds of errors quickly.
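A minimal sketch of the one property that changes (everything else stays as you have it now); this assumes the flowfile came from a source processor such as GetFile or ListFile, which write the file.lastModifiedTime attribute:

PutFTP:
Last Modified Time: ${file.lastModifiedTime}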
12-28-2017
06:41 PM
3 Kudos
@Paresh Baldaniya We cannot change an existing filename in HDFS, but we can use an alternate solution in NiFi as below. Use an UpdateAttribute processor before the PutParquet processor. In UpdateAttribute we update the filename attribute, so every file reaching PutParquet has the same name. Add a new property to the UpdateAttribute processor:

filename: desired_parquet_filename.prq

PutParquet: Since every flowfile now arrives with the same filename, change the Overwrite Files property to true //if the same filename exists in the directory, the processor replaces the existing file with the new one.

Flow:
1. GetFile
2. UpdateAttribute //change the filename by adding the filename property
3. PutParquet //change the Overwrite Files property to true

If the answer helped to resolve your issue, click the Accept button below to accept the answer; that helps community users find solutions for these kinds of errors quickly.
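For reference, a compact property sketch of the two processors (the directory value is a placeholder, not from the question):

UpdateAttribute:
filename: desired_parquet_filename.prq

PutParquet:
Directory: /tmp/parquet_out //hypothetical target directory
Overwrite Files: true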
12-27-2017
03:10 AM
3 Kudos
@Alex M, You need to set the Minimum Group Size per your requirement (1 B, 1 KB, 1 MB, 1 GB, ...).
Example: in the configs below I changed Minimum Group Size to 10 MB //the minimum size for the bundle. Say each of your flowfiles is 1 MB: the processor waits until the group size reaches 10 MB and then bundles the flowfiles into one (i.e., 10 flowfiles are merged into 1 flowfile by the MergeContent processor). If the flowfiles don't meet the minimum group size requirement, they wait in the queue before the MergeContent processor until the minimum group size is reached.
How to force merge flowfiles? By specifying the Max Bin Age property: no matter how many flowfiles have been assigned to a given bin, that bin will be merged once the bin has existed for this amount of time. Say I set Max Bin Age to 10 min and have only 5 flowfiles with an overall queue size of 5 MB before the MergeContent processor, while Minimum Group Size is 10 MB. The queue would never meet the minimum group size requirement, so the flowfiles would be queued forever; to overcome this we set Max Bin Age to 10 min, so once the flowfiles have been in the bin for 10 minutes the processor merges them even though they haven't met the minimum group size requirement.
For all the other MergeContent properties, please refer to the links I mentioned in my earlier answer. Let me know if you have any questions..!!
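For this scenario, a minimal property sketch (all other MergeContent properties left at their defaults):

Merge Strategy: Bin-Packing Algorithm
Minimum Group Size: 10 MB
Max Bin Age: 10 min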
12-27-2017
01:01 AM
2 Kudos
@Alex M Can you please share more details of your MergeContent processor configuration? Refer to the community links below on how to configure the MergeContent processor:
https://community.hortonworks.com/questions/148294/nifi-problems-with-emply-queue.html?childToView=148310#comment-148310
https://community.hortonworks.com/questions/149047/nifi-how-to-handle-with-mergecontent-processor.html?childToView=148309#answer-148309
12-24-2017
06:06 PM
1 Kudo
@Mohammed Syam If you want to compare the response times of select queries:
1. The Ambari Hive View won't display the execution time of a query, and if you click on the Logs tab there are no logs for the query either. You cannot compare the time taken by select queries there: a plain select query starts no MapReduce job, and since no application id is created there is nothing to compare programmatically.
2. Another way to get at least the execution time for a select query is to run it from the Hive command line (not from the Ambari Hive View). Open Hive from the command line and execute the select query; when it finishes, Hive prints at the bottom of the results how many rows were selected along with the response time.
Example:
hive# select * from text_table;
+----------------+----------------+------------------+----------------+------------------------+--+
| text_table.id | text_table.dt | text_table.name | text_table.ts | text_table.dtts |
+----------------+----------------+------------------+----------------+------------------------+--+
| 1 | 2017-10-10 | hcc | 12:00:00 | 2017-10-10 12:00:00.0 |
| 1 | 2017-10-11 | foo | 12:00:00 | 2017-10-11 12:00:00.0 |
| 1 | 2017-10-12 | foo | 12:00:00 | 2017-10-12 12:00:00.0 |
| 2 | 2017-11-10 | bar | 09:00:00 | 2017-12-23 09:00:00.0 |
+----------------+----------------+------------------+----------------+------------------------+--+
4 rows selected (0.105 seconds)
Looking at the last line of the snippet above: 4 rows were selected and the execution time is 0.105 seconds. When you run Hive from the command line, it prints the number of rows and the execution time of the query as the last line of the results. The Ambari Hive View doesn't display these stats, so the only way to compare select queries is to execute them from the Hive command line and compare the times manually.
12-23-2017
05:45 PM
1 Kudo
@Mohammed Syam
A select statement just reads the data (schema on read) from HDFS; it doesn't run any MapReduce job underneath. An insert statement, however, triggers a map-only job with an application id, and whenever an application id is created we can view the job in the Tez View (or in the Resource Manager).
Example: Insert a record into a table. I have a text_table with 5 columns. I insert a record into the table and the app id application_1508861912312_4126 is created, so I can go to the Tez View (or Resource Manager), search for the application id, and view the application and the time taken to complete it.
hive# insert into text_table values('2','2017-11-10','bar','09:00:00','2017-12-23 09:00:00.0');
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into text_table values...09:00:00.0')(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1508861912312_4126)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table default.text_table from hdfs:/user/yashu/text_table/.hive-staging_hive_2017-12-23_12-31-19_205_322427679820504794-22055/-ext-10000
INFO : Table default.text_table stats: [numFiles=4, numRows=4, totalSize=184, rawDataSize=180]
No rows affected (11.679 seconds)
Select from Table: when we run a select statement, no application id is created, so this job cannot be viewed from the Tez View or the Resource Manager.
hive# select * from text_table;
+-----+-------------+-------+-----------+------------------------+--+
| id | dt | name | ts | dtts |
+-----+-------------+-------+-----------+------------------------+--+
| 1 | 2017-10-10 | hcc | 12:00:00 | 2017-10-10 12:00:00.0 |
| 1 | 2017-10-11 | foo | 12:00:00 | 2017-10-11 12:00:00.0 |
| 1 | 2017-10-12 | foo | 12:00:00 | 2017-10-12 12:00:00.0 |
| 2 | 2017-11-10 | bar | 09:00:00 | 2017-12-23 09:00:00.0 |
+-----+-------------+-------+-----------+------------------------+--+
Please refer to the link below for more details:
https://community.hortonworks.com/questions/141606/hive-queries-use-only-mappers-or-only-reducers.html
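A quick way to see this difference for yourself is EXPLAIN; the sketch below reuses text_table from above, and the exact plan text varies by Hive version:

-- a plain select compiles to a simple fetch task (no job stages):
hive# explain select * from text_table;
-- an insert compiles to a map stage that runs on YARN:
hive# explain insert into text_table values('3','2017-12-01','baz','10:00:00','2017-12-01 10:00:00.0');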
12-23-2017
04:56 PM
1 Kudo
@Mohammed Syam 1. Click on the Tez View (underneath the Hive View button); it displays the execution time of the job. (or)
2. You can view all the logs and computation times in the Resource Manager UI:
1. Click on YARN in the Ambari UI
2. Go to Quick Links
3. Click on Resource Manager UI
Search for your application id and you will find all the logs and execution times.
3. You can also get all the logs from the command line using the command below:
yarn logs -applicationId <Application ID>
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_yarn-resource-management/content/ch_yarn_cli_view_running_applications.html
If the answer helped to resolve your issue, click the Accept button below to accept the answer; that helps community users find solutions for these kinds of errors quickly.
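For example, using the application id from the insert job in my earlier answer (substitute your own id):

yarn logs -applicationId application_1508861912312_4126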
12-22-2017
10:45 PM
2 Kudos
@Rustam kapuria I suspect one of the values in your input record contains \n, so when the ReplaceText processor runs, the resulting content is written on 2 lines. To resolve this, use the escapeJson function in the ReplaceText processor:

Replacement Value: ${ID_NO}|${CHANGE_TITLE:escapeJson()}

In the same way, apply the escapeJson function to whichever attribute of your JSON message contains \n. Check which JSON value has \n in it and use escapeJson for that attribute in the ReplaceText processor. Refer to the link below for more details:
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#escapejson

Here is what I tried. Input record:

{"ID_NO": "8.92" ,"TITLE": "HCC\n"}

EvaluateJsonPath:

Destination: flowfile-attribute
Return Type: auto-detect
Path Not Found Behavior: ignore
Null Value Representation: empty string
CHANGE_TITLE: $.TITLE
ID_NO: $.ID_NO

Even though CHANGE_TITLE has \n after HCC, the \n is not visible in the Attributes tab; but when we use CHANGE_TITLE in the ReplaceText processor, it replaces the value including the new line.

ReplaceText configs without the escapeJson function:

Search Value: (?s)(^.*$)
Replacement Value: ${ID_NO}|${CHANGE_TITLE}
Maximum Buffer Size: 10 MB
Replacement Strategy: Always Replace
Evaluation Mode: Entire text

Output without escapeJson: the new line appears in the output of the ReplaceText processor.

ReplaceText configs with the escapeJson function:

Search Value: (?s)(^.*$)
Replacement Value: ${ID_NO}|${CHANGE_TITLE:escapeJson()}
Maximum Buffer Size: 10 MB
Replacement Strategy: Always Replace
Evaluation Mode: Entire text

Output with escapeJson: the output is on one line, because the escapeJson function escapes the \n and the processor writes the whole content on one line.

Try the escapeJson function; if the issue is still not resolved, let me know..!!
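For reference, the expected outputs for the sample record above would look roughly like this (a sketch, not captured output):

Without escapeJson: 8.92|HCC followed by a line break (the \n in the TITLE value is written as a literal newline).
With escapeJson: 8.92|HCC\n on a single line (the newline is escaped back into the two characters \ and n).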
12-22-2017
07:44 PM
1 Kudo
@Bala S I think you don't have an extract_date attribute associated with the flowfile. Make sure each file processed by the UpdateAttribute processor has the extract_date attribute associated with it. To check an attribute: right-click on the queue, click List queue, select any of the flowfiles, and click the Attributes tab; you should see the extract_date attribute with a value. If the attribute has no value, the directory name will be empty (the same issue you are facing now).
How to add the attribute to the flowfile? Extract the extract_date value from your content and add it as an attribute associated with the flowfile. Example: if you have a JSON message, use the EvaluateJsonPath processor; if the content is CSV, use the ExtractText processor. Extract the date value and keep it on the flowfile. Once you have extracted the value, the attribute is associated with the flowfile.
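A minimal sketch for the JSON case (the $.extract_date path assumes your content has a top-level extract_date field; adjust it to your actual structure):

EvaluateJsonPath:
Destination: flowfile-attribute
Return Type: auto-detect
extract_date: $.extract_date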