Member since 06-08-2017
      
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 11189 | 04-15-2020 05:01 PM |
|  | 7087 | 10-15-2019 08:12 PM |
|  | 3087 | 10-12-2019 08:29 PM |
|  | 11390 | 09-21-2019 10:04 AM |
|  | 4290 | 09-19-2019 07:11 AM |
			
    
	
		
		
01-03-2018 02:59 PM
2 Kudos
@Simon Jespersen If you know how long one flowfile takes to process, try a ControlRate processor with the following properties:

Rate Control Criteria: flowfile count
Maximum Rate: 1
Time Duration: 1 min

With these values, ControlRate releases one flowfile per minute. Adjust the Time Duration value to match your per-flowfile processing time.

Alternatively, you can use the Wait and Notify processors to release flowfiles once they reach a certain processor. See the link below for how to use Wait and Notify:

http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/
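In property form (a sketch using the example values above; the 1-minute duration is just the value from this scenario):

```
# ControlRate — release one flowfile per minute
Rate Control Criteria = flowfile count
Maximum Rate          = 1
Time Duration         = 1 min
```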
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-30-2017 06:09 AM
@Gonzalo Salvia Scenario 1: In your PutFTP processor, set the Last Modified Time property to

${file.lastModifiedTime}

With this value, the new file (in the push test directory) will have the same last-modified time as the source file.

Scenario 2: If the Last Modified Time property is left blank (the same configuration as in your question), PutFTP sets the file's last-modified time to the current time, not the source file's. When you then pull the file from the destination directory (push test), its last-modified time will differ from the source file's (pull test).

I think you are hitting scenario 2. To keep the same last-modified time as the source file, change the property as suggested in scenario 1, then compare the last-modified times of the source file (pull test) and the destination file (push test); they should match.

If the answer helped resolve your issue, click the Accept button below to accept it. That helps community users find solutions quickly.
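The fix above is a single property change; in property form:

```
# PutFTP — carry the source file's timestamp over to the pushed file
Last Modified Time = ${file.lastModifiedTime}
```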
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-28-2017 06:41 PM
3 Kudos
@Paresh Baldaniya We cannot change an existing filename in HDFS, but there is an alternate solution in NiFi.

Use an UpdateAttribute processor before the PutParquet processor to update the filename, so that every file arriving at PutParquet has the same name. Add a new property to UpdateAttribute:

filename
desired_parquet_filename.prq

PutParquet: Since every flowfile now has the same filename, set the Overwrite Files property to true, so that if a file with the same name already exists in the directory, the processor replaces the existing file with the new one.

Flow:

1. GetFile
2. UpdateAttribute // set the filename property
3. PutParquet // set Overwrite Files to true

If the answer helped resolve your issue, click the Accept button below to accept it. That helps community users find solutions quickly.
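The two processor configurations in the flow above can be sketched as follows (the .prq filename is just an example value):

```
# UpdateAttribute — add one user-defined property
filename = desired_parquet_filename.prq

# PutParquet — replace the existing file of the same name
Overwrite Files = true
```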
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-27-2017 03:10 AM
3 Kudos
@Alex M You need to set Minimum Group Size to match your requirement (e.g. 1 B, 1 KB, 1 MB, 1 GB, ...).

Example: In the configs below I set Minimum Group Size to 10 MB (the minimum size of the bundle). Suppose each of your flowfiles is 1 MB: the processor waits until the group size reaches 10 MB and then bundles the flowfiles into one (i.e. 10 flowfiles are merged into 1 flowfile by the MergeContent processor).

If the flowfiles don't meet the minimum group size, they wait in the queue before MergeContent until the minimum group size is reached.

How to force-merge flowfiles? By setting the Max Bin Age property: no matter how many flowfiles have been assigned to a given bin, the bin is merged once it has existed for this amount of time.

Suppose I set Max Bin Age to 10 min, but only 5 flowfiles totaling 5 MB are queued before MergeContent, while Minimum Group Size is 10 MB. The queue would never meet the minimum group size requirement, so the flowfiles would wait forever. To overcome this, the 10 min Max Bin Age makes the processor merge the flowfiles once they have been binned for 10 minutes, even though they haven't met the minimum group size.

For all the other MergeContent properties, please refer to the links in my answer above. Let me know if you have any questions!
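The two-property combination from this scenario, in property form (these are the example values, not defaults):

```
# MergeContent — merge once 10 MB is collected, or once a bin
# is 10 minutes old, whichever comes first
Minimum Group Size = 10 MB
Max Bin Age        = 10 min
```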
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-27-2017 01:01 AM
2 Kudos
@Alex M Can you please share more details of your MergeContent processor configuration?

Refer to these community links on how to configure the MergeContent processor:

https://community.hortonworks.com/questions/148294/nifi-problems-with-emply-queue.html?childToView=148310#comment-148310

https://community.hortonworks.com/questions/149047/nifi-how-to-handle-with-mergecontent-processor.html?childToView=148309#answer-148309
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-24-2017 06:06 PM
1 Kudo
@Mohammed Syam If you want to compare the response times of select queries:

1. The Ambari Hive View does not display the execution time of a query, and if you click on the Logs tab there are no logs for the query either. For a select query no MapReduce job is initialized, and if no application ID is created we cannot compare the queries programmatically.

2. Another way to get at least the execution time of a select query is to run it from the Hive command line (not from the Ambari Hive View). Open Hive from the command line and execute the select query; when it finishes, Hive prints the number of rows selected and the response time below the results.

Example:

hive# select * from text_table;
+----------------+----------------+------------------+----------------+------------------------+--+
| text_table.id  | text_table.dt  | text_table.name  | text_table.ts  |    text_table.dtts     |
+----------------+----------------+------------------+----------------+------------------------+--+
| 1              | 2017-10-10     | hcc              | 12:00:00       | 2017-10-10 12:00:00.0  |
| 1              | 2017-10-11     | foo              | 12:00:00       | 2017-10-11 12:00:00.0  |
| 1              | 2017-10-12     | foo              | 12:00:00       | 2017-10-12 12:00:00.0  |
| 2              | 2017-11-10     | bar              | 09:00:00       | 2017-12-23 09:00:00.0  |
+----------------+----------------+------------------+----------------+------------------------+--+
4 rows selected (0.105 seconds)

The last line of the snippet above shows that 4 rows were selected and the execution time was 0.105 seconds. When you run a query from the Hive command line, Hive prints the row count and execution time as the last line of the results; the Ambari Hive View does not display these stats. So the only way to compare select query timings is to execute the queries from the Hive command line and compare them manually.
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-23-2017 05:45 PM
1 Kudo
@Mohammed Syam

A select statement just reads the data from HDFS (schema on read); it doesn't run a MapReduce job underneath. An insert statement, however, triggers a map-only job with an application ID, and whenever an application ID is created we can view the job in the Tez View or in the Resource Manager.

Example:

Insert a record into a table: I have a table text_table with 5 columns. When I insert a record, an application ID (application_1508861912312_4126) is created, so I can go to the Tez View or the Resource Manager, search for that application ID, and view the application and the time it took to complete.

hive# insert into text_table values('2','2017-11-10','bar','09:00:00','2017-12-23 09:00:00.0');
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: insert into text_table values...09:00:00.0')(Stage-1)
INFO  : Status: Running (Executing on YARN cluster with App id application_1508861912312_4126)
INFO  : Map 1: -/-
INFO  : Map 1: 0/1
INFO  : Map 1: 0(+1)/1
INFO  : Map 1: 1/1
INFO  : Loading data to table default.text_table from hdfs:/user/yashu/text_table/.hive-staging_hive_2017-12-23_12-31-19_205_322427679820504794-22055/-ext-10000
INFO  : Table default.text_table stats: [numFiles=4, numRows=4, totalSize=184, rawDataSize=180]
No rows affected (11.679 seconds)

Select from a table: when we run a select statement, no application ID is created, so the job cannot be viewed in either the Tez View or the Resource Manager.

hive# select * from text_table;
+-----+-------------+-------+-----------+------------------------+--+
| id  |     dt      | name  |    ts     |          dtts          |
+-----+-------------+-------+-----------+------------------------+--+
| 1   | 2017-10-10  | hcc   | 12:00:00  | 2017-10-10 12:00:00.0  |
| 1   | 2017-10-11  | foo   | 12:00:00  | 2017-10-11 12:00:00.0  |
| 1   | 2017-10-12  | foo   | 12:00:00  | 2017-10-12 12:00:00.0  |
| 2   | 2017-11-10  | bar   | 09:00:00  | 2017-12-23 09:00:00.0  |
+-----+-------------+-------+-----------+------------------------+--+

Please refer to this link for more details:

https://community.hortonworks.com/questions/141606/hive-queries-use-only-mappers-or-only-reducers.html
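One way to check this yourself (a sketch; the exact plan output varies by Hive version and fetch-task settings) is to EXPLAIN both statements. A simple select that qualifies for fetch-task conversion shows only a fetch stage, while the insert includes a map stage that launches a YARN application:

```sql
-- Plan for a simple select: typically just a fetch stage,
-- i.e. no Tez/MR job and no application ID.
EXPLAIN select * from text_table;

-- Plan for an insert: includes a map stage (a Tez/MR job),
-- so an application ID will be created.
EXPLAIN insert into text_table
  values('2','2017-11-10','bar','09:00:00','2017-12-23 09:00:00.0');
```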
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-23-2017 04:56 PM
1 Kudo
@Mohammed Syam

1. Click on the Tez View (underneath the Hive View button); it will display the execution time for the job.

2. You can also view all the logs and computation times in the Resource Manager UI:

1. Click on YARN in the Ambari UI
2. Go to Quick Links
3. Click on Resource Manager UI

Search for your application ID to get all the logs and execution times.

3. You can also get all the logs from the command line with:

yarn logs -applicationId <Application ID>

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_yarn-resource-management/content/ch_yarn_cli_view_running_applications.html

If the answer helped resolve your issue, click the Accept button below to accept it. That helps community users find solutions quickly.
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-22-2017 10:45 PM
2 Kudos
@Rustam kapuria

I suspect one of the values in your input record contains a \n, so after the ReplaceText processor the resulting content is written across two lines. To resolve this, use the escapeJson function in the ReplaceText processor:

Replacement Value
${ID_NO}|${CHANGE_TITLE:escapeJson()}

In the same way, apply the escapeJson function wherever your JSON message contains a \n. Work out which JSON value has the \n in it and use escapeJson on that attribute in ReplaceText.

Refer to this link for more details:

https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#escapejson

Here is what I tried.

Input record:

{"ID_NO": "8.92" ,"TITLE": "HCC\n"}

EvaluateJsonPath:

Destination: flowfile-attribute
Return Type: auto-detect
Path Not Found Behavior: ignore
Null Value Representation: empty string
CHANGE_TITLE: $.TITLE
ID_NO: $.ID_NO

Even though the CHANGE_TITLE value has a \n after HCC, the \n is not visible in the Attributes tab; but when CHANGE_TITLE is used in ReplaceText, the replacement value contains the newline.

ReplaceText configs without the escapeJson function:

Search Value: (?s)(^.*$)
Replacement Value: ${ID_NO}|${CHANGE_TITLE}
Maximum Buffer Size: 10 MB
Replacement Strategy: Always Replace
Evaluation Mode: Entire text

In the output of this ReplaceText processor, a new line appears after the replacement.

ReplaceText configs with the escapeJson function:

Search Value: (?s)(^.*$)
Replacement Value: ${ID_NO}|${CHANGE_TITLE:escapeJson()}
Maximum Buffer Size: 10 MB
Replacement Strategy: Always Replace
Evaluation Mode: Entire text

With escapeJson, the output is on one line: the processor escapes the \n and writes the whole content as a single line.

Try the escapeJson function, and if the issue is still not resolved let me know!
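To illustrate the mechanics outside NiFi, here is a small Python sketch. The escape_json helper is a hypothetical stand-in for NiFi's escapeJson, implemented with json.dumps; it shows how escaping the embedded \n keeps the pipe-delimited record on one line:

```python
import json

# Hypothetical stand-in for NiFi's escapeJson(): escape JSON control
# characters (such as a literal newline) in an attribute value.
def escape_json(value: str) -> str:
    # json.dumps escapes the newline as \n and adds surrounding quotes;
    # strip the quotes to keep only the escaped body.
    return json.dumps(value)[1:-1]

# The input record from the example above: TITLE contains a newline.
record = {"ID_NO": "8.92", "TITLE": "HCC\n"}

# Without escaping, the real newline ends up in the output content.
raw = record["ID_NO"] + "|" + record["TITLE"]

# With escaping, the newline becomes the two characters backslash + n.
escaped = record["ID_NO"] + "|" + escape_json(record["TITLE"])

print(repr(raw))      # '8.92|HCC\n'  -- real newline in the content
print(repr(escaped))  # '8.92|HCC\\n' -- single line, escaped newline
```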
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
12-22-2017 07:44 PM
1 Kudo
@Bala S I think the extract_date attribute is not associated with the flowfile.

Make sure each file coming out of the UpdateAttribute processor has an extract_date attribute associated with it. To check the attribute:

1. Right-click on the queue
2. Click List queue
3. Select any of the flowfiles
4. Click the Attributes tab; you should see an extract_date attribute with a value

If there is no value, the directory name will be empty (the same issue you are facing now).

How to add the attribute to the flowfile? You need to extract the extract_date value from your content and add it as an attribute associated with the flowfile. For example, if you have a JSON message, use the EvaluateJsonPath processor; if the content is CSV, use the ExtractText processor to extract the date value and keep it on the flowfile. Once the value is extracted and the attribute is associated with the flowfile, the directory name will no longer be empty.
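As a sketch of the JSON case (the JSON path $.extract_date is an assumption about your data; point it at wherever the date actually lives in your message):

```
# EvaluateJsonPath — pull the date out of the JSON content
# and store it as a flowfile attribute named extract_date
Destination  = flowfile-attribute
extract_date = $.extract_date
```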
						
					