Member since 10-03-2020

236 Posts, 15 Kudos Received, 18 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1715 | 11-11-2024 09:31 AM |
| | 2084 | 08-28-2023 02:13 AM |
| | 2547 | 12-15-2021 05:26 PM |
| | 2315 | 10-22-2021 10:09 AM |
| | 6166 | 10-20-2021 08:44 AM |
			
    
	
		
		

10-28-2021 02:57 AM

Hi @uygg,

Please check whether third-party JARs, such as the Bouncy Castle JARs, have been added. If they are the cause, remove them and then restart the RM.

Thanks,
Will
						
					

10-22-2021 10:09 AM

Hi @Rjkoop,

Visibility labels are not officially supported by Cloudera. Please refer to this link:
https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_620_unsupported_features.html#hbase_c6_unsupported_features

Regards,
Will
						
					

10-20-2021 08:44 AM

Hi @DA-Ka,

SUM and JOIN won't change the timestamp of the underlying files.

Example:

    create table mytable (i int,j int,k int);
    insert into mytable values (1,2,3),(4,5,6),(7,8,9);
    create table mytable2 (i int,j int,k int);
    insert into mytable2 values (1,2,6),(3,5,7),(4,8,9);

    select * from mytable;
    +------------+------------+------------+
    | mytable.i  | mytable.j  | mytable.k  |
    +------------+------------+------------+
    | 1          | 2          | 3          |
    | 4          | 5          | 6          |
    | 7          | 8          | 9          |
    +------------+------------+------------+

    select * from mytable2;
    +-------------+-------------+-------------+
    | mytable2.i  | mytable2.j  | mytable2.k  |
    +-------------+-------------+-------------+
    | 1           | 2           | 6           |
    | 3           | 5           | 7           |
    | 4           | 8           | 9           |
    +-------------+-------------+-------------+

    # sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable
    drwxrwx---+ - hive hive 0 2021-10-20 15:11 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000
    -rw-rw----+ 3 hive hive 743 2021-10-20 15:12 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0

    # sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable2
    drwxrwx---+ - hive hive 0 2021-10-20 15:23 /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000
    -rw-rw----+ 3 hive hive 742 2021-10-20 15:23 /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000/bucket_00000_0

1. Sum, timestamp is unchanged

    select pos+1 as col,sum (val) as sum_col
    from mytable t lateral view posexplode(array(*)) pe
    group by pos;
    +------+----------+
    | col  | sum_col  |
    +------+----------+
    | 2    | 15       |
    | 1    | 12       |
    | 3    | 18       |
    +------+----------+

    # sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable
    drwxrwx---+ - hive hive 0 2021-10-20 15:11 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000
    -rw-rw----+ 3 hive hive 743 2021-10-20 15:12 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0

2. Inner Join, timestamp is unchanged

    select * from
    (select * from mytable)T1
    join
    (select * from mytable2)T2
    on T1.i=T2.i;
    +-------+-------+-------+-------+-------+-------+
    | t1.i  | t1.j  | t1.k  | t2.i  | t2.j  | t2.k  |
    +-------+-------+-------+-------+-------+-------+
    | 1     | 2     | 3     | 1     | 2     | 6     |
    | 4     | 5     | 6     | 4     | 8     | 9     |
    +-------+-------+-------+-------+-------+-------+

    # sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable
    drwxrwx---+ - hive hive 0 2021-10-20 15:11 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000
    -rw-rw----+ 3 hive hive 743 2021-10-20 15:12 /warehouse/tablespace/managed/hive/mytable/delta_0000001_0000001_0000/bucket_00000_0

    # sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable2
    drwxrwx---+ - hive hive 0 2021-10-20 15:23 /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000
    -rw-rw----+ 3 hive hive 742 2021-10-20 15:23 /warehouse/tablespace/managed/hive/mytable2/delta_0000001_0000001_0000/bucket_00000_0

Regards,
Will
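
To repeat this check on other tables, the before and after listings can be compared mechanically instead of by eye. The following is only a minimal sketch, assuming the `hdfs dfs -ls -R` output format shown above (date, time, and path in fields 6, 7, and 8, with minute resolution); the helper names `mtimes` and `same_mtimes` and the file names `before.txt` and `after.txt` are hypothetical.

```shell
# Hypothetical helper: reduce `hdfs dfs -ls -R` output (read from stdin) to
# sorted "path date time" lines so that two listings can be compared.
mtimes() {
  awk 'NF >= 8 { print $8, $6, $7 }' | sort
}

# Hypothetical helper: succeeds only if two saved listings contain the same
# paths with the same modification times.
same_mtimes() {
  [ "$(mtimes < "$1")" = "$(mtimes < "$2")" ]
}

# Example usage (requires an HDFS client; path taken from the post above):
# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable > before.txt
# ... run the SELECT ...
# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/mytable > after.txt
# same_mtimes before.txt after.txt && echo "timestamps unchanged"
```

Comparing whole listings also catches files that were added or removed, not only files whose timestamp moved.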
						
					

10-20-2021 01:01 AM

Hi @DA-Ka,

The example below is inspired by this link.

1) Use -t -R to list files recursively with timestamps:

    # sudo -u hdfs hdfs dfs -ls -t -R /warehouse/tablespace/managed/hive/sample_07
    drwxrwx---+ - hive hive 0 2021-10-20 06:14 /warehouse/tablespace/managed/hive/sample_07/.hive-staging_hive_2021-10-20_06-13-50_654_7549698524549477159-1
    drwxrwx---+ - hive hive 0 2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000
    -rw-rw----+ 3 hive hive 48464 2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000/000000_0

2) Filter the files modified at or before a given timestamp:

    # sudo -u hdfs hdfs dfs -ls -t -R /warehouse/tablespace/managed/hive/sample_07 |awk '{if (($6" "$7) <= "2021-10-20 06:13") {print ($6" "$7" "$8)}}'
    2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000
    2021-10-20 06:13 /warehouse/tablespace/managed/hive/sample_07/delta_0000001_0000001_0000/000000_0

Regarding your last question, whether SUM or JOIN could change the timestamp: I'm not sure. Please try it and then use the commands above to check the timestamps.

Regards,
Will

If the answer helps, please accept as solution and click thumbs up.
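
The filter in step 2 hard-codes the cutoff inside the awk program. As a minimal sketch, it could be wrapped in a function that takes the cutoff as an argument. This assumes the listing format shown above (date, time, and path in fields 6, 7, and 8) and paths without spaces; the function name `older_than` is hypothetical.

```shell
# Hypothetical helper: reads `hdfs dfs -ls -R` output on stdin and prints
# "date time path" for every entry modified at or before the cutoff.
# The cutoff must use the same "YYYY-MM-DD HH:MM" layout as the listing,
# so that plain string comparison matches chronological order.
older_than() {
  awk -v cutoff="$1" '
    NF >= 8 && ($6 " " $7) <= cutoff { print $6, $7, $8 }
  '
}

# Example usage (requires an HDFS client; path taken from the post above):
# sudo -u hdfs hdfs dfs -ls -R /warehouse/tablespace/managed/hive/sample_07 \
#   | older_than "2021-10-20 06:13"
```

Passing the cutoff with `-v` keeps the awk program itself constant, so the same function can be reused for any date.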
						
					

10-19-2021 04:57 AM

1 Kudo

Hi @kras,

From the evidence you provided, the most frequent warning is:

    WARN [RpcServer.default.FPBQ.Fifo.handler=10,queue=10,port=16020] regionserver.RSRpcServices: Large batch operation detected (greater than 5000) (HBASE-18023). Requested Number of Rows: 12596 Client: svc-stats//ip first region in multi=table_name,\x09,1541077881948.9bcc8cee00ab92b2402730813923c2f6.

This warning is logged when a RegionServer receives an RPC from a client that contains more than 5000 "actions" (where an "action" is a collection of mutations for a specific row). Misbehaving clients that send large RPCs to RegionServers can be malicious, causing temporary pauses via garbage collection or denial of service via crashes. The threshold of 5000 actions per RPC is defined by the property "hbase.rpc.rows.warning.threshold" in hbase-site.xml.

Please refer to this JIRA for a detailed explanation: https://issues.apache.org/jira/browse/HBASE-18023

From the warning we can see that the table name is "table_name". Please check which application is writing to or reading from this table. The simplest way is to halt that application and see whether performance improves. If you confirm that the latency spike is due to this table, please improve your application logic and control your batch size.

If you have already fixed the "harmful" applications but still see performance issues, I recommend reading this article, which covers the most common performance issues and tuning suggestions:
https://community.cloudera.com/t5/Community-Articles/Tuning-Hbase-for-optimized-performance-Part-1/ta-p/248137
The article has 5 parts; reading through them will give you ideas for tuning your HBase.

This issue looks a little complex, and multiple factors may be affecting your HBase performance. We encourage you to raise a support case with Cloudera.

Regards,
Will

If the answer helps, please accept as solution and click thumbs up.
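
To find out which tables trigger this warning most often, the RegionServer log can be summarized per table. The following is only a sketch, assuming every warning follows the message layout quoted above (a `Rows:` token followed by the row count, and a `multi=` token whose value starts with the table name); the function name `large_batch_summary` and the log file placeholder are hypothetical.

```shell
# Hypothetical helper: reads RegionServer log lines on stdin and prints, for
# each table, the number of "Large batch operation detected" warnings and the
# largest requested row count, as "table count max_rows".
large_batch_summary() {
  awk '
    /Large batch operation detected/ {
      rows = 0; table = "unknown"
      for (i = 1; i <= NF; i++) {
        # The row count is the token right after "Rows:".
        if ($i == "Rows:" && i < NF) rows = $(i + 1) + 0
        # The region name after "multi=" begins with the table name,
        # terminated by the first comma.
        if ($i ~ /^multi=/) { table = substr($i, 7); sub(/,.*/, "", table) }
      }
      count[table]++
      if (rows > max[table]) max[table] = rows
    }
    END { for (t in count) print t, count[t], max[t] }
  ' | sort
}

# Example usage (the log file name is a placeholder, not a real path):
# large_batch_summary < regionserver.log
```

Tables that show a high count or a very large maximum are the first candidates for reducing the client batch size.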
						
					

10-17-2021 06:45 AM

Hi @dzbeda,

The definition of "dfs.balancer.getBlocks.min-block-size" is "Smallest block to consider for moving".

Which version of Hadoop are you running? Is it CDH or HDP, and which version of CDH / HDP?

For CDH, please refer to:
https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_hdfs_balancer.html#cmug_topic_5_14__section_lqb_rzp_x2b
https://docs.cloudera.com/documentation/enterprise/6/properties/6.1/topics/cm_props_cdh5160_hdfs.html#concept_6.1.x_balancer_props

HDFS Balancer and DataNode Space Usage Considerations:
https://my.cloudera.com/knowledge/HDFS-Balancer-and-DataNode-Space-Usage-Considerations?id=73869

Regards,
Will
						
					

10-13-2021 08:00 PM

Hi @kras,

1. Is it CDH or HDP, and what is the version?
2. Do the RegionServer logs contain "responseTooSlow", "operationTooSlow", or any other WARN/ERROR messages? Please provide log snippets.
3. How is the locality of the regions? (Check locality in the HBase web UI: click on the table; a column on the right side shows the locality of each region.)
4. How many regions are deployed on each RegionServer?
5. Are there any warnings or errors in the RS log around the spike?
6. Is any job scanning every 10 minutes? Which table contributes the most I/O? Is there any hotspot?
7. Is HDFS healthy? Check the DN logs: are there any slow messages around the spike? Refer to https://my.cloudera.com/knowledge/Diagnosing-Errors-Error-Slow-ReadProcessor-Error-Slow?id=73443

Regards,
Will
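
For items 2 and 6, one way to see whether slow operations cluster at regular intervals is to count them per 10-minute window. This is a rough sketch, assuming each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; the function name `slow_ops_per_10min` and the log file placeholder are hypothetical.

```shell
# Hypothetical helper: reads RegionServer log lines on stdin and prints the
# number of responseTooSlow / operationTooSlow messages per 10-minute bucket,
# as "date HH:M0 count".
slow_ops_per_10min() {
  awk '
    /responseTooSlow|operationTooSlow/ {
      # $1 is the date and $2 the time; keep the hour and the tens digit of
      # the minute, then append "0" to name the bucket (for example 04:50).
      bucket = $1 " " substr($2, 1, 4) "0"
      count[bucket]++
    }
    END { for (b in count) print b, count[b] }
  ' | sort
}

# Example usage (the log file name is a placeholder, not a real path):
# slow_ops_per_10min < regionserver.log
```

A count that peaks in every bucket, or in every other bucket, would point toward a periodic job rather than random load.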
						
					

10-02-2021 04:19 AM

1 Kudo

@Tamiri,

Please click on your avatar and check My settings > SUBSCRIPTIONS&NOTIFICATIONS.

Another option: when you reply to a post, select "Email me when someone replies" at the top right.

Regards,
Will
						
					

10-01-2021 07:01 AM

Hello @rahuledavalath,

What HDP version and what CDP version are you using?

Regards,
Will
						
					

09-29-2021 09:50 AM

1 Kudo

Then the solutions above should meet your needs.
						
					