Member since 03-29-2020
110 Posts
10 Kudos Received
16 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1219 | 01-08-2022 07:17 PM |
| | 3672 | 09-22-2021 09:39 AM |
| | 15567 | 09-14-2021 04:21 AM |
| | 2966 | 09-01-2021 10:28 PM |
| | 3947 | 08-31-2021 08:04 PM |
			
    
	
		
		
06-29-2021 07:32 AM

Hello @K_K

Once you run a query in Beeline, note its query ID and trace that ID through the HiveServer2 logs. This shows how much time the query spends in the HTTP handler thread and in the background thread, so you can spot any slowness at that stage.

After that, the job is submitted to YARN, so check the YARN application log for the query to see where it slows down: at the AM level, at container allocation, or at the task level.

If it is a managed table, you can run a major compaction on the table to merge all the delta files into a single base file. This avoids scanning many separate files in HDFS when the query runs.

You can also run EXPLAIN on the query to see the execution plan and how much data it processes.

You can also run ANALYZE TABLE to collect table and column statistics, which can improve query performance.

Not every job can complete in under 4 seconds.

Reference:
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ANALYZETABLE%3Ctable1%3ECACHEMETADATA
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/performance-tuning/content/hive_query_result_cache_ms_cache.html
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/using-hiveql/content/hive_hive_3_tables.html
						
					
06-26-2021 09:40 AM (1 Kudo)

Hi @PURUSHOTHAMAN_S

I can see there are a lot of alerts (28) in Ambari. If I were you, I would start with the HDFS service and confirm that the NameNode is up and running, because other services need it in order to come up. Then check YARN, and after that you can concentrate on the others.

Check the Ambari startup logs to see why and where the startup is failing.

Hope it helps.
						
					
06-23-2021 10:50 PM

Hello @K_K

Hope you are doing great. MapReduce2 and Tez can return results in under 4 seconds, but it depends on many factors, such as query complexity, queue sizing, input data, and resource availability.
						
					
06-20-2021 06:55 AM

@Bryan_zh I believe HDP 3.1.5 supports Spark 2.x only. Please check the link below:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/spark-overview/content/analyzing_data_with_apache_spark.html

How to integrate Hive and Spark:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
						
					
06-20-2021 06:38 AM

Hello @prasanna06

Could you check the link below and see whether it helps?
https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/auto_tls.html
						
					
06-20-2021 06:16 AM

Hello @vidanimegh

Error: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Application application_1623850591633_0042 failed 2 times due to AM Container for appattempt_1623850591633_0042_000002 exited with exitCode: -104
Failing this attempt.Diagnostics: [2021-06-18 18:32:52.722]Container [pid=32822,containerID=container_e49_1623850591633_0042_02_000001] is running 34230272B beyond the 'PHYSICAL' memory limit. Current usage: 2.0 GB of 2 GB physical memory used; 3.9 GB of 4.2 GB virtual memory used. Killing container.

As I can see, your jobs are failing with a PHYSICAL memory limit error: the AM container exceeded its 2 GB limit and was killed.

Could you set the properties below at the Beeline session level, re-run the analysis query, and see how it goes?

set hive.tez.container.size=8192;
set hive.tez.java.opts=-Xmx6553m;
set tez.runtime.io.sort.mb=3072;
set tez.task.resource.memory.mb=8192;
set tez.am.resource.memory.mb=8192;
set tez.am.launch.cmd-opts=-Xmx6553m;
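
The heap values in the settings above come to roughly 80% of the 8192 MB container size. The following Python sketch shows that arithmetic for other container sizes. It assumes the 80% ratio inferred from the numbers in this post; the helper names `tez_memory_settings` and `as_set_statements` are hypothetical and not part of Hive or Tez. It leaves out `tez.runtime.io.sort.mb`, which is sized separately.

```python
def tez_memory_settings(container_mb, heap_fraction=0.8):
    """Derive session-level memory settings from one container size in MB.

    Assumption: the JVM heap is about 80% of the container, which matches
    the 8192 MB / 6553 MB pair used in this post. Adjust heap_fraction for
    your own cluster.
    """
    heap_mb = int(container_mb * heap_fraction)  # round down to whole MB
    heap_opt = "-Xmx{}m".format(heap_mb)         # 'm' suffix means megabytes
    return {
        "hive.tez.container.size": str(container_mb),
        "hive.tez.java.opts": heap_opt,
        "tez.task.resource.memory.mb": str(container_mb),
        "tez.am.resource.memory.mb": str(container_mb),
        "tez.am.launch.cmd-opts": heap_opt,
    }


def as_set_statements(settings):
    """Render the settings as 'set key=value;' lines for a Beeline session."""
    return ["set {}={};".format(key, value) for key, value in settings.items()]


if __name__ == "__main__":
    for line in as_set_statements(tez_memory_settings(8192)):
        print(line)
```

The container size must still fit within the YARN maximum container allocation on your cluster, so treat the generated values as a starting point.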
						
					
06-19-2021 11:36 PM

Hello @Bryan_zh

Hive 3 is the default version in HDP 3.1.5, and you cannot downgrade it to Hive 2.3.7. Downgrading Hive from 3.x to 2.x is also not recommended.
						
					
05-16-2021 10:05 AM

Hello @ryu

Could you take a screenshot of the message and share it with us? Which HDP and Ambari versions are you using?
						
					
05-16-2021 10:03 AM

Hello @Enigmat

Could you try DISTINCT to remove duplicate entries?

https://dwgeek.com/identify-and-remove-duplicate-records-from-hive-table.html/
https://stackoverflow.com/questions/43280052/how-to-delete-duplicate-records-from-hive-table
						
					
05-16-2021 09:56 AM

Hello @bsaad

1. Could you check whether you can reach the internet from the Oracle VM, for example with a ping test to google.com?
2. Could you cross-check that port 8889 is up and listening by running the following command as the root user?

# netstat -ntpla | grep 8889
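
If netstat is not available on the VM, a rough alternative is to try opening a TCP connection to the port. The Python sketch below does that. The function name `is_port_listening` is hypothetical. It only shows whether a connection can be established from where the script runs, so unlike netstat it does not show which process owns the port.

```python
import socket


def is_port_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unresolved
        # host) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_port_listening("localhost", 8889)` run on the VM itself checks the port mentioned above. A firewall between client and server can also produce `False`, so a negative result from a remote machine does not prove the service is down.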
						
					