Member since 01-19-2017

3676 Posts
632 Kudos Received
372 Solutions
            
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 472 | 06-04-2025 11:36 PM |
| | 998 | 03-23-2025 05:23 AM |
| | 531 | 03-17-2025 10:18 AM |
| | 1869 | 03-05-2025 01:34 PM |
| | 1240 | 03-03-2025 01:09 PM |
			
    
	
		
		
08-29-2021 07:19 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@npr20202 I am sorry, but you will have to continue being a magician 🙂 If you don't want to, then you have to teach your users the secret sauce or magic wand.

We face the same problem with multiple users on Spark/Impala/PySpark. I have made them add the INVALIDATE METADATA and REFRESH (Spark) commands at the start of their queries, and that works perfectly.

Otherwise, automatic invalidate/refresh of metadata is available in CDP 7.2.10. As long as Impala depends on HMS, that issue will exist 🙂

Happy hadooping
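As a rough illustration of "add the command at the start of the query", here is a minimal Python sketch that builds the Impala SQL text. The function name `with_metadata_refresh` and its parameters are hypothetical, and the sketch only assembles a string; how it is submitted (impala-shell, HUE, a driver) is left open.

```python
def with_metadata_refresh(query, tables, full_invalidate=False):
    """Return Impala SQL that updates cached metadata for `tables` before `query`.

    Hypothetical helper: REFRESH is used by default, INVALIDATE METADATA
    only when `full_invalidate` is True.
    """
    verb = "INVALIDATE METADATA" if full_invalidate else "REFRESH"
    statements = [f"{verb} {table}" for table in tables]
    # Normalise the user's query so exactly one semicolon ends each statement.
    statements.append(query.strip().rstrip(";"))
    return ";\n".join(statements) + ";"


print(with_metadata_refresh(
    "SELECT COUNT(*) FROM product.customer",
    ["product.customer"],
))
```

Users could keep such a helper in a shared module so the metadata statement is never forgotten.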
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-29-2021 12:42 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@npr20202 It makes sense that the problem only crops up after the maintenance "reboot" of the Metastore host. Once the server is rebooted, the metadata is purged from memory, which explains the slowness of queries after a cluster restart.

Automatic Invalidation/Refresh of Metadata

This is now an available option in CDP 7.2.10. When automatic invalidate/refresh of metadata is enabled, the Catalog Server polls Hive Metastore (HMS) notification events at a configurable interval and automatically applies the changes to the Impala catalog.

The Impala Catalog Server polls and processes the following changes:

- Invalidates the tables when it receives the ALTER TABLE event.
- Refreshes the partition when it receives the ALTER, ADD, or DROP partition events.
- Adds the tables or databases when it receives the CREATE TABLE or CREATE DATABASE events.
- Removes the tables from catalogd when it receives the DROP TABLE or DROP DATABASE events.

The HMS stores metadata for Hive tables (schema, permissions, location, and partitions) in a relational database and gives clients access to this information through the metastore service API. Hive Metastore is the component in Hive that stores the system catalog, which contains the metadata about Hive columns, tables, and partitions.

Impala uses the Hive metastore to read data created in Hive, so the same data can be read and queried from Impala. All you need to do is refresh the table or trigger INVALIDATE METADATA in Impala to read the data. Hive and Impala are two different query engines. Impala can interoperate with data stored in Hive, and uses the same infrastructure as Hive for tracking metadata about schema objects such as tables and columns.

- Virtualization
- Discoverability
- Schema Evolution
- Performance

Hive utilizes execution engines (like Tez, Hive on Spark, and LLAP) to improve query performance without low-level tuning approaches. Leveraging parallel execution whenever sequential operations are not needed is also wise. The amount of parallelism that your system can perform depends on the resources available and the overall data structure. Proper Hive tuning allows you to manipulate as little data as possible. One way to do this is through partitioning, where you assign "keys" to subdirectories where your data is segregated.

Impala uses the Hive metastore and can query Hive tables directly. Unlike Hive, Impala does not translate queries into MapReduce jobs; it executes them natively, using its daemons running on the data nodes to access the files on HDFS directly.

Created metadata is stored in the Hive Metastore, which is held in an RDBMS such as MySQL, Oracle, MSSQL, or MariaDB. Hive and Impala work with the same data tables in HDFS and the same metadata in the Metastore. Metadata for tables created in Hive is stored in the Hive Metastore database.
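The event handling listed earlier in this reply can be pictured as a simple lookup from event type to catalog action. The Python sketch below is a toy model only: the event labels and the `apply_events` function are illustrative names, not the identifiers used by the HMS notification API or by catalogd.

```python
# Toy model of the event-to-action mapping described in the reply.
# Event labels are illustrative assumptions, not real HMS identifiers.
ACTIONS = {
    "ALTER_TABLE": "invalidate table",
    "ALTER_PARTITION": "refresh partition",
    "ADD_PARTITION": "refresh partition",
    "DROP_PARTITION": "refresh partition",
    "CREATE_TABLE": "add table",
    "CREATE_DATABASE": "add database",
    "DROP_TABLE": "remove table",
    "DROP_DATABASE": "remove database",
}


def apply_events(events):
    """Map (event_type, object_name) pairs to catalog actions, in order.

    Event types that are not in ACTIONS are skipped.
    """
    return [
        (ACTIONS[event_type], name)
        for event_type, name in events
        if event_type in ACTIONS
    ]


polled = [("ADD_PARTITION", "sales"), ("DROP_TABLE", "tmp_load")]
for action, name in apply_events(polled):
    print(f"{action}: {name}")
```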
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-28-2021 01:30 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@hadoclc Can you share more details?

- Is it a Spark or Hive job?
- Can you share some information about your environment and the submitted code that fails?
- What are the permissions on /user/yarn?
- Who executed the job in 7.1.4, and how? Is the same user running the job in 7.1.6?
- Please share the logs.
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-27-2021 11:47 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
@vciampa The log clearly shows that the address is already in use:

Caused by: java.net.BindException: Port in use: 0.0.0.0:8042
Caused by: java.net.BindException: Address already in use

Can you proceed by locating the PID?

# lsof -i -P -n | grep LISTEN | grep 8042

Example

# lsof -i:8042
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 9322 yarn 475u IPv4 294790 0t0 TCP *:fs-agent (LISTEN)

Kill the process using the PID

$ kill -9 9322

Then restart the service.

Please revert
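If lsof is not at hand, the same "is something already bound to 8042?" question can be asked from Python. This is a minimal sketch using only the standard library; `port_in_use` is a hypothetical helper name, and it only reports whether a bind attempt fails, not which process holds the port.

```python
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails.

    Note: any OSError counts as "in use" here, so a permission error on a
    privileged port (below 1024) would also return True.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


if __name__ == "__main__":
    # 8042 is the port from the BindException in the log above.
    print("8042 in use:", port_in_use(8042))
```

Running this before restarting the service gives a quick confirmation that the old process is really gone.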
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-26-2021 12:52 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@npr20202 To return accurate query results, Impala needs to keep the metadata current for the databases and tables it queries. Therefore, if some other entity modifies information used by Impala in the metastore, the information cached by Impala must be updated via INVALIDATE METADATA or REFRESH.

Difference between INVALIDATE METADATA and REFRESH

INVALIDATE METADATA is an asynchronous operation that simply discards the loaded metadata from the catalog and coordinator caches. After that operation, the catalog and all the Impala coordinators only know about the existence of databases and tables and nothing more. Metadata loading for tables is triggered by any subsequent queries.

REFRESH reloads the metadata synchronously. REFRESH is more lightweight than doing a full metadata load after a table has been invalidated. REFRESH cannot detect changes in block locations triggered by operations like the HDFS balancer, which causes remote reads during query execution with negative performance implications.

Syntax

INVALIDATE METADATA [[db_name.]table_name]

You can run it in HUE or impala-shell, e.g.

INVALIDATE METADATA product.customer

By default, the cached metadata for all tables is flushed. If you specify a table name, only the metadata for that one table is flushed. Even for a single table, INVALIDATE METADATA is more expensive than REFRESH, so prefer REFRESH in the common case where you add new data files for an existing table.
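The rule of thumb above (REFRESH for new data files in an existing table, INVALIDATE METADATA otherwise, optionally scoped to one table) can be written down as a small decision helper. This is only a sketch: `metadata_statement` and its `new_files_only` flag are hypothetical names, and the function returns SQL text without executing anything.

```python
def metadata_statement(table=None, new_files_only=False):
    """Pick the Impala statement suggested by the guidance above.

    - new data files for an existing table -> REFRESH <table>
    - anything else, one table             -> INVALIDATE METADATA <table>
    - anything else, no table given        -> INVALIDATE METADATA (all tables)
    """
    if new_files_only:
        if table is None:
            # Assumption of this sketch: a refresh is always scoped to a table.
            raise ValueError("REFRESH needs a table name")
        return f"REFRESH {table}"
    if table is None:
        return "INVALIDATE METADATA"
    return f"INVALIDATE METADATA {table}"


print(metadata_statement("product.customer", new_files_only=True))
print(metadata_statement("product.customer"))
print(metadata_statement())
```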
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-26-2021 12:36 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@dv_conan I think your issue should be resolved by this post: Hive on tez issue

Please let me know whether that resolves your problem.
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-24-2021 02:44 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@Nitin0858 Did you enable Kerberos manually? The parameters you are referring to are set automatically when it is enabled through Ambari. If you did it manually, as I suspect, you need to perform the steps described in Set Up Kerberos for Ambari Server.

Please revert
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-24-2021 01:05 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
@lyash There are a couple of things members need to know to be able to help with your case:

- CDH or CDP version?
- OS?
- Your Postgres version, and the document followed for setup or the steps executed?
- Memory allocated to Hive?
- Can you connect to Postgres locally? i.e. sudo --login --user=postgres
- Can you change hive.metastore.schema.verification to false in hive-site.xml?

Please revert
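To confirm what hive-site.xml currently holds for that property before and after the change, a small standard-library check can help. The sketch below assumes the usual Hadoop-style layout of `<property>` elements with `<name>` and `<value>` children; `get_property` and the inline `SAMPLE` text are illustrative, and in practice you would read your own hive-site.xml instead.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the contents of a hive-site.xml file.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>"""


def get_property(xml_text, name):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


print(get_property(SAMPLE, "hive.metastore.schema.verification"))
```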
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-21-2021 04:11 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
@mike_bronson7 Can you share your capacity scheduler, total memory, and vcores configs?
						
					
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-21-2021 03:56 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 @Nitin0858   Can you share the 2 contents so we can help with the analysis? 
						
					
				
			
			
			
			
			
			
			
			
			
		 
        













