Member since 03-18-2025
2 Posts
0 Kudos Received
0 Solutions
        
			
    
	
		
		
03-20-2025 05:20 AM
@Boris G, I literally started my thread by explaining why I need Impala. The problem is solved, by the way.
						
					
    
	
		
		
03-18-2025 12:46 PM
Due to data masking, I can't read tables directly using 'vanilla' Spark. The workaround is to connect Spark to Impala via JDBC. The problem: when I use reserved words or operations like `+ INTERVAL 1 DAY`, Impala returns the column names as values in the DataFrame.

This is how I start the Spark session:

from pyspark.sql import SparkSession

spark = (
    SparkSession
    .builder
    .config("spark.jars", "/home/cdsw/ImpalaJDBC42.jar")
    .getOrCreate()
)

And this is how I query the data:

(
    spark
    .read
    .format("jdbc")
    .option("driver", "com.cloudera.impala.jdbc.Driver")
    .option("url", "jdbc:impala://MY_IMPALA_HOST:443/default;AuthMech=3;transportMode=http;httpPath=cliservice;ssl=1")
    .option("PWD", "MY_PASSWORD")
    .option("UID", "MY_USERNAME")
    .option("query", "SELECT 'a' AS index FROM MY_TABLE")
    .load()
    .show()
)

This is what I get:

+-----+
|index|
+-----+
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
|index|
+-----+

Other errors derive from this one. For example, running the query

SELECT current_date() + interval 1 day FROM MY_TABLE

raises the exception:

java.sql.SQLDataException: [Cloudera][JDBC](10140) Error converting value to Date.

This happens because Spark expects a value it can parse as a date, but Impala returns the column name as the value. We can see the returned value by casting it to a string:

SELECT CAST(current_date() + interval 1 day AS STRING) FROM MY_TABLE

+-----------------------------------------------+
|cast(current_date() + interval 1 day as string)|
+-----------------------------------------------+
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
|                           cast(current_date...|
+-----------------------------------------------+

Can someone help me? I searched for a while and found that other people faced this issue some years ago. Is there a solution already?
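
For readers hitting the same symptom, here is a minimal pure-Python sketch of one plausible cause. It is an assumption and is not confirmed anywhere in this thread. Spark's default JDBC dialect wraps column names in double quotes when it builds the outer query around the `query` option, and Impala accepts double-quoted text as a string literal, so the quoted column name could come back as a constant value on every row. The helper names `quote_identifier` and `build_outer_query` are hypothetical, the alias `SPARK_GEN_SUBQ_0` is illustrative, and the code only imitates the shape of the generated SQL without calling Spark or Impala.

```python
def quote_identifier(name: str, quote_char: str = '"') -> str:
    """Wrap a column name in the given quote character (hypothetical helper)."""
    return f"{quote_char}{name}{quote_char}"


def build_outer_query(user_query: str, columns: list, quote_char: str = '"') -> str:
    """Imitate how a JDBC reader wraps the user's query in an outer SELECT.

    Simplified model for illustration only, not Spark's actual code.
    """
    column_list = ",".join(quote_identifier(c, quote_char) for c in columns)
    return f"SELECT {column_list} FROM ({user_query}) SPARK_GEN_SUBQ_0"


user_query = "SELECT 'a' AS index FROM MY_TABLE"

# Double quotes: Impala may read "index" as a string literal, not a column.
default_sql = build_outer_query(user_query, ["index"])

# Backticks: Impala reads `index` as an identifier.
backtick_sql = build_outer_query(user_query, ["index"], quote_char="`")

print(default_sql)
print(backtick_sql)
```

If this is indeed the cause, a commonly suggested direction is to register a custom `JdbcDialect` on the JVM side (via `JdbcDialects.registerDialect`) whose `quoteIdentifier` uses backticks. Whether that applies to this exact driver and Spark version is untested here.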
						
					
Labels: Apache Impala, Apache Spark