Member since 05-15-2023
13 Posts
2 Kudos Received
0 Solutions
12-11-2024 02:19 AM
I don't understand what the benefit of doing it this way is. As far as I know, when a table is created in Hive, an entity of type hive_table is created in Atlas automatically, in contrast to your manual approach. Am I misunderstanding something? Could you please explain it to me?
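(As a side note, and purely as an assumption on my part: one way to confirm that the entity was created automatically is to query the Atlas DSL search REST endpoint. The host, credentials, and qualified table name below are placeholders, not values from this thread.)

# Hypothetical check: ask Atlas whether a hive_table entity already exists for a given table.
import requests

ATLAS_URL = "http://atlas-host:21000"          # assumed Atlas endpoint
AUTH = ("admin", "admin")                      # assumed credentials
qualified_name = "default.my_table@cluster1"   # assumed cluster-qualified table name

resp = requests.get(
    f"{ATLAS_URL}/api/atlas/v2/search/dsl",
    params={"query": f"hive_table where qualifiedName = '{qualified_name}'"},
    auth=AUTH,
)
resp.raise_for_status()
entities = resp.json().get("entities", [])
print("hive_table entity found:" if entities else "no hive_table entity:", entities)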
						
					
02-16-2024 02:54 AM
1 Kudo
Thank you, my friend. A week ago I read through the configurations in the official documentation and experimented with them, but I encountered an error along the lines of 'class not found'. I have now identified the root cause: I'm on HDP 3.1.0, which ships PySpark 2.3.2.3.1.0.0-78. I upgraded to PySpark 3 while still using the default standalone-metastore-1.21.2.3.1.0.0-78-hive3.jar, which is why the configuration failed with 'class not found'. I have now replaced that JAR with hive-metastore-2.3.9.jar and everything is working fine. Once again, thank you, my friend.
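(For anyone landing here later, a minimal sketch, assuming Spark 3.1 or newer, of pointing Spark at specific Hive metastore jars via configuration; the jar path and metastore version below are illustrative assumptions, not the exact setup from this thread.)

# Sketch only: load Hive metastore classes from an explicit jar path instead of the builtin ones.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-metastore-jars-example")
    # Version of the Hive metastore Spark should talk to (assumed value)
    .config("spark.sql.hive.metastore.version", "2.3.9")
    # "path" (Spark 3.1+) tells Spark to load metastore classes from the path below
    .config("spark.sql.hive.metastore.jars", "path")
    .config("spark.sql.hive.metastore.jars.path", "/opt/hive/lib/hive-metastore-2.3.9.jar,/opt/hive/lib/*.jar")
    .enableHiveSupport()
    .getOrCreate()
)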
						
					
02-05-2024 05:50 AM
Thank you. Your point about data integrity is valid, but it's worth noting that PySpark has supported this feature since version 2.1 and there hasn't been any announcement about its removal, so I believe this might be a bug.
						
					
01-28-2024 06:51 PM
I think this is a bug in Spark. I followed their changes in the documentation (https://spark.apache.org/docs/latest/sql-migration-guide.html), but I haven't seen any notes about this problem. I found another temporary workaround for the issue: write directly to the location of the desired partition of the table. I implemented it as follows:

-- Create a test table:
CREATE EXTERNAL TABLE IF NOT EXISTS staging.current_sonnh (
  `date` DATE,
  deal_id STRING,
  hr_code STRING,
  custid STRING
)
PARTITIONED BY (partition_date STRING)
STORED AS ORC
LOCATION '/lake/staging_zone/sonnh/current_sonnh'
TBLPROPERTIES ("orc.compress"="SNAPPY", "external.table.purge"="true");

-- Insert sample data:
INSERT INTO TABLE staging.current_sonnh
  (`date`, deal_id, hr_code, custid, partition_date)
SELECT
  TO_DATE('2024-01-01'), 1234, 'HR1234', 'CI1234', 20240101;

Then initialize the Spark session and run:

x = spark.read.format("orc").load('/lake/staging_zone/sonnh/current_sonnh/partition_date=20240101')
spark.sql("ALTER TABLE staging.current_sonnh ADD PARTITION (partition_date=20240102)")
x.write.mode("overwrite").orc("/lake/staging_zone/sonnh/current_sonnh/partition_date=20240102")
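(A possible follow-up check, not part of the original post: confirm the new partition is visible after the direct write.)

# Hypothetical verification step; the partition value matches the example above.
spark.sql("SHOW PARTITIONS staging.current_sonnh").show(truncate=False)
spark.sql("SELECT COUNT(*) FROM staging.current_sonnh WHERE partition_date = '20240102'").show()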
						
					
01-24-2024 07:28 PM
1 Kudo
Dear team, I have been using PySpark 3.4.2 with the following syntax:

sql_query = """
INSERT OVERWRITE TABLE table_1 PARTITION (partition_date = {YYYYMMDD})
SELECT
    table_1.a
    , table_1.b
    , table_2.c
FROM table_2 change_capture_view
FULL OUTER JOIN (
    SELECT * FROM table_1 WHERE partition_date = {YYYYMMDD_D_1}
) current_view
    ON change_capture_view.a <=> current_view.a
WHERE change_capture_view.a IS NULL
"""

and then call spark.sql(sql_query). I encounter the error:

File "/usr/hdp/3.1.0.0-78/spark3/python/lib/pyspark.zip/pyspark/sql/session.py", line 1440, in sql
File "/usr/hdp/3.1.0.0-78/spark3/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/usr/hdp/3.1.0.0-78/spark3/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
pyspark.errors.exceptions.captured.AnalysisException: Cannot overwrite a path that is also being read from.

In essence, I am trying to read data from the partitions of previous dates, process it, and write it into the current date's partition of the same table. Previously, the same syntax worked fine on PySpark 2.3.2.3.1.0.0-78. Can someone help me with this issue? I've already tried creating a temporary table from table_1, but still encountered a similar error.
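(A commonly suggested workaround, offered here as an assumption and not verified on this cluster: materialize the previous-day data before the overwrite, e.g. with a checkpoint, so the write no longer reads from the path it is about to overwrite.)

# Hypothetical sketch; table names and the {YYYYMMDD_D_1} placeholder follow the query above.
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")  # assumed scratch directory

previous_day = spark.sql(
    "SELECT * FROM table_1 WHERE partition_date = {YYYYMMDD_D_1}"
).checkpoint()  # cuts lineage, so the plan no longer references table_1's files

previous_day.createOrReplaceTempView("current_view")
# Rewrite the INSERT OVERWRITE to join against current_view instead of table_1 directly.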
						
					
Labels: Apache Spark
09-15-2023 03:16 AM
I got the same error.
						
					
08-18-2023 12:23 AM
Thanks @RangaReddy. My purpose is to collect a series of pages from an RDBMS and compare their size with JVM_HEAP_MEMORY. Do you find this approach acceptable? I believe it could help alleviate the small-files issue on HDFS. I'm facing difficulties in calculating the size of the DataFrame; there seems to be no straightforward way to accomplish it.
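(A rough sketch of one way to gauge a DataFrame's in-memory footprint; it relies on Spark's storage info through internal py4j handles, so treat it as an assumption rather than a stable API. The toy DataFrame stands in for the data paged from the RDBMS.)

# Cache the DataFrame, force materialization, then read its size from the storage info.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-size-estimate").getOrCreate()
df = spark.range(0, 1000000)  # stand-in for one page fetched from the RDBMS

df.cache()
df.count()  # forces the cache to be populated

for rdd_info in spark.sparkContext._jsc.sc().getRDDStorageInfo():
    # memSize / diskSize are in bytes; compare them against the executor heap budget
    print(rdd_info.name(), rdd_info.memSize(), rdd_info.diskSize())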
						
					
08-14-2023 01:34 AM
I am using Spark 2.3.2.3.1.0.0-78. I tried to use:

spark_session.sparkContext._conf.get('spark.executor.memory')

but I only received None. Can someone help me, please?
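(For reference, a small sketch of two other ways to read that setting; if the property was never set explicitly, they return the fallback you pass instead of None. The "1g" fallback below is just an assumption.)

# Read spark.executor.memory with an explicit fallback instead of getting None.
from pyspark.sql import SparkSession

spark_session = SparkSession.builder.appName("read-executor-memory").getOrCreate()

# Via the runtime configuration
print(spark_session.conf.get("spark.executor.memory", "1g"))

# Via the SparkContext's SparkConf
print(spark_session.sparkContext.getConf().get("spark.executor.memory", "1g"))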
						
					
Labels: Apache Spark
07-17-2023 01:55 AM
Hello team, we are using HDP-3.1.0. We executed the import-hive.sh script to import already existing Hive tables into Atlas, and it completed successfully. We can now see all Hive databases and tables in Atlas, but we cannot see data lineage for those imported tables. If we create an external table on any HDFS path, we can see its lineage in Atlas, and if we create new managed tables we can see their lineage as well; it is only the old, imported tables that show no lineage. Why are we not getting lineage for the older tables? Please suggest, we are stuck. Thanks.
						
					
Labels: Apache Atlas
07-05-2023 09:03 PM
It works for me.
						
					