New Contributor
Posts: 2
Registered: 01-06-2019

Reading PARTITIONED HIVE table from SPARK 1.6

I have a partitioned Hive table that is only 600 MB in size, but it has 32,000 directories because of its partitions.

The table is partitioned on two columns.

Because of this, reading the table into a DataFrame with sqlContext.sql("SELECT * FROM .....") creates 32,000 partitions in Spark, and with the data skew the job runs for more than an hour just to show sample results.
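
To put numbers on the small-file problem: 600 MB spread over 32,000 directories averages under 20 KB per directory, far below a typical HDFS block (commonly 128 MB). The plain-Python sketch below (no Spark needed) does that arithmetic. The 128 MiB target size per Spark partition is an assumption, not a value from this thread, and small_file_stats is a hypothetical helper.

```python
import math

MIB = 1024 * 1024

def small_file_stats(total_bytes, num_dirs, target_bytes=128 * MIB):
    """Return (average bytes per partition directory, suggested partition count).

    target_bytes is an ASSUMED target size per Spark partition; tune it for your cluster.
    """
    avg_bytes = total_bytes / num_dirs
    # Round up so every byte lands in some partition; never suggest fewer than 1.
    suggested = max(1, math.ceil(total_bytes / target_bytes))
    return avg_bytes, suggested

avg, n = small_file_stats(600 * MIB, 32000)
print(f"average per directory: {avg / 1024:.1f} KiB")  # average per directory: 19.2 KiB
print(f"suggested partitions: {n}")                    # suggested partitions: 5
```

If those numbers hold, the DataFrame returned by sqlContext.sql(...) could be shrunk with df.coalesce(n), which merges the existing partitions into n tasks without a shuffle. Filtering on the two partition columns in the WHERE clause also lets Spark prune partitions instead of scanning all 32,000 directories; in Spark 1.6 the spark.sql.hive.metastorePartitionPruning setting (false by default) additionally pushes those predicates down to the Hive metastore. Neither change fixes skew inside the data itself.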

Please advise if you have faced this kind of issue. The Spark version is 1.6.

Re: Reading PARTITIONED HIVE table from SPARK 1.6

Any updates, techies? Has nobody faced this issue?

Posts: 36
Topics: 0
Kudos: 10
Solutions: 3
Registered: 07-30-2018

Re: Reading PARTITIONED HIVE table from SPARK 1.6

Hi Vinod,

32K partitions is a lot to handle. You could bucket the table instead of partitioning it, which avoids creating so many small files.
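
A rough illustration of why bucketing helps: partitioning creates one directory per distinct combination of the partition columns, while bucketing hashes rows into a fixed number of files. The sketch below is a simplified, Spark-free model. The column names, the 200 x 160 key space, the 32 buckets and the CRC32 hash are all hypothetical; Hive's real bucketing hash function is different.

```python
import zlib
from collections import Counter

def partition_dirs(rows):
    """Hive-style partitioning: one directory per distinct (col_a, col_b) pair."""
    return {f"col_a={a}/col_b={b}" for a, b, _ in rows}

def bucket_of(key, num_buckets):
    """Simplified bucket assignment: deterministic CRC32 hash of the key, mod num_buckets.
    Hive uses a different hash function; this only models the idea."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_buckets

def bucket_files(rows, num_buckets):
    """Bucketing on (col_a, col_b): row count per bucket file, at most num_buckets files."""
    return Counter(bucket_of((a, b), num_buckets) for a, b, _ in rows)

# Hypothetical data: 200 x 160 distinct key pairs = 32,000 combinations.
rows = [(a, b, "payload") for a in range(200) for b in range(160)]

dirs = partition_dirs(rows)
buckets = bucket_files(rows, 32)
print("partition directories:", len(dirs))  # partition directories: 32000
print("bucket files:", len(buckets), "rows:", sum(buckets.values()))
```

In Hive DDL, bucketing is declared with CLUSTERED BY (column) INTO N BUCKETS. The trade-off is that a bucketed column can no longer be used for directory-level partition pruning, so a common compromise is to keep a low-cardinality column as the partition key and bucket on the other.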

Can you share the type of query you are running against these partitions?

Thanks
Jerry