Reading PARTITIONED HIVE table from SPARK 1.6

New Contributor

I have a partitioned Hive table that is only 600 MB in size, but it has 32,000 directories because of the table's partitions.

The table is partitioned on 2 columns.

 

Because of this, reading the table into a DataFrame with sqlContext.sql("SELECT * FROM .....") produces 32,000 partitions in Spark, and with the data skew Spark runs for more than an hour just to show sample results.


Please advise if you have faced this kind of issue. The Spark version is 1.6.

2 REPLIES

Re: Reading PARTITIONED HIVE table from SPARK 1.6

New Contributor

Any updates, techies? Has nobody faced this issue?

Re: Reading PARTITIONED HIVE table from SPARK 1.6

Rising Star
Hi Vinod,

32K partitions is a lot to handle. You can define buckets instead of partitions to avoid creating too many small files.
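
To illustrate the bucketing idea, here is a minimal sketch that only builds the Hive DDL string for a bucketed table. The table name, column names, bucket count, and ORC storage format are all hypothetical placeholders, not taken from your table, and the helper `bucketed_table_ddl` is something I made up for this example.

```python
def bucketed_table_ddl(table, columns, bucket_col, num_buckets):
    """Build a Hive CREATE TABLE statement that uses buckets instead of partitions.

    table, columns, bucket_col and num_buckets are illustrative inputs;
    replace them with values that match the real table.
    """
    names = [name for name, _ in columns]
    if bucket_col not in names:
        raise ValueError("bucket column %r is not in the column list" % bucket_col)
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    col_defs = ", ".join("%s %s" % (name, col_type) for name, col_type in columns)
    return "CREATE TABLE %s (%s) CLUSTERED BY (%s) INTO %d BUCKETS STORED AS ORC" % (
        table, col_defs, bucket_col, num_buckets)


# Hypothetical example: 32 buckets on "region" instead of 32,000 directories.
ddl = bucketed_table_ddl(
    "sales_bucketed",
    [("id", "BIGINT"), ("region", "STRING")],
    "region",
    32,
)
print(ddl)
```

I would run the resulting statement, and the INSERT that loads the data, from Hive itself. On older Hive versions you may also need to enable hive.enforce.bucketing before loading so that the bucket files are actually produced.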

Can you share the type of query you are running against these partitions?
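
The reason I ask: if the query only needs a few partitions, filtering on the partition columns lets Hive skip the other directories, and coalescing afterwards cuts down the number of Spark tasks. A rough sketch, assuming hypothetical partition columns `year` and `month`, a hypothetical table name, and a target of about 128 MB per Spark partition (my own assumption, tune it for your cluster):

```python
def pruned_query(table, filters):
    """Build a SELECT that filters on partition columns.

    filters maps partition column name -> value; both are placeholders here.
    """
    if not filters:
        return "SELECT * FROM %s" % table
    where = " AND ".join(
        "%s = '%s'" % (col, val) for col, val in sorted(filters.items()))
    return "SELECT * FROM %s WHERE %s" % (table, where)


def target_partitions(total_bytes, bytes_per_partition=128 * 1024 * 1024):
    """Number of Spark partitions for the given data size, rounded up, minimum 1."""
    return max(1, -(-total_bytes // bytes_per_partition))


def read_compacted(sqlContext, table, filters, total_bytes):
    """Read only the matching Hive partitions, then merge them into fewer Spark partitions.

    sqlContext is expected to be an existing HiveContext created elsewhere.
    """
    df = sqlContext.sql(pruned_query(table, filters))
    # coalesce() merges partitions without a full shuffle.
    return df.coalesce(target_partitions(total_bytes))
```

For a 600 MB table, `target_partitions(600 * 1024 * 1024)` gives 5, so something like `read_compacted(sqlContext, "mydb.mytable", {"year": "2016", "month": "01"}, 600 * 1024 * 1024)` would end up with at most 5 Spark partitions instead of 32,000.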

Thanks
Jerry