Support Questions

Find answers, ask questions, and share your expertise

External Hive partitioned table is empty!

Rising Star

@Neeraj Sabharwal

Hi all,

I'm trying to create an external Hive partitioned table whose location points to an HDFS directory. This HDFS location gets appended to every time I run my Spark Streaming application, so my Hive table grows as well.

	Kafka >> Spark Streaming >> HDFS >> Hive External Table.

I can run the above flow smoothly with a non-partitioned table, but once I add partitions to the external table, no data shows up in it whatsoever.

I have tried the following after creating the external table, but the problem persists.

CREATE EXTERNAL TABLE user (
  userId BIGINT,
  type INT,
  level TINYINT,
  date String
)
PARTITIONED BY (date String)
LOCATION '/external_table_path';

ALTER TABLE user ADD PARTITION (date='2010-02-22');

I tried the above, but nothing has changed.

What is the best workaround to get my appending HDFS data into an external partitioned table?

3 REPLIES

Contributor

@Neeraj Sabharwal, your ALTER TABLE statement should work. One question: do you place the data for the partition in a subdirectory with the partition name?

Hive partitions exist as subdirectories. For example, your user table should have a structure similar to this:

/external_table_path/date=2010-02-22
/external_table_path/date=2010-02-23
/external_table_path/date=2010-02-24

And so on.

The ALTER TABLE statement will create the directories as well as add the partition details to the Hive metastore. Once the partitions are created, you can simply drop the right file(s) into the right directory.
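For example, a minimal sketch of registering an existing HDFS directory as a partition (table and path names are taken from the question; `date` is backtick-quoted since it is a reserved word in Hive):

-- Register an existing HDFS subdirectory as a partition. The LOCATION
-- clause is optional when the directory follows the date=... naming
-- convention directly under the table's location.
ALTER TABLE user ADD IF NOT EXISTS PARTITION (`date`='2010-02-22')
LOCATION '/external_table_path/date=2010-02-22';

-- Confirm the partition is registered in the metastore.
SHOW PARTITIONS user;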

Cheers, Steven.

Rising Star

@Steven O'Neill

I'm not touching the data in HDFS. It created a separate folder for the date column when I ran the ALTER statement.


Want to point out a couple of things:

  1. date is a Hive keyword. To have a column called date you need to write `date` string.
  2. If you declare a table as partitioned (in your case, PARTITIONED BY (`date` string)), the partition column is implicitly treated as the last column of the table. There is no need to specify it in the schema again:
CREATE EXTERNAL TABLE user (
  userId BIGINT,
  type INT,
  level TINYINT
)
PARTITIONED BY (`date` String)
LOCATION '/external_table_path';

As @Steven O'Neill mentioned, when ingesting data into HDFS you should create subdirectories like date=2016-01-01 under your table location. Then your ALTER TABLE statement will work.
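Putting it together, a minimal sketch of the per-batch flow (the date and the verification query are illustrative; MSCK REPAIR TABLE is a standard Hive alternative that was not mentioned above):

-- After the streaming job writes files under
-- /external_table_path/date=2016-01-01, register the new directory:
ALTER TABLE user ADD IF NOT EXISTS PARTITION (`date`='2016-01-01');

-- Alternatively, scan the table location and register every
-- unregistered date=... directory in one pass:
MSCK REPAIR TABLE user;

-- The appended data should now be queryable:
SELECT COUNT(*) FROM user WHERE `date`='2016-01-01';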