Im trying to create an external hive partitioned table which location points to an HDFS location.This HDFS location get appended every time i run my spark streaming application, so my hive table appends too.
Kafka >> Spark Streaming >> HDFS >> Hive External Table.
I could do the above flow smoothly with a non partitioned table, but when i want to add partitions to my external table i'm not able to get data in to my hive external table whatsoever.
I have tried the below after creating the external table, but still the problem persists.
<code>CREATE EXTERNAL TABLE user ( userId BIGINT, type INT, level TINYINT, date String ) PARTITIONED BY (date String) LOCATION '/external_table_path';
<code>ALTER TABLE user ADD PARTITION(date='2010-02-22');
I tried the above fix but the things haven't changed.
What is the best work around to get my appending HDFS data in to an external Partition table??
@Neeraj Sabharwal, your ALTER TABLE statement should work. One question: Do you place the data for partition in a subdirectory with the partition name?
Hive partitions exist as subdirectories. For example, your user table should have a structure similar to this:
/external_table_path/date=2010-02-22 /external_table_path/date=2010-02-23 /external_table_path/date=2010-02-24
And so on.
The ALTER TABLE statement will create the directories as well as adding the partition details to the Hive metastore. Once the partitions are created you can simply drop the right file/s in the right directory.
Want to point out couple of things..
CREATE EXTERNAL TABLE user ( userId BIGINT, type INT, level TINYINT )PARTITIONED BY (`date` String)LOCATION '/external_table_path';
As @Steven O'Neill mentioned, when ingesting data into HDFS, you should create sub-directories like date=2016-01-01 under your table location. Then your alter table statement will work.