Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to insert backdated data in Hive Table ?

How to insert backdated data in Hive Table ?

I'm new to this technology. We receive the file which may contain backdated records. From which we have to load the data into the hive table which will be insert only(partitioned on trans_Date). I need to know what should be the mechanism to insert the records in the table, where trans_date is backdated. Trans_date(column) is the transaction date and Record_date(column) is the date on which the record is inserted in the table.

1 REPLY 1
Highlighted

Re: How to insert backdated data in Hive Table ?

Hey @Prathamesh H!
My suggestion for backfill in Hive, is to use partition on hive tables.
For each new (or old) data, you will create a new partition and they will be separately from hot/cold data.
BTW: you two types of partition: dynamic and static. Usually when you want to assume that your partition will be a value from a source column ca use dynamic, or if you intended to add manually, use static.
Here's some good link with further details.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual
https://hortonworks.com/blog/update-hive-tables-easy-way/

And if you're planning to test you hive tables, I'd suggest to use EXTERNAL TABLES, they give you freedom to drop the schema of the table whenever you want, without deleting the data from the table.
Here's an example:

#My data
[hive@hwc1225-node4 ~]$ cat simple_ds 
Vinicius,27
John,28
Somebody,0
Death,99999
WalkingDead,

#Creating the table (i'm using external table)
hive> CREATE EXTERNAL TABLE simple_ds(
    > name STRING,
    > age INT
    > )
    > PARTITIONED BY(year string)
    > LOCATION '/user/hive/warehouse/simple_ds';
hive> ALTER TABLE simple_ds ADD PARTITION(year=2017);
OK
Time taken: 0.321 seconds
hive> show partitions simple_ds;
OK
year=2017
Time taken: 0.569 seconds, Fetched: 1 row(s)


#CREATING THE PARTITION OUTSIDE OF HIVE
[hive@hwc1225-node4 ~]$ hdfs  dfs -mkdir /user/hive/warehouse/simple_ds/year=2002


#Backing to Hive to refresh its partitions
[hive@hwc1225-node4 ~]$ hive
hive> msck repair table simple_ds;
OK
Partitions not in metastore:	simple_ds:year=2002
Repair: Added partition to metastore simple_ds:year=2002
Time taken: 4.479 seconds, Fetched: 2 row(s)
hive> show partitions simple_ds;
OK
year=2002
year=2017
Time taken: 0.904 seconds, Fetched: 2 row(s)


#Feeding the simple_ds table
hdfs dfs -put simple_ds /user/hive/warehouse/simple_ds/year=2002/


#Oh damn i forgot to put some comments on the table and the delimiter
[hive@hwc1225-node4 ~]$ hive
hive> drop table simple_ds;
hive > CREATE EXTERNAL TABLE simple_ds(
    > name STRING,
    > age INT
    > ) COMMENT 'Table Simple_DS for HCC'
    > PARTITIONED BY(year string)
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY ','
    > LOCATION '/user/hive/warehouse/simple_ds';
hive> msck repair table simple_ds;
OK
Partitions not in metastore:	simple_ds:year=2002	simple_ds:year=2017
Repair: Added partition to metastore simple_ds:year=2002
Repair: Added partition to metastore simple_ds:year=2017
Time taken: 0.689 seconds, Fetched: 3 row(s)
hive> select * from simple_ds;
OK
Vinicius	27	2002
John	28	2002
Somebody	0	2002
Death	99999	2002
WalkingDead	NULL	2002
Time taken: 1.119 seconds, Fetched: 5 row(s)

Hope this helps!

Don't have an account?
Coming from Hortonworks? Activate your account here