Support Questions
Find answers, ask questions, and share your expertise

Hive table not being populated with data from staging table

Contributor

Hi all,

I'm having an issue I was hoping someone could help me with, I believe its due to how my tables are being partitioned but I'm struggling to come up with a solution.

I have created a table such as the example one below

CREATE TABLE Demo
(time timestamp COMMENT 'timestamp in format yyyymmddTss:mm:hh', 
exampleId varchar(6) COMMENT 'example field'
example2 varchar(10) COMMENT 'example field'
example3 varchar(50) COMMENT 'example field'
example4 varchar(50) COMMENT 'example field'
)
COMMENT 'A table to demonstrate my problem'
PARTITIONED BY (TRAN_DATE DATE COMMENT 'Transaction Date') 
CLUSTERED BY (exampleId)
SORTED BY (exampleId) INTO 24 BUCKETS
stored as orc;

And I am then trying to copy data from a CSV file into a table using an external table such as below

Drop TABLE Demo_staging

CREATE TABLE Demo_staging
(time timestamp COMMENT 'timestamp in format yyyymmddTss:mm:hh', 
exampleId varchar(6) COMMENT 'example field'
example2 varchar(10) COMMENT 'example field'
example3 varchar(50) COMMENT 'example field'
example4 varchar(50) COMMENT 'example field'
)
COMMENT 'The staging table to demonstrate my problem'
row format delimited fields terminated by ',' null defined as '\001'
STORED AS TEXTFILE 
LOCATION '${appPath}/raw'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2");
insert overwrite table Demo partition (TRAN_DATE = ${day}) SELECT * FROM Demo_staging;

The value in TRAN_DATE should be a date format of format YYYYMMDD which is derived from the field time in which all the values are set to 2015-06-20T00:00:00 but i'm not sure how TRAN_DATE is supposed to get this value. The value of ${day} is 20150620. I've tried using the following as a test to see the data appearing but have had no luck

insert overwrite table Demo partition (to_char(time,YYYY-MM-DD) = ${day}) SELECT * FROM Demo_staging;

I can see the data has appeared in my staging table but it does not make it to the actual table and I can only think of the partitioning being the reason for this.

Any help is greatly appreciated.

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

In the partition field the left value is the column so you do not change this. The right value is the partition you want to load into. I.e.

INSERT ... partition( TRAN_DATE= your date )

My tip: change the partition field from Date to Integer. Then it would work.

Alternatively you could cast the right side to a date.

partition ( TRAN_DATE = to_date(${day},'YYYY-MM-DD'))

but honestly a partition column of string or integer is most likely less headache. I never used a Date so not sure if this would work.

View solution in original post

2 REPLIES 2

In the partition field the left value is the column so you do not change this. The right value is the partition you want to load into. I.e.

INSERT ... partition( TRAN_DATE= your date )

My tip: change the partition field from Date to Integer. Then it would work.

Alternatively you could cast the right side to a date.

partition ( TRAN_DATE = to_date(${day},'YYYY-MM-DD'))

but honestly a partition column of string or integer is most likely less headache. I never used a Date so not sure if this would work.

View solution in original post

Contributor

Thanks, this is very helpful in understanding the partitioning. I've made the changes you have suggested but i'm still not getting any data into my table, despite the staging table being populated with data, do you know why this might be the case?