Created 01-28-2016 05:29 PM
Hi all,
I'm having an issue I was hoping someone could help me with, I believe its due to how my tables are being partitioned but I'm struggling to come up with a solution.
I have created a table such as the example one below
CREATE TABLE Demo (time timestamp COMMENT 'timestamp in format yyyymmddTss:mm:hh', exampleId varchar(6) COMMENT 'example field' example2 varchar(10) COMMENT 'example field' example3 varchar(50) COMMENT 'example field' example4 varchar(50) COMMENT 'example field' ) COMMENT 'A table to demonstrate my problem' PARTITIONED BY (TRAN_DATE DATE COMMENT 'Transaction Date') CLUSTERED BY (exampleId) SORTED BY (exampleId) INTO 24 BUCKETS stored as orc;
And I am then trying to copy data from a CSV file into a table using an external table such as below
Drop TABLE Demo_staging CREATE TABLE Demo_staging (time timestamp COMMENT 'timestamp in format yyyymmddTss:mm:hh', exampleId varchar(6) COMMENT 'example field' example2 varchar(10) COMMENT 'example field' example3 varchar(50) COMMENT 'example field' example4 varchar(50) COMMENT 'example field' ) COMMENT 'The staging table to demonstrate my problem' row format delimited fields terminated by ',' null defined as '\001' STORED AS TEXTFILE LOCATION '${appPath}/raw' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); insert overwrite table Demo partition (TRAN_DATE = ${day}) SELECT * FROM Demo_staging;
The value in TRAN_DATE should be a date format of format YYYYMMDD which is derived from the field time in which all the values are set to 2015-06-20T00:00:00 but i'm not sure how TRAN_DATE is supposed to get this value. The value of ${day} is 20150620. I've tried using the following as a test to see the data appearing but have had no luck
insert overwrite table Demo partition (to_char(time,YYYY-MM-DD) = ${day}) SELECT * FROM Demo_staging;
I can see the data has appeared in my staging table but it does not make it to the actual table and I can only think of the partitioning being the reason for this.
Any help is greatly appreciated.
Thanks
Created 01-28-2016 05:39 PM
In the partition field the left value is the column so you do not change this. The right value is the partition you want to load into. I.e.
INSERT ... partition( TRAN_DATE= your date )
My tip: change the partition field from Date to Integer. Then it would work.
Alternatively you could cast the right side to a date.
partition ( TRAN_DATE = to_date(${day},'YYYY-MM-DD'))
but honestly a partition column of string or integer is most likely less headache. I never used a Date so not sure if this would work.
Created 01-28-2016 05:39 PM
In the partition field the left value is the column so you do not change this. The right value is the partition you want to load into. I.e.
INSERT ... partition( TRAN_DATE= your date )
My tip: change the partition field from Date to Integer. Then it would work.
Alternatively you could cast the right side to a date.
partition ( TRAN_DATE = to_date(${day},'YYYY-MM-DD'))
but honestly a partition column of string or integer is most likely less headache. I never used a Date so not sure if this would work.
Created 01-29-2016 11:38 AM
Thanks, this is very helpful in understanding the partitioning. I've made the changes you have suggested but i'm still not getting any data into my table, despite the staging table being populated with data, do you know why this might be the case?