I have a csv file that has huge amount of data which has the data for my hive table. Also, I want to create my hive table as an ORC table with partitions and the partition column is a computed column. So I first created an external table (table1) pointing to the file location, created another table (table2) with orc table properties as I want and inserted the data from table1 to table2.
The partition column should be: date_key DIV 10000000000.
The script I wrote is:
## to create the orc table: CREATE TABLE table_name ( key1 type1, key2 type2..... keyn typen) PARTITIONED BY (date_key DIV 10000000000 int) STORED AS ORC tblproperties(........); INSERT OVERWRITE TABLE table_name PARTITION (date_key DIV 10000000000) SELECT * FROM table_name2; I am getting an error: FAILED: ParseException line 12:25 cannot recognize input near 'DIV' '10000000000' 'int' in column type
Someone please suggest what is wrong here. Am I even doing this right? Is there a different/better way of doing this? Thanks!
Hi @Sree Kupp
Could you brief what are you mentioning about computed column? Are you referring data column?
BTW below approach should work fine.
CREATE TABLE table_name ( key1 type1, key2 type2..... keyn typen) PARTITIONED BY (date_key int) STORED AS ORC;
insert into table table_name partition(date_key) select * from table_name2;
Hi @Bala Vignesh N V I want to partition the table with "integer part of date_key/10000000000". date_key is one of the columns in the table. partitioning it by date_key along will give me millions of part files which I don't want. date_key is of the format: YYYYMMDDHHMMTZ (TZ is the time zone and is 4 digits). So dividing date_key by 10000000000 will make partitions by month.
I am trying to migrate data from Vertica to Hadoop. In vertica, the syntax is:
PARTITION BY (date_key // 10000000000).
The equivalent of // in Hadoop is "DIV". That means if date_key is not perfectly divisible bt the number 10000000000, it returns just the integer part and discards the decimal part.
But that is throwing an error for me.
@Sree Kupp I don't think that it will work. Are you able to create hive table with PARTITIONED BY (date_key DIV 10000000000 int) ?. You are defining a logic when creating a table which is not possible with hive. You can mention only the field name and its data type but not the logic in hive when creating partitions. Create the table as I have mentioned in the previous comment and when inserting do the calculation which you want to do in the select clause and then insert it. Let me know if you have any follow up question.