I have a staging area, and the data loaded into it is in Avro format. Can I create ORC files directly from the Avro files without first creating an Avro table in Hive? Since the data is already in Avro's binary format, is it possible to create the ORC table directly, rather than first writing an Avro DDL in Hive and then inserting the data from the Avro table into the ORC table?
1. ORC files can be created from Avro files, but not in a single step. It takes two steps:
a. Convert the Avro file to JSON using the avro-tools jar on the command line (the tojson command).
b. Convert the JSON file to ORC using the orc-tools jar (the convert command, introduced in ORC 1.4). [See: https://orc.apache.org/news/2017/05/08/ORC-1.4.0/]
2. Through Hive tables: yes, this can be done by creating a new table with the ORC storage format and inserting into it the data from the table that holds the Avro data. [In the example below, test2 stores the data in ORC format and test1 in Avro.]
CREATE TABLE test2 (col1 string, col2 string) STORED AS ORC;
INSERT INTO test2 SELECT * FROM test1;
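Both routes can be scripted. The following is only a rough sketch, not a tested recipe: the jar file names, the helper function names, and any paths passed in are assumptions to adapt. The first function chains the two command-line steps of option 1. The second prints HiveQL for option 2 on the assumption that an .avsc schema file is available, in which case the Avro table can be declared as an external table over the staging directory without typing out its columns.

```shell
#!/bin/sh
# Assumed jar file names; point these at the versions you actually have.
AVRO_TOOLS_JAR="${AVRO_TOOLS_JAR:-avro-tools.jar}"
ORC_TOOLS_JAR="${ORC_TOOLS_JAR:-orc-tools-uber.jar}"

# Option 1: avro_to_orc INPUT.avro OUTPUT.orc
# Dumps the Avro records as JSON, then converts that JSON to ORC.
avro_to_orc() {
    in_avro="$1"
    out_orc="$2"
    tmp_json="${out_orc%.orc}.json"   # intermediate file next to the output
    # Step a: Avro -> JSON (one record per line)
    java -jar "$AVRO_TOOLS_JAR" tojson "$in_avro" > "$tmp_json" || return 1
    # Step b: JSON -> ORC; the schema is inferred unless --schema is passed
    java -jar "$ORC_TOOLS_JAR" convert --output "$out_orc" "$tmp_json"
}

# Option 2: avro_to_orc_hql AVRO_TABLE ORC_TABLE AVRO_DIR SCHEMA_URL
# Prints HiveQL that maps an external table onto the staged Avro files
# (columns come from the .avsc schema) and copies the rows into ORC.
avro_to_orc_hql() {
    cat <<EOF
CREATE EXTERNAL TABLE $1
STORED AS AVRO
LOCATION '$3'
TBLPROPERTIES ('avro.schema.url'='$4');

CREATE TABLE $2 STORED AS ORC AS SELECT * FROM $1;
EOF
}
```

For example, `avro_to_orc data.avro data.orc` runs option 1, and `hive -e "$(avro_to_orc_hql test1 test2 /staging/avro hdfs:///schemas/test1.avsc)"` runs option 2 (both paths here are made up). With option 1, nullable (union) fields come out of Avro's JSON encoding as wrapped objects, so it is worth checking the schema that orc-tools infers before relying on the result.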