
Most efficient/optimized way to load data into Hive external and ORC tables: difference between loading an ORC table and appending CSV files to an external table

I want to load each day's CSV file into a Hive external table on a daily basis.

There are two ways to load a CSV file into Hive:

1) Copy the CSV file (which contains the date) to an HDFS path and point a Hive external table at that path.

2) Load the CSV file into a Hive ORC table (using the INSERT command).

Which is the most efficient/optimized way to do this load?


1 REPLY

@kotesh banoth

In terms of load time, copying the CSV file into the external table's location is the simplest option and takes much less time than loading into an ORC table. You can't load a CSV file into an ORC table directly: you need to create a staging table that uses a CSV SerDe and then insert from the staging table into the ORC table, which is a two-step process. Copying a file into place is a single step and is less time consuming.
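As a rough sketch of the two approaches, the HiveQL below defines an external table over a CSV directory and then an ORC table filled from it. The table names, columns, and HDFS path are hypothetical placeholders, and it assumes the files are plain comma-separated text readable by Hive's OpenCSVSerde, which exposes every column as STRING.

```sql
-- Hypothetical names, columns, and path; adjust to the real schema.

-- Approach 1: external table over the HDFS directory where the daily
-- CSV files are copied. New files placed in this directory become
-- visible to queries without any further load step.
CREATE EXTERNAL TABLE IF NOT EXISTS sales_csv (
  id        STRING,
  amount    STRING,
  sale_date STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/data/sales_csv';

-- Approach 2: ORC table populated from the CSV-backed staging table.
CREATE TABLE IF NOT EXISTS sales_orc (
  id        BIGINT,
  amount    DOUBLE,
  sale_date STRING
)
STORED AS ORC;

-- Second step of the two-step load: Hive reads the text rows and
-- rewrites them as ORC. The casts are needed because OpenCSVSerde
-- returns strings.
INSERT INTO TABLE sales_orc
SELECT CAST(id AS BIGINT), CAST(amount AS DOUBLE), sale_date
FROM sales_csv;
```

If the staging directory keeps accumulating files, the INSERT as written would copy earlier days again on each run, so in practice you would probably filter on the date column or stage each day's file separately.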

All of this concerns only the time taken to load data into a table. If you compare reading data from a table, ORC will perform better than a plain TEXT format table. Hope it helps.
