Created 06-22-2016 01:39 PM
Hello,
Is it possible to import data from a CSV file into a Hive table stored in ORC format?
Thanks
Created 06-22-2016 01:42 PM
You can load the data from the CSV file into a temporary Hive table with the same structure as the ORC table, then insert the data into the ORC table:
insert into table table_orc select * from table_textfile;
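For illustration, the whole staging flow might look like the sketch below. Only the table names table_textfile and table_orc come from this reply; the column list, the comma delimiter, and the HDFS path /tmp/data.csv are assumptions made up for the example.

```sql
-- Staging table in plain text format (columns are hypothetical).
CREATE TABLE table_textfile (
  id INT,
  name STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Move the CSV file, assumed to already be in HDFS at this
-- hypothetical path, into the staging table.
LOAD DATA INPATH '/tmp/data.csv' INTO TABLE table_textfile;

-- Target table with the same columns, stored as ORC.
CREATE TABLE table_orc (
  id INT,
  name STRING,
  amount DOUBLE
)
STORED AS ORC;

-- Read the text rows and rewrite them as ORC.
INSERT INTO TABLE table_orc SELECT * FROM table_textfile;
```

Note that a plain delimited table splits on every comma, so this sketch assumes the CSV has no quoted fields containing commas.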
Thanks and Regards,
Sindhu
Created 06-22-2016 01:54 PM
Thank you, but I would like to go directly from the CSV file to the Hive ORC table without creating the intermediate textfile table.
Thanks
Created 06-22-2016 02:53 PM
Hi @alain
One more way:
3-Step Method
Step 1: Create an external table pointing to an HDFS location, with a schema that matches your CSV file. You can then drop the CSV file(s) into the external table location.
Step 2: Create a managed Hive table in ORC format.
Step 3: Insert into the managed table by selecting from the external table. (Once the records are copied, delete the files from the external directory.)
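As a rough sketch, the three steps could be written as follows. The table names csv_staging and events_orc, the columns, and the directory /data/staging/csv are all hypothetical and only there to make the example concrete.

```sql
-- Step 1: external table over a directory of CSV files.
-- Hive reads whatever files are placed under LOCATION.
CREATE EXTERNAL TABLE csv_staging (
  event_id INT,
  event_name STRING,
  event_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging/csv'
-- Assumes each file starts with one header row; remove this
-- property if your files have no header.
TBLPROPERTIES ('skip.header.line.count'='1');

-- Step 2: managed table with the same columns, stored as ORC.
CREATE TABLE events_orc (
  event_id INT,
  event_name STRING,
  event_ts STRING
)
STORED AS ORC;

-- Step 3: copy the rows; Hive writes them out in ORC format.
INSERT INTO TABLE events_orc SELECT * FROM csv_staging;

-- Afterwards, remove the processed files from /data/staging/csv
-- so that the next batch does not load them again.
```

Because the staging table is external, dropping it later leaves the CSV files in place; only the files you delete from the directory actually go away.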
This process can be automated with scripts scheduled via Oozie or cron. I have used this to do mass batch ingestion.
A more recent way of doing this is to use Apache NiFi with a Hive table processor, which makes life much simpler :). If you want to read about NiFi, please go to
http://hortonworks.com/products/hdf/
Thanks
Satish
Created 06-23-2016 08:47 AM
@alain TSAFACK Ambari Hive View provides this feature (Upload Table), which lets you upload a CSV file directly into an ORC Hive table. (Internally, it takes care of the two-step process to create the ORC table.)
Created 06-23-2016 10:43 AM
When uploading a CSV file containing "\N", I simply get the string value "N" instead of NULL in Hive.
Can someone help me solve this?
https://github.com/ogrodnek/csv-serde/issues/15
Created 06-23-2016 10:49 AM
I hard-coded a change in the class org.apache.hadoop.hive.serde2.OpenCSVSerde, but it doesn't take effect when I replace the old jar "/usr/hdp/current/hive-client/lib/hive-serde-1.2.1.2.3.0.0-2557.jar". What should I do to make the new jar work?
@Override
public Object deserialize(final Writable blob) throws SerDeException {
  Text rowText = (Text) blob;
  String text = rowText.toString().replace("\\N", "\"\"");
  CSVReader csv = null;
  try {
    csv = newReader(new CharArrayReader(text.toCharArray()), separatorChar, quoteChar, escapeChar);