Created on 11-28-2016 09:33 AM - edited 09-16-2022 03:49 AM
Hi,
I would like to run the query LOAD DATA INPATH '/user/maria_dev/drivers.csv' OVERWRITE INTO TABLE temp_drivers;
But my Hive default file format is ORC, and the data is in .csv format. Please help.
Created 11-28-2016 09:44 AM
You can change your default file format at the session level by running set hive.default.fileformat=TextFile;
Created 11-28-2016 08:06 PM
Thanks. Could you please share your email address? My issue is this:
http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/
I am practicing from the above link, but my hive.default.fileformat is ORC.
Created 11-28-2016 09:46 AM
How do I change it? I initially configured the default as ORC.
Created 11-28-2016 09:47 AM
At the session level, just run the command set hive.default.fileformat=TextFile; and then run your query.
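As a sketch, the session-level override lasts only for the current session and affects tables created without an explicit STORED AS clause (table name follows the tutorial):

```sql
-- Override the ORC default for this session only
SET hive.default.fileformat=TextFile;

-- Tables created now default to plain text, so loading a CSV works
CREATE TABLE temp_drivers (col_value STRING);
LOAD DATA INPATH '/user/maria_dev/drivers.csv' OVERWRITE INTO TABLE temp_drivers;
```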
Created 11-28-2016 09:47 AM
Hi.
You can either use HDF/NiFi to read the CSV file directly, transform it to the needed format, and upload it to HDFS, or do the following using standard HDP tools:
1. Move your CSV file to HDFS
hdfs dfs -copyFromLocal drivers.csv /user/maria_dev/drivers.csv
2. Create an external table to manage the CSV content in Hive (note that LOCATION must point to the directory containing the file, not the file itself):
CREATE EXTERNAL TABLE IF NOT EXISTS drivers( ... ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/maria_dev';
3. Create an ORC table as the Hive-managed destination (ROW FORMAT DELIMITED is not needed for ORC):
CREATE TABLE IF NOT EXISTS mydrivers( ... ) STORED AS ORC;
4. Now, use an SQL statement to move the data from the external table that you created in Step 2 to the Hive-managed ORC table that you created in Step 3:
INSERT OVERWRITE TABLE mydrivers SELECT * FROM drivers;
5. Verify that you imported the data into the ORC-formatted table correctly:
hive> select * from mydrivers limit 3;
Hope this helps.
Also there is a detailed documentation about this topic at: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/moving_data_from_hdf...
HTH.
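Putting the steps above together, here is a minimal end-to-end sketch. The column list is hypothetical (the original statements elide it with "..."); substitute the real drivers.csv schema:

```sql
-- Step 1 is done from the shell:
--   hdfs dfs -copyFromLocal drivers.csv /user/maria_dev/drivers.csv

-- Step 2: external text table over the upload directory (hypothetical columns)
CREATE EXTERNAL TABLE IF NOT EXISTS drivers (
  driverid INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/maria_dev';

-- Step 3: ORC-backed managed table with the same schema
CREATE TABLE IF NOT EXISTS mydrivers (
  driverid INT,
  name STRING
)
STORED AS ORC;

-- Step 4: copy the text data into ORC format
INSERT OVERWRITE TABLE mydrivers SELECT * FROM drivers;

-- Step 5: verify
SELECT * FROM mydrivers LIMIT 3;
```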
Created 11-28-2016 07:13 PM
Thanks, but when I run the query I get the error below:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 30019]: The file that you are trying to load does not match the file format of the destination table. Destination table is stored as ORC but the file being loaded is not a valid ORC file.
Created 11-29-2016 08:09 AM
Satya, when you created the table temp_drivers, what storage format did you use?
In the tutorial that you linked, the create table statement was:
create table temp_drivers (col_value STRING);
If you changed it to
create table temp_drivers (col_value STRING) stored as ORC;
then you should drop the table and recreate it.
If you did not add the 'stored as ORC' clause to the statement, then check your Hive configuration: TEXTFILE is the default file format unless the configuration parameter hive.default.fileformat has been set to something else.
Created 11-29-2016 08:20 AM
I am trying to load an ORC table with a non-ORC file. During Ambari installation I changed hive.default.fileformat to ORC (the default is TextFile).
Created 11-29-2016 08:37 AM
Great. Then please drop the table and create it again using the STORED AS TEXTFILE parameter, or use the procedure I described above to import the data via a temp table in between if you really need it stored as ORC.
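A minimal sketch of the drop-and-recreate route, using the tutorial's table name:

```sql
DROP TABLE IF EXISTS temp_drivers;

-- An explicit STORED AS TEXTFILE overrides the ORC default set in Ambari
CREATE TABLE temp_drivers (col_value STRING) STORED AS TEXTFILE;

-- The CSV now loads without the ORC format-mismatch SemanticException
LOAD DATA INPATH '/user/maria_dev/drivers.csv' OVERWRITE INTO TABLE temp_drivers;
```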