Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

CSV Query to run from hive.default.fileformat is ORC

avatar
Explorer

Hi,

i would like to run the query LOAD DATA INPATH '/user/maria_dev/drivers.csv' OVERWRITE INTO TABLE temp_drivers;

But my hive default fileformat is ORC, Data is in .csv format.help.png

1 ACCEPTED SOLUTION

avatar
Rising Star

Satya, when you created the table temp_drivers, what storage format did you use?

In the tutorial that you linked, the create table statement was:

create table temp_drivers (col_value STRING);

If you you changed it to

create table temp_drivers (col_value STRING) stored as ORC;

Then you might consider deleting the table and creating it new.

If you did not define the 'stored as ORC' storage part in the statement then you might want to check your parameters in Hive: The default format is plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.

View solution in original post

10 REPLIES 10

avatar
Super Guru

you can change your default file format by setting set hive.default.fileformat=TextFile;

avatar
Explorer

Thanks, could you please share your email address. my issue is

http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/

I am practicing from above link, but my hive.default.fileformat is ORC

1.png2.png

avatar
Explorer

how to change, initially i configured default as ORC .

avatar
Super Guru

at session level just run this command set hive.default.fileformat=TextFile; and then run your query

avatar
Rising Star

Hi.

You can either use HDF/Nifi and directly read the CSV file, transform it to the needed format and upload it to HDFS or do the following using standard HDP tools:

1. Move your CSV file to HDFS

-copyFromLocal drivers.csv /user/maria_dev/drivers.csv

2.Create an external table to manage the csv content in HIVE

CREATE EXTERNAL TABLE IF NOT EXISTS drivers(
        ...
	)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    location '/user/maria_dev/drivers.csv;

3. Create an ORC table to be able to insert your Hive managed table:

CREATE TABLE IF NOT EXISTS mydrivers(
        ...	)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS ORC;

4. Now, use an SQL statement to move the data from the external table that you created in Step 2 to the Hive-managed ORC table that you created in Step 3:

INSERT OVERWRITE TABLE mydrivers SELECT * FROM drivers;

5.Verify that you imported the data into the ORC-formatted table correctly:

hive> select * from mydrivers limit 3;

Hope this helps.

Also there is a detailed documentation about this topic at: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/moving_data_from_hdf...

HTH.

avatar
Explorer

Thanks, but when i run the query the below error i am getting.

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 30019]: The file that you are trying to load does not match the file format of the destination table. Destination table is stored as ORC but the file being loaded is not a valid ORC file.

avatar
Rising Star

Satya, when you created the table temp_drivers, what storage format did you use?

In the tutorial that you linked, the create table statement was:

create table temp_drivers (col_value STRING);

If you you changed it to

create table temp_drivers (col_value STRING) stored as ORC;

Then you might consider deleting the table and creating it new.

If you did not define the 'stored as ORC' storage part in the statement then you might want to check your parameters in Hive: The default format is plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.

avatar
Explorer

i am trying to load a orc table with a non orc file,but during installation time of ambari i changed hive.default.fileformat to ORC (default is TextFile)

2.png1.png

avatar
Rising Star

Great. Then please drop table and create it again using the STORED AS TEXTFILE parameter or use my above described procedure to import the data using a temp table inbetween if you really need it as ORC stored data.