Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

CSV Query to run from hive.default.fileformat is ORC

Solved Go to solution
Highlighted

CSV Query to run from hive.default.fileformat is ORC

New Contributor

Hi,

i would like to run the query LOAD DATA INPATH '/user/maria_dev/drivers.csv' OVERWRITE INTO TABLE temp_drivers;

But my hive default fileformat is ORC, Data is in .csv format.help.png

1 ACCEPTED SOLUTION

Accepted Solutions

Re: CSV Query to run from hive.default.fileformat is ORC

Contributor

Satya, when you created the table temp_drivers, what storage format did you use?

In the tutorial that you linked, the create table statement was:

create table temp_drivers (col_value STRING);

If you you changed it to

create table temp_drivers (col_value STRING) stored as ORC;

Then you might consider deleting the table and creating it new.

If you did not define the 'stored as ORC' storage part in the statement then you might want to check your parameters in Hive: The default format is plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.

10 REPLIES 10

Re: CSV Query to run from hive.default.fileformat is ORC

you can change your default file format by setting set hive.default.fileformat=TextFile;

Re: CSV Query to run from hive.default.fileformat is ORC

New Contributor

Thanks, could you please share your email address. my issue is

http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/

I am practicing from above link, but my hive.default.fileformat is ORC

1.png2.png

Re: CSV Query to run from hive.default.fileformat is ORC

New Contributor

how to change, initially i configured default as ORC .

Re: CSV Query to run from hive.default.fileformat is ORC

at session level just run this command set hive.default.fileformat=TextFile; and then run your query

Re: CSV Query to run from hive.default.fileformat is ORC

Contributor

Hi.

You can either use HDF/Nifi and directly read the CSV file, transform it to the needed format and upload it to HDFS or do the following using standard HDP tools:

1. Move your CSV file to HDFS

-copyFromLocal drivers.csv /user/maria_dev/drivers.csv

2.Create an external table to manage the csv content in HIVE

CREATE EXTERNAL TABLE IF NOT EXISTS drivers(
        ...
	)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    location '/user/maria_dev/drivers.csv;

3. Create an ORC table to be able to insert your Hive managed table:

CREATE TABLE IF NOT EXISTS mydrivers(
        ...	)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS ORC;

4. Now, use an SQL statement to move the data from the external table that you created in Step 2 to the Hive-managed ORC table that you created in Step 3:

INSERT OVERWRITE TABLE mydrivers SELECT * FROM drivers;

5.Verify that you imported the data into the ORC-formatted table correctly:

hive> select * from mydrivers limit 3;

Hope this helps.

Also there is a detailed documentation about this topic at: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/moving_data_from_hdf...

HTH.

Re: CSV Query to run from hive.default.fileformat is ORC

New Contributor

Thanks, but when i run the query the below error i am getting.

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 30019]: The file that you are trying to load does not match the file format of the destination table. Destination table is stored as ORC but the file being loaded is not a valid ORC file.

Re: CSV Query to run from hive.default.fileformat is ORC

Contributor

Satya, when you created the table temp_drivers, what storage format did you use?

In the tutorial that you linked, the create table statement was:

create table temp_drivers (col_value STRING);

If you you changed it to

create table temp_drivers (col_value STRING) stored as ORC;

Then you might consider deleting the table and creating it new.

If you did not define the 'stored as ORC' storage part in the statement then you might want to check your parameters in Hive: The default format is plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.

Re: CSV Query to run from hive.default.fileformat is ORC

New Contributor

i am trying to load a orc table with a non orc file,but during installation time of ambari i changed hive.default.fileformat to ORC (default is TextFile)

2.png1.png

Re: CSV Query to run from hive.default.fileformat is ORC

Contributor

Great. Then please drop table and create it again using the STORED AS TEXTFILE parameter or use my above described procedure to import the data using a temp table inbetween if you really need it as ORC stored data.

Don't have an account?
Coming from Hortonworks? Activate your account here