- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
CSV Query to run from hive.default.fileformat is ORC
- Labels:
-
Apache Hive
Created on ‎11-28-2016 09:33 AM - edited ‎09-16-2022 03:49 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
i would like to run the query LOAD DATA INPATH '/user/maria_dev/drivers.csv' OVERWRITE INTO TABLE temp_drivers;
But my hive default fileformat is ORC, Data is in .csv format.help.png
Created ‎11-29-2016 08:09 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Satya, when you created the table temp_drivers, what storage format did you use?
In the tutorial that you linked, the create table statement was:
create table temp_drivers (col_value STRING);
If you you changed it to
create table temp_drivers (col_value STRING) stored as ORC;
Then you might consider deleting the table and creating it new.
If you did not define the 'stored as ORC' storage part in the statement then you might want to check your parameters in Hive: The default format is plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.
Created ‎11-28-2016 09:44 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
you can change your default file format by setting set hive.default.fileformat=TextFile;
Created ‎11-28-2016 08:06 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, could you please share your email address. my issue is
http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/
I am practicing from above link, but my hive.default.fileformat is ORC
Created ‎11-28-2016 09:46 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
how to change, initially i configured default as ORC .
Created ‎11-28-2016 09:47 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
at session level just run this command set hive.default.fileformat=TextFile; and then run your query
Created ‎11-28-2016 09:47 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi.
You can either use HDF/Nifi and directly read the CSV file, transform it to the needed format and upload it to HDFS or do the following using standard HDP tools:
1. Move your CSV file to HDFS
-copyFromLocal drivers.csv /user/maria_dev/drivers.csv
2.Create an external table to manage the csv content in HIVE
CREATE EXTERNAL TABLE IF NOT EXISTS drivers( ... ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '/user/maria_dev/drivers.csv;
3. Create an ORC table to be able to insert your Hive managed table:
CREATE TABLE IF NOT EXISTS mydrivers( ... ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS ORC;
4. Now, use an SQL statement to move the data from the external table that you created in Step 2 to the Hive-managed ORC table that you created in Step 3:
INSERT OVERWRITE TABLE mydrivers SELECT * FROM drivers;
5.Verify that you imported the data into the ORC-formatted table correctly:
hive> select * from mydrivers limit 3;
Hope this helps.
Also there is a detailed documentation about this topic at: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/moving_data_from_hdf...
HTH.
Created ‎11-28-2016 07:13 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, but when i run the query the below error i am getting.
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 30019]: The file that you are trying to load does not match the file format of the destination table. Destination table is stored as ORC but the file being loaded is not a valid ORC file.
Created ‎11-29-2016 08:09 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Satya, when you created the table temp_drivers, what storage format did you use?
In the tutorial that you linked, the create table statement was:
create table temp_drivers (col_value STRING);
If you you changed it to
create table temp_drivers (col_value STRING) stored as ORC;
Then you might consider deleting the table and creating it new.
If you did not define the 'stored as ORC' storage part in the statement then you might want to check your parameters in Hive: The default format is plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.
Created ‎11-29-2016 08:20 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
i am trying to load a orc table with a non orc file,but during installation time of ambari i changed hive.default.fileformat to ORC (default is TextFile)
2.png1.png
Created ‎11-29-2016 08:37 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Great. Then please drop table and create it again using the STORED AS TEXTFILE parameter or use my above described procedure to import the data using a temp table inbetween if you really need it as ORC stored data.
