Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

basic hive clarification

avatar
Expert Contributor

1)I want to try the examples in the below link related to JSON.when i check in internet i found so many JSON SerDe's. Not sure which one i need to use and also what is the difference between each one? http://thornydev.blogspot.in/2013/07/querying-json-records-via-hive.html

2)I am new to hive and how to load data into hive table.When i check in Internet i found using Load and also using Insert statement. Not sure which one i need to use and also what is the difference between each one?

1 ACCEPTED SOLUTION

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
5 REPLIES 5

avatar
Master Guru

LOAD by itself doesn't do any data transformations, it essentially takes files and puts them in the hive table directory. ( So you need to be sure to have created your hive table with the correct storage options for the data. )

If you need to transform data you need to create an external table and then use the insert into statement to transform your data.

avatar

Hi @vamsi valiveti

1) You can start with the JSON SerDe that's cited in the Hive documentation: https://cwiki.apache.org/confluence/display/Hive/Json+SerDe

CREATE TABLE my_table(a string, b bigint, ...)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

2) Load and insert are used in different scenarios:

  • Load is used to copy (or move) data from a file into you Hive table. This is used when you already have data in a file in HDFS or your local file system and you want to create a table to process it with SQL. Here you start by creating a Hive table and you do a load to get data from files into your Hive table.
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

If your data is in your local file system use 'LOCAL'. In this case data will be copied from you local file system into the HDFS directory of your Hive table (by default in the Hive warehouse folder). If your data is already in HDFS, do not use LOCAL but note that this moves (and does not copy) your HDFS files from their current path to the HDFS directory of your Hive table.

You have also other options such as OVERWRITE to decide if you want to delete existing data in you Hive table or if you want to append.

  • Insert is used if you want to copy data from a Hive table to another table. Here let say you have table 1 with data in it. You want to create table 2 to get a specific part of you table 1 data. You can combine an insert and select statement to get data that you want from Table 1 and store it in table 2
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Here again you have choice between appending new data or replacing everything (overwrite)

avatar
Expert Contributor

Thanks for input

1)I do not see any download option in https://cwiki.apache.org/confluence/display/Hive/Json+SerDe and where to download the jar file

2)when i check below link i can see below sql statement.My question is when we will use below insert statement and normal load statement?

insert into table testTableNew values (1,'row1'),(2,'row2'),(3,'row3');

http://sanjivblogs.blogspot.com/2014/12/transactions-are-available-in-hive-014.html?view=classic

avatar
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar

@vamsi valiveti have you tried this solution ?