Support Questions

sebastien_frack · ‎11-07-2017

Hi,

In my organization, Hive is used with hive.support.concurrency set to false.

I am wondering what the consequences are of inserting data during a select (and vice versa).

For an insert, I think the table's metadata is updated at the very end of the Map/Reduce job.

Thus, a select should not be disturbed, because I think the files involved in the select are determined at the very beginning of the M/R job...

For an insert overwrite, I think the behaviour is pretty similar, but I couldn't find confirmation during my research...

Could you confirm (or refute ;)) my reasoning?

Thanks 🙂

ekoifman · ‎11-07-2017

If your competing read and insert target a single partition, this should be safe, since Hive uses the file system 'rename' operation at the end of the insert to make the new files visible, and rename is atomic on HDFS. If your insert is a dynamic partition insert, then you are writing multiple partitions, and the data for each partition is made visible by its own 'rename' operation. This means that a concurrent read could see a set of files that reflects only part of the insert.
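To make the multi-partition case concrete, here is a minimal sketch on a local file system. It is only an analogy: plain directories stand in for HDFS partition directories, `mv` stands in for the HDFS rename, and every path and file name (`.staging_demo`, `part=a`, `part=b`) is hypothetical rather than taken from Hive.

```shell
#!/usr/bin/env bash
# Local-filesystem analogy only: plain directories stand in for HDFS
# partition directories and `mv` stands in for the HDFS rename.
# All paths and file names below are hypothetical.
set -euo pipefail

table_dir="$(mktemp -d)"
staging_dir="$table_dir/.staging_demo"
mkdir -p "$table_dir/part=a" "$table_dir/part=b" \
         "$staging_dir/part=a" "$staging_dir/part=b"

# The writer produces its output inside the staging directory first.
echo "new rows for a" > "$staging_dir/part=a/000000_0"
echo "new rows for b" > "$staging_dir/part=b/000000_0"

# Number of data files a reader would find if it listed every partition now.
# The glob only matches partitions directly under the table directory,
# so files still sitting in the staging directory are not counted.
count_visible() {
  find "$table_dir"/part=* -type f | wc -l | tr -d ' '
}

before="$(count_visible)"

# Each partition is published by its own rename.
mv "$staging_dir/part=a/000000_0" "$table_dir/part=a/000000_0"
between="$(count_visible)"   # a reader listing here sees only part of the insert

mv "$staging_dir/part=b/000000_0" "$table_dir/part=b/000000_0"
after="$(count_visible)"

echo "before=$before between=$between after=$after"
rm -rf "$table_dir"
```

Each individual `mv` is all-or-nothing for its own file, but nothing ties the two renames together, which is why the listing taken between them reflects only half of the insert.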

Insert overwrite actually deletes the existing files, so it can conflict with a concurrent read.
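The conflict can be sketched with the same kind of local-filesystem analogy. The paths are hypothetical, and the exact order of steps inside Hive is an assumption made for illustration; the only point taken from the discussion is that the old file is deleted before the new one becomes visible.

```shell
#!/usr/bin/env bash
# Local-filesystem analogy only; paths are hypothetical and the exact order
# of steps inside Hive is an assumption made for illustration.
set -euo pipefail

table_dir="$(mktemp -d)"
staging_dir="$table_dir/.staging_demo"
mkdir -p "$staging_dir"
echo "old rows" > "$table_dir/000000_0"
echo "new rows" > "$staging_dir/000000_0"

# A reader plans its work up front by recording the file it intends to read.
planned_file="$table_dir/000000_0"

# The overwrite removes the existing file before publishing the new one.
rm "$table_dir/000000_0"

# A reader that opens its planned file inside this window finds nothing.
if [ -e "$planned_file" ]; then in_window="present"; else in_window="missing"; fi

# The new file is then published under the same name.
mv "$staging_dir/000000_0" "$table_dir/000000_0"

# A reader that opens the path after the rename gets the new content instead.
after_content="$(cat "$planned_file")"

echo "in_window=$in_window after_content=$after_content"
rm -rf "$table_dir"
```

A reader that had already finished reading the old file before the delete is unaffected; one that planned the read earlier but opens the file later either fails or silently picks up the new data.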

sebastien_frack · ‎11-09-2017

Thanks, that helps.

Before OVERWRITE:

$ hdfs dfs -ls /apps/hive/warehouse/xyz.db/table_tmp
Found 1 items
718 2017-11-09 10:18 /apps/hive/warehouse/xyz.db/table_tmp/000000_0

During OVERWRITE:

$ hdfs dfs -ls /apps/hive/warehouse/xyz.db/table_tmp
Found 2 items
0 2017-11-09 10:35 /apps/hive/warehouse/xyz.db/table_tmp/.hive-staging_hive_2017-11-09_10-35-38_682_2619781700846007196-1
718 2017-11-09 10:18 /apps/hive/warehouse/xyz.db/table_tmp/000000_0

After OVERWRITE:

$ hdfs dfs -ls /apps/hive/warehouse/xyz.db/table_tmp
Found 1 items
718 2017-11-09 10:35 /apps/hive/warehouse/xyz.db/table_tmp/000000_0

My understanding is that a query involving the file in this example that has been running since, say, 10:15 and is still executing at 10:35 is not guaranteed to execute correctly (although I presume the file, especially since it is small here, will already have been processed in an early stage of the M/R job).

Is that right?

I am wondering whether OVERWRITE is a good way to build an intermediate table in this case... Without the LOCK functionality enabled, would you suggest a better way?
