Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

what is the default behavior of insert overwrite on external hdfs table?

what is the default behavior of insert overwrite on external hdfs table?

New Contributor

Our Hortonworks version is 2.6.3.0-235, our Hive version is 1.2.1000

We have the following issue at the moment:

Hive insert overwrite will fail on external table if the external table's folder does not exist.

Here is the details:

We have an external table "config_another_test_output". Here is the table information:

+---------------------------------------------------------------------------------------------------------------------+--+
| createtab_stmt |
+---------------------------------------------------------------------------------------------------------------------+--+
| CREATE EXTERNAL TABLE `config_another_test_output`( |
| `stars` string, |
| `desc` string, |
| `pros` string, |
| `cons` string, |
| `advice` string, |
| `summary` string) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION |
| 'hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output' |
| TBLPROPERTIES ( |
| 'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', |
| 'DSS_DEFAUT_HIVE_TABLE_TAG'='eyJtZ3QiOnRydWUsImZtdCI6IjhpeTNOYSIsInBhcnQiOiIxQjJNMlkiLCJsb2MiOiJNNzJVeHAifQ==', |
| 'numFiles'='1', |
| 'numRows'='621', |
| 'orc.compress'='SNAPPY', |
| 'rawDataSize'='1548153', |
| 'totalSize'='233502', |
| 'transient_lastDdlTime'='1542815783') |
+---------------------------------------------------------------------------------------------------------------------+--+

In the location section, we have the HDFS path for this table.

Our current problem is:

if the above output folder ('hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output' ) does not exist, insert overwrite

will fail:

INFO: Loading data to table dev_tc30_dataiku.config_another_test_output from hdfs://titan/tmp/.hive-staging_hive_2018-11-21_10-45-41_452_43360044430205414-24417/-ext-10000

ERROR : Failed with exception Destination directory hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output has not be cleaned up.

org.apache.hadoop.hive.ql.metadata.HiveException: Destination directory hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output has not be cleaned up.

at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3380)

at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1895)

at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:373)

at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:162)

at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)

at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1756)

at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1497)

at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1294)

at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)

at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)

at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197)

at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76)

at org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:255)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:266)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask (state=08S01,code=1)

Our question is: what is the default behavior of insert overwrite if the folder does not exist? should insert overwrite create the folder if the folder does not exist? is this an issue for hive 1.2?

Thanks for all the helps.

John

2 REPLIES 2

Re: what is the default behavior of insert overwrite on external hdfs table?

Cloudera Employee

Hi @dahai long;

Have you tried using:

hdfs dfs -chown -R <your_user>:<your_group>  /<related path>

i.e.

[hdfs@c3253-node3 ~]$ hdfs dfs -chown -R hive:hive /data1

I tried without the -R and could not create the directory automatically, but once changed the owner and also added the -R, did not have any issue, could you please try that and let me know the results.

Regards,

Ariel Q.

Re: what is the default behavior of insert overwrite on external hdfs table?

New Contributor

Hi Ariel, thanks for your reply.

We are actually using a vendor application that using Hive, the vendor application first delete the folder then do the insert overwrite on the external table and it fails, and it is hard to change the process within the vendor application.

For our cluster setting, hive should able to impersonate all users:

<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>

Vendor claimed that the insert overwrite should able to create the folder if the folder does not exist:

"the behavior you’ve shown with beeline is not the expected behavior for insert/overwrite. Insert/overwrite should create the directory and write to it, or if it already exists, should clobber then remake it."

We are not too sure about if this is true, that's why we are asking about the default behavior of insert overwrite.

What are we testing is a very simple insert overwrite:

INSERT OVERWRITE TABLE `dev_tc30_dataiku`.`config_another_test_output` SELECT *

FROM `dev_tc30_dataiku`.`config_another_test_dev_demo`

Thanks,

John

Don't have an account?
Coming from Hortonworks? Activate your account here