Member since
11-21-2018
2
Posts
0
Kudos Received
0
Solutions
11-22-2018
02:39 AM
Hi Ariel, thanks for your reply. We are actually using a vendor application that using Hive, the vendor application first delete the folder then do the insert overwrite on the external table and it fails, and it is hard to change the process within the vendor application. For our cluster setting, hive should able to impersonate all users: <property> <name>hadoop.proxyuser.hive.groups</name> <value>*</value> </property> Vendor claimed that the insert overwrite should able to create the folder if the folder does not exist: "the behavior you’ve shown with
beeline is not the expected behavior for insert/overwrite. Insert/overwrite
should create the directory and write to it, or if it already exists, should
clobber then remake it." We are not too sure about if this is true, that's why we are asking about the default behavior of insert overwrite. What are we testing is a very simple insert overwrite: INSERT OVERWRITE TABLE
`dev_tc30_dataiku`.`config_another_test_output` SELECT * FROM
`dev_tc30_dataiku`.`config_another_test_dev_demo` Thanks, John
... View more
11-21-2018
05:38 PM
Our Hortonworks version is 2.6.3.0-235, our Hive version is 1.2.1000 We have the following issue at the moment: Hive insert overwrite will fail on external table if the external table's folder does not exist. Here is the details: We have an external table "config_another_test_output". Here is the table information: +---------------------------------------------------------------------------------------------------------------------+--+ | createtab_stmt | +---------------------------------------------------------------------------------------------------------------------+--+ | CREATE EXTERNAL TABLE `config_another_test_output`( | | `stars` string, | | `desc` string, | | `pros` string, | | `cons` string, | | `advice` string, | | `summary` string) | | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' | | LOCATION | | 'hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output' | | TBLPROPERTIES ( | | 'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', | | 'DSS_DEFAUT_HIVE_TABLE_TAG'='eyJtZ3QiOnRydWUsImZtdCI6IjhpeTNOYSIsInBhcnQiOiIxQjJNMlkiLCJsb2MiOiJNNzJVeHAifQ==', | | 'numFiles'='1', | | 'numRows'='621', | | 'orc.compress'='SNAPPY', | | 'rawDataSize'='1548153', | | 'totalSize'='233502', | | 'transient_lastDdlTime'='1542815783') | +---------------------------------------------------------------------------------------------------------------------+--+ In the location section, we have the HDFS path for this table. Our current problem is: if the above output folder ('hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output' ) does not exist, insert overwrite will fail: INFO: Loading data to table
dev_tc30_dataiku.config_another_test_output from
hdfs://titan/tmp/.hive-staging_hive_2018-11-21_10-45-41_452_43360044430205414-24417/-ext-10000 ERROR :
Failed with exception Destination directory
hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output has not be
cleaned up. org.apache.hadoop.hive.ql.metadata.HiveException:
Destination directory
hdfs://titan/dev/10112/app/TC30/dataiku/CONFIG_ANOTHER_TEST/output has not be
cleaned up. at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3380) at
org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1895) at
org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:373) at
org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:162) at
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at
org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1756) at
org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1497) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1294) at
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161) at
org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156) at
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:197) at
org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:76) at
org.apache.hive.service.cli.operation.SQLOperation$2$1.run(SQLOperation.java:255) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:422) at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hive.service.cli.operation.SQLOperation$2.run(SQLOperation.java:266) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at
java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at
java.util.concurrent.FutureTask.run(FutureTask.java:266) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Error: Error
while processing statement: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask (state=08S01,code=1) Our question is: what is the default behavior of insert overwrite if the folder does not exist? should insert overwrite create the folder if the folder does not exist? is this an issue for hive 1.2? Thanks for all the helps. John
... View more
Labels:
- Labels:
-
Apache Hadoop
-
Apache Hive