CDH4.4.0 - LOAD INPATH fails when destination file exists

Hi,

 

In CDH4.3.x and CDH3 I could load a file with the same name into the same partition multiple times. The following sequence worked fine:

1. LOAD DATA INPATH '/tmp/ht/sdp-fss-ccreporting.log' OVERWRITE INTO TABLE ccrlocal_sdp_fss_transaction_logs PARTITION (year='2013', month='10',day='11');

2. hadoop fs -put sdp-fss-ccreporting.log /tmp/ht

3. LOAD DATA INPATH '/tmp/ht/sdp-fss-ccreporting.log' OVERWRITE INTO TABLE ccrlocal_sdp_fss_transaction_logs PARTITION (year='2013', month='10',day='11');

 

In CDH4.4.0 this no longer works. The load fails with the following message:

hive -e "LOAD DATA INPATH '/tmp/ht/sdp-fss-ccreporting.log' OVERWRITE INTO TABLE ccrlocal_sdp_fss_transaction_logs PARTITION (year='2013', month='10',day='01')"
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/tm-ccr/hive_job_log_9c596186-0eac-4471-aff9-7ff6e8d3b5d3_533521785.txt
Loading data to table default.ccrlocal_sdp_fss_transaction_logs partition (year=2013, month=10, day=01)
Failed with exception Error moving: hdfs://localhost:54310/tmp/ht/sdp-fss-ccreporting.log into: hdfs://localhost:54310/user/hive/warehouse/ccrlocal_sdp_fss_transaction_logs/year=2013/month=10/day=01/sdp-fss-ccreporting.log
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

 

Here is the relevant part of hive.log:

 

2013-11-20 11:44:41,785 WARN  conf.Configuration (Configuration.java:loadProperty(2068)) - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2d8ca1e3:an attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
2013-11-20 11:44:41,795 WARN  conf.Configuration (Configuration.java:loadProperty(2068)) - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@2d8ca1e3:an attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
2013-11-20 11:44:42,675 WARN  conf.Configuration (Configuration.java:loadProperty(2068)) - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@61922138:an attempt to override final parameter: mapred.tasktracker.reduce.tasks.maximum;  Ignoring.
2013-11-20 11:44:42,680 WARN  conf.Configuration (Configuration.java:loadProperty(2068)) - org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@61922138:an attempt to override final parameter: mapred.tasktracker.map.tasks.maximum;  Ignoring.
2013-11-20 11:44:42,720 INFO  ql.Driver (PerfLogger.java:PerfLogBegin(88)) - <PERFLOG method=Driver.run>
2013-11-20 11:44:42,720 INFO  ql.Driver (PerfLogger.java:PerfLogBegin(88)) - <PERFLOG method=TimeToSubmit>
2013-11-20 11:44:42,720 INFO  ql.Driver (PerfLogger.java:PerfLogBegin(88)) - <PERFLOG method=compile>
2013-11-20 11:44:45,216 INFO  ql.Driver (Driver.java:compile(468)) - Semantic Analysis Completed
2013-11-20 11:44:45,234 INFO  ql.Driver (Driver.java:getSchema(265)) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2013-11-20 11:44:45,235 INFO  ql.Driver (PerfLogger.java:PerfLogEnd(115)) - </PERFLOG method=compile start=1384947882720 end=1384947885235 duration=2515>
2013-11-20 11:44:45,235 INFO  ql.Driver (PerfLogger.java:PerfLogBegin(88)) - <PERFLOG method=Driver.execute>
2013-11-20 11:44:45,235 INFO  ql.Driver (Driver.java:execute(1099)) - Starting command: LOAD DATA INPATH '/tmp/ht/sdp-fss-ccreporting.log' OVERWRITE INTO TABLE ccrlocal_sdp_fss_transaction_logs PARTITION (year='2013', month='10',day='01')
2013-11-20 11:44:45,265 INFO  ql.Driver (PerfLogger.java:PerfLogEnd(115)) - </PERFLOG method=TimeToSubmit start=1384947882720 end=1384947885265 duration=2545>
2013-11-20 11:44:45,269 INFO  exec.Task (SessionState.java:printInfo(418)) - Loading data to table default.ccrlocal_sdp_fss_transaction_logs partition (year=2013, month=10, day=01) from hdfs://localhost:54310/tmp/ht/sdp-fss-ccreporting.log
2013-11-20 11:44:45,561 DEBUG metadata.Hive (Hive.java:renameFile(2028)) - Replacing src:hdfs://localhost:54310/tmp/ht/sdp-fss-ccreporting.log;dest: hdfs://localhost:54310/user/hive/warehouse/ccrlocal_sdp_fss_transaction_logs/year=2013/month=10/day=01/sdp-fss-ccreporting.log;Status:false
2013-11-20 11:44:45,563 ERROR exec.Task (SessionState.java:printError(427)) - Failed with exception Error moving: hdfs://localhost:54310/tmp/ht/sdp-fss-ccreporting.log into: hdfs://localhost:54310/user/hive/warehouse/ccrlocal_sdp_fss_transaction_logs/year=2013/month=10/day=01/sdp-fss-ccreporting.log
org.apache.hadoop.hive.ql.metadata.HiveException: Error moving: hdfs://localhost:54310/tmp/ht/sdp-fss-ccreporting.log into: hdfs://localhost:54310/user/hive/warehouse/ccrlocal_sdp_fss_transaction_logs/year=2013/month=10/day=01/sdp-fss-ccreporting.log
	at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2182)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1189)
	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:304)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:66)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1383)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1169)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:982)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: Error moving: hdfs://localhost:54310/tmp/ht/sdp-fss-ccreporting.log into: hdfs://localhost:54310/user/hive/warehouse/ccrlocal_sdp_fss_transaction_logs/year=2013/month=10/day=01/sdp-fss-ccreporting.log
	at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2176)
	... 19 more

2013-11-20 11:44:45,576 ERROR ql.Driver (SessionState.java:printError(427)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
2013-11-20 11:44:45,576 INFO  ql.Driver (PerfLogger.java:PerfLogEnd(115)) - </PERFLOG method=Driver.execute start=1384947885235 end=1384947885576 duration=341>
2013-11-20 11:44:45,577 INFO  ql.Driver (PerfLogger.java:PerfLogBegin(88)) - <PERFLOG method=releaseLocks>
2013-11-20 11:44:45,577 INFO  ql.Driver (PerfLogger.java:PerfLogEnd(115)) - </PERFLOG method=releaseLocks start=1384947885577 end=1384947885577 duration=0>
2013-11-20 11:44:45,583 INFO  ql.Driver (PerfLogger.java:PerfLogBegin(88)) - <PERFLOG method=releaseLocks>
2013-11-20 11:44:45,583 INFO  ql.Driver (PerfLogger.java:PerfLogEnd(115)) - </PERFLOG method=releaseLocks start=1384947885583 end=1384947885583 duration=0>

 

 

 

The only way I can overwrite the data in the table is to omit the file name in step 3 and load the whole directory instead:

1. LOAD DATA INPATH '/tmp/ht/sdp-fss-ccreporting.log' OVERWRITE INTO TABLE ccrlocal_sdp_fss_transaction_logs PARTITION (year='2013', month='10',day='11');

2. hadoop fs -put sdp-fss-ccreporting.log /tmp/ht

3. LOAD DATA INPATH '/tmp/ht' OVERWRITE INTO TABLE ccrlocal_sdp_fss_transaction_logs PARTITION (year='2013', month='10',day='11');
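Another workaround that might help, sketched below, is to delete the file already sitting in the partition directory before running the LOAD with the full file name. This is untested and assumes the failure really is caused by the existing destination file, as the "Status:false" line in hive.log suggests. The function names dest_path and load_with_cleanup and the WAREHOUSE variable are made up for illustration; the warehouse path and table name are taken from the log above.

```shell
#!/bin/sh
# Sketch of a possible workaround, not a confirmed fix.
# Assumption: the move fails only because the destination file already exists.
# WAREHOUSE, TABLE, dest_path and load_with_cleanup are hypothetical names.

WAREHOUSE="${WAREHOUSE:-/user/hive/warehouse}"
TABLE="ccrlocal_sdp_fss_transaction_logs"

# Print the HDFS path that LOAD DATA would move the source file to.
# Arguments: $1 = source path, $2 = year, $3 = month, $4 = day
dest_path() {
    printf '%s/%s/year=%s/month=%s/day=%s/%s\n' \
        "$WAREHOUSE" "$TABLE" "$2" "$3" "$4" "$(basename "$1")"
}

# Remove the destination file if it is already there, then run the LOAD.
# Arguments: $1 = source path, $2 = year, $3 = month, $4 = day
load_with_cleanup() {
    src="$1"; year="$2"; month="$3"; day="$4"
    dest="$(dest_path "$src" "$year" "$month" "$day")"

    # "hadoop fs -test -e" exits with 0 when the path exists.
    if hadoop fs -test -e "$dest"; then
        hadoop fs -rm "$dest"
    fi

    hive -e "LOAD DATA INPATH '$src' OVERWRITE INTO TABLE $TABLE PARTITION (year='$year', month='$month', day='$day')"
}

# Example call (commented out so that sourcing this file has no side effects):
# load_with_cleanup /tmp/ht/sdp-fss-ccreporting.log 2013 10 01
```

Note that this only removes the one file with the same name, so any other files in the partition are left for OVERWRITE to deal with.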

 

It could be a similar issue to https://issues.apache.org/jira/browse/HIVE-3300, but I am not sure.

 

Thanks,

Alexei