Created 11-14-2017 03:31 PM
Hello,
We have a Hadoop cluster with 3 nodes on which a Sqoop import job had been working very well until a few days ago.
The external table now contains 999 files (is that a maximum number?).
This is the import command:
sqoop import -D oraoop.locations=hadop202.mickey.int -D mapred.map.max.attempts=1 -D oraoop.import.consistent.read=false -D oraoop.timestamp.string=false --connect jdbc:oracle:thin:@//CRIDB101:1521/appli --username sqoop -password '******' --table=ATLAS_STATS_20171114 --columns=APPLICATION,USERNAME,OFFICE,STAT_TYPE,STAT_KEY,STAT_INFO,TIME_STAMP,REQUESTER,DETAIL_INFO_1,DETAIL_INFO_2,DETAIL_INFO_3,DETAIL_INFO_4,OWNER,STATS_ID,DB_NAME,PARAMS --where "sqoop = 'Z'" --hcatalog-database=crim --hcatalog-table=atlas_stats_clob --num-mappers=2 --split-by=TIME_STAMP
and this is the error we get:
17/11/14 16:17:31 INFO mapreduce.Job: Job job_1510660800260_0022 failed with state FAILED due to: Job commit failed: org.apache.hive.hcatalog.common.HCatException : 2012 : Moving of data failed during commit : Could not find a unique destination path for move: file = hdfs://hadop202.mickey.int:8020/data/hive/crim.db/atlas_stats_clob/_SCRATCH0.46847233143209766/part-m-00000 , src = hdfs://hadop202.mickey.int:8020/data/hive/crim.db/atlas_stats_clob/_SCRATCH0.46847233143209766, dest = hdfs://hadop202.mickey.int:8020/data/hive/crim.db/atlas_stats_clob
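Reading the error, the job writes its output under the _SCRATCH0.46847233143209766 staging directory inside the table directory, and the failure happens when the committed files are moved from there into /data/hive/crim.db/atlas_stats_clob at job commit. Assuming a failed commit can leave such a staging directory behind, it can be checked with:
# list any leftover _SCRATCH staging directories inside the table directory
hdfs dfs -ls -d /data/hive/crim.db/atlas_stats_clob/_SCRATCH*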
Thanks for your help.
Created 11-16-2017 01:31 PM
I did an export/import into another table:
export table atlas_stats_clob to '/data/hive/export/';
import table atlas_imported from '/data/hive/export/data/';
and tried again with the same Sqoop import options on the new table:
sqoop import -D oraoop.locations=hadop202.mickey.int -D mapred.map.max.attempts=1 -D oraoop.import.consistent.read=false -D oraoop.timestamp.string=false --connect jdbc:oracle:thin:@//CRIDB101:1521/appli --username sqoop -password '******' --table=ATLAS_STATS_20171114 --columns=APPLICATION,USERNAME,OFFICE,STAT_TYPE,STAT_KEY,STAT_INFO,TIME_STAMP,REQUESTER,DETAIL_INFO_1,DETAIL_INFO_2,DETAIL_INFO_3,DETAIL_INFO_4,OWNER,STATS_ID,DB_NAME,PARAMS --where "sqoop = 'Z'" --hcatalog-database=crim --hcatalog-table=atlas_imported --num-mappers=2 --split-by=TIME_STAMP
but I get the same issue. Is there a limit on the number of files for a table?
Created 11-20-2017 10:09 AM
Hello, I think the problem comes from a threshold of 1000 hard-coded in FileOutputCommitterContainer.java.
Indeed, on one side I have this error:
…17/10/21 02:12:45 INFO mapreduce.Job: Job job_1505194606915_0236 failed with state FAILED due to: Job commit failed: org.apache.hive.hcatalog.common.HCatException : 2012 : Moving of data failed during commit : Could not find a unique destination path for move: file = hdfs://vpbshadop202.mickey.int:8020/data/hive/crim.db/atlas_stats_clob/_SCRATCH0.04665097541205321/part-m-00000 , src = hdfs://vpbshadop202.mickey.int:8020/data/hive/crim.db/atlas_stats_clob/_SCRATCH0.04665097541205321, dest = hdfs://vpbshadop202.mickey.int:8020/data/hive/crim.db/atlas_stats_clob at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.getFinalPath(FileOutputCommitterContainer.java:662) at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:515)…
and in FileOutputCommitterContainer.java I can see that the exception "Could not find a unique destination path for move" is thrown when counter reaches maxAppendAttempts = APPEND_COUNTER_WARN_THRESHOLD = 1000,
and on the other side I have:
hdfs dfs -ls /data/hive/crim.db/atlas_stats_clob/part-m* | wc -l
999
Is there a way to increase this threshold?
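For what it is worth, assuming the committer derives its candidate destination names from the incoming file name (variations of part-m-00000, for example), the count per base name is probably what approaches the threshold. A quick way to watch it, using the same paths as above:
# how many files already share each mapper's base output name
hdfs dfs -ls /data/hive/crim.db/atlas_stats_clob/part-m-00000* | wc -l
hdfs dfs -ls /data/hive/crim.db/atlas_stats_clob/part-m-00001* | wc -l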
Created 11-27-2017 07:49 AM
The external table with 999 files was the problem.
Created 11-27-2017 08:17 AM
Could you try the command with the "--create-hcatalog-table" parameter?
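For example, something like this (just a sketch: atlas_stats_clob_new below is a hypothetical, not-yet-existing table name, since as far as I know --create-hcatalog-table aborts if the target table already exists):
# NOTE: atlas_stats_clob_new is a hypothetical new table name used for illustration
sqoop import -D oraoop.locations=hadop202.mickey.int -D mapred.map.max.attempts=1 -D oraoop.import.consistent.read=false -D oraoop.timestamp.string=false --connect jdbc:oracle:thin:@//CRIDB101:1521/appli --username sqoop -password '******' --table=ATLAS_STATS_20171114 --columns=APPLICATION,USERNAME,OFFICE,STAT_TYPE,STAT_KEY,STAT_INFO,TIME_STAMP,REQUESTER,DETAIL_INFO_1,DETAIL_INFO_2,DETAIL_INFO_3,DETAIL_INFO_4,OWNER,STATS_ID,DB_NAME,PARAMS --where "sqoop = 'Z'" --hcatalog-database=crim --hcatalog-table=atlas_stats_clob_new --create-hcatalog-table --num-mappers=2 --split-by=TIME_STAMP
Importing into a freshly created, empty table directory also means the incoming part file names cannot collide with existing ones.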
Created 02-25-2019 01:53 PM
Hi,
we recently ran into the same problem after a few years of successful imports.
Did you ever find a way to overcome it?