Created on 03-14-2018 08:05 AM - edited 09-16-2022 05:58 AM
I read http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html and tried the following command:
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -classpath /usr/hdp/current/druid-overlord/conf/_common:/usr/hdp/current/druid-overlord/lib/*:/etc/hadoop/conf io.druid.cli.Main index hadoop ./hadoop_index_spec.json
But the job fails with the following error:
2018-03-14T07:37:06,132 INFO [main] io.druid.indexer.JobHelper - Deleting path[/tmp/druid/mmcellh/2018-03-14T071308.731Z_55fbb15cd4d4454885d909c870837f93]
2018-03-14T07:37:06,150 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
    at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:117) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
    at io.druid.cli.Main.main(Main.java:108) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
    at io.druid.indexer.JobHelper.runJobs(JobHelper.java:389) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
    at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
    at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
    at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:131) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
    at io.druid.cli.Main.main(Main.java:108) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
    ... 6 more
The YARN application log shows "xxxx is not a valid DFS filename":
2018-03-14T07:31:41,369 ERROR [main] io.druid.indexer.JobHelper - Exception in retry loop
java.lang.IllegalArgumentException: Pathname /tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 from hdfs://sandbox-hdp.hortonworks.com:8020/tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:217) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:476) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:491) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:417) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:930) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:891) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
    at io.druid.indexer.JobHelper$4.push(JobHelper.java:415) [druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
    ...
https://github.com/druid-io/druid/pull/1121 looks similar, but that should already have been fixed in HDP 2.6.3.
So I'm wondering if the classpath I'm using is correct.
Created 03-14-2018 02:25 PM
Please also share the spec file (hadoop_index_spec.json) and the complete YARN application logs.
Created 03-15-2018 11:28 PM
Thank you, @Nishant Bangarwa
I sent those by email.
Created 03-16-2018 12:04 PM
I'm having the same problem while testing an update to 0.12.0. I ran into your thread and thought I'd share a seemingly related link from a while ago:
https://groups.google.com/forum/#!topic/druid-development/8u5orNnQlwE
"Druid checks the default file system for replacing ":" with "_" and making a valid DFS file path, What is the value of fs.defaultFS set in hadoop config files ? can you try pointing this to hdfs filesystem, If its not already doing that ?"
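As an illustration of the substitution described in that quote: the segment path in the error above contains ISO-8601 timestamps with ":" characters, and HDFS rejects path components containing ":". The sketch below only shows the kind of rewrite involved. The function name sanitize_segment_path is hypothetical, and this is not Druid's actual code.

```shell
# Hypothetical helper (not part of Druid): replace every ':' in a path with '_'.
# Pass only the path portion, not the hdfs://host:port prefix, since the
# colons in the scheme and port would be replaced as well.
sanitize_segment_path() {
  printf '%s\n' "$1" | tr ':' '_'
}

# Path taken from the YARN log above (without the retry suffix).
sanitize_segment_path "/tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip"
# prints /tmp/data/index/output/mmcellh/2014-02-11T10_00_00.000Z_2014-02-11T11_00_00.000Z/2018-03-14T07_13_08.731Z/0/index.zip
```

If the job writes paths that still contain ":", that suggests the indexer did not treat the target as HDFS at that point, which matches the question about fs.defaultFS in the quote.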
Created 03-19-2018 02:51 AM
The core-site.xml under /etc/hadoop/conf shows:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://sandbox-hdp.hortonworks.com:8020</value>
  <final>true</final>
</property>
So... I guess my config is OK?
Do I need to add "druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath" in some property file and add this in the -cp?
Created 03-21-2018 03:03 AM
I think your classpath is missing the HDFS module that is under the extensions directory...
Created 03-21-2018 01:29 PM
I'm running this index job via the command line using the jars as described here:
http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html
I have determined that Druid 0.12.0 behaves oddly in combination with druid-parquet-extensions: the fs.defaultFS set in conf/druid/_common/common.runtime.properties is seemingly not respected at some point (I don't have much time to trace through the open source project). Here is what I did as a successful workaround; hopefully it will be helpful:
java -Xmx512m -Ddruid.storage.storageDirectory=hdfs://{my_namenode_ip}:{my_namenode_port}/{my_segments_path} -Ddruid.storage.type=hdfs -Dfile.encoding=UTF-8 -classpath extensions/druid-parquet-extensions/*:extensions/druid-avro-extensions:extensions/druid-hdfs-storage:lib/*:conf/druid/_common:{HADOOP_PATH}{HADOOP_JAR} io.druid.cli.Main index hadoop {DRUID_INDEXER_DATA}
Created 03-22-2018 03:18 AM
Thanks a lot!
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -Ddruid.storage.storageDirectory=hdfs://`hostname -f`:8020/tmp/data/index/output -Ddruid.storage.type=hdfs -classpath /usr/hdp/current/druid-overlord/extensions/druid-hdfs-storage/*:/usr/hdp/current/druid-overlord/lib/*:/usr/hdp/current/druid-overlord/conf/_common:/etc/hadoop/conf/ io.druid.cli.Main index hadoop ./hadoop_index_spec.json
The above worked.
Mine is a sandbox, so I'm using `hostname -f` for the NameNode host.
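For readability, the same command can be assembled from named parts. This is only a sketch that reuses the paths and options from the command in this post; the variable names and the build_classpath function are illustrative, and the script prints the command rather than running it.

```shell
# Paths from the working command above; adjust for your installation.
DRUID_HOME=/usr/hdp/current/druid-overlord
HADOOP_CONF=/etc/hadoop/conf

# Illustrative helper: join the classpath entries with ':'.
# The '*' wildcards stay quoted so the JVM expands them, not the shell.
build_classpath() {
  printf '%s' "$DRUID_HOME/extensions/druid-hdfs-storage/*:$DRUID_HOME/lib/*:$DRUID_HOME/conf/_common:$HADOOP_CONF/"
}

# On a sandbox the NameNode runs on the local host; override NAMENODE_HOST otherwise.
NAMENODE_HOST="${NAMENODE_HOST:-$(hostname -f)}"

# Print the command for review; drop the leading 'echo' to actually run it.
echo java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 \
  "-Ddruid.storage.storageDirectory=hdfs://$NAMENODE_HOST:8020/tmp/data/index/output" \
  -Ddruid.storage.type=hdfs \
  -classpath "$(build_classpath)" \
  io.druid.cli.Main index hadoop ./hadoop_index_spec.json
```

The two additions compared with the original failing command are the druid-hdfs-storage extension on the classpath and the two -Ddruid.storage.* options.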