What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?


I read http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html and tried the following command:

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -classpath /usr/hdp/current/druid-overlord/conf/_common:/usr/hdp/current/druid-overlord/lib/*:/etc/hadoop/conf io.druid.cli.Main index hadoop ./hadoop_index_spec.json

But the job fails with the following error:

2018-03-14T07:37:06,132 INFO [main] io.druid.indexer.JobHelper - Deleting path[/tmp/druid/mmcellh/2018-03-14T071308.731Z_55fbb15cd4d4454885d909c870837f93]
2018-03-14T07:37:06,150 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
        at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:117) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.Main.main(Main.java:108) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:389) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:131) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.Main.main(Main.java:108) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        ... 6 more

And the YARN application log shows "xxxx is not a valid DFS filename":

2018-03-14T07:31:41,369 ERROR [main] io.druid.indexer.JobHelper - Exception in retry loop
java.lang.IllegalArgumentException: Pathname /tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 from hdfs://sandbox-hdp.hortonworks.com:8020/tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 is not a valid DFS filename.
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:217) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:476) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:491) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:417) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:930) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:891) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at io.druid.indexer.JobHelper$4.push(JobHelper.java:415) [druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
...

https://github.com/druid-io/druid/pull/1121 looks similar, but that fix should already be included in HDP 2.6.3.

So I'm wondering if the classpath I'm using is correct.

7 REPLIES

Contributor

Please also share the spec file (hadoop_index_spec.json) and the complete YARN application logs.


Thank you, @Nishant Bangarwa

I sent those by email.

New Contributor

@Hajime

I'm having the same problem while testing an upgrade to 0.12.0. I ran into your thread, so I thought I'd share a link that seems related, from a while ago:
https://groups.google.com/forum/#!topic/druid-development/8u5orNnQlwE

"Druid checks the default file system for replacing ":" with "_" and making a valid DFS file path, What is the value of fs.defaultFS set in hadoop config files ? can you try pointing this to hdfs filesystem, If its not already doing that ?"


The core-site.xml under /etc/hadoop/conf shows:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://sandbox-hdp.hortonworks.com:8020</value>
      <final>true</final>
    </property>

So... I guess my config is OK?

Do I need to set "druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath" in some properties file and put that file on the -classpath?
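
For example, would something like this work? Just a sketch on my side: I'm assuming the standalone CLI reads -D system properties the way the Druid daemons do, and that druid.indexer.task.hadoopWorkingPath is the right property name here (the working path can also be set in the ingestion spec itself, under tuningConfig.workingPath).

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -Ddruid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing -classpath /usr/hdp/current/druid-overlord/conf/_common:/usr/hdp/current/druid-overlord/lib/*:/etc/hadoop/conf io.druid.cli.Main index hadoop ./hadoop_index_spec.json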

Expert Contributor

I think your classpath is missing the HDFS storage module that is under the extensions directory.
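
Something along these lines should pick it up. A sketch only: the extension path assumes the standard HDP layout, and you will likely also need the druid.storage.* properties so the HDFS pusher is actually used (the storageDirectory below is taken from the output path in your YARN log):

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -Ddruid.storage.type=hdfs -Ddruid.storage.storageDirectory=hdfs://sandbox-hdp.hortonworks.com:8020/tmp/data/index/output -classpath /usr/hdp/current/druid-overlord/extensions/druid-hdfs-storage/*:/usr/hdp/current/druid-overlord/conf/_common:/usr/hdp/current/druid-overlord/lib/*:/etc/hadoop/conf io.druid.cli.Main index hadoop ./hadoop_index_spec.json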



Thanks a lot!

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -Ddruid.storage.storageDirectory=hdfs://`hostname -f`:8020/tmp/data/index/output -Ddruid.storage.type=hdfs -classpath /usr/hdp/current/druid-overlord/extensions/druid-hdfs-storage/*:/usr/hdp/current/druid-overlord/lib/*:/usr/hdp/current/druid-overlord/conf/_common:/etc/hadoop/conf/ io.druid.cli.Main index hadoop ./hadoop_index_spec.json

The above worked.
Mine is a sandbox, so I'm using `hostname -f` to fill in the NameNode host.