
What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

I read http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html and tried the following command:

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 \
  -classpath /usr/hdp/current/druid-overlord/conf/_common:/usr/hdp/current/druid-overlord/lib/*:/etc/hadoop/conf \
  io.druid.cli.Main index hadoop ./hadoop_index_spec.json

But the job fails with the error below:

2018-03-14T07:37:06,132 INFO [main] io.druid.indexer.JobHelper - Deleting path[/tmp/druid/mmcellh/2018-03-14T071308.731Z_55fbb15cd4d4454885d909c870837f93]
2018-03-14T07:37:06,150 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
        at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:117) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.Main.main(Main.java:108) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:389) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:131) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.Main.main(Main.java:108) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        ... 6 more

And the YARN application log shows "xxxx is not a valid DFS filename":

2018-03-14T07:31:41,369 ERROR [main] io.druid.indexer.JobHelper - Exception in retry loop
java.lang.IllegalArgumentException: Pathname /tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 from hdfs://sandbox-hdp.hortonworks.com:8020/tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 is not a valid DFS filename.
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:217) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:476) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:491) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:417) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:930) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:891) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at io.druid.indexer.JobHelper$4.push(JobHelper.java:415) [druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
...

https://github.com/druid-io/druid/pull/1121 looks similar, but that should have been fixed in HDP 2.6.3.

So I'm wondering if the classpath I'm using is correct.

1 ACCEPTED SOLUTION

New Contributor

I'm running this index job via the command line using the jars as described here:

http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html

I've determined that Druid 0.12.0 has something weird going on in conjunction with druid-parquet-extensions: the fs.defaultFS set in conf/druid/_common/common.runtime.properties is seemingly not respected at some point (I don't have time to trace through the open source project). So here is what I did as a successful workaround; hopefully it will be helpful:

java -Xmx512m -Ddruid.storage.storageDirectory=hdfs://{my_namenode_ip}:{my_namenode_port}/{my_segments_path} \
  -Ddruid.storage.type=hdfs -Dfile.encoding=UTF-8 \
  -classpath extensions/druid-parquet-extensions/*:extensions/druid-avro-extensions:extensions/druid-hdfs-storage:lib/*:conf/druid/_common:{HADOOP_PATH}{HADOOP_JAR} \
  io.druid.cli.Main index hadoop {DRUID_INDEXER_DATA}
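This matches the hint quoted further down in the thread: Druid only rewrites the ":" characters in segment paths to "_" when it knows deep storage is HDFS, so passing -Ddruid.storage.type=hdfs and an explicit hdfs:// storageDirectory on the command line appears to be what makes it produce a valid DFS filename, regardless of what the runtime properties resolve to.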


7 REPLIES

Explorer

Please also share the spec file (hadoop_index_spec.json) and the complete YARN application logs.

Thank you, @Nishant Bangarwa

I sent those by email.

New Contributor

@Hajime

Having the same problem while testing an update to 0.12.0. I ran into your thread and thought I'd share a link that seems related, from a while ago:
https://groups.google.com/forum/#!topic/druid-development/8u5orNnQlwE

"Druid checks the default file system for replacing ":" with "_" and making a valid DFS file path, What is the value of fs.defaultFS set in hadoop config files ? can you try pointing this to hdfs filesystem, If its not already doing that ?"

The core-site.xml under /etc/hadoop/conf shows:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://sandbox-hdp.hortonworks.com:8020</value>
      <final>true</final>
    </property>

So... I guess my config is OK?

Do I need to add "druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath" to some properties file and include that file on the -classpath?
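(For reference: druid.indexer.task.hadoopWorkingPath is a runtime property, not a classpath entry. A minimal sketch of where it would live, with example values not taken from this cluster:

    # conf/druid/_common/common.runtime.properties
    druid.storage.type=hdfs
    druid.storage.storageDirectory=/apps/druid/warehouse

    # middleManager runtime.properties -- the fork.property prefix forwards
    # the setting to the peon tasks the middleManager spawns
    druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing

You can also confirm which default filesystem the Hadoop client actually resolves with:

    hdfs getconf -confKey fs.defaultFS
)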

Expert Contributor

I think your classpath is missing the HDFS storage module that is under the extensions directory...
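For example, with the stock HDP layout that would mean prepending the druid-hdfs-storage extension directory to the original command's classpath (a sketch; adjust paths to your install):

    java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 \
      -classpath /usr/hdp/current/druid-overlord/extensions/druid-hdfs-storage/*:/usr/hdp/current/druid-overlord/conf/_common:/usr/hdp/current/druid-overlord/lib/*:/etc/hadoop/conf \
      io.druid.cli.Main index hadoop ./hadoop_index_spec.json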


Thanks a lot!

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 \
  -Ddruid.storage.storageDirectory=hdfs://`hostname -f`:8020/tmp/data/index/output -Ddruid.storage.type=hdfs \
  -classpath /usr/hdp/current/druid-overlord/extensions/druid-hdfs-storage/*:/usr/hdp/current/druid-overlord/lib/*:/usr/hdp/current/druid-overlord/conf/_common:/etc/hadoop/conf/ \
  io.druid.cli.Main index hadoop ./hadoop_index_spec.json

The above worked. Mine is a sandbox, so I'm using `hostname -f` for the NameNode host.
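To confirm the segments actually landed in HDFS, a quick check with the standard HDFS client (path taken from the command above):

    hdfs dfs -ls /tmp/data/index/output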
