Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

Solved Go to solution

What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

I read http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html and tried the following command:

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -classpath /usr/hdp/current/druid-overlord/conf/_common:/usr/hdp/current/druid-overlord/lib/*:/etc/hadoop/conf io.druid.cli.Main index hadoop ./hadoop_index_spec.json

But this job fails with below:

2018-03-14T07:37:06,132 INFO [main] io.druid.indexer.JobHelper - Deleting path[/tmp/druid/mmcellh/2018-03-14T071308.731Z_55fbb15cd4d4454885d909c870837f93]
2018-03-14T07:37:06,150 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
        at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:117) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.Main.main(Main.java:108) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:389) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:131) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.Main.main(Main.java:108) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        ... 6 more

And the yarn application log shows "xxxx is not a valid DFS filename":

2018-03-14T07:31:41,369 ERROR [main] io.druid.indexer.JobHelper - Exception in retry loop
java.lang.IllegalArgumentException: Pathname /tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 from hdfs://sandbox-hdp.hortonworks.com:8020/tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 is not a valid DFS filename.
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:217) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:476) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:491) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:417) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:930) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:891) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at io.druid.indexer.JobHelper$4.push(JobHelper.java:415) [druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
...

https://github.com/druid-io/druid/pull/1121 looks similar but this should have been fixed in HDP 2.6.3.

So I'm wondering if the classpath I'm using is correct.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

New Contributor

I'm running this index job via the command line using the jars as described here:

http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html

Have determined Druid 0.12.0 has something weird going on in conjunction with the druid-parquet-extensions as the fs.defaultFs set in the conf/druid/_common/common.runtime.properties is seemingly not respected at some point (don't exactly have a ton of time to trace through their open source project). So here is what I have done as a successful workaround, hopefully this will be helpful

java -Xmx512m -Ddruid.storage.storageDirectory=hdfs://{my_namenode_ip}:{my_namename_port}/{my_segments_path} -Ddruid.storage.type=hdfs -Dfile.encoding=UTF-8 -classpath extensions/druid-parquet-extensions/*:extensions/druid-avro-extensions:extensions/druid-hdfs-storage:lib/*:conf/druid/_common:{HADOOP_PATH}{HADOOP_JAR} io.druid.cli.Main index hadoop {DRUID_INDEXER_DATA}

7 REPLIES 7
Highlighted

Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

New Contributor

Please also share spec file - hadoop_index_spec.json and complete yarn application logs.

Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

Thank you, @Nishant Bangarwa

I sent those by email.

Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

New Contributor

@Hajime

Having the same problem while testing an update to 0.12.0. Ran into your thread, thought i'd share a link that is seemingly related from awhile ago..
https://groups.google.com/forum/#!topic/druid-development/8u5orNnQlwE

"Druid checks the default file system for replacing ":" with "_" and making a valid DFS file path, What is the value of fs.defaultFS set in hadoop config files ? can you try pointing this to hdfs filesystem, If its not already doing that ?"

Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

The core-site.xml under /etc/hadoop/conf shows:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://sandbox-hdp.hortonworks.com:8020</value>
      <final>true</final>
    </property>

So... I guess my config is OK?

Do I need to add "druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath" in some property file and add this in the -cp?

Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

Expert Contributor

I think your classpath is missing the HDFS module that is under extensions directory...

Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

New Contributor

I'm running this index job via the command line using the jars as described here:

http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html

Have determined Druid 0.12.0 has something weird going on in conjunction with the druid-parquet-extensions as the fs.defaultFs set in the conf/druid/_common/common.runtime.properties is seemingly not respected at some point (don't exactly have a ton of time to trace through their open source project). So here is what I have done as a successful workaround, hopefully this will be helpful

java -Xmx512m -Ddruid.storage.storageDirectory=hdfs://{my_namenode_ip}:{my_namename_port}/{my_segments_path} -Ddruid.storage.type=hdfs -Dfile.encoding=UTF-8 -classpath extensions/druid-parquet-extensions/*:extensions/druid-avro-extensions:extensions/druid-hdfs-storage:lib/*:conf/druid/_common:{HADOOP_PATH}{HADOOP_JAR} io.druid.cli.Main index hadoop {DRUID_INDEXER_DATA}

Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?

Thanks a lot!

java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -Ddruid.storage.storageDirectory=hdfs://`hostname -f`:8020/tmp/data/index/output -Ddruid.storage.type=hdfs -classpath /usr/hdp/current/druid-overlord/extensions/druid-hdfs-storage/*:/usr/hdp/current/druid-overlord/lib/*:/usr/hdp/current/druid-overlord/conf/_common:/etc/hadoop/conf/ io.druid.cli.Main index hadoop ./hadoop_index_spec.json

Above worked.
Mine is sandbox so using `hostname -f`.