<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186651#M148753</link>
    <description>&lt;P&gt;Thanks a lot!&lt;/P&gt;&lt;PRE&gt;java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -Ddruid.storage.storageDirectory=hdfs://`hostname -f`:8020/tmp/data/index/output -Ddruid.storage.type=hdfs -classpath /usr/hdp/current/druid-overlord/extensions/druid-hdfs-storage/*:/usr/hdp/current/druid-overlord/lib/*:/usr/hdp/current/druid-overlord/conf/_common:/etc/hadoop/conf/ io.druid.cli.Main index hadoop ./hadoop_index_spec.json&lt;BR /&gt;&lt;/PRE&gt;&lt;P&gt;The command above worked. &lt;BR /&gt;Mine is a sandbox, so I'm using `hostname -f` for the NameNode host.&lt;/P&gt;</description>
    <pubDate>Thu, 22 Mar 2018 10:18:58 GMT</pubDate>
    <dc:creator>hosako</dc:creator>
    <dc:date>2018-03-22T10:18:58Z</dc:date>
    <item>
      <title>What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186644#M148746</link>
      <description>&lt;P&gt;I read &lt;A href="http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html" target="_blank"&gt;http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html&lt;/A&gt; and tried the following command:&lt;/P&gt;&lt;PRE&gt;java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -classpath /usr/hdp/current/druid-overlord/conf/_common:/usr/hdp/current/druid-overlord/lib/*:/etc/hadoop/conf io.druid.cli.Main index hadoop ./hadoop_index_spec.json&lt;/PRE&gt;&lt;P&gt;But the job fails with the following error:&lt;/P&gt;&lt;PRE&gt;2018-03-14T07:37:06,132 INFO [main] io.druid.indexer.JobHelper - Deleting path[/tmp/druid/mmcellh/2018-03-14T071308.731Z_55fbb15cd4d4454885d909c870837f93]
2018-03-14T07:37:06,150 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
        at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:117) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.Main.main(Main.java:108) [druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:389) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.CliInternalHadoopIndexer.run(CliInternalHadoopIndexer.java:131) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        at io.druid.cli.Main.main(Main.java:108) ~[druid-services-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
        ... 6 more
&lt;/PRE&gt;&lt;P&gt;The YARN application log shows "xxxx is not a valid DFS filename":&lt;/P&gt;&lt;PRE&gt;2018-03-14T07:31:41,369 ERROR [main] io.druid.indexer.JobHelper - Exception in retry loop
java.lang.IllegalArgumentException: Pathname /tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 from hdfs://sandbox-hdp.hortonworks.com:8020/tmp/data/index/output/mmcellh/2014-02-11T10:00:00.000Z_2014-02-11T11:00:00.000Z/2018-03-14T07:13:08.731Z/0/index.zip.3 is not a valid DFS filename.
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:217) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:476) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:491) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:417) ~[hadoop-hdfs-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:930) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:891) ~[hadoop-common-2.7.3.2.6.3.0-235.jar:?]
        at io.druid.indexer.JobHelper$4.push(JobHelper.java:415) [druid-indexing-hadoop-0.10.1.2.6.3.0-235.jar:0.10.1.2.6.3.0-235]
...&lt;/PRE&gt;&lt;P&gt;&lt;A href="https://github.com/druid-io/druid/pull/1121" target="_blank"&gt;https://github.com/druid-io/druid/pull/1121&lt;/A&gt; looks similar, but that issue should already be fixed in HDP 2.6.3.&lt;/P&gt;&lt;P&gt;So I'm wondering whether the classpath I'm using is correct.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:58:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186644#M148746</guid>
      <dc:creator>hosako</dc:creator>
      <dc:date>2022-09-16T12:58:17Z</dc:date>
    </item>
    <item>
      <title>Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186645#M148747</link>
      <description>&lt;P&gt;Please also share the spec file (hadoop_index_spec.json) and the complete YARN application logs.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Mar 2018 21:25:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186645#M148747</guid>
      <dc:creator>nbangarwa</dc:creator>
      <dc:date>2018-03-14T21:25:38Z</dc:date>
    </item>
    <item>
      <title>Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186646#M148748</link>
      <description>&lt;P&gt;Thank you, &lt;A rel="user" href="https://community.cloudera.com/users/10777/nbangarwa.html" nodeid="10777"&gt;@Nishant Bangarwa&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I sent those by email.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Mar 2018 06:28:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186646#M148748</guid>
      <dc:creator>hosako</dc:creator>
      <dc:date>2018-03-16T06:28:47Z</dc:date>
    </item>
    <item>
      <title>Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186647#M148749</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/206/hosako.html" nodeid="206"&gt;@Hajime&lt;/A&gt; &lt;/P&gt;&lt;P&gt;I'm having the same problem while testing an update to 0.12.0. I ran into your thread and thought I'd share a seemingly related link from a while ago:&lt;BR /&gt;&lt;A href="https://groups.google.com/forum/#!topic/druid-development/8u5orNnQlwE" target="_blank"&gt;https://groups.google.com/forum/#!topic/druid-development/8u5orNnQlwE&lt;/A&gt;&lt;/P&gt;&lt;P&gt;"Druid checks the default file system for replacing ":" with "_" and making a valid DFS file path,
What is the value of fs.defaultFS set in hadoop config files ? 
can you try pointing this to hdfs filesystem, If its not already doing that ?"&lt;/P&gt;</description>
      <pubDate>Fri, 16 Mar 2018 19:04:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186647#M148749</guid>
      <dc:creator>northjetty</dc:creator>
      <dc:date>2018-03-16T19:04:14Z</dc:date>
    </item>
    <item>
      <title>Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186648#M148750</link>
      <description>&lt;P&gt;The core-site.xml under /etc/hadoop/conf shows:&lt;/P&gt;&lt;PRE&gt;    &amp;lt;property&amp;gt;
      &amp;lt;name&amp;gt;fs.defaultFS&amp;lt;/name&amp;gt;
      &amp;lt;value&amp;gt;hdfs://sandbox-hdp.hortonworks.com:8020&amp;lt;/value&amp;gt;
      &amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
    &amp;lt;/property&amp;gt;&lt;BR /&gt;&lt;/PRE&gt;&lt;P&gt;So I guess my config is OK?&lt;/P&gt;&lt;P&gt;Do I need to set "druid.indexer.fork.property.druid.indexer.task.hadoopWorkingPath" in a properties file and add that file to the classpath (-cp)?&lt;/P&gt;</description>
      <pubDate>Mon, 19 Mar 2018 09:51:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186648#M148750</guid>
      <dc:creator>hosako</dc:creator>
      <dc:date>2018-03-19T09:51:58Z</dc:date>
    </item>
    <item>
      <title>Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186649#M148751</link>
      <description>&lt;P&gt;I think your classpath is missing the HDFS module that is under the extensions directory.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Mar 2018 10:03:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186649#M148751</guid>
      <dc:creator>sbouguerra</dc:creator>
      <dc:date>2018-03-21T10:03:09Z</dc:date>
    </item>
    <item>
      <title>Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186650#M148752</link>
      <description>&lt;P&gt;I'm running this index job from the command line using the jars, as described here:&lt;/P&gt;&lt;P&gt;&lt;A href="http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html" target="_blank"&gt;http://druid.io/docs/latest/ingestion/command-line-hadoop-indexer.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I have determined that Druid 0.12.0 behaves oddly in combination with druid-parquet-extensions: the fs.defaultFS set in conf/druid/_common/common.runtime.properties is seemingly not respected at some point (I don't have the time to trace through the open source project to find out where). Here is what I did as a successful workaround; hopefully it will be helpful:&lt;/P&gt;&lt;PRE&gt;java -Xmx512m -Ddruid.storage.storageDirectory=hdfs://{my_namenode_ip}:{my_namenode_port}/{my_segments_path} -Ddruid.storage.type=hdfs -Dfile.encoding=UTF-8 -classpath extensions/druid-parquet-extensions/*:extensions/druid-avro-extensions:extensions/druid-hdfs-storage:lib/*:conf/druid/_common:{HADOOP_PATH}{HADOOP_JAR} io.druid.cli.Main index hadoop {DRUID_INDEXER_DATA}&lt;/PRE&gt;</description>
      <pubDate>Wed, 21 Mar 2018 20:29:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186650#M148752</guid>
      <dc:creator>northjetty</dc:creator>
      <dc:date>2018-03-21T20:29:24Z</dc:date>
    </item>
    <item>
      <title>Re: What would be the right command to start Druid Hadoop Indexer for HDP 2.6.3?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186651#M148753</link>
      <description>&lt;P&gt;Thanks a lot!&lt;/P&gt;&lt;PRE&gt;java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Dhdp.version=2.6.3.0-235 -Ddruid.storage.storageDirectory=hdfs://`hostname -f`:8020/tmp/data/index/output -Ddruid.storage.type=hdfs -classpath /usr/hdp/current/druid-overlord/extensions/druid-hdfs-storage/*:/usr/hdp/current/druid-overlord/lib/*:/usr/hdp/current/druid-overlord/conf/_common:/etc/hadoop/conf/ io.druid.cli.Main index hadoop ./hadoop_index_spec.json&lt;BR /&gt;&lt;/PRE&gt;&lt;P&gt;The command above worked. &lt;BR /&gt;Mine is a sandbox, so I'm using `hostname -f` for the NameNode host.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Mar 2018 10:18:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/What-would-be-the-right-command-to-start-Druid-Hadoop/m-p/186651#M148753</guid>
      <dc:creator>hosako</dc:creator>
      <dc:date>2018-03-22T10:18:58Z</dc:date>
    </item>
  </channel>
</rss>

