Member since: 09-27-2016
22 Posts
11 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 12634 | 10-18-2016 10:01 AM |
09-26-2017
08:33 AM
Yes, it has been solved. But looking at the other comments, it seems that not all the ORC-related problems have been addressed and resolved.
09-26-2017
08:32 AM
Thanks for the info. Mine actually got solved with the upgrade; it might be due to a different reason.
09-04-2017
09:48 AM
Thank you, but I've already tried that. It seems I've run into this known issue. I will upgrade to 2.6.2 and share the result.
09-01-2017
08:50 AM
Hello, I am trying to evaluate Hive LLAP on a Hortonworks HDP 2.6 cluster. Unfortunately, I get a java.lang.RuntimeException: ORC split generation failed when trying to execute queries:
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1504166274656_0006_3_00, diagnostics=[Vertex vertex_1504166274656_0006_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gprs_records initializer failed, vertex=vertex_1504166274656_0006_3_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1615)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1701)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:446)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 5
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1609)
... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:73)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:383)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:63)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:89)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1419)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1305)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2600(OrcInputFormat.java:1104)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1285)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1282)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1104)
... 4 more
]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1504166274656_0006_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1504166274656_0006_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : AM_CPU_MILLISECONDS: 840
INFO : AM_GC_TIME_MILLIS: 23
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1504166274656_0006_3_00, diagnostics=[Vertex vertex_1504166274656_0006_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gprs_records initializer failed, vertex=vertex_1504166274656_0006_3_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1615)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1701)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:446)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 5
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1609)
... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:73)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:383)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:63)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:89)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1419)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1305)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2600(OrcInputFormat.java:1104)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1285)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1282)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1104)
... 4 more
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1504166274656_0006_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1504166274656_0006_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO : Resetting the caller context to HIVE_SSN_ID:2415846c-fb92-480f-b869-240a0b0f30ed
INFO : Completed executing command(queryId=hive_20170831085623_51baf1df-5823-459e-80cf-76fa1f81789f); Time taken: 0.342 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1504166274656_0006_3_00, diagnostics=[Vertex vertex_1504166274656_0006_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gprs_records initializer failed, vertex=vertex_1504166274656_0006_3_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1615)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1701)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:446)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 5
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1609)
... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:73)
at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:383)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:63)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:89)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1419)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1305)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2600(OrcInputFormat.java:1104)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1285)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1282)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1104)
... 4 more
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1504166274656_0006_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1504166274656_0006_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)
Here's some information to complete the picture: I've enabled LLAP using the Ambari interface, installed HiveServer2 Interactive, and restarted the services. I am using Beeline version 1.2.1000.2.6.0 to connect to it. The table being queried is in ORC format; the ORC files are generated by an ETL pipeline and written by Java code using ORC Core. I have no problem querying it without LLAP through HiveServer2. Any advice is really appreciated. Thank you all!
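For what it's worth, the failure happens in OrcFile.WriterVersion.from, so a first check could be which writer version the ETL-produced files declare compared to what the cluster's ORC reader knows about. A minimal diagnostic sketch using the ORC Core reader API (the path argument is illustrative, not from the actual setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

public class OrcVersionCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point this at one of the ORC files produced by the ETL pipeline.
        Reader reader = OrcFile.createReader(new Path(args[0]), OrcFile.readerOptions(conf));
        // Print the file format version and the version of the writer that produced it.
        System.out.println("File version:   " + reader.getFileVersion());
        System.out.println("Writer version: " + reader.getWriterVersion());
        System.out.println("Rows:           " + reader.getNumberOfRows());
    }
}

Comparing the reported writer version against what the HDP 2.6 ORC jars support should show whether this is a writer/reader version mismatch.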
Labels:
- Apache Hive
05-03-2017
01:33 PM
Sorry to bother you again, two additional questions:
From my understanding, the File(), Execute(), Script() ... resources are meant to execute commands across the cluster, right?
I imagine the configuration loaded using the get_config() method in the params and status_params scripts is backed by some configuration file on the cluster. If I am right, where is it located?
Thank you for the help @bhagan
04-28-2017
02:14 PM
Hello everybody, I am trying to add a custom service to Ambari using the documentation as my main resource. The service is made up of two master components implemented as Java processes. I can add the service to the stack and install it via the Ambari web interface using the dummy Python scripts provided as an example. There are some aspects not covered there that I would like to know more about: Is there any additional resource on how to implement the actual methods to install, start, and stop the components of the service? How can I distribute the program files across the cluster?
Thank you all for the help.
Labels:
- Apache Ambari
04-05-2017
08:42 AM
Thank you, I've read that documentation page, but maybe I misunderstood something. Explaining what I would actually like to achieve may make more sense:
I would like to create two separate folders on HDFS, one with the All_SSD storage policy and the other with the Hot storage policy. Files placed in these folders will be moved internally to SSDs or HDDs accordingly. When I move a file from one folder to the other, will it actually involve the whole cluster, hence redistributing data, or will each node try to move its share of blocks between its own drives?
From your answer, I assume this move operation actually does nothing on a physical level, and everything at that level will be managed by the mover tool, right? In that case, the question simply applies to the mover: will it try to perform the operations locally, or will it involve all the nodes?
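For reference, a minimal sketch of how the two folders could be assigned those policies programmatically, assuming a Hadoop 2.6+ client where DistributedFileSystem exposes setStoragePolicy (the directory names are illustrative). As discussed, applying the policy only tags the directories; existing blocks are physically moved only when the mover runs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class StoragePolicySetup {
    public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS in the loaded configuration points at the target HDFS cluster.
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        // Tag the two directories; files written into them inherit the policy.
        dfs.setStoragePolicy(new Path("/data/hot"), "HOT");
        dfs.setStoragePolicy(new Path("/data/all_ssd"), "ALL_SSD");
    }
}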
04-04-2017
11:56 AM
Hello,
I would like to implement tiered storage in a cluster. Suppose each node has several drives (HDDs and SSDs). If a move command is issued from one tier to another, will each node try to perform the operation "locally", moving its share of blocks between its own drives, or will the data instead be redistributed across the cluster, hence consuming network capacity?
Labels:
- Apache Hadoop
03-28-2017
10:51 AM
Hi @Matt Burgess, thank you for the reply. Sorry for the late answer, but I clearly missed your post. Just in case someone runs into this, I'd like to link a mirror question on Stack Overflow that may be useful.
03-08-2017
04:20 PM
Hi all!
I am trying to develop a custom processor in NiFi which writes ORC files directly into a remote Hadoop cluster. To write them, I am using the ORC Core API. I have tried writing ORC files on the local FS and everything is fine so far (Hive, which is their "final destination", has no problem reading them). The issue is that, while trying to create a Writer object, I get a NoClassDefFoundError on org.apache.hadoop.hdfs.DistributedFileSystem. This is the code used:

Configuration conf = new Configuration();
conf.addResource(new Path(hadoopConfigurationPath + "/core-site.xml"));
conf.addResource(new Path(hadoopConfigurationPath + "/hdfs-site.xml"));
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
String hdfsUri = conf.get("fs.default.name");
...
try {
    writer = OrcFile.createWriter(
            new Path(hdfsUri + "/" + filename + ".orc"),
            OrcFile.writerOptions(conf).setSchema(orcSchema));
}
catch (IOException e) {
    log.error("Cannot open hdfs file. Reason: " + e.getMessage());
    session.transfer(flowfile, hdfsFailure);
    return;
}
...

I've copied the hadoop-hdfs jar into the lib directory, and at runtime I looked at the jars loaded in the classpath using the ClassLoader: the jar with the required class is there. Including the jar in the processor dependencies does not solve the issue either. Any suggestion on how to get rid of this error is really appreciated. Thank you all!
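For completeness, one pattern sometimes suggested when Hadoop's FileSystem classes cannot be resolved from inside an isolated (NAR) classloader is to temporarily switch the thread context classloader while creating the writer. This is only a sketch of that general pattern, reusing the variables from the snippet above, and not a confirmed fix for this particular error:

// Sketch of a context-classloader workaround (not a confirmed fix for this error).
ClassLoader original = Thread.currentThread().getContextClassLoader();
try {
    // Use the processor's own classloader so classloader-based lookups see the hadoop-hdfs jar.
    Thread.currentThread().setContextClassLoader(this.getClass().getClassLoader());
    writer = OrcFile.createWriter(
            new Path(hdfsUri + "/" + filename + ".orc"),
            OrcFile.writerOptions(conf).setSchema(orcSchema));
} finally {
    Thread.currentThread().setContextClassLoader(original);
}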
Labels:
- Apache Hadoop
- Apache NiFi
10-18-2016
10:01 AM
2 Kudos
Hi everybody! I would like to share how my scenario has developed, in case someone has similar issues.
As a brief recap, my task was to design a rather simple flow that processes CSV data, applies some transformations to some of its fields, and outputs the result to Hive. As everybody does, I had some performance requirements, but in general you want to create a flow that is as fast as possible.
My original design was to deal with the CSV file at record level. Files would be split up into single lines, the fields ingested into NiFi as flowfile attributes, processed by custom processors, and finally buffered back together and sent to HDFS. The main reason I initially designed it this way was to try to perform the computation entirely in memory, as flowfile attributes are stored in memory (as long as the flowfiles in the queues stay below a configurable threshold). This seemed to me the fastest way.
Some takeaways from the learning process of a relative NiFi newbie: The basic work unit that you choose at design time has a huge impact on latency and throughput. It seems like an obvious thing (it was not clear to me it could have this much impact on performance), but if you want high throughput you should consider batching things together. On the other hand, small work units get processed faster and give you better latency. I thought that single CSV rows would be a compromise between the two, but I was wrong. Really wrong. Batching rows together, and still processing them in memory, I got almost a 3x performance gain, but it was still not enough, given that the flow processed only 10MB/s.
Splitting and merging are very costly. As @mpayne wrote, things will probably improve, but if you are not forced to split the input, you will get a significant performance gain.
Flowfile attribute processing should be limited to a few aspects, and not adopted as a way to ingest data into NiFi for further processing. After some performance gains with batching, I tried to increase the speed of the flow by allocating multiple threads to the processors. I was not able to achieve any improvement. Instead, this generated a really huge GC problem, which I reported in this question. After some time studying G1, and with some help from S.O. folks, I discovered that the issue basically could not be solved (or at least not easily), since I had a huge live data set in memory that could not be collected by the garbage collector, and that led to continuous full GCs. Thanks to @jwitt I started developing a "classic" flow which simply reads flowfile content instead of attributes, and in which we batch rows together as the work unit (maybe I overthought the flow design a little bit at the beginning 😄).
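For anyone curious what this content-based design looks like in processor code, here is a minimal, purely illustrative sketch: each flowfile carries a batch of CSV rows and is transformed by streaming its content, instead of copying fields into attributes. The class name and transformLine are hypothetical stand-ins for the real field transformations:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.io.StreamCallback;

public class TransformCsvBatch extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Transformed CSV batches")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        // Stream the content: read, transform and rewrite each line without creating
        // per-row flowfiles or copying fields into attributes.
        flowFile = session.write(flowFile, new StreamCallback() {
            @Override
            public void process(InputStream in, OutputStream out) throws IOException {
                BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
                BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    writer.write(transformLine(line)); // hypothetical field-level transformation
                    writer.newLine();
                }
                writer.flush();
            }
        });
        session.transfer(flowFile, REL_SUCCESS);
    }

    // Hypothetical stand-in for the actual transformations applied to the 80 fields.
    private String transformLine(String line) {
        return line;
    }
}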
With this new design I managed to get almost 55MB/s of throughput on a single machine. I don't think this is bad, since the CSV records have 80 fields each and the code has not been profiled. Adding more processing to the data will probably lower this value, but I think I am in a sweet spot to eventually get a fast NiFi flow.
Thank you all for your support, I hope the effort I put into learning NiFi will help someone else!
10-11-2016
02:59 PM
Thanks for sharing your knowledge, I will try your tips. This specific GC issue only happens when I assign multiple threads to the processors and try to speed up the flow, which otherwise runs at roughly 10MB/s in a single thread.
I originally designed the flow to use flowfile attributes because I wanted the computation to happen in memory. I thought it would be faster than reading the flowfile content in each processor and then parsing it to get specific fields. Do you suggest trying to implement a version that works, let's say, "on disk" on the flowfile content instead of attributes?
10-04-2016
10:12 AM
1 Kudo
UPDATE
I've played a little bit with G1 using the Oracle documentation. I still do not have a definitive solution. I settled on the default configuration, adjusting the region size to the heap (54GB heap, 30MB region size). I tried to better understand the problem and obtained a detailed picture of the heap using GCViewer. This is the associated heap chart after three consecutive full GCs:
A little bit of explanation of the output, in case you have not played with GCViewer:
It looks to me that the issue is in the tenured heap: while the young heap is constantly cleaned in the young GC cycles (the grey line oscillating at the top of the picture), the tenured heap is used but never cleaned up.
Some questions:
Could triggering more mixed GCs be a viable option? How would I obtain that using the configuration parameters?
Should I switch to the old CMS GC, even though it has proved to be not that good on huge heaps?
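For reference, mixed-collection behaviour in G1 is mainly steered by a few standard HotSpot flags, which NiFi picks up from bootstrap.conf. A sketch with illustrative argument numbers and values, not a recommendation:

# Illustrative bootstrap.conf entries (argument numbers and values are examples only)
java.arg.13=-XX:+UseG1GC
# Start concurrent marking (and thus mixed collections) earlier than the default 45% occupancy
java.arg.14=-XX:InitiatingHeapOccupancyPercent=35
# Pause-time goal G1 uses when sizing young and mixed collections
java.arg.15=-XX:MaxGCPauseMillis=200
# Explicit region size for a large heap
java.arg.16=-XX:G1HeapRegionSize=32m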
Any suggestion is really appreciated. Thank you
_________________________________________________________________________________________________
Hello everybody,
I am developing a flow which requires good performance. It is running on a single instance with 16 cores and 64GB of memory. I've configured NiFi to use 54GB of it with the G1 garbage collector. I am currently facing this memory issue: with a fresh NiFi start, all processors running but no input, it uses 1GB. I feed some input in (1.5GB); the flow starts, consumes all the data and finishes the computation without triggering any full GC. Tracking the memory through the system diagnostics page, I can see that it is constantly eating memory, to the point that only 2-3 GB are left free when no input is left in the flow. No space is released after that. I would expect that to be the maximum memory it needs to consume that input, but if I feed the same input size back in, it basically does not release any memory; in fact it keeps eating it until full GCs are triggered. They take far too much time to complete; this is a screenshot of the system diagnostics page after the first one is triggered:
Is this normal behavior for the JVM? Is there a way I can avoid triggering full GCs? I've gone over the custom processors' code, trying to limit object creation as much as I could (I'm not a Java expert; I tried to google around a little bit, in particular regarding Strings). A great part of the computation is performed by accessing flowfile attributes (except for one processor that loads the input into flowfile attributes, and one that writes them back to the flowfile content). Maybe this is a programming issue, but given my limited experience I cannot tell; I would be glad to read suggestions and discover this is indeed the case. I've also tried reducing the assigned heap down to 16GB. Full GCs are quicker, but more frequent, as I expected.
Is there a way to completely avoid full GCs?
Thank you all
Labels:
- Apache NiFi
10-03-2016
05:12 PM
2 Kudos
UPDATE: I am moving to the new flowfile design as suggested, in which the work unit is a set of lines instead of single ones. Since then there have been significant improvements, given that each block is working at 10 MB/s in a single thread. The current problem is that after some processing time I start to trigger full GCs. I tried reviewing the code and substituting concatenations with StringBuilder.append(). Things seem to improve, but I still eventually trigger full GCs. Any suggestion on how to avoid them? Currently I've been running NiFi with 32GB of configured memory and G1 as the GC.
09-30-2016
07:38 AM
I figured there was something not working at its best, but I didn't think I was the first one to run into this kind of issue. However, thank YOU all for the help and support provided, really appreciated.
09-29-2016
01:13 PM
I was reasoning about the simple split-merge flow and your results (10-20k) vs mine (12k): if you have an SSD on your laptop, wouldn't they be similar anyway, given that each processor has one thread assigned and they both write to a fast disk? Any difference would be decided only by the processor speed.
I've tried to implement the same flow but assigning more threads to the split processor: this basically makes the merge processor drown, because hundreds of thousands of flowfiles are created. This perhaps suggests that I do have the speed to process the data the way I want, but I cannot set up the queues and NiFi in general to support it.
Which guidelines should I try to follow regarding the total number of flowfiles in the queues (and in the complete flow, actually)? Correct me if I am wrong, but doesn't limiting the number of flowfiles waiting in a queue imply that I could waste the speed of a fast processor trying to gather input from it? Sorry for the number of questions.
09-29-2016
08:52 AM
I really do not know what to think. I tried batch processing, with still no major improvement, even after replacing the custom processor with UpdateAttribute (I also noticed the advanced section, thank you), which is faster. Given the disk speed that your flow required to reach 20k, I think we can exclude disk speed from the list of possible causes.
I have one doubt about the garbage collector. Is the time reported the total time spent on garbage collection? How do you evaluate the JVM's behavior with respect to these values?
09-28-2016
01:49 PM
I was not aware of NiFi's batching capabilities, I'll have a look at that. The machine is a 16-core, 64 GB RAM Ubuntu 16.04 instance running on AWS. It has separate attached EBS SSDs for the flowfile repository and the content repository. To evaluate the performance of the disks I've executed this command, as suggested in several forums:
time sh -c "dd if=/dev/zero of=testfile bs=100k count=1k && sync"
1024+0 records in
1024+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0617739 s, 1.7 GB/s
real 0m0.600s
user 0m0.004s
sys 0m0.060s
Yes, I could definitely combine all of them, even though I would prefer having several processors performing simple but specific transformations of the data instead of mixing everything together. From what I understood, the performance of the built-in processors alone would still not be ideal, right? I've never tried a debugger/profiler at runtime (I've basically just started using NiFi). Do you have any useful resources to suggest? It simply copies some attributes into different ones that could later be transformed. Thank you again for your time
09-28-2016
09:39 AM
1) I've not configured NiFi on a cluster yet. My idea was to evaluate the performance of a single machine and then move to a cluster. Given the situation, I will definitely try to do this.
2) That's something I can do, but since I've not used huge CSV files (at most 2GB, 1.4M rows), after a few seconds of delay the first phase of the split has emitted the first flowfiles and the flow actually starts.
3-4-6) No throttling is configured on the system. I tried not configuring backpressure on the links, but it was better with it. I am leaving out the HDFS output phase since that does not seem to be the issue (its queue was basically always empty).
5) The custom processors are clearly the slowest ones. As I pointed out in the other answer, I expected that, since they perform more complex operations than the built-in processors. But still, they just access flowfile attributes and create new ones. What really concerns me, and makes me think there may be something else I cannot recognize, is that even removing them and just splitting the rows and batching them back together in order to save them to the local disk, I get 12k rows/sec. As @jwitt also wrote, this number should be way higher.
7) NiFi 0.7
8) Oracle JDK 8
9) bootstrap.txt
Thank you all, really appreciated
09-28-2016
09:12 AM
The input file is 2GB and has 1.4M rows. I perform a two-phase split with 5k-line and 1-line steps. I have configured backpressure and my swap threshold is 40k. However, even without setting it, would that number of flowfiles be a problem?
I've attached the flow screenshots and the GC info: I can see that the slowest processors are the custom ones. I expected that, since they perform more complex operations than the built-in processors. But still, they just access flowfile attributes and create new ones. Assigning several threads to them does not seem to help. However, the point of the last part of the post was that even bypassing them, and just splitting the rows and merging them back together, I get 12k rows/second. Should I move to a higher level and consider batches of rows as work units?
Thank you for your time.
09-27-2016
01:41 PM
6 Kudos
I've developed a NiFi flow prototype for data ingestion into HDFS. Now I would like to improve the overall performance, but it seems I cannot really move forward.
The flow takes CSV files as input (each row has 80 fields), splits them at row level, applies some transformations to the fields (using 4 custom processors executed sequentially), buffers the new rows into CSV files, and outputs them to HDFS. I've developed the processors in such a way that the content of the flowfile is accessed only once, when each individual record is read and its fields are moved to flowfile attributes.
Tests have been performed on an Amazon EC2 m4.4xlarge instance (16-core CPU, 64 GB RAM). This is what I have tried so far:
- Moved the flowfile repository and the content repository to different SSD drives
- Moved the provenance repository to memory (NiFi could not keep up with the event rate)
- Configured the system according to the configuration best practices
- Tried assigning multiple threads to each of the processors in order to reach different numbers of total threads
- Tried increasing nifi.queue.swap.threshold and setting backpressure to never reach the swap limit
- Tried different JVM memory settings from 8 up to 32 GB (in combination with the G1 GC)
- Tried increasing the instance specifications; nothing changes
From the monitoring I've performed, it looks like the disks are not the bottleneck (they are basically idle a great part of the time, showing that the computation is actually being performed in memory) and the average CPU load is below 60%.
The most I can get is 215k rows/minute, which is 3.5k rows/second. In terms of size, it's just 4.7 MB/s. I am aiming for something definitely greater than this.
Just as a comparison, I created a flow that reads a file, splits it into rows, merges them together in blocks, and outputs them to disk. Here I get 12k rows/second, or 17 MB/s. That doesn't look surprisingly fast either, and it makes me think that I am probably doing something wrong.
Does anyone have suggestions on how to improve the performance? How much would I benefit from running NiFi on a cluster instead of growing the instance specs? Thank you all
Labels:
- Apache NiFi