
select count(*) fails with tez over cassandra

Rising Star

Hello,

I have a table in Cassandra, and I use the hive-cassandra driver to run selects over it. This is the table:

CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)

And I have only 3 partitions


When I query my table using Hive like this:

hive -e "select count(*) from table1;"

I get this error:

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, taskId=task_1474275943985_0179_1_00_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
   at org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
   at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
   at org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
   at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
   at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
   at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
   at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
   ... 14 more


So far I understand that in readFields we are getting more data than we are expecting. But considering the size of the table, I don't think the data is the problem.
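
A quick arithmetic check on the two numbers in the exception may help here. The "actual length" sits just below the largest signed 64-bit integer (Java's Long.MAX_VALUE), which would be consistent with a placeholder or sentinel length rather than a real data volume. That reading is an assumption, not a confirmed diagnosis, and the variable names below are only illustrative.

```python
# Values copied from the exception message above.
expected_length = 12416
actual_length = 9223372036854775711

# Largest signed 64-bit integer, i.e. the value of Java's Long.MAX_VALUE.
LONG_MAX = 2**63 - 1

# How far the reported "actual length" is from the 64-bit maximum.
gap = LONG_MAX - actual_length
print("Long.MAX_VALUE       :", LONG_MAX)
print("actual length        :", actual_length)
print("distance from maximum:", gap)  # 96

# Assumption: a length this close to the maximum is not a plausible
# byte count for a table with only 3 partitions.
print("plausible data size? :", actual_length < 2**40)
```

If the check holds, the mismatch is more likely about how the split length is reported than about the contents of the table.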

@Constantin Stanca has been helping me try to find the problem; I am relaunching the subject 🙂

Another thing to add: select * works perfectly fine with Tez 🙂. With the mr engine, both select count(*) and select * also work fine.
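
Since the MapReduce engine is reported to work, one possible stopgap is to run the failing query with that engine selected for a single invocation. The sketch below only builds the command line; `hive.execution.engine` is a standard Hive setting, while the helper name `hive_command` is hypothetical and not part of any library.

```python
import shlex


def hive_command(query, engine="mr"):
    """Build the argv for running one query with a chosen execution engine.

    `hive_command` is a hypothetical helper; it only assembles arguments
    and does not run anything.
    """
    return [
        "hive",
        "--hiveconf", f"hive.execution.engine={engine}",
        "-e", query,
    ]


argv = hive_command("select count(*) from table1;")

# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in argv))
```

The list form can be passed to `subprocess.run(argv)` on a machine where the `hive` CLI is installed.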

We are using Hortonworks version 2.3.2.

1 ACCEPTED SOLUTION

Super Guru

@jean rivera

I think that I finally found the reason: https://issues.apache.org/jira/browse/HIVE-14857?jql=text%20~%20%22select%20count%22

Probably the ticket you filed is a duplicate.

I know that it is not fixing your issue now, but if you find the response helpful, please vote/accept best answer.


4 REPLIES

Contributor

Looks like a bug in either Hive or Tez. Would you mind filing a JIRA with Apache and uploading the logs there?

Rising Star

Yes, it was me who created the ticket.