Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

select count(*) fails with tez over cassandra

Rising Star

Hello,

I have a table in Cassandra, and I use the hive-cassandra driver to run selects over it. This is the table:

CREATE TABLE table1 ( campaign_id text, sid text, name text, ts timestamp, PRIMARY KEY (campaign_id, sid) ) WITH CLUSTERING ORDER BY (sid ASC)

It has only 3 partitions:

[image: 7934-qxqo5.png]

When I query my table using Hive like this:

hive -e "select count(*) from table1;"

I get this error:

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed, taskId=task_1474275943985_0179_1_00_000001, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
   at org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
   at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
   at org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
   at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
   at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
   at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
   at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
   ... 14 more


So far, what I understand is that in readFields we are getting more data than we are expecting. But considering the size of the table, I don't think the data is the problem.
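To put the two numbers from the error message side by side, here is a small check in Python. The constant and function names are only illustrative. It shows nothing more than how close the reported "actual length" is to the maximum signed 64-bit value (Long.MAX_VALUE in Java). Reading that as a placeholder or overflowed length rather than a real data size is an assumption, not something confirmed in this thread.

```python
# Values copied from the error message in the stack trace above.
EXPECTED_LENGTH = 12416
ACTUAL_LENGTH = 9223372036854775711

# Largest value of a signed 64-bit integer (Java's Long.MAX_VALUE).
LONG_MAX = 2**63 - 1


def distance_from_long_max(value: int) -> int:
    """Return how far a value sits below the signed 64-bit maximum."""
    return LONG_MAX - value


if __name__ == "__main__":
    print("Long.MAX_VALUE        :", LONG_MAX)
    print("actual length         :", ACTUAL_LENGTH)
    print("gap to Long.MAX_VALUE :", distance_from_long_max(ACTUAL_LENGTH))
```

The gap comes out as 96, so the "actual length" is almost exactly Long.MAX_VALUE, which would fit with the size of the table not being the problem.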

@Constantin Stanca has been helping me try to find the problem, so I am relaunching the subject here 🙂

Another thing to add: select * works perfectly fine with Tez 🙂. Using the mr engine, both select count(*) and select * also work fine.

We are using Hortonworks version 2.3.2.

1 ACCEPTED SOLUTION

Super Guru

@jean rivera

I think that I finally found the reason: https://issues.apache.org/jira/browse/HIVE-14857?jql=text%20~%20%22select%20count%22

Probably the ticket you filed is a duplicate.

I know that this does not fix your issue right now, but if you find the response helpful, please vote/accept best answer.


4 REPLIES

New Member

Looks like a bug in either Hive or Tez. Would you mind filing a JIRA with Apache and uploading the logs there?

Rising Star

Yes, it was me who created the ticket.