Support Questions

cristianvlean · ‎12-07-2016

Hi there!

I'm new using hortonworks sandbox 2.5 and i wonder why HIVE shows different behavior:

hive> select * from tweets limit 2;
OK

then shows fetched rows

hive> select count(*) from tweets;
Query ID = hdfs_20161205070716_8b694554-f4fe-4aea-b4c9-507d2fc343e0Total 
jobs = 1Launching Job 1 out of 1

then shows 
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 FAILED      4          0        0        4      13       0
Reducer 2             KILLED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 261.46 s   
--------------------------------------------------------------------------------

it is because the second query requires more capability?

Regards

myoung · ‎12-07-2016

@Cristian Vasquez

The first query is limiting the number of rows returned to 2. That is a quick and easy query to run. The second query is doing a count which requires processing on all of the rows of data.

How much memory have you given your Sandbox?
How many rows are there in your tweets table?
Look at the error messages for the job to see why the tasks failed.

View solution in original post

ed_gleeck · ‎12-07-2016

Simply put, the first query only hits the metastore database and doesn't launch a map reduce job.

On the other hand, the second query runs a map side mapreduce job

EDIT:

Interestingly enough, for the first query, hive makes some good decisions on how to read the data. A simple select * could essentially simply be fetching a file from hdfs like an hdfs get... simplified, but true.

ed_gleeck · ‎12-07-2016

Ahh.. actually misread your query... thought you were simply reading off a schema... @Michael Young explains it better 🙂

cristianvlean · ‎12-07-2016

Hey Ed. Thanks for answering.

I'm trying to make contact with Michael, lol.

As you can see above, my select count(*) is failling.

myoung · ‎12-07-2016

@Cristian Vasquez

The first query is limiting the number of rows returned to 2. That is a quick and easy query to run. The second query is doing a count which requires processing on all of the rows of data.

How much memory have you given your Sandbox?
How many rows are there in your tweets table?
Look at the error messages for the job to see why the tasks failed.

cristianvlean · ‎12-07-2016

Dead Michael, as you say, i'd love knowing how many rows does Tweets table has.

That's why i'm trying to execute a select count(*)

Regards

myoung · ‎12-07-2016

Were you able to look at the logs to see why the query is failing? I expect you should see messages related to Java HEAP space. Without the logs, it's hard to say what the problem is.

cristianvlean · ‎12-09-2016

Hi Michael, I'm still trying to figure out where to find that log (is there any folder?). I've tried finding the Query ID that Hive shows once i execute a query, but couldn't find it. Query ID = root_20161205190741_fb2a555d-1633-404d-9128-0c3696d2d56a

Until now, i've just found the following exception (through CLI) after execution fails:

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1480931794353_0004_1_00, diagnostics=[Task failed, taskId=task_1480931794353_0004_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space

Any suggestion?

Regards and thank you Michael for helping me

Cloudera Community

Support Questions

Hive Difference between CLI commands