Support Questions

Find answers, ask questions, and share your expertise

Hive Difference between CLI commands

avatar
Contributor

Hi there!

I'm new using hortonworks sandbox 2.5 and i wonder why HIVE shows different behavior:

hive> select * from tweets limit 2;
OK

then shows fetched rows
hive> select count(*) from tweets;
Query ID = hdfs_20161205070716_8b694554-f4fe-4aea-b4c9-507d2fc343e0Total 
jobs = 1Launching Job 1 out of 1

then shows 
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 FAILED      4          0        0        4      13       0
Reducer 2             KILLED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 261.46 s   
--------------------------------------------------------------------------------


it is because the second query requires more capability?

Regards

1 ACCEPTED SOLUTION

avatar
Super Guru
@Cristian Vasquez

The first query is limiting the number of rows returned to 2. That is a quick and easy query to run. The second query is doing a count which requires processing on all of the rows of data.

  • How much memory have you given your Sandbox?
  • How many rows are there in your tweets table?
  • Look at the error messages for the job to see why the tasks failed.

View solution in original post

7 REPLIES 7

avatar
Rising Star

Simply put, the first query only hits the metastore database and doesn't launch a map reduce job.

On the other hand, the second query runs a map side mapreduce job

EDIT:

Interestingly enough, for the first query, hive makes some good decisions on how to read the data. A simple select * could essentially simply be fetching a file from hdfs like an hdfs get... simplified, but true.

avatar
Rising Star

Ahh.. actually misread your query... thought you were simply reading off a schema... @Michael Young explains it better 🙂

avatar
Contributor

Hey Ed. Thanks for answering.

I'm trying to make contact with Michael, lol.

As you can see above, my select count(*) is failling.

avatar
Super Guru
@Cristian Vasquez

The first query is limiting the number of rows returned to 2. That is a quick and easy query to run. The second query is doing a count which requires processing on all of the rows of data.

  • How much memory have you given your Sandbox?
  • How many rows are there in your tweets table?
  • Look at the error messages for the job to see why the tasks failed.

avatar
Contributor

Dead Michael, as you say, i'd love knowing how many rows does Tweets table has.

That's why i'm trying to execute a select count(*)

Regards

avatar
Super Guru

Were you able to look at the logs to see why the query is failing? I expect you should see messages related to Java HEAP space. Without the logs, it's hard to say what the problem is.

avatar
Contributor

Hi Michael, I'm still trying to figure out where to find that log (is there any folder?). I've tried finding the Query ID that Hive shows once i execute a query, but couldn't find it. Query ID = root_20161205190741_fb2a555d-1633-404d-9128-0c3696d2d56a

Until now, i've just found the following exception (through CLI) after execution fails:

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1480931794353_0004_1_00, diagnostics=[Task failed, taskId=task_1480931794353_0004_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space

Any suggestion?

Regards and thank you Michael for helping me

10198-hive-config.png