Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive Difference between CLI commands

Solved Go to solution
Highlighted

Hive Difference between CLI commands

New Contributor

Hi there!

I'm new using hortonworks sandbox 2.5 and i wonder why HIVE shows different behavior:

hive> select * from tweets limit 2;
OK

then shows fetched rows
hive> select count(*) from tweets;
Query ID = hdfs_20161205070716_8b694554-f4fe-4aea-b4c9-507d2fc343e0Total 
jobs = 1Launching Job 1 out of 1

then shows 
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1                 FAILED      4          0        0        4      13       0
Reducer 2             KILLED      1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02  [>>--------------------------] 0%    ELAPSED TIME: 261.46 s   
--------------------------------------------------------------------------------


it is because the second query requires more capability?

Regards

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Hive Difference between CLI commands

@Cristian Vasquez

The first query is limiting the number of rows returned to 2. That is a quick and easy query to run. The second query is doing a count which requires processing on all of the rows of data.

  • How much memory have you given your Sandbox?
  • How many rows are there in your tweets table?
  • Look at the error messages for the job to see why the tasks failed.
7 REPLIES 7

Re: Hive Difference between CLI commands

Contributor

Simply put, the first query only hits the metastore database and doesn't launch a map reduce job.

On the other hand, the second query runs a map side mapreduce job

EDIT:

Interestingly enough, for the first query, hive makes some good decisions on how to read the data. A simple select * could essentially simply be fetching a file from hdfs like an hdfs get... simplified, but true.

Re: Hive Difference between CLI commands

Contributor

Ahh.. actually misread your query... thought you were simply reading off a schema... @Michael Young explains it better :)

Re: Hive Difference between CLI commands

New Contributor

Hey Ed. Thanks for answering.

I'm trying to make contact with Michael, lol.

As you can see above, my select count(*) is failling.

Re: Hive Difference between CLI commands

@Cristian Vasquez

The first query is limiting the number of rows returned to 2. That is a quick and easy query to run. The second query is doing a count which requires processing on all of the rows of data.

  • How much memory have you given your Sandbox?
  • How many rows are there in your tweets table?
  • Look at the error messages for the job to see why the tasks failed.

Re: Hive Difference between CLI commands

New Contributor

Dead Michael, as you say, i'd love knowing how many rows does Tweets table has.

That's why i'm trying to execute a select count(*)

Regards

Re: Hive Difference between CLI commands

Were you able to look at the logs to see why the query is failing? I expect you should see messages related to Java HEAP space. Without the logs, it's hard to say what the problem is.

Re: Hive Difference between CLI commands

New Contributor

Hi Michael, I'm still trying to figure out where to find that log (is there any folder?). I've tried finding the Query ID that Hive shows once i execute a query, but couldn't find it. Query ID = root_20161205190741_fb2a555d-1633-404d-9128-0c3696d2d56a

Until now, i've just found the following exception (through CLI) after execution fails:

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1480931794353_0004_1_00, diagnostics=[Task failed, taskId=task_1480931794353_0004_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space

Any suggestion?

Regards and thank you Michael for helping me

10198-hive-config.png

Don't have an account?
Coming from Hortonworks? Activate your account here