- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Hive Difference between CLI commands
- Labels:
-
Apache Hive
Created 12-07-2016 01:29 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi there!
I'm new using hortonworks sandbox 2.5 and i wonder why HIVE shows different behavior:
hive> select * from tweets limit 2; OK then shows fetched rows
hive> select count(*) from tweets; Query ID = hdfs_20161205070716_8b694554-f4fe-4aea-b4c9-507d2fc343e0Total jobs = 1Launching Job 1 out of 1 then shows -------------------------------------------------------------------------------- VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -------------------------------------------------------------------------------- Map 1 FAILED 4 0 0 4 13 0 Reducer 2 KILLED 1 0 0 1 0 0 -------------------------------------------------------------------------------- VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 261.46 s --------------------------------------------------------------------------------
it is because the second query requires more capability?
Regards
Created 12-07-2016 01:32 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The first query is limiting the number of rows returned to 2. That is a quick and easy query to run. The second query is doing a count which requires processing on all of the rows of data.
- How much memory have you given your Sandbox?
- How many rows are there in your tweets table?
- Look at the error messages for the job to see why the tasks failed.
Created 12-07-2016 01:31 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Simply put, the first query only hits the metastore database and doesn't launch a map reduce job.
On the other hand, the second query runs a map side mapreduce job
EDIT:
Interestingly enough, for the first query, hive makes some good decisions on how to read the data. A simple select * could essentially simply be fetching a file from hdfs like an hdfs get... simplified, but true.
Created 12-07-2016 01:34 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ahh.. actually misread your query... thought you were simply reading off a schema... @Michael Young explains it better 🙂
Created 12-07-2016 02:12 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hey Ed. Thanks for answering.
I'm trying to make contact with Michael, lol.
As you can see above, my select count(*) is failling.
Created 12-07-2016 01:32 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The first query is limiting the number of rows returned to 2. That is a quick and easy query to run. The second query is doing a count which requires processing on all of the rows of data.
- How much memory have you given your Sandbox?
- How many rows are there in your tweets table?
- Look at the error messages for the job to see why the tasks failed.
Created 12-07-2016 02:06 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dead Michael, as you say, i'd love knowing how many rows does Tweets table has.
That's why i'm trying to execute a select count(*)
Regards
Created 12-07-2016 05:50 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Were you able to look at the logs to see why the query is failing? I expect you should see messages related to Java HEAP space. Without the logs, it's hard to say what the problem is.
Created on 12-09-2016 03:27 PM - edited 08-19-2019 01:18 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Michael, I'm still trying to figure out where to find that log (is there any folder?). I've tried finding the Query ID that Hive shows once i execute a query, but couldn't find it. Query ID = root_20161205190741_fb2a555d-1633-404d-9128-0c3696d2d56a
Until now, i've just found the following exception (through CLI) after execution fails:
Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1480931794353_0004_1_00, diagnostics=[Task failed, taskId=task_1480931794353_0004_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
Any suggestion?
Regards and thank you Michael for helping me