Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive long time running "select * from categories", Impala works ok

avatar
Explorer

Hello, I have a problem with Hive and I don't know how to fix it.

 

With Impala I can do a select * from categories (from "Get Started" tutorial) with no problems, but I can't do it with Hive.

 

With Hive, first I tried show tables command correctly. It shows the log and all the tables in Results output. This is the log:

 

 

INFO  : Compiling command(queryId=hive_20180212042222_7e9c76e8-1e88-43f7-92ac-f5cecb1d224d): show tables
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180212042222_7e9c76e8-1e88-43f7-92ac-f5cecb1d224d); Time taken: 0.011 seconds
INFO  : Executing command(queryId=hive_20180212042222_7e9c76e8-1e88-43f7-92ac-f5cecb1d224d): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20180212042222_7e9c76e8-1e88-43f7-92ac-f5cecb1d224d); Time taken: 0.024 seconds
INFO  : OK

 

When trying to do a select * from categories the query keeps running with no end. In a few seconds it shows the log:

 

INFO  : Compiling command(queryId=hive_20180212042727_e0dd0e0d-b229-4824-bb23-b46fd4d87d15): select * from categories
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:categories.category_id, type:int, comment:null), FieldSchema(name:categories.category_department_id, type:int, comment:null), FieldSchema(name:categories.category_name, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180212042727_e0dd0e0d-b229-4824-bb23-b46fd4d87d15); Time taken: 0.087 seconds

But doesn't show any data from the table in Results output. It keeps running the query (in Query history it says Query running in the yellow icon) and also the time of execution keeps increasing, with no end (or maybe till 2-3h).

 

 

I restarted all the services multiple times with no luck. Even I set the VM's RAM to 12GB.

 

Can you give me a hand in order to fix the problem?

 

Thanks in advance

 

2 REPLIES 2

avatar
Explorer

I really don't understand what's going on. In case I broke some config, I have reset the VM.

 

Anyone can test this SQL sentence with Hive in Cloudera Quick Start VM 5.12 (VirtualBox format)?

 

select * from customers;

 

 

These are the steps I just have done:

  1. Download Cloudera Quick Start VM 5.12 from here (VirtualBox format): https://www.cloudera.com/downloads/quickstart_vms/5-12.html
  2. Configure the VM in order to have 2 CPU processes and 12GB of RAM (required to launch Cloudera Express).
  3. Launch the VM.
  4. Launch Cloudera Express with no problems.
  5. In Cloudera Manager I launch with no problem: HDFS, Hive, HUE, Impala and YARN, but they have some warnings. Hive says "1 is less than the recommended minimum of 4." regarding Spark Executor Cores in Hive and HiveServer2.
  6. In Linux terminal I launch the sqoop import-all-tables indicated in Tutorial Exercise 1.
  7. In Impala I execute invalidate metadata; and select * from customers; with no problems, and returning all the records of the table in less than 1 second.
  8. In Hive I execute select * from customers; and the query keeps running with no end (more than 30 min at least).

 

In Hive's log ("Show Logs" icon in Hue) says:

 

INFO : Compiling command(queryId=hive_20180215091515_596862e0-3749-4b30-aa1c-4bdd4f7c76a8): select * from customers
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:customers.customer_id, type:int, comment:null), FieldSchema(name:customers.customer_fname, type:string, comment:null), FieldSchema(name:customers.customer_lname, type:string, comment:null), FieldSchema(name:customers.customer_email, type:string, comment:null), FieldSchema(name:customers.customer_password, type:string, comment:null), FieldSchema(name:customers.customer_street, type:string, comment:null), FieldSchema(name:customers.customer_city, type:string, comment:null), FieldSchema(name:customers.customer_state, type:string, comment:null), FieldSchema(name:customers.customer_zipcode, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20180215091515_596862e0-3749-4b30-aa1c-4bdd4f7c76a8); Time taken: 0.05 seconds

 

 

So, what's happening? All seems ok but Hive doesn't return any record.

 

 

Maybe this info can help to identify the problem:

 

The other day I executed the query It took 3:30h and gave the next error:

 

Error while processing statement: FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time

 

 

Now, when I execute the query "SHOW LOCKS customers;" it returns "Done. 0 results." but it takes 57 seconds. Is this normal? Maybe there is some performance problem with Hive?

 

I tested also to change the Spark Executor Cores in Hive from 1 to 4 (in order to avoid the warning), to redeploy and restart Hive, Impala and Hue... but it didn't fix the problem.

 

Can anyone help me with this problem?

 

At least saying something like "no one use VM in Virtualbox format, use Docker/VMWare/KVM instead" or "I have no problem with Hive using Cloudera Quick Start VM 5.12 with VirtualBox format".

 

Thanks in advance!

 

avatar
Explorer

I think I fixed the problem, I don't know exactly how.

 

The thing is, I really have a problem with Hive's performance, the querys are running too slow

 

Show tables; takes 64-67 seconds the first time I execute the sentence with Hive just started. The next times it takes less than a second. But, trying to show 16k lines from customers takes too much and when running for 3:30h it gives the lock error (it's really a timeout, there are no locks).

 

I have tried restarting the services according to Showing Big Data Value (previous to exercise 2): HDFS, Hive, Hue and Impala, in the correct order with no luck.

 

But as last option I tried restarting all Cloudera Quickstart services and now Hive works OK!

 

Maybe YARN, Zookeeper, Ozzie, or some other service is needed in order to Hive gain performance?

 

Well, at least now I can finish exercise 2... I hope Hive doesn't bring me more problems.

 

Regards