Created on 02-11-2018 07:44 PM - edited 09-16-2022 05:51 AM
Hello, I have a problem with Hive and I don't know how to fix it.
With Impala I can do a select * from categories (from "Get Started" tutorial) with no problems, but I can't do it with Hive.
With Hive, first I tried show tables command correctly. It shows the log and all the tables in Results output. This is the log:
INFO : Compiling command(queryId=hive_20180212042222_7e9c76e8-1e88-43f7-92ac-f5cecb1d224d): show tables INFO : Semantic Analysis Completed INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) INFO : Completed compiling command(queryId=hive_20180212042222_7e9c76e8-1e88-43f7-92ac-f5cecb1d224d); Time taken: 0.011 seconds INFO : Executing command(queryId=hive_20180212042222_7e9c76e8-1e88-43f7-92ac-f5cecb1d224d): show tables INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing command(queryId=hive_20180212042222_7e9c76e8-1e88-43f7-92ac-f5cecb1d224d); Time taken: 0.024 seconds INFO : OK
When trying to do a select * from categories the query keeps running with no end. In a few seconds it shows the log:
INFO : Compiling command(queryId=hive_20180212042727_e0dd0e0d-b229-4824-bb23-b46fd4d87d15): select * from categories INFO : Semantic Analysis Completed INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:categories.category_id, type:int, comment:null), FieldSchema(name:categories.category_department_id, type:int, comment:null), FieldSchema(name:categories.category_name, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20180212042727_e0dd0e0d-b229-4824-bb23-b46fd4d87d15); Time taken: 0.087 seconds
But doesn't show any data from the table in Results output. It keeps running the query (in Query history it says Query running in the yellow icon) and also the time of execution keeps increasing, with no end (or maybe till 2-3h).
I restarted all the services multiple times with no luck. Even I set the VM's RAM to 12GB.
Can you give me a hand in order to fix the problem?
Thanks in advance
Created 02-15-2018 09:50 AM
I really don't understand what's going on. In case I broke some config, I have reset the VM.
Anyone can test this SQL sentence with Hive in Cloudera Quick Start VM 5.12 (VirtualBox format)?
select * from customers;
These are the steps I just have done:
In Hive's log ("Show Logs" icon in Hue) says:
INFO : Compiling command(queryId=hive_20180215091515_596862e0-3749-4b30-aa1c-4bdd4f7c76a8): select * from customers
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:customers.customer_id, type:int, comment:null), FieldSchema(name:customers.customer_fname, type:string, comment:null), FieldSchema(name:customers.customer_lname, type:string, comment:null), FieldSchema(name:customers.customer_email, type:string, comment:null), FieldSchema(name:customers.customer_password, type:string, comment:null), FieldSchema(name:customers.customer_street, type:string, comment:null), FieldSchema(name:customers.customer_city, type:string, comment:null), FieldSchema(name:customers.customer_state, type:string, comment:null), FieldSchema(name:customers.customer_zipcode, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20180215091515_596862e0-3749-4b30-aa1c-4bdd4f7c76a8); Time taken: 0.05 seconds
So, what's happening? All seems ok but Hive doesn't return any record.
Maybe this info can help to identify the problem:
The other day I executed the query It took 3:30h and gave the next error:
Error while processing statement: FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time
Now, when I execute the query "SHOW LOCKS customers;" it returns "Done. 0 results." but it takes 57 seconds. Is this normal? Maybe there is some performance problem with Hive?
I tested also to change the Spark Executor Cores in Hive from 1 to 4 (in order to avoid the warning), to redeploy and restart Hive, Impala and Hue... but it didn't fix the problem.
Can anyone help me with this problem?
At least saying something like "no one use VM in Virtualbox format, use Docker/VMWare/KVM instead" or "I have no problem with Hive using Cloudera Quick Start VM 5.12 with VirtualBox format".
Thanks in advance!
Created 02-15-2018 07:14 PM
I think I fixed the problem, I don't know exactly how.
The thing is, I really have a problem with Hive's performance, the querys are running too slow.
Show tables; takes 64-67 seconds the first time I execute the sentence with Hive just started. The next times it takes less than a second. But, trying to show 16k lines from customers takes too much and when running for 3:30h it gives the lock error (it's really a timeout, there are no locks).
I have tried restarting the services according to Showing Big Data Value (previous to exercise 2): HDFS, Hive, Hue and Impala, in the correct order with no luck.
But as last option I tried restarting all Cloudera Quickstart services and now Hive works OK!
Maybe YARN, Zookeeper, Ozzie, or some other service is needed in order to Hive gain performance?
Well, at least now I can finish exercise 2... I hope Hive doesn't bring me more problems.
Regards