Created on 08-08-2018 12:07 AM - edited 09-16-2022 06:34 AM
I'm using Hive(with Yarn) that is installed by CDH-5.14.2-1, and made a database which keeps purchase history. One table which has purchase history has 1,000,000,000 tuples.
I tried the following query to measure Hive's performance.
SELECT c.gender,
g.NAME,
i.NAME,
Sum(b.num)
FROM customers c
JOIN boughts_bil b
ON ( c.id = b.cus_id
AND b.id < $var )
JOIN items i
ON ( i.id = b.item_id )
JOIN genres g
ON ( g.id = i.gen_id )
GROUP BY c.gender,
g.NAME,
i.NAME;
Incidentally, since I want to try with no optimization, I made no partitions.
When I set "$var=30,000,000", the error "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exe" has occurred. In reality, I use the same query and that time it worked fine.
Cloudera's plan was Express when it was going well, but now the plan became Enterprise-only. Is it cause?
Or are there different reasons for example out of memory error.
Please give your wisdom.
Thanks.
addition
I checked HistoryServer and write like below
Diagnostics:
Application failed due to failed ApplicationMaster.
Only partial information is available; some values may be inaccurate.
I'll check the table value.