I'm using Hive(with Yarn) that is installed by CDH-5.14.2-1, and made a database which keeps purchase history. One table which has purchase history has 1,000,000,000 tuples.
I tried the following query to measure Hive's performance.
FROM customers c
JOIN boughts_bil b
ON ( c.id = b.cus_id
AND b.id < $var )
JOIN items i
ON ( i.id = b.item_id )
JOIN genres g
ON ( g.id = i.gen_id )
GROUP BY c.gender,
Incidentally, since I want to try with no optimization, I made no partitions.
When I set "$var=30,000,000", the error "Execution Error, return code 2 from org.apache.hadoop.hive.ql.exe" has occurred. In reality, I use the same query and that time it worked fine.
Cloudera's plan was Express when it was going well, but now the plan became Enterprise-only. Is it cause?
Or are there different reasons for example out of memory error.
Please give your wisdom.
I checked HistoryServer and write like below
Application failed due to failed ApplicationMaster.
Only partial information is available; some values may be inaccurate.
I'll check the table value.