I have a 3-node CDH 5.6.0 development cluster (managed by Cloudera Manager). This time I configured MySQL on AWS RDS as the Hive Metastore database.
1 NN + 1 DN
Each node has 32 GB RAM and 2 x 2 TB 7200 rpm disks.
As always, I have tuned the memory parameters in YARN. A simple SELECT query with one WHERE clause on a single Hive table with 22 million records takes 15 minutes!
Could the AWS RDS Hive Metastore be slowing the Hive query down?
As a comparison, I wrote Java MR code that does exactly what the query does, and it ran in 1m 30s! So something does not seem right with Hive.