Created on 11-10-2016 03:05 PM - last edited on 11-11-2016 06:34 AM by cjervis
Hi guys
Yesterday I completed upgrading our 3 node dev cluster from 5.8.0 to 5.9.0 CDH parcels using CM. Impala is now at 2.7.0, which is cool.
I am very thankful as ever to the Cloudera team for striving to keep the starving developer version alive for data wranglers like me :-)
Some quick query timings: Impala versus Hive on MapReduce versus Hive on Spark.
Not sure why Impala is slower than Hive on MapReduce :-(
Machine 1 = NN + DN1
Machine 2 = DN2
Machine 3 = DN3
Each Machine = 8 cores 32GB RAM
CDH 5.9.0
Impala 2.7.0
impala-shell -q "select last_name, first_name from cdr.cdr_mjp_people where lower(last_name) like '%subramanian%'"
Fetched 1281 row(s) in 256.58s
hive -e "set hive.execution.engine=mr;select last_name, first_name from cdr.cdr_mjp_people where lower(last_name) like '%subramanian%'"
Time taken: 181.024 seconds, Fetched: 1281 row(s)
hive -e "set hive.execution.engine=spark;select last_name, first_name from cdr.cdr_mjp_people where lower(last_name) like '%subramanian%'"
Time taken: 360.214 seconds, Fetched: 1281 row(s)
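To put the three timings on a common scale, here is a minimal sketch that divides each by the Hive on MapReduce time. The numbers are copied from the outputs above; the `ratio` helper is a hypothetical name made up for this example and is not part of any Impala or Hive tooling.

```shell
# Timings in seconds, copied from the query outputs above.
impala=256.58
hive_mr=181.024
hive_spark=360.214

# ratio A B prints A/B rounded to two decimals.
# Hypothetical helper for illustration only.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'
}

echo "Impala vs Hive on MR:        $(ratio "$impala" "$hive_mr")x"
echo "Hive on Spark vs Hive on MR: $(ratio "$hive_spark" "$hive_mr")x"
```

On these single runs, Impala took about 1.42x and Hive on Spark about 1.99x the Hive on MapReduce time. Each query was run only once, so the ratios are a rough indication rather than a benchmark.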
Thanks
Warmly
sanjay
Created 01-24-2018 05:53 AM
Could you please help me with the steps? If you have any documentation, please let me know.
Thanks,