I'm following this tutorial: https://www.cloudera.com/tutorials/getting-started-with-hdp-sandbox/3.html
I've installed HDP Sandbox 3.0.1 using Docker. Docker is running on a machine with 8 cores and 32 GB of RAM.
One of the queries in the tutorial is as follows:
SELECT truckid, avg(mpg) avgmpg FROM truckmileage GROUP BY truckid;
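As a sanity check on the query itself, here is a minimal sketch that runs the same aggregation against an in-memory SQLite table instead of Hive. The sample rows and the `average_mpg_per_truck` helper are made up for illustration; only the table and column names come from the tutorial query. An `ORDER BY` is added purely to make the output order deterministic. SQLite is not Hive, so this says nothing about Hive's execution time; it only shows what result the query should return.

```python
import sqlite3

# Hypothetical sample rows (truckid, mpg); NOT the tutorial's real data.
SAMPLE_ROWS = [
    ("A1", 4.0),
    ("A1", 6.0),
    ("A2", 5.0),
    ("A2", 5.5),
    ("A2", 6.0),
]


def average_mpg_per_truck(rows):
    """Run the tutorial's aggregation on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        # Table and column names mirror the tutorial query.
        conn.execute("CREATE TABLE truckmileage (truckid TEXT, mpg REAL)")
        conn.executemany("INSERT INTO truckmileage VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT truckid, avg(mpg) avgmpg FROM truckmileage "
            "GROUP BY truckid ORDER BY truckid"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for truckid, avgmpg in average_mpg_per_truck(SAMPLE_ROWS):
        print(truckid, avgmpg)
```

The aggregation is a single pass over the table with one group per truck, which is why a runtime of an hour on a small dataset points at the sandbox configuration rather than at the query.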
After an hour, this query has still not completed. The datasets are not very big, and performance this poor makes the product basically unusable. How can I fix this?
Did you ever find an answer to this? I'm stuck on this query too. I found that increasing many tuning parameters to their recommended values made the steps prior to this one workable, but this query is just impossible to run.
My resources: VMware Workstation Player with 8 cores, 56 GB RAM, and NVMe SSD storage; the physical host has 2x 8-core Sandy Bridge Xeon processors. I can't imagine what people trying to run on the default configuration are seeing. For me, that was completely unusable and unstable.