Created 07-02-2019 12:00 AM
I'm trying to go through the tutorials to load the trucks and geolocation data, and then use Hive to create tables and run queries on them. When I first tried this with the Docker container-based approach, I could get things to run, but the queries against the derived tables would take a very long time and usually time out or fail.
So then, thinking that my development laptop was the bottleneck, I spun up a CloudFormation instance of Cloudbreak and tried the same steps. Unfortunately, even the Cloudbreak instance running in AWS had unacceptable query performance. I read about tuning, but I shouldn't have to do any tuning just to run the tutorial.
I feel like I'm missing something. Any advice on what to do to make the queries run better? I'm back to running on my local dev environment with the Docker instance.