What's New @ Cloudera

Find the latest Cloudera product news

Cloudera Data Warehouse is Now More Cost Effective for BI Queries

Cloudera Employee
The Impala Virtual Warehouses within Cloudera Data Warehouse (CDW) give users an intelligent way to run fast BI queries. Coordinators are the nodes that BI tools, such as HUE or Tableau, connect to. Each Virtual Warehouse runs one Coordinator, or two if you choose HA mode. Previously, these nodes ran continuously so that BI tool connections would not break and new queries could always be accepted. Now, CDW can terminate Coordinators automatically when all queries stop and restart them once new queries arrive, all without interfering with BI tool connections.
 
Since Impala Executor nodes could already auto-suspend, all of the largest (i.e., most expensive) nodes used by the Impala Virtual Warehouse now support auto-suspend, which can dramatically reduce your CDW cost.
 
This capability is currently released as a Tech Preview feature, so please request the Impala Coordinator Auto-Suspend entitlement from your Cloudera account team if you want to try it out. Once the entitlement is granted, you will have the option to select Allow Shutdown Of Coordinator when creating your Impala Virtual Warehouse. You can then set the Trigger Shutdown Delay, which is how long to keep the Coordinator(s) running after queries have stopped arriving. At runtime, once this idle time has passed, CDW automatically stops the Coordinator(s). However, a very lightweight proxy service keeps running on an existing housekeeping node and listens for incoming query requests. When this proxy receives a query, it automatically starts the Coordinator(s) again so they can do their job of query planning, cost-based optimization, and orchestrating execution by the Executor nodes.
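To make the idle-then-wake behavior concrete, here is a minimal Python sketch of the logic described above. It is a toy simulation, not CDW code: the class names `CoordinatorPool` and `WakeOnQueryProxy`, their methods, and the explicit `now` timestamps are all hypothetical, and `shutdown_delay_s` simply plays the role of the Trigger Shutdown Delay setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoordinatorPool:
    """Hypothetical stand-in for the Coordinator(s) of one Virtual Warehouse."""
    running: bool = True
    starts: int = 0  # how many times the pool was woken from a suspended state

    def start(self) -> None:
        if not self.running:
            self.running = True
            self.starts += 1

    def stop(self) -> None:
        self.running = False


@dataclass
class WakeOnQueryProxy:
    """Hypothetical lightweight proxy that stays up while Coordinators are suspended."""
    pool: CoordinatorPool
    shutdown_delay_s: float               # assumed analogue of "Trigger Shutdown Delay"
    active_queries: int = 0
    idle_since: Optional[float] = 0.0     # when the last query finished; None while busy

    def submit(self) -> None:
        """A query arrives: wake the Coordinators if needed, then count it as active."""
        self.pool.start()
        self.active_queries += 1
        self.idle_since = None

    def finish(self, now: float) -> None:
        """A query completes; start the idle clock once nothing is left running."""
        self.active_queries -= 1
        if self.active_queries == 0:
            self.idle_since = now

    def tick(self, now: float) -> None:
        """Periodic check: suspend the Coordinators once the idle delay has elapsed."""
        if (self.pool.running and self.idle_since is not None
                and now - self.idle_since >= self.shutdown_delay_s):
            self.pool.stop()


pool = CoordinatorPool()
proxy = WakeOnQueryProxy(pool, shutdown_delay_s=300)
proxy.submit()           # query arrives while Coordinators are up
proxy.finish(now=10)     # last query ends, idle clock starts at t=10
proxy.tick(now=100)      # 90 s idle: Coordinators keep running
proxy.tick(now=310)      # 300 s idle: Coordinators are suspended
proxy.submit()           # next query reaches the proxy, which restarts them
```

In this sketch the client only ever talks to the proxy object, which is why a suspended pool does not surface as a broken connection; how CDW actually holds BI tool connections open is not specified in this post.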
 
CDW is capable of serving thousands of BI users within an organization, letting them run their queries at the speed of thought. End users expect capacity to always be available, but admins only want to pay for capacity when it is actually needed. With this new level of intelligence, CDW is better able to satisfy both groups. Now that is modern data warehousing at its best.