The goal is to connect PowerBI to Druid via the Hive DruidStorageHandler. Druid is installed on another cluster, but the network latency between the two clusters is very low (pings of 0.1 ms). I am planning to install an HDP cluster, version 3.0 or later, with the Hive service and LLAP. Because the cluster will only serve as a proxy to Druid, I want it to be relatively small (i.e. 5-12 nodes).

1. How much storage is required for such a use case? From what I understand, Hive will store only metadata about the Druid OLAP cubes, so I guess HDFS would need to hold hundreds of gigabytes rather than terabytes.
2. I guess that, rather than storage, the crucial resources in such a use case are RAM and CPU?
3. Based on https://community.cloudera.com/t5/Community-Articles/LLAP-sizing-and-setup/ta-p/247425 I understand that LLAP is the only service running on its nodes. Does this mean that no DataNode service is installed on these nodes, and that we therefore give up data locality?
4. What would be a standard distribution of services across 10 nodes? Based on the article above, I should reserve 5 nodes for LLAP, so the remaining 5 nodes would be used for the NameNode, ZooKeeper, ResourceManager and DataNodes. Is that correct?