I am currently playing with the Docker version of Hortonworks Sandbox 3.0.1. I've done some work to make the Ambari configuration persistent (backing up and restoring the Ambari database), and I am planning to extend it to make the HDFS data persistent as well. However, I can't find the Hortonworks Sandbox scripts (docker-deploy-hdp30.sh, generate-proxy-deploy-script.sh and nginx.conf) anywhere on GitHub. Is it legal to extend these scripts and publish my implementation on GitHub?
The goal is to connect PowerBI to Druid via the Hive DruidStorageHandler. Druid is installed on another cluster, but the latency between the clusters is very low (pings of 0.1 ms). I am planning to install an HDP cluster, version 3.0 or later, with the Hive service and LLAP. Because the cluster will only serve as a proxy to Druid, I want it to be relatively small (i.e. 5-12 nodes).
1. How much storage is required for such a use case? From what I understand, Hive will store only metadata about the Druid OLAP cubes, so I guess HDFS would hold hundreds of gigabytes rather than terabytes.
2. I guess the crucial resources in such a use case are RAM and CPU rather than storage?
3. Based on https://community.cloudera.com/t5/Community-Articles/LLAP-sizing-and-setup/ta-p/247425 I understand that LLAP is the only service running on a node. Does this mean that no DataNode service is installed on these nodes, and that we therefore lose data locality?
4. What would be a standard distribution of services across 10 nodes? Based on the article above, I should reserve 5 nodes for LLAP, so the remaining 5 nodes would be used for the NameNode, ZooKeeper, ResourceManager and DataNodes. Is that correct?