Member since: 09-21-2015
Posts: 85
Kudos Received: 75
Solutions: 7

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1953 | 04-21-2016 12:22 PM
 | 4812 | 03-12-2016 02:19 PM
 | 1706 | 10-29-2015 07:50 PM
 | 2077 | 10-02-2015 04:21 PM
 | 5759 | 09-29-2015 03:08 PM
06-02-2016
06:03 PM
As far as I know, Kerberos is for authentication, not for encrypting Hive communication.
06-02-2016
05:07 PM
@Sri Bandaru - No, that's for Ambari HTTPS. I'm referring to SSL for HiveServer2 connections.
06-02-2016
02:57 PM
1 Kudo
What configuration is required in the Hive Ambari View for supporting Hive SSL?
Labels:
- Apache Ambari
- Apache Hive
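For reference, a rough sketch of what an SSL connection to HiveServer2 looks like from beeline; the host, port, and truststore path are placeholders, and the exact properties the Hive View needs may differ:

# Hypothetical example only: host, port, and truststore values are placeholders.
beeline -u "jdbc:hive2://hiveserver.example.com:10000/default;ssl=true;sslTrustStore=/etc/pki/hive-truststore.jks;trustStorePassword=changeit"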
04-21-2016
12:43 PM
@Ali Bajwa A simplified approach. On the Ambari Server:

yum -y install git
git clone https://github.com/seanorama/ambari-bootstrap
cd ambari-bootstrap
export ambari_server_custom_script=${ambari_server_custom_script:-~/ambari-bootstrap/ambari-extras.sh}
export install_ambari_server=true
./ambari-bootstrap.sh

Then deploy the cluster. The "extras" script above takes care of all the tedious stuff automatically (cloning Zeppelin, the blueprint defaults, the role command order, ...):

yum -y install python-argparse
cd deploy
export ambari_services="HDFS MAPREDUCE2 YARN ZOOKEEPER HIVE SPARK ZEPPELIN"
bash ./deploy-recommended-cluster.bash
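While the deployment runs, one way to check on it (assuming Ambari is local on port 8080 with the default admin/admin credentials, which may not match your setup) is the Ambari REST API:

# Assumption: Ambari on localhost:8080 with default admin/admin credentials.
# Lists registered clusters; request progress is also visible in the Ambari web UI.
curl -s -u admin:admin http://localhost:8080/api/v1/clusters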
04-21-2016
12:22 PM
1 Kudo
The Google Cloud Storage Connector for Hadoop is configured at the cluster level without any knowledge of Kerberos, so the output you showed is what I would expect. But some thoughts:
- In secure environments, ideally a user can never even reach Hadoop without authenticating against Kerberos or the directory. With that assumed, you would never get the chance to run 'hadoop fs -ls ...' anyway. So lock down all access to the environment and network so that only authorized users can even run the commands.
- It couldn't hurt to submit a feature request for a configuration option that disables 'gs' unless the user is authenticated to Hadoop. Personally I see this as a bug report, but technically it's a feature request. You would have to raise it with Google, since the Connector is not currently part of Apache Hadoop; Google maintains it separately.
- Why it's not a bug: Kerberos governs communication between services, not the execution of commands. Since GS doesn't do Kerberos, it works as intended, since its authentication is done separately.
- I've not done it, but you could check whether individual users/applications can pass the GCS token. If possible, you would remove it from the cluster-wide configuration and users would be required to do this themselves. It would still not be using Kerberos, but it would be another layer of security. s3a://, swift://, and wasb:// support this method.
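As a rough sketch of that last approach with s3a:// (the bucket and keys are placeholders; whether the GCS connector accepts equivalent per-command properties would need to be confirmed against Google's documentation):

# Hypothetical example: per-command s3a credentials instead of cluster-wide config.
# The bucket name and keys below are placeholders.
hadoop fs \
  -Dfs.s3a.access.key=MY_ACCESS_KEY \
  -Dfs.s3a.secret.key=MY_SECRET_KEY \
  -ls s3a://my-bucket/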
04-04-2016
03:06 PM
2 Kudos
Prerequisites:
- Launch Sandbox on Azure
  - VM Size: minimum of A4 or A5
- A Twitter App
  - You'll use the API credentials
  - The "Application Details" don't matter

Prepare the Sandbox
Connect to SSH & Ambari
- Connect to the Sandbox using SSH, or the web console: http://<<ip>>:4200/
- Become root: sudo su -
- Reset the Ambari password: ambari-admin-password-reset
- Login to Ambari: http://<<ip>>:8080 (User: admin)
Before moving to the next steps, ensure all services on the left are started (green) or in maintenance mode (black).
Install NiFi
- In Ambari, click "Actions" (bottom left) -> Add Service.
- Choose NiFi and continue through the dialogs. You shouldn't need to change anything.
- NiFi should now be accessible at http://<<ip>>:9090/nifi/
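If the UI doesn't come up, a quick check from the Sandbox shell (assuming NiFi is on port 9090 as configured above):

# Should print 200 once NiFi has finished starting.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9090/nifi/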
Tune Sandbox
The Sandbox is tuned to run on minimal hardware. We need to update the Hive, Tez & YARN configuration for our use case.
This could take up to 15 minutes to complete:
bash <(curl -sSL https://git.io/vVRPs)
Solr & Banana
Solr enables search across large corpora of information through specialized indexing techniques.
Banana is a dashboard visualization tool for Solr.
Download the Banana Dashboard:
curl -L https://git.io/vVRP3 -o /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/app/dashboards/default.json
Update Solr to support Twitter's timestamp format:
curl -L https://git.io/vVRPz -o /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml
Start Solr:
JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64 /opt/lucidworks-hdpsearch/solr/bin/solr start -c -z localhost:2181
Create a Solr collection for tweets:
/opt/lucidworks-hdpsearch/solr/bin/solr create -c tweets -d data_driven_schema_configs -s 1 -rf 1
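To sanity-check the result, you can list collections through Solr's Collections API (assuming Solr's default port of 8983):

# Optional check: confirm the 'tweets' collection exists.
curl "http://localhost:8983/solr/admin/collections?action=LIST&wt=json"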
03-12-2016
02:19 PM
17 Kudos
As with many topics, "it depends".

For slave/worker/data hosts which only run distributed services, you can likely disable swap. With distributed services it's preferred to let the process/host be killed rather than swap. The killing of that process or host shouldn't affect cluster availability. Said another way: you want to "fail fast", not "slowly degrade". Just one bad process/host can greatly degrade performance of the whole cluster. For example, in a 350-host cluster, removal of 2 bad nodes improved throughput by ~2x: http://www.slideshare.net/t3rmin4t0r/tez8-ui-walkthrough/23 http://pages.cs.wisc.edu/~thanhdo/pdf/talk-socc-limplock.pdf

For masters, swap is also often disabled, though it's not a set rule from Hortonworks and I assume there will be some discussion/disagreement. Masters can be treated somewhat like you'd treat masters in other, non-Hadoop, environments. The fear with disabling swap on masters is that an OOM (out of memory) event could affect cluster availability. But that will still happen even with swap configured; it will just take slightly longer. Good administrator/operator practice is to monitor RAM availability and fix any issues before running out of memory, thus maintaining availability without affecting performance. No swap is needed then.

Scenarios where you might want swap:
- Playing with or testing functionality, not performance, on hosts with very little RAM, which will likely need to swap.
- If you need, or expect to need, more memory than the amount of RAM which has been purchased, and can accept severe degradation on failure. In this case you would need a lot of swap configured. You're better off buying the right amount of memory.

Extra thoughts:
- If you want to disable swap, but your organization requires there to be a swap partition, set swappiness=0.
- If you choose to have swap, set swappiness=1 to avoid swapping until all physical memory has been used.
- Most cloud/virtualization providers disable swap by default. Don't change that.
- Some advise avoiding swap on SSDs because it reduces their lifespan.
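For reference, a minimal sketch of how swappiness is set on a RHEL/CentOS-style host (pick 0 or 1 per the guidance above):

# Check the current value.
cat /proc/sys/vm/swappiness
# Apply immediately without a reboot.
sysctl -w vm.swappiness=1
# Persist across reboots (path assumes an /etc/sysctl.d directory is in use).
echo "vm.swappiness=1" >> /etc/sysctl.d/90-swappiness.conf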
03-11-2016
08:38 PM
3 Kudos
The questions will be:
1. Should there be a swap partition at all (i.e. swappiness=0)?
2. Do recommendations vary between masters, workers, or certain components?
3. If swappiness>=1, what should the amount be?
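When answering, it may help to note how the current swap configuration can be inspected, e.g.:

# Show configured swap devices and current memory/swap usage.
swapon -s
free -m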
03-11-2016
08:36 PM
2 Kudos
David - Thanks for posting. As discussed separately, the 2xRAM recommendation is definitely out of date.
I'm working toward consensus with my team on their recommendations, and I look forward to others' comments coming in below.
01-12-2016
06:53 PM
1 Kudo
Mind if we convert this to an Article and update it together, since no answer will be correct for more than a couple of months?