Member since: 09-15-2015
Posts: 457
Kudos Received: 507
Solutions: 90
02-14-2016
06:14 PM
Great, thanks for sharing! This might also help https://github.com/mr-jstraub/HDFSQuota/blob/master/HDFSQuota.ipynb
02-09-2016
07:59 AM
@Vladimir Zlatkin should work as well. You can add a new drive, mount it, and add the new mount point to the list of HDFS data directories. If you have a lot of drives or mount points that you need to change, I'd probably decommission the Datanode and re-commission it once the changes are finished. Keep in mind that the latter can cause some additional network traffic.
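For reference, the relevant property is dfs.datanode.data.dir in hdfs-site.xml; a minimal sketch of appending a new mount point (the paths are just examples, use your actual data directories):
dfs.datanode.data.dir=/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/mnt/newdisk/hadoop/hdfs/data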
02-08-2016
10:33 AM
27 Kudos
This article shows how to set up and secure a SolrCloud cluster with Kerberos and Ranger. Furthermore, it outlines some important configurations that are necessary in order to use the combination of Solr + HDFS + Kerberos.
Tested on HDP 2.3.4, Ambari 2.1.2, Ranger 0.5, Solr 5.2.1; MIT Kerberos
Pre-Requisites & Service Allocation
You should have a running HDP cluster, including Kerberos, Ranger and HDFS. For this article I am going to use a 6-node (3 master + 3 worker) cluster with the following service allocation.
Depending on the size and use case of your Solr environment, you can either install Solr on separate nodes (for larger workloads and collections) or install it on the same nodes as the Datanodes. For this installation I have decided to install Solr on the 3 Datanodes.
Note: The picture above only shows the main services and components; there are additional clients and services installed (Yarn, MR, Hive, ...).
Installing the SolrCloud
Solr, aka HDPSearch, is part of the HDP-Utils repository (see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_search/index.html).
Install Solr on all Datanodes
yum install lucidworks-hdpsearch
service solr start
ln -s /opt/lucidworks-hdpsearch/solr/server/logs /var/log/solr
Note: Make sure /opt/lucidworks-hdpsearch is owned by user solr and that solr is available as a service ("service solr status" should return the Solr status).
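A quick sketch of that check, assuming the default HDPSearch install path from the step above:
# Fix ownership of the HDPSearch install directory if needed
chown -R solr:solr /opt/lucidworks-hdpsearch
# Verify the Solr init script responds
service solr status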
Keytabs and Principals
In order for Solr to authenticate itself with the kerberized cluster, it is necessary to create a Solr and a Spnego keytab. The latter is used for authenticating HTTP requests. It's recommended to create a keytab per host instead of one keytab that is distributed to all hosts, e.g. solr/myhostname@EXAMPLE.COM instead of solr@EXAMPLE.COM.
The Solr service keytab will also be used to enable Solr collections to write to HDFS.
Create a Solr Service Keytab for each Solr host
kadmin.local
addprinc -randkey solr/horton04.example.com@EXAMPLE.COM
xst -k solr.service.keytab.horton04 solr/horton04.example.com@EXAMPLE.COM
addprinc -randkey solr/horton05.example.com@EXAMPLE.COM
xst -k solr.service.keytab.horton05 solr/horton05.example.com@EXAMPLE.COM
addprinc -randkey solr/horton06.example.com@EXAMPLE.COM
xst -k solr.service.keytab.horton06 solr/horton06.example.com@EXAMPLE.COM
exit
Move the keytabs to the individual hosts (in my case => horton04, horton05, horton06) and save them as /etc/security/keytabs/solr.service.keytab
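For example, a distribution sketch run from the KDC host, assuming root SSH access to the Solr nodes (hostnames as used in this article):
# Copy each host-specific keytab to its Solr node and normalize the file name
for h in horton04 horton05 horton06; do
  scp solr.service.keytab.${h} root@${h}.example.com:/etc/security/keytabs/solr.service.keytab
done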
Create Spnego Service Keytab
To authenticate HTTP requests, it is necessary to create a Spnego Service Keytab, either by making a copy of the existing spnego-Keytab or by creating a separate solr/spnego principal + keytab. On each Solr host do the following:
cp /etc/security/keytabs/spnego.service.keytab /etc/security/keytabs/solr-spnego.service.keytab
Owner & Permissions
Make sure the Keytabs are owned by solr:hadoop and the permissions are set to 400.
chown solr:hadoop /etc/security/keytabs/solr*.keytab
chmod 400 /etc/security/keytabs/solr*.keytab
Configure Solr Cloud
Since all Solr data will be stored in the Hadoop filesystem, it is important to adjust the time Solr takes to shut down or "kill" the Solr process (whenever you execute "service solr stop/restart"). If this setting is not adjusted, Solr will try to shut down the process, and because the shutdown takes a bit longer when using HDFS, Solr will simply kill the process and, most of the time, lock the Solr indexes of your collections. If the index of a collection is locked, the following exception is shown after the startup routine:
"org.apache.solr.common.SolrException: Index locked for write"
Increase the sleep time from 5 to 30 seconds in /opt/lucidworks-hdpsearch/solr/bin/solr
sed -i 's/(sleep 5)/(sleep 30)/g' /opt/lucidworks-hdpsearch/solr/bin/solr
Adjust Solr configuration: /opt/lucidworks-hdpsearch/solr/bin/solr.in.sh
SOLR_HEAP="1024m"
SOLR_HOST=`hostname -f`
ZK_HOST="horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr"
SOLR_KERB_PRINCIPAL=HTTP/${SOLR_HOST}@EXAMPLE.COM
SOLR_KERB_KEYTAB=/etc/security/keytabs/solr-spnego.service.keytab
SOLR_JAAS_FILE=/opt/lucidworks-hdpsearch/solr/bin/jaas.conf
SOLR_AUTHENTICATION_CLIENT_CONFIGURER=org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer
SOLR_AUTHENTICATION_OPTS=" -DauthenticationPlugin=org.apache.solr.security.KerberosPlugin -Djava.security.auth.login.config=${SOLR_JAAS_FILE} -Dsolr.kerberos.principal=${SOLR_KERB_PRINCIPAL} -Dsolr.kerberos.keytab=${SOLR_KERB_KEYTAB} -Dsolr.kerberos.cookie.domain=${SOLR_HOST} -Dhost=${SOLR_HOST} -Dsolr.kerberos.name.rules=DEFAULT"
Create Jaas-Configuration
Create a Jaas-Configuration file: /opt/lucidworks-hdpsearch/solr/bin/jaas.conf
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/etc/security/keytabs/solr.service.keytab"
storeKey=true
debug=true
principal="solr/<HOSTNAME>@EXAMPLE.COM";
};
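If you want to fill in the per-host principal automatically instead of editing the file by hand, a small sketch (it simply substitutes the <HOSTNAME> placeholder shown above with the local FQDN):
# Replace the <HOSTNAME> placeholder with this host's FQDN
sed -i "s/<HOSTNAME>/$(hostname -f)/g" /opt/lucidworks-hdpsearch/solr/bin/jaas.conf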
Make sure the file is owned by solr
chown solr:solr /opt/lucidworks-hdpsearch/solr/bin/jaas.conf
HDFS
Create an HDFS directory for Solr. This directory will be used for all Solr data (indexes, etc.).
hdfs dfs -mkdir /apps/solr
hdfs dfs -chown solr /apps/solr
hdfs dfs -chmod 750 /apps/solr
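On a kerberized cluster these commands have to be run with HDFS superuser credentials; a sketch, assuming the usual headless keytab location of an Ambari-managed install (the exact principal name may differ on your cluster):
# Check which principal is stored in the headless keytab, then authenticate with it
klist -kt /etc/security/keytabs/hdfs.headless.keytab
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-<clustername>@EXAMPLE.COM
# Verify the new Solr directory afterwards
hdfs dfs -ls /apps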
Zookeeper
SolrCloud uses Zookeeper to store configurations and cluster states. It's recommended to create a separate ZNode for Solr. The following commands can be executed on one of the Solr nodes.
Initialize Zookeeper Znode for Solr:
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181 -cmd makepath /solr
The security.json file needs to be in the root folder of the Solr ZNode. This file contains configurations for the authentication and authorization providers.
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181 -cmd put /solr/security.json '{"authentication":{"class": "org.apache.solr.security.KerberosPlugin"},"authorization":{"class": "org.apache.ranger.authorization.solr.authorizer.RangerSolrAuthorizer"}}'
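To verify the upload you can read the file back with the same zkcli tool (sketch, using the same Zookeeper quorum as above):
# Print the security.json stored in the Solr ZNode
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181 -cmd get /solr/security.json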
Install & Enable Ranger Solr-Plugin
Log into the Ranger UI and create a Solr repository and user.
Create Ranger-Solr Repository (Access Manager -> Solr -> Add(+))
Service Name: <clustername>_solr
Username: amb_ranger_admin
Password: <password> (typically this is admin)
Solr Url: http://horton04.example.com:8983
Add Ranger-Solr User
Create a new user called "solr" with an arbitrary password. This user is necessary to assign policy permissions to the Solr user.
Add base policy
Creating a new Solr repository in Ranger usually creates a base policy as well. If you don't see a policy in the Solr repository, create a Solr base policy with the following settings:
Policy Name: e.g. clustername
Solr Collections: *
Description: Default Policy for Service: bigdata_solr
Audit Logging: Yes
User: solr, amb_ranger_admin
Permissions: all permissions + delegate admin
Install Solr-Plugin
Install and enable the Ranger Solr plugin on all nodes that have Solr installed.
yum -y install ranger_*-solr-plugin.x86_64
Copy Mysql-Connector-Java (optional, Audit to DB)
This is only necessary if you want to set up Audit to DB.
cp /usr/share/java/mysql-connector-java.jar /usr/hdp/2.3.4.0-3485/ranger-solr-plugin/lib
Adjust Plugin Configuration
Plugin properties are located here: /usr/hdp/<hdp-version>/ranger-solr-plugin/install.properties
Change the following values:
SQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
COMPONENT_INSTALL_DIR_NAME=/opt/lucidworks-hdpsearch/solr/server
POLICY_MGR_URL=http://<ranger-host>:6080
REPOSITORY_NAME=<clustername>_solr
If you want to enable Audit to DB, also change:
XAAUDIT.DB.IS_ENABLED=true
XAAUDIT.DB.FLAVOUR=MYSQL
XAAUDIT.DB.HOSTNAME=<ranger-db-host>
XAAUDIT.DB.DATABASE_NAME=ranger_audit
XAAUDIT.DB.USER_NAME=rangerlogger
XAAUDIT.DB.PASSWORD=*****************
(set this password to whatever you set when running Mysql pre-req steps for Ranger)
Enable the Plugin and (Re)start Solr
export JAVA_HOME=<path_to_jdk>
/usr/hdp/<version>/ranger-solr-plugin/enable-solr-plugin.sh
service solr restart
The enable script will distribute some files and create symlinks in /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/WEB-INF/lib
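A quick way to check that the plugin was wired in (sketch; the exact jar names depend on your Ranger version, so just look for anything Ranger-related):
# The enable script should have linked the Ranger Solr plugin jars into the Solr webapp
ls -l /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/WEB-INF/lib | grep -i ranger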
If you go to the Ranger UI, you should be able to see whether your Solr instances are communicating with Ranger or not.
Smoke Test
Now that everything has been set up and the policies have been synced with the Solr nodes, it's time for some smoke tests :)
To test our installation we are going to set up a test collection with one of the sample datasets from Solr, called "films".
Go to the first node of your SolrCloud (e.g. horton04).
Create the initial Solr collection configuration by using the basic_configs configset, which is part of every Solr installation.
mkdir /opt/lucidworks-hdpsearch/solr_collections
mkdir /opt/lucidworks-hdpsearch/solr_collections/films
chown -R solr:solr /opt/lucidworks-hdpsearch/solr_collections
cp -R /opt/lucidworks-hdpsearch/solr/server/solr/configsets/basic_configs/conf /opt/lucidworks-hdpsearch/solr_collections/films
Adjust solrconfig.xml (/opt/lucidworks-hdpsearch/solr_collections/films/conf)
1) Remove any existing directoryFactory element.
2) Add a new directory factory for HDFS (make sure to modify the values for solr.hdfs.home and solr.hdfs.security.kerberos.principal):
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
<str name="solr.hdfs.home">hdfs://bigdata/apps/solr</str>
<str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
<bool name="solr.hdfs.security.kerberos.enabled">true</bool>
<str name="solr.hdfs.security.kerberos.keytabfile">/etc/security/keytabs/solr.service.keytab</str>
<str name="solr.hdfs.security.kerberos.principal">solr/${host:}@EXAMPLE.COM</str>
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">1</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.blockcache.write.enabled">true</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
3) Adjust the lock type: search for the lockType element and change it to "hdfs"
<lockType>hdfs</lockType>
Adjust schema.xml (/opt/lucidworks-hdpsearch/solr_collections/films/conf)
Add the following field definitions in the schema.xml file (there are already some base field definitions; simply copy-and-paste the following 4 lines somewhere nearby).
<field name="directed_by" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="initial_release_date" type="string" indexed="true" stored="true"/>
<field name="genre" type="string" indexed="true" stored="true" multiValued="true"/>
Upload Films-configuration to Zookeeper (solr-znode)
Since this is a SolrCloud setup, all configuration files will be stored in Zookeeper.
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost horton01.example.com:2181,horton02.example.com:2181,horton03.example.com:2181/solr -cmd upconfig -confname films -confdir /opt/lucidworks-hdpsearch/solr_collections/films/conf
Create the Films-Collection
Note: Make sure you have a valid Kerberos ticket for the Solr user (e.g. "kinit -kt solr.service.keytab solr/`hostname -f`")
curl --negotiate -u : "http://horton04.example.com:8983/solr/admin/collections?action=CREATE&name=films&numShards=1"
Check available collections:
curl --negotiate -u : "http://horton04.example.com:8983/solr/admin/collections?action=LIST&wt=json"
Response
{
"responseHeader":{
"status":0,
"QTime":2
},
"collections":[
"films"
]
}
Load data into the collection
curl --negotiate -u : 'http://horton04.example.com:8983/solr/films/update/json?commit=true' --data-binary @/opt/lucidworks-hdpsearch/solr/example/films/films.json -H 'Content-type:application/json'
Select data from the Films-Collection
curl --negotiate -u : http://horton04.example.com:8983/solr/films/select?q=*
This should return the data from the films collection.
Since the Solr user is part of the base policy in Ranger, the above commands should not bring up any errors or authorization issues.
Tests with new user (=> Tom)
To see whether Ranger is working or not, authenticate yourself as a different user (e.g. Tom) and select the data from "films"
kinit tom@EXAMPLE.COM
curl --negotiate -u : http://horton04.example.com:8983/solr/films/select?q=*
This should return "Unauthorized Request (403)"
Add Policy
Add a new Ranger Solr policy for the films collection and authorize Tom.
Query the collection again:
curl --negotiate -u : "http://horton04.example.com:8983/solr/films/select?q=*&wt=json"
Result:
{
"responseHeader":{
"status":0,
"QTime":3,
"params":{
"q":"*",
"wt":"json"
}
},
"response":{
"numFound":1100,
"start":0,
"docs":[
{
"id":"/en/45_2006",
"directed_by":[
"Gary Lennon"
],
"initial_release_date":"2006-11-30",
"genre":[
"Black comedy",
"Thriller",
"Psychological thriller",
"Indie film",
"Action Film",
"Crime Thriller",
"Crime Fiction",
"Drama"
],
"name":".45",
"_version_":1525514568271396864
},
...
...
...
Common Errors
Unauthorized Request (403)
Ranger denied access to the specified Solr collection. Check the Ranger audit log and the Solr policies.
Authentication Required
Make sure you have a valid Kerberos ticket!
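A quick check, re-using the Solr service keytab created earlier (sketch; adjust principal and keytab path to your environment):
# Show the current ticket cache; if it is empty or expired, re-authenticate
klist
kinit -kt /etc/security/keytabs/solr.service.keytab solr/$(hostname -f)@EXAMPLE.COM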
Defective Token detected
Caused by: GSSException: Defective token detected (Mechanism level: GSSHeader did not find the right tag)
Usually this issue surfaces during Spnego authentication, when the token supplied by the client is not accepted by the server. This error occurs with Java JDK 1.8.0_40 (http://bugs.java.com/view_bug.do?bug_id=8080122).
Solution: This bug was acknowledged and fixed by Oracle in Java JDK >= 1.8.0_60.
White Page / Too many groups
Problem: When the Solr Admin interface (http://<solr_instance>:8983/solr) is secured with Kerberos, users with too many AD groups can't access the page. Usually these users only see a white page as a result, and the Solr log shows the following message:
badMessage: java.lang.IllegalStateException: too much data after closed for
HttpChannelOverHttp@69d2b147{r=2,c=true,=COMPLETED,uri=/solr/}
HttpParser Header is too large >8192
Also see:
https://support.microsoft.com/en-us/kb/327825
https://ping.force.com/Support/PingFederate/Integrations/IWA-Kerberos-authentication-may-fail-when-user-belongs-to-many-AD-groups
Possible solution:
Search for the file /opt/lucidworks-hdpsearch/solr/server/etc/jetty.xml and increase "solr.jetty.request.header.size" from 8192 to about 51200 (should be sufficient for plenty of groups).
sed -i 's/name="solr.jetty.request.header.size" default="8192"/name="solr.jetty.request.header.size" default="51200"/g' /opt/lucidworks-hdpsearch/solr/server/etc/jetty.xml
Useful Links
https://cwiki.apache.org/confluence/display/solr/Collections+API
https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
https://cwiki.apache.org/confluence/display/RANGER/How+to+configure+Solr+Cloud+with+Kerberos+for+Ranger+0.5
Looking forward to your feedback
Jonas
01-26-2016
09:10 PM
15 Kudos
In order to check the status and stability of your cluster it makes sense to run the service checks that are included in Ambari. Usually each Ambari service provides its own service check, but there might be services that don't include any service check at all. To run a service check you have to select the service (e.g. HDFS) in Ambari and click "Run Service Check" in the "Actions" dropdown menu.
Service checks can be started via the Ambari API, and it is also possible to start all available service checks with a single API command. To bulk-run these checks it is necessary to use the same API/method that is used to trigger a rolling restart of Datanodes (request_schedules). The "request_schedules" API starts all defined commands in the specified order; it's even possible to specify a pause between the commands.
Available Service Checks:
Service Name      service_name      Command
HDFS              HDFS              HDFS_SERVICE_CHECK
YARN              YARN              YARN_SERVICE_CHECK
MapReduce2        MAPREDUCE2        MAPREDUCE2_SERVICE_CHECK
HBase             HBASE             HBASE_SERVICE_CHECK
Hive              HIVE              HIVE_SERVICE_CHECK
WebHCat           WEBHCAT           WEBHCAT_SERVICE_CHECK
Pig               PIG               PIG_SERVICE_CHECK
Falcon            FALCON            FALCON_SERVICE_CHECK
Storm             STORM             STORM_SERVICE_CHECK
Oozie             OOZIE             OOZIE_SERVICE_CHECK
ZooKeeper         ZOOKEEPER         ZOOKEEPER_QUORUM_SERVICE_CHECK
Tez               TEZ               TEZ_SERVICE_CHECK
Sqoop             SQOOP             SQOOP_SERVICE_CHECK
Ambari Metrics    AMBARI_METRICS    AMBARI_METRICS_SERVICE_CHECK
Atlas             ATLAS             ATLAS_SERVICE_CHECK
Kafka             KAFKA             KAFKA_SERVICE_CHECK
Knox              KNOX              KNOX_SERVICE_CHECK
Spark             SPARK             SPARK_SERVICE_CHECK
SmartSense        SMARTSENSE        SMARTSENSE_SERVICE_CHECK
Ranger            RANGER            RANGER_SERVICE_CHECK
Note: Make sure you replace <user>, <password>, <clustername> and <ambari-server> with the actual values.
Start single service check via Ambari API (e.g. HDFS Service Check):
curl -ivk -H "X-Requested-By: ambari" -u <user>:<password> -X POST -d @payload http://<ambari-server>:8080/api/v1/clusters/<clustername>/requests
Payload:
{
"RequestInfo":{
"context":"HDFS Service Check",
"command":"HDFS_SERVICE_CHECK"
},
"Requests/resource_filters":[
{
"service_name":"HDFS"
}
]
}
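The POST returns a request id which you can poll for progress; a sketch using the standard Ambari requests resource (the id 123 is just a placeholder for the id returned by the call above):
curl -ik -H "X-Requested-By: ambari" -u <user>:<password> "http://<ambari-server>:8080/api/v1/clusters/<clustername>/requests/123?fields=Requests/request_status,Requests/progress_percent"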
Start bulk Service checks via Ambari API (e.g. HDFS, Yarn, MapReduce2 Service Checks):
curl -ivk -H "X-Requested-By: ambari" -u <user>:<password> -X POST -d @payload http://<ambari-server>:8080/api/v1/clusters/<clustername>/request_schedules
Payload:
[
{
"RequestSchedule":{
"batch":[
{
"requests":[
{
"order_id":1,
"type":"POST",
"uri":"/api/v1/clusters/<clustername>/requests",
"RequestBodyInfo":{
"RequestInfo":{
"context":"HDFS Service Check (batch 1 of 3)",
"command":"HDFS_SERVICE_CHECK"
},
"Requests/resource_filters":[
{
"service_name":"HDFS"
}
]
}
},
{
"order_id":2,
"type":"POST",
"uri":"/api/v1/clusters/<clustername>/requests",
"RequestBodyInfo":{
"RequestInfo":{
"context":"YARN Service Check (batch 2 of 3)",
"command":"YARN_SERVICE_CHECK"
},
"Requests/resource_filters":[
{
"service_name":"YARN"
}
]
}
},
{
"order_id":3,
"type":"POST",
"uri":"/api/v1/clusters/<clustername>/requests",
"RequestBodyInfo":{
"RequestInfo":{
"context":"MapReduce Service Check (batch 3 of 3)",
"command":"MAPREDUCE2_SERVICE_CHECK"
},
"Requests/resource_filters":[
{
"service_name":"MAPREDUCE2"
}
]
}
}
]
},
{
"batch_settings":{
"batch_separation_in_seconds":1,
"task_failure_tolerance":1
}
}
]
}
}
]
This is returned by the API:
{
"resources" : [
{
"href" : "http://<ambari-server>:8080/api/v1/clusters/<clustername>/request_schedules/68",
"RequestSchedule" : {
"id" : 68
}
}
]
}
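You can follow the batch by querying the request schedule referenced in the response; a sketch using the id 68 from the sample above:
curl -ik -H "X-Requested-By: ambari" -u <user>:<password> "http://<ambari-server>:8080/api/v1/clusters/<clustername>/request_schedules/68"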
This is what it looks like in Ambari.
Payload to run all Service Checks
Please see this gist:
https://gist.github.com/mr-jstraub/0b55de318eeae6695c3f#payload-to-run-all-service-checks
01-18-2016
12:48 PM
2 Kudos
Thanks for sharing!
12-09-2015
09:25 PM
4 Kudos
This article is a follow-up on my original article about the visualization of a cluster and its services/components (https://community.hortonworks.com/articles/2010/visualizing-hdp-cluster-service-allocation.html). In the first part I am particularly focusing on a new feature that enables users to build and plan a cluster by using a drag-n-drop Web UI.
Build a Cluster
Until now, visualizing a cluster and its service allocation either meant exporting the information from Ambari or writing a JSON file that outlines the details of the nodes. Planning and deploying a cluster should be easier, right? I'll introduce: Build a Cluster 🙂 This simple Web UI is based on different drag-n-drop functionalities and allows the creation of a new cluster by simply dragging Hadoop components from the elements list to the individual nodes. Let's go over the different features...
User Interface
The UI is divided into two sections, Elements & Settings and Cluster:
Elements & Settings (left): Contains the available services and components of the environment (remember, these can be edited by simply importing a different environment) as well as cluster settings (HDP version, cluster name, security enabled yes/no). Additionally, this section provides some action buttons to finalize the cluster and add nodes.
Cluster (right): This is the current cluster with all its components. Elements can be dragged from the elements list and dropped on the individual nodes. Nodes can be edited or removed.
Nodes
Note: The data structure of nodes has changed in this version; one node does not have to represent a single physical machine anymore. A node in this app can now represent many physical machines that all share the same components.
Adding Nodes
The number of nodes is currently limited to 1000. New nodes can be added by pressing the "+ Node"-button in the elements section.
Editing the hostname and cardinality
Simply click on the hostname or cardinality. (Note: ever
Hostname Syntax
Hostnames allow some special syntax, which automatically generates multiple hostnames (only if cardinality is set > 1):
#{x} => number with leading zeros
{x} => number without leading zeros
x => defines the start of the counter
Examples:
1) datanode#{0}.example.com (Cardinality = 2)
datanode1.example.com
datanode2.example.com
2) datanode#{0}.example.com (Cardinality = 30)
datanode01.example.com
datanode02.example.com
...
3) datanode{100}.example.com (Cardinality = 20)
datanode100.example.com
datanode101.example.com
...
4) datanode.example.com (Cardinality = 2)
datanode.example.com1
datanode.example.com2
Adding components to a Node
Select a service in the Elements list; this will bring up the list of components of this service. Select & drag a component to any of the nodes.
Removing a component from a Node
Drag the component from the node and drop it outside the node or over the "Trash" area inside the elements section.
Finalize a Cluster
When your cluster is finished, press the "Finalize"-button inside the elements section; this will convert the built cluster into the same data format as any exported or JSON-specified cluster. Additionally, this imports or basically transfers the new cluster to the main page "Cluster". Finalizing a cluster also regenerates the Ambari Blueprint (read more in the next section).
Note: You can press the button multiple times while you're developing a new cluster 🙂 This might be helpful, e.g. if you want to see different views (service, component, list) on the Cluster page during the development.
Generating Ambari Blueprints
In this section I am focusing on another new feature that will more than simplify the creation of Ambari Blueprints. The blueprint section contains the actual blueprint (left) as well as the cluster creation template or hostgroup mapping (right). Hitting the "Copy"-button in the upper-right corner will copy the individual content to the clipboard (might not work with all browsers).
Cluster-Configuration
No blueprint is complete without any configuration! The Cluster-Configuration page provides the necessary functionalities to add general or host-group-specific configurations to the blueprint or the cluster creation template (hostgroup mapping). I have seen plenty of blueprints that had typos within the configuration section, e.g. instead of dfs.blocksize the blueprint included a dfs.blcksize configuration. This is why the typeahead feature for the config location and name was added. Simply start typing and the app will come up with some suggestions.
HDFS HA & Yarn HA configurations (automation)
A nice little gimmick that has been added to this application is the automatic config generation for HDFS HA and Yarn HA clusters. Whenever the app recognizes a specified set of service components (e.g. 2 Namenodes, 3 Journalnodes, etc.) in the cluster, it will automatically generate the necessary configuration for HDFS or Yarn High Availability.
Project & Setup: https://github.com/mr-jstraub/ambari-node-view
I hope you enjoy these new features and find them useful. Looking forward to your feedback and feature requests 🙂
12-08-2015
05:49 AM
Very common question, thanks for sharing!
11-23-2015
08:32 PM
5 Kudos
I recently ran into a situation where I had enabled HDFS HA and later had to change the value of dfs.nameservices. During the HA setup I set the value for dfs.nameservices to "MyHorton", but a couple of hours later realized I should have used "MyCluster" instead. This article explains how you can change the dfs.nameservices value after HDFS HA has already been enabled.
Background: What is the purpose of dfs.nameservices?
It's the logical name of your HDFS nameservice. It's important to remember that there are several configuration parameters whose key includes the actual value of dfs.nameservices, e.g. dfs.namenode.rpc-address.[nameservice id].nn1
Preparation:
Put your HDFS in safemode and back up the namespace (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin; dfsadmin -safemode enter; dfsadmin -saveNamespace), then stop the Namenode service.
Backup the Hive Metastore (mysqldump hive > /tmp/mydir/backup_hive.sql)
Change Configuration:
You have to adjust the hdfs-site configuration. Change all configurations that contain the old nameservice id to the new nameservice id. In my case the new nameservice ID was "mycluster".
fs.defaultFS=hdfs://mycluster
dfs.nameservices=mycluster
dfs.namenode.shared.edits.dir=qjournal://horton03.cloud.hortonworks.com:8485;horton02.cloud.hortonworks.com:8485;horton01.cloud.hortonworks.com:8485/mycluster
dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.namenode.rpc-address.mycluster.nn2=horton02.cloud.hortonworks.com:8020
dfs.ha.namenodes.mycluster=nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1=horton01.cloud.hortonworks.com:8020
dfs.namenode.http-address.mycluster.nn1=horton01.cloud.hortonworks.com:50070
dfs.namenode.http-address.mycluster.nn2=horton02.cloud.hortonworks.com:50070
dfs.namenode.https-address.mycluster.nn1=horton01.cloud.hortonworks.com:50470
dfs.namenode.https-address.mycluster.nn2=horton02.cloud.hortonworks.com:50470
Note: You can remove the configurations that include the old nameservice id (e.g. dfs.namenode.http-address.[old_nameservice_id].nn1)
Reinit Journalnodes:
This is necessary because the shared edits directory includes the nameservice id. Please see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hadoop-ha/content/ha-nn-deploy-nn-cluster.html
Change Hive FSRoot:
It might be necessary to change the Hive metadata after the above configuration changes.
Check whether changes are necessary (as the Hive user):
hive --service metatool -listFSRoot
If you see any table that references the old nameservice id, you have to use the following commands to switch to the new nameservice id. Use the hive metatool to do a dry run (no actual change is made in this mode) of updating the table locations:
hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton -dryRun
If you are satisfied with the changes the metatool will make, run the command without the -dryRun option:
hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton
Additional notes:
If you are using HBase you have to adjust additional configurations.
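For example, hbase.rootdir typically embeds the nameservice id and would need the same rename (the path shown is just the common HDP default, verify it against your hbase-site.xml):
hbase.rootdir=hdfs://mycluster/apps/hbase/data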
11-10-2015
08:59 PM
1 Kudo
Great article. Thanks for sharing 🙂