Member since: 09-15-2015
Posts: 457
Kudos Received: 507
Solutions: 90
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 18188 | 11-01-2016 08:16 AM |
| | 7632 | 11-17-2015 06:35 PM |
11-23-2015
08:32 PM
5 Kudos
I recently ran into a situation where I had enabled HDFS HA and later had to change the value of dfs.nameservices. During the HA setup I set dfs.nameservices to "MyHorton", but a couple of hours later I realized I should have used "MyCluster" instead. This article explains how you can change the dfs.nameservices value after HDFS HA has already been enabled.

Background: What is the purpose of dfs.nameservices?
It's the logical name of your HDFS nameservice. It's important to remember that several configuration parameters have a key that includes the actual value of dfs.nameservices, e.g. dfs.namenode.rpc-address.[nameservice id].nn1

Preparation:
Put your HDFS in safemode and back up the namespace (see https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin):
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
Stop the NameNode service.
Back up the Hive Metastore:
mysqldump hive > /tmp/mydir/backup_hive.sql

Change Configuration:
You have to adjust the hdfs-site configuration (fs.defaultFS is part of core-site). Change all configurations that contain the old nameservice id to the new nameservice id. In my case the new nameservice ID was "mycluster".
fs.defaultFS=hdfs://mycluster
dfs.nameservices=mycluster
dfs.namenode.shared.edits.dir=qjournal://horton03.cloud.hortonworks.com:8485;horton02.cloud.hortonworks.com:8485;horton01.cloud.hortonworks.com:8485/mycluster
dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.namenode.rpc-address.mycluster.nn2=horton02.cloud.hortonworks.com:8020
dfs.ha.namenodes.mycluster=nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1=horton01.cloud.hortonworks.com:8020
dfs.namenode.http-address.mycluster.nn1=horton01.cloud.hortonworks.com:50070
dfs.namenode.http-address.mycluster.nn2=horton02.cloud.hortonworks.com:50070
dfs.namenode.https-address.mycluster.nn1=horton01.cloud.hortonworks.com:50470
dfs.namenode.https-address.mycluster.nn2=horton02.cloud.hortonworks.com:50470

Note: You can remove the configurations that include the old nameservice id (e.g. dfs.namenode.http-address.[old_nameservice_id].nn1)

Reinit Journalnodes:
This is necessary because the shared edits directory includes the nameservice id. Please see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hadoop-ha/content/ha-nn-deploy-nn-cluster.html

Change Hive FSRoot:
It might be necessary to change the Hive metadata after the above configuration changes. Check whether changes are necessary (as the Hive user):
hive --service metatool -listFSRoot
If you see any table that references the old nameservice id, use the following commands to switch to the new nameservice id. First use the Hive metatool to do a dry run (no actual change is made in this mode) of updating the table locations:
hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton -dryRun
If you are satisfied with the changes the metatool will make, run the command without the -dryRun option:
hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton

Additional notes:
If you are using HBase you have to adjust additional configurations.
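
The "Change Configuration" step above boils down to swapping the nameservice id wherever it appears in a property key or value. Below is a minimal sketch of that substitution in Python. It is only an illustration, not part of Ambari or Hadoop: the function name rename_nameservice is hypothetical, the properties are assumed to be available as a plain dictionary, and the old id is assumed to be spelled exactly "myhorton" everywhere.

```python
import re


def rename_nameservice(props, old_id, new_id):
    """Return a copy of props with old_id replaced by new_id in keys and values.

    Hypothetical helper for illustration; props is a plain dict of
    property name -> value, as you might copy it out of hdfs-site/core-site.
    """
    # In values, match the id only as a whole token: it must not be glued to
    # letters, digits, '_', '.' or '-', so hostnames that merely contain the
    # id (e.g. myhorton.example.com) are left untouched.
    token = re.compile(r"(?<![\w.-])" + re.escape(old_id) + r"(?![\w.-])")

    renamed = {}
    for key, value in props.items():
        # In keys, the id is one dot-separated segment,
        # e.g. dfs.namenode.rpc-address.<id>.nn1
        new_key = ".".join(new_id if part == old_id else part
                           for part in key.split("."))
        renamed[new_key] = token.sub(lambda match: new_id, value)
    return renamed


# Example input, assuming the old nameservice id was spelled "myhorton".
old_props = {
    "fs.defaultFS": "hdfs://myhorton",
    "dfs.nameservices": "myhorton",
    "dfs.ha.namenodes.myhorton": "nn1,nn2",
    "dfs.namenode.rpc-address.myhorton.nn1": "horton01.cloud.hortonworks.com:8020",
    "dfs.namenode.shared.edits.dir":
        "qjournal://horton01.cloud.hortonworks.com:8485/myhorton",
}

for name, value in sorted(rename_nameservice(old_props, "myhorton", "mycluster").items()):
    print(f"{name}={value}")
```

Because the old keys are rewritten instead of copied, the result contains no leftover entries for the old nameservice id. Review the output before applying it to a real cluster.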
11-17-2015
06:35 PM
3 Kudos
As far as I know, this is currently not possible. I'm not sure why this feature was not pushed in the last couple of years; maybe multi-tenancy wasn't really an issue. I don't think anyone is working on HDFS-199 at the moment. I have seen a couple of requests in our internal Jira regarding this. If you open a new feature enhancement request with our support team, we might be able to get the ball rolling again. Your workaround looks good; I'd keep it for now.
11-10-2015
08:59 PM
1 Kudo
Great article. Thanks for sharing 🙂
11-06-2015
08:32 AM
Thanks for the article! Have you tested the visualization with bigger datasets as well? I am curious how the UI works with bigger datasets or queries that need some time to calculate.
11-02-2015
12:04 PM
Thanks! Sure, we could post it on our blog.
10-30-2015
01:26 PM
He did use the SolrCloud mode for the PutSolrContentStream. I used SolrStandalone, so it should work either way 🙂
10-29-2015
08:21 AM
Awesome tutorial, thanks for sharing 🙂
10-23-2015
03:54 PM
14 Kudos
One task everybody faces when setting up a new Hadoop cluster is the allocation of services. Administrators of an existing cluster, on the other hand, might ask themselves: how are my services allocated? I have discussed the visualization of HDP clusters and services quite often recently and therefore decided to share my application for visualizing the current and future state of a cluster (see the link to the hosted app at the end of the article).

What does Service Allocation mean?
Planning a Hadoop cluster involves many steps and tasks. Almost no two setups are the same (although there are some similarities). The service allocation is the part that tells you which services and components will run on which node, and how many nodes you have or need. This can be quite tedious and difficult, since not all services play well with each other, every service has different hardware/setup requirements, and adding many services can get confusing. This makes it even more important to have a sound overview of your service allocation.

To plan, document and visualize the service allocation or a complete Hadoop cluster, I have used paper sketches, Excel sheets, text files, PowerPoint, Photoshop and other tools. However, these approaches are often time-consuming, hard to edit or re-use, and in general not the best option. In need of a proper tool, I created this rather small and simple Angular application (at least it was at the beginning), which visualizes a cluster using a simple JSON document as input (see below).

There are three ways to create a cluster visualization:
- Export a live cluster via Ambari's API
- Create a cluster by writing a JSON document as seen below
- Build a new cluster with the latest drag-n-drop build feature

Let's say we have a cluster with 2 master nodes, 1 data node and a couple of different services.
The cluster is defined as:

    {
      "stack_version": "HDP-2.2",
      "security_type": "KERBEROS",
      "name": "bigdata",
      "hosts_info": [
        {
          "host_name": "c4068.ambari.apache.org",
          "components": [
            "NAMENODE", "RESOURCEMANAGER", "APP_TIMELINE_SERVER", "HISTORYSERVER",
            "TEZ_CLIENT", "YARN_CLIENT", "HDFS_CLIENT", "HIVE_CLIENT", "MAPREDUCE2_CLIENT"
          ]
        },
        {
          "host_name": "c4069.ambari.apache.org",
          "components": [
            "SECONDARY_NAMENODE", "HIVE_METASTORE", "HIVE_SERVER", "HCAT", "WEBHCAT_SERVER",
            "TEZ_CLIENT", "YARN_CLIENT", "HDFS_CLIENT", "HIVE_CLIENT", "MAPREDUCE2_CLIENT"
          ]
        },
        {
          "host_name": "c4070.ambari.apache.org",
          "components": [
            "DATANODE", "NODEMANAGER",
            "TEZ_CLIENT", "YARN_CLIENT", "HDFS_CLIENT", "HIVE_CLIENT", "MAPREDUCE2_CLIENT"
          ]
        }
      ]
    }

As soon as the cluster is imported, you can choose between three views.

Design flexibility through Environments
Environments are exportable stack templates that contain information about available services and components as well as their groups and colors. To customize the visualization configuration (colors, sorting, ...), you can edit the services and components within the application or in the exported Environment (JSON). This makes it possible to use different output formats for specific clusters, departments, companies and so on, by simply importing the environment when a cluster is imported.

Why you might find this app useful:
- Planning a new cluster
- Easy Ambari Blueprint generation
- Visualizing a cluster for a concept or documentation
- Quick overview of a cluster (e.g. for support, sysadmins, ...)
- Consistent visualization/documentation
- ...

If more people are interested in this project, I will add new features. For example:
- Filter by node groups (type of node or service or any custom group)
- Group nodes (Master, Worker, Edge, ...)
- Implement as Ambari View (?)
- ...

I hope some might find this tool useful.
Looking forward to your feedback 🙂

You can find more screenshots here: https://github.com/mr-jstraub/ambari-node-view/tree/master/screens
Project & Setup: https://github.com/mr-jstraub/ambari-node-view

The above article mainly focused on version 0.3. Since then, a new version has been released with exciting new features. Read more in the next section below.

Export, Build, Visualize and Deploy - What's new in v0.4
Since the above article was published in October, a lot of changes have been made and the web application has been heavily extended. In this short section I want to touch briefly on the latest enhancements; more details will follow in an additional article.

What's New?
- The nodes and their services/components have been completely redesigned/restructured
- Added an option to switch between full names and acronyms (e.g. Namenode and NN)
- New data structure for nodes. Nodes can have multiple hostnames now; this is a major change, since it reduces the data footprint immensely and allows the creation of simpler cluster templates
- Build a Cluster! - A drag-n-drop based user interface to build a cluster
- Blueprint Generator! - Generate Ambari Blueprints directly from imported or built clusters

Build a Cluster - New -
This is definitely one of my favorite features. Instead of writing JSON templates to plan and visualize a cluster, or exporting an existing cluster (although this is the easiest way), it is now possible to build a new cluster by using drag-n-drop. The tool supports up to 1000 nodes, dynamic hostnames, HDFS & YARN HA, ...

Blueprints (Beta) - New -
Generate Ambari Blueprints directly from imported or built clusters. General and host-group-specific configurations can be added manually. The tool offers more than one thousand suggested configuration parameters and categories. Read more in this article about Blueprints and "Build a Cluster".
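
To give a rough idea of what generating a Blueprint from a cluster document involves, here is a minimal Python sketch. It is not the code used by the application: the function name cluster_to_blueprint, the host_group_N naming and the rule "hosts with identical component lists share one host group" are all assumptions made for this example. The input follows the cluster JSON format shown earlier (shortened here), and the output uses the basic Ambari Blueprint layout with a "Blueprints" section and a "host_groups" list.

```python
import json


def cluster_to_blueprint(cluster):
    """Build a basic Ambari Blueprint dict from a cluster document.

    Hypothetical helper for illustration only. Hosts that carry exactly the
    same set of components are merged into one host group.
    """
    # "HDP-2.2" -> stack_name "HDP", stack_version "2.2"
    stack_name, _, stack_version = cluster["stack_version"].partition("-")

    # Map each distinct (sorted) component set to the hosts that use it.
    hosts_by_components = {}
    for host in cluster["hosts_info"]:
        key = tuple(sorted(host["components"]))
        hosts_by_components.setdefault(key, []).append(host["host_name"])

    host_groups = []
    for index, (components, hosts) in enumerate(hosts_by_components.items(), start=1):
        host_groups.append({
            "name": f"host_group_{index}",   # assumed naming scheme
            "cardinality": str(len(hosts)),  # number of hosts in this group
            "components": [{"name": component} for component in components],
        })

    return {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "host_groups": host_groups,
    }


# Shortened version of the example cluster from the article.
example_cluster = {
    "stack_version": "HDP-2.2",
    "security_type": "KERBEROS",
    "name": "bigdata",
    "hosts_info": [
        {"host_name": "c4068.ambari.apache.org",
         "components": ["NAMENODE", "RESOURCEMANAGER", "HDFS_CLIENT"]},
        {"host_name": "c4069.ambari.apache.org",
         "components": ["SECONDARY_NAMENODE", "HIVE_METASTORE", "HDFS_CLIENT"]},
        {"host_name": "c4070.ambari.apache.org",
         "components": ["DATANODE", "NODEMANAGER", "HDFS_CLIENT"]},
    ],
}

print(json.dumps(cluster_to_blueprint(example_cluster), indent=2))
```

The sketch ignores security_type and any configuration sections; a real Blueprint usually needs both, which is where the manually added general and host-group-specific configurations come in.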
10-14-2015
06:35 PM
Hi @Chakra, do you know whether SAP HANA or SDA supports kerberized connections to/from Hadoop?
10-14-2015
04:35 PM
Hi @Olivier Renault, could you put the code into a Git repository? That would make it easier to copy and use. Thanks 😃