About jstraub

jstraub · ‎11-23-2015

I recently ran into a situation where I had enabled HDFS HA and later had to change the value of dfs.nameservices. During HA setup I set dfs.nameservices to "MyHorton", but a couple of hours later I realized I should have used "MyCluster" instead. This article explains how to change the dfs.nameservices value after HDFS HA has already been enabled.

Background:

What is the purpose of dfs.nameservices? It's the logical name of your HDFS nameservice. It's important to remember that several configuration parameters have a key that includes the actual value of dfs.nameservices, e.g. dfs.namenode.rpc-address.[nameservice id].nn1

Preparation:

Put your HDFS in safemode and back up the namespace (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin; dfsadmin -safemode enter; dfsadmin -saveNamespace)
Stop the NameNode service
Back up the Hive Metastore (mysqldump hive > /tmp/mydir/backup_hive.sql)

Change Configuration:

You have to adjust the hdfs-site configuration. Change all configurations that contain the old nameservice id to the new nameservice id. In my case the new nameservice ID was "mycluster".
fs.defaultFS=hdfs://mycluster
dfs.nameservices=mycluster
dfs.namenode.shared.edits.dir=qjournal://horton03.cloud.hortonworks.com:8485;horton02.cloud.hortonworks.com:8485;horton01.cloud.hortonworks.com:8485/mycluster
dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.namenodes.mycluster=nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1=horton01.cloud.hortonworks.com:8020
dfs.namenode.rpc-address.mycluster.nn2=horton02.cloud.hortonworks.com:8020
dfs.namenode.http-address.mycluster.nn1=horton01.cloud.hortonworks.com:50070
dfs.namenode.http-address.mycluster.nn2=horton02.cloud.hortonworks.com:50070
dfs.namenode.https-address.mycluster.nn1=horton01.cloud.hortonworks.com:50470
dfs.namenode.https-address.mycluster.nn2=horton02.cloud.hortonworks.com:50470

Note: fs.defaultFS is part of core-site rather than hdfs-site, so it has to be changed there. You can remove the configurations that include the old nameservice id (e.g. dfs.namenode.http-address.[old_nameservice_id].nn1).

Reinit Journalnodes:

This is necessary because the shared edits directory includes the nameservice id. Please see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hadoop-ha/content/ha-nn-deploy-nn-cluster.html

Change Hive FSRoot:

It might be necessary to change the Hive metadata after the above configuration changes. Check whether changes are necessary (as the Hive user):

hive --service metatool -listFSRoot

If you see any table that references the old nameservice id, use the following commands to switch to the new nameservice id. The metatool expects the new location first and the old location second.

Use the Hive metatool to do a dry run of updating the table locations (no actual change is made in this mode):

hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton -dryRun

If you are satisfied with the changes the metatool will make, run the command without the -dryRun option:

hive --service metatool -updateLocation hdfs://mycluster hdfs://myhorton

Additional notes:

If you are using HBase, you have to adjust additional configurations.
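Renaming every key and value by hand is error-prone, so here is a minimal Python sketch of the renaming step. It is not part of the original procedure: the function name rename_nameservice and the sample old properties are my own assumptions for illustration, and it only works on an in-memory dictionary, so you still have to apply the result through Ambari or your config files and review it before use.

```python
import re


def rename_nameservice(props, old_id, new_id):
    """Return a copy of props with old_id swapped for new_id in keys and values."""
    # Match old_id only as a whole token, so strings that merely contain it
    # as part of a longer word (e.g. a hostname prefix) are left untouched.
    token = re.compile(
        r"(?<![A-Za-z0-9_-])" + re.escape(old_id) + r"(?![A-Za-z0-9_-])"
    )
    return {
        token.sub(lambda _: new_id, key): token.sub(lambda _: new_id, value)
        for key, value in props.items()
    }


# Hypothetical "before" state, assuming the old nameservice id was "myhorton".
old_props = {
    "fs.defaultFS": "hdfs://myhorton",
    "dfs.nameservices": "myhorton",
    "dfs.ha.namenodes.myhorton": "nn1,nn2",
    "dfs.namenode.rpc-address.myhorton.nn1": "horton01.cloud.hortonworks.com:8020",
    "dfs.namenode.rpc-address.myhorton.nn2": "horton02.cloud.hortonworks.com:8020",
}

new_props = rename_nameservice(old_props, "myhorton", "mycluster")
for key in sorted(new_props):
    print(f"{key}={new_props[key]}")
```

Because the old keys are replaced rather than copied, the result no longer contains any property that references the old nameservice id, which matches the note above about removing the old configurations.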

jstraub · ‎11-17-2015

As far as I know, this is currently not possible. I'm not sure why this feature was not pushed in the last couple of years; maybe multi-tenancy wasn't really an issue. I don't think anyone is working on HDFS-199 at the moment. I have seen a couple of requests in our internal Jira regarding this. If you open a new feature enhancement with our support team, we might be able to get the ball rolling again. Your workaround looks good, I'd keep it for now.

jstraub · ‎11-10-2015

Great article. Thanks for sharing 🙂

jstraub · ‎11-06-2015

Thanks for the article! Have you tested the visualization with bigger datasets as well? I am curious how the UI works with bigger datasets or queries that need some time to calculate.

jstraub · ‎11-02-2015

Thanks! Sure, we could post it on our blog.

jstraub · ‎10-30-2015

He did use the SolrCloud mode for the PutSolrContentStream. I used SolrStandalone, so it should work either way 🙂

jstraub · ‎10-29-2015

Awesome tutorial, thanks for sharing 🙂

jstraub · ‎10-23-2015

One task everybody faces when setting up a new Hadoop cluster is the allocation of services. Administrators of an existing cluster, on the other hand, might ask themselves: how are my services allocated? I have discussed the visualization of HDP clusters and services quite often recently and therefore decided to share my application for visualizing the current and future state of a cluster (see the link to the hosted app at the end of the article).

What does Service Allocation mean?

Planning a Hadoop cluster involves many steps and tasks that need to be considered. Almost no two setups are the same (although there are some similarities). The service allocation is the part that tells you which services and components will be on which node, and how many nodes you have or need. This can be quite tedious and difficult, since not all services play along well with each other, every service has different hardware/setup requirements, and adding many services can get confusing. This makes it even more important to have a sound overview of your service allocation.

To plan, document and visualize the service allocation or a complete Hadoop cluster, I have used paper sketches, Excel sheets, text files, PowerPoint, Photoshop and other tools. However, these approaches are often time-consuming, hard to edit or re-use, and in general not the best option. In need of a proper tool, I created this rather small and simple Angular application (at least it was at the beginning), which visualizes a cluster by using a simple JSON document as input source (see below).

There are three ways to create a cluster visualization:

Export a live cluster via Ambari's API
Create a cluster by writing a JSON document as seen below
Build a new cluster with the latest drag-n-drop build feature

Let's say we have a cluster with 2 master nodes, 1 DataNode and a couple of different services.
The cluster is defined as:

    {
      "stack_version": "HDP-2.2",
      "security_type": "KERBEROS",
      "name": "bigdata",
      "hosts_info": [
        {
          "host_name": "c4068.ambari.apache.org",
          "components": [
            "NAMENODE", "RESOURCEMANAGER", "APP_TIMELINE_SERVER", "HISTORYSERVER",
            "TEZ_CLIENT", "YARN_CLIENT", "HDFS_CLIENT", "HIVE_CLIENT", "MAPREDUCE2_CLIENT"
          ]
        },
        {
          "host_name": "c4069.ambari.apache.org",
          "components": [
            "SECONDARY_NAMENODE", "HIVE_METASTORE", "HIVE_SERVER", "HCAT", "WEBHCAT_SERVER",
            "TEZ_CLIENT", "YARN_CLIENT", "HDFS_CLIENT", "HIVE_CLIENT", "MAPREDUCE2_CLIENT"
          ]
        },
        {
          "host_name": "c4070.ambari.apache.org",
          "components": [
            "DATANODE", "NODEMANAGER",
            "TEZ_CLIENT", "YARN_CLIENT", "HDFS_CLIENT", "HIVE_CLIENT", "MAPREDUCE2_CLIENT"
          ]
        }
      ]
    }

As soon as the cluster is imported, you can choose between three views.

Design flexibility through Environments

Environments are exportable stack templates that contain information about available services and components as well as their groups and colors. To customize the visualization configuration (colors, sorting, ...), you can edit the services and components within the application or in the exported Environment (JSON). This makes it possible to use different output formats for specific clusters, departments, companies and so on, simply by importing the environment when a cluster is imported.

Why you might find this app useful:

Planning a new cluster
Easy Ambari Blueprint generation
Visualizing a cluster for a concept or documentation
Quick overview of a cluster (e.g. for support, sysadmins, ...)
Consistent visualization/documentation
...

If more people are interested in this project, I will add new features. For example:

Filter by node groups (type of node or service or any custom group)
Group nodes (Master, Worker, Edge, ...)
Implement as Ambari View (?)
...

I hope some might find this tool useful.
Looking forward to your feedback 🙂

You can find more screenshots here: https://github.com/mr-jstraub/ambari-node-view/tree/master/screens
Project & Setup: https://github.com/mr-jstraub/ambari-node-view

The above article mainly focused on version 0.3. Since then, a new version has been released with exciting new features. Read more in the next section below.

Export, Build, Visualize and Deploy - What's new in v0.4

Since the above article was published in October, a lot of changes have been made and the web application has been heavily extended. In this short section I want to touch briefly on the latest enhancements; more details will follow in an additional article.

What's new?

The nodes and their services/components have been completely redesigned and restructured
Added an option to switch between full names and acronyms (e.g. Namenode and NN)
New data structure for nodes. Nodes can have multiple hostnames now; this is a major change, since it reduces the data footprint immensely and allows the creation of simpler cluster templates
Build a Cluster! - A drag-n-drop based user interface to build a cluster
Blueprint Generator! - Generate Ambari Blueprints directly from imported or built clusters

Build a Cluster - New -

This is definitely one of my favorite features. Instead of writing JSON templates to plan and visualize a cluster, or exporting an existing cluster (although this is the easiest way), it is now possible to build a new cluster by using drag-n-drop. The tool supports up to 1000 nodes, dynamic hostnames, HDFS & YARN HA, ...

Blueprints (Beta) - New -

Generate Ambari Blueprints directly from imported or built clusters. General and host-group-specific configurations can be added manually. More than one thousand suggested configuration parameters and categories are available.

Read more in this article about Blueprints and "Build a Cluster"
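To give an idea of what generating a Blueprint from a cluster definition involves, here is a minimal Python sketch. It is not the application's actual implementation: the function name to_blueprint, the host group naming scheme (host_group_1, host_group_2, ...) and the blueprint name are assumptions for illustration. It takes the v0.3-style cluster JSON shown earlier (shortened here to two hosts with fewer components) and maps each host to one Ambari host group.

```python
import json


def to_blueprint(cluster, blueprint_name):
    """Build an Ambari-Blueprint-shaped dict from a cluster definition."""
    # "HDP-2.2" is split into stack name "HDP" and stack version "2.2".
    stack_name, _, stack_version = cluster["stack_version"].partition("-")
    host_groups = []
    for index, host in enumerate(cluster["hosts_info"], start=1):
        host_groups.append({
            # Assumption: one host group per host, numbered in input order.
            "name": f"host_group_{index}",
            "cardinality": "1",
            "components": [{"name": name} for name in host["components"]],
        })
    return {
        "Blueprints": {
            "blueprint_name": blueprint_name,
            "stack_name": stack_name,
            "stack_version": stack_version,
        },
        "host_groups": host_groups,
    }


# Shortened version of the example cluster from the article.
cluster = {
    "stack_version": "HDP-2.2",
    "name": "bigdata",
    "hosts_info": [
        {"host_name": "c4068.ambari.apache.org",
         "components": ["NAMENODE", "RESOURCEMANAGER", "HDFS_CLIENT"]},
        {"host_name": "c4070.ambari.apache.org",
         "components": ["DATANODE", "NODEMANAGER", "HDFS_CLIENT"]},
    ],
}

blueprint = to_blueprint(cluster, "bigdata-blueprint")
print(json.dumps(blueprint, indent=2))
```

A real Blueprint usually merges hosts with identical component lists into a single host group and adds a configurations section; both are left out here to keep the sketch short.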

jstraub · ‎10-14-2015

Hi @Chakra, do you know whether SAP HANA or SDA supports kerberized connections to/from Hadoop?

jstraub · ‎10-14-2015

Hi @Olivier Renault, could you put the code into a Git repository? That would make it easier to copy and use the code. Thanks 😃

Last Visited	‎08-18-2019 08:21 AM

Member Since	‎09-15-2015 02:21 PM
Posts	457
Kudos received	469
