Created on 05-19-2016 03:59 AM - edited 08-17-2019 12:27 PM
SmartSense is an excellent tool for keeping your cluster running at optimal efficiency while maintaining operational best practices. We’ve combined knowledge from the greatest minds in the industry, and use it to analyze metadata about your cluster from the bundles you submit.
Have you ever wondered exactly what data you’re sending to SmartSense? The SmartSense Admin Guide contains a high-level description (see What’s Included in a Bundle), but for the greatest understanding you should extract a bundle and explore it with your own eyes!
Obtain a Bundle
There are two types of bundles...
To begin, let’s capture an Analysis Bundle:
...and download an unencrypted copy to our local machine:
The bundle is a gzipped tar file that contains a gzipped tar file from each host running the HST Agent. In the following examples, notice the bundle variable excludes the .tgz extension.
Linux or OS X users can extract everything with a bash for-loop:
bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
tar zxf "$bundle.tgz" && cd "$bundle" && for i in *.tgz ; do tar zxf "$i" && rm "$i" ; done
Windows users can use a similar process with a utility like 7-Zip. Assuming 7z.exe is in your path:
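setlocal
set bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
rem Unpack the outer .tgz, then the .tar it contains
7z x %bundle%.tgz && 7z x %bundle%.tar && del %bundle%.tar && cd %bundle%
rem Unpack each host archive the same way. This is batch-file syntax;
rem at an interactive prompt, use a single percent sign on the loop variable.
for %%i in (*.tgz) do 7z x %%i && del %%i
for %%i in (*.tar) do 7z x %%i && del %%i
endlocal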
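Because the bundle is an archive of archives, it can be handy to preview which hosts it covers before unpacking anything. The following is a minimal sketch, assuming the per-host archives end in .tgz (as the extraction loops above suggest); list_host_archives is a hypothetical helper name, not part of SmartSense.

```shell
#!/usr/bin/env bash
# Sketch: list the per-host archives inside a bundle without extracting it.
# list_host_archives is a hypothetical helper, not a SmartSense command.
list_host_archives() {
  local archive="$1"
  # t = list contents, z = read through gzip, f = use the named archive file.
  # Keep only entries ending in .tgz (assumed to be the per-host archives).
  tar tzf "$archive" | grep '\.tgz$'
}
```

For example, `list_host_archives "$bundle.tgz"` should print one line per host archive found in the bundle.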
Exploring Bundle Contents
NOTE: Example console output was obtained from a SmartSense 1.2.1 bundle and may differ in future versions. The output is also truncated for brevity. You’re encouraged to follow along with a bundle from your own cluster.
For a convenient overview of the bundle contents, use the tree command, limited to a depth of 3:
MyLaptop:a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3
.
├── meta
│   └── metadata.json
├── mgmt.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│   ├── os
│   │   ├── logs
│   │   └── reports
│   └── services
│       ├── AMBARI
│       ├── AMS
│       ├── HDFS
│       ├── HST
│       ├── MR
│       ├── TEZ
│       ├── YARN
│       └── ZK
├── node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│   ├── os
│   │   ├── logs
│   │   └── reports
│   └── services
│       ├── AMBARI
...
41 directories, 4 files
At the root of the bundle, we see a ‘meta’ folder and one folder per host. The meta folder contains some bundle metadata. Note that domain names are anonymized: my cluster actually uses example.com, which appears here as zoeocuz.com. Let’s take a look inside the two subfolders (os and services) per host...
Bundle Contents: OS
The os folder contains a couple of system logs and a variety of reports. Here’s a sample from my cluster:
MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -I "blockdevices" os/
os/
├── logs
│   └── messages.log
└── reports
    ├── chkconfig.txt
    ├── cpu_info.txt
    ├── dns_lookup.txt
    ├── dstat.txt
    ├── error_dmesg.txt
    ├── file_max.txt
...
5 directories, 49 files
Most of the filenames here are self-explanatory. Reports generally contain output from system commands or the /proc filesystem. These system characteristics serve as valuable inputs for determining your cluster’s optimal configuration.
Bundle Contents: Services
Within each host folder, the services subfolder contains configurations and reports for every HDP service on that host. Here’s an example from my node1:
MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3 services
services
├── AMBARI
│   ├── conf
│   │   ├── ambari-agent.ini
│   │   ├── ambari-agent.pid
│   │   └── logging.conf.sample
│   └── reports
│       ├── ambari_rpm.txt
│       ├── postgres_rpm.txt
│       ├── postmaster.txt
│       └── process_info.txt
├── AMS
│   ├── conf
│   │   ├── ams-env.sh
│   │   ├── metric_groups.conf
│   │   └── metric_monitor.ini
│   ├── metrics
│   │   └── ams
│   └── reports
│       └── ams_rpm.txt
...
32 directories, 157 files
The conf folders are copied from their respective locations under /etc/ (or /var/run for the .pid files). Reports contain JMX metrics and output from CLI commands, such as the YARN application list.
You can explore the contents using text processing commands like grep, sort, and uniq, which might be sufficient for your needs. Another option is to use a text editor with a file-tree view.
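As one illustration of the grep, sort, and uniq approach, here is a rough sketch that counts how many host folders contain each OS report. It assumes the extracted layout shown earlier (host folder, then os/reports); report_coverage is a hypothetical helper name, and files in nested folders under reports are counted by their base name too.

```shell
#!/usr/bin/env bash
# Sketch: count how many times each OS report file name appears across hosts.
# Assumes the layout <bundle>/<host>/os/reports/<file>; report_coverage is a
# hypothetical helper, not part of SmartSense.
report_coverage() {
  local bundle_dir="$1"
  find "$bundle_dir" -path '*/os/reports/*' -type f \
    | sed 's|.*/||' \
    | sort \
    | uniq -c \
    | sort -rn
  # sed strips the directory part, uniq -c counts identical names,
  # and the final sort puts the most common report names first.
}
```

Running `report_coverage .` from the bundle root prints a count next to each report name. A report that shows up on fewer hosts than the rest may be worth a closer look.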
Text Editors
Here are three open source text editors that integrate a file-tree for easy navigation (see attachments at the bottom for full-size images)...
TextMate 2 (OS X):
Notepad++ (Windows):
Vim + NerdTree (Linux, OS X):
Anonymization Rules
The default set of anonymization rules will protect IP addresses, hostnames, and password fields in standard HDP configuration files. You can modify or add anonymization rules if desired. Watch for a future HCC article where we take a deep dive into anonymization.
After making any changes to the anonymization ruleset, it is wise to verify everything is still functioning as intended. This can be accomplished by downloading an unencrypted bundle and examining its contents using the methods described above.
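One way to automate that check is to search the extracted bundle for strings that should never appear, such as your real domain name. The sketch below is only an illustration: check_anonymized is a hypothetical helper, and the search terms are whatever sensitive values you choose to pass in.

```shell
#!/usr/bin/env bash
# Sketch: report any file in an extracted bundle that still contains a
# sensitive term. check_anonymized is a hypothetical helper, not part of
# SmartSense. Returns 0 if nothing was found, 1 if any term leaked.
check_anonymized() {
  local bundle_dir="$1"; shift
  local term leaked=0
  for term in "$@"; do
    # -r recurse, -I skip binary files, -i ignore case,
    # -l print matching file names only, -F treat the term as a fixed string
    if grep -rIilF -- "$term" "$bundle_dir"; then
      echo "LEAK: '$term' found in the files listed above" >&2
      leaked=1
    fi
  done
  return "$leaked"
}
```

For instance, `check_anonymized . example.com` run from the bundle root should print nothing and return 0 if the domain was anonymized everywhere. This only inspects file contents, so file and folder names still deserve a quick manual look.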
Until Next Time...
Keeping in mind that we only looked within a single host folder, and that my demo cluster has the minimum number of components for a functioning HDP stack, we can see that every bundle is packed with useful information.
Knowing exactly what’s included in a SmartSense bundle provides peace of mind and confidence that your confidential data remains secure and private.