
SmartSense is an excellent tool for keeping your cluster running at optimal efficiency while maintaining operational best practices. We’ve combined knowledge from the greatest minds in the industry and use it to analyze metadata about your cluster from the bundles you submit.

Have you ever wondered exactly what data you’re sending to SmartSense? The SmartSense Admin Guide contains a high-level description (see What’s Included in a Bundle), but for the greatest understanding you should extract a bundle and explore it with your own eyes!

Obtain a Bundle

There are two types of bundles...

  1. Analysis Bundle: configs and metrics for all services on all hosts
  2. Troubleshooting Bundle: Analysis Bundle + logs for selected service(s)

To begin, let’s capture an Analysis Bundle:

4325-ui-capture-circled.png

...and download an unencrypted copy to our local machine:

4326-ui-download-cropped.png

The bundle is a gzipped tar file that contains a gzipped tar file from each host running the HST Agent. In the following examples, notice the bundle variable excludes the .tgz extension.

Linux or OS X users can extract everything with a bash for-loop:

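# Set this to your bundle's file name, without the .tgz extension
bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
# Extract the outer archive, then each per-host .tgz inside it (each is deleted only after it extracts cleanly)
tar zxf "$bundle.tgz" && cd "$bundle" && for i in *.tgz ; do tar zxf "$i" && rm "$i" ; done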
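If you extract bundles often, the same steps can be wrapped in a small bash function. This is a minimal sketch: the function name extract_bundle is hypothetical, and it assumes (as described above) that the outer archive unpacks into a folder named after the bundle and that each host archive inside it ends in .tgz.

```shell
# extract_bundle (hypothetical helper): unpack a SmartSense bundle and every
# per-host archive inside it. Assumes the outer archive unpacks into a folder
# named after the bundle, and that the host archives inside it end in .tgz.
extract_bundle() {
  local archive dest bundle host_tgz
  archive=$1                             # path to the downloaded bundle .tgz
  dest=${2:-.}                           # where to unpack (default: current folder)
  bundle=$(basename "$archive" .tgz)     # strip the directory and .tgz extension
  tar zxf "$archive" -C "$dest" || return 1
  for host_tgz in "$dest/$bundle"/*.tgz; do
    [ -f "$host_tgz" ] || continue       # glob matched nothing
    # Unpack the host archive beside itself; delete it only if tar succeeded
    tar zxf "$host_tgz" -C "$dest/$bundle" && rm "$host_tgz" || return 1
  done
}
```

For example, run extract_bundle a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35.tgz and then cd into the resulting folder. Unlike the one-liner, the function does not change your working directory, so it can be called repeatedly from the same shell.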

Windows users can follow a similar process with a utility like 7-Zip. Assuming 7z.exe is in your PATH, run the following at a Command Prompt (in a .bat file, write %%i instead of %i):

setlocal
set bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
rem 7-Zip unpacks a .tgz in two steps: gzip to .tar, then .tar to files
7z x %bundle%.tgz && 7z x %bundle%.tar && del %bundle%.tar && cd %bundle%
for %i in (*.tgz) do 7z x %i && del %i
for %i in (*.tar) do 7z x %i && del %i
endlocal

Exploring Bundle Contents

NOTE: Example console output was obtained from a SmartSense 1.2.1 bundle and may differ in future versions. The output is also truncated for brevity. You’re encouraged to follow along with a bundle from your own cluster.

For a convenient overview of the bundle contents, use the tree command, limited to a depth of 3:

MyLaptop:a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3
.
├── meta
│   └── metadata.json
├── mgmt.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│   ├── os
│   │   ├── logs
│   │   └── reports
│   └── services
│       ├── AMBARI
│       ├── AMS
│       ├── HDFS
│       ├── HST
│       ├── MR
│       ├── TEZ
│       ├── YARN
│       └── ZK
├── node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│   ├── os
│   │   ├── logs
│   │   └── reports
│   └── services
│       ├── AMBARI
...

41 directories, 4 files

At the root of the bundle, we see a ‘meta’ folder and one folder per host. The meta folder contains bundle metadata. Note that domain names are anonymized: my cluster actually uses example.com, but the host folders show zoeocuz.com. Let’s look inside the two subfolders (os and services) of each host folder...
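To see at a glance which services were collected from each host, a short loop over the host folders is enough. This is a sketch that assumes the layout shown in the tree output above (one folder per host, each with a services subfolder); the function name is hypothetical.

```shell
# list_services_per_host (hypothetical helper): print each host folder of an
# extracted bundle followed by the names of its service folders.
# Folders without a services/ subfolder, such as meta/, are skipped.
list_services_per_host() {
  local bundle_dir host_dir svc_dir line
  bundle_dir=${1:-.}
  for host_dir in "$bundle_dir"/*/; do
    [ -d "${host_dir}services" ] || continue
    line="$(basename "$host_dir"):"
    for svc_dir in "${host_dir}services"/*/; do
      [ -d "$svc_dir" ] || continue      # services/ is empty; glob matched nothing
      line="$line $(basename "$svc_dir")"
    done
    printf '%s\n' "$line"
  done
}
```

Run it from the bundle root with no arguments, or pass the bundle folder as the first argument.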

Bundle Contents: OS

The os folder contains a couple of system logs and a variety of reports. Here’s a sample from my cluster:

MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -I "blockdevices" os/
os/
├── logs
│   └── messages.log
└── reports
    ├── chkconfig.txt
    ├── cpu_info.txt
    ├── dns_lookup.txt
    ├── dstat.txt
    ├── error_dmesg.txt
    ├── file_max.txt
 ...

5 directories, 49 files

Most of the filenames here are self-explanatory. Reports generally contain output from system commands or the /proc filesystem. These system characteristics serve as valuable inputs for determining your cluster’s optimal configuration.
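Because every host folder shares the same os/reports layout, it is easy to compare a single report across hosts, for example file_max.txt. A minimal sketch, assuming the layout above; the function name is hypothetical.

```shell
# show_report_across_hosts (hypothetical helper): print one OS report, such as
# file_max.txt, from every host folder, each preceded by a header naming the host.
show_report_across_hosts() {
  local report bundle_dir f host
  report=$1
  bundle_dir=${2:-.}
  for f in "$bundle_dir"/*/os/reports/"$report"; do
    [ -f "$f" ] || continue              # no host has this report
    host=${f#"$bundle_dir"/}             # drop the bundle folder prefix...
    host=${host%%/*}                     # ...and keep the host folder name
    printf '== %s ==\n' "$host"
    cat "$f"
  done
}
```

For instance, show_report_across_hosts file_max.txt, run from the bundle root, makes differences between hosts easy to spot.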

Bundle Contents: Services

Within each host folder, the services subfolder contains configurations and reports for every HDP service on that host. Here’s an example from my node1:

MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3 services
services
├── AMBARI
│   ├── conf
│   │   ├── ambari-agent.ini
│   │   ├── ambari-agent.pid
│   │   └── logging.conf.sample
│   └── reports
│       ├── ambari_rpm.txt
│       ├── postgres_rpm.txt
│       ├── postmaster.txt
│       └── process_info.txt
├── AMS
│   ├── conf
│   │   ├── ams-env.sh
│   │   ├── metric_groups.conf
│   │   └── metric_monitor.ini
│   ├── metrics
│   │   └── ams
│   └── reports
│       └── ams_rpm.txt
...

32 directories, 157 files

The conf folders are copied from their respective locations under /etc/ (or /var/run for the .pid files). Reports contain JMX metrics and output from CLI commands, such as the YARN application list.
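Since the conf folders mirror the files under /etc/, a recursive grep is a quick way to check how a given property is set on each host. This sketch assumes the services/&lt;SERVICE&gt;/conf layout shown above; the function name is hypothetical, and dfs.replication is used only as an example property name.

```shell
# find_setting (hypothetical helper): search the conf folders of every service
# on every host for a fixed string, such as a property name, and print each
# matching line prefixed with its file path and line number.
find_setting() {
  local pattern bundle_dir
  pattern=$1
  bundle_dir=${2:-.}
  # -r: recurse into folders, -F: match a fixed string, -n: show line numbers
  grep -rFn -- "$pattern" "$bundle_dir"/*/services/*/conf
}
```

For example, find_setting dfs.replication run from the bundle root. Like grep itself, the function returns a non-zero status when nothing matches.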

You can explore the contents using text processing commands like grep, sort, and uniq, which might be sufficient for your needs. Another option is to use a text editor with a file-tree view.
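As an example of the sort and uniq approach, the sketch below counts how many host folders contain each service folder. It assumes that a service folder appears in a host folder only when that service has something installed on the host; the function name is hypothetical.

```shell
# count_hosts_per_service (hypothetical helper): count how many host folders
# contain each service folder, most widely deployed services first.
count_hosts_per_service() {
  local bundle_dir svc_dir
  bundle_dir=${1:-.}
  for svc_dir in "$bundle_dir"/*/services/*/; do
    [ -d "$svc_dir" ] || continue        # glob matched nothing
    basename "$svc_dir"                  # one service name per line
  done | sort | uniq -c | sort -rn
}
```

The output is one line per service, with the host count first, as printed by uniq -c.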

Text Editors

Here are three open source text editors that integrate a file-tree for easy navigation (see attachments at the bottom for full-size images)...

TextMate 2 (OS X):

4327-textmate.png

Notepad++ (Windows):

4328-notepadplusplus.png

Vim + NerdTree (Linux, OS X):

4329-vim.png

Anonymization Rules

The default set of anonymization rules will protect IP addresses, hostnames, and password fields in standard HDP configuration files. You can modify or add anonymization rules if desired. Watch for a future HCC article where we take a deep dive into anonymization.

After making any changes to the anonymization ruleset, it is wise to verify everything is still functioning as intended. This can be accomplished by downloading an unencrypted bundle and examining its contents using the methods described above.
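One way to do that check from the command line is to search the extracted, unencrypted bundle for strings that should never appear in it, such as your real domain name. A minimal sketch follows; the function name is hypothetical, and the patterns are yours to choose (example.com stands in for the real domain, as in my cluster). Note that a fixed-string search only finds exact matches.

```shell
# check_for_leaks (hypothetical helper): scan an extracted bundle for strings
# that should have been anonymized. Prints each file containing a match and
# returns 1 if anything was found, 0 otherwise.
check_for_leaks() {
  local bundle_dir pattern found
  bundle_dir=$1
  shift
  found=0
  for pattern in "$@"; do
    # -r: recurse, -l: list matching files only, -F: match a fixed string
    if grep -rlF -- "$pattern" "$bundle_dir"; then
      found=1
    fi
  done
  return "$found"
}
```

For example: check_for_leaks a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 example.com &amp;&amp; echo "no matches found"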

Until Next Time...

Keeping in mind that we only looked within a single host folder, and that my demo cluster has the minimum number of components for a functioning HDP stack, we can see that every bundle is packed with useful information.

Knowing exactly what’s included in a SmartSense bundle provides peace of mind and the confidence that your confidential data remains secure and private.

Last update: 08-17-2019 12:27 PM