
Are there any node performance implications of using SmartSense 1.3? It seems that SmartSense agents on cluster nodes affect performance because of the heavy diagnostic log collection process.

Explorer
 
1 ACCEPTED SOLUTION

@hitaay

Start here: http://docs.hortonworks.com/HDPDocuments/SS1/SmartSense-1.3.0/bk_installation/content/ambari_install...

It answers questions 1, 2, and 3. Yes, you can limit what is collected and how often. Sensitive information can also be randomized.

Regarding one of your concerns, be aware that Activity Analyzers deployed to the NameNodes in the cluster do not process any utilization data besides HDFS. Therefore, to process YARN, MapReduce, and Tez utilization data, another instance of the Activity Analyzer needs to be deployed to another node in the cluster, preferably on a non-master node.

If any of the responses was helpful, please vote and accept as best answer.


5 REPLIES

@hitaay

What did you see as "heavy performance impact"? Could you put some numbers next to it: CPU, RAM, disk, etc.?

How much logging is happening in your cluster, such that collecting specific logs could impact it?

How big is your cluster, and how heavily utilized is it?

I haven't seen a single case where SmartSense was the culprit. Please help me document a first case and, if warranted, get it to engineering.
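One way to put numbers next to it is to time a single collection run and record its CPU and memory use. The following is only a minimal sketch using the Python standard library on Linux. The `measure` helper and the command passed to it are hypothetical placeholders, not part of SmartSense, and the sketch measures only the command it launches, not a process that is already running.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd as a child process and report its wall time, CPU time, and peak memory.

    Hypothetical helper for illustration; cmd is whatever command you want to profile.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    proc = subprocess.run(cmd, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "exit_code": proc.returncode,
        "wall_s": wall,
        # CPU seconds consumed by the child (and its waited-for descendants).
        "cpu_user_s": after.ru_utime - before.ru_utime,
        "cpu_sys_s": after.ru_stime - before.ru_stime,
        # Peak resident set size over all waited-for children (kilobytes on Linux).
        "max_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Placeholder workload; substitute the command you actually want to measure.
    print(measure([sys.executable, "-c", "sum(range(1000000))"]))
```

Running it on a node during a capture and on the same node when idle gives two sets of figures to compare.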

Contributor

SmartSense agents are passive agents, not continuously running daemons. They wake up only when data needs to be collected (typically once a week or on ad hoc invocation). They run for only a few minutes and use very limited memory. CPU utilization depends on how many logs you are collecting, how large they are, and how many data anonymization rules you have. Usually it is not too heavy and finishes in minutes.
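To illustrate why CPU use grows with both log volume and the number of anonymization rules, here is a rough sketch in Python. It is not SmartSense's implementation or rule format: the `RULES` list, `mask`, and `anonymize_line` are hypothetical names made up for this example. The point is only that every rule is applied to every line, so the work is roughly lines times rules.

```python
import hashlib
import re

# Hypothetical rules for illustration only; NOT SmartSense's actual rule format.
RULES = [
    ("ipv4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("email", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
]


def mask(kind, value):
    """Replace a sensitive value with a short, stable token derived from its hash."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"<{kind}:{digest}>"


def anonymize_line(line, rules=RULES):
    """Apply every rule to one log line; cost per line grows with the rule count."""
    for kind, pattern in rules:
        line = pattern.sub(lambda m, k=kind: mask(k, m.group(0)), line)
    return line


def anonymize_lines(lines, rules=RULES):
    """Anonymize a batch of lines; total work is roughly len(lines) * len(rules)."""
    return [anonymize_line(line, rules) for line in lines]


if __name__ == "__main__":
    print(anonymize_line("login from 10.0.0.12 by ops@example.com"))
```

Because the token is derived from a hash, the same input value always maps to the same token, so masked logs can still be correlated.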

Explorer
@sheetal

Thanks for your response and for sharing insight into how SmartSense agents work. From my initial analysis, I got the impression that SmartSense agents work more like active daemons, polling diagnostic information frequently and therefore sharing node resources. Could you help me with the following further queries?

1. How do we configure or fix the activation frequency of SmartSense? What is the optimal activation frequency?

2. Do we have any control over which kinds of logs SmartSense captures?

3. Could you share a support manual, white paper, or URL on SmartSense setup and configuration?

Thanks again for your response.

Explorer
@Constantin Stanca

Many thanks for your response.

We have recently deployed a 30-node cluster (2+ PB) with HDP 2.5 and are planning to use SmartSense as part of Ops for the cluster. I shared these queries in order to understand the functionality and technical resource requirements of SmartSense.

Could you share any pointers for better understanding SmartSense setup, configuration, and operation?

Cheers!
