Created on 07-16-2016 02:08 AM - edited 08-17-2019 11:21 AM
Update: it turns out that when you initiate a support case resolution capture, for example for the HBase service, SmartSense also pulls the HDFS NameNode logs in addition to the HBase logs. You may run into the same timeout issue and need to apply the approach below to get around it.
In SmartSense 1.3.0 this will no longer be an issue. Until then, this is a way to avoid capture timeouts. First, let's discuss the difference between a capture for analysis and a capture for support case resolution. Analysis bundles do not collect service logs. Support case bundles collect both configuration and logs, and depending on how much anonymization you apply, large log files can take a long time to collect. This is especially noticeable with HDFS NameNode logs, which tend to be large, and this is exactly the scenario we're trying to address. To start, increase the agent timeout threshold in Ambari. In my case it was 30 minutes; feel free to raise it to as much as 2 hours on the SmartSense Operations page in Ambari.
Next, we're going to collect only the *.log files, such as hadoop-hdfs-namenode-*.log, and exclude everything else. That leaves the .out, .out.* and rotated .log.* files out of the collection. On the HST server host (HST is SmartSense's internal name), go to the /var/lib/smartsense/hst-agent/resources/scripts directory. Notice that we're accessing the hst-agent directory, not the hst-server directory: the collection scripts belong to the agents, not to hst-server. Edit the hdfs-scripts.xml file and go to line 100; depending on which version of SmartSense you're running, it may be off by 10 lines or so. On 1.2.2, it is line 100. Change the following lines:
if [ `hostname -f` == "${MASTER}" ] && [ `echo "${SLAVES}" | grep -o ',' | wc -l` -gt 1 ] ; then
  find $LOG 2>/dev/null -type f -mtime -2 -iname '*' -exec cp '{}' ${outputdir} \;
  find $LOG 2>/dev/null -type f -mtime -2 -iname '*' -exec cp '{}' ${outputdir} \;
else
  for file in `find $LOG 2>/dev/null -type f -mtime -2 -iname '*' ; find $LOG 2>/dev/null -type f -mtime -2 -iname '*' ; `
to
if [ `hostname -f` == "${MASTER}" ] && [ `echo "${SLAVES}" | grep -o ',' | wc -l` -gt 1 ] ; then
  # find $LOG 2>/dev/null -type f -mtime -2 -iname '*' -exec cp '{}' ${outputdir} \;
  find $LOG 2>/dev/null -type f -mtime -2 -iname '*.log' -exec cp '{}' ${outputdir} \;
else
  for file in `find $LOG 2>/dev/null -type f -mtime -2 -iname '*.log' ; find $LOG 2>/dev/null -type f -mtime -2 -iname '*.log' ; `
The difference is hard to spot, so here it is: we commented out the first find command, and in the second find command we replaced '*' with '*.log'. We did the same for both find commands inside the for loop. In short, every -iname '*' becomes -iname '*.log'. As the last step, restart the SmartSense service and agents to propagate the changes to every agent. We only care about the NameNode hosts, but depending on your services and host components, I don't see why we couldn't restart all of them.
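To see what the pattern change does to a log directory, here is a small sketch that runs the same find predicates as the collection script (-type f -mtime -2 -iname) against a throwaway directory. The helper name list_collected and the file names are hypothetical, made up to mimic a NameNode log directory; this is an illustration, not SmartSense code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of files in DIR that the collection
# script's find predicates would pick up for a given -iname PATTERN.
list_collected() {
  local dir="$1" pattern="$2"
  find "$dir" -type f -mtime -2 -iname "$pattern" -exec basename '{}' \; | sort
}

# Demo on a throwaway directory; the file names are made up to mimic a
# NameNode log directory (current log, rotated log, .out files).
demo_dir="$(mktemp -d)"
touch "$demo_dir/hadoop-hdfs-namenode-nn1.log" \
      "$demo_dir/hadoop-hdfs-namenode-nn1.log.1" \
      "$demo_dir/hadoop-hdfs-namenode-nn1.out" \
      "$demo_dir/hadoop-hdfs-namenode-nn1.out.1"

echo "Original pattern '*':"
list_collected "$demo_dir" '*'
echo "Edited pattern '*.log':"
list_collected "$demo_dir" '*.log'

rm -rf "$demo_dir"
```

The first listing shows all four files; the second shows only hadoop-hdfs-namenode-nn1.log, which is why rotated logs and .out files no longer slow the capture down.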
One other thing I'd like to point out: the same /var/lib/smartsense/hst-agent/resources/scripts directory contains scripts for other services, so you can apply the same steps to any other service. Granted, this is a corner case, but when you're investigating a high-severity issue and have no way to upload logs other than the hard way, this may be a good approach.
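If you want to find which scripts in that directory still use the catch-all pattern, and on which lines (handy for the "line 100, give or take" step), a grep along these lines may help. It assumes the scripts are the *.xml files in that directory and that the pattern appears literally as -iname '*', as in the hdfs-scripts.xml excerpt; the helper name find_catchall_lines and the SCRIPTS_DIR override are hypothetical.

```bash
#!/usr/bin/env bash
# Scripts directory from this article; override with SCRIPTS_DIR if yours differs.
SCRIPTS_DIR="${SCRIPTS_DIR:-/var/lib/smartsense/hst-agent/resources/scripts}"

# Hypothetical helper: print file:line:text for every line in the *.xml
# collection scripts that still uses the catch-all -iname '*' pattern.
# Assumes the pattern appears literally, as in the hdfs-scripts.xml excerpt.
find_catchall_lines() {
  local dir="$1"
  # -H: always show the file name, -n: line number, -F: fixed string.
  # "--" stops grep from treating the leading "-" of the pattern as an option.
  grep -HnF -- "-iname '*'" "$dir"/*.xml 2>/dev/null
}

find_catchall_lines "$SCRIPTS_DIR" || echo "No catch-all patterns found in $SCRIPTS_DIR"
```

Lines already changed to '*.log' no longer match, so after editing hdfs-scripts.xml the only remaining hit in that file should be the commented-out find line.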
Finally, let's verify this approach. Go to the SmartSense view and initiate a capture.
When the capture is complete, go to the SmartSense server node and navigate to the local storage directory.
In that directory you will find your latest bundle; uncompress it and cd into the new directory.
Inside it there is another compressed file; uncompress that as well.
Then cd into that directory and into the services directory. You will see a directory for each service. We care about HDFS, so go into it and then into its logs directory.
There you will find your *.log files.
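The verification steps above can be scripted roughly as follows. This is a sketch under assumptions the article does not confirm: both the bundle and the nested file are gzip-compressed tarballs, the nested archive is the first *.tar.gz or *.tgz found in the bundle, and the logs sit under a services/HDFS/logs directory. The function name inspect_bundle is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: unpack a capture bundle and its nested archive in a
# temporary directory, then list the files under services/HDFS/logs,
# flagging anything that does not end in .log.
# Assumptions: both archives are gzip-compressed tarballs, and the nested
# one is the first *.tar.gz / *.tgz found inside the outer bundle.
inspect_bundle() (
  work="$(mktemp -d)" || exit 1
  trap 'rm -rf "$work"' EXIT          # clean up when the subshell exits

  tar -xzf "$1" -C "$work" || exit 1
  inner="$(find "$work" -type f \( -name '*.tar.gz' -o -name '*.tgz' \) | head -n 1)"
  [ -n "$inner" ] || { echo "no nested archive found" >&2; exit 1; }
  tar -xzf "$inner" -C "$work" || exit 1

  # -ipath: case-insensitive match on the services/HDFS/logs directory.
  logs_dir="$(find "$work" -type d -ipath '*/services/hdfs/logs' | head -n 1)"
  [ -n "$logs_dir" ] || { echo "services/HDFS/logs not found" >&2; exit 1; }

  find "$logs_dir" -type f | sort | while IFS= read -r f; do
    case "$f" in
      *.log) echo "ok: $(basename "$f")" ;;
      *)     echo "EXTRA: $(basename "$f")" ;;
    esac
  done
)
```

Pass it the path of your latest bundle in the local storage directory; with the edit in place, every line should start with "ok:". Running it in a subshell with an EXIT trap keeps the temporary directory from piling up even when a step fails.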
I want to stress that this is a hack, so use it at your own risk. At the very least, notify your support engineer of the approach. I'd like to thank @Paul Codding and @sheetal for showing me the inner workings of SmartSense. Your feedback is welcome.