<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Query on Hadoop logs in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162945#M33313</link>
    <description>&lt;P&gt;You can simply delete any file ending in a timestamp that is old enough for you. I have also seen people use "find -mtime" to delete all logs older than x days. Or you can configure the log4j settings of your Hadoop components. ( ambari-&amp;gt;hdfs-&amp;gt;advanced hdfs-log4j ) &lt;/P&gt;&lt;P&gt;Unfortunately the very useful DailyRollingFileAppender currently does not support deleting older files. ( A newer version does, so some Hadoop components may support that parameter. ) However, you could change the log appender to the RollingFileAppender, which provides a MaxBackupIndex attribute that keeps up to x rolled-over log files. ( Don't use it for Oozie though, since the Oozie admin features depend on the DailyRollingFileAppender. ) &lt;/P&gt;&lt;P&gt;So, as usual, a plethora of options &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://www.tutorialspoint.com/log4j/log4j_logging_files.htm"&gt;http://www.tutorialspoint.com/log4j/log4j_logging_files.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Edit: the DailyRollingFileAppender shipped with HDFS seems to be a newer one and has the following setting commented out in HDP 2.4. You can try uncommenting it and setting it to a number you are comfortable with. The line below would keep 30 days of log files around. &lt;/P&gt;&lt;P&gt;#log4j.appender.DRFA.MaxBackupIndex=30&lt;/P&gt;</description>
    <pubDate>Wed, 29 Jun 2016 19:22:31 GMT</pubDate>
    <dc:creator>bleonhardi</dc:creator>
    <dc:date>2016-06-29T19:22:31Z</dc:date>
    <item>
      <title>Query on Hadoop logs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162942#M33310</link>
      <description>&lt;P&gt;Any idea what the various log files typically created under the /var/log/hadoop/* folders are? Is there a defined naming convention and a mapping to the Hadoop daemons? The reason I ask is that I see many files listed under the /var/log/hadoop/hdfs folder, but I can't find documentation on the purpose of each log file. Any help please.&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 17:28:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162942#M33310</guid>
      <dc:creator>bigdata_superno</dc:creator>
      <dc:date>2016-06-29T17:28:41Z</dc:date>
    </item>
    <item>
      <title>Re: Query on Hadoop logs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162943#M33311</link>
      <description>&lt;P&gt;In general, .log files are the Java log files that tell you about any operational issues.&lt;/P&gt;&lt;P&gt;.out files are the log files of the Java process starter. So if you get any system faults, such as the JVM failing to start or segmentation faults, you will find them there. &lt;/P&gt;&lt;P&gt;All logs roll over, i.e. the file without a timestamp at the end is the newest one, and log4j keeps a number of older, rolled-over logs with a timestamp in the name.&lt;/P&gt;&lt;P&gt;Apart from that, the naming is pretty straightforward:&lt;/P&gt;&lt;P&gt;hadoop-hdfs-datanode: the log of the DataNode on the cluster&lt;/P&gt;&lt;P&gt;hadoop-hdfs-namenode: the log of the NameNode&lt;/P&gt;&lt;P&gt;hadoop-hdfs-secondarynamenode: the log of the secondary NameNode&lt;/P&gt;&lt;P&gt;hdfs-audit: the audit log of HDFS; it logs all user activity happening in the cluster&lt;/P&gt;&lt;P&gt;gc files: garbage collection logs for the NameNode/DataNode processes&lt;/P&gt;&lt;P&gt;So if you have a problem you will normally find it in the hadoop-hdfs .log files; if the problem is related to JVM configuration, look in the .out files instead.&lt;/P&gt;</description>
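      <!-- For illustration, under the naming scheme described above a /var/log/hadoop/hdfs
           directory on a NameNode host might contain files like these (the hostname "nn1"
           and the dates are placeholders, not taken from a real cluster):

           hadoop-hdfs-namenode-nn1.log               current Java log of the NameNode
           hadoop-hdfs-namenode-nn1.log.2016-06-28    rolled-over copy from a previous day
           hadoop-hdfs-namenode-nn1.out               process-starter log (JVM/startup faults)
           hdfs-audit.log                             HDFS audit trail of user activity
           gc.log-201606290000                        JVM garbage collection log
      -->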
      <pubDate>Wed, 29 Jun 2016 18:11:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162943#M33311</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-06-29T18:11:12Z</dc:date>
    </item>
    <item>
      <title>Re: Query on Hadoop logs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162944#M33312</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/168/bleonhardi.html" nodeid="168"&gt;@Benjamin Leonhardi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Further to my question, what is the best strategy for removing old log files? Can I simply remove all the logs apart from the "current" ones without causing any issues? Is there a best practice around log management? Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 18:14:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162944#M33312</guid>
      <dc:creator>bigdata_superno</dc:creator>
      <dc:date>2016-06-29T18:14:19Z</dc:date>
    </item>
    <item>
      <title>Re: Query on Hadoop logs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162945#M33313</link>
      <description>&lt;P&gt;You can simply delete any file ending in a timestamp that is old enough for you. I have also seen people use "find -mtime" to delete all logs older than x days. Or you can configure the log4j settings of your Hadoop components. ( ambari-&amp;gt;hdfs-&amp;gt;advanced hdfs-log4j ) &lt;/P&gt;&lt;P&gt;Unfortunately the very useful DailyRollingFileAppender currently does not support deleting older files. ( A newer version does, so some Hadoop components may support that parameter. ) However, you could change the log appender to the RollingFileAppender, which provides a MaxBackupIndex attribute that keeps up to x rolled-over log files. ( Don't use it for Oozie though, since the Oozie admin features depend on the DailyRollingFileAppender. ) &lt;/P&gt;&lt;P&gt;So, as usual, a plethora of options &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://www.tutorialspoint.com/log4j/log4j_logging_files.htm"&gt;http://www.tutorialspoint.com/log4j/log4j_logging_files.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Edit: the DailyRollingFileAppender shipped with HDFS seems to be a newer one and has the following setting commented out in HDP 2.4. You can try uncommenting it and setting it to a number you are comfortable with. The line below would keep 30 days of log files around. &lt;/P&gt;&lt;P&gt;#log4j.appender.DRFA.MaxBackupIndex=30&lt;/P&gt;</description>
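      <!-- A minimal sketch of the two cleanup approaches described above. The path, the
           "*.log.*" pattern, the 30 day threshold, the appender name RFA, and the size and
           count values are assumptions; adjust them to your own layout and retention policy.

           # Shell (GNU find): preview rolled-over logs older than 30 days, then delete them.
           find /var/log/hadoop/hdfs -name '*.log.*' -mtime +30 -print
           find /var/log/hadoop/hdfs -name '*.log.*' -mtime +30 -delete

           # hdfs-log4j alternative: switch the appender to RollingFileAppender, which caps
           # the number of rolled-over files via MaxBackupIndex.
           log4j.appender.RFA=org.apache.log4j.RollingFileAppender
           log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
           log4j.appender.RFA.MaxFileSize=256MB
           log4j.appender.RFA.MaxBackupIndex=10
           log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
           log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
      -->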
      <pubDate>Wed, 29 Jun 2016 19:22:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162945#M33313</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-06-29T19:22:31Z</dc:date>
    </item>
    <item>
      <title>Re: Query on Hadoop logs</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162946#M33314</link>
      <description>&lt;P&gt;Perfect, Thanks!!&lt;/P&gt;</description>
      <pubDate>Wed, 29 Jun 2016 20:20:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Query-on-Hadoop-logs/m-p/162946#M33314</guid>
      <dc:creator>bigdata_superno</dc:creator>
      <dc:date>2016-06-29T20:20:22Z</dc:date>
    </item>
  </channel>
</rss>

