<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Check opening files on HDFS in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170174#M25312</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/5296/tunglqit.html" nodeid="5296"&gt;@tunglq it&lt;/A&gt;, which files do you have in mind? After an HDP cluster is installed, a number of system (cluster) files are written to HDFS even when no jobs are running (Ambari Metrics in embedded mode and Ranger audit can write files continuously, but they are special cases). Folders such as apps, hdp, system, tmp, and user are created and populated automatically during the cluster upgrade.&lt;/P&gt;</description>
    <pubDate>Fri, 15 Apr 2016 16:46:44 GMT</pubDate>
    <dc:creator>pminovic</dc:creator>
    <dc:date>2016-04-15T16:46:44Z</dc:date>
    <item>
      <title>Check opening files on HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170172#M25310</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;Currently, I have some files on HDFS that are being written by some jobs, but I don't know which jobs.&lt;/P&gt;&lt;P&gt;So, how do I check which job has a given file open?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 16:19:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170172#M25310</guid>
      <dc:creator>tunglq_it</dc:creator>
      <dc:date>2016-04-15T16:19:05Z</dc:date>
    </item>
    <item>
      <title>Re: Check opening files on HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170173#M25311</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/5296/tunglqit.html" nodeid="5296"&gt;@tunglq it&lt;/A&gt;&lt;P&gt;Are you trying to check for open files on HDFS or on the local file system?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 16:46:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170173#M25311</guid>
      <dc:creator>sshimpi</dc:creator>
      <dc:date>2016-04-15T16:46:40Z</dc:date>
    </item>
    <item>
      <title>Re: Check opening files on HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170174#M25312</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/5296/tunglqit.html" nodeid="5296"&gt;@tunglq it&lt;/A&gt;, which files do you have in mind? After an HDP cluster is installed, a number of system (cluster) files are written to HDFS even when no jobs are running (Ambari Metrics in embedded mode and Ranger audit can write files continuously, but they are special cases). Folders such as apps, hdp, system, tmp, and user are created and populated automatically during the cluster upgrade.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 16:46:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170174#M25312</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2016-04-15T16:46:44Z</dc:date>
    </item>
    <item>
      <title>Re: Check opening files on HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170175#M25313</link>
      <description>&lt;P&gt;Thank you for your answer!&lt;/P&gt;&lt;P&gt;For example, I have a file on HDFS, hdfs://cluster/data/logs.txt, that is being written by some job. I want to know which job is writing logs.txt.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 17:13:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170175#M25313</guid>
      <dc:creator>tunglq_it</dc:creator>
      <dc:date>2016-04-15T17:13:01Z</dc:date>
    </item>
    <item>
      <title>Re: Check opening files on HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170176#M25314</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/5296/tunglqit.html" nodeid="5296"&gt;@tunglq it&lt;/A&gt; &lt;/P&gt;&lt;P&gt;There is no straightforward way to identify which job wrote which file; it takes a bit of manual work. You can parse all job logs with a script and look for occurrences of that specific file path or name. In most cases, if you ran a MapReduce job, the ApplicationMaster container logs should contain that information; if they do not, parse each of the job's container logs one by one with a script.&lt;/P&gt;&lt;P&gt;Will that help?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 17:26:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170176#M25314</guid>
      <dc:creator>jyadav</dc:creator>
      <dc:date>2016-04-15T17:26:25Z</dc:date>
    </item>
    <item>
      <title>Re: Check opening files on HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170177#M25315</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/5296/tunglqit.html" nodeid="5296"&gt;@tunglq it&lt;/A&gt;&lt;P&gt;You need to write a custom script (say, in bash or Perl) that checks the MapReduce log files; from them you can capture the source and destination of any HDFS file the job is using.&lt;/P&gt;&lt;P&gt;Some additional logic in the script can help you track which files on HDFS are currently in use.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Apr 2016 17:57:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170177#M25315</guid>
      <dc:creator>sshimpi</dc:creator>
      <dc:date>2016-04-15T17:57:28Z</dc:date>
    </item>
    <item>
      <title>Re: Check opening files on HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170178#M25316</link>
      <description>&lt;P&gt;First, check the HDFS audit log, which tells you the name of the client that created the file. Then, based on that client name, search the YARN application logs to identify which job was writing the file.&lt;/P&gt;</description>
      <pubDate>Sat, 28 May 2016 08:10:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170178#M25316</guid>
      <dc:creator>jing</dc:creator>
      <dc:date>2016-05-28T08:10:24Z</dc:date>
    </item>
    <item>
      <title>Re: Check opening files on HDFS</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170179#M25317</link>
      <description>&lt;P&gt;If this problem comes up often, that is, if you always need to know the mapping from file operations (create, delete, rename, etc.) to upper-level applications, I think you can suggest that users use the caller context feature, which was released in HDP 2.2 and later.&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;The feature introduces a new setting, &lt;CODE&gt;hadoop.caller.context.enabled&lt;/CODE&gt;. When it is set to true, additional fields are written into NameNode audit log records to help identify the job or query that introduced each NameNode operation. This feature is &lt;STRONG&gt;enabled&lt;/STRONG&gt; by default starting with this release of HDP.&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;New Behavior&lt;/STRONG&gt;: This feature adds a new key-value pair at the end of each audit log record. The newly added key is &lt;CODE&gt;callerContext&lt;/CODE&gt; and its value is &lt;CODE&gt;context:signature&lt;/CODE&gt;, so the overall format is &lt;CODE&gt;callerContext=context:signature&lt;/CODE&gt;. If the signature is &lt;EM&gt;null&lt;/EM&gt; or &lt;EM&gt;empty&lt;/EM&gt;, the value is the context only, in the format &lt;CODE&gt;callerContext=context&lt;/CODE&gt;. If the &lt;CODE&gt;hadoop.caller.context.enabled&lt;/CODE&gt; config key is false, the key-value pair is not written and the audit log format is unchanged. The maximum lengths of the context and the signature can be limited with the &lt;CODE&gt;hadoop.caller.context.max.size&lt;/CODE&gt; (default 128 bytes) and &lt;CODE&gt;hadoop.caller.context.signature.max.size&lt;/CODE&gt; (default 40 bytes) config keys, respectively.&lt;P&gt;The new information in the audit log may break existing scripts or automation used to analyze the audit log; in that case, the scripts may need to be fixed. We do not recommend disabling this feature, as it can be a useful troubleshooting aid.&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;Please refer to the &lt;A href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4-Win/bk_HDP_RelNotes_Win/content/behavior_changes.html"&gt;release notes&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Sat, 28 May 2016 08:17:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Check-opening-files-on-HDFS/m-p/170179#M25317</guid>
      <dc:creator>mliu</dc:creator>
      <dc:date>2016-05-28T08:17:13Z</dc:date>
    </item>
  </channel>
</rss>

