<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Can I change the contents of a file present inside HDFS? If Yes, how and what are the Pros and cons ? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221124#M182998</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt; &lt;/P&gt;&lt;P&gt;I have one question: if we change the contents of a file, will this affect the metadata stored on the NameNode?&lt;/P&gt;&lt;P&gt;What happens if we keep appending data to the same file on a daily basis? Also, if we append large files, will this reduce performance?&lt;/P&gt;&lt;P&gt;Do you recommend appending data to the existing file or creating a new file?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
    <pubDate>Tue, 12 Dec 2017 18:43:19 GMT</pubDate>
    <dc:creator>rakesh_an1992</dc:creator>
    <dc:date>2017-12-12T18:43:19Z</dc:date>
    <item>
      <title>Can I change the contents of a file present inside HDFS? If Yes, how and what are the Pros and cons ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221122#M182996</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a text file stored in HDFS and I want to append some rows to it.&lt;/P&gt;&lt;P&gt;How can I complete this task?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 18:31:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221122#M182996</guid>
      <dc:creator>rakesh_an1992</dc:creator>
      <dc:date>2017-12-11T18:31:56Z</dc:date>
    </item>
    <item>
      <title>Re: Can I change the contents of a file present inside HDFS? If Yes, how and what are the Pros and cons ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221123#M182997</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/47553/rakeshan1992.html" nodeid="47553"&gt;@Rakesh AN&lt;/A&gt;&lt;P&gt;Yes, you can append rows to an existing text file in HDFS.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;appendToFile&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Usage: hdfs dfs -appendToFile &amp;lt;localsrc&amp;gt; ...
&amp;lt;dst&amp;gt;&lt;/P&gt;&lt;P&gt;Append single src, or multiple srcs from local file system
to the destination file system. Also reads input from stdin and appends to
destination file system.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile&lt;/LI&gt;&lt;LI&gt;hdfs dfs -appendToFile localfile1 localfile2
/user/hadoop/hadoopfile&lt;/LI&gt;&lt;LI&gt;hdfs dfs -appendToFile localfile
hdfs://nn.example.com/hadoop/hadoopfile&lt;/LI&gt;&lt;LI&gt;hdfs dfs -appendToFile -
hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)&lt;/LI&gt;&lt;LI&gt;echo "hi"|hdfs dfs -appendToFile - /user/hadoop/hadoopfile&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Pros:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Appending keeps your data in one large file instead of many small ones. A small file is one that is significantly smaller than the HDFS block size. Every file, directory, and block in HDFS is represented as an object in the NameNode’s memory, so HDFS cannot handle very large numbers of files efficiently. It is better to have large files in HDFS than many small files.&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="http://blog.cloudera.com/blog/2009/02/the-small-files-problem/"&gt;more info&lt;/A&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Cons:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;To append to an HDFS file, the writer must first obtain a lease, which is essentially a lock, to ensure single-writer semantics. &lt;A target="_blank" href="https://www.slideshare.net/dataera/inside-hdfs-append"&gt;more info&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;In addition, if you have n part files in an HDFS directory and want to merge them into 1 file, you can run:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;hadoop jar \
/usr/hdp/2.5.3.0-37/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.3.0-37.jar \
-Dmapred.reduce.tasks=1 \
-input "&amp;lt;path-to-input-directory&amp;gt;" \
-output "&amp;lt;path-to-output-directory&amp;gt;" \
-mapper cat \
-reducer cat&lt;/PRE&gt;&lt;P&gt;Check which version of the Hadoop streaming jar you have by looking under&lt;/P&gt;&lt;PRE&gt;/usr/hdp&lt;/PRE&gt;&lt;P&gt;Then give the input path and make sure the output directory does not already exist, because this job merges the files and creates the output directory for you.&lt;/P&gt;&lt;P&gt;Here is what I tried:&lt;/P&gt;&lt;PRE&gt;#hdfs dfs -ls /user/yashu/folder2/
Found 2 items 
-rw-r--r--  3 hdfs hdfs  150 2017-09-26 17:55 /user/yashu/folder2/part1.txt 
-rw-r--r--  3 hdfs hdfs  20 2017-09-27 09:07 /user/yashu/folder2/part1_sed.txt&lt;/PRE&gt;&lt;PRE&gt;#hadoop jar /usr/hdp/2.5.3.0-37/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.3.0-37.jar \
&amp;gt; -Dmapred.reduce.tasks=1 \
&amp;gt; -input "/user/yashu/folder2/" \
&amp;gt; -output "/user/yashu/folder1/" \
&amp;gt; -mapper cat \
&amp;gt; -reducer cat&lt;/PRE&gt;&lt;P&gt;folder2 contained 2 files. After running the above command, which writes the merged output to the folder1 directory, the 2 files were merged into 1 file, as you can see below.&lt;/P&gt;&lt;PRE&gt;#hdfs dfs -ls /user/yashu/folder1/
Found 2 items 
-rw-r--r--  3 hdfs hdfs  0 2017-10-09 16:00   /user/yashu/folder1/_SUCCESS 
-rw-r--r--  3 hdfs hdfs  174 2017-10-09 16:00 /user/yashu/folder1/part-00000&lt;/PRE&gt;&lt;P&gt;If this answer helped resolve your issue, &lt;STRONG&gt;click the Accept button below&lt;/STRONG&gt; to accept it. That helps other community users find solutions to this kind of question quickly.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 05:32:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221123#M182997</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2017-12-12T05:32:42Z</dc:date>
    </item>
    <item>
      <title>Re: Can I change the contents of a file present inside HDFS? If Yes, how and what are the Pros and cons ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221124#M182998</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt; &lt;/P&gt;&lt;P&gt;I have one question: if we change the contents of a file, will this affect the metadata stored on the NameNode?&lt;/P&gt;&lt;P&gt;What happens if we keep appending data to the same file on a daily basis? Also, if we append large files, will this reduce performance?&lt;/P&gt;&lt;P&gt;Do you recommend appending data to the existing file or creating a new file?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 18:43:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221124#M182998</guid>
      <dc:creator>rakesh_an1992</dc:creator>
      <dc:date>2017-12-12T18:43:19Z</dc:date>
    </item>
    <item>
      <title>Re: Can I change the contents of a file present inside HDFS? If Yes, how and what are the Pros and cons ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221125#M182999</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/47553/rakeshan1992.html" nodeid="47553"&gt;@Rakesh AN&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;Yes, the NameNode needs to update the metadata. For example, assume your existing file in HDFS is 127 MB and you append a 3 MB file to it, making it 130 MB. With a 128 MB block size, the 130 MB file is now split into 2 blocks (128 MB + 2 MB), and &lt;STRONG&gt;all the replicas&lt;/STRONG&gt; must also be updated with the new data.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Example:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;$ hdfs dfs -ls /user/yashu/test4/
Found 1 items
-rw-r--r--   3 hdfs hdfs         21 2017-12-11 15:42 /user/yashu/test4/sam.txt
$ hadoop fs -appendToFile sam.txt /user/yashu/test4/sam.txt
$ hdfs dfs -ls /user/yashu/test4/
Found 1 items
-rw-r--r--   3 hdfs hdfs         30 2017-12-12 09:19 /user/yashu/test4/sam.txt
$ echo "hi"|hdfs dfs -appendToFile - /user/yashu/test4/sam.txt
$ hdfs dfs -ls /user/yashu/test4/
Found 1 items
-rw-r--r--   3 hdfs hdfs         33 2017-12-12 09:20 /user/yashu/test4/sam.txt&lt;/PRE&gt;&lt;P&gt;In the above example, my HDFS file initially has a size of 21 bytes and a modification time of 2017-12-11 15:42. After each append, the size and the timestamp change. The NameNode has to record the new metadata for the file, and the replicated blocks are updated as well. &lt;A target="_blank" href="https://hortonworks.com/blog/hdfs-metadata-directories-explained/"&gt;HDFS MetaData&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Appending will not reduce performance, even when the file becomes large. I recommend appending new data to the existing file.&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/16278/best-practises-beetwen-size-block-size-file-and-re.html" target="_blank"&gt;https://community.hortonworks.com/questions/16278/best-practises-beetwen-size-block-size-file-and-re.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Dec 2017 22:32:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221125#M182999</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2017-12-12T22:32:52Z</dc:date>
    </item>
    <item>
      <title>Re: Can I change the contents of a file present inside HDFS? If Yes, how and what are the Pros and cons ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221126#M183000</link>
      <description>&lt;P&gt;Thanks for the clarification &lt;A rel="user" href="https://community.cloudera.com/users/18929/yaswanthmuppireddy.html" nodeid="18929"&gt;@Shu&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 13 Dec 2017 20:50:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-I-change-the-contents-of-a-file-present-inside-HDFS-If/m-p/221126#M183000</guid>
      <dc:creator>rakesh_an1992</dc:creator>
      <dc:date>2017-12-13T20:50:57Z</dc:date>
    </item>
  </channel>
</rss>

