<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Merge and Rename files in HDFS - Pig? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157816#M33034</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/2985/antonio-scp125.html" nodeid="2985"&gt;@João Souza&lt;/A&gt;, no problem. Yes, you should still be able to use SPLIT, just with&lt;/P&gt;&lt;PRE&gt;IF (date=='2016-06-23')&lt;/PRE&gt;&lt;P&gt;comparing against a string instead of a date type. If the column holds the full file name, the literal would need to match it exactly (for example '2016-06-23.txt').&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
    <pubDate>Tue, 28 Jun 2016 07:21:27 GMT</pubDate>
    <dc:creator>emilysharpe</dc:creator>
    <dc:date>2016-06-28T07:21:27Z</dc:date>
    <item>
      <title>Merge and Rename files in HDFS - Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157812#M33030</link>
      <description>&lt;P&gt;Hi experts,&lt;/P&gt;&lt;P&gt;I used Apache Pig to add a new column to my 3 text files stored on HDFS. The three text files were:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;2016-06-25.txt&lt;/LI&gt;&lt;LI&gt;2016-06-24.txt&lt;/LI&gt;&lt;LI&gt;2016-06-23.txt&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;However, after I run my Pig code I have 7 files in HDFS (because of MapReduce):&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;part-m-0000&lt;/LI&gt;&lt;LI&gt;part-m-0001&lt;/LI&gt;&lt;LI&gt;part-m-0002&lt;/LI&gt;&lt;LI&gt;part-m-0003&lt;/LI&gt;&lt;LI&gt;...&lt;/LI&gt;&lt;LI&gt;part-m-0006&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;How can I obtain only 3 files with their original names? Basically, I want to add the new column but still have the same files with the same names...&lt;/P&gt;&lt;P&gt;My code is:&lt;/P&gt;&lt;UL&gt;
&lt;LI&gt;Src = LOAD '/data/Src/' using PigStorage(' ','-tagFile'); &lt;/LI&gt;&lt;LI&gt;STORE Src INTO '/data/Src/Src2' USING PigStorage(' ');&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Sun, 26 Jun 2016 23:29:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157812#M33030</guid>
      <dc:creator>prodgers125</dc:creator>
      <dc:date>2016-06-26T23:29:38Z</dc:date>
    </item>
    <item>
      <title>Re: Merge and Rename files in HDFS - Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157813#M33031</link>
      <description>&lt;P&gt;There's currently no mechanism to force the name of MapReduce output files. &lt;/P&gt;&lt;P&gt;Once you've loaded all the data and added the extra column, you can split your alias into one per date, then store each one in a different directory, e.g.&lt;/P&gt;&lt;PRE&gt;SPLIT Src INTO Src23 IF date==ToDate('2016-06-23', 'yyyy-MM-dd'), Src24 IF date==ToDate('2016-06-24', 'yyyy-MM-dd'), Src25 IF date==ToDate('2016-06-25', 'yyyy-MM-dd');&lt;/PRE&gt;&lt;PRE&gt;STORE Src23 INTO '/data/Src/2016-06-23' using PigStorage(' ');&lt;/PRE&gt;&lt;P&gt;This way, you could merge the output files in each date directory using -getmerge (and specify the resulting file name), and then copy them back onto HDFS.&lt;/P&gt;&lt;P&gt;Another option is to force a reduce job to occur (yours is map-only) and set PARALLEL 1. It will be a slower job, but you will get one output file, e.g.&lt;/P&gt;&lt;PRE&gt;Ordered23 = ORDER Src23 BY somecolumn PARALLEL 1;&lt;/PRE&gt;&lt;PRE&gt;STORE Ordered23 INTO '/data/Src/2016-06-23' using PigStorage(' ');
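-- Hedged sketch, not part of the original reply and untested: the same pattern
-- repeated for the other two dates. It assumes the aliases Src24 and Src25
-- produced by the SPLIT statement earlier in this reply; somecolumn is the same
-- placeholder column name used above, and the output directories are assumptions.
Ordered24 = ORDER Src24 BY somecolumn PARALLEL 1;
STORE Ordered24 INTO '/data/Src/2016-06-24' using PigStorage(' ');
Ordered25 = ORDER Src25 BY somecolumn PARALLEL 1;
STORE Ordered25 INTO '/data/Src/2016-06-25' using PigStorage(' ');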
&lt;/PRE&gt;&lt;P&gt;You would still have to rename the files outside of this process.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jun 2016 08:05:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157813#M33031</guid>
      <dc:creator>emilysharpe</dc:creator>
      <dc:date>2016-06-27T08:05:26Z</dc:date>
    </item>
    <item>
      <title>Re: Merge and Rename files in HDFS - Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157814#M33032</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/2985/antonio-scp125.html" nodeid="2985"&gt;@João Souza&lt;/A&gt;, it's not a good idea to base your design on file names in HDFS. You can use file names only in phase 1 of your processing flow (which you are already doing with "-tagFile"); after that, just treat your input as a "data set". Using directories, as Emily suggested, is a much better idea, and is often used to partition data for MR jobs and Hive tables.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jun 2016 08:49:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157814#M33032</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2016-06-27T08:49:16Z</dc:date>
    </item>
    <item>
      <title>Re: Merge and Rename files in HDFS - Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157815#M33033</link>
      <description>&lt;P&gt;Many thanks, Emily. One problem, I think: my column "date" isn't identified as a date because it appears like the file name, "2016-06-23.txt". So I think it was created as a string. Can I do the split in the same way?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jun 2016 20:32:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157815#M33033</guid>
      <dc:creator>prodgers125</dc:creator>
      <dc:date>2016-06-27T20:32:20Z</dc:date>
    </item>
    <item>
      <title>Re: Merge and Rename files in HDFS - Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157816#M33034</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/2985/antonio-scp125.html" nodeid="2985"&gt;@João Souza&lt;/A&gt;, no problem. Yes, you should still be able to use SPLIT, just with&lt;/P&gt;&lt;PRE&gt;IF (date=='2016-06-23')&lt;/PRE&gt;&lt;P&gt;comparing against a string instead of a date type. If the column holds the full file name, the literal would need to match it exactly (for example '2016-06-23.txt').&lt;/P&gt;&lt;P&gt;Hope this helps!&lt;/P&gt;</description>
      <pubDate>Tue, 28 Jun 2016 07:21:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157816#M33034</guid>
      <dc:creator>emilysharpe</dc:creator>
      <dc:date>2016-06-28T07:21:27Z</dc:date>
    </item>
    <item>
      <title>Re: Merge and Rename files in HDFS - Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157817#M33035</link>
      <description>&lt;P&gt;Emily, just one more question. My current code is attached. It executes successfully, but my final data sets come back empty... Do you know why? &lt;A href="https://community.cloudera.com/legacyfs/online/attachments/5315-pig-statement.txt"&gt;pig-statement.txt&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 28 Jun 2016 23:45:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157817#M33035</guid>
      <dc:creator>prodgers125</dc:creator>
      <dc:date>2016-06-28T23:45:04Z</dc:date>
    </item>
    <item>
      <title>Re: Merge and Rename files in HDFS - Pig?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157818#M33036</link>
      <description>&lt;P&gt;I haven't tested it, but I believe using -tagFile will prepend the file name, which will place it at position 0 instead of 1. I.e. &lt;/P&gt;&lt;PRE&gt;GENERATE
(chararray)$0 AS Filename, 
(chararray)$1 AS ID, etc.&lt;/PRE&gt;&lt;P&gt;Hope this solves it!&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jun 2016 07:21:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-and-Rename-files-in-HDFS-Pig/m-p/157818#M33036</guid>
      <dc:creator>emilysharpe</dc:creator>
      <dc:date>2016-06-30T07:21:59Z</dc:date>
    </item>
  </channel>
</rss>

