<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Where do I get references for PiggyBank. in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122148#M84904</link>
    <description>&lt;P&gt;I had that problem before and didn't find any great websites covering it. However, the source code of the piggybank functions contains some really good documentation in the javadocs.&lt;/P&gt;&lt;P&gt;&lt;A href="https://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/MultiStorage.html" target="_blank"&gt;https://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/MultiStorage.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Or go directly to the source code; many of the functions are pretty straightforward to understand from the code:&lt;/P&gt;&lt;P&gt;&lt;A href="http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java?view=co" target="_blank"&gt;http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java?view=co&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I didn't find anything better, but that doesn't mean it doesn't exist.&lt;/P&gt;</description>
    <pubDate>Fri, 11 Mar 2016 17:21:54 GMT</pubDate>
    <dc:creator>bleonhardi</dc:creator>
    <dc:date>2016-03-11T17:21:54Z</dc:date>
    <item>
      <title>Where do I get references for PiggyBank.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122147#M84903</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I was working through the practice exam and encountered a question where I had to store a Pig result in multiple files. The question surprised me, as I could not find any reference to this kind of storage in the Pig documentation. Initially, I thought of using SPLIT, which comes very close to multiple-file storage. But when I googled it, I came across a function, 'MultiStorage()', which would serve the purpose.&lt;/P&gt;&lt;P&gt;So, where can I find such functions and their usage? Can you please point me to the piggybank references that have been documented so far?&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Saurabh&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2016 13:58:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122147#M84903</guid>
      <dc:creator>get2noesks</dc:creator>
      <dc:date>2016-03-11T13:58:10Z</dc:date>
    </item>
    <item>
      <title>Re: Where do I get references for PiggyBank.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122148#M84904</link>
      <description>&lt;P&gt;I had that problem before and didn't find any great websites covering it. However, the source code of the piggybank functions contains some really good documentation in the javadocs.&lt;/P&gt;&lt;P&gt;&lt;A href="https://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/MultiStorage.html" target="_blank"&gt;https://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/MultiStorage.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Or go directly to the source code; many of the functions are pretty straightforward to understand from the code:&lt;/P&gt;&lt;P&gt;&lt;A href="http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java?view=co" target="_blank"&gt;http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java?view=co&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I didn't find anything better, but that doesn't mean it doesn't exist.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2016 17:21:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122148#M84904</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-03-11T17:21:54Z</dc:date>
    </item>
    <item>
      <title>Re: Where do I get references for PiggyBank.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122149#M84905</link>
      <description>&lt;P&gt;A similar question has been asked before: &lt;A href="https://community.hortonworks.com/questions/20487/store-output-file-as-3-files-using-pig.html" target="_blank"&gt;https://community.hortonworks.com/questions/20487/store-output-file-as-3-files-using-pig.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I am going to repeat my findings here, in addition to &lt;A rel="user" href="https://community.cloudera.com/users/164/rich.html" nodeid="164"&gt;@Rich Raposa&lt;/A&gt;'s answer. They are only relevant if this is not for the purposes of the exam. This question was bothering me and I needed to try it out.&lt;/P&gt;&lt;P&gt;Here's a full script; piggybank is in both the pig-client/lib and the pig-client directory.&lt;/P&gt;&lt;PRE&gt;REGISTER /usr/hdp/current/pig-client/piggybank.jar;
A = LOAD 'data2' USING PigStorage() as (url, count);
fs -rm -R output;
STORE A INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0');&lt;/PRE&gt;&lt;P&gt;The dataset is:&lt;/P&gt;&lt;PRE&gt;1
2
3
4
5&lt;/PRE&gt;&lt;P&gt;The output is:&lt;/P&gt;&lt;PRE&gt;-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/1/1-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/2/2-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/3/3-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/4/4-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/5/5-0,000
-rw-r--r--   3 root hdfs          0 2016-03-18 01:51 /user/root/output/_SUCCESS&lt;/PRE&gt;&lt;P&gt;Each file has one line:&lt;/P&gt;&lt;PRE&gt;[root@sandbox ~]# hdfs dfs -cat /user/root/output/5/5-0,000
5&lt;/PRE&gt;&lt;P&gt;With &lt;A href="https://community.hortonworks.com/users/164/rich.html"&gt;@Rich Raposa&lt;/A&gt;'s example, the output directory would look like this:&lt;/P&gt;&lt;PRE&gt;[root@sandbox ~]# hdfs dfs -ls output3
Found 6 items
-rw-r--r--   3 root hdfs          0 2016-03-18 01:59 output3/_SUCCESS
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00000
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00001
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00002
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00003
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00004&lt;/PRE&gt;&lt;P&gt;This means that PARALLEL creates multiple files within the same directory, whereas MultiStorage created a separate directory and a separate file for each value of the split field. Additionally, with MultiStorage you can pass a compression type (bz2 or gz, but not snappy) and a field delimiter. It's clunky and the documentation is not the best, but if you need that type of control, it's an option.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2016 19:52:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122149#M84905</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-03-11T19:52:27Z</dc:date>
    </item>
    <item>
      <title>Re: Where do I get references for PiggyBank.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122150#M84906</link>
      <description>&lt;P&gt;Many of the Pig operators have a PARALLEL option for specifying the number of reducers, which also determines the number of output files. For the intent of the practice exam and the real exam, using PARALLEL is all you need to accomplish this task.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2016 20:09:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122150#M84906</guid>
      <dc:creator>rich1</dc:creator>
      <dc:date>2016-03-11T20:09:38Z</dc:date>
    </item>
    <item>
      <title>Re: Where do I get references for PiggyBank.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122151#M84907</link>
      <description>&lt;P&gt;@Rich Raposa,&lt;/P&gt;&lt;P&gt;I am not sure how to use PARALLEL with the STORE command. I see a PARALLEL option for GROUP, COGROUP, CROSS, DISTINCT, etc., but I did not find one for STORE. Could you please help me with an example? My exam is this Monday, and an example would be of great help at this point.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Saurabh&lt;/P&gt;</description>
      <pubDate>Sat, 12 Mar 2016 02:49:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122151#M84907</guid>
      <dc:creator>get2noesks</dc:creator>
      <dc:date>2016-03-12T02:49:40Z</dc:date>
    </item>
    <item>
      <title>Re: Where do I get references for PiggyBank.</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122152#M84908</link>
      <description>&lt;P&gt;Sure - the following simple script uses 3 reducers on the last operation, so there will be 3 output files:&lt;/P&gt;&lt;PRE&gt;a = load 'something';
b = order a by $1 parallel 3;
store b into 'somewhere';&lt;/PRE&gt;&lt;P&gt;PARALLEL is not an option on STORE, but it is an option on a lot of other Pig operations.&lt;/P&gt;</description>
      <pubDate>Sat, 12 Mar 2016 03:56:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Where-do-I-get-references-for-PiggyBank/m-p/122152#M84908</guid>
      <dc:creator>rich1</dc:creator>
      <dc:date>2016-03-12T03:56:10Z</dc:date>
    </item>
  </channel>
</rss>

