<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Store output file as 3 files using pig in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104596#M21470</link>
    <description>&lt;P&gt;Here's a full script. The piggybank jar is in both the pig-client/lib and the pig-client directories.&lt;/P&gt;&lt;PRE&gt;REGISTER /usr/hdp/current/pig-client/piggybank.jar;
A = LOAD 'data2' USING PigStorage() as (url, count);
fs -rm -R output;
STORE A INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0');
&lt;/PRE&gt;&lt;P&gt;my dataset is&lt;/P&gt;&lt;PRE&gt;1
2
3
4
5&lt;/PRE&gt;&lt;P&gt;output would be&lt;/P&gt;&lt;PRE&gt;-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/1/1-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/2/2-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/3/3-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/4/4-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/5/5-0,000
-rw-r--r--   3 root hdfs          0 2016-03-18 01:51 /user/root/output/_SUCCESS

&lt;/PRE&gt;&lt;P&gt;and each file would contain one line&lt;/P&gt;&lt;PRE&gt;[root@sandbox ~]# hdfs dfs -cat /user/root/output/5/5-0,000
5
&lt;/PRE&gt;&lt;P&gt;With &lt;A rel="user" href="https://community.cloudera.com/users/164/rich.html" nodeid="164"&gt;@Rich Raposa&lt;/A&gt;'s PARALLEL approach, the output directory would look like this:&lt;/P&gt;&lt;PRE&gt;[root@sandbox ~]# hdfs dfs -ls output3
Found 6 items
-rw-r--r--   3 root hdfs          0 2016-03-18 01:59 output3/_SUCCESS
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00000
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00001
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00002
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00003
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00004

&lt;/PRE&gt;&lt;P&gt;which means that PARALLEL creates multiple files within the same directory, whereas MultiStorage creates a separate directory and a separate file for each distinct value of the split field. Additionally, MultiStorage lets you pass a compression type (bz2 or gz, but not snappy) and a field delimiter. It's clunky and the documentation is not the best, but if you need that type of control, it's an option.&lt;/P&gt;</description>
    <pubDate>Fri, 18 Mar 2016 09:06:46 GMT</pubDate>
    <dc:creator>aervits</dc:creator>
    <dc:date>2016-03-18T09:06:46Z</dc:date>
    <item>
      <title>Store output file as 3 files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104589#M21463</link>
      <description>&lt;P&gt;Hello Friends,&lt;/P&gt;&lt;P&gt;Could any one please let me know how I can store the final output from the pig script as 3 files irrespective of source file/block size?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Satish.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2016 23:25:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104589#M21463</guid>
      <dc:creator>SatishS</dc:creator>
      <dc:date>2016-03-01T23:25:31Z</dc:date>
    </item>
    <item>
      <title>Re: Store output file as 3 files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104590#M21464</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1373/satish-sarapuri.html" nodeid="1373"&gt;@Satish S&lt;/A&gt; great question, I just learned something new: you can use MultiStorage() as the store function. Refer to &lt;A href="http://stackoverflow.com/questions/9314449/how-to-store-grouped-records-into-multiple-files-with-pig" target="_blank"&gt;http://stackoverflow.com/questions/9314449/how-to-store-grouped-records-into-multiple-files-with-pig&lt;/A&gt; for an example and to the javadoc &lt;A href="https://pig.apache.org/docs/r0.15.0/api/index.html?org/apache/pig/piggybank/storage/MultiStorage.html" target="_blank"&gt;https://pig.apache.org/docs/r0.15.0/api/index.html?org/apache/pig/piggybank/storage/MultiStorage.html&lt;/A&gt; for an explanation of all parameters passed to the function. And of course someone wrote a blog post about it: &lt;A href="http://margus.roo.ee/2014/12/18/apache-pig-how-to-save-output-into-different-places/" target="_blank"&gt;http://margus.roo.ee/2014/12/18/apache-pig-how-to-save-output-into-different-places/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2016 23:40:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104590#M21464</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-03-01T23:40:28Z</dc:date>
    </item>
    <item>
      <title>Re: Store output file as 3 files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104591#M21465</link>
      <description>&lt;P&gt;Hi Artem, thanks for the info.&lt;/P&gt;&lt;P&gt;I am trying to use it this way, but I am getting a Java error.&lt;/P&gt;&lt;P&gt;STORE AFO INTO '/user/hortontest/final_3' USING org.apache.pig.piggybank.storage.MultiStorage('/user/horton/test/final_3','0','none',',');&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Error:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve org.apache.pig.piggybank.storage.MultiStorage using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 00:28:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104591#M21465</guid>
      <dc:creator>SatishS</dc:creator>
      <dc:date>2016-03-02T00:28:10Z</dc:date>
    </item>
    <item>
      <title>Re: Store output file as 3 files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104592#M21466</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1373/satish-sarapuri.html" nodeid="1373"&gt;@Satish S&lt;/A&gt; you need to register the piggybank jar. Please read the following: &lt;A href="https://community.hortonworks.com/questions/8519/register-udf-in-pig.html" target="_blank"&gt;https://community.hortonworks.com/questions/8519/register-udf-in-pig.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Mar 2016 00:42:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104592#M21466</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-03-02T00:42:31Z</dc:date>
    </item>
    <item>
      <title>Re: Store output file as 3 files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104593#M21467</link>
      <description>&lt;P&gt;The first statement in your script should be&lt;/P&gt;&lt;PRE&gt;register /usr/hdp/current/pig-client/lib/piggybank.jar;&lt;/PRE&gt;</description>
      <pubDate>Wed, 02 Mar 2016 00:44:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104593#M21467</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-03-02T00:44:15Z</dc:date>
    </item>
    <item>
      <title>Re: Store output file as 3 files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104594#M21468</link>
      <description>&lt;P&gt;Did you register the jar? Please confirm and I'll test it.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2016 19:54:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104594#M21468</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-03-11T19:54:20Z</dc:date>
    </item>
    <item>
      <title>Re: Store output file as 3 files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104595#M21469</link>
      <description>&lt;P&gt;In case someone is searching for this in regards to the Hortonworks Certified Developer exam, the question was asked here also:&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://community.cloudera.com/"&gt;https://community.hortonworks.com/questions/22439/where-do-i-get-references-for-piggybank.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Many of the Pig operators have a PARALLEL option for specifying the number of reducers, which also determines the number of output files. For the intent of the certification exam, using PARALLEL is all you need to accomplish this task, plus it is much simpler than trying to register the piggybank and use a special output class.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Mar 2016 20:13:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104595#M21469</guid>
      <dc:creator>rich1</dc:creator>
      <dc:date>2016-03-11T20:13:09Z</dc:date>
    </item>
    <item>
      <title>Re: Store output file as 3 files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104596#M21470</link>
      <description>&lt;P&gt;Here's a full script. The piggybank jar is in both the pig-client/lib and the pig-client directories.&lt;/P&gt;&lt;PRE&gt;REGISTER /usr/hdp/current/pig-client/piggybank.jar;
A = LOAD 'data2' USING PigStorage() as (url, count);
fs -rm -R output;
STORE A INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0');
&lt;/PRE&gt;&lt;P&gt;my dataset is&lt;/P&gt;&lt;PRE&gt;1
2
3
4
5&lt;/PRE&gt;&lt;P&gt;output would be&lt;/P&gt;&lt;PRE&gt;-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/1/1-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/2/2-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/3/3-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/4/4-0,000
Found 1 items
-rw-r--r--   3 root hdfs          3 2016-03-18 01:51 /user/root/output/5/5-0,000
-rw-r--r--   3 root hdfs          0 2016-03-18 01:51 /user/root/output/_SUCCESS

&lt;/PRE&gt;&lt;P&gt;and each file would contain one line&lt;/P&gt;&lt;PRE&gt;[root@sandbox ~]# hdfs dfs -cat /user/root/output/5/5-0,000
5
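&lt;/PRE&gt;&lt;P&gt;For comparison, here is a minimal sketch of the PARALLEL approach. It is an assumption, not the exact script behind the listing that follows: the relation name B, the ORDER BY on url, the output path 'output_parallel' and the reducer count of 3 (taken from the original question) are all illustrative. PARALLEL only takes effect on operators that trigger a reduce phase, such as ORDER or GROUP, and it should yield one part file per reducer.&lt;/P&gt;&lt;PRE&gt;-- Sketch only: B, the ORDER BY key, 'output_parallel' and PARALLEL 3 are illustrative assumptions
A = LOAD 'data2' USING PigStorage() AS (url, count);
-- ORDER forces a reduce phase; PARALLEL 3 requests three reducers for it
B = ORDER A BY url PARALLEL 3;
-- No piggybank REGISTER is needed; the default PigStorage store function is used
STORE B INTO 'output_parallel';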
&lt;/PRE&gt;&lt;P&gt;With &lt;A rel="user" href="https://community.cloudera.com/users/164/rich.html" nodeid="164"&gt;@Rich Raposa&lt;/A&gt;'s PARALLEL approach, the output directory would look like this:&lt;/P&gt;&lt;PRE&gt;[root@sandbox ~]# hdfs dfs -ls output3
Found 6 items
-rw-r--r--   3 root hdfs          0 2016-03-18 01:59 output3/_SUCCESS
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00000
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00001
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00002
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00003
-rw-r--r--   3 root hdfs          3 2016-03-18 01:59 output3/part-v003-o000-r-00004

&lt;/PRE&gt;&lt;P&gt;which means that PARALLEL creates multiple files within the same directory, whereas MultiStorage creates a separate directory and a separate file for each distinct value of the split field. Additionally, MultiStorage lets you pass a compression type (bz2 or gz, but not snappy) and a field delimiter. It's clunky and the documentation is not the best, but if you need that type of control, it's an option.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Mar 2016 09:06:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104596#M21470</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-03-18T09:06:46Z</dc:date>
    </item>
    <item>
      <title>Re: Store output file as 3 files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104597#M21471</link>
      <description>&lt;P&gt;There is some good information in this thread, but I worry that the discussion about the MultiStorage class in the piggybank is going to seem like it's needed on the HDPCD exam. The MultiStorage class is not a part of the exam objectives. For the exam, you need to know how to use the PARALLEL clause, which if used at the right time in a Pig script can determine the number of output files.&lt;/P&gt;&lt;P&gt;So to summarize: the HDPCD exam does not require the use of MultiStorage, but may require the use of PARALLEL.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Mar 2016 11:05:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Store-output-file-as-3-files-using-pig/m-p/104597#M21471</guid>
      <dc:creator>rich1</dc:creator>
      <dc:date>2016-03-18T11:05:04Z</dc:date>
    </item>
  </channel>
</rss>

