<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: store multiple files using pig in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/store-multiple-files-using-pig/m-p/188211#M76035</link>
    <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/26178/sanutopia.html" nodeid="26178"&gt;@Santanu Ghosh&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;You have to use PARALLEL with an operator that starts a reduce phase, such as GROUP, JOIN, CROSS or DISTINCT.&lt;/P&gt;&lt;P&gt;Here is how to use PARALLEL, shown with an example data set.&lt;/P&gt;&lt;P&gt;1) Put data.csv into HDFS&lt;/P&gt;&lt;PRE&gt;[qa@vnode-68 root]$ hdfs dfs -put data.csv /user/qa/&lt;/PRE&gt;&lt;P&gt;2) Check the content of the data file&lt;/P&gt;&lt;PRE&gt;[qa@vnode-68 root]$ hdfs dfs -cat /user/qa/data.csv
abhi,34,brown,5
john,35,green,6
amy,30,brown,6
Steve,38,blue,6
Brett,35,brown,6
Andy,34,brown,6
&lt;/PRE&gt;&lt;P&gt;3) Run the Pig script, which groups users by color and stores the output in HDFS&lt;/P&gt;&lt;PRE&gt;[qa@vnode-68 root]$ pig
grunt&amp;gt; data = LOAD '/user/qa/data.csv' using PigStorage(',') as (name:chararray,age:int, color:chararray,height:int);
grunt&amp;gt; b = group data by color parallel 3;
grunt&amp;gt; store b into '/user/qa/new' using PigStorage(',');
grunt&amp;gt; quit&lt;/PRE&gt;&lt;P&gt;4) Check the output folder to make sure that 3 part files were created&lt;/P&gt;&lt;PRE&gt;[qa@vnode-68 root]$ hdfs dfs -ls /user/qa/new
Found 4 items
-rw-r--r--   3 qa hdfs          0 2018-03-17 16:28 /user/qa/new/_SUCCESS
-rw-r--r--   3 qa hdfs          0 2018-03-17 16:28 /user/qa/new/part-r-00000
-rw-r--r--   3 qa hdfs         80 2018-03-17 16:28 /user/qa/new/part-r-00001
-rw-r--r--   3 qa hdfs         51 2018-03-17 16:28 /user/qa/new/part-r-00002&lt;/PRE&gt;&lt;P&gt;Note that part-r-00000 is empty here because no color key was hashed to that reducer. PARALLEL sets the number of reducers, and therefore the number of part files, not how evenly the rows are spread across them.&lt;/P&gt;&lt;P&gt;Additional reference: &lt;A href="https://pig.apache.org/docs/r0.15.0/perf.html#parallel" target="_blank"&gt;https://pig.apache.org/docs/r0.15.0/perf.html#parallel&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If this helps you, please click on the Accept button to accept the answer. This will be really useful for other community users.&lt;/P&gt;&lt;P&gt;-Aditya&lt;/P&gt;</description>
    <pubDate>Sat, 17 Mar 2018 23:42:31 GMT</pubDate>
    <dc:creator>asirna</dc:creator>
    <dc:date>2018-03-17T23:42:31Z</dc:date>
    <item>
      <title>store multiple files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/store-multiple-files-using-pig/m-p/188210#M76034</link>
      <description>&lt;P&gt;Hi Friends,&lt;/P&gt;&lt;P&gt;I was practicing AWS tasks for the HDPCD exam. There is one question I need help with, which I will describe briefly.&lt;/P&gt;&lt;P&gt;"From a Pig Script, Store the output as 3 Comma Separated files in HDFS directory"&lt;/P&gt;&lt;P&gt;I used the command below for that.&lt;/P&gt;&lt;P&gt;STORE output INTO '&amp;lt;hdfs directory&amp;gt;' USING PigStorage(',') PARALLEL 3;&lt;/P&gt;&lt;P&gt;It ran 3 reducers, but eventually stored only a single part-r-00000 file, containing all rows, in the output HDFS path.&lt;/P&gt;&lt;P&gt;So, what is the simplest way to store 3 comma-separated output files from Pig (without using any additional jar file)?&lt;/P&gt;&lt;P&gt;Thanking you&lt;/P&gt;&lt;P&gt;Santanu&lt;/P&gt;</description>
      <pubDate>Sat, 17 Mar 2018 22:21:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/store-multiple-files-using-pig/m-p/188210#M76034</guid>
      <dc:creator>Santanu</dc:creator>
      <dc:date>2018-03-17T22:21:10Z</dc:date>
    </item>
    <item>
      <title>Re: store multiple files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/store-multiple-files-using-pig/m-p/188211#M76035</link>
      <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/26178/sanutopia.html" nodeid="26178"&gt;@Santanu Ghosh&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;You have to use PARALLEL with an operator that starts a reduce phase, such as GROUP, JOIN, CROSS or DISTINCT.&lt;/P&gt;&lt;P&gt;Here is how to use PARALLEL, shown with an example data set.&lt;/P&gt;&lt;P&gt;1) Put data.csv into HDFS&lt;/P&gt;&lt;PRE&gt;[qa@vnode-68 root]$ hdfs dfs -put data.csv /user/qa/&lt;/PRE&gt;&lt;P&gt;2) Check the content of the data file&lt;/P&gt;&lt;PRE&gt;[qa@vnode-68 root]$ hdfs dfs -cat /user/qa/data.csv
abhi,34,brown,5
john,35,green,6
amy,30,brown,6
Steve,38,blue,6
Brett,35,brown,6
Andy,34,brown,6
&lt;/PRE&gt;&lt;P&gt;3) Run the Pig script, which groups users by color and stores the output in HDFS&lt;/P&gt;&lt;PRE&gt;[qa@vnode-68 root]$ pig
grunt&amp;gt; data = LOAD '/user/qa/data.csv' using PigStorage(',') as (name:chararray,age:int, color:chararray,height:int);
grunt&amp;gt; b = group data by color parallel 3;
grunt&amp;gt; store b into '/user/qa/new' using PigStorage(',');
grunt&amp;gt; quit&lt;/PRE&gt;&lt;P&gt;4) Check the output folder to make sure that 3 part files were created&lt;/P&gt;&lt;PRE&gt;[qa@vnode-68 root]$ hdfs dfs -ls /user/qa/new
Found 4 items
-rw-r--r--   3 qa hdfs          0 2018-03-17 16:28 /user/qa/new/_SUCCESS
-rw-r--r--   3 qa hdfs          0 2018-03-17 16:28 /user/qa/new/part-r-00000
-rw-r--r--   3 qa hdfs         80 2018-03-17 16:28 /user/qa/new/part-r-00001
-rw-r--r--   3 qa hdfs         51 2018-03-17 16:28 /user/qa/new/part-r-00002&lt;/PRE&gt;&lt;P&gt;Note that part-r-00000 is empty here because no color key was hashed to that reducer. PARALLEL sets the number of reducers, and therefore the number of part files, not how evenly the rows are spread across them.&lt;/P&gt;&lt;P&gt;Additional reference: &lt;A href="https://pig.apache.org/docs/r0.15.0/perf.html#parallel" target="_blank"&gt;https://pig.apache.org/docs/r0.15.0/perf.html#parallel&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If this helps you, please click on the Accept button to accept the answer. This will be really useful for other community users.&lt;/P&gt;&lt;P&gt;-Aditya&lt;/P&gt;</description>
      <pubDate>Sat, 17 Mar 2018 23:42:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/store-multiple-files-using-pig/m-p/188211#M76035</guid>
      <dc:creator>asirna</dc:creator>
      <dc:date>2018-03-17T23:42:31Z</dc:date>
    </item>
    <item>
      <title>Re: store multiple files using pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/store-multiple-files-using-pig/m-p/188212#M76036</link>
      <description>&lt;P&gt;Thanks &lt;A rel="user" href="https://community.cloudera.com/users/14200/asirna.html" nodeid="14200"&gt;@Aditya Sirna&lt;/A&gt; for your response. It's working. I used this command.&lt;/P&gt;&lt;P&gt;relation_1 = ORDER relation_0 BY &amp;lt;col_2&amp;gt; DESC PARALLEL 3;&lt;/P&gt;&lt;P&gt;Thanking you&lt;/P&gt;&lt;P&gt;Santanu&lt;/P&gt;</description>
      <pubDate>Sun, 18 Mar 2018 12:27:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/store-multiple-files-using-pig/m-p/188212#M76036</guid>
      <dc:creator>Santanu</dc:creator>
      <dc:date>2018-03-18T12:27:02Z</dc:date>
    </item>
  </channel>
</rss>

