<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Using PIG Latin to replace multiple strings from same field in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Using-PIG-Latin-to-replace-multiple-strings-from-same-field/m-p/169378#M131692</link>
    <description>&lt;P&gt;Hi &lt;A href="https://community.hortonworks.com/users/11288/gkeys.html"&gt;@Greg Keys&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Could you please provide input on my clarifications?&lt;/P&gt;</description>
    <pubDate>Tue, 03 Jan 2017 20:19:11 GMT</pubDate>
    <dc:creator>vamsi123</dc:creator>
    <dc:date>2017-01-03T20:19:11Z</dc:date>
    <item>
      <title>Using PIG Latin to replace multiple strings from same field</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-PIG-Latin-to-replace-multiple-strings-from-same-field/m-p/169375#M131689</link>
      <description>Hi experts,

I have this line from a .txt file, which is the output of a GROUP operator:
1;(7287026502032012,18);{(706)};{(101200010)};{(17286)};{(oz)};2.5
&lt;P&gt;Basically, I have 7 fields. How can I obtain this:

1;7287026502032012,18;706;101200010;17286;oz;2.5

Many thanks!&lt;/P&gt;</description>
      <pubDate>Sun, 25 Sep 2016 21:54:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-PIG-Latin-to-replace-multiple-strings-from-same-field/m-p/169375#M131689</guid>
      <dc:creator>Stewart12586</dc:creator>
      <dc:date>2016-09-25T21:54:20Z</dc:date>
    </item>
    <item>
      <title>Re: Using PIG Latin to replace multiple strings from same field</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-PIG-Latin-to-replace-multiple-strings-from-same-field/m-p/169376#M131690</link>
      <description>&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;You need to FLATTEN your nested data&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Each record in your grouped data set is a mix of plain fields, a tuple, and bags.  You need to extract the fields from the bags and tuples using the FLATTEN operator.  &lt;/P&gt;&lt;P&gt;Each of your grouped records can be seen as follows:&lt;/P&gt;&lt;PRE&gt;1;					-- field
(7287026502032012,18);			-- tuple
{(706)};				-- bag
{(101200010)};				-- bag
{(17286)};				-- bag
{(oz)};					-- bag
2.5 					-- field&lt;/PRE&gt;&lt;P&gt;Using FLATTEN with the tuple is simple but using it with a bag is more complicated.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Flattening tuples &lt;/EM&gt; &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;To look at only tuples, let's assume your data looked like this:&lt;/P&gt;&lt;PRE&gt;1;					-- field
(7287026502032012,18);			-- tuple&lt;/PRE&gt;&lt;P&gt;Then you would use:&lt;/P&gt;&lt;PRE&gt;data_flattened = FOREACH data GENERATE
   $0,
   FLATTEN($1);&lt;/PRE&gt;&lt;P style="margin-left: 20px;"&gt;which for the data above would produce 1; 7287026502032012; 18&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Flattening bags&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Flattening bags is more complicated, because FLATTEN turns each tuple in the bag into its own output row and cross joins those rows with the other data in your GENERATE statement.  From the &lt;A href="https://pig.apache.org/docs/r0.10.0/basic.html#flatten"&gt;Apache Pig&lt;/A&gt; docs:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e).&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Using Pig's builtin function BagToTuple() to help you out&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Pig has a builtin function BagToTuple() which, as its name says, converts a bag to a tuple.  By converting your bags to tuples, you can then easily flatten them as above.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Final code&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Your final code will look like this:&lt;/P&gt;&lt;PRE&gt;data_flattened = FOREACH data GENERATE 
	$0, 
	FLATTEN($1),
	FLATTEN(BagToTuple($2)),
	FLATTEN(BagToTuple($3)),
	FLATTEN(BagToTuple($4)),
	FLATTEN(BagToTuple($5)),
	$6; &lt;/PRE&gt;&lt;P&gt;to produce your desired data.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Useful links:&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://pig.apache.org/docs/r0.10.0/basic.html#flatten" target="_blank"&gt;https://pig.apache.org/docs/r0.10.0/basic.html#flatten&lt;/A&gt;
&lt;A href="http://chimera.labs.oreilly.com/books/1234000001811/ch06.html#more_on_foreach" target="_blank"&gt;http://chimera.labs.oreilly.com/books/1234000001811/ch06.html#more_on_foreach&lt;/A&gt;
&lt;A href="https://pig.apache.org/docs/r0.11.0/api/org/apache/pig/builtin/BagToTuple.html" target="_blank"&gt;https://pig.apache.org/docs/r0.11.0/api/org/apache/pig/builtin/BagToTuple.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If this answers your question, let me know by accepting the answer.  Otherwise, let me know what gaps or issues remain.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Sep 2016 19:59:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-PIG-Latin-to-replace-multiple-strings-from-same-field/m-p/169376#M131690</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-09-26T19:59:30Z</dc:date>
    </item>
    <item>
      <title>Re: Using PIG Latin to replace multiple strings from same field</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-PIG-Latin-to-replace-multiple-strings-from-same-field/m-p/169377#M131691</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/11288/gkeys.html" nodeid="11288"&gt;@Greg Keys&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Happy New Year. Could you please help with the two clarifications below?&lt;/P&gt;&lt;PRE&gt;clarification 1:-
Let us say my input is:-
1;(7287026502032012,18);{(706),(707)};{(101200010),(101200011)};{(17286),(17287)};{(oz),(oz1)};2.5

The expression for data_flattened is the same as in your answer. In that case, is my understanding correct?
Is the output below correct?
Output:-
1;7287026502032012,18;706,707;101200010,101200011;17286,17287;oz,oz1;2.5

&lt;/PRE&gt;&lt;PRE&gt;clarification 2:-
Let us say my input is:-
1;(7287026502032012,18);{(706),(707)};{(101200010),(101200011)};{(17286),(17287)};{(oz),(oz1)};2.5

data_flattened_1 = FOREACH data GENERATE 
	$0, 
	FLATTEN ($1),
	FLATTEN($2),
	FLATTEN($3),
	FLATTEN($4),
	FLATTEN($5),
	$6; 
The expression for data_flattened_1 is shown above. In that case, is my understanding correct?
Is the output below correct?
Output:-
1;7287026502032012,18;706;101200010;17286;oz;2.5
1;7287026502032012,18;707;101200011;17287;oz1;2.5
&lt;/PRE&gt;</description>
      <pubDate>Sun, 01 Jan 2017 12:58:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-PIG-Latin-to-replace-multiple-strings-from-same-field/m-p/169377#M131691</guid>
      <dc:creator>vamsi123</dc:creator>
      <dc:date>2017-01-01T12:58:59Z</dc:date>
    </item>
    <item>
      <title>Re: Using PIG Latin to replace multiple strings from same field</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Using-PIG-Latin-to-replace-multiple-strings-from-same-field/m-p/169378#M131692</link>
      <description>&lt;P&gt;Hi &lt;A href="https://community.hortonworks.com/users/11288/gkeys.html"&gt;@Greg Keys&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Could you please provide input on my clarifications?&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jan 2017 20:19:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Using-PIG-Latin-to-replace-multiple-strings-from-same-field/m-p/169378#M131692</guid>
      <dc:creator>vamsi123</dc:creator>
      <dc:date>2017-01-03T20:19:11Z</dc:date>
    </item>
  </channel>
</rss>

