<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Pig script/command to filter multiple files on particular STRING in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130945#M93629</link>
    <description>&lt;P&gt;Yes, it worked.&lt;/P&gt;&lt;P&gt;Thank you very much.&lt;/P&gt;</description>
    <pubDate>Fri, 02 Sep 2016 14:16:49 GMT</pubDate>
    <dc:creator>mohan221213</dc:creator>
    <dc:date>2016-09-02T14:16:49Z</dc:date>
    <item>
      <title>Pig script/command to filter multiple files on particular STRING</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130939#M93623</link>
      <description>&lt;P&gt;I am trying to write a Hadoop Pig script that takes two files and filters the tweets in one by the strings listed in the other, i.e.:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;words.txt&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;google 
facebook 
twitter 
linkedin&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;tweets.json&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;{"created_time":"18:47:31 ","text":"RT @Joey7Barton: ..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616,"created_date":"Sun Sep 30 2012"}&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;SCRIPT without using words.txt file&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;twitter  = LOAD 'Twitter.json' USING JsonLoader('created_time:chararray, text:chararray, user_id:chararray, id:chararray, created_date:chararray');
filtered = FILTER twitter BY (text MATCHES '.*facebook.*');
extracted = FOREACH filtered GENERATE 'facebook' AS pattern, id, user_id, created_time, created_date, text;
final = GROUP extracted BY pattern;
DUMP final;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;OUTPUT&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;(facebook,{(facebook,252545104890449921,291041644,23:06:59,SunSep302012,RT @Joey7Barton:..give a facebook about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters.#fami ...)})&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This is the output I get &lt;STRONG&gt;without loading the words.txt file&lt;/STRONG&gt;, i.e. by filtering the tweets directly.&lt;/P&gt;&lt;P&gt;I need to get the output as:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;(facebook)(complete tweet containing the word facebook)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;That is, the script should read words.txt and, for each word in it, return all the matching tweets from the tweets.json file.&lt;/P&gt;&lt;P&gt;Any help would be appreciated.&lt;/P&gt;&lt;P&gt;Mohan.V&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2016 15:24:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130939#M93623</guid>
      <dc:creator>mohan221213</dc:creator>
      <dc:date>2016-09-01T15:24:25Z</dc:date>
    </item>
    <item>
      <title>Re: Pig script/command to filter multiple files on particular STRING</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130940#M93624</link>
      <description>&lt;P&gt;Okay, this may not be optimal, but it should work: upload words.txt to a directory on HDFS and do this:&lt;/P&gt;&lt;PRE&gt;twitter  = LOAD 'Twitter.json' .... -- Like in your post
words = LOAD '/user/john/words' as word:chararray;
c = CROSS words, twitter;
res = FILTER c BY (twitter::text MATCHES CONCAT(CONCAT('.*',words::word),'.*'));
&lt;/PRE&gt;&lt;P&gt;And finally dump or store "res" somewhere.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2016 19:31:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130940#M93624</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2016-09-01T19:31:45Z</dc:date>
    </item>
    <item>
      <title>Re: Pig script/command to filter multiple files on particular STRING</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130941#M93625</link>
      <description>&lt;P&gt;Hi Predrag Minovic,&lt;/P&gt;&lt;P&gt;Thanks for your reply.&lt;/P&gt;&lt;P&gt;I tried the above but got an error:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Invalid field projection. Projected field [twitter::text] does not exist&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;I know it is a small thing, but I am very new to Pig.&lt;/P&gt;&lt;P&gt;Please suggest how to solve this error.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2016 12:18:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130941#M93625</guid>
      <dc:creator>mohan221213</dc:creator>
      <dc:date>2016-09-02T12:18:48Z</dc:date>
    </item>
    <item>
      <title>Re: Pig script/command to filter multiple files on particular STRING</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130942#M93626</link>
      <description>&lt;P&gt;Can you insert "describe twitter; describe c;" after the CROSS statement and check the output? If you loaded "twitter" like in your post, twitter::text should be there...&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2016 12:27:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130942#M93626</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2016-09-02T12:27:45Z</dc:date>
    </item>
    <item>
      <title>Re: Pig script/command to filter multiple files on particular STRING</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130943#M93627</link>
      <description>&lt;P&gt;Hey Predrag Minovic,&lt;/P&gt;&lt;P&gt;It was my mistake, and I corrected it.&lt;/P&gt;&lt;P&gt;It worked.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Output&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(buddy,05:29:31 ,RT @ns0lar1: "Yo, buddy my dad knows I smoke lighters." @wadegreen35 #Brotherhood,635838152,252278984384077825,Sun Sep 30 2012) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(facebook,16:10:24 ,RT @KeydetInFocus: Nobody would know that Mitt Romney is at VMI today......... holy tweets and facebook statuses,286719616,255339370154975232,Mon Oct 08 2012) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(facebook,21:25:39 ,RT @cineworld: facebook Tough on Cineworld, tough on the causes of Cineworld. RT&amp;amp;Vote for me to win The Campaign merch &lt;A href="http://t.co/qItR8e2C"&gt;http://t.co/qItR8e2C&lt;/A&gt; O ...,328175259,252519600917458944,Sun Sep 30 2012) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(google,11:33:16 ,@MaarionYmcmb google mere ta dit tu va resté chez toi dnc tu restes !,845912316,252370526411051008,Sun Sep 30 2012) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(google,23:06:59 ,RT @Nevada: Obama Arrives in google for Debate Preparation. &lt;A href="http://t.co/jJxh0bF6"&gt;http://t.co/jJxh0bF6&lt;/A&gt;,291041644,252545104890449921,Sun Sep 30 2012) &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;But I would like to get the output grouped by word,&lt;/P&gt;&lt;P&gt;for example:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(facebook,{(16:10:24 ,RT @KeydetInFocus: Nobody would know that Mitt Romney is at VMI today......... holy tweets and facebook statuses,286719616,255339370154975232,Mon Oct 08 2012),&lt;/STRONG&gt;&lt;STRONG&gt;(21:25:39 ,RT @cineworld: facebook Tough on Cineworld, tough on the causes of Cineworld. 
RT&amp;amp;Vote for me to win The Campaign merch &lt;A href="http://t.co/qItR8e2C"&gt;http://t.co/qItR8e2C&lt;/A&gt; O ...,328175259,252519600917458944,Sun Sep 30 2012)})&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;(google,{(11:33:16 ,@MaarionYmcmb google mere ta dit tu va resté chez toi dnc tu restes !,845912316,252370526411051008,Sun Sep 30 2012),&lt;/STRONG&gt;&lt;STRONG&gt;(23:06:59 ,RT @Nevada: Obama Arrives in google for Debate Preparation. &lt;A href="http://t.co/jJxh0bF6"&gt;http://t.co/jJxh0bF6&lt;/A&gt;,291041644,252545104890449921,Sun Sep 30 2012)})&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;That is, I need all the tweets for facebook bagged together, and likewise for the other words.&lt;/P&gt;&lt;P&gt;How can I get this?&lt;/P&gt;&lt;P&gt;Please advise, Predrag Minovic.&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;P&gt;Mohan.V&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2016 13:09:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130943#M93627</guid>
      <dc:creator>mohan221213</dc:creator>
      <dc:date>2016-09-02T13:09:09Z</dc:date>
    </item>
    <item>
      <title>Re: Pig script/command to filter multiple files on particular STRING</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130944#M93628</link>
      <description>&lt;P&gt;Okay, after "res" insert this:&lt;/P&gt;&lt;PRE&gt;res1 = foreach (group res BY word) {
     tweets = foreach res generate id, user_id, created_time, created_date, text;
     generate group as pattern, tweets;
}&lt;/PRE&gt;&lt;P&gt;The inner foreach gets rid of the "word" attached to each output record. Try "res2 = group res BY word" to see the difference. And please accept &amp;amp; up-vote the answer.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2016 13:39:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130944#M93628</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2016-09-02T13:39:26Z</dc:date>
    </item>
    <item>
      <title>Re: Pig script/command to filter multiple files on particular STRING</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130945#M93629</link>
      <description>&lt;P&gt;Yes, it worked.&lt;/P&gt;&lt;P&gt;Thank you very much.&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2016 14:16:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-script-command-to-filter-multiple-files-on-particular/m-p/130945#M93629</guid>
      <dc:creator>mohan221213</dc:creator>
      <dc:date>2016-09-02T14:16:49Z</dc:date>
    </item>
  </channel>
</rss>

