<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Pig: OTHERWISE keyword in SPLIT not working.,Pig &amp;quot;Otherwise&amp;quot; does not work on Centos 7 Cluster. in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104385#M46335</link>
    <description>&lt;P&gt;Good to hear.  Would love to know why it is working (especially since my stripped down version replicated the issue).&lt;/P&gt;</description>
    <pubDate>Thu, 17 Nov 2016 23:01:57 GMT</pubDate>
    <dc:creator>gkeys</dc:creator>
    <dc:date>2016-11-17T23:01:57Z</dc:date>
    <item>
      <title>Pig: OTHERWISE keyword in SPLIT not working.,Pig "Otherwise" does not work on Centos 7 Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104380#M46330</link>
      <description>&lt;PRE&gt;I have a pig script where I want to split an alias depending on some conditions. The script is:
&lt;/PRE&gt;&lt;PRE&gt;SPLIT i1 INTO
   top IF((user_followers_count &amp;gt; i3.avg_user_followers_count) AND (avl_user_engagements &amp;gt; i3.avg_avl_user_engagements)),
   bot IF((user_followers_count &amp;lt; i3.avg_user_followers_count) AND (avl_user_engagements &amp;lt; i3.avg_avl_user_engagements)),
   med OTHERWISE;&lt;/PRE&gt;&lt;P&gt;But looking at the output, I think the code is not working as expected.&lt;/P&gt;&lt;PRE&gt;Input(s):
Successfully read 165 records (362590 bytes) from: "/tmp/user"

Output(s):
Successfully stored 0 records (815 bytes) in: "/tmp/inf_med"
Successfully stored 1 records (163 bytes) in: "/tmp/inf_top"
Successfully stored 113 records (18068 bytes) in: "/tmp/inf_bot"&lt;/PRE&gt;&lt;P&gt;There are about 50 records that have to go into the med category.&lt;/P&gt;&lt;P&gt;My pig version is: 0.15.0.2.4.2.0-258&lt;/P&gt;&lt;P&gt;HDP: 2.4.2.0-258&lt;/P&gt;</description>
      <pubDate>Wed, 16 Nov 2016 01:13:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104380#M46330</guid>
      <dc:creator>sree_kuppa</dc:creator>
      <dc:date>2016-11-16T01:13:12Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: OTHERWISE keyword in SPLIT not working.,Pig "Otherwise" does not work on Centos 7 Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104381#M46331</link>
      <description>&lt;P&gt;Otherwise was introduced in pig 0.10 and is a very solid feature.  There should not be an issue with it.&lt;/P&gt;&lt;P&gt;Could you:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;provide full script&lt;/LI&gt;&lt;LI&gt;provide sample data set&lt;/LI&gt;&lt;LI&gt;verify that i1 has 165 records&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;You can add to these comments&lt;/P&gt;</description>
      <pubDate>Wed, 16 Nov 2016 01:41:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104381#M46331</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-11-16T01:41:11Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: OTHERWISE keyword in SPLIT not working.,Pig "Otherwise" does not work on Centos 7 Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104382#M46332</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11288/gkeys.html" nodeid="11288"&gt;@Greg Keys&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11288/gkeys.html" nodeid="11288"&gt;&lt;/A&gt;Thanks for your time.&lt;/P&gt;&lt;P&gt;1. My full script:&lt;/P&gt;&lt;PRE&gt;register /usr/hdp/current/pig-client/lib/piggybank.jar
register /opt/elephantbird-jars/elephant-bird-core-4.5.jar
register /opt/elephantbird-jars/elephant-bird-hadoop-compat-4.5.jar
register /opt/elephantbird-jars/elephant-bird-pig-4.5.jar
register /opt/elephantbird-jars/json-simple-1.1.1.jar

data_input_2 = LOAD '/tmp/user' USING com.twitter.elephantbird.pig.load.JsonLoader() AS (json:map []); i1 = FOREACH data_input_2 GENERATE json#'user_id' as user_id, (int)json#'user_followers_count' as user_followers_count, (int)json#'avl_user_total_retweets' as avl_user_total_retweets, (int)json#'avl_user_total_likes' as avl_user_total_likes, (int)json#'avl_user_total_replies' as avl_user_total_replies, (int)json#'avl_user_engagements' as avl_user_engagements;

i2 = GROUP i1 ALL;
i3 = FOREACH i2 GENERATE AVG(i1.user_followers_count) AS avg_user_followers_count,  AVG(i1.avl_user_total_retweets) AS avg_avl_user_total_retweets, AVG(i1.avl_user_total_likes) AS avg_avl_user_total_likes, AVG(i1.avl_user_total_replies) AS avg_avl_user_total_replies, AVG(i1.avl_user_engagements) AS avg_avl_user_engagements;

SPLIT i1 INTO
   top IF((user_followers_count &amp;gt; i3.avg_user_followers_count) AND (avl_user_engagements &amp;gt; i3.avg_avl_user_engagements) AND (avl_user_total_retweets &amp;gt; i3.avg_avl_user_total_retweets) AND (avl_user_total_likes &amp;gt; i3.avg_avl_user_total_likes) AND (avl_user_total_replies &amp;gt; i3.avg_avl_user_total_replies)),
   bot IF((user_followers_count &amp;lt; i3.avg_user_followers_count) AND (avl_user_engagements &amp;lt; i3.avg_avl_user_engagements) AND (avl_user_total_retweets &amp;lt; i3.avg_avl_user_total_retweets) AND (avl_user_total_likes &amp;lt; i3.avg_avl_user_total_likes) AND (avl_user_total_replies &amp;lt; i3.avg_avl_user_total_replies)),
   med OTHERWISE;

STORE top INTO '/tmp/inf_top' USING JsonStorage();
STORE bot INTO '/tmp/inf_bot' USING JsonStorage();
STORE med INTO '/tmp/inf_med' USING JsonStorage();&lt;/PRE&gt;&lt;P&gt;2. sample 1 record from Input:&lt;/P&gt;&lt;PRE&gt;{"user_id":"777576939330699265","user_created_at":"2016-09-18T14:36:03.000-04:00","user_type":"corporate","tweet_created_at":"2016-10-13T22:18:47.000-04:00","avl_user_days_active":"25","user_notifications":"false","user_follow_request_sent":"false","user_following_count":"136","user_name":"VG RetroDeals Sega","user_time_zone":"Paris","user_profile_background_color":"000000","user_translation_enabled":"false","user_profile_link_color":"1B95E0","user_UTC_offset":"7200","user_profile_sidebar_border_color":"000000","user_has_extended_profile":"true","user_profile_background_tile":"false","user_profile_use_background_image":"false","user_default_profile_image":"false","user_description":"#Sega Videogames RetroDeals available on eBay:  #retrogaming  #Genesis #MegaDrive #SegaCD #MegaCD #GameGear #SegaSaturn #Dreamcast #MasterSystem #Mark3 RT","user_profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","user_profile_sidebar_fill_color":"000000","user_followers_count":"133","user_profile_image_url":"http://pbs.twimg.com/profile_images/777581486266576897/7WB52yjV_normal.jpg","user_geo_enabled":"false","user_entities_description_urls":"[]","user_screen_name":"VGDealsSega","user_total_liked":"43","user_url":"https://t.co/Px0pVPxM1E","user_total_posts":"2162","user_default_profile":"false","user_language":"fr","user_protected":"false","avl_user_brand_association":"{\"virtua\":2}","user_total_public_lists":"13","user_profile_image_url_https":"https://pbs.twimg.com/profile_images/777581486266576897/7WB52yjV_normal.jpg","user_contributors_enabled":"false","user_following":"false","user_verified":"false","avl_user_post_frequency":86.48,"avl_user_like_frequency":1.72,"avl_user_follower_following_ratio":0.9779411764705882,"user_is_translator":"false","user_location":"Planet RetroGaming","user_profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","user_profile_banner_url":"https://pbs.twimg.com/profile_banners/777576939330699265/1474224793","user_profile_text_color":"000000","avl_user_total_retweets":1,"avl_user_total_likes":1,"avl_user_total_replies":null,"avl_user_engagements":2,"user_reply_to_reply_count":null,"avl_user_reply_rate":0}&lt;/PRE&gt;&lt;P&gt;3. Verification that i1 has 165 records:&lt;/P&gt;&lt;P&gt;It reads 165 records from the input data file. Also, I just added the line "STORE i1 INTO '/tmp/inf' after generating i1 and the output is:&lt;/P&gt;&lt;PRE&gt;Input(s):
Successfully read 165 records (362590 bytes) from: "/user/alenza/virtua/marketanalysis/data/results/analysis_name/tweets_combined/user"
Output(s):
Successfully stored 165 records (27243 bytes) in: "/tmp/inf"
Successfully stored 0 records (815 bytes) in: "/tmp/inf_med"
Successfully stored 0 records in: "/tmp/inf_top"
Successfully stored 0 records in: "/tmp/inf_bot"&lt;/PRE&gt;</description>
      <pubDate>Wed, 16 Nov 2016 01:52:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104382#M46332</guid>
      <dc:creator>sree_kuppa</dc:creator>
      <dc:date>2016-11-16T01:52:03Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: OTHERWISE keyword in SPLIT not working.,Pig "Otherwise" does not work on Centos 7 Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104383#M46333</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/14345/sreekuppa.html" nodeid="14345"&gt;@Sree Kupp&lt;/A&gt; I have tested this extensively and believe there is a bug in SPLIT OTHERWISE against pig EVAL functions (like avg).  I will submit a JIRA and update with the number.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;But there is a workaround shown below&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;-------------------------&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;Here is what I did to replicate the issue and identify a workaround:&lt;/P&gt;&lt;H4&gt;Replicate issue&lt;/H4&gt;&lt;P&gt;&lt;STRONG&gt;dataset&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;777576939330699265,0,3
777576939330699261,1,3
777576939330699262,2,2
777576939330699263,3,1
777576939330699264,4,1&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;script 1 (replicates your issue)&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;data_input_2 = LOAD 'data_tweets' USING PigStorage(','); 
i1 = FOREACH data_input_2 GENERATE $0 as user_id, (int)$1 as user_followers_count, (int)$2 as avl_user_total_retweets;
i2 = GROUP i1 ALL;
i3 = FOREACH i2 GENERATE AVG(i1.user_followers_count) AS avg_user_followers_count,  AVG(i1.avl_user_total_retweets) AS avg_avl_user_total_retweets;

SPLIT i1 INTO
   top IF(user_followers_count &amp;gt; i3.avg_user_followers_count),
   bot IF(user_followers_count &amp;lt; i3.avg_user_followers_count),
   med OTHERWISE;
 
STORE i1 INTO 'tmp/inf_1' USING JsonStorage();
STORE i2 INTO 'tmp/inf_2' USING JsonStorage();
STORE i3 INTO 'tmp/inf_3' USING JsonStorage();  
STORE top INTO 'tmp/split_top' USING JsonStorage();
STORE bot INTO 'tmp/split_bot' USING JsonStorage();
STORE med INTO 'tmp/split_med' USING JsonStorage();&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Results&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;tmp/inf_1 and tmp/inf_2: results as expected (5 records, correct grouping)
tmp/inf_3: {"avg_user_followers_count":2.0,"avg_avl_user_total_retweets":2.0}&lt;/PRE&gt;&lt;PRE&gt;tmp/split_top 
{"user_id":"777576939330699263","user_followers_count":3,"avl_user_total_retweets":1}
{"user_id":"777576939330699264","user_followers_count":4,"avl_user_total_retweets":1} 
tmp/split_med: no records 
tmp/split_bot 
{"user_id":"777576939330699265","user_followers_count":0,"avl_user_total_retweets":3}
{"user_id":"777576939330699261","user_followers_count":1,"avl_user_total_retweets":3}&lt;/PRE&gt;&lt;H4&gt;&lt;STRONG&gt;Workaround&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;Here I took the average and hard-coded it into the split.  This works (but is an inconvenient hack).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;script 2 (hard-coding of average, which works)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Everything the same except I used the following for split, based on dump of i3&lt;/P&gt;&lt;PRE&gt;SPLIT i1 INTO
   top IF(user_followers_count &amp;gt; 2.0),
   bot IF(user_followers_count &amp;lt; 2.0),
   med OTHERWISE; &lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Results&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Results were identical to first script, except:&lt;/P&gt;&lt;PRE&gt;tmp/split_med:
{"user_id":"777576939330699262","user_followers_count":2,"avl_user_total_retweets":2}&lt;/PRE&gt;&lt;H4&gt;&lt;STRONG&gt;Workaround for your script&lt;/STRONG&gt;&lt;/H4&gt;&lt;P&gt;Find the values in i3 and then hardcode them in the SPLIT statement.&lt;/P&gt;&lt;P&gt;&lt;EM&gt;-------------------------&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;If this answers your question (I hope so &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; ), please let me know by accepting the answer; else let me know if there are remaining gaps.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Nov 2016 22:52:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104383#M46333</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-11-17T22:52:17Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: OTHERWISE keyword in SPLIT not working.,Pig "Otherwise" does not work on Centos 7 Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104384#M46334</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/11288/gkeys.html" nodeid="11288"&gt;@Greg Keys&lt;/A&gt;&lt;/P&gt;&lt;P&gt;SPLIT OTHERWISE is working now in my code. Not sure how and why. I was sure it did not work when I first used it. &lt;/P&gt;&lt;P&gt;Thanks for all your time and effort.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Nov 2016 22:57:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104384#M46334</guid>
      <dc:creator>sree_kuppa</dc:creator>
      <dc:date>2016-11-17T22:57:32Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: OTHERWISE keyword in SPLIT not working.,Pig "Otherwise" does not work on Centos 7 Cluster.</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104385#M46335</link>
      <description>&lt;P&gt;Good to hear.  Would love to know why it is working (especially since my stripped down version replicated the issue).&lt;/P&gt;</description>
      <pubDate>Thu, 17 Nov 2016 23:01:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-OTHERWISE-keyword-in-SPLIT-not-working-Pig-quot/m-p/104385#M46335</guid>
      <dc:creator>gkeys</dc:creator>
      <dc:date>2016-11-17T23:01:57Z</dc:date>
    </item>
  </channel>
</rss>

