<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Pig: Streaming through python in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107094#M69970</link>
    <description>&lt;P&gt;Do you know if there is a way to specify a Python virtual environment for streaming_python to use instead of the base Python installation?&lt;/P&gt;</description>
    <pubDate>Tue, 11 Jun 2019 00:59:48 GMT</pubDate>
    <dc:creator>betocolsf</dc:creator>
    <dc:date>2019-06-11T00:59:48Z</dc:date>
    <item>
      <title>Pig: Streaming through python</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107089#M69965</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a small voters list (name, gender, place, age) from which I want to eliminate the voters whose age is &amp;lt;= 20.
I wanted to try streaming in Pig.&lt;/P&gt;&lt;P&gt;When I run DUMP on the streamed relation, it fails and is unable to identify the Python commands. I have attached the Python script, input data file, Pig script and log file.
Could you advise where I should install Python in the Sandbox? Thank you.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Input:&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;AAA,Female,Blr,40
BBB,Female,London,35
YYY,Female,Pondy,12
JJJ,Male,London,4
SSS,Female,Pondy,30&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Pig script (tez_local mode):&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;
grunt&amp;gt; Voters = LOAD 'file:///user/revathy/pig/Voters.txt' USING PigStorage(',') AS (VoterName:chararray,Gender:chararray,Place:chararray,Age:int);

grunt&amp;gt; Eligible = STREAM Voters THROUGH `/root/revathy/pig/hello.py` AS (VoterName:chararray,Gender:chararray,Place:chararray,Age:int);&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Python script (tested in a Python editor):&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;import sys
THRESHOLD = 20

def filterVal(line, val4):
    # Emit the record unchanged only when the age exceeds THRESHOLD
    if int(val4) &amp;gt; THRESHOLD:
        sys.stdout.write(line)
    return

try:
    # Read every record from stdin and split it into its four fields
    for line in sys.stdin.readlines():
        val1, val2, val3, val4 = str(line).split(",")
        filterVal(line, val4)
except:
    print("Error in try block")&lt;/PRE&gt;&lt;P&gt;&lt;U&gt;Log:&lt;/U&gt;&lt;/P&gt;&lt;PRE&gt;/root/revathy/pig/hello.py: line 1: import: command not found
/root/revathy/pig/hello.py: line 2: THRESHOLD: command not found
/root/revathy/pig/hello.py: line 3: : command not found&lt;/PRE&gt;</description>
      <pubDate>Wed, 20 Apr 2016 09:26:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107089#M69965</guid>
      <dc:creator>Eukrev</dc:creator>
      <dc:date>2016-04-20T09:26:13Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: Streaming through python</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107090#M69966</link>
      <description>&lt;P&gt;You did not include the Python interpreter (shebang) line in your Python script, so the file is interpreted by the shell instead of by Python; that is why the log shows errors such as "import: command not found". For what you're trying to achieve, you can skip streaming and just use Pig's built-in FILTER operator. It will perform better than streaming. &lt;A href="http://pig.apache.org/docs/r0.15.0/" target="_blank"&gt;http://pig.apache.org/docs/r0.15.0/&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, name:chararray);

/* do a left outer join of SSN with SSN_Name */
X = JOIN SSN by ssn LEFT OUTER, SSN_NAME by ssn;

/* only keep those ssn's for which there is no name */
Y = filter X by IsEmpty(SSN_NAME);&lt;/PRE&gt;</description>
      <pubDate>Wed, 20 Apr 2016 17:18:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107090#M69966</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-04-20T17:18:34Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: Streaming through python</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107091#M69967</link>
      <description>&lt;P&gt;Thank you for providing an alternative approach.

I am learning Pig and would like to try the STREAM command to see how to run Python from Pig.&lt;/P&gt;&lt;P&gt;Is this the line to add as the first line so that the execution engine understands it's Python?
#! /usr/bin/env python
I tried it but still get the same error. Could you please help? Thank you!&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2016 06:17:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107091#M69967</guid>
      <dc:creator>Eukrev</dc:creator>
      <dc:date>2016-04-21T06:17:45Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: Streaming through python</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107092#M69968</link>
      <description>&lt;P&gt;Check out my UDF examples that use streaming: &lt;A href="https://github.com/dbist/pig/tree/master/udfs"&gt;https://github.com/dbist/pig/tree/master/udfs&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Specifically, look at the formathtml.pig script and its associated UDF written in Python.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2016 07:04:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107092#M69968</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-04-21T07:04:37Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: Streaming through python</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107093#M69969</link>
      <description>&lt;P&gt;Thank you. It's a good, simple example for me to understand.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2016 23:31:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107093#M69969</guid>
      <dc:creator>Eukrev</dc:creator>
      <dc:date>2016-04-21T23:31:48Z</dc:date>
    </item>
    <item>
      <title>Re: Pig: Streaming through python</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107094#M69970</link>
      <description>&lt;P&gt;Do you know if there is a way to specify a Python virtual environment for streaming_python to use instead of the base Python installation?&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2019 00:59:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Pig-Streaming-through-python/m-p/107094#M69970</guid>
      <dc:creator>betocolsf</dc:creator>
      <dc:date>2019-06-11T00:59:48Z</dc:date>
    </item>
  </channel>
</rss>

