Pig: Streaming through python
Labels: Apache Pig
Created ‎04-20-2016 02:26 AM
Hi,
I have a small voters list (name, gender, place, age) from which I want to eliminate the voters whose age is <= 20. I want to try streaming in Pig.
When I run DUMP on the streamed relation, it fails because the Python commands are not recognized. I have attached the Python script, input data file, Pig script, and log file. Could you tell me where I should install Python in the Sandbox? Thank you.
Input:
AAA,Female,Blr,40
BBB,Female,London,35
YYY,Female,Pondy,12
JJJ,Male,London,4
SSS,Female,Pondy,30
Pig script in tez_local mode:
grunt> Voters = LOAD 'file:///user/revathy/pig/Voters.txt' USING PigStorage(',') AS (VoterName:chararray,Gender:chararray,Place:chararray,Age:int);
grunt> Eligible = STREAM Voters THROUGH `/root/revathy/pig/hello.py` AS (VoterName:chararray,Gender:chararray,Place:chararray,Age:int);
Python script (tested in a Python editor):
import sys

THRESHOLD = 20

def filterVal(line,val4):
    if int(val4) > THRESHOLD:
        sys.stdout.writelines(line)
    return

try:
    for line in sys.stdin.readlines():
        val1,val2,val3,val4 = str(line).split(",")
        filterVal(line,val4)
except:
    print "Error in try block"
Log:
/root/revathy/pig/hello.py: line 1: import: command not found
/root/revathy/pig/hello.py: line 2: THRESHOLD: command not found
/root/revathy/pig/hello.py: line 3: : command not found
Created ‎04-20-2016 10:18 AM
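Your Python script has no interpreter (shebang) line, so it is being executed as a shell script rather than as Python, which is why the log shows "import: command not found". For what you're trying to achieve, you can skip streaming and just use Pig's built-in FILTER operator, for example FILTER Voters BY Age > 20; it will perform better than streaming. See the documentation at http://pig.apache.org/docs/r0.15.0/ for more, including this example of FILTER with a built-in function: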
SSN_NAME = load 'students.txt' using PigStorage() as (ssn:long, name:chararray);
/* do a left outer join of SSN with SSN_Name */
X = JOIN SSN by ssn LEFT OUTER, SSN_NAME by ssn;
/* only keep those ssn's for which there is no name */
Y = filter X by IsEmpty(SSN_NAME);
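If you still want to try streaming, here is a minimal sketch of what the script could look like with the interpreter line in place. It rests on a few assumptions, so treat it as a starting point rather than a tested fix: the function names is_eligible and main are made up for illustration, the file has been made executable (chmod +x), and Pig hands records to the script with its default streaming serializer, which separates fields with tabs rather than the commas used in the input file. If your setup sends commas, change DELIMITER accordingly.

```python
#!/usr/bin/env python
# Sketch of a streaming filter: keep voters whose age is above THRESHOLD.
import sys

THRESHOLD = 20
# Assumption: Pig's default streaming serializer separates fields with tabs.
DELIMITER = "\t"


def is_eligible(line, delimiter=DELIMITER, threshold=THRESHOLD):
    """Return True when the last field of the record is an age above threshold."""
    fields = line.rstrip("\n").split(delimiter)
    try:
        return int(fields[-1]) > threshold
    except ValueError:
        # Malformed or empty age: drop the record instead of aborting the stream.
        return False


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Iterate lazily instead of calling readlines(), so large inputs are not
    # held in memory all at once.
    for line in stdin:
        if is_eligible(line):
            stdout.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__":
    main()
```

You can try it outside Pig first by piping a few tab-separated records into the script from the shell and checking that only the rows with age above 20 come back.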
Created ‎04-20-2016 11:17 PM
Thank you for providing an alternative approach. I am learning Pig and would like to try the STREAM command to see how to run Python in Pig.
Is this the line to add as the first line so that the execution engine understands it is Python?
#! /usr/bin/env python
I tried it but still get the same error. Could you please help? Thank you!
Created ‎04-21-2016 12:04 AM
Check out my UDF examples using streaming at https://github.com/dbist/pig/tree/master/udfs
specifically the formathtml.pig script and its associated UDF written in Python.
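If the "command not found" errors persist after adding the interpreter line, one thing worth ruling out is that the "#!" is not the very first thing in the file (a blank line or a byte-order mark in front of it would leave the script to be run by the shell again), or that the file is not executable. The helper below is only a diagnostic sketch; diagnose_script is a hypothetical name, and it checks just these two conditions, not everything that can go wrong.

```python
import os
import stat


def diagnose_script(path):
    """Return a list of likely reasons the OS would not run `path` via its shebang."""
    problems = []

    # The shebang only counts if "#!" are the first two bytes of the file.
    with open(path, "rb") as handle:
        head = handle.read(3)
    if head.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 byte-order mark before the shebang")
    elif not head.startswith(b"#!"):
        problems.append("first two bytes are not '#!' (the shebang must be the very first line)")

    # The file also needs at least one execute permission bit.
    mode = os.stat(path).st_mode
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        problems.append("no execute permission bit is set (try chmod +x)")

    return problems
```

Calling diagnose_script on the script path used in the STREAM statement returns an empty list when both checks pass, and a short description of each problem otherwise.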
Created ‎06-10-2019 05:59 PM
Do you know if there is a way to specify a python virtual environment for streaming_python to use instead of it using the base python installation?
Created ‎04-21-2016 04:31 PM
Thank you. It's a good, simple example for me to understand.
