Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3574 | 05-03-2017 05:13 PM |
| | 2945 | 05-02-2017 08:38 AM |
| | 3197 | 05-02-2017 08:13 AM |
| | 3159 | 04-10-2017 10:51 PM |
| | 1632 | 03-28-2017 02:27 AM |
02-01-2016
06:53 PM
8 Kudos
It is not completely obvious, but you can certainly run Python scripts within Oozie workflows using the Shell action. Here's a sample job.properties file, nothing special about it.

nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
queueName=default
examplesRoot=oozie
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/python

Here's a sample workflow that will look for a script called script.py inside the scripts folder.

<workflow-app xmlns="uri:oozie:workflow:0.4" name="python-wf">
<start to="python-node"/>
<action name="python-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>script.py</exec>
<file>scripts/script.py</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Python action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
Here's my sample script.py:

#! /usr/bin/env python
import os, pwd, sys
print "who am I? " + pwd.getpwuid(os.getuid())[0]
print "this is a Python script"
print "Python Interpreter Version: " + sys.version directory tree for my workflow assuming the workflow directory is called python is as such [root@sandbox python]# tree
.
├── job.properties
├── scripts
│ └── script.py
└── workflow.xml
1 directory, 3 files
Now you can execute the workflow like any other Oozie workflow. If you want to leverage Python3, make sure Python3 is installed on every node. My Python3 script.py looks like this:

#! /usr/bin/env /usr/local/bin/python3.3
import os, pwd, sys
print("who am I? " + pwd.getpwuid(os.getuid())[0])
print("this is a Python script")
print("Python Interpreter Version: " + sys.version)
Everything else above holds true. You can find my sample workflow source code, including the Python3 version, at the following link.
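For reference, here's a minimal sketch of how I'd push the workflow directory to HDFS and kick it off from the command line; the Oozie server URL and port (11000) are assumptions based on the sandbox defaults, so adjust them for your cluster.

# copy the local "python" workflow directory to the HDFS path referenced by oozie.wf.application.path
hdfs dfs -mkdir -p /user/$(whoami)/oozie/apps
hdfs dfs -put python /user/$(whoami)/oozie/apps/
# submit and start the workflow (the Oozie server URL here is assumed)
oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config python/job.properties -run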
02-01-2016
06:21 PM
1 Kudo
@Pradeep Allu try passing just the directory path rather than the full path including the hdfs:// prefix. Maybe put the path in quotes as well?
02-01-2016
05:15 PM
@Peter Bartal everything goes into /usr/hdp/<version>. Can you elaborate on your use case?
02-01-2016
05:12 PM
@Wes Floyd @Benjamin Leonhardi I was also thinking of loading with PigStorage() without a delimiter and then doing either a regex, split, or filter and routing to an output file.
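To illustrate the idea, here's a rough, untested sketch; the relation names, input path, and output paths are made up for the example:

-- PigStorage() defaults to tab, so a line with no tabs lands in a single field
raw = LOAD 'data' USING PigStorage() AS (line:chararray);
-- route each line by which delimiter it contains
SPLIT raw INTO comma_lines IF (line MATCHES '.*,.*'), pipe_lines IF (line MATCHES '.*\\|.*');
STORE comma_lines INTO 'output/commas';
STORE pipe_lines INTO 'output/pipes';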
02-01-2016
05:10 PM
1 Kudo
@Ancil McBarnett their API guide shows every possible way to do it. I'd use the REST processors in NiFi for that.
02-01-2016
04:52 PM
You can use an IF statement @Wes Floyd, or maybe in your case, since you use PigStorage(','), you can filter on commas and filter out pipes, then load the file again with PigStorage('|'). Here's an example with SPLIT:

A = LOAD 'data' AS (f1:int,f2:int,f3:int);
DUMP A;
(1,2,3)
(4,5,6)
(7,8,9)
SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6);
DUMP X;
(1,2,3)
(4,5,6)
DUMP Y;
(4,5,6)
DUMP Z;
(1,2,3)
(7,8,9)
02-01-2016
02:45 PM
@Akshay Shingote run chown on the MirrorTest directory and make sure ambari-qa is the owner.
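Something along these lines should do it; this is just a sketch, and the HDFS path and group are placeholders for wherever MirrorTest actually lives:

# recursively make ambari-qa the owner of the directory (path and group are placeholders)
hdfs dfs -chown -R ambari-qa:hdfs /tmp/MirrorTest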
02-01-2016
02:43 PM
hbase-client and hadoop-client. If you're running Hadoop 2.7.1 and HBase 1.1.2, there are distinct versions of both jars available. @Avraha Zilberman
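For example, if you pull them in with Maven, the coordinates would look something like this; treat it as a sketch and match the versions to your cluster:

<!-- versions below assume Hadoop 2.7.1 and HBase 1.1.2 as mentioned above -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.7.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.1.2</version>
</dependency>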
02-01-2016
02:38 PM
@Ram D It changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, and VERY_LOW. Change one job's priority from NORMAL and it will work. You can also look at preemption.
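For example, one way to change it from the command line is with the mapred CLI; this is a sketch, and the job id below is a placeholder for your own:

# raise the priority of an already-submitted MapReduce job (job id is a placeholder)
mapred job -set-priority job_1454300000000_0001 VERY_HIGH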
02-01-2016
02:34 PM
The message is from HDP; you can ignore it. I highly recommend creating an article showing your work with RStudio and Pig step by step. A lot of people would be interested. @Roberto Sancho