02-22-2017
09:35 PM
5 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I will demonstrate an awesome feature of WFM: publishing and importing assets to and from HDFS for easy reuse. What this means is that any user on the WFM View node can author an action node and publish it for later use, either by themselves or by a colleague. This allows for collaboration and action reuse. Suppose you author an action node that contains long, complicated URL strings: save it to HDFS or the local database and it is ready to reuse. Yes, you can publish assets to HDFS as well as to the local Ambari database. I'll demonstrate HDFS, but the steps are identical for the database. Publishing to HDFS has the benefit of sharing across the entire organization rather than a single Ambari Views instance.

Let's start with creating an email action. Fill out your email address, subject and body. At this point you can submit and see if it works; in my case it worked and I received an email. Now I'd like to publish this action node to HDFS. It is the last button on the right; you can see that there are two sets of buttons, to publish and to import, for the database and HDFS respectively. Since I've chosen HDFS, I need to supply a path. WFM saves assets with a .wfasset extension; feel free to use a globally shared directory.

[centos@aervits-hdp0 ~]$ hdfs dfs -ls assets/
Found 1 items
-rw-r--r-- 3 centos hdfs 212 2017-02-22 20:56 assets/asset.wfasset

Let's now create a new wf where I'm going to execute a Shell action and then reuse this email action to send myself an email when the job finishes successfully. Once finished with the shell node, click on the arrow between the shell and end nodes and click the plus sign to add another action node. This time, instead of picking one of the available actions, we're going to hit "import asset from HDFS". Then go ahead and enter the HDFS path where the asset is located. Once done, change the action node names to something that makes sense. Finally, you can execute the wf. Another reason to give action nodes meaningful names is the ability to identify them quickly in the Oozie/WFM dashboard; notice the Transition column. Finally, let's look at the job result, and what do you know? I received an email. This is by no means the best way to send email on job completion; handling the job result via email is better done through a decision node or a kill node. This example is meant to demonstrate the flexibility of WFM and also show something that is not available in Oozie itself. This is one of many features that will set WFM apart from plain Oozie going forward. Until next time!
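P.S. The published asset is essentially just the saved action definition. A minimal Oozie email action looks roughly like the sketch below; the address and text are placeholders, and the exact markup WFM stores in the .wfasset file may differ slightly. Note that sending also depends on SMTP settings (e.g. oozie.email.smtp.host) being configured in oozie-site.

<email xmlns="uri:oozie:email-action:0.2">
    <to>you@example.com</to>
    <subject>WFM job finished</subject>
    <body>The workflow completed successfully.</body>
</email>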
02-18-2017
07:36 PM
5 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I will walk you through creating a Sqoop action using WFM on HDP 2.5+. First we need a table; we're going to use MySQL as the source database.

create table imported (rowkey int, value varchar(25));
insert into imported (rowkey, value) values (1, "john doe");
insert into imported (rowkey, value) values (2, "jane doe");
I want to make sure that all cluster nodes can access this table, so I'm going to grant access to user centos on the LAN. You may have different restrictions on your network; by all means consult your DBAs.

GRANT ALL PRIVILEGES ON *.* TO 'centos'@'172.22.65.%'
IDENTIFIED BY 'password'
WITH GRANT OPTION;
FLUSH PRIVILEGES;
GRANT ALL PRIVILEGES ON *.* TO 'centos'@'localhost'
IDENTIFIED BY 'password'
WITH GRANT OPTION;
FLUSH PRIVILEGES;
I want to make sure user centos can access the table:

mysql -u centos -p
➢ password
mysql> select * from test.imported;
+--------+----------+
| rowkey | value |
+--------+----------+
| 1 | john doe |
| 2 | jane doe |
+--------+----------+
Finally, I'd like to verify that Sqoop can reach the database:

sqoop list-tables --connect jdbc:mysql://source-1/test --username centos --password password
17/02/18 15:13:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
imported
Also, when running Oozie with HCatalog and Sqoop, every node that may execute job attempts must have the HCat and Sqoop clients installed. Next, I want to save the password in a file so that I can use it without a prompt and without typing it in clear text on the command line.

echo -n "password" > .password
hdfs dfs -put .password /user/$USER/
hdfs dfs -chmod 400 /user/$USER/.password
rm .password
[centos@source-1 ~]$ hdfs dfs -ls
Found 1 items
-r-------- 3 centos hdfs 8 2017-02-18 15:13 .password
[centos@source-1 ~]$ hdfs dfs -cat .password
password[centos@source-1 ~]$
Let's run the list command again, referencing the file instead of the --password argument:

sqoop list-tables --connect jdbc:mysql://source-1/test --username centos --password-file /user/centos/.password
17/02/18 15:14:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
imported
You can find more details in our comprehensive documentation on data movement: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-movement-and-integration/content/sqoop_hcatalog_integration.html

Also, make sure mysql-connector-java is at an appropriate version. RHEL 6 bundles version 5.1.17, which does not work with later versions of HDP; we bundle 5.1.37 in HDP-UTILS, and the only way to activate it is to run the following:

yum downgrade mysql-connector-java

Then, in your /usr/share/java directory, you should see the correct connectors:

lrwxrwxrwx. 1 root root 31 Feb 18 15:29 jdbc-mysql.jar -> mysql-connector-java-5.1.37.jar
lrwxrwxrwx. 1 root root 31 Feb 18 15:29 mysql-connector-java.jar -> mysql-connector-java-5.1.37.jar
You have a choice to update the Oozie sharelib with this connector or bundle it as part of the workflow lib; I'm going to do the latter for time's sake. Before I start authoring a workflow, I'd like to confirm my Sqoop import works, so I will execute it on the command line first:

sqoop import --connect jdbc:mysql://172.22.65.123/test --username centos --password-file /user/$USER/.password --table imported --hcatalog-table imported --create-hcatalog-table --hcatalog-storage-stanza "STORED AS ORCFILE" --hcatalog-home /usr/hdp/current/hive-webhcat --map-column-hive value=STRING --split-by rowkey
I'm choosing an HCatalog import as it is more efficient than --hive-import: the latter needs an extra step of moving the imported data from a staging directory into Hive, spawning an extra container, whereas with --hcatalog-table everything happens in one shot. Another benefit is that you can create an ORC table straight from the command line instead of going into Hive and altering a table to set it to ORC. Let's see what we got as a result:

Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=213
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=837
CPU time spent (ms)=8890
Physical memory (bytes) snapshot=718036992
Virtual memory (bytes) snapshot=9154256896
Total committed heap usage (bytes)=535298048
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/02/18 15:32:05 INFO mapreduce.ImportJobBase: Transferred 628 bytes in 70.6267 seconds (8.8918 bytes/sec)
17/02/18 15:32:05 INFO mapreduce.ImportJobBase: Retrieved 2 records.
17/02/18 15:32:05 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
Let's see what it looks like in Hive:

[centos@source-1 ~]$ beeline
Beeline version 1.2.1000.2.6.0.0-493 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 "" ""
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 1.2.1000.2.6.0.0-493)
Driver: Hive JDBC (version 1.2.1000.2.6.0.0-493)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> select * from default.imported;
+------------------+-----------------+--+
| imported.rowkey | imported.value |
+------------------+-----------------+--+
| 1 | john doe |
| 2 | jane doe |
+------------------+-----------------+--+
2 rows selected (6.414 seconds)
Let's truncate the table to prepare for the Oozie imports, and additionally describe the table to demonstrate that it is in fact stored as ORC:

0: jdbc:hive2://localhost:10000> truncate table default.imported;
No rows affected (0.4 seconds)
0: jdbc:hive2://localhost:10000> describe formatted imported;
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
I'm ready to start working on a workflow: let's add a Sqoop action and save the workflow to create a directory for it. I want to make sure I have a valid directory so I can upload a few files that are necessary for this to complete successfully.

hdfs dfs -mkdir /user/centos/sqoop/lib
hdfs dfs -put /usr/share/java/mysql-connector-java-5.1.37.jar /user/centos/sqoop/lib/
hdfs dfs -put /etc/hive/conf/hive-site.xml /user/centos/sqoop/lib/
hdfs dfs -put /etc/tez/conf/tez-site.xml /user/centos/sqoop/lib/
I'm going to use my own MySQL driver rather than the one in the sharelib, and therefore I'm uploading it to my wf lib; again, if you update the sharelib with the associated jar, you don't have to do that. Secondly, I'm going to include hive-site.xml and tez-site.xml. Prior to HDP 2.5 you only needed hive-site.xml, but now tez-site.xml is required as well; this small fact will save you many hours of debugging, trust me. Your wf lib directory should look like so:

hdfs dfs -ls /user/centos/sqoop/lib/
Found 3 items
-rw-r--r-- 3 centos hdfs 19228 2017-02-18 15:38 /user/centos/sqoop/lib/hive-site.xml
-rw-r--r-- 3 centos hdfs 977873 2017-02-18 15:37 /user/centos/sqoop/lib/mysql-connector-java-5.1.37.jar
-rw-r--r-- 3 centos hdfs 6737 2017-02-18 15:38 /user/centos/sqoop/lib/tez-site.xml
Finally, I want to modify my Sqoop command: I no longer need the --create-hcatalog-table option, and I want to replace the $USER argument with my username (you can also use Oozie EL functions for string replacement).

import --connect jdbc:mysql://172.22.65.123/test --username centos --password-file /user/centos/.password --table imported --hcatalog-table imported --hcatalog-home /usr/hdp/current/hive-webhcat --map-column-hive value=STRING --split-by rowkey

That's what my command will look like in Oozie; notice the missing "sqoop" keyword, which is inferred when you select a Sqoop action in WFM. Edit the Sqoop action on the WFM canvas and enter the command. We are working on refreshing the UI before WFM is released, so your dialog box may look slightly different, but the fields should remain the same. Let's tell WFM that we also expect the tez-site.xml and hive-site.xml files. Finally, we need to tell Oozie to pull in the HCatalog and Hive jars for this to work. At this point my wf is finished; let's inspect the XML. When you submit the job, it should succeed and you can look at the results. Again, this approach is more efficient and actually works on HDP 2.5+; I highly recommend checking out WFM and the hcatalog options in Sqoop.
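For reference, the finished Sqoop action should come out looking roughly like the sketch below. This is my hedged reconstruction rather than WFM's exact output: the element layout follows the Oozie sqoop-action schema, and the sharelib override can live either in the action configuration (as shown) or in the job properties.

<action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>oozie.action.sharelib.for.sqoop</name>
                <value>sqoop,hive,hcatalog</value>
            </property>
        </configuration>
        <command>import --connect jdbc:mysql://172.22.65.123/test --username centos --password-file /user/centos/.password --table imported --hcatalog-table imported --hcatalog-home /usr/hdp/current/hive-webhcat --map-column-hive value=STRING --split-by rowkey</command>
        <file>lib/hive-site.xml</file>
        <file>lib/tez-site.xml</file>
    </sqoop>
    <ok to="end"/>
    <error to="kill"/>
</action>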
02-16-2017
08:01 PM
5 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

Welcome back folks! In this tutorial, I'm going to demonstrate how to easily import existing Spark workflows and execute them in WFM, as well as how to create your own Spark workflows. As of today, Apache Spark 2.x is not supported in the Apache Oozie bundled with HDP; there is community work on making Spark2 run in Oozie, but it is not released yet, so I'm going to concentrate on Spark 1.6.3. First things first, I'm going to import a workflow into WFM from the Oozie examples: https://github.com/apache/oozie/tree/master/examples/src/main/apps/spark

My cluster setup is:
Ambari 2.5.0
HDP 2.6
HDFS HA
RM HA
Oozie HA
Kerberos

Luckily, for a Spark action in a Kerberos environment I didn't need to add anything else (i.e. a credential). The first thing I need is the dfs.nameservices property from Ambari > HDFS > Configs; I'm going to use it for the nameNode variable. I'm ready to import this workflow into WFM; for the details, please review one of my earlier tutorials. I'm presented with a spark action node. Click on the spark-node and hit the gear icon to preview the properties. Let's also review the arguments for input and output as well as the RM and NameNode; also notice the prepare step, where we can choose to delete a directory if it exists. We're going to leave everything as is. When we submit the workflow, we're going to supply the nameNode and resourceManager addresses; below are my properties. Notice that jobTracker and resourceManager both appear: ignore jobTracker, it was inherited from the original wf, and we're concerned with RM going forward. Also, the nameNode value is the dfs.nameservices property from the HDFS configuration, as I stated earlier. Once the job completes, you can navigate to the output directory and see that the file was copied.

hdfs dfs -ls /user/aervits/examples/output-data/spark/
Found 3 items
-rw-r--r-- 3 aervits hdfs 0 2017-02-16 17:16 /user/aervits/examples/output-data/spark/_SUCCESS
-rw-r--r-- 3 aervits hdfs 706 2017-02-16 17:16 /user/aervits/examples/output-data/spark/part-00000
-rw-r--r-- 3 aervits hdfs 704 2017-02-16 17:16 /user/aervits/examples/output-data/spark/part-00001
In my case the sample input was a book in the examples directory:

hdfs dfs -cat /user/aervits/examples/output-data/spark/part-00000
To be or not to be, that is the question;
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing, end them. To die, to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to ? 'tis a consummation
Next up, I'm going to demonstrate authoring a new Spark action instead of importing one. I'm following this guide, http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_spark-component-guide/content/run-sample-apps.html#run_spark_pi, to demonstrate how to add the Pi job to an Oozie workflow via WFM. First you need to create a workflow directory on HDFS along with a lib folder, then upload the Spark jar to it.

hdfs dfs -mkdir -p oozie/spark/lib
cd /usr/hdp/current/spark-client/lib
hdfs dfs -put spark-examples-1.6.3.2.6.0.0-502-hadoop2.7.3.2.6.0.0-502.jar oozie/spark/lib

Next, let's add a Spark action in WFM and edit it. Fill out the properties and make sure to select Yarn Cluster; Yarn Client in Oozie will be deprecated soon. Notice you can pass the Spark options on their own line. I also need to add an argument to the SparkPi job, in this case 10. If you haven't figured it out already, I'm trying to recreate the following command in Oozie:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10

Aside from changing yarn-client to yarn-cluster, everything is as in the command above. I'd like to preview my XML now. I'm ready to submit the job and run it.

Next I'm going to demonstrate how to run a PySpark job in Oozie via WFM. The code I'm going to run is below:

from pyspark import SparkContext, SparkConf
import sys
datain = sys.argv[1]
dataout = sys.argv[2]
conf = SparkConf().setAppName('counts_with_pyspark')
sc = SparkContext(conf=conf)
text_file = sc.textFile(str(datain))
counts = text_file.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile(str(dataout))
It's taken from http://spark.apache.org/examples.html; I only added the ability to pass the input and output directories on the command line. I'm going to run the code to make sure it works, with the following command:

/usr/hdp/current/spark-client/bin/spark-submit counts.py hdfs://mycluster/user/aervits/examples/input-data/text/ hdfs://mycluster/user/aervits/pyspark-output

This will produce output in the pyspark-output HDFS directory with a count for each instance of a word. The expected output is below:

hdfs dfs -cat pyspark-output/part-0000 | less
(u'and', 7)
(u'slings', 1)
(u'fardels', 1)
(u'mind', 1)
(u'natural', 1)
(u'sea', 1)
(u'For', 2)
(u'arrows', 1)
(u'is', 2)
(u'ills', 1)
(u'resolution', 1)
(u'merit', 1)
(u'death,', 1)
(u'say', 1)
(u'pause.', 1)
(u'bare', 1)
(u'Devoutly', 1)
Next, I'm ready to add a Spark action node in WFM and edit it by populating the properties. Notice I'm passing the Spark options as well as yarn-cluster as the deployment mode. Then I need to configure the input/output arguments and the prepare step: I want to delete the output directory so that I can re-run my wf without manually deleting it each time. Nothing new here; I'm passing the input and output as arguments to the action. I'm ready to preview the XML. The last step is to create the lib directory inside the pyspark workflow directory and upload the counts.py file there.

hdfs dfs -mkdir oozie/pyspark/lib
hdfs dfs -put counts.py oozie/pyspark/lib/

Now I am ready to submit the job, and luckily it succeeds. As usual, you can find my code here:
https://github.com/dbist/oozie/tree/master/apps/pyspark
https://github.com/dbist/oozie/tree/master/apps/spark
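For reference, the PySpark action I ended up with should look roughly like the sketch below. This is a hedged reconstruction based on the Oozie spark-action schema rather than WFM's exact output; the paths and option values come from this tutorial, and the transition targets are illustrative.

<action name="pyspark-node">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${nameNode}/user/aervits/pyspark-output"/>
        </prepare>
        <master>yarn-cluster</master>
        <name>counts_with_pyspark</name>
        <jar>counts.py</jar>
        <spark-opts>--num-executors 1 --driver-memory 512m --executor-memory 512m --executor-cores 1</spark-opts>
        <arg>${nameNode}/user/aervits/examples/input-data/text/</arg>
        <arg>${nameNode}/user/aervits/pyspark-output</arg>
    </spark>
    <ok to="end"/>
    <error to="kill"/>
</action>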
02-15-2017
07:54 PM
2 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, we're going to leverage Oozie's SLA monitoring features via Workflow Manager. To read more about the SLA features in Oozie, please look at the official documentation: https://oozie.apache.org/docs/4.2.0/DG_SLAMonitoring.html

We will begin with a simple shell action that sleeps for a set amount of time. Create a new file called script.sh and add the code below.

echo "start of script execution"
sleep 60
echo "end of script execution"

We're also going to create a workflow HDFS directory and upload this script to it.

hdfs dfs -mkdir oozie/shell-sla
hdfs dfs -put script.sh oozie/shell-sla/
Let's begin by adding a shell action and populating the script name and file attribute. Don't forget to check the capture output box; we want to see the output of this action. I want to submit the workflow to make sure everything works as expected before configuring the SLA features. It's a good idea to preview the XML to confirm the file and exec tags are filled in correctly. Once the job completes, I drill down to the job and view the output. Everything looks good, so we're ready to enable the SLA features of Oozie via Workflow Manager. Click on the shell action and then the gear icon. At the bottom of the configuration page you will see the SLA section; expand it and check the enabled box. Each field is described below:
nominal-time: As the name suggests, this is the time relative to which your jobs' SLAs will be calculated. Generally, since Oozie workflows are aligned with synchronous data dependencies, this nominal time can be parameterized to be passed the value of your coordinator nominal time. Nominal time is also required in the case of independent workflows: you can specify the time at which you expect the workflow to be run if you don't have a synchronous dataset associated with it.

should-start: Relative to nominal-time, this is the amount of time (along with a time-unit - MINUTES, HOURS, DAYS) within which your job should start running to meet the SLA. This is optional.

should-end: Relative to nominal-time, this is the amount of time (along with a time-unit - MINUTES, HOURS, DAYS) within which your job should finish to meet the SLA.

max-duration: This is the maximum amount of time (along with a time-unit - MINUTES, HOURS, DAYS) your job is expected to run. This is optional.

alert-events: Specify the types of events for which email alerts should be sent. Allowable values in this comma-separated list are start_miss, end_miss and duration_miss. *_met events can generally be deemed low priority, hence email alerting for them is not necessary. However, note that this setting applies only to alerts via email and not via JMS messages, where all events send out notifications and the user can filter them using desired selectors. This is optional and only applicable when alert-contact is configured.

alert-contact: Specify a comma-separated list of email addresses where you wish your alerts to be sent. This is optional and need not be configured if you just want to view your job SLA history in the UI and do not want to receive email alerts.

I'm going to simulate each of the SLA miss patterns: the job starting later than scheduled, the job completing outside the SLA threshold, and the job taking longer to complete than expected. To fill out the nominal time, choose the date, then use the clock icon below the date picker to set the correct time; click the x when ready. Finally, I'd like to change my script to run for 120 seconds instead of 60 to simulate a long duration. My script should now look like so:

echo "start of script execution"
sleep 120
echo "end of script execution"
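With those fields in mind, the SLA block that gets attached to the action in the workflow XML looks roughly like the sketch below. This is a hedged example based on the Oozie SLA documentation (uri:oozie:sla:0.2); the times and address are placeholders, not what WFM will generate verbatim.

<sla:info xmlns:sla="uri:oozie:sla:0.2">
    <sla:nominal-time>${nominalTime}</sla:nominal-time>
    <sla:should-start>${1 * MINUTES}</sla:should-start>
    <sla:should-end>${2 * MINUTES}</sla:should-end>
    <sla:max-duration>${1 * MINUTES}</sla:max-duration>
    <sla:alert-events>start_miss,end_miss,duration_miss</sla:alert-events>
    <sla:alert-contact>you@example.com</sla:alert-contact>
</sla:info>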
When ready, re-upload the script. At this point I want to make sure sending mail from the cluster is possible, and I will test that by sending a sample email. Enabling mail is beyond the scope of this tutorial; I followed the procedure below, adjust as necessary for your environment.

sudo su
yum install postfix
/etc/init.d/postfix restart
exit

Now we're able to send mail from this node; note that mail needs to work on any node where Oozie may execute a wf.

mail -s "test" email@email.com
Hit Ctrl-D and you should get an email shortly. Finally, there are some changes we need to implement on the Oozie side. I'm not going to enable JMS alerting and will only concentrate on the email piece; please consult the Oozie docs for the JMS part. This is HDP 2.5.3, and things may look or act differently on your Oozie instance. Let's go to Ambari > Configs and filter by the following property: oozie.services.ext. We're going to add these services to the existing list:

org.apache.oozie.service.EventHandlerService,
org.apache.oozie.sla.service.SLAService

Once ready, add a couple more custom properties in Oozie; in my environment these properties did not exist. Add oozie.service.EventHandlerService.event.listeners with the value:

org.apache.oozie.sla.listener.SLAJobEventListener,
org.apache.oozie.sla.listener.SLAEmailEventListener
The Oozie docs also recommend adding the following property to improve the performance of event processing; we're going to add it and set the value to 15: oozie.service.SchedulerService.threads. Once I saved the changes and restarted Oozie, it failed to start; looking at the logs, I noticed the following in oozie-error.log:

2017-02-15 18:14:32,757 WARN ConfigUtils:523 - SERVER[wfmanager-test-1.openstacklocal] Using a deprecated configuration property [oozie.service.AuthorizationService.security.enabled], should use [oozie.service.AuthorizationService.authorization.enabled]. Please delete the deprecated property in order for the new property to take effect.
I found the property in Ambari > Configs and set it to false; I was not able to delete it. Once done, restart all Oozie services, and you will now see a new tab in Oozie called SLA. Remember, we only configured the email service, not JMS. We're ready to test our wf; before that, I'd like to preview the XML for good measure. At this point, I'm ready to submit the workflow and watch my inbox. I'm expecting to miss my job start, job end and duration. This is the email output of my workflow. Until next time folks!
02-14-2017
04:08 AM
10 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I'm going to cover a Java action along with a Decision node. I'm also going to show you how to leverage EL functions in Oozie within WFM, as well as how to manage kill nodes for workflows. All source code for this tutorial is available at https://github.com/dbist/oozie/tree/master/apps/java. The inspiration for this tutorial is example 2 from https://cwiki.apache.org/confluence/display/OOZIE/Java+Cookbook. The problem statement is: "This example is to illustrate how action output data captured using capture output can be used in decision nodes."

The first thing we need to do is pick a Java action node and drop it onto the canvas. In the linked example the Java node is called java3; I prefer my own name. This matters because the decision node's EL function relies on the name of the node, so if you change the name to your liking, make sure you change it in the EL function as well. The same goes for the Java code: by default there is no package name, and I decided to place the code into the org.apache.oozie.examples package, which affects my Java node configuration, shown next. Notice the difference between the Oozie wiki link and my main class: mine contains the full Java package name. Also, I'm going to check capture output, as I want to see what the output of the process is at the end. Next we're going to configure the argument for the Java action; it determines the path taken at the decision node. Looking at the code, if we pass "yes" we expect "value1" and "value2" printed, otherwise "novalue", for key1 and key2 respectively. Next, looking at the Oozie XML in the wiki, I'm going to add a parameter to pass the YARN queue. Let's preview the XML for good measure. We're pretty much done with the Java node, so let's switch to the Decision node, no pun intended. We add a new node by clicking on the arrow following the Java node and then hitting the plus sign. This is what it looks like after adding the decision node. To configure the Decision node, click on it and then hit the gear icon. For the condition, we're going to paste the EL function ${(wf:actionData('java-node')['key1'] == "value1") and (wf:actionData('java-node')['key2'] == "value2")} - notice I changed java3 to java-node to reflect the name of my Java node.
Next, I'd like to modify my kill node, as I expect the default transition to go to "fail" rather than kill, and WFM does not know what that is. Click on the button called "kill nodes", then click on create kill node and configure it. We're going to paste our custom kill node message and name in the text boxes. Notice I can edit the existing kill nodes as well; let's hit the trash icon to delete the existing kill node, leaving only one kill node called fail. Now, I need to be honest: I had an issue configuring the decision node's default condition via WFM to go to fail rather than end. I ended up manually editing the XML rather than handling it within WFM; this may be a bug and I will provide feedback to engineering. Remember, this is still an unreleased version of the product! So what I ended up doing was editing the XML, uploading it, and then importing the wf as I've shown in an earlier tutorial. Now everything looks good; notice the default points to fail instead of end. Now is a good time to preview the XML and match it against the wiki. Notice the bad syntax in the EL function; that's another issue we're going to address as part of the bug bash :). Since I modified the XML manually, and a big promise of WFM is to reduce syntax errors, this is a good time to validate the wf: let's click the validate button. It prompts us to save the wf and pass the YARN queue name; remember, we added that property in the Java action node. All looks good, so let's submit; again, we need to pass that YARN queue parameter. We also need to remember to upload the Java jar to the lib directory: once the validation step is executed, the java-wf directory is created in HDFS, then you just need to create a lib folder and upload the compiled Java code as a jar into it.

hdfs dfs -mkdir oozie/java-wf/lib
hdfs dfs -put OozieJavaExample-1.0.0.jar oozie/java-wf/lib/

The workflow finished successfully. Notice the four rows for the wf: our two nodes plus the start and end nodes, all OK. Let's check out the flow graph before we see the job result; notice it is all green, as everything succeeded. Finally, let's see the result. As expected, we passed a "yes", so the decision prints key1=value1 and key2=value2. Let's switch things up a bit and pass a "no" as the argument; we need to modify that in the Java node again. Once changed, I submitted the job and it succeeded, but we see something a bit different. Again, the four rows look good and all passed; however, if we look at the flow graph, the color scheme is different: the decision triggered the default condition, which goes to fail. This is what it currently looks like in that case. Finally, let's see the job results. As expected, we passed a "no", and as a result we get key1=novalue and key2=novalue. All done here. Until next time!
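P.S. For reference, the decision node in the final workflow XML looks roughly like the sketch below. This is a hedged reconstruction: the transition targets are illustrative, but the EL condition and the default-to-fail behavior are the ones used in this tutorial.

<decision name="decision-node">
    <switch>
        <case to="end">${(wf:actionData('java-node')['key1'] == "value1") and (wf:actionData('java-node')['key2'] == "value2")}</case>
        <default to="fail"/>
    </switch>
</decision>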
02-12-2017
03:45 PM
12 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, we're going to run a Pig script against Hive tables via HCatalog integration. You can find more information in the following HDP document: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-movement-and-integration/content/ch_data_movement_using_oozie.html#errata_pig_with_oozie

The first thing you need to do in WFM is create a new Pig action; then you can start editing its properties. Since we're going to run a Pig script, let's add that property to the wf. This just says that we're going to execute a script; you still need to add the <file> attribute to the wf, which expects a file in the workflow directory, so we will upload a Pig script later. Next, since we're going to run Pig against Hive, we need to provide the thrift metastore information, or include a hive-site.xml file in the wf directory; since that file tends to change, it's probably best to add the property as part of the wf. You can find the property under Ambari > Hive > Configs by searching for hive.metastore.uris. Now, in WFM, add it to the configuration section of the Pig action. I also want to compress the output coming from the mappers to improve intermediate IO performance, so I'm going to use the MapReduce property mapreduce.map.output.compress and set it to true. At this point, I'd like to see how I'm doing, so I will preview the workflow in XML form; you can find this under the workflow action. This is also a good time to confirm your thrift URI and the commonly forgotten <script> and <file> properties. Now, finally, let's add the script to the wf directory. Use your favorite editor, paste the Pig code, and save the file as sample.pig.

set hcat.bin /usr/bin/hcat;
sql show tables;
A = LOAD 'data' USING org.apache.hive.hcatalog.pig.HCatLoader();
B = LIMIT A 1;
DUMP B;

I have a Hive table called 'data', and that's what I'm going to load in Pig; I'm going to peek into the table and dump one relation to the console. In the second line of the script, I'm also executing a Hive "show tables;" command. I also recommend executing this script manually to make sure it works; the command is:

pig -x tez -f sample.pig -useHCatalog

Once it executes, you can see the output on the console; for brevity, I will only show the output we're looking for:

2017-02-12 14:29:52,403 [main] INFO org.apache.pig.tools.grunt.GruntParser - Going to run hcat command: show tables;
OK
data
wfd
2017-02-12 14:30:09,205 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(abc,xyz)
Notice the output of show tables and then (abc,xyz); that's the data I have in my 'data' table. Finally, upload the file to the wf directory. Save the wf first in WFM to create the directory, or point the wf to an existing directory containing the script.

hdfs dfs -put sample.pig oozie/pig-hcatalog/

We are finally ready to execute the wf. As the last step, we need to tell the wf that we're going to use Pig with Hive and HCatalog by adding the property oozie.action.sharelib.for.pig=hive,pig,hcatalog. This property tells Oozie that we need more than just the Pig libraries to execute the action. Let's check the status of the wf: click the Dashboard button. Luckily the wf succeeded. Let's click on the job and go to the flow graph tab; all nodes appear in green, meaning it succeeded, but we already knew that. Navigate to the action tab, from which we can drill into the Resource Manager job history; let's click the arrow facing up to continue to the RM. Go through the logs in the job history: in the stdout log you can find what we're looking for, the output of show tables and the output of the dump command. Looks good to me. Thanks!
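P.S. For reference, the Pig action should come out looking roughly like the sketch below. This is a hedged reconstruction, not WFM's exact output: the thrift host is a placeholder, and the sharelib property is supplied separately as a job property, as described above.

<action name="pig-hcatalog">
    <pig>
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>hive.metastore.uris</name>
                <value>thrift://metastore-host:9083</value>
            </property>
            <property>
                <name>mapreduce.map.output.compress</name>
                <value>true</value>
            </property>
        </configuration>
        <script>sample.pig</script>
        <file>sample.pig</file>
    </pig>
    <ok to="end"/>
    <error to="kill"/>
</action>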
02-11-2017
08:27 PM
9 Kudos
This is the third in the series of articles on WFM.

Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I will import an existing Python3 workflow and modify it to work with WFM. You need to make sure that Python3 exists on every node where a YARN NodeManager is installed and where Oozie is allowed to execute (in essence, you want to make sure there are no YARN queue limitations on the nodes with Python3 installed). Installing Python3 is beyond the scope of this tutorial; I am using CentOS 6 and followed this guide: http://ask.xmodulo.com/install-python3-centos.html

Once Python3 is deployed across all of your NodeManagers, you can import the workflow in the same way I've shown before and configure the python3-node to your liking. Nothing is out of the ordinary yet. We learned from the mistake in the previous tutorial and let WFM assign the Resource Manager property rather than using the inherited $jobTracker. I am ready to submit; notice I left the inherited queue property in the wf, and WFM prompts me to input it. On submission, we can navigate to the dashboard to track the status. My job succeeded, and I want to look at the result, so I'm going to click on the arrow on the right and navigate to the YARN job. Deep in the logs, I can find my desired output. My Python3 code, by the way, is below:

#! /usr/bin/env /usr/local/bin/python3
import os, pwd, sys
print("who am I? " + pwd.getpwuid(os.getuid())[0])
print("this is a Python script")
print("Python Interpreter Version: " + sys.version)
As usual, my repo has more samples you can play around with: https://github.com/dbist/oozie
02-11-2017
07:53 PM
13 Kudos
This is the second in the series of articles on WFM.

Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, we're going to import an existing workflow with a Python script wrapped in a shell action. Existing workflows can live on HDFS or in your local file system; let's fetch one onto the canvas. My workflow is already on HDFS, so that's the option I select. The WFM view is integrated with a WebHDFS browser, which makes navigating the directory tree very easy. Navigate to the HDFS directory with the desired workflow and hit select. Once imported, WFM will validate the syntax and present the workflow for further modification. Now you can modify the python-node by hovering over it and clicking the gear icon. Once clicked, you can configure the rest of the action to your liking; it inherits all of the old properties of your workflow. Notice you can specify a directory and script file in the File text box. My Oozie workflow also has old properties, like the YARN queue specification, and WFM correctly parses and inherits them. Also notice I have capture output checked, as I'd like to see the output my script writes to the console. At this point, I'm ready to preview my workflow; WFM comes with a handy XML preview. Looks all right to me, so I'm ready to submit. Notice WFM doesn't know what $jobTracker is and prompts me to fill it out along with the queue. We can now navigate to the WFM Dashboard tab, as you've seen in my previous tutorial, and track the job status. My job failed, and I can debug the job status directly from WFM. It turns out the issue is with my parameter $jobTracker: in WFM it was renamed to $resourceManager, which comes by default, so I need to remove my custom parameter and let WFM do what it does best. Here's a preview of my XML after the change. Back in the dashboard, I can click on the job and investigate the status. My job completed successfully, and I can navigate to the YARN job status straight from WFM: click on the succeeded wf, then click on the arrow icon on the right, in the same row as the python-node. Finally, navigate to the logs of your YARN job to view the output. And that's all for this tutorial; you learned how to import an existing Python Oozie workflow and further edit it via WFM. My Python script, by the way, has the following code:

#!/usr/bin/env python
import os, pwd, sys
print "who am I? " + pwd.getpwuid(os.getuid())[0]
print "this is a Python script"
print "Python Interpreter Version: " + sys.version

You can find my workflow along with other samples on my github page: https://github.com/dbist/oozie Stay tuned!
02-11-2017
06:51 PM
19 Kudos
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

Apache Ambari 2.5 will ship a slew of new views that were not available before. One of them is called Workflow Manager: a new experience for working with Apache Oozie workflows. It is meant to be easy to use and to help users build powerful workflows without ever touching XML or the dated Oozie UI. Let's get started!

We will begin by authoring a simple Shell action. The first thing we need to do is log in to the Ambari Views instance with a user that has permission to work with Workflow Manager; in my case the user is centos. Once logged in, you will be greeted with the selection of views that your particular user has permission to use. We're going to select WFD; don't worry about the acronym, that's configurable by your Ambari admin and can be named anything you prefer. Also, going forward, for brevity, I will refer to Workflow Manager as WFM. Once in the WFM view, you're greeted with the canvas that you'll be using for the rest of the tutorial. We're going to create a new workflow by selecting it in the options menu; notice there are options to create coordinator and bundle types of workflows as well. Since it's a shell action, we're going to rename the workflow to shell-action. You can see that any workflow in WFM has a beginning and an end. If you hover over the arrow connecting two nodes, you will notice a plus sign; this is what you're going to use to add new actions. The list of all built-in actions becomes visible once you click the plus sign. Select the shell action, as that's what we're going to use in this scenario. Once added, you can edit the shell action node to rename it or configure it for your specific needs; click the gear icon to modify the configuration. I'm going to run a simple echo command and capture its output, as this is meant to be a simple example. You can also enter the name of a script file you want to execute; I'm going to demonstrate that in a future tutorial. Then you can scroll down into the advanced properties section to modify parameters and arguments to the command; notice my argument is just a statement I want to echo to the console. Since I also want to capture the output of the command, for further processing or just for visual inspection, I will check the capture output box at the bottom of the dialog.
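Under the hood, the node we just configured corresponds to an Oozie shell action roughly like the one sketched below. This is a hedged illustration rather than WFM's exact output; the echoed text is a placeholder.

<action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>echo</exec>
        <argument>hello from WFM</argument>
        <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
</action>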
At this point my workflow is ready and I can submit it. Click submit and you will see a few more options. You need to specify the location of the workflow: the path must exist on HDFS, but the XML must not; if the XML already exists, you need to choose to overwrite it. Then you can select run on submit. Another option in the dialog is to validate your workflow; clicking it will tell you whether your wf is valid or not. Finally, let's submit the wf. Notice anything familiar? WFM uses the Oozie REST API to work with Oozie and doesn't introduce anything new to the formula except for an awesome UI: when you submit a wf, you get back an Oozie job ID. So what do you do next, go to the Oozie UI, right? No, WFM has you covered there too: it has a dashboard tab that loads the whole Oozie wf job history as well as anything currently running. You can now monitor your job status in this UI! Once the job completes, you can navigate to the output and view the result of the wf; the output will be in the STDOUT logs of the associated YARN job.
01-11-2017
07:11 PM
@Saumil Mayani please confirm that the steps to add a user/group described in this thread work: https://community.hortonworks.com/questions/50073/adding-new-ambari-user-with-assigned-group-with-ap.html