Member since: 07-19-2015
Posts: 10
Kudos Received: 1
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3238 | 12-08-2015 06:09 PM |
01-26-2016 11:39 PM
I'm a newbie to Oozie, and I've read several Oozie shell action examples, but they left me confused about a few things. In some examples there is no <file> tag at all. Others, like the Cloudera example here, repeat the shell script name in the <file> tag:

```xml
<shell xmlns="uri:oozie:shell-action:0.2">
    <exec>check-hour.sh</exec>
    <argument>${earthquakeMinThreshold}</argument>
    <file>check-hour.sh</file>
</shell>
```

The example on Oozie's website, on the other hand, writes the shell script (the reference `${EXEC}` from job.properties, which points to the script.sh file) twice, separated by `#`:

```xml
<shell xmlns="uri:oozie:shell-action:0.1">
    ...
    <exec>${EXEC}</exec>
    <argument>A</argument>
    <argument>B</argument>
    <file>${EXEC}#${EXEC}</file>
</shell>
```

There are also examples where a path (HDFS or local?) is prepended to `script.sh#script.sh` inside the <file> tag:

```xml
<shell xmlns="uri:oozie:shell-action:0.1">
    ...
    <exec>script.sh</exec>
    <argument>A</argument>
    <argument>B</argument>
    <file>/path/script.sh#script.sh</file>
</shell>
```

As I understand it, any shell script can be placed in the workflow's HDFS path (the same path where workflow.xml resides). Can someone explain the differences between these examples and how `<exec>`, `<file>`, `script.sh#script.sh`, and `/path/script.sh#script.sh` are used?
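For reference, here is a minimal sketch of how a script is usually staged for a shell action, assuming a hypothetical workflow application path of `/user/alice/oozie-app` on HDFS (the path and script name are illustrative, not from the examples above):

```bash
# Upload the workflow definition and the script to the workflow application path.
# A relative name in <file> (e.g. check-hour.sh) is resolved against this application path.
hdfs dfs -mkdir -p /user/alice/oozie-app
hdfs dfs -put workflow.xml check-hour.sh /user/alice/oozie-app/

# An absolute HDFS path in <file> (e.g. /path/script.sh#script.sh) can point anywhere on HDFS.
# The part after '#' is the symlink name the file receives in the action's working
# directory on the worker node, and that local name is what <exec> refers to.
```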
Labels:
- Apache Oozie
01-21-2016 02:11 AM
I have installed the Cloudera CDH QuickStart VM 5.5, and I'm running a Sqoop action in my Oozie workflow. I encountered an error saying the MySQL JDBC driver is missing, and I came across an SO answer saying that mysql-connector-java.jar should be placed in Oozie's HDFS shared lib path, under the `sqoop` directory. When I browse Oozie's HDFS shared lib path, however, I notice two `sqoop` subdirectories where I could copy the jar:

/user/oozie/share/lib/sqoop
/user/oozie/share/lib/lib_20151118030154/sqoop

Aside from `sqoop`, the `hive`, `pig`, `distcp`, and `mapreduce-streaming` directories also exist under both `lib` and `lib/lib_20151118030154`. So the question is: where do I place my connector jar, the first or the second one? And what is the difference (or difference of purpose) between these two paths with respect to the `sqoop`, `hive`, `pig`, `distcp`, and `mapreduce-streaming` jars for Oozie?
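One way to see which of the two locations Oozie actually serves is the Oozie admin CLI; a rough sketch, assuming the default Oozie URL on the QuickStart VM and that the installed Oozie version supports the sharelib admin commands:

```bash
# List the jars Oozie currently sees in the sqoop sharelib.
oozie admin -oozie http://localhost:11000/oozie -shareliblist sqoop

# Copy the connector into the sqoop directory of the timestamped sharelib...
hdfs dfs -put mysql-connector-java.jar /user/oozie/share/lib/lib_20151118030154/sqoop/

# ...then refresh Oozie's sharelib metadata so the new jar is picked up without a restart.
oozie admin -oozie http://localhost:11000/oozie -sharelibupdate
```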
Labels:
- Apache Oozie
12-08-2015 06:09 PM
Thanks for your response. The problem is already solved. My Java action uses an instance (say, a variable fs) of the org.apache.hadoop.fs.FileSystem class. At the end of the Java action I was calling fs.close(), which caused the problem on the next periodic run of the Oozie job. Once I removed that line, everything worked fine again.
12-06-2015 11:01 PM
1 Kudo
I have Hive tables whose contents are JSON files, and these tables need the JSON SerDe jar (from here) in order to be queried. In the Cloudera VM (QuickStart CDH 5.4.0), I can simply execute the following in the Hive or Beeline CLI:

ADD JAR /<local-path>/json-serde-1.0.jar;

and then I am able to run SELECT queries on my Hive tables. I need to use these Hive tables as data sources for Tableau (installed on Windows, my host machine), so I start the Thrift server in Spark. For Hive tables that do not contain JSON (and do not require the SerDe), Tableau can connect and read them easily. When it comes to the Hive tables that contain JSON data, however, it looks like Tableau cannot find the Hive JSON SerDe jar, and I get the following error:

java.lang.RuntimeException: MetaException(message:java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found)

How do I add the Hive JSON SerDe jar so that Tableau can read the Hive JSON tables?
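One approach is to put the SerDe jar on the Thrift Server's classpath when the server starts, so queries arriving over JDBC/ODBC (e.g. from Tableau) can load the class. A rough sketch, assuming the usual QuickStart Spark install path and reusing the placeholder jar path from above:

```bash
# Start the Spark Thrift Server with the SerDe jar; start-thriftserver.sh accepts
# the same options as spark-submit, so --jars ships the jar to the server's classpath.
/usr/lib/spark/sbin/start-thriftserver.sh \
    --jars /<local-path>/json-serde-1.0.jar
```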
11-23-2015 03:16 AM
I have an Oozie coordinator that runs a workflow every hour. The workflow consists of two sequential actions: a shell action and a Java action. When I run the coordinator, the shell action seems to execute successfully; however, when it's time for the Java action, the Job Browser in Hue always shows:

There was a problem communicating with the server: Job application_<java-action-id> has expired.

When I click on the application ID, the error detail seems to point to views.py and api.py. When I looked into the server logs:

```
[23/Nov/2015 02:25:22 -0800] middleware INFO Processing exception: Job application_1448245438537_0010 has expired.: Traceback (most recent call last):
  File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.6.10-py2.6.egg/django/db/transaction.py", line 371, in inner
    return func(*args, **kwargs)
  File "/usr/lib/hue/apps/jobbrowser/src/jobbrowser/views.py", line 67, in decorate
    raise PopupException(_('Job %s has expired.') % jobid, detail=_('Cannot be found on the History Server.'))
PopupException: Job application_1448245438537_0010 has expired.
```

When I run the workflow standalone, the Java action succeeds or expires with roughly 50-50 odds, but under the coordinator every Java action expires. I'm using Cloudera Quickstart CDH 5.4.0.
Labels:
- Apache Oozie
07-20-2015 10:19 PM
I wrote SparkSQL code in Java that accesses Hive tables and packaged it as a jar that can be run with spark-submit. Now I want to run this jar as an Oozie workflow (and a coordinator, once I get the workflow working). When I try, the job fails and the Oozie job logs show:

java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf

What I did was look for the jar in $HIVE_HOME/lib that contains that class, copy it into the lib path under my Oozie workflow root path, and add this to the Spark action in workflow.xml:

<spark-opts>--jars lib/*.jar</spark-opts>

But this just leads to another java.lang.NoClassDefFoundError pointing to a different missing class, so I repeat the search-and-copy process, run the job, and the same thing happens all over again. It looks like it depends on many jars in my Hive lib. What I don't understand is that when I run the same jar with spark-submit from the shell, it works fine: I can SELECT and INSERT into my Hive tables. It's only under Oozie that this occurs, as if Spark can no longer see the Hive libraries when the job runs as an Oozie workflow. Can someone explain why this happens, and how I can add or reference the necessary classes/jars on the Oozie path? I am using Cloudera Quickstart VM CDH 5.4.0, Spark 1.4.0, and Oozie 4.1.0.
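For concreteness, the search-and-copy step described above looks roughly like this; the class-to-jar lookup and the HDFS workflow path are illustrative examples, not taken from the original setup:

```bash
# Find which local Hive jar provides the missing class (here HiveConf).
for j in "$HIVE_HOME"/lib/*.jar; do
    unzip -l "$j" 2>/dev/null | grep -q 'org/apache/hadoop/hive/conf/HiveConf.class' && echo "$j"
done

# Copy the candidate jar into the workflow's lib/ directory on HDFS (path is a made-up
# example) so the Spark action can pick it up via <spark-opts>--jars lib/*.jar</spark-opts>.
hdfs dfs -put "$HIVE_HOME"/lib/hive-common-*.jar /user/cloudera/apps/spark-hive-wf/lib/
```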
Labels:
- Apache Oozie