
Why is my Falcon pipeline failing?

Super Collaborator

Hello guys,

Currently, my requirement is to process data from 3 Hive tables and store the result in another Hive table. I have therefore created 3 Hive URI input feeds, 1 Hive URI output feed, and 1 process entity that accepts these 3 feeds as input and generates output to the output feed. The process entity fails with the error:

FAILED: RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found)

I actually see this error in Oozie when the scheduled process entity fails.

I understand the error itself is simple: it is caused by the SerDe jar missing somewhere in hive/lib or oozie/share/lib.

I tried the following solution:

1) Add the jar to the Hive lib folder, or add it to the Oozie lib.

Jar:

json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar

But I am still getting the same error. If I add the jar through the Hive CLI using the ADD JAR command, the query runs smoothly there.
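For reference, a minimal sketch of the usual way to make such a jar visible to Oozie-launched Hive actions on HDP: copy it into the Hive share lib directory on HDFS and refresh the share lib. The share lib path is taken from this thread; the local jar path and the Oozie URL are placeholders.

# Copy the SerDe jar into the Oozie share lib for Hive (HDFS path from this thread)
hdfs dfs -put /tmp/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar /user/oozie/share/lib/lib_20160503082834/hive/

# Refresh the share lib so Oozie picks up the new jar without a restart
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate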

My XML files are:

1) observationInputFeed.xml

<feed xmlns='uri:falcon:feed:0.1' name='observationInputFeed' description='This is observation table'>
  <tags>table=observation</tags>
  <frequency>hours(1)</frequency>
  <timezone>UTC</timezone>
  <clusters>
    <cluster name='hiveCluster' type='source'>
      <validity start='2016-05-31T09:00Z' end='2016-06-06T09:00Z'/>
      <retention limit='days(1)' action='delete'/>
      <table uri='catalog:falconexample:observation1#ds=${YEAR}-${MONTH}-${DAY}-${HOUR}'/>
    </cluster>
  </clusters>
  <table uri='catalog:falconexample:observation1#ds=${YEAR}-${MONTH}-${DAY}-${HOUR}'/>
  <ACL owner='ambari-qa' group='users' permission='0755'/>
  <schema location='hcat' provider='hcat'/>
</feed>

______________________________________________

2) patientInputFeed.xml

<feed xmlns='uri:falcon:feed:0.1' name='patientInputFeed' description='This is Patient table'>
  <tags>table=patient</tags>
  <frequency>hours(1)</frequency>
  <timezone>UTC</timezone>
  <clusters>
    <cluster name='hiveCluster' type='source'>
      <validity start='2016-05-31T09:00Z' end='2016-06-06T09:00Z'/>
      <retention limit='days(1)' action='delete'/>
      <table uri='catalog:falconexample:patient1#ds=${YEAR}-${MONTH}-${DAY}-${HOUR}'/>
    </cluster>
  </clusters>
  <table uri='catalog:falconexample:patient1#ds=${YEAR}-${MONTH}-${DAY}-${HOUR}'/>
  <ACL owner='ambari-qa' group='users' permission='0755'/>
  <schema location='hcat' provider='hcat'/>
</feed>

____________________________________________

3) patientprocessedOutputFeed.xml

<feed xmlns='uri:falcon:feed:0.1' name='patientprocessedOutputFeed' description='This is patientprocessed table'>
  <tags>table=observationInputFeed</tags>
  <frequency>hours(1)</frequency>
  <timezone>UTC</timezone>
  <clusters>
    <cluster name='hiveCluster' type='source'>
      <validity start='2016-05-31T09:00Z' end='2016-06-06T09:00Z'/>
      <retention limit='days(1)' action='delete'/>
      <table uri='catalog:falconexample:Patient_proce#ds=${YEAR}-${MONTH}-${DAY}-${HOUR}'/>
    </cluster>
  </clusters>
  <table uri='catalog:falconexample:Patient_proce#ds=${YEAR}-${MONTH}-${DAY}-${HOUR}'/>
  <ACL owner='ambari-qa' group='users' permission='0755'/>
  <schema location='hcat' provider='hcat'/>
</feed>

_____________________________________________

4) The 4th feed XML is very similar to the ones above, except for the table name.

______________________________________________

Please help me out, guys.

I don't understand why Oozie is throwing java.lang.ClassNotFoundException even though the jar is present at the proper location.

1 ACCEPTED SOLUTION

Super Collaborator

Hello guys,

The error has been solved. I solved it by adding an additional statement in the Hive script, along with the query above.

Statement:

add jar hdfs://<hostname>:8020//user/oozie/share/lib/lib_20160503082834/hive/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
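A hedged note on this fix: the lib_<timestamp> directory name changes whenever the share lib is rebuilt, so it is worth confirming the current directory before hard-coding it in the ADD JAR statement, for example:

# List the share lib directories to find the current lib_<timestamp>
hdfs dfs -ls /user/oozie/share/lib/

# Confirm the SerDe jar really sits under the directory used in the script
hdfs dfs -ls /user/oozie/share/lib/lib_20160503082834/hive/ | grep json-serde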



Rising Star

Try restarting Hive to pick up the jar.

Super Collaborator

Thanks cnormile,

Could you please tell me where we should generally put that SerDe jar so that Falcon/Oozie will also pick it up while running the data pipeline?

As I already mentioned, the jar is present at both locations (i.e. in /user/oozie/share/lib/hive and /usr/hdp/<version>/hive/lib), and I have already restarted both Hive and Oozie, but it still gives the same error.
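One way to check whether Oozie itself sees the jar (a sketch, not from the thread; the Oozie URL is a placeholder) is to ask it to list the Hive share lib as it resolves it. If the jar is missing from this listing, the share lib needs to be refreshed (as in the earlier sketch) or Oozie restarted.

# List the jars Oozie resolves for the 'hive' share lib
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist hive | grep -i json-serde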


@Manoj Dhake

The jar file is missing; add it to the session:

0: jdbc:hive2://192.168.56.101:10000> ADD JAR /tmp/hive-json-serde-0.2.jar;
No rows affected (0.231 seconds)

0: jdbc:hive2://192.168.56.101:10000> select * from my_table;

The link might give you more examples.

Super Collaborator

Thank you Sri,

I followed the link which you sent me, but I am still getting the same error:

1) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.openx.data.jsonserde.JsonSerDe not found.

Actually, the query which I am running through Falcon is:

INSERT OVERWRITE TABLE falconexample.Patient_proce PARTITION (${falcon_output_partitions_hive})
SELECT p.id, p.gender, p.Age, p.birthdate,
       o.component[1].valuequantity.value, o.component[1].valuequantity.unit
FROM (SELECT *, floor(datediff(to_date(from_unixtime(unix_timestamp())), to_date(birthdate)) / 365.25) AS Age
      FROM falconexample.patient1) p
INNER JOIN falconexample.DiagnosticReport1 d ON p.id = substr(d.subject.reference, 9)
INNER JOIN falconexample.Observation1 o ON p.id = substr(o.subject.reference, 9)
WHERE p.Age > 17 AND p.Age < 86 AND o.component[1].valuequantity.value < 140;

If I write a statement in the Hive script as "ADD JAR /user/oozie/share/lib/lib_20160503082834/hive/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;" then I get an error like:

2) java.lang.IllegalArgumentException: /user/oozie/share/lib/lib_20160503082834/hive/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar does not exist.

and if I remove that statement, then I get the 1st error.
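A likely reason for error 2 (an editorial note, not stated in the thread): ADD JAR with a bare path is resolved against the local filesystem of the node running the Hive action, while /user/oozie/share/lib/... only exists in HDFS, which is why the accepted answer prefixes the path with hdfs://<hostname>:8020. A quick way to see the difference, using the paths from this thread:

# This path exists in HDFS ...
hdfs dfs -ls /user/oozie/share/lib/lib_20160503082834/hive/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar

# ... but not on the local filesystem of the worker node,
# which is where a scheme-less ADD JAR path is looked up
ls /user/oozie/share/lib/lib_20160503082834/hive/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar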

I have tried the following solutions to resolve this issue:

1) Added the SerDe jar to the Oozie share lib folder (HDFS location).

2) Added the SerDe jar to the Hive lib (local FS).

3) Added the SerDe jar at the "falcon.libpath" location (i.e. /apps/falcon/pigCluster/working/lib, an HDFS location).

4) Added the jar at /apps/falcon/pigCluster/staging/falcon/workflows/process/patientDataProcess/6b7edfbfe5bcfc50e0fc845f71cd9122_1464767974691/DEFAULT/lib (HDFS location).

I have put the jar everywhere, but I keep getting the 1st error. I don't know what is happening here.

Why doesn't Falcon pick up that jar?

I am posting the log file which I found under the Falcon working directory: user-action-hive-failed.txt
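When a Falcon-scheduled Hive action fails like this, one way to see which jars were actually localized for the action (a sketch; the workflow job ID and YARN application ID are placeholders) is to walk from the Oozie workflow to the launcher logs:

# Find the failed Hive action and its external (YARN) application ID
oozie job -oozie http://<oozie-host>:11000/oozie -info <workflow-job-id>

# Pull the launcher logs and look for the SerDe jar in the localized classpath
yarn logs -applicationId <application-id> | grep -i json-serde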
