Created on 11-12-2014 10:18 PM - edited 09-16-2022 02:12 AM
Hi,
We have had CDH 5.1 running in production for a few months already with no issues.
Recently we created another cluster for our QA environment and installed CDH 5.2 through Cloudera Manager.
When we tried to run some Oozie workflows (the same jobs as on production), we got the following error:
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text
I've figured out that this error occurred because Hadoop tried to use the IdentityMapper class instead of our streaming processors.
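For context on what the message itself means: as far as I understand it, this error appears whenever the key class the mapper actually emits doesn't match the map output key class in the job configuration. A standalone toy job like the one below (purely illustrative, not our code) reproduces the exact same message, because it declares LongWritable map output keys while its mapper writes Text keys:

// Purely illustrative toy job (not our actual code): it declares LongWritable
// as the map output key class but the mapper emits Text keys, which triggers
// "Type mismatch in key from map: expected LongWritable, received Text".
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TypeMismatchDemo {

    public static class TextKeyMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws java.io.IOException, InterruptedException {
            // Emits a Text key even though the job below declares LongWritable keys.
            ctx.write(new Text(line), new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "type-mismatch-demo");
        job.setJarByClass(TypeMismatchDemo.class);
        job.setMapperClass(TextKeyMapper.class);
        // Deliberately contradicts what TextKeyMapper emits:
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

In our case the "expected" class is LongWritable, which as far as I know is just the default when nothing overrides the output key class, so it fits the theory that the streaming setup never got applied.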
I've tried a lot of different options, but nothing has helped so far.
The closest I could get was to compare the actual jobConf files we get on production (CDH 5.1) and on the new cluster (CDH 5.2). I found that on the new cluster the jobConf doesn't contain the following properties:
stream.map.streamprocessor
stream.reduce.streamprocessor
which, if I understand correctly, are used by hadoop-streaming.jar.
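In case it helps anyone reproduce the comparison, a throwaway helper along these lines can pull specific keys out of the two downloaded job.xml files (the file paths and the exact key list are just placeholders):

// Throwaway sketch for comparing two job.xml files; the arguments and the key
// list are placeholders, not anything specific to our jobs.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class CompareJobConf {
    public static void main(String[] args) {
        // args[0] = job.xml downloaded from the CDH 5.1 cluster,
        // args[1] = job.xml downloaded from the CDH 5.2 cluster
        Configuration conf51 = new Configuration(false);
        conf51.addResource(new Path(args[0]));
        Configuration conf52 = new Configuration(false);
        conf52.addResource(new Path(args[1]));

        String[] keys = {
            "stream.map.streamprocessor",
            "stream.reduce.streamprocessor",
            "mapred.mapper.class",
            "mapred.reducer.class"
        };
        for (String key : keys) {
            System.out.println(key);
            System.out.println("  5.1: " + conf51.get(key));
            System.out.println("  5.2: " + conf52.get(key));
        }
    }
}

Anything missing from one of the files simply prints as null.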
I have no idea where to look next.
I would really appreciate any help with this issue.
Thanks,
Anatoly
Created 11-24-2014 09:33 PM
Anyone?
Created 11-30-2014 03:50 AM
Created 11-30-2014 12:10 PM
Yes, I'm using the Oozie streaming action, and the streaming jar is the one bundled with CDH.
What is strange is that we use the same workflows and coordinators on our production cluster, which is on CDH 5.1.
Here is an example workflow action we use:
<action name="raw-pass" retry-max="3" retry-interval="1">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${rawPassOutput}"/>
        </prepare>
        <streaming>
            <mapper>mapreducers/bin/mapred run MyJobMapper</mapper>
            <reducer>mapreducers/bin/mapred run MyJobReducer</reducer>
        </streaming>
        <configuration>
            <property>
                <!-- This will add avro.jar and avro-mapred.jar dependencies to the job (@see mapred.input.format.class property below) -->
                <name>oozie.action.sharelib.for.map-reduce</name>
                <value>mapreduce-streaming,hcatalog,sqoop</value>
            </property>
            <property>
                <name>mapred.reduce.tasks</name>
                <value>1</value>
            </property>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
            <property>
                <name>mapred.input.dir</name>
                <value>${rawPassInput}</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>${rawPassOutput}</value>
            </property>
            <property>
                <!-- This input format will automagically decode avro files so that our mappers will get plain json as input. -->
                <name>mapred.input.format.class</name>
                <value>org.apache.avro.mapred.AvroAsTextInputFormat</value>
            </property>
        </configuration>
        <archive>${mapreducersArchive}#mapreducers</archive>
    </map-reduce>
    <ok to="aggregate-pass"/>
    <error to="failure-email-notification"/>
</action>
I cannot share the actual mapper and reducer scripts, though.
Also, is there a relatively easy way to downgrade a CDH installation from CDH 5.2 to 5.1? Or at least to install 5.1 from scratch? I'm not sure there was an option in Cloudera Manager to install 5.1, only 5.2...
Thanks!
Created 12-01-2014 12:18 PM
OK, so I've managed to downgrade the cluster to 5.1 and start my coordinator jobs. It seems to be working as expected now.
So it seems to me there is some change in 5.2 that breaks backward compatibility, but I really don't know what it is. At least I know that we shouldn't upgrade to 5.2 until we figure out how to solve this.
Created 12-27-2014 08:19 PM
Created 12-30-2014 04:37 PM
Thanks, Harsh!
Really looking forward to this being released! Just to clarify: is it going to be included in 5.3.x, or only in 5.4?
Created 01-06-2015 10:15 PM