Support Questions

Anatoly · ‎11-12-2014

Hi,

We have had CDH 5.1 running in production for a few months now with no issues.

Recently we created another cluster for our QA environment and installed CDH 5.2 through Cloudera Manager.

When we tried to run some Oozie workflows (the same jobs as on production), we got the following error:

Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text

I've figured out that this error occurred because Hadoop tried to use the IdentityMapper class instead of our streaming processors.

I've tried a lot of different options but nothing helped so far.

The closest I could get was to compare the actual jobConf files from production (CDH 5.1) and from the new cluster (CDH 5.2). On the new cluster, the jobConf doesn't contain the following properties:

stream.map.streamprocessor

stream.reduce.streamprocessor

which, if I understand correctly, are used by hadoop-streaming.jar.

I have no idea where to look next.
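For anyone repeating this comparison, here is a minimal sketch of how the two jobConf files could be diffed for missing properties. It assumes both files use the usual Hadoop `<configuration>` / `<property>` / `<name>` layout. The helper names `load_properties` and `missing_properties` and the trimmed sample contents (including the `placeholder` values) are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style configuration XML text into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(reference_xml, candidate_xml):
    """Return sorted property names found in the reference but not in the candidate."""
    reference = load_properties(reference_xml)
    candidate = load_properties(candidate_xml)
    return sorted(set(reference) - set(candidate))


if __name__ == "__main__":
    # Hypothetical, heavily trimmed jobConf contents; real files hold many more properties.
    working = """<configuration>
      <property><name>mapred.reduce.tasks</name><value>1</value></property>
      <property><name>stream.map.streamprocessor</name><value>placeholder</value></property>
      <property><name>stream.reduce.streamprocessor</name><value>placeholder</value></property>
    </configuration>"""
    broken = """<configuration>
      <property><name>mapred.reduce.tasks</name><value>1</value></property>
    </configuration>"""
    # Print every property the working cluster sets that the broken one lacks.
    for name in missing_properties(working, broken):
        print(name)
```

To compare real files instead of inline strings, the text can be read from disk first (for example with `open(path).read()`) and passed to `missing_properties` in the same way.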

I would really appreciate any help with this issue.

Thanks,

Anatoly

Anatoly · ‎11-24-2014

Anyone?

Harsh J · ‎11-30-2014

Which streaming jar are you using specifically? Are you using the Oozie streaming action?

Can you share your actual job-launching command (or the relevant Oozie workflow and associated scripts/files)?

Anatoly · ‎11-30-2014

Yes, I'm using the Oozie streaming action, and the streaming jar is the one bundled with CDH.

What is strange is that we use the same workflows and coordinators in our production cluster which is on CDH 5.1.

Here is an example of a workflow action we use:

<action name="raw-pass" retry-max="3" retry-interval="1">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${rawPassOutput}"/>
            </prepare>
            <streaming>
                <mapper>mapreducers/bin/mapred run MyJobMapper</mapper>
                <reducer>mapreducers/bin/mapred run MyJobReducer</reducer>
            </streaming>
            <configuration>
                <property>
                    <!--
                        This will add avro.jar and avro-mapred.jar dependencies to the job
                        (@see mapred.input.format.class property below)
                    -->
                    <name>oozie.action.sharelib.for.map-reduce</name>
                    <value>mapreduce-streaming,hcatalog,sqoop</value>
                </property>
                <property>
                    <name>mapred.reduce.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${rawPassInput}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${rawPassOutput}</value>
                </property>
                <property>
                    <!--
                        This input format will automagically decode avro files so
                        that our mappers will get plain json as input.
                    -->
                    <name>mapred.input.format.class</name>
                    <value>org.apache.avro.mapred.AvroAsTextInputFormat</value>
                </property>
            </configuration>
            <archive>${mapreducersArchive}#mapreducers</archive>
        </map-reduce>
        <ok to="aggregate-pass"/>
        <error to="failure-email-notification"/>
    </action>

I cannot share the actual mapper and reducer scripts though.

Also - is there a relatively easy way to downgrade a CDH installation from 5.2 to 5.1? Or at least to install 5.1 from scratch? I'm not sure there was an option in Cloudera Manager to install 5.1, only 5.2...

Thanks!

Anatoly · ‎12-01-2014

OK, so I've managed to downgrade the cluster to 5.1 and start my coordinator jobs. It seems to be working as expected now.

So it seems there are some changes in 5.2 that break backward compatibility, but I really don't know what they are. At least I know that we shouldn't upgrade to 5.2 until we figure out how to solve this.

Harsh J · ‎12-27-2014

This is caused by
https://issues.apache.org/jira/browse/OOZIE-2102. It will be resolved
in a future release of CDH5.

Anatoly · ‎12-30-2014

Thanks Harsh!

Really looking forward to this being released! Just to clarify: is it going to be included in 5.3.x or only in 5.4?

Harsh J · ‎01-06-2015

It will be resolved in the next 5.2.x and 5.3.x bugfix releases (and in the future 5.4.0 release as well).

Oozie streaming fails: stream.map.streamprocessor not set In JobConf [CDH 5.2]