Oozie streaming fails: stream.map.streamprocessor not set In JobConf [CDH 5.2]

Explorer

Hi,

We have had CDH 5.1 running in production for a few months now with no issues.

Recently we created another cluster for a QA environment and installed CDH 5.2 through Cloudera Manager.

When we tried to run some Oozie workflows there (the same jobs as in production), we got the following error:

Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text

I've figured out that this error occurred because Hadoop tried to use the IdentityMapper class instead of our streaming processors.
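A toy Python model (not Hadoop code) of the check behind this error may help explain the reasoning. It rests on assumptions: that when no streaming mapper is configured the old `mapred` API falls back to `IdentityMapper` and to `LongWritable` as the default map output key class, and that `AvroAsTextInputFormat` delivers `Text` keys. `LongWritable` and `Text` below are simplified stand-ins named after the Hadoop types, and `run_map_task` is hypothetical.

```python
# Toy model of the map-side key type check (simplified stand-ins, not Hadoop classes).


class LongWritable:
    """Stand-in for org.apache.hadoop.io.LongWritable."""


class Text:
    """Stand-in for org.apache.hadoop.io.Text."""

    def __init__(self, s=""):
        self.s = s


def identity_mapper(key, value):
    # Like IdentityMapper: emit the input pair unchanged.
    yield key, value


def run_map_task(records, mapper, expected_key_cls):
    """Run `mapper` over (key, value) records, rejecting keys of the wrong type."""
    output = []
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            if not isinstance(out_key, expected_key_cls):
                raise TypeError(
                    "Type mismatch in key from map: expected "
                    f"{expected_key_cls.__name__}, received {type(out_key).__name__}"
                )
            output.append((out_key, out_value))
    return output


# Assumed input: Text keys (Avro records decoded to JSON) with empty Text values.
records = [(Text('{"id": 1}'), Text())]

# No streaming mapper configured: identity mapper plus default LongWritable key class.
try:
    run_map_task(records, identity_mapper, LongWritable)
except TypeError as err:
    print(err)  # Type mismatch in key from map: expected LongWritable, received Text

# With Text as the expected key class, the same records pass through.
print(len(run_map_task(records, identity_mapper, Text)))  # 1
```

In this model the identity mapper is harmless on its own; the failure comes from the mismatch between the key type it passes through and the key class the job expects.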

I've tried a lot of different options, but nothing has helped so far.

The closest I got was comparing the actual jobConf files we get in production (CDH 5.1) and on the new cluster (CDH 5.2). On the new cluster, the jobConf doesn't contain the following properties:

stream.map.streamprocessor
stream.reduce.streamprocessor

If I understand correctly, these are used by hadoop-streaming.jar.
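One way to repeat that comparison systematically is a small Python script that parses both job configuration files (Hadoop's `<configuration>`/`<property>`/`<name>`/`<value>` XML layout) and lists properties set on one cluster but missing on the other. This is a sketch: the function names are hypothetical, and the inline sample documents and their values are made-up stand-ins for the real job.xml files, which would be loaded with `load_job_conf_file`.

```python
import xml.etree.ElementTree as ET


def conf_from_root(root):
    """Map each <property> name to its <value> text in a Hadoop-style configuration."""
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = prop.findtext("value") or ""
    return conf


def load_job_conf_file(path):
    # Use this for the real job.xml files taken from each cluster.
    return conf_from_root(ET.parse(path).getroot())


def load_job_conf_text(text):
    return conf_from_root(ET.fromstring(text))


def missing_keys(reference, candidate, prefix=""):
    """Names (optionally filtered by prefix) set in `reference` but absent from `candidate`."""
    return sorted(k for k in reference if k.startswith(prefix) and k not in candidate)


# Hypothetical, trimmed-down job configurations for illustration only.
PROD_XML = """<configuration>
  <property><name>mapred.reduce.tasks</name><value>1</value></property>
  <property><name>stream.map.streamprocessor</name><value>mapper-cmd</value></property>
  <property><name>stream.reduce.streamprocessor</name><value>reducer-cmd</value></property>
</configuration>"""

QA_XML = """<configuration>
  <property><name>mapred.reduce.tasks</name><value>1</value></property>
</configuration>"""

prod = load_job_conf_text(PROD_XML)
qa = load_job_conf_text(QA_XML)
print(missing_keys(prod, qa, prefix="stream."))
# ['stream.map.streamprocessor', 'stream.reduce.streamprocessor']
```

Running it in both directions (production against QA and QA against production) also surfaces properties that only the new cluster sets.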

I have no idea where to look next. I would really appreciate any help with this issue.

Thanks,

Anatoly

Explorer

Anyone?

Mentor
Which streaming jar are you using specifically? Are you using the Oozie streaming action?

Can you share your actual job-launching command (or the relevant Oozie workflow and associated scripts/files)?

Explorer

Yes, I'm using the Oozie streaming action, and the streaming jar is the one bundled with CDH.

What is strange is that we use the same workflows and coordinators on our production cluster, which runs CDH 5.1.

Here is an example workflow action we use:

<action name="raw-pass" retry-max="3" retry-interval="1">
    <map-reduce>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${rawPassOutput}"/>
        </prepare>
        <streaming>
            <mapper>mapreducers/bin/mapred run MyJobMapper</mapper>
            <reducer>mapreducers/bin/mapred run MyJobReducer</reducer>
        </streaming>
        <configuration>
            <property>
                <!--
                    This will add avro.jar and avro-mapred.jar dependencies to the job
                    (@see mapred.input.format.class property below)
                -->
                <name>oozie.action.sharelib.for.map-reduce</name>
                <value>mapreduce-streaming,hcatalog,sqoop</value>
            </property>
            <property>
                <name>mapred.reduce.tasks</name>
                <value>1</value>
            </property>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
            <property>
                <name>mapred.input.dir</name>
                <value>${rawPassInput}</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>${rawPassOutput}</value>
            </property>
            <property>
                <!--
                    This input format will automagically decode avro files so
                    that our mappers will get plain json as input.
                -->
                <name>mapred.input.format.class</name>
                <value>org.apache.avro.mapred.AvroAsTextInputFormat</value>
            </property>
        </configuration>
        <archive>${mapreducersArchive}#mapreducers</archive>
    </map-reduce>
    <ok to="aggregate-pass"/>
    <error to="failure-email-notification"/>
</action>
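Given an action like this, a quick sanity check is to confirm that each `<streaming>` command actually reached the launched job's configuration under the two `stream.*.streamprocessor` properties discussed in this thread. The sketch assumes those are the relevant property names and that the stored value may or may not be URL-encoded, so it decodes with `unquote_plus` before comparing (only safe for commands with no literal `+` or `%`). The `{*}` wildcard (Python 3.8+) matches both this bare snippet and a namespaced `workflow.xml`. The function names and the sample `ok_conf` values are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

# Streaming element in the Oozie action -> job property it is expected to populate (assumed).
PROPERTY_FOR = {
    "mapper": "stream.map.streamprocessor",
    "reducer": "stream.reduce.streamprocessor",
}


def streaming_commands(action_xml):
    """Extract <mapper>/<reducer> commands from an Oozie <action> snippet."""
    action = ET.fromstring(action_xml)
    streaming = action.find("./{*}map-reduce/{*}streaming")
    if streaming is None:
        return {}
    commands = {}
    for tag in PROPERTY_FOR:
        text = streaming.findtext("{*}" + tag)
        if text is not None:
            commands[tag] = text.strip()
    return commands


def check_job_conf(commands, job_conf):
    """Return a list of problems; an empty list means every command was found."""
    problems = []
    for tag, command in commands.items():
        key = PROPERTY_FOR[tag]
        value = job_conf.get(key)
        if value is None:
            problems.append(f"{key} is not set (expected {command!r})")
        elif unquote_plus(value) != command:
            problems.append(f"{key} is {unquote_plus(value)!r}, expected {command!r}")
    return problems


ACTION = """<action name="raw-pass">
  <map-reduce>
    <streaming>
      <mapper>mapreducers/bin/mapred run MyJobMapper</mapper>
      <reducer>mapreducers/bin/mapred run MyJobReducer</reducer>
    </streaming>
  </map-reduce>
</action>"""

commands = streaming_commands(ACTION)
# A job configuration missing both properties, as described for the CDH 5.2 cluster.
print(check_job_conf(commands, {}))
# A job configuration with URL-encoded values (hypothetical).
ok_conf = {
    "stream.map.streamprocessor": "mapreducers%2Fbin%2Fmapred+run+MyJobMapper",
    "stream.reduce.streamprocessor": "mapreducers%2Fbin%2Fmapred+run+MyJobReducer",
}
print(check_job_conf(commands, ok_conf))  # []
```

In practice `job_conf` would be a dict of the launched job's properties, for example parsed from its job.xml.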

I can't share the actual mapper and reducer scripts, though.

Also, is there a relatively easy way to downgrade a CDH installation from 5.2 to 5.1, or at least to install 5.1 from scratch? I'm not sure Cloudera Manager offered an option to install 5.1; I only saw 5.2.

Thanks!

Explorer

OK, I've managed to downgrade the cluster to 5.1 and start my coordinator jobs, and everything seems to work as expected now.

So it seems that some change in 5.2 breaks backward compatibility, but I really don't know what it is. At least we know we shouldn't upgrade to 5.2 until we figure out how to solve this.

Mentor
This is caused by https://issues.apache.org/jira/browse/OOZIE-2102. It will be resolved in a future CDH 5 release.

Explorer

Thanks, Harsh!

Really looking forward to this release! Just to clarify: is the fix going to be included in 5.3.x, or only in 5.4?

Mentor
It will be resolved in the next 5.2.x and 5.3.x bugfix releases, as well as in the upcoming 5.4.0.