03-06-2019 11:56 AM - edited 03-06-2019 11:57 AM
I'm new to Hadoop. I'm managing a legacy system and can't work out how it's configured.
I'm trying to run a Hadoop MR job inside a Docker container, but it behaves differently from the same job run outside the container. Inside the container, it fails with:
19/03/06 19:22:47 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "hadoop-prod01:8021"
This made me think it should perhaps be configured with mapreduce.framework.name=yarn. The thing is, I see no evidence of YARN in the debug logs outside the container (nor does mapred-site.xml show any sign of it). Moreover, even the "mapreduce.Cluster: Trying ClientProtocolProvider ..." messages don't appear there, so I can't tell which framework it's trying to use.
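For context, my understanding is that switching to YARN would look roughly like this in mapred-site.xml (a sketch only; I haven't confirmed this is the right setting for this CDH4 setup, and nothing like it appears in my config):

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

Since my actual mapred-site.xml has nothing like this, I assume the job is falling back to trying the local and classic providers in turn.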
Both installations run the same Hadoop version (2.0.0-cdh4.2.1), and the config files in /etc/hadoop are identical. Clearly something is misconfigured, but I don't even know where to start.
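In case it helps, here's the rough Python sketch I used to diff the *-site.xml files between the host and a copy pulled out of the container (the file paths in the usage comment are just examples from my setup, not anything standard):

```python
import xml.etree.ElementTree as ET


def load_props(path):
    """Parse a Hadoop *-site.xml file into a {name: value} dict."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name] = prop.findtext("value")
    return props


def diff_props(a, b):
    """Return {name: (value_in_a, value_in_b)} for properties that differ."""
    diffs = {}
    for key in set(a) | set(b):
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs


# Example usage (paths are illustrative; the container copy was made
# with `docker cp <container>:/etc/hadoop/conf /tmp/container-conf`):
#   host = load_props("/etc/hadoop/conf/mapred-site.xml")
#   container = load_props("/tmp/container-conf/mapred-site.xml")
#   for name, (hv, cv) in sorted(diff_props(host, container).items()):
#       print(f"{name}: host={hv!r} container={cv!r}")
```

In my case this reported no differences, which is why I say the configs are identical.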
03-06-2019 05:05 PM
Okay, seems related to this: https://issues.apache.org/jira/browse/SQOOP-1464
But I don't understand how I'd install both Hadoop 2.0.0-cdh4.2.1 and Hadoop 2.0 side by side (if that's what the poster is suggesting).