SYMPTOM: Hive jobs failing on Production Aggregation cluster by giving j"ava.net.UnknownHostException: Matrix-Aggr" Error. Matrix-Aggr is the nameservice for Namenode HA.
ERROR: Error log is as below -
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: Matrix-Aggr
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:312)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:178)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:665)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:601)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:148)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2619)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2635)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.getSchemaFromFS(AvroSerdeUtils.java:149)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:110)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.getSchema(AvroGenericRecordReader.java:112)
at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:70)
at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
... 16 more
Caused by: java.net.UnknownHostException: Matrix-Aggr
... 33 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
ROOT CAUSE: HDP-2.2.4 has a bug where they reset client configuration with proper HA settings in AvroSerdeUtils.java at below line and hence we get UnknownHostException
Schema s = getSchemaFromFS(schemaString, new Configuration());
RESOLUTION: This is fixed in recent version via HIVE-9299. We can workaround it by using file:// for avro.schema.url and keeping the schema file in all NodeManager machines. You might need to request for patch to HWX as workaround else get upgraded HDP to latest version.