The NameNode is reporting the error below; Googling returns nothing helpful, unfortunately.
IPC Server handler 15 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.1.69:54475: error: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:329)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1409)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
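For what it's worth, here is the minimal client-side sketch I've been using to reproduce it from an affected node (the class name and /tmp paths are just placeholders for our test files). Opening the file is what fires the getBlockLocations RPC that dies in the trace above:

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadProbe {
    public static void main(String[] args) throws Exception {
        // Picks up the deployed client configuration (core-site.xml,
        // hdfs-site.xml) from the classpath, e.g. /etc/hadoop/conf.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Metadata operations like listing still succeed on the affected nodes...
        for (FileStatus status : fs.listStatus(new Path("/tmp"))) {
            System.out.println("listed: " + status.getPath());
        }

        // ...but open() issues ClientProtocol.getBlockLocations to the
        // NameNode, which is where the NPE above surfaces; the client sees
        // it wrapped in a RemoteException. (/tmp/probe.txt is a placeholder.)
        InputStream in = fs.open(new Path("/tmp/probe.txt"));
        try {
            System.out.println("first byte: " + in.read());
        } finally {
            in.close();
        }
    }
}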
I'm going over the documentation again. I'm aware of the known issues around Hive and the metastore, but this error is new to me (I performed a previous upgrade on a very similar setup with no issues).
The one aspect that does stand out: all of the DataNodes are operating fine; the issue only appears on some 'helper' nodes that interact with HDFS but don't run any HDFS services themselves.
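One angle I'm poking at, purely speculative on my part: as far as I can tell from the source, sortLocatedBlocks orders the replicas by network distance from the calling client, so a helper node the NameNode can't resolve or place in the topology might behave differently from a DataNode host. A quick sanity check of forward/reverse DNS on a helper node (192.168.1.69 is the client address from the log above):

import java.net.InetAddress;

public class DnsCheck {
    public static void main(String[] args) throws Exception {
        // Forward lookup: the name this helper node reports for itself.
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("local:   " + local.getHostName() + " / " + local.getHostAddress());

        // Reverse lookup on the client address from the NameNode log; this
        // should resolve back to the same host name as above.
        InetAddress byAddr = InetAddress.getByName("192.168.1.69");
        System.out.println("reverse: " + byAddr.getCanonicalHostName());
    }
}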
I've finalized the upgrade, removed old packages, restarted the Cloudera agents, and checked for old libraries in use... presumably something from 4.1.3 is mucking up the works here, or something is missing. What would you speculate?
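To chase the leftover-4.1.3 theory, here's a small sketch I've been running on the affected nodes to see which jar the HDFS client classes actually load from (DFSClient is just one convenient class to test with):

public class JarOrigin {
    public static void main(String[] args) throws Exception {
        // Resolve the class through the same classpath the failing clients
        // use; the code-source URL shows which jar supplied it. A path still
        // containing 4.1.3 would point at a stale library.
        Class<?> cls = Class.forName("org.apache.hadoop.hdfs.DFSClient");
        System.out.println(cls.getProtectionDomain().getCodeSource().getLocation());
    }
}

I run it with java -cp "$(hadoop classpath):." JarOrigin so it sees exactly the jars the clients see.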
Your time is valuable, and I thank you for sharing it with us.
Yes, I've deployed the client configurations a few times, shut down all services on a node, added the HDFS gateway role to all machines, and even performed a reboot just in case...
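To double-check that the redeployed client configuration is actually the one being read, I've also been dumping what a bare Configuration object sees (Configuration.toString() lists the resource files it loaded; the class name is just my own):

import org.apache.hadoop.conf.Configuration;

public class ConfCheck {
    public static void main(String[] args) {
        // Loads core-site.xml / hdfs-site.xml from the classpath, exactly as
        // the failing clients do.
        Configuration conf = new Configuration();
        System.out.println("resources:    " + conf.toString());
        System.out.println("fs.defaultFS: " + conf.get("fs.defaultFS"));
    }
}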
Pretty odd. The machines still behave as though they are connected fine and can interact with HDFS in every way except reading files. This is a dev environment, so we also have security disabled at the moment.
Are there any known issues with installing the j2sdk? I selected 'Install Oracle Java SE Development Kit (JDK)' during the install...
The Host Inspector is happy with the setup. All services are healthy except the HDFS canary check (failed to read file), which runs on an affected node.
I also attempted to re-run the upgrade wizard, but that results in hosts being stuck at 'Acquiring installation lock...'