Created on 11-14-2014 10:40 AM - edited 09-16-2022 02:12 AM
Hi,
We're dealing with an odd issue: we can access HDFS from all nodes, e.g. hdfs dfs -ls / works on every system.
However, hdfs dfs -cat /foo.txt fails on some systems but works fine on others.
The affected systems have this behavior:
$ hdfs dfs -cat /foo.txt
cat: java.lang.NullPointerException
Any ideas?
Thank you for any responses.
Created 11-14-2014 10:49 AM
Adding a note -
The affected nodes are able to put files into HDFS as well as move files already within HDFS, but cat and get operations fail.
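For reference, the read path is what differs here: cat and get have to ask the NameNode for block locations (a getBlockLocations call), while puts and renames don't go through that call in the same way. Below is a minimal sketch of the same read path using the standard FileSystem API, assuming the client config on the node is picked up from the usual classpath; /foo.txt is just the example file from above.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatFoo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // loads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);       // DistributedFileSystem when fs.defaultFS is hdfs://
        // Opening and reading the file triggers a getBlockLocations RPC to the NameNode,
        // the same call hdfs dfs -cat and -get depend on.
        IOUtils.copyBytes(fs.open(new Path("/foo.txt")), System.out, conf, false);
        fs.close();
    }
}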
Created 11-14-2014 07:17 PM
What did you upgrade from?
What is in the NameNode log when you see these commands fail?
The upgrade process is documented here; the extended requirements may apply depending on which version you started from...
Created 11-17-2014 07:57 AM
Thanks for the assist! 🙂
The previous version was 4.1.3.
The NameNode is reporting this error - Googling returns nothing helpful, unfortunately.
IPC Server handler 15 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.1.69:54475: error: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:329)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1409)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)
I'm going over the documents again; I was aware of the issues around Hive and the metastore, but this error is new to me (I have performed a previous upgrade on a very similar setup with no issues).
The one aspect that does stand out: all of the datanodes are operating fine, and the issue is only seen on some 'helper' nodes that interact with HDFS but do not run HDFS services themselves.
I've finalized the upgrade, removed old packages, restarted the Cloudera agents, and checked for old libraries in use... presumably something from 4.1.3 is mucking up the works here, or something is missing. What would you speculate?
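In case it helps narrow down a leftover 4.1.3 config on the helper nodes, here's a small check I can run on a failing node versus a working one. It's just a sketch assuming the standard Hadoop client API; it prints which files the client Configuration actually resolved its settings from (Configuration.getPropertySources should be available in Hadoop 2.x / CDH 5, as far as I know).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import java.util.Arrays;

public class WhichConf {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml / hdfs-site.xml from whatever config dir is on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        System.out.println("fs.defaultFS    = " + conf.get("fs.defaultFS"));
        System.out.println("resolved from   = " + Arrays.toString(conf.getPropertySources("fs.defaultFS")));
        System.out.println("FileSystem impl = " + fs.getClass().getName());
        // Comparing this output between a working node and a failing helper node
        // should show whether a stale CDH 4 client config or jar is still being picked up.
        fs.close();
    }
}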
Your time is valuable and I thank you for sharing it with us.
Created 11-17-2014 11:14 AM
Did you update client configs with a "deploy client configurations" by any chance? I assume you are running gateway instances on the nodes where things are failing?
Created 11-17-2014 12:37 PM
Yes, I deployed client configurations a few times, shut down all services on a node, added the HDFS gateway service to all machines, and even performed a reboot just in case...
Pretty odd. The machines still behave as though they are connected fine and are able to interact with HDFS, except for reading files. This is a dev environment, so we also have security disabled at the moment.
Are there any known issues with installing the j2sdk? I selected Install Oracle Java SE Development Kit (JDK) during the install...
The Host Inspector is happy with the setup. All services are healthy besides the HDFS canary check (failed to read file), which runs on an affected node.
I also attempted to re-run the upgrade wizard, but that results in hosts being stuck at Acquiring installation lock...
Any ideas?
Thanks again.
Created 11-17-2014 02:30 PM
It's working now.
After 'refreshing the cluster' and restarting HDFS, things picked up. As mentioned, I had already deployed client configs and restarted HDFS, which didn't work for whatever reason...
I'm not sure why this didn't occur in any previous install, or why it only happened on some systems while others worked. A bizarre experience overall.
In any case - this issue has been resolved.
Kudos 🙂