
upgraded CDH to 4.7 via CM, all successful, some HDFS ops failing on some nodes

Contributor

Hi,

 

We're dealing with an odd issue: we can access HDFS from all nodes, e.g. hdfs dfs -ls / works on all systems.

 

When we try hdfs dfs -cat /foo.txt, it fails on some systems but works fine on others.

 

The affected systems have this behavior:

 

$ hdfs dfs -cat /foo.txt
cat: java.lang.NullPointerException
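
If it helps, re-running the same command with client-side debug logging (the stock Hadoop shell scripts honor the HADOOP_ROOT_LOGGER variable) gives a bit more context on where the NPE surfaces:

$ HADOOP_ROOT_LOGGER=DEBUG,console hdfs dfs -cat /foo.txt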

 

Any ideas?

 

Thank you for any responses.

1 ACCEPTED SOLUTION

Contributor

It's working now.

 

After 'refreshing the cluster' and restarting HDFS, things picked up.  As mentioned, I had already deployed client configs and restarted HDFS, which didn't work for whatever reason... 
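
I'm not certain which part of the CM 'refresh cluster' action actually did it; my guess (and it is only a guess) is that the HDFS piece of that refresh is roughly equivalent to running:

$ hdfs dfsadmin -refreshNodes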

 

I'm not sure why this didn't occur before in any previous install or why it only happened to some systems while others worked.  Bizarre experience overall.

 

In any case - this issue has been resolved.  

 

Kudos 🙂


6 REPLIES

Contributor

Adding a note - 

 

The affected nodes are able to put files into HDFS as well as move files already within HDFS, but cat and get operations fail.
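
To illustrate with placeholder paths (these aren't the real files):

$ hdfs dfs -put local.txt /tmp/example.txt        # works
$ hdfs dfs -mv /tmp/example.txt /tmp/moved.txt    # works
$ hdfs dfs -cat /tmp/moved.txt                    # fails: cat: java.lang.NullPointerException
$ hdfs dfs -get /tmp/moved.txt .                  # fails the same way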

Master Collaborator

What did you upgrade from?

 

What is in the NN logging when you are seeing these commands fail?
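
On a CM-managed install the NameNode log is usually under /var/log/hadoop-hdfs/ (the exact path and file name vary with your setup), so tailing it while you reproduce the failure should catch the relevant entry:

$ tail -f /var/log/hadoop-hdfs/*NAMENODE*.log.out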

 

The upgrade process is documented here; the extended requirements may apply depending on what version you were on when you started...

 

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.7.0/Cloudera-Manager-Install...

Contributor

Thanks for the assist! 🙂 

 

The previous version was 4.1.3.

 

The NameNode is reporting this error - Googling returns nothing helpful, unfortunately.

 

IPC Server handler 15 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.1.69:54475: error: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:329)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1409)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

 

I'm going over the documents again. I was aware of the issues around Hive and the metastore, but this error is a new experience for me (I performed a previous upgrade on a very similar setup with no issues).

 

The one aspect that does stand out: all of the DataNodes are operating fine; the issue only appears on some 'helper' nodes that interact with HDFS but do not provide HDFS services.

 

I've finalized the upgrade, removed old packages, restarted the Cloudera agents, and checked for old libraries in use... Presumably there is something from 4.1.3 mucking up the works here, or something missing. What would you speculate?
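
For reference, these are the kinds of checks I've been running on the failing nodes (the hadoop-conf alternative name assumes the stock CDH package layout; use update-alternatives on Debian-style systems):

$ hadoop version                                   # confirm the CDH 4.7 client is on the PATH
$ hadoop classpath | tr ':' '\n' | grep '4\.1\.3'  # look for stale 4.1.3 jars
$ alternatives --display hadoop-conf               # which /etc/hadoop/conf is active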

 

Your time is valuable and I thank you for sharing with us. 

Master Collaborator

Did you update client configs with a "deploy client configurations" by any chance?  I assume you are running gateway instances on the nodes where things are failing?
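
If you want to double check, diffing the deployed client config between a working node and a failing one should show whether the gateways actually picked up the same files (hostnames below are placeholders):

$ diff <(ssh good-node cat /etc/hadoop/conf/hdfs-site.xml) \
       <(ssh bad-node  cat /etc/hadoop/conf/hdfs-site.xml)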

Contributor

Yes, I deployed client configurations a few times, shut down all services on a node, added the HDFS gateway service to all machines, and even performed a reboot just in case...

 

Pretty odd.  The machines still behave as though they are connected fine and are able to interact with HDFS, except for reading files.  This is a dev environment, so we also have security disabled at the moment. 

 

Are there any known issues with installing the j2sdk? I selected "Install Oracle Java SE Development Kit (JDK)" during the install...
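
For the record, here is how I'd compare the JDK between a working and a failing node:

$ java -version
$ readlink -f "$(which java)"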

 

The Host Inspector is happy with the setup.  All services are healthy except for the HDFS canary check (failed to read file), which runs on an affected node.

 

I also attempted to re-run the upgrade wizard, and that resulted in hosts being stuck at "Acquiring installation lock"...
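
From what I've read elsewhere, that hang is often a stale lock file left in /tmp on the host; the path below is my assumption based on CM defaults, so please correct me if that's off:

$ ls -l /tmp/.scm_prepare_node.lock /tmp/scm_prepare_node.*
$ sudo rm -f /tmp/.scm_prepare_node.lock           # only if no install is actually running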

 

Any ideas? 

 

Thanks again.
