I am running hdfs fsck / on a cluster with HDP 2.4.2. Unfortunately, it breaks with an "IOException: Premature EOF":
$ hadoop version
Hadoop 2.7.1.2.4.2.10-1
Subversion git@github.com:hortonworks/hadoop.git -r 183deee535d7360e487dccb66071a10995a304ac
Compiled by jenkins on 2016-07-12T20:29Z
Compiled with protoc 2.5.0
From source with checksum 2a2d95f05ec6c3ac547ed58cab713ac
This command was run using /usr/hdp/2.4.2.10-1/hadoop/hadoop-common-2.7.1.2.4.2.10-1.jar

$ sudo -u hdfs hdfs fsck /
[...]
..................
Exception in thread "main" java.io.IOException: Premature EOF
    at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:565)
    at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
    at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3336)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.readLine(BufferedReader.java:324)
    at java.io.BufferedReader.readLine(BufferedReader.java:389)
    at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:346)
    at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:73)
    at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:151)
    at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:148)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:379)
Any hints would be highly appreciated.
You are running into a security issue. This looks like a Kerberized cluster, so you probably need to do a kinit before you can run the command. If you have already done a kinit, then the user you ran it as appears to not have permission to impersonate the user "hdfs".
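If the cluster is indeed Kerberized, the kinit would look roughly like this; the keytab path and principal below are only the typical HDP defaults and may differ in your environment:

# obtain a ticket as the hdfs service user (keytab path and principal are examples)
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
# verify that a valid ticket is now present
klist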
Thanks a lot for your advice.
It might not be security related. Impersonation works just fine when checking a subset of the filesystem, e.g.:
$ sudo -u hdfs hdfs fsck /apps
Connecting to namenode via http://********:50070/fsck?ugi=hdfs&path=%2Fapps
FSCK started by hdfs (auth:SIMPLE) from /***.***.***.*** for path /apps at Fri Nov 25 19:14:54 UTC 2016
..Status: HEALTHY
 Total size: 269336194 B
 Total dirs: 6
 Total files: 2
 Total symlinks: 0
 Total blocks (validated): 3 (avg. block size 89778731 B)
 Minimally replicated blocks: 3 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 3.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 40
 Number of racks: 4
FSCK ended at Fri Nov 25 19:14:54 UTC 2016 in 0 milliseconds

The filesystem under path '/apps' is HEALTHY
But a check on the full HDFS fails with Premature EOF. Any other ideas?
That's probably because the user you did a kinit with has access to "/apps" but not to "/". I am also surprised by "auth:SIMPLE". If your cluster is Kerberized, then auth:SIMPLE should not work. Have you done a kinit?
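A quick way to confirm which authentication mode the client is actually configured for is to read the standard Hadoop config key:

# print the configured authentication mode
hdfs getconf -confKey hadoop.security.authentication
# "simple" means no Kerberos; "kerberos" means a valid ticket from kinit is required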
If you look at your stack trace, it contains the following frames, which indicate a Kerberos/permission issue:
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
Thanks for your reply. The cluster is not kerberized. I am running fsck as user "hdfs", which owns both directories.
The stack trace seems to indicate that doPrivileged called DFSck$1.run, so impersonation should have been successful.
@mqureshi Thanks for your reply. I checked the namenode logs. Unfortunately I see no messages which seem to relate to the fsck. Also, there are no WARN or ERROR messages during the timeframe of the fsck. The only message that seems correlated is:
2016-11-28 16:55:49,198 INFO namenode.NameNode (NamenodeFsck.java:fsck(327)) - FSCK started by hdfs (auth:SIMPLE) from /***.***.***.*** for path / at Mon Nov 28 16:55:49 UTC 2016
Also restarting the namenode did not improve the situation.
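For reference, this is roughly how I searched the namenode log; the path is the usual HDP default and may differ on other installs:

# find fsck-related entries in the namenode log (log path may vary)
grep -i fsck /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
# scan for warnings and errors around the time of the fsck run
grep -E "WARN|ERROR" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -50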
Do you think it could be related to https://issues.apache.org/jira/browse/HDFS-7175 ?
I don't think it's that issue, because there is no socket timeout in your stack trace. I have a tedious suggestion, if you don't mind trying it; it might help us narrow down the cause. Just as you ran fsck on the /apps folder: if you don't have many folders under the root, can you try running fsck on each of them one by one, for example with a loop like the one sketched below?
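A rough sketch of such a loop; the awk column extraction and the tail filtering are only for convenience and may need adjusting:

# iterate over the top-level HDFS directories and run fsck on each one separately
for dir in $(hdfs dfs -ls / | awk 'NR>1 {print $NF}'); do
  echo "=== fsck $dir ==="
  # keep only the summary lines of each report
  sudo -u hdfs hdfs fsck "$dir" | tail -5
done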