
HDP on Isilon - fsck problem

Explorer

Hello,

I have a working HDP 2.3 cluster on EMC Isilon. I'm able to write to HDFS, and the "hdfs dfs" set of commands works fine.

But when I run the "hdfs fsck /" command, I get an error.

Maybe some of you have an idea about it? I would appreciate any help.

[hdfs@hostname ~]$ hdfs fsck /

Connecting to namenode via http://xx.yy.ad.si:8082/fsck?ugi=hdfs&path=%2F

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://xx.yy.ad.si:8082/fsck?ugi=hdfs&path=%2F

at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839)

at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)

at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:339)

at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:73)

at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:151)

at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:148)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:147)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:379)

1 ACCEPTED SOLUTION

Expert Contributor

@Matjaz Skerjanec:

As @emaxwell said, this stack trace error is clunky, but it's really a good thing. OneFS cannot respond successfully to "hdfs fsck" because fsck is not how Isilon protects, manages, or alerts on data integrity.

Here are a few quotes from Isilon documentation.

In the OneFS Technical Overview (https://www.emc.com/collateral/hardware/white-papers/h10719-isilon-onefs-technical-overview-wp.pdf):

"No expensive 'fsck' or 'disk-check' processes are ever required. No drawn-out resynchronization ever needs to take place. Writes are never blocked due to a failure. The patented transaction system is one of the ways that OneFS eliminates single -- and even multiple -- points of failure."

From OneFS Hardware Fault Tolerance (https://community.emc.com/community/products/isilon/blog/2015/03/25/onefs-hardware-fault-tolerance):

"In the event that the recomputed checksum does not match the stored checksum, OneFS will generate a system alert, log the event, retrieve and return the corresponding error correcting code (ECC) block to the client and attempt to repair the suspect data block."

If you'd like to learn more, I suggest a web search for "Isilon fsck" or "Isilon data integrity" and looking through the material that comes up. It's a window into the value of OneFS as the storage host in your Hadoop cluster.


6 REPLIES


@Matjaz Skerjanec

An HTTP 403 error indicates that the user does not have permission to access something on the server (403 = "Forbidden"). This is what the Isilon node returns in response to your fsck request. Isilon uses its Integrity Check mechanism to ensure filesystem integrity. Remember that HDFS is just an interface that Isilon provides to the OneFS filesystem; filesystem integrity checks are handled internally by OneFS, and commands like "hdfs fsck" become unnecessary.
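If you want to double-check that the 403 really is coming straight from the Isilon HDFS endpoint (and not from anything in between), you can hit the same fsck URL from your output with curl. Just a quick sketch, reusing the NameNode address and port from your trace:

# Expect an "HTTP/1.1 403 Forbidden" response back from the Isilon HDFS service.
curl -i "http://xx.yy.ad.si:8082/fsck?ugi=hdfs&path=%2F"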


Explorer

Hello @Rob Ketcherside, @emaxwell, thanks a lot for reminding me about that. It looks like I still think too much in an HDFS way here.

You see, I first deployed HDP on VMs and ran it in production for a few months. After a few months I had about 300 million records in Hive and had to move everything to a new infrastructure, HDP on Isilon. The deployment and migration were successful and I'm now loading new data every day... Before, it was easy to check whether HDFS was healthy; I could check it from Ambari or the command line. But now with Isilon I'm not so sure: Ambari isn't showing me any useful info about it, and from the shell I couldn't find any real info either.

I can run hdfs dfsadmin -report, but I'm not sure whether that's the right way now. There is about 350 GB of data in HDFS at the moment, but the report shows a different number...

hdfs dfsadmin -report

Configured Capacity: 356881289379840 (324.58 TB)
Present Capacity: 356881289379840 (324.58 TB)
DFS Remaining: 350800989585408 (319.05 TB)
DFS Used: 6080299794432 (5.53 TB)
DFS Used%: 1.70%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

How do you cope with that? How can you really monitor HDP and HDFS on Isilon to be sure everything is running OK?

Expert Contributor

Are you also the administrator of the Isilon cluster? If not, you should ask the administrator to grant you access to view the web administration UI or to ssh into the cluster. Or ask to be added to the email notifications for alerts in your zone, and to receive an HDFS traffic report from InsightIQ.

Currently from the HDP side, the only direct reassurance you can get is to see that the HDFS service is green. We recognize that doesn't match the expectations of an Ambari administrator, so we hope to provide more data directly to Ambari in the near future.

Otherwise within Ambari you will be looking at the cluster from an application perspective. For example, you can check that services are running correctly -- the Yarn service check is a great test. If OneFS is not configured correctly it won't pass.
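If you prefer a command-line check along the same lines, you can also submit one of the bundled example jobs. Just a sketch, and the jar path below is the usual HDP 2.x location, so adjust it if your layout differs:

# Should complete cleanly if YARN and the OneFS HDFS interface are healthy.
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 2 100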

Within OneFS, the first page of the administrator dashboard will tell you the status of the cluster. From the console, run "isi status" for the same data.

InsightIQ, our cluster analytics dashboard package, is now offered as a free option by Isilon. With it you can watch HDFS protocol traffic and inspect cluster capacity and other metrics over time. Give that a spin or ask your Isilon administrator to set up a report for you!
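And if you want something scriptable from the HDP side in the meantime, here is a rough smoke-test sketch (hostnames and paths are placeholders, adjust them for your environment). It only uses standard HDFS commands: a write/read/delete round trip through the OneFS HDFS interface, the capacity report, and the actual size of the data under the root, which you can compare against the dfsadmin numbers you posted:

#!/bin/bash
# Rough HDFS smoke test for HDP on Isilon -- placeholders only, adjust for your environment.
set -e

TESTFILE=/tmp/hdfs_smoke_$(date +%s)

# 1. Round-trip a small file through the OneFS HDFS interface.
echo "smoke test $(date)" > /tmp/smoke.txt
hdfs dfs -put /tmp/smoke.txt "$TESTFILE"
hdfs dfs -cat "$TESTFILE" > /dev/null
hdfs dfs -rm -skipTrash "$TESTFILE"

# 2. Capacity and usage as reported through the HDFS interface.
hdfs dfsadmin -report | head -n 10

# 3. Actual data size under the HDFS root, for comparison with the report above.
hdfs dfs -du -s -h /

# 4. Cluster health from the Isilon side (assumes ssh access to a node, as mentioned above).
# ssh admin@isilon-node "isi status"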

Explorer

Thank you @Rob Ketcherside for useful information!

Yes, I do have green lights on HDFS, YARN and all other services in Ambari. If I restart HDP it comes up reasonably fast. I will also ask the Isilon administrator to send me the required reports from Isilon.

At the moment everything looks OK: Sqoop import is working, and so are Hive queries (they are very, very slow, but that is another problem). My concern was to set up reliable monitoring, and now I have enough information thanks to you guys.

All the best

Expert Contributor

Great @Matjaz Skerjanec, glad to help! How about selecting one of the answers as "Accepted" so that other folks know this question is closed?

Also, I'm curious what version of OneFS your cluster is running. Is it 7.2.something or 8.0.something?