About kgopal

iHazem · 01-27-2020

Thanks for the information. Using this command caused serious performance degradation when writing to HDFS: every 128 MB block took about 20-30 seconds to write. The issue was the compression of the tar file, so it's better to remove the "z" flag from tar and not compress. To give some numbers, writing almost 1 TB of data from local disk to HDFS took 13+ hours with compression (z) and eventually failed due to Kerberos ticket expiration. After removing the "z" flag, the copy to HDFS took less than an hour for the same 1 TB of data!
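A minimal Python sketch of the same idea follows. It assumes the command being discussed streamed a tar archive of a local folder into `hdfs dfs -put - <destination>`, where the `-` source makes `put` read from stdin. The function `stream_tar` and its `compress` switch are hypothetical names. `compress=False` corresponds to dropping the "z" flag (`tar cf -` instead of `tar czf -`).

```python
import os
import sys
import tarfile
from typing import BinaryIO


def stream_tar(src_dir: str, out: BinaryIO, compress: bool = False) -> None:
    """Write src_dir to `out` as a tar stream.

    compress=False mirrors `tar cf -` (no "z" flag, no gzip step);
    compress=True mirrors `tar czf -`.
    """
    # "|" modes are tarfile's streaming modes: no seeks, so `out` may be a pipe.
    mode = "w|gz" if compress else "w|"
    # Store entries under the folder's own name rather than its absolute path.
    arcname = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(fileobj=out, mode=mode) as tar:
        tar.add(src_dir, arcname=arcname)  # recurses into subdirectories


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage:
    #   python stream_tar.py /local/folder | hdfs dfs -put - /user/me/folder.tar
    stream_tar(sys.argv[1], sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Run as a script, it can be piped the same way as tar, for example `python stream_tar.py /local/folder | hdfs dfs -put - /user/me/folder.tar`. Both paths are placeholders. The streaming modes write strictly sequentially, which is what allows the output to go straight into a pipe. With `compress=False`, no gzip step sits between the local disk and HDFS.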

ThiagoSantiago · 06-12-2016

The easiest way to do it: just log in to Ambari using these credentials: User: admin, Pass: 4o12t0n. Cheers

TimothySpann · 02-14-2017

TensorFlow on Spark by Yahoo:
http://yahoohadoop.tumblr.com/post/157196317141/open-sourcing-tensorflowonspark-distributed-deep
https://github.com/yahoo/TensorFlowOnSpark
https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_YARN
https://github.com/yahoo/TensorFlowOnSpark/wiki/Conversion
https://github.com/yahoo/TensorFlowOnSpark/tree/master/examples/slim
https://github.com/tensorflow/models/tree/master/slim

LH · 01-03-2019

Hi, I'd like to share a situation we encountered where 99% of our HDFS blocks were reported missing, and how we recovered them.

We had a system with two NameNodes and high availability enabled. For some reason, under the data folders of the DataNodes (i.e. /data0x/hadoop/hdfs/data/current) we had two block pool folders listed (an example of such a folder is BP-1722964902-1.10.237.104-1541520732855). One folder contained the IP of NameNode 1 and the other contained the IP of NameNode 2. All the data was under the block pool of NameNode 1, but inside the VERSION files of the NameNodes (/data0x/hadoop/hdfs/namenode/current/) the block pool ID and the namespace ID were those of NameNode 2, so the NameNode was looking for blocks in the wrong block pool folder. I don't know how we got to the point of having two block pool folders, but we did.

To fix the problem and get HDFS healthy again, we just needed to update the VERSION file on all the NameNode disks (on both NN machines) and on all the JournalNode disks (on all JN machines) to point to NameNode 1. We then restarted HDFS and made sure all the blocks were reported and there were no more missing blocks.
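A minimal Python sketch of the mismatch check described above follows. It assumes each VERSION file is a list of `key=value` lines (with `#` comment lines) and that the NameNode's file carries a `blockpoolID` key. It also assumes that block pool IDs follow the `BP-<random>-<IP>-<timestamp>` shape seen in the example folder name. All function names are hypothetical. The sketch only reports discrepancies. It does not edit any VERSION file, and deciding which pool actually holds the data is left to the operator.

```python
import os
import re
from typing import Dict, List, Optional

# Assumed shape: BP-<random>-<namenode IP>-<creation time in ms>,
# e.g. BP-1722964902-1.10.237.104-1541520732855
BP_PATTERN = re.compile(r"^BP-(\d+)-([\d.]+)-(\d+)$")


def read_version(path: str) -> Dict[str, str]:
    """Parse a VERSION file (key=value lines, '#' comments) into a dict."""
    props: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def block_pool_ip(bp_id: str) -> Optional[str]:
    """Return the IP embedded in a block pool ID, or None if it does not match."""
    m = BP_PATTERN.match(bp_id)
    return m.group(2) if m else None


def check_block_pool(namenode_version: str, datanode_current: str) -> List[str]:
    """Compare the NameNode's blockpoolID with the BP-* folders on one DataNode disk."""
    expected = read_version(namenode_version).get("blockpoolID")
    pools = sorted(d for d in os.listdir(datanode_current) if d.startswith("BP-"))
    problems: List[str] = []
    if expected not in pools:
        problems.append(f"NameNode expects {expected}, DataNode has {pools}")
    for pool in pools:
        if pool != expected:
            # A second pool is exactly the symptom from the post; its IP hints at which NameNode created it.
            problems.append(f"extra block pool {pool} (IP {block_pool_ip(pool)})")
    return problems
```

Pointed at a NameNode `current/VERSION` file and a DataNode `current` folder (the post's `/data0x/...` paths with a real disk substituted), `check_block_pool` returns an empty list when the IDs agree. In the situation above, it would flag the NameNode 1 pool as "extra" even though that pool holds the data. That is why the output is a prompt for investigation rather than a verdict.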

ripu · 05-30-2016

Hi @Rushikesh Deshmukh, the following post includes a table for quickly comparing these approaches and then describes each of them in detail: http://blog.cloudera.com/blog/2013/11/approaches-to-backup-and-disaster-recovery-in-hbase/

I used distcp as well, but that did not work for me: the data was copied, but I had issues when running hbck. If you want to create a backup on the same cluster, CopyTable and snapshots are very easy; for backups across clusters, snapshots work well. Let me know if you need more details.

Also, the link below is really useful and clear: http://hbase.apache.org/0.94/book/ops.backup.html

Last Visited	04-25-2018 11:56 AM

Member Since	02-02-2016 08:22 AM
Posts	31
Kudos received	41

Cloudera Community

Re: what is the difference between exec and the ru...

Re: Where to start Ambari Server?

Re: How to fix missing and under replicated blocks...

Re: Is using data compression is better practice w...

Re: Append in HDFS?

Re: How to put a compressed folder into HDFS?

Re: No admin permission for the latest sandbox of ...

Re: Does HDP support installation of TensorFlow?

Re: Best way of handling corrupt or missing blocks...

Re: Which is best method for taking backup of hbas...