Member since: 09-15-2015
Posts: 14
Kudos Received: 40
Solutions: 2
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 4832 | 05-28-2016 01:17 AM
 | 1101 | 03-30-2016 08:59 PM
09-21-2016
02:37 PM
4 Kudos
You need to debug, test, and operate a Hadoop cluster, especially when you run dangerous dfsadmin commands, try customized packages with changes to the Hadoop/Spark source code, or try aggressive configuration values. You have a laptop and you have a production Hadoop cluster. You don't dare operate the production cluster blindly, which your manager appreciates. You want a Hadoop cluster you can experiment on, where even if you break it, no one blames you.
You have several choices (perhaps you're using one of them now):
A pseudo-distributed Hadoop cluster on a single machine, with which it is nontrivial to run HA, use per-node configurations, pause and launch multiple nodes, or test the HDFS balancer/mover, etc.
Setting up a real cluster, which is complex and heavy to use, assuming you can afford a real cluster in the first place.
Building an Ambari cluster on VirtualBox/VMware virtual machines, a nice try. But if you run a 5-node cluster, you'll see your CPU overloaded and your memory eaten up.
How about using Docker containers instead of virtualbox virtual machines?
Caochong is a tool that does exactly this! Specifically, it outperforms its counterparts in that it is:
Customizable: you can specify the cluster specs easily, e.g. how many nodes to launch, the Ambari version, the Hadoop version/repository, and per-node Hadoop configurations. Meanwhile, you have the choice of the full Hadoop ecosystem stack: HDFS, YARN, Spark, HBase, Hive, Pig, Oozie... you name it!
Lightweight: your physical machine can run as many containers as you wish. I ran 10 without any problem (though my laptop did get slow). Using Docker, you can also pause and restart the containers (say you have to restart your laptop for an OS security update; you'll want a snapshot, right?). See the pause/resume sketch near the end of this post.
Standard: The caochong tool employs Apache Ambari to set up a cluster, which is a tool for provisioning, managing, and monitoring Apache Hadoop clusters.
Automatic: you don't have to be an Ambari, Docker, or Hadoop expert to use it!
To use caochong, you only need to follow 9 steps. Only nine, indeed!
0. Download caochong, and install Docker.
1. [Optional] Choose the Ambari version in the from-ambari/Dockerfile file (default: Ambari 2.2)
2. Run from-ambari/run.sh to set up an Ambari cluster and launch it
$ ./run.sh --help
Usage: ./run.sh [--nodes=3] [--port=8080]
--nodes Specify the number of total nodes
--port Specify the port of your local machine to access Ambari Web UI (8080 - 8088)
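For example, to bring up a 5-node cluster with the Ambari Web UI exposed on port 8080 (just an illustration of the flags above):
$ ./run.sh --nodes=5 --port=8080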
3. Hit http://localhost:port from the browser on your local computer. The port is the parameter you specified on the run.sh command line; by default it is http://localhost:8080. NOTE: the Ambari Server can take some time to fully come up and be ready to accept connections. Keep hitting the URL until you get the login page.
4. Log in to the Ambari web page with the default username and password, admin:admin.
5. [Optional] Customize the repository Base URLs in the Select Stack step.
6. On the Install Options page, use the hostnames reported by run.sh as the Fully Qualified Domain Name (FQDN). For example:
------------------------------
Using the following hostnames:
85f9417e3d94
9037ffd878dk
b5077ffd9f7f
------------------------------
7. Upload from-ambari/id_rsa as your SSH Private Key to automatically register hosts when asked.
8. Follow the onscreen instructions to install Hadoop (YARN + MapReduce2, HDFS) and Spark.
9. [Optional] Log in to any of the nodes and you're all set to use an Ambari cluster!
# login to your Ambari server node
$ docker exec -it caochong-ambari-0 /bin/bash
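As mentioned in the Lightweight point above, you can also pause and resume the whole cluster. A minimal sketch, assuming all caochong containers share the caochong- name prefix like the server node above:
# pause every running caochong container
$ docker ps -q --filter "name=caochong" | xargs docker pause
# resume them later
$ docker ps -q --filter "name=caochong" | xargs docker unpause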
To know more or to get updates, please star the Caochong project at GitHub.com.
08-10-2016
11:45 PM
7 Kudos
1. Recover the lease for the file
When you do "hdfs dfs -cat file1" from the command line, you get an exception saying "Cannot obtain block length for LocatedBlock". Usually this means the file is still in the being-written state, i.e., it has not been closed yet, and the reader cannot successfully determine its current length by communicating with the corresponding DataNodes. Suppose you're pretty sure the writer client is dead, killed, or has lost its connection to the servers, and you're wondering what else you can do other than wait.
hdfs debug recoverLease -path <path-of-the-file> [-retries <retry-times>]
This command asks the NameNode to try to recover the lease for the file, and based on the NameNode log you may trace the relevant DataNodes to understand the states of the replicas. The command may successfully close the file if there are still healthy replicas. Otherwise, you can get more internal details about the file/block state.
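For example, a hypothetical invocation (the path and retry count below are placeholders):
hdfs debug recoverLease -path /app-logs/job_1234/events.log -retries 3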
Please refer to https://community.hortonworks.com/questions/37412/cannot-obtain-block-length-for-locatedblock.html for discussion, especially the answer by @Jing Zhao. This is a lightweight, idempotent operation, so the server should not crash if you run it, even multiple times against the same path.
2. Trigger block report on DataNodes
You think a DataNode is not stable and you need to update it, or you think there is a potential unknown bug in NameNode (NN) replica accounting and you need to work around it. As an operator, if you suspect such an issue, you might be tempted to restart a DN, or all of the DNs in the cluster, in order to trigger full block reports. It would be much lighter weight if you could just manually trigger a full BR instead of having to restart the DN and therefore rescan all the DN data dirs, etc.
hdfs dfsadmin -triggerBlockReport [-incremental] <datanode_host:ipc_port>
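For example, against a hypothetical DataNode (the host name and IPC port are placeholders; use the value of dfs.datanode.ipc.address in your cluster):
hdfs dfsadmin -triggerBlockReport dn1.example.com:50020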
If "-incremental" is specified, this triggers an incremental block report (IBR); otherwise, it triggers a full block report.
3. Verify block metadata
Say you have a replica, and you don't know whether it's corrupt.
hdfs debug verify -meta <metadata-file> [-block <block-file>]
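For instance, with hypothetical local paths on a DataNode (the data directory, block pool and block IDs below are made up):
hdfs debug verify -meta /hadoop/hdfs/data/current/BP-929064979-10.0.0.1-1462763856202/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta -block /hadoop/hdfs/data/current/BP-929064979-10.0.0.1-1462763856202/current/finalized/subdir0/subdir0/blk_1073741825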
This command verifies a block's metadata. The argument "-meta <metadata-file>" is the absolute path of the metadata file on the local file system of the DataNode. The argument "-block <block-file>" is an optional parameter to specify the absolute path of the block file on the local file system of the DataNode.
4. Dump the NameNode's metadata
You want to dump the NN's primary data structures.
hdfs dfsadmin -metasave filename
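A hypothetical invocation (the output file name is arbitrary):
hdfs dfsadmin -metasave metasave-report.txt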
This command saves the NN's metadata to filename in the directory specified by the hadoop.log.dir property. The file is overwritten if it already exists. The output contains one line for each of the following:
DataNodes heartbeating with the NameNode
Blocks waiting to be replicated
Blocks currently being replicated
Blocks waiting to be deleted
5. Get a specific Hadoop config
You want to know one specific config value. You're smart to use the Ambari UI, but that needs a web browser, which you don't always have when you SSH to the cluster from home. You turn to the configuration files and search for the config key. Then you find something special in the configuration files, for example: embedded file substitutions (XInclude in the XML files is popular in the Hadoop world, you know), or property substitutions (config A's value refers to config B's value, which in turn refers to config C's value). You're on an urgent issue and tired of parsing the configuration files manually. You can do better. "Any scripting approach that tries to parse the XML files directly is unlikely to accurately match the implementation as it's done inside Hadoop, so it's better to ask Hadoop itself." It's always been true.
hdfs getconf -confKey <key>
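For example, with a standard HDFS key:
hdfs getconf -confKey dfs.blocksize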
This command shows you the actual, final value of any configuration property as it is actually used by Hadoop. Interestingly, it is capable of checking configuration properties for YARN and MapReduce as well, not only HDFS. Tell your YARN friends about this. For more information, please refer to the Stack Overflow discussion, especially the answers by @Chris Nauroth.
06-13-2016
07:05 PM
1 Kudo
Thank you @Chris Nauroth and @Arpit Agarwal for your helpful comments. I updated the article, and my MAVEN_OPTS env variable 🙂
06-03-2016
12:08 AM
Thanks for your tips!
06-02-2016
09:56 PM
3 Kudos
When I do "hdfs dfs -cat file1" from the command line, I get an exception saying "Cannot obtain block length for LocatedBlock". How should we handle this case?
Labels: Apache Hadoop
06-01-2016
01:54 PM
10 Kudos
1. Install Java 8.
Macs ship with Java 6 installed with the OS. We recommend you switch to Java 8, as the Hadoop trunk branch does.
2. Install Xcode from the App Store.
3. Install Homebrew, the missing package manager for Mac OS X.
4. Install all necessary system tools/libraries (mainly Maven and make):
brew install maven autoconf automake cmake libtool
5. Install the protobuf package, which is heavily used in the Hadoop IPC layer. First tap homebrew/versions:
brew tap homebrew/versions
Then install protobuf version 2.5:
brew install homebrew/versions/protobuf250
6. Clone the Hadoop git repository:
git clone git://git.apache.org/hadoop.git
7. Build the Hadoop project using maven
mvn clean package -Pdist,native -Dtar -DskipTests=true -Dmaven.site.skip=true -Dmaven.javadoc.skip=true
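Once the full build succeeds, you rarely need to rebuild everything while iterating. A minimal sketch, assuming you only touched the hadoop-common module (the module path is just an example from the Hadoop source tree):
mvn install -DskipTests -pl hadoop-common-project/hadoop-common -am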
8. Install the IntelliJ IDEA IDE. This is much better than Eclipse.
9. Create a new project using the existing source. Do use the Maven option if asked.
10. Start coding!
Bonus follow-up: there is a known bug in the TestGetTrimmedStringCollection and TestGetStringCollection methods. Can you find it?
05-28-2016
01:17 AM
3 Kudos
If this problem happens a lot, I mean you always need to know the mapping from file operations (create, delete, rename, etc.) to upper-level applications, I think you can suggest that users use the caller context feature, which was released in HDP 2.2 and up. The feature introduces a new setting, hadoop.caller.context.enabled. When it is set to true, additional fields are written into NameNode audit log records to help identify the job or query that issued each NameNode operation. This feature is enabled by default starting with this release of HDP. New behavior: this feature adds a new key-value pair at the end of each audit log record. The newly added key is callerContext and the value is context:signature, so the overall format is callerContext=context:signature. If the signature is null or empty, the value is the context only, in the format callerContext=context. If the hadoop.caller.context.enabled config key is false, the key-value pair is not shown and the audit log format is unchanged. It is also possible to limit the maximum length of the context and signature; see the hadoop.caller.context.max.size config key (default 128 bytes) and the hadoop.caller.context.signature.max.size config key (default 40 bytes), respectively. There is a chance that the new information in the audit log may break existing scripts/automation used to analyze the audit log; in that case the scripts may need to be fixed. We do not recommend disabling this feature, as it can be a useful troubleshooting aid. Please refer to the release notes.
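For illustration, a minimal core-site.xml snippet (a sketch, assuming you manage the configuration files directly rather than through Ambari; the size values are just the defaults mentioned above):
<property>
  <name>hadoop.caller.context.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.caller.context.max.size</name>
  <value>128</value>
</property>
<property>
  <name>hadoop.caller.context.signature.max.size</name>
  <value>40</value>
</property>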
05-28-2016
12:19 AM
2 Kudos
Our customer has an HA-enabled cluster, and after an automatic failover, the active NameNode (NN) is very slow. This is a central place for collecting all possible reasons that cause a slow NN.
Problem definition:
NN is responding slowly. It does not crash. Most of the operations can succeed.
By slow, I have two proofs: a) the hdfs dfs -ls / command is sluggish, and b) from the JMX metrics, the average RPC processing time is up to 3~10 seconds.
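(For reference, one way to pull that RPC metric is the NameNode's JMX servlet; the host and ports below are placeholders assuming the common defaults of 50070 for the NN HTTP port and 8020 for the NN RPC port:
curl 'http://nn-host:50070/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort8020'
and look at RpcProcessingTimeAvgTime.)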
For answers, please kindly list the possible reasons along with solutions.
Labels: Apache Hadoop
03-30-2016
08:59 PM
3 Kudos
I don't see any problems from the HDFS side. MR uses UTF-8 for writing text. If the user is using another encoding, she has to extend the input/output formats.