Member since 09-21-2015 | 31 Posts | 59 Kudos Received | 9 Solutions
03-09-2017
03:56 PM
2 Kudos
OVERVIEW

Docker supplies multiple storage drivers to manage the mutable and immutable layers of images and containers. Many options exist, each with its own pros and cons. Out of the box, docker uses devicemapper with loop-lvm. The loop-lvm storage driver is not recommended for production, but requires zero setup to use. When attempting to increase the base size of the mutable layer, it was observed that docker client operations slow down. The alternative of using smaller base sizes causes failures due to out-of-storage conditions. The intent of this article is to outline the testing that was performed to determine sane defaults for the docker storage driver options.

TESTING

The following testing methodology was used:
1. Build the centos6 image with different combinations of base sizes and storage drivers (build)
2. Create a container from the image (run)
3. Stop the container (stop)
4. Remove the container (rm)
5. Stop docker
6. Delete/reprovision the docker graph storage location
7. Repeat

The following scenarios were tested:
- loop-lvm (xfs)
- direct-lvm (ext4)
- direct-lvm (xfs)
- btrfs
- zfs
- overlay
- aufs

The following base sizes were tested:
- 25GB
- 50GB
- 100GB
- 250GB

The following container operation counts were tested:
- 1
- 10
- 25
- 50
- 100

The tests were run on the following hardware:
- Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz, 4 cores
- 12GB memory
- 1x 1TB SATA (OS + Docker)

OS details:
- CentOS 7.2.1511
- Kernel: 3.10.0-327.4.5.el7.x86_64
- docker 1.9.1

Due to docker issue 17653, cgroupfs must be used instead of systemd as the cgroup driver on CentOS 7.2:
--exec-opt native.cgroupdriver=cgroupfs

LOOP-LVM

Notes: loop-lvm requires no up-front storage configuration and uses /var/lib/docker by default. In these tests, the docker cache directory was reconfigured to use a separate XFS mount on a SATA drive.

Example setup (optional if the OS disk is on SATA):

mkdir -p /docker/loop-xfs # path to the filesystem on SATA
Docker command:

/usr/bin/docker daemon --graph=/docker/loop-xfs \
  --storage-driver=devicemapper \
  --storage-opt dm.basesize=${BASESIZE}G \
  --storage-opt dm.loopdatasize=5000G \
  --storage-opt dm.loopmetadatasize=1000GB
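As a quick check (paths from the setup above; docker info field names may vary slightly by version), docker info reports the active driver and, for loop-lvm, the sparse loop files that live under the graph directory:

docker info | grep -iE 'storage driver|loop file|data space|metadata space'   # driver, loop file paths, and space usage
ls -lh /docker/loop-xfs/devicemapper/devicemapper/                            # the data and metadata sparse files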
DIRECT-LVM

Notes: direct-lvm requires that a logical volume (or volumes) be provisioned on the docker daemon node. The logical volume is then converted to a thinpool so that docker images and containers can be provisioned with minimal storage usage. The docker-storage-setup script typically handles the logical volume setup for RHEL/CentOS when installing from the EPEL yum repos. However, when installing from the main docker repo to get the latest version of docker, this script is not included. The docker-storage-setup script is not strictly required, as the underlying LVM commands and docker configuration can be extracted from it. The instructions below do not include auto-expansion of the logical volumes, which is an additional feature supported by docker-storage-setup. The direct-lvm approach allows for using ext4 or xfs; both were tested.

Example setup:

pvcreate -ffy /dev/sda4
vgcreate vg-docker /dev/sda4
lvcreate -L 209708s -n docker-poolmeta vg-docker
lvcreate -l 60%FREE -n docker-pool vg-docker <<< "y"
lvconvert -y --zero n -c 512K --thinpool vg-docker/docker-pool --poolmetadata vg-docker/docker-poolmeta
Docker command (ext4):

/usr/bin/docker daemon --storage-driver=devicemapper \
  --storage-opt dm.basesize=${BASESIZE}G \
  --storage-opt dm.thinpooldev=/dev/mapper/vg--docker-docker--pool \
  --storage-opt dm.fs=ext4
Docker command (xfs):

/usr/bin/docker daemon --storage-driver=devicemapper \
  --storage-opt dm.basesize=${BASESIZE}G \
  --storage-opt dm.thinpooldev=/dev/mapper/vg--docker-docker--pool \
  --storage-opt dm.fs=xfs
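A quick way to confirm the thinpool is wired up correctly (volume group and pool names from the setup above; docker info output varies slightly by version):

lvs -a vg-docker                                        # docker-pool should appear as a thin pool with its tdata/tmeta volumes
docker info | grep -iE 'pool name|backing filesystem'   # Pool Name should report vg--docker-docker--pool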
BTRFS

Notes: The docker btrfs option requires a btrfs filesystem, which has mixed support depending on the OS distribution. Note that btrfs does not honor the dm.basesize setting. Each image and container is represented as a btrfs subvolume. As a result, the usable storage for docker is the total amount of storage available in the btrfs filesystem.

Example setup:

yum install -y btrfs-progs
modprobe btrfs
mkfs.btrfs -f /dev/sda4
mkdir -p /docker/btrfs
mount /dev/sda4 /docker/btrfs
Docker command:

/usr/bin/docker daemon --graph /docker/btrfs --storage-driver=btrfs
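To verify the filesystem and see the per-image/container subvolumes docker creates (paths from the setup above):

btrfs filesystem show /docker/btrfs          # confirm the filesystem backing the graph directory
btrfs subvolume list /docker/btrfs | head    # image layers and containers show up as subvolumes
docker info | grep -i 'storage driver'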
ZFS

Notes: The docker zfs storage driver requires a zfs zpool to be created and mounted on the partition or disk where docker data should be stored. Snapshots (read-only) and clones (read-write) are used to manage the images and containers. zfs does not honor, or even allow, the dm.basesize setting. As a result, the usable storage for docker is the total available space in the zpool. Running zfs on RHEL/CentOS requires installing an unsigned kernel module. On modern PCs this is a problem, as modprobe will fail when UEFI Secure Boot is enabled. Secure Boot MUST be disabled via the UEFI or BIOS menu, depending on the system board manufacturer.

Example setup:
yum -y localinstall --nogpgcheck https://download.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
yum -y localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum -y install kernel-devel zfs
modprobe zfs
mkdir -p /docker/zfs
zpool destroy -f zpool-docker
zpool create -f zpool-docker /dev/sda4
zfs create -o mountpoint=/docker/zfs zpool-docker/docker
Docker command:

/usr/bin/docker daemon --graph=/docker/zfs \
  --storage-driver=zfs
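To confirm the zpool and the datasets docker creates (names from the setup above):

zpool status zpool-docker                              # pool health and backing device
zfs list -r zpool-docker                               # docker creates a dataset per layer/container under zpool-docker/docker
docker info | grep -iE 'storage driver|parent dataset'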
OVERLAYFS

OverlayFS is a modern union filesystem that is similar to AUFS. It is layered on top of an existing filesystem such as ext4 or xfs. OverlayFS promises to be fast, but currently cannot be used with RPM on RHEL/CentOS 6 images or hosts. This issue is fixed in yum-utils-1.1.31-33.el7; however, it requires that all images be upgraded to the RHEL/CentOS 7.2 base image. Originally, OverlayFS was tested, but not a single image could be successfully built using either ext4 or xfs as the backing filesystem, so no results are available from that round of testing to prove its speed. Additional testing will be conducted in the future when image upgrades are feasible. OverlayFS also exhibits abnormally high inode usage, so increasing the number of inodes on the backing filesystem is necessary.

As a follow up, OverlayFS now functions properly with RHEL/CentOS 7.2 based images. However, it was discovered that it does not honor the base size or the graph storage location. The instructions below and the tools have been updated to reflect these discoveries.

Example setup:

modprobe overlay
# create the backing filesystem with extra inodes
mkfs -t ext4 -N 131072000 /dev/sda4
rm -rf /var/lib/docker/*
mount /dev/sda4 /var/lib/docker
Docker command:

/usr/bin/docker daemon --storage-driver=overlay \
  --storage-opt dm.fs=ext4
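Given the inode concerns called out above, it is worth checking inode headroom on the backing filesystem and confirming the driver once the daemon is up:

df -i /var/lib/docker                                      # IFree should stay comfortably high as images are built
docker info | grep -iE 'storage driver|backing filesystem'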
AUFS

AUFS is the original union filesystem used with docker. It is no longer recommended for production. AUFS requires a custom-built kernel with AUFS support. As a result, this option was not tested. OverlayFS is being touted as the replacement.

RESULTS

This section contains the results of the testing described previously in this document.

BASE SIZE AND DRIVER IMPACT ON BUILD TIMES

Build times are erratic, making it difficult to truly assess the impact of the various base size and driver combinations. Median values were used in an attempt to normalize results. The following graph shows the build times of the base size and driver combinations. Note that ZFS and BTRFS do not honor the base size parameter, therefore the size listed is for the entire backing filesystem. Summary: BTRFS was consistently faster than all other drivers, but is not yet recommended for production. Direct-LVM leveraging ext4 provided the most flexibility with minimal impact from base size and is supported in production.

DRIVER TYPE IMPACT ON OPERATIONS

After building the image, the next steps were to run, stop, and remove a container based on the base image. Below are the results of those actions using a 250GB base size. The following drills down into each of the operation types to show the relative differences between storage drivers. Very little impact was found for all of the direct filesystem based approaches. However, when using the loop-lvm xfs backed driver, stop times were considerably higher. This aligns with the problem statement that the loop-lvm approach is slower at larger base sizes.

BASE SIZE IMPACT ON OPERATIONS

The following is a breakdown of the impact on operations as the base size increases, for storage drivers that support supplying a base size. btrfs and zfs were not tested, as base size is not honored by those drivers.

loop-lvm xfs: As seen below, base size has a direct impact on the amount of time needed to stop a container. No other operations are impacted by the base size.

direct-lvm ext4: Increasing the base size does not significantly impact direct filesystem approaches. Below outlines the operation times across base image sizes for the direct-lvm ext4 approach.

direct-lvm xfs: Increasing the base size does not significantly impact direct filesystem approaches. Below outlines the operation times across base image sizes for the direct-lvm xfs approach.

PARALLEL CONTAINER OPERATIONS IMPACT

It is possible to execute docker run, stop, and remove operations in parallel; however, very little benefit is gained by doing so, and it complicates scheduling of containers. The exception is the stop operation, which is responsible for the bulk of the time needed to deprovision containers. Running the stop operation in parallel will reduce the overall time needed to run, stop, and remove containers (a minimal sketch appears at the end of this section).

OVERLAYFS RESULTS

OverlayFS was compared to the currently recommended storage driver, LVM Direct ext4.

BASE SIZE AND DRIVER IMPACT ON BUILD TIMES

As seen below, OverlayFS is nearly twice as fast for build operations as LVM Direct ext4. As previously mentioned, build times are erratic due to all of the downloads required; however, OverlayFS consistently beat the closest competitor.

PARALLEL CONTAINER OPERATIONS IMPACT

OverlayFS is faster at most operations, but only marginally. A breakdown of each operation follows to show the relative difference between OverlayFS and LVM Direct ext4.
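The parallel stop called out above can be approximated with a one-liner; the parallelism of 8 and the 10 second timeout are illustrative values, not ones taken from the tests:

docker ps -q | xargs -r -P 8 -n 1 docker stop -t 10   # stop all running containers in parallel
docker ps -aq | xargs -r docker rm                    # then remove them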
SUMMARY

Below is a summary of the pros and cons of each of the storage drivers tested:
Loop-LVM XFS
Pros:
- No configuration required
- Decent performance at small base sizes

Cons:
- Poor performance at larger base sizes
- Not recommended for production

Direct-LVM Ext4

Pros:
- More performant than xfs at build, run, and stop operations
- Consistent performance for all tested base sizes

Cons:
- Requires dedicated storage, as LVM logical volumes
- Slightly slower than xfs for remove operations

Direct-LVM XFS

Pros:
- More performant than ext4 at remove operations
- Consistent performance for all tested base sizes

Cons:
- Requires dedicated storage, as LVM logical volumes
- Slower than ext4 at build, run, and stop operations

btrfs

Pros:
- No need to manage base size; docker can use all the space in the filesystem
- Most performant for build operations

Cons:
- Not recommended for production
- Requires dedicated storage, as a btrfs filesystem

zfs

Pros:
- No need to manage base size; docker can use all the space in the zpool

Cons:
- Not recommended for production
- Requires disabling UEFI Secure Boot at the system level
- Requires dedicated storage, as a zfs filesystem

overlayfs

Pros:
- Claims to be fast and efficient
- The "modern" union filesystem

Cons:
- Not yet production ready
- Not supported with RPM + CentOS 6 (could not properly test due to this issue)
- Potential fix available for RHEL/CentOS 7.2+ images

AUFS

Pros:
- The original

Cons:
- Requires a custom kernel (could not properly test due to this issue)
Tags: Design & Architecture, docker, FAQ
03-08-2017
04:03 PM
This could occur if you have an overloaded NodeManager and the liveness monitor expiration has occurred. Are you seeing any NodeManagers in a LOST state? What does resource consumption look like on your NodeManagers when this occurs?
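If it helps, the state of each NodeManager (including LOST ones) can be checked from the command line:

yarn node -list -all    # lists every NodeManager and its state (RUNNING, LOST, UNHEALTHY, etc.)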
06-01-2016
06:15 PM
You are correct: use LVM for OS disks, but not data disks. In the end, the filesystem choice doesn't make a huge difference. ext4 everywhere would simplify the overall design and allow for the ability to resize filesystems online in the future. Allocating a larger amount of storage to the OS filesystems does simplify the install. Otherwise, during the Ambari install wizard, you need to go through each service's configuration and change "/var/log" to one of the data disk mount points (e.g. /opt/dev/sdb in the example above). If you allocated more storage to the OS (and subsequently made /usr, say, 30GB and /var/log 200GB), you would not have to change as much during the Ambari install. Either approach is viable, so I would suggest discussing with your OS admin team to see if they have a preference. Also note that I'm referring to daemon logs (namenode, resource manager, etc.) that end up in /var/log, versus application logs. The yarn settings you show above are for the yarn application logs and local scratch space. You want to follow that same pattern in production.
06-01-2016
12:10 PM
3 Kudos
The HDP documentation around filesystem selection is outdated; ext4 and XFS are both fine choices today. You can use LVM for the OS filesystems. This provides a nice way to shuffle space around on your 2x 300GB OS drives as needed. XFS is perfectly fine here, so you can let RHEL use the default. However, note that XFS filesystems cannot be shrunk, whereas with LVM + ext4, filesystems can be expanded and shrunk while online. This is a big gap for XFS. For the datanode disks, do not use RAID or LVM. You want each individual disk mounted as a separate filesystem. You then provide HDFS with a comma-separated list of mount points, and HDFS will handle spreading data and load across the disks. If you have 24 data disks per node, you should have 24 filesystems configured in HDFS (a sketch follows below). XFS is a good choice here, since resizing is unlikely to come into play. Also keep in mind that /var/log and /usr have specific needs. /var/log can grow to hundreds of GBs, so moving this logging to one of the data disks may be necessary. The HDP binaries are installed to /usr/hdp and, depending on which components you are installing, could use as much as 6GB per HDP release. Keep this in mind, as sufficient space is needed here for upgrades. Hope that helps.
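As a rough sketch of the per-disk layout described above (device names and mount points are illustrative, not a recommendation for your specific hardware):

# one filesystem per data disk, no RAID/LVM
mkfs.xfs -f /dev/sdb
mkdir -p /grid/0
mount /dev/sdb /grid/0
echo '/dev/sdb /grid/0 xfs defaults,noatime 0 0' >> /etc/fstab
# repeat for each disk, then point dfs.datanode.data.dir at the comma-separated list, e.g.:
#   /grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,...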
05-23-2016
07:52 PM
One point of clarification: the Secondary NameNode is not used for High Availability. It was poorly named and only provides checkpointing capabilities. You need to enable NameNode HA (which replaces the Secondary NameNode with a Standby NameNode) for failover to work. Ambari has a wizard to assist in enabling NameNode HA. Once NameNode HA is enabled, jobs will continue if the active NameNode fails.
05-21-2016
11:18 PM
3 Kudos
When installing HDB/HAWQ on Sandbox, it is necessary to relocate the default Ambari postgres database to a postgres instance running on a different port. The following script performs the move in a mostly automated fashion. When prompted by ambari-server setup, select option 4 for the database configuration and fill in the details. Note that this is only intended for Sandbox. Please do not use in production.

#!/usr/bin/env bash
#
# Change as needed
#
PGPORT=12346
PGDATA=/var/lib/pgsql/ambari
AMBARI_WEB_USER=admin
AMBARI_WEB_PW=admin
AMBARI_DB_NAME=ambari
AMBARI_DB_USER=ambari
AMBARI_DB_PW=bigdata
#
# Variables
#
PG_INIT_PATH=/etc/init.d/postgresql
DB_BKUP_DIR=/tmp/ambari-db-backup
AMBARI_PROPS=/etc/ambari-server/conf/ambari.properties
#
# Main
#
echo -e "\n#### Stopping ambari-server"
ambari-server stop
echo -e "\n#### Creating the pgpass file"
echo "*:*:*:$AMBARI_DB_USER:$AMBARI_DB_PW" >> $HOME/.pgpass
chmod 600 $HOME/.pgpass
echo -e "\n#### Creating database backup directory"
if [ -d $DB_BKUP_DIR ]; then
rm -rf $DB_BKUP_DIR
fi
mkdir -p $DB_BKUP_DIR
chmod 777 $DB_BKUP_DIR
echo -e "\n#### Backing up ambari-server databases"
pg_dump -U $AMBARI_DB_USER -w -f $DB_BKUP_DIR/ambari.sql $AMBARI_DB_NAME
echo -e "\n#### Attempting to stop postgres on port $PGPORT, if running"
service postgresql.${PGPORT} stop
echo -e "\n#### Setting up new postgres data directory"
if [ -d $PGDATA ]; then
rm -rf $PGDATA
fi
mkdir -p $PGDATA
chown postgres:postgres $PGDATA
echo -e "\n#### Creating new init script"
sed -e 's|^PGPORT=.*|PGPORT='$PGPORT'|g' -e 's|^PGDATA=.*|PGDATA='$PGDATA'|g' $PG_INIT_PATH > ${PG_INIT_PATH}.${PGPORT}
chmod 775 ${PG_INIT_PATH}.${PGPORT}
echo -e "\n#### Initializing new postgres instance on port $PGPORT"
service postgresql.${PGPORT} initdb
echo -e "\n#### Modify postgres config to listen on all interfaces"
sed -i "s|^#\?listen_addresses.*|listen_addresses = '*'|g" $PGDATA/postgresql.conf
echo -e "\n#### Copy existing pg_hba.conf"
cp /var/lib/pgsql/data/pg_hba.conf $PGDATA/pg_hba.conf
echo -e "\n#### Starting new postgres instance on port $PGPORT"
service postgresql.${PGPORT} start
echo -e "\n#### Creating the ambari db"
su - postgres -c "psql -p $PGPORT -c 'CREATE DATABASE ambari;' -d postgres"
echo -e "\n#### Creating the ambari db user role"
su - postgres -c "psql -p $PGPORT -c \"CREATE ROLE $AMBARI_DB_USER LOGIN PASSWORD '$AMBARI_DB_PW';\" -d ambari"
echo -e "\n#### Restoring ambari database backup"
su - postgres -c "psql -p $PGPORT -f $DB_BKUP_DIR/ambari.sql -d ambari"
echo -e "\n#### Updating jdbc config for ambari-server"
grep -v "server.jdbc" $AMBARI_PROPS >${AMBARI_PROPS}.nojdbc
echo "server.jdbc.port=$PGPORT" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.rca.driver=org.postgresql.Driver" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.rca.url=jdbc:postgresql://localhost:${PGPORT}/ambari" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.driver=org.postgresql.Driver" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.user.name=$AMBARI_DB_USER" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.postgres.schema=ambari" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.hostname=localhost" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.rca.user.passwd=/etc/ambari-server/conf/password.dat" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.rca.user.name=$AMBARI_DB_USER" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.url=jdbc:postgresql://localhost:${PGPORT}/ambari" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.user.passwd=/etc/ambari-server/conf/password.dat" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.database=postgres" >> ${AMBARI_PROPS}.nojdbc
echo "server.jdbc.database_name=ambari" >> ${AMBARI_PROPS}.nojdbc
cp ${AMBARI_PROPS}.nojdbc $AMBARI_PROPS
echo -e "\n#### Stopping existing postgres instance"
service postgresql stop
echo -e "\n#### Running ambari-server setup"
ambari-server setup
echo -e "\n#### Starting ambari-server"
service ambari-server start
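After the script completes, a couple of optional sanity checks (the port value comes from the variables at the top of the script):

su - postgres -c "psql -p 12346 -d ambari -c '\dt'" | head   # tables should be present in the relocated database
ambari-server status                                         # confirm ambari-server came up against the new instance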
Tags: Ambari, hdb, Issue Resolution, postgres, Sandbox, Sandbox & Learning
03-30-2016
12:05 PM
8 Kudos
As has been mentioned in this thread, there is no native C# HBase client. However, there are several options for interacting with HBase from C#:

- C# HBase Thrift client - Thrift allows for defining service endpoints and data models in a common format and using code generators to create language-specific bindings. HBase provides a Thrift server and definitions, and there are many examples online for creating a C# HBase Thrift client: hbase-thrift-csharp
- Marlin - Marlin is a C# client for interacting with Stargate (the HBase REST API) that ultimately became hbase-sdk-for-net. I have not personally tested this against HBase 1.x+, but considering it uses Stargate, I expect it should work. If you are planning to use Stargate and implement your own client, which I would recommend over Thrift, make sure to use protobufs to avoid the JSON serialization overhead. Using an HTTP based approach also makes it much easier to load balance requests over multiple gateways.
- Phoenix Query Server - Phoenix is a SQL skin on HBase, and Phoenix Query Server is a REST API for submitting SQL queries to Phoenix. Here is some example code, however, I have not yet tested it: hdinsight-phoenix-sharp
- Simba HBase ODBC Driver - Uses ODBC to connect to HBase. I've heard positive feedback on this approach, especially from tools like Tableau. This is not open source and requires purchasing a license.
- Future: Phoenix ODBC Driver - I've been told a Phoenix ODBC driver is in the works. Unfortunately, no ETA.

What we really need is an Entity Framework or LINQ based framework, as that's how C# developers expect to interact with backend data sources. At one point, a member of the community began developing Linq2Hive, but the project appears to be no more. It may be possible to leverage Linq2Hive and the HBaseStorageHandler, but that seems like a really poor pattern. 🙂 I'm sure there are others, but hopefully this helps.
03-08-2016
06:19 PM
3 Kudos
I expect you just need to add a port forwarding rule to forward 9090 from your host to the VM. Right Click the VM -> Settings -> Network -> Port Forwarding and validate a rule exists for 9090. If that wasn't the issue, check /etc/hosts on your host to ensure there isn't an old sandbox entry in there.
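If you prefer the command line, the same NAT rule can be added with VBoxManage (the VM name shown is an example; use the name reported by VBoxManage list vms):

VBoxManage controlvm "Hortonworks Sandbox" natpf1 "nifi,tcp,,9090,,9090"     # while the VM is running
VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "nifi,tcp,,9090,,9090"    # while the VM is powered off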
01-29-2016
08:40 PM
Unfortunately, I have not found any archetype to meet these requirements, but the need is there. The project below seems to meet #4 on the list as a starting point: sparkjava-archetypes
01-19-2016
06:18 PM
1 Kudo
From what I understand, the Eclipse plugin has not been maintained as new versions of Hadoop have been released. It appears that the command to start the DataNode is missing a required argument:

Usage: java DataNode [regular | rollback]
regular : Normal DataNode startup (default).
rollback : Rollback a standard or rolling upgrade.
Refer to HDFS documentation for the difference between standard
and rolling upgrades.

The Apache HDT (Hadoop Development Tools) project had plans to fix this, but unfortunately, it has been retired due to lack of contributions. http://hdt.incubator.apache.org/
One option to consider would be to ditch the Eclipse Plugin and leverage "mini clusters" to provide a similar development experience, but without the need to connect to an external cluster or leverage the Eclipse plugin. https://wiki.apache.org/hadoop/HowToDevelopUnitTests
Another option would be to leverage the hadoop-mini-clusters project that I maintain. It simplifies the use of mini clusters by wrapping them in a common Builder pattern. https://github.com/sakserv/hadoop-mini-clusters
Hope that helps.
12-15-2015
03:18 PM
1 Kudo
It appears you cannot resolve mirrorlist.centos.org via DNS from your virtual machine. Does the following return a result?

nslookup mirrorlist.centos.org

If not, I expect you have configured the VM with a Host-Only adapter, which will not allow the VM to access the internet.
12-12-2015
07:48 AM
2 Kudos
Here is the mini cluster project: hadoop-mini-clusters. Here is Dhruv's testing project: iot-integration-tester
12-03-2015
11:02 PM
4 Kudos
FWIW, XFS is the default in RHEL 7, so I expect an uptick in new clusters.
12-03-2015
10:53 PM
3 Kudos
Hello Mike, check that /tmp is not mounted with the noexec flag on that node:

sudo mount | grep /tmp

If so, remounting without that option should fix this. If removing noexec isn't an option, you can control the directory Java uses for temporary storage through the java.io.tmpdir system property. Give the following a try, replacing the directory with your home directory or another filesystem without the noexec flag:

hbase -Djava.io.tmpdir=/some/other/writable/directory shell
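For reference, a remount that drops noexec until the next reboot (make the matching change in /etc/fstab if it should persist):

sudo mount -o remount,exec /tmp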
12-03-2015
08:52 PM
4 Kudos
DefaultResourceCalculator only takes memory into account. Here is a brief explanation of what you are seeing (the relevant part is the description of the DefaultResourceCalculator below).

Pluggable resource-vector in YARN scheduler

The CapacityScheduler has the concept of a ResourceCalculator – a pluggable layer that is used for carrying out the math of allocations by looking at all the identified resources. This includes utilities to help make the following decisions:
- Does this node have enough resources of each resource-type to satisfy this request?
- How many containers can I fit on this node, when sorting a list of nodes with varying resources available?

There are two kinds of calculators currently available in YARN – the DefaultResourceCalculator and the DominantResourceCalculator. The DefaultResourceCalculator only takes memory into account when doing its calculations. This is why CPU requirements are ignored when carrying out allocations in the CapacityScheduler by default. All the math of allocations is reduced to just examining the memory required by resource-requests and the memory available on the node that is being looked at during a specific scheduling-cycle. You can find more on this topic on our blog: managing-cpu-resources-in-your-hadoop-yarn-clusters
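For reference, the calculator is controlled by the yarn.scheduler.capacity.resource-calculator property in capacity-scheduler.xml (the path below is the usual HDP location; adjust for your layout). Switching it to the DominantResourceCalculator makes vcores count in the allocation math:

grep -A1 'yarn.scheduler.capacity.resource-calculator' /etc/hadoop/conf/capacity-scheduler.xml
# to consider CPU as well as memory, set the value to:
#   org.apache.hadoop.yarn.util.resource.DominantResourceCalculator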
11-09-2015
04:01 PM
1 Kudo
I don't necessarily agree with this answer. We could avoid needing to change ownership through leveraging proxy users. I hope to find time to write a patch to demonstrate this. I'd also be interested in how many clusters are actually kerberos enabled. I expect it's lower than you think. Data ownership does matter and provides at least rudimentary controls when the user does not or can not enable Kerberos.
11-05-2015
02:20 PM
When writing data to HDFS with the PutHDFS NiFi processor, the data is owned by "anonymous". I'm trying to find a good way to control the ownership of data landed via this processor. I looked into Remote Owner and Remote Group, however, those require that the NiFi server is running as the "hdfs" user. This seems like a bad idea to me. I'm curious why this processor doesn't leverage Hadoop Proxy Users, versus enforcing that the NiFi server runs as hdfs? Any other workarounds? My initial thought was to stage the data in HDFS with NiFi and use Falcon to move it to its final location, however, this seems overkill for users that simply want to ingest the data into its final location. Am I missing something obvious here?
11-03-2015
11:56 PM
1 Kudo
Demo article has been added here: creating-hbase-hfiles-from-an-existing-hive-table
11-03-2015
11:53 PM
10 Kudos
Hive HBase Generate HFiles

Demo scripts available at: https://github.com/sakserv/hive-hbase-generatehfiles

Below is an example of leveraging the Hive HBaseStorageHandler for HFile generation. This pattern provides a means of taking data already stored in Hive, exporting it as HFiles, and bulk loading the HBase table from those HFiles.

Overview

The HFile generation feature was added in HIVE-6473. It adds the following properties that are then leveraged by the Hive HBaseStorageHandler:
- hive.hbase.generatehfiles - set to true to generate HFiles
- hfile.family.path - path in HDFS in which to put the HFiles

Note that for hfile.family.path, the final subdirectory MUST MATCH the column family name. The scripts in the repo called out above can be used with the Hortonworks Sandbox to test and demo this feature.

Example

The following is an example of how to use this feature. The scripts in the repo above implement the steps below. It is assumed that the user already has data stored in a Hive table; for the sake of this example, the following table was used:

CREATE EXTERNAL TABLE passwd_orc(userid STRING, uid INT, shell STRING)
STORED AS ORC
LOCATION '/tmp/passwd_orc';
First, decide on the HBase table and column family name. We want to use a single column family. For the example below, the HBase table name is "passwd_hbase" and the column family name is "passwd". Below is the DDL for the HBase table created through Hive. A couple of notes:
- userid is the row key; :key is special syntax in the hbase.columns.mapping
- each column (qualifier) is in the form column family:column (qualifier)

CREATE TABLE passwd_hbase(userid STRING, uid INT, shell STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,passwd:uid,passwd:shell');
Next, generate the HFiles for the table. Couple of notes again:
- hfile.family.path is where the HFiles will be generated
- the final subdirectory name MUST match the column family name

SET hive.hbase.generatehfiles=true;
SET hfile.family.path=/tmp/passwd_hfiles/passwd;
INSERT OVERWRITE TABLE passwd_hbase SELECT DISTINCT userid,uid,shell FROM passwd_orc CLUSTER BY userid;
Finally, load the HFiles into the HBase table:

export HADOOP_CLASSPATH=`hbase classpath`
yarn jar /usr/hdp/current/hbase-client/lib/hbase-server.jar completebulkload /tmp/passwd_hfiles passwd_hbase
The data can now be queried from Hive or HBase.
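A couple of quick checks that the HFiles were generated and the bulk load landed the data (paths and names from the example above):

hdfs dfs -ls /tmp/passwd_hfiles/passwd                  # check the generated HFiles (before completebulkload, which moves them)
echo "scan 'passwd_hbase', {LIMIT => 5}" | hbase shell  # sample a few rows from the loaded table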
Tags: bulkload, Data Ingestion & Streaming, HBase, Hive
11-03-2015
11:48 PM
This shows promise as well. I plan to give this a try soon. However, the accepted answer avoids needing to go from ORC back to CSV, so it gets the win. 🙂
11-03-2015
11:47 PM
While I've yet to use this on the large table, it worked very well on a small sample. There were some gotchas that aren't explicitly called out anywhere. I will put together a guide and post it to AH, and link it back here when ready. I've scripted out an example of using this feature here: https://github.com/sakserv/hive-hbase-generatehfiles Thanks!
11-02-2015
08:49 PM
Looking for approaches for loading HBase tables if all I have is the data in an ORC backed Hive table. I would prefer a bulk load approach, given there are several hundred million rows in the ORC backed Hive table. I found the following, anyone have experience with Hive's HBase bulk load feature? Would it be better to create a CSV table and CTAS from ORC into the CSV table, and then use ImportTsv on the HBase side? HiveHBaseBulkLoad Any experiences here would be appreciated.
Labels: Apache HBase, Apache Hive
10-19-2015
06:24 PM
IMO, this doesn't look bad at all. While you could tune the young generation size a bit higher to lessen these, the amount of time spent in GC is pretty low, so it's unlikely to have any impact on long term performance. We'd also need to see entries for the Perm Gen and Old Gen to determine what impact increasing the young gen would have. Let's break it down: This is a Young Generation collection, also known as a minor collection. The total heap used by the young generation hovers around 135MB, which aligns with your setting. The size of the young gen before GC is hovering around 130MB (sometimes less, as heap needs for objects will determine when the GC is needed). After GC, the heap is 1MB, meaning cleanup went very well and most objects are short lived for this application. Ultimately, these collections took on average 5ms (.005 seconds) each, roughly every 8 seconds (8000 ms), which is well under 0.1% of the total run time of the application and perfectly fine.
10-14-2015
06:18 PM
1 Kudo
btw, you can avoid needing to specify the hwx jetty repo by using a single repo group:

<repositories>
<repository>
<id>public.repo.hortonworks.com</id>
<name>Public Hortonworks Maven Repo</name>
<url>http://repo.hortonworks.com/content/groups/public/</url>
</repository>
</repositories>
10-14-2015
06:14 PM
I'm running into something similar with the HDP repos, but not the maven central repos. I haven't found an answer yet but will continue to research; it could be a bug, or it could be the HDP repos. I'll follow up when I have more.
10-14-2015
01:25 PM
5 Kudos
This is a bug in Intellij 14.1 (and many earlier versions). See IDEA-102693 which includes a zip with the fixed maven plugin jars. Replace your intellij jars with those from the zip file. If that doesn't work, take a look at your idea.log (sudo find / -name idea.log to locate it) for any exceptions and research those and/or post your stack trace here.
10-08-2015
07:10 PM
Can you elaborate? Do you see the actual password in the header or the Base64 encoded string? Basic Auth provides no security with regard to user/password. Base64 encoding is used to handle special characters that could invalidate the entire header.
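A quick illustration of that point – the Base64 value in the Authorization header is trivially reversible, so it is encoding, not encryption (the credentials shown are made up):

echo -n 'admin:secret' | base64        # YWRtaW46c2VjcmV0  -> this is what goes into "Authorization: Basic ..."
echo 'YWRtaW46c2VjcmV0' | base64 -d    # admin:secret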
10-01-2015
12:14 AM
3 Kudos
The ambari user must be configured for sudo access, but the required access can be restricted. Take careful note of the ambari user's PATH (echo $PATH as ambari); you may need to change the full paths of the sudo entries below (mkdir, cat, etc.), as Ambari doesn't use fully qualified paths when executing commands. Note that the Defaults lines need to appear after other Defaults lines in /etc/sudoers (easiest to just put this at the end).

# Add Sudo Rules
ambari ALL=(ALL) NOPASSWD:SETENV: /bin/su hdfs *, /usr/bin/su hdfs *, /bin/su ambari-qa *, /usr/bin/su ambari-qa *, /bin/su ranger *, /usr/bin/su ranger *, /bin/su zookeeper *, /usr/bin/su zookeeper *, /bin/su knox *, /usr/bin/su knox *, /bin/su falcon *, /usr/bin/su falcon *, /bin/su ams *, /usr/bin/su ams *, /bin/su flume *, /usr/bin/su flume *, /bin/su hbase *, /usr/bin/su hbase *, /bin/su spark *, /usr/bin/su spark *, /bin/su accumulo *, /usr/bin/su accumulo *, /bin/su hive *, /usr/bin/su hive *, /bin/su hcat *, /usr/bin/su hcat *, /bin/su kafka *, /usr/bin/su kafka *, /bin/su mapred *, /usr/bin/su mapred *, /bin/su oozie *, /usr/bin/su oozie *, /bin/su sqoop *, /usr/bin/su sqoop *, /bin/su storm *, /usr/bin/su storm *, /bin/su tez *, /usr/bin/su tez *, /bin/su atlas *, /usr/bin/su atlas *, /bin/su yarn *, /usr/bin/su yarn *, /bin/su kms *, /usr/bin/su kms *, /bin/su mysql *, /usr/bin/su mysql *, /usr/bin/yum, /usr/bin/zypper, /usr/bin/apt-get, /bin/mkdir, /usr/bin/mkdir, /usr/bin/test, /bin/ln, /usr/bin/ln, /bin/chown, /usr/bin/chown, /bin/chmod, /usr/bin/chmod, /bin/chgrp, /usr/bin/chgrp, /usr/sbin/groupadd, /usr/sbin/groupmod, /usr/sbin/useradd, /usr/sbin/usermod, /bin/cp, /usr/bin/cp, /usr/sbin/setenforce, /usr/bin/test, /usr/bin/stat, /bin/mv, /usr/bin/mv, /bin/sed, /usr/bin/sed, /bin/rm, /usr/bin/rm, /bin/kill, /usr/bin/kill, /bin/readlink, /usr/bin/readlink, /usr/bin/pgrep, /bin/cat, /usr/bin/cat, /usr/bin/unzip, /bin/tar, /usr/bin/tar, /usr/bin/tee, /bin/touch, /usr/bin/touch, /usr/bin/hdp-select, /usr/bin/conf-select, /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh, /usr/lib/hadoop/bin/hadoop-daemon.sh, /usr/lib/hadoop/sbin/hadoop-daemon.sh, /sbin/chkconfig gmond off, /sbin/chkconfig gmetad off, /etc/init.d/httpd *, /sbin/service hdp-gmetad start, /sbin/service hdp-gmond start, /usr/sbin/gmond, /usr/sbin/update-rc.d ganglia-monitor *, /usr/sbin/update-rc.d gmetad *, /etc/init.d/apache2 *, /usr/sbin/service hdp-gmond *, /usr/sbin/service hdp-gmetad *, /sbin/service mysqld *, /usr/bin/python2.6 /var/lib/ambari-agent/data/tmp/validateKnoxStatus.py *, /usr/hdp/current/knox-server/bin/knoxcli.sh *, /usr/hdp/*/ranger-usersync/setup.sh, /usr/bin/ranger-usersync-stop, /usr/bin/ranger-usersync-start, /usr/hdp/*/ranger-admin/setup.sh *, /usr/hdp/*/ranger-knox-plugin/disable-knox-plugin.sh *, /usr/hdp/*/ranger-storm-plugin/disable-storm-plugin.sh *, /usr/hdp/*/ranger-hbase-plugin/disable-hbase-plugin.sh *, /usr/hdp/*/ranger-hdfs-plugin/disable-hdfs-plugin.sh *, /usr/hdp/current/ranger-admin/ranger_credential_helper.py, /usr/hdp/current/ranger-kms/ranger_credential_helper.py
Defaults exempt_group = ambari
Defaults !env_reset,env_delete-=PATH
Defaults: ambari !requiretty
You will also want to manually install Ambari Agent on all nodes and modify the agent configuration BEFORE starting Ambari Agent for the first time. Otherwise, the agent will start as root and you will need to manually fix the ownership of several directories (don't let Ambari Server install the agents via ssh). In /etc/ambari-agent/conf/ambari-agent.ini, modify the run_as_user and server hostname properties (a sketch follows below) and start the agents. Use the manual registration option when going through the cluster install wizard. I've used the above several times now with success.
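A minimal sketch of the agent-side changes (the server hostname is an example; run_as_user may need to be added under the [agent] section if it is not already present):

sed -i 's|^run_as_user=.*|run_as_user=ambari|' /etc/ambari-agent/conf/ambari-agent.ini
sed -i 's|^hostname=.*|hostname=ambari-server.example.com|' /etc/ambari-agent/conf/ambari-agent.ini
ambari-agent start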