About myoung

myoung · ‎11-10-2016

@Roger Young There are 3 versions of the HDP sandbox: Docker, VirtualBox and VMWare. The Docker version is a native Docker image intended to be loaded into Docker. Both the VirtualBox and VMWare sandboxes now use Docker internally to host the sandbox, so all of the sandboxes are based on Docker containers. That means accessing the sandbox and making configuration changes are now a little different.

myoung · ‎11-10-2016

@Marcia Hon I wrote an HCC article that walks you through the process of increasing the base size of the CentOS 7 docker vm image. This is the preferred method for making these changes. https://community.hortonworks.com/content/kbentry/65714/how-to-modify-the-default-docker-configuration-on.html

myoung · ‎11-10-2016

@Abdelmajid Boutjim If you have commas embedded within the data itself and your columns are not quoted, then your problem is much more difficult. A field can be any length and contain any number of commas, so there is no reliable way to tell a delimiter from data programmatically. Is there any way to get a new export of the data using a different delimiter like ~ or |, or with each column of data quoted?
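To get a feel for how many rows are affected before asking for a new export, a quick field count can help. This is only a minimal sketch: the function name, the expected column count and the file name export.csv are hypothetical, and it assumes a plain unquoted comma-separated file with one record per line.

```shell
# Print the line numbers of rows whose comma-separated field count
# differs from the expected count (a sign of embedded commas).
rows_with_wrong_field_count() {
  local expected="$1"
  local file="$2"
  awk -F',' -v n="$expected" 'NF != n { print NR }' "$file"
}

# Example usage (hypothetical file with 3 expected columns):
#   rows_with_wrong_field_count 3 export.csv
```

Rows reported here still have to be fixed by hand or re-exported; the check only tells you where the extra commas are.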

myoung · ‎11-10-2016

Objective

If you are using Docker on CentOS 7, you may have run into issues importing the Docker version of the Hortonworks Sandbox for HDP 2.5. The default configuration of Docker on CentOS 7 limits your Docker virtual machine image to 10GB in size. The Docker HDP sandbox is 13GB in size. This will cause the loading process of the sandbox to fail.

This tutorial will guide you through the process of installing Docker on CentOS 7 using Vagrant and modifying the Docker configuration to move the Docker virtual machine image and increase its size. While we are using Vagrant + VirtualBox, this process should work for any install of CentOS (Amazon, etc.) with small changes, because VirtualBox and the related plugins are not needed there.

Prerequisites

- You should already have downloaded the Docker HDP 2.5 Sandbox. Read more here: Docker HDP Sandbox
- You should already have installed VirtualBox 5.1.x. Read more here: VirtualBox
- You should already have installed Vagrant 1.8.6. Read more here: Vagrant
- You should already have installed the vagrant-vbguest plugin. This plugin will keep the VirtualBox Guest Additions software current as you upgrade your kernel and/or VirtualBox versions. Read more here: vagrant-vbguest
- You should already have installed the vagrant-hostmanager plugin. This plugin will automatically manage the /etc/hosts file on your local Mac and in your virtual machines. Read more here: vagrant-hostmanager

Scope

- Mac OS X 10.11.6 (El Capitan)
- VirtualBox 5.1.6
- Vagrant 1.8.6
- vagrant-vbguest plugin 0.13.0
- vagrant-hostmanager plugin 1.8.5

Steps

Create Vagrant project directory

Before we get started, determine where you want to keep your Vagrant project files. Each Vagrant project should have its own directory. I keep my Vagrant projects in my ~/Development/Vagrant directory. You should also use a helpful name for each Vagrant project directory you create.
    cd ~/Development/Vagrant
    mkdir centos7-docker
    cd centos7-docker

We will be using a CentOS 7.2 Vagrant box, so I include centos7 in the Vagrant project name to differentiate it from a CentOS 6 project. The project is for Docker, so I include that in the name. Thus we have a project directory name of centos7-docker.

Create Vagrant project files

Create Vagrantfile

The Vagrantfile tells Vagrant how to configure your virtual machines. Here is my Vagrantfile:

    # -*- mode: ruby -*-
    # vi: set ft=ruby :

    # Using yaml to load external configuration files
    require 'yaml'

    Vagrant.configure("2") do |config|
      # Using the hostmanager vagrant plugin to update the host files
      config.hostmanager.enabled = true
      config.hostmanager.manage_host = true
      config.hostmanager.manage_guest = true
      config.hostmanager.ignore_private_ip = false

      # Loading in the list of commands that should be run when the VM is provisioned.
      commands = YAML.load_file('commands.yaml')
      commands.each do |command|
        config.vm.provision :shell, inline: command
      end

      # Loading in the VM configuration information
      servers = YAML.load_file('servers.yaml')
      servers.each do |server|
        config.vm.define server["name"] do |srv|
          srv.vm.box = server["box"] # Specify the name of the Vagrant box file to use
          srv.vm.hostname = server["name"] # Set the hostname of the VM
          srv.vm.network "private_network", ip: server["ip"], adapter: 2 # Add a second adapter with a specified IP
          srv.vm.network :forwarded_port, guest: 22, host: server["port"] # Add a port forwarding rule
          srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\\t#{srv.vm.hostname}\\t#{srv.vm.hostname}$/d' /etc/hosts"
          srv.vm.provider :virtualbox do |vb|
            vb.name = server["name"] # Name of the VM in VirtualBox
            vb.cpus = server["cpus"] # How many CPUs to allocate to the VM
            vb.memory = server["ram"] # How much memory to allocate to the VM
            vb.customize ["modifyvm", :id, "--cpuexecutioncap", "75"] # Limit the VM to 75% of available CPU
          end
        end
      end
    end

Create a servers.yaml file

The servers.yaml file contains the configuration information for our VMs. Here is the content from my file:

    ---
    - name: docker
      box: bento/centos-7.2
      cpus: 6
      ram: 12288
      ip: 192.168.56.100
      port: 10022

NOTE: My configuration uses 6 CPUs and 12GB of memory for VirtualBox. Make sure you have enough resources to use this configuration. For the purposes of this demo, you can lower this to 2 CPUs and 4GB of memory.

Create commands.yaml file

The commands.yaml file contains the list of commands that should be run on each VM when it is first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. Here is the content from my file:

    - "sudo yum -y install net-tools ntp wget"
    - "sudo systemctl enable ntpd && sudo systemctl start ntpd"
    - "sudo systemctl disable firewalld && sudo systemctl stop firewalld"
    - "sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux"

Start Virtual Machine

Once you have created the 3 files in your Vagrant project directory, you are ready to start your virtual machine. Creating the VM for the first time and starting it every time after that use the same command:

    vagrant up

NOTE: During the startup process, the vagrant-hostmanager plugin will prompt you for a password. This is the sudo password for your local machine. It needs to use sudo to update the /etc/hosts file.

Once the process is complete you should have 1 server running. You can verify by looking at the VirtualBox user interface; you should have a virtual machine called docker running.

Connect to virtual machine

You can log in to the virtual machine via ssh using the vagrant ssh command.

    vagrant ssh

Install Docker in Vagrant virtual machine

Now that you are connected to the virtual machine, it's time to install Docker.
First we'll create the docker.repo file:

    sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
    [dockerrepo]
    name=Docker Repository
    baseurl=https://yum.dockerproject.org/repo/main/centos/7/
    enabled=1
    gpgcheck=1
    gpgkey=https://yum.dockerproject.org/gpg
    EOF

Now we can install Docker:

    sudo yum install docker-engine

Now we need to enable the Docker service:

    sudo systemctl enable docker.service

Now we can start the Docker service:

    sudo systemctl start docker

Load Docker HDP sandbox

We will attempt to load the HDP sandbox into Docker using the default Docker settings. This process will fail after the image has loaded approximately 10GB of data.

First, make sure a copy of the Docker HDP sandbox file HDP_2.5_docker.tar.gz is located in your project directory. This will allow you to access the file via the /vagrant/ directory within your virtual machine. I'm doing this on a Mac and my file was downloaded to ~/Downloads. This command is run on your local computer:

    cp ~/Downloads/HDP_2.5_docker.tar.gz ~/Development/Vagrant/centos7-docker/

Now we can try loading the sandbox image on the VM. You need to use sudo to interact with Docker. You should see something similar to the following:

    sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
    b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
    0b547722f59f: Loading layer [=====================================> ] 10.36 GB/13.84 GB
    ApplyLayer exit status 1 stdout: stderr: write /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/lib/rt.jar: no space left on device

As you can see above, the load command failed and it did so a little more than 10GB into the process. That is because the Docker virtual machine image defaults to 10GB in size.
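If you want to confirm which storage settings are in effect after a failure like the one above, a small helper that pulls single fields out of the docker info output can be handy. This is a sketch only: the function name docker_info_field is hypothetical, and the "Base Device Size" line is an assumption about what docker info reports when the devicemapper driver is active, so treat a missing line as "not reported" rather than as an error.

```shell
# Print the value of one "Key: value" line from `docker info`-style text on stdin.
docker_info_field() {
  local key="$1"
  # Drop leading whitespace, match the key, and print what follows "Key: ".
  sed -n "s/^[[:space:]]*${key}:[[:space:]]*//p" | head -n 1
}

# Example usage on the VM (needs a running Docker daemon):
#   sudo docker info 2>/dev/null | docker_info_field "Storage Driver"
#   sudo docker info 2>/dev/null | docker_info_field "Base Device Size"   # assumed devicemapper-only field
#   sudo docker info 2>/dev/null | docker_info_field "Docker Root Dir"
```

Running the same three lookups again after changing the configuration gives a quick before/after comparison.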
We can check to see where our Docker Root is located:

    sudo docker info | grep "Docker Root"
    WARNING: bridge-nf-call-iptables is disabled
    WARNING: bridge-nf-call-ip6tables is disabled
    Docker Root Dir: /var/lib/docker

We can see from above the default is /var/lib/docker.

Change Docker configuration

CentOS 7 uses systemd to manage services. You can read more about it here: systemd. The older methods of modifying Docker configuration using /etc/sysconfig based configuration files should be avoided. While you can get everything working using this approach, it's better to use the appropriate systemd methods.

Some people have modified the /etc/systemd/system/multi-user.target.wants/docker.service file to make changes to the Docker configuration. This file can be overwritten during software updates, so it is best not to modify it. Instead you should use /etc/systemd/system/docker.service.d/docker.conf.

We need to create an /etc/systemd/system/docker.service.d directory. This is where our docker.conf configuration file will be located.

    sudo mkdir -p /etc/systemd/system/docker.service.d

Now we can edit our docker.conf file:

    sudo vi /etc/systemd/system/docker.service.d/docker.conf

You should add the following configuration to the file:

    [Service]
    ExecStart=
    ExecStart=/usr/bin/dockerd --graph="/mnt/docker-data" --storage-driver=overlay --storage-opt=dm.basesize=30G

The blank ExecStart= entry is not a typo. It clears the ExecStart command inherited from the packaged unit file, so the line that follows replaces the default command instead of being added alongside it.

The --graph parameter specifies the location where the Docker image file should be located. Make sure this location has sufficient room for your expected data size.

The --storage-driver parameter specifies the storage driver for Docker. The default is devicemapper, which is limited to 10GB. Set it to overlay to allow for larger file sizes. You can read more about the storage drivers here: docker storage drivers.

The --storage-opt parameter allows us to change the base image size of the virtual machine (the dm. prefix marks dm.basesize as a devicemapper option).
In this example I've set my base image size to 30GB. You may want to use a larger size.

Restart Docker

Now that we've updated our configuration, we need to restart the Docker daemon.

    sudo systemctl daemon-reload
    sudo systemctl restart docker

We can now check to see where our Docker Root is. You should see something similar to this:

    sudo docker info | grep "Docker Root"
    WARNING: bridge-nf-call-iptables is disabled
    WARNING: bridge-nf-call-ip6tables is disabled
    Docker Root Dir: /mnt/docker-data

As you can see, our Docker Root has moved to /mnt/docker-data, which is the location we specified with the --graph parameter in our docker.conf file.

Load Docker HDP sandbox

Now that we have updated our Docker configuration and restarted the daemon, we should be able to load our HDP sandbox again. Let's run the load command to see if it completes. You should see something similar to this:

    sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
    b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
    0b547722f59f: Loading layer [==================================================>] 13.84 GB/13.84 GB
    99d7327952e0: Loading layer [==================================================>] 234.8 MB/234.8 MB
    294b1c0e07bd: Loading layer [==================================================>] 207.5 MB/207.5 MB
    fd5c10f2f1a1: Loading layer [==================================================>] 387.6 kB/387.6 kB
    6852ef70321d: Loading layer [==================================================>] 163 MB/163 MB
    517f170bbf7f: Loading layer [==================================================>] 20.98 MB/20.98 MB
    665edb80fc91: Loading layer [==================================================>] 337.4 kB/337.4 kB
    Loaded image: sandbox:latest

As you can see, the load command was successful.

Review

If you successfully followed along with this tutorial, we were able to set up a CentOS 7 virtual machine using Vagrant.
We updated the Docker configuration to use a different location for the virtual machine image and changed the base size of that image. Once these changes were complete, we were able to successfully load the HDP sandbox image. You can read more about how to install Docker on CentOS 7 here: https://docs.docker.com/engine/installation/linux/centos/ and how to configure it here: https://docs.docker.com/engine/admin/
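If you rebuild the VM often, the docker.conf step from this tutorial can be scripted instead of edited by hand in vi. The following is a minimal sketch under a few assumptions: the function name write_docker_dropin and its three arguments are hypothetical, and the dockerd flags are simply the ones used in the tutorial, not a recommendation for other Docker versions.

```shell
# Write a systemd drop-in that overrides the Docker daemon command line.
# Arguments: drop-in directory, Docker data directory, base image size.
write_docker_dropin() {
  local dropin_dir="$1"
  local data_root="$2"
  local base_size="$3"
  mkdir -p "$dropin_dir"
  # The empty ExecStart= line clears the packaged command before the new one is set.
  cat > "$dropin_dir/docker.conf" <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --graph="${data_root}" --storage-driver=overlay --storage-opt=dm.basesize=${base_size}
EOF
}

# Example usage (run as root on the VM), followed by the reload and restart from the tutorial:
#   write_docker_dropin /etc/systemd/system/docker.service.d /mnt/docker-data 30G
#   systemctl daemon-reload && systemctl restart docker
```

Keeping the directory as a parameter makes it possible to generate the file somewhere harmless first and inspect it before touching /etc/systemd.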

myoung · ‎11-09-2016

@Abdelmajid Boutjim The best answer will depend on what the data looks like and what tools you have available. A common way to load CSV-based text files into HBase is to use the importtsv tool: http://hbase.apache.org/0.94/book/ops_mgt.html#importtsv Take a look at this HCC article, which is a tutorial you can follow: https://community.hortonworks.com/articles/4942/import-csv-data-into-hbase-using-importtsv.html
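As a rough illustration of what an importtsv run for a comma-separated file looks like, the sketch below only builds and prints the command line so you can review it before running it. The function name importtsv_cmd, the table name demo_table, the column family cf and the input path are all hypothetical; the table and column family are assumed to exist already and the input is assumed to be in HDFS.

```shell
# Build (but do not run) an ImportTsv command line for a comma-separated input.
# Arguments: target table, column mapping, HDFS input path.
importtsv_cmd() {
  local table="$1"
  local columns="$2"
  local input_path="$3"
  printf 'hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns=%s %s %s\n' \
    "$columns" "$table" "$input_path"
}

# Example usage: the first CSV column becomes the row key, the rest go into family "cf".
importtsv_cmd demo_table HBASE_ROW_KEY,cf:name,cf:city /tmp/demo
```

The column mapping must list one entry per CSV column, in order, with HBASE_ROW_KEY marking the column used as the row key.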

myoung · ‎11-09-2016

@Roberto Sancho 1. Have you tried running it with hive.execution.engine=mr to verify that it works properly? 2. Did you try adding the jar file to a Hive session to verify whether that works properly? 3. Try setting the Tez classpath via tez.cluster.additional.classpath.prefix, which is set in tez-site.xml via Ambari. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_installing_manually_book/content/ref-ffec9e6b-41f4-47de-b5cd-1403b4c4a7c8.1.html

myoung · ‎11-09-2016

@Jannik Franz You don't say whether you started the Kafka service. Kafka is the messaging system used to keep everything in sync. You can read more about the configuration here http://atlas.incubator.apache.org/Configuration.html and the architecture here http://atlas.incubator.apache.org/Architecture.html. Can you confirm that Kafka is started? If it is not, can you start Kafka and try to repeat the process?

myoung · ‎11-09-2016

@Atsushi Marumo I have written an article guiding you through the process with CentOS 7. This is the preferred way to do this: https://community.hortonworks.com/content/kbentry/65714/how-to-modify-the-default-docker-configuration-on.html You can read more about CentOS 7 and Docker image location here: https://docs.docker.com/engine/admin/systemd/

myoung · ‎11-09-2016

@Marcia Hon I'm glad you were able to get things working!

myoung · ‎11-08-2016

@Roberto Sancho I've seen reports that Tez will remove the commons-httpclient jar from the classpath. You can try adding it back into your session like this (update path and version numbers as appropriate): add jar /usr/hdp/2.3.0.0-2557/hive/lib/commons-httpclient-3.0.1.jar; Another thing you can do is switch the execution engine in your session to see if the error goes away: set hive.execution.engine=mr;
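If you want to repeat both checks without retyping them, one option is to put the two statements in a Hive init file and start the CLI with it. This is a minimal sketch: the function name write_hive_init and the output path are hypothetical, and the jar path is the one from the post, which you should adjust to your HDP version.

```shell
# Write a Hive init script that re-adds a jar and switches the engine to MapReduce.
# Arguments: output file, path to the jar to add.
write_hive_init() {
  local out_file="$1"
  local jar_path="$2"
  cat > "$out_file" <<EOF
-- Re-add the jar that may be missing from the Tez classpath
add jar ${jar_path};
-- Run on MapReduce instead of Tez to see whether the error goes away
set hive.execution.engine=mr;
EOF
}

# Example usage:
#   write_hive_init /tmp/hive-init.sql /usr/hdp/2.3.0.0-2557/hive/lib/commons-httpclient-3.0.1.jar
#   hive -i /tmp/hive-init.sql
```

To test the jar and the engine switch separately, comment out one of the two statements and start a fresh session.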

Last Visited	‎02-08-2019 07:03 PM

Member Since	‎02-09-2016 09:44 PM
Posts	559
Kudos received	413
