Member since
02-09-2016
559
Posts
422
Kudos Received
98
Solutions
11-10-2016
02:15 PM
@Roger Young There are 3 versions of the HDP sandbox: Docker, VirtualBox and VMware. The Docker version is a native Docker image intended to be loaded into Docker. Both the VirtualBox and VMware sandboxes now use Docker internally to host the sandbox, so all of the sandboxes are based on Docker containers. That means accessing the sandbox and making configuration changes are now a little different.
11-10-2016
01:53 PM
@Marcia Hon I wrote an HCC article that walks you through the process of increasing the base size of the CentOS 7 docker vm image. This is the preferred method for making these changes. https://community.hortonworks.com/content/kbentry/65714/how-to-modify-the-default-docker-configuration-on.html
11-10-2016
01:32 PM
@Abdelmajid Boutjim If you have commas embedded within the data itself and your columns are not quoted, then your problem is much more difficult. Your data can be any length, so this is hard to tackle programmatically. Is there any way to get a new export of the data using a different delimiter like ~ or |, or with each column of data quoted?
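As a first step it can help to measure how many rows are actually affected. The sketch below is only an illustration, not a fix: it assumes a known column count and prints every row whose comma-separated field count differs from it. The function name find_suspect_rows and the sample data are hypothetical.

```shell
#!/bin/sh
# Print the line number and field count of every row whose comma-separated
# field count differs from the expected number of columns.
# usage: find_suspect_rows EXPECTED_COLUMNS FILE
find_suspect_rows() {
    awk -F',' -v n="$1" 'NF != n { print NR ": " NF " fields" }' "$2"
}

# Small self-contained demo using a scratch file (hypothetical data).
demo_file=$(mktemp)
printf '%s\n' '1,Alice,Paris' '2,Bob,Austin, TX' '3,Carol,Lyon' > "$demo_file"
find_suspect_rows 3 "$demo_file"
# prints: 2: 4 fields
rm -f "$demo_file"
```

If only a handful of rows show up, fixing them by hand may be quicker than requesting a new export.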
11-10-2016
02:35 AM
2 Kudos
Objective
If you are using Docker on CentOS 7, you may have run into issues importing the Docker version of the Hortonworks Sandbox for HDP 2.5. The default configuration of Docker on CentOS 7 limits your Docker virtual machine image to 10GB in size. The Docker HDP sandbox is 13GB in size, so the loading process of the sandbox fails. This tutorial will guide you through the process of installing Docker on CentOS 7 using Vagrant and modifying the Docker configuration to move the Docker virtual machine image and increase its size. While we are using Vagrant + VirtualBox, this process should work for any install of CentOS 7 (Amazon, etc.) with small changes, because VirtualBox and the related plugins are then not needed.
Prerequisites
You should already have downloaded the Docker HDP 2.5 Sandbox. Read more here: Docker HDP Sandbox
You should already have installed VirtualBox 5.1.x. Read more here: VirtualBox
You should already have installed Vagrant 1.8.6. Read more here: Vagrant
You should already have installed the vagrant-vbguest plugin. This plugin will keep the VirtualBox Guest Additions software current as you upgrade your kernel and/or VirtualBox versions. Read more here: vagrant-vbguest
You should already have installed the vagrant-hostmanager plugin. This plugin will automatically manage the /etc/hosts file on your local Mac and in your virtual machines. Read more here: vagrant-hostmanager
Scope
Mac OS X 10.11.6 (El Capitan)
VirtualBox 5.1.6
Vagrant 1.8.6
vagrant-vbguest plugin 0.13.0
vagrant-hostmanager plugin 1.8.5
Steps
Create Vagrant project directory
Before we get started, determine where you want to keep your Vagrant project files. Each Vagrant project should have its own directory. I keep my Vagrant projects in my ~/Development/Vagrant directory. You should also use a helpful name for each Vagrant project directory you create.
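Before creating the directory, the prerequisites listed above can be checked from a terminal. This is a minimal sketch, assuming the vagrant and VBoxManage command-line tools are how Vagrant and VirtualBox appear on your PATH; the helper name missing_tools is hypothetical.

```shell
#!/bin/sh
# Print the name of each given command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# vagrant and VBoxManage are the CLIs installed by Vagrant and VirtualBox.
missing=$(missing_tools vagrant VBoxManage)
if [ -n "$missing" ]; then
    # Unquoted on purpose so several names end up on one line.
    echo "Missing prerequisites:" $missing
else
    # vagrant-vbguest and vagrant-hostmanager should appear in this list.
    vagrant plugin list
fi
```

With the tools in place, the project directory is created like this: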
cd ~/Development/Vagrant
mkdir centos7-docker
cd centos7-docker
We will be using a CentOS 7.2 Vagrant box, so I include centos7 in the Vagrant project name to differentiate it from a CentOS 6 project. The project is for Docker, so I include that in the name. Thus we have a project directory name of centos7-docker.
Create Vagrant project files
Create Vagrantfile
The Vagrantfile tells Vagrant how to configure your virtual machines. Here is my Vagrantfile:
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Using yaml to load external configuration files
require 'yaml'

Vagrant.configure("2") do |config|
  # Using the hostmanager vagrant plugin to update the host files
  config.hostmanager.enabled = true
  config.hostmanager.manage_host = true
  config.hostmanager.manage_guest = true
  config.hostmanager.ignore_private_ip = false

  # Loading in the list of commands that should be run when the VM is provisioned.
  commands = YAML.load_file('commands.yaml')
  commands.each do |command|
    config.vm.provision :shell, inline: command
  end

  # Loading in the VM configuration information
  servers = YAML.load_file('servers.yaml')
  servers.each do |server|
    config.vm.define server["name"] do |srv|
      srv.vm.box = server["box"] # Specify the name of the Vagrant box to use
      srv.vm.hostname = server["name"] # Set the hostname of the VM
      srv.vm.network "private_network", ip: server["ip"], adapter: 2 # Add a second adapter with a specified IP
      srv.vm.network :forwarded_port, guest: 22, host: server["port"] # Add a port forwarding rule
      # Remove the loopback entry for the hostname from /etc/hosts
      srv.vm.provision :shell, inline: "sed -i'' '/^127.0.0.1\\t#{srv.vm.hostname}\\t#{srv.vm.hostname}$/d' /etc/hosts"
      srv.vm.provider :virtualbox do |vb|
        vb.name = server["name"] # Name of the VM in VirtualBox
        vb.cpus = server["cpus"] # How many CPUs to allocate to the VM
        vb.memory = server["ram"] # How much memory to allocate to the VM
        vb.customize ["modifyvm", :id, "--cpuexecutioncap", "75"] # Limit the VM to 75% of available CPU
      end
    end
  end
end
Create a servers.yaml file
The servers.yaml file contains the configuration information for our VMs. Here is the content from my file:
---
- name: docker
  box: bento/centos-7.2
  cpus: 6
  ram: 12288
  ip: 192.168.56.100
  port: 10022
NOTE: My configuration uses 6 CPUs and 12GB of memory for VirtualBox. Make sure you have enough resources to use this configuration. For the purposes of this demo, you can lower this to 2 CPUs and 4GB of memory.
Create commands.yaml file
The commands.yaml file contains the list of commands that should be run on each VM when it is first provisioned. This allows us to automate configuration tasks that would otherwise be tedious and/or repetitive. Here is the content from my file:
- "sudo yum -y install net-tools ntp wget"
- "sudo systemctl enable ntpd && sudo systemctl start ntpd"
- "sudo systemctl disable firewalld && sudo systemctl stop firewalld"
- "sudo sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/g' /etc/sysconfig/selinux"
Start Virtual Machine
Once you have created the 3 files in your Vagrant project directory, you are ready to start your virtual machine. Creating the VM for the first time and starting it every time after that uses the same command:
vagrant up
NOTE: During the startup process, the vagrant-hostmanager plugin will prompt you for a password. This is the sudo password for your local machine; the plugin needs sudo to update the /etc/hosts file. Once the process is complete you should have 1 server running. You can verify by looking at the VirtualBox user interface; you should have a virtual machine called docker running.
Connect to virtual machine
You can log in to the virtual machine via ssh using the vagrant ssh command.
vagrant ssh
Install Docker in Vagrant virtual machine
Now that you are connected to the virtual machine, it's time to install Docker. First we'll create the docker.repo file:
sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
Now we can install Docker:
sudo yum install docker-engine
Now we need to enable the Docker service:
sudo systemctl enable docker.service
Now we can start the Docker service:
sudo systemctl start docker
Load Docker HDP sandbox
We will attempt to load the HDP sandbox into Docker using the default Docker settings. This process will fail after the image has loaded approximately 10GB of data. First, make sure a copy of the Docker HDP sandbox file HDP_2.5_docker.tar.gz is located in your project directory. This will allow you to access the file via the /vagrant/ directory within your virtual machine. I'm doing this on a Mac and my file was downloaded to ~/Downloads. This command is run on your local computer:
cp ~/Downloads/HDP_2.5_docker.tar.gz ~/Development/Vagrant/centos7-docker/
Now we can try loading the sandbox image on the VM. You need to use sudo to interact with Docker. You should see something similar to the following:
sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
0b547722f59f: Loading layer [=====================================> ] 10.36 GB/13.84 GB
ApplyLayer exit status 1 stdout: stderr: write /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/lib/rt.jar: no space left on device
As you can see above, the load command failed and it did so a little more than 10GB into the process. That is because the Docker virtual machine image defaults to 10GB in size. We can check to see where our Docker Root is located:
sudo docker info | grep "Docker Root"
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /var/lib/docker
We can see from above that the default is /var/lib/docker.
Change Docker configuration
CentOS 7 uses systemd to manage services. You can read more about it here: systemd. The older methods of modifying the Docker configuration using /etc/sysconfig based configuration files should be avoided. While you can get everything working using that approach, it's better to use the appropriate systemd methods. Some people have modified the /etc/systemd/system/multi-user.target.wants/docker.service file to make changes to the Docker configuration. This file can be overwritten during software updates, so it is best not to modify it. Instead you should use /etc/systemd/system/docker.service.d/docker.conf. We need to create an /etc/systemd/system/docker.service.d directory. This is where our docker.conf configuration file will be located.
sudo mkdir -p /etc/systemd/system/docker.service.d
Now we can edit our docker.conf file:
sudo vi /etc/systemd/system/docker.service.d/docker.conf
You should add the following configuration to the file:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --graph="/mnt/docker-data" --storage-driver=overlay --storage-opt=dm.basesize=30G
The blank ExecStart= entry is not a typo. It is required to clear the default ExecStart command before the new one is set. The --graph parameter specifies the location where the Docker image file should be located. Make sure this location has sufficient room for your expected data size. The --storage-driver parameter specifies the storage driver for Docker. The default is devicemapper, which is limited to 10GB. Set it to overlay to allow for larger file sizes. You can read more about the storage drivers here: docker storage drivers. The --storage-opt parameter allows us to change the base image size of the virtual machine. In this example I've set my base image size to 30GB. You may want to use a larger size. Note that dm.basesize is documented as a devicemapper option, so it matters mainly if you stay on the devicemapper driver.
Restart Docker
Now that we've updated our configuration, we need to restart the Docker daemon.
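Before restarting, it may be worth confirming that the new location really has the room mentioned above. The following is a rough sketch, not part of the original procedure: the helper names free_gb and has_room are hypothetical, /mnt is assumed to be the filesystem that will hold /mnt/docker-data, and the 30 simply mirrors the base size chosen in docker.conf.

```shell
#!/bin/sh
# Print the free space, in whole gigabytes, of the filesystem holding a path.
# Prints 0 if the path cannot be inspected.
free_gb() {
    df -Pk "$1" 2>/dev/null |
        awk 'NR == 2 { gb = int($4 / 1024 / 1024) } END { print gb + 0 }'
}

# Succeed only if the path has at least the required number of free gigabytes.
has_room() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# /mnt is assumed to be the parent of the --graph location used above.
if has_room /mnt 30; then
    echo "enough room for the Docker data directory"
else
    echo "less than 30 GB free; pick another location or free up space"
fi
```

Once the location has enough room, systemd is reloaded and Docker restarted: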
sudo systemctl daemon-reload
sudo systemctl restart docker
We can now check to see where our Docker Root is. You should see something similar to this:
sudo docker info | grep "Docker Root"
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Docker Root Dir: /mnt/docker-data
As you can see, our Docker Root has moved to /mnt/docker-data, which is the location we specified with the --graph parameter in our docker.conf file.
Load Docker HDP sandbox
Now that we have updated our Docker configuration and restarted the daemon, we should be able to load our HDP sandbox. Let's run the load command again to see if it completes. You should see something similar to this:
sudo docker load < /vagrant/HDP_2.5_docker.tar.gz
b1b065555b8a: Loading layer [==================================================>] 202.2 MB/202.2 MB
0b547722f59f: Loading layer [==================================================>] 13.84 GB/13.84 GB
99d7327952e0: Loading layer [==================================================>] 234.8 MB/234.8 MB
294b1c0e07bd: Loading layer [==================================================>] 207.5 MB/207.5 MB
fd5c10f2f1a1: Loading layer [==================================================>] 387.6 kB/387.6 kB
6852ef70321d: Loading layer [==================================================>] 163 MB/163 MB
517f170bbf7f: Loading layer [==================================================>] 20.98 MB/20.98 MB
665edb80fc91: Loading layer [==================================================>] 337.4 kB/337.4 kB
Loaded image: sandbox:latest
As you can see, the load command was successful.
Review
If you successfully followed along with this tutorial, you were able to set up a CentOS 7 virtual machine using Vagrant. We updated the Docker configuration to use a different location for the virtual machine image and changed the base size of that image. Once these changes were complete, we were able to successfully load the HDP sandbox image. You can read more about how to install Docker on CentOS 7 here https://docs.docker.com/engine/installation/linux/centos/ and how to configure it here https://docs.docker.com/engine/admin/
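The Docker Root check used twice in this tutorial can also be scripted, for example to confirm the setting after a later upgrade. This is a small sketch under that assumption; the function name docker_root_dir is hypothetical and the sample text is copied from the output shown earlier.

```shell
#!/bin/sh
# Read `docker info` output on stdin and print the value of "Docker Root Dir".
docker_root_dir() {
    awk -F': ' '/Docker Root Dir/ { print $2 }'
}

# Sample text copied from the output shown earlier in this article.
printf '%s\n' \
    'WARNING: bridge-nf-call-iptables is disabled' \
    'Docker Root Dir: /mnt/docker-data' | docker_root_dir
# prints: /mnt/docker-data

# On the VM itself the same function can be fed live output:
#   sudo docker info 2>/dev/null | docker_root_dir
#   sudo docker images sandbox
```

Parsing on stdin keeps the helper usable without Docker installed, which makes it easy to try on saved output first.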
11-09-2016
06:22 PM
2 Kudos
@Abdelmajid Boutjim The best answer will depend on what the data looks like and what tools you have available. A common way to load CSV-based text files into HBase is to use the importtsv tool: http://hbase.apache.org/0.94/book/ops_mgt.html#importtsv Take a look at this HCC article, which is a tutorial you can follow: https://community.hortonworks.com/articles/4942/import-csv-data-into-hbase-using-importtsv.html
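For orientation, an importtsv invocation for a comma-delimited file looks roughly like the command printed below. This is a sketch only: the helper importtsv_cmd is hypothetical and prints the command instead of running it, and the table name, column family and input path are made up. The target table generally needs to exist in HBase beforehand.

```shell
#!/bin/sh
# Print (rather than run) an ImportTsv command for a comma-delimited file.
# usage: importtsv_cmd TABLE COLUMNS HDFS_INPUT_PATH
importtsv_cmd() {
    echo "hbase org.apache.hadoop.hbase.mapreduce.ImportTsv" \
         "-Dimporttsv.separator=," \
         "-Dimporttsv.columns=$2" \
         "$1 $3"
}

# Hypothetical table, column family and path.
# HBASE_ROW_KEY marks which input column becomes the row key.
importtsv_cmd customers 'HBASE_ROW_KEY,cf:name,cf:city' /tmp/customers.csv
```

Printing the command first makes it easy to review the column mapping before anything is written to the table.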
11-09-2016
06:14 PM
@Roberto Sancho 1. Have you tried running it with hive.execution.engine=mr to verify that it works properly? 2. Did you try adding the jar file to a Hive session to verify that it works properly? 3. Try setting the Tez classpath via tez.cluster.additional.classpath.prefix, which is set in tez-site.xml via Ambari. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_installing_manually_book/content/ref-ffec9e6b-41f4-47de-b5cd-1403b4c4a7c8.1.html
11-09-2016
05:48 PM
1 Kudo
@Jannik Franz You don't say whether you started the Kafka service. Kafka is the messaging system used to keep everything in sync. You can read more about the configuration here http://atlas.incubator.apache.org/Configuration.html and the architecture here http://atlas.incubator.apache.org/Architecture.html. Can you confirm that Kafka is started? If it is not, can you start Kafka and try to repeat the process?
11-09-2016
05:44 PM
1 Kudo
@Atsushi Marumo I have written an article guiding you through the process with CentOS 7. This is the preferred way to do this: https://community.hortonworks.com/content/kbentry/65714/how-to-modify-the-default-docker-configuration-on.html You can read more about CentOS 7 and Docker image location here: https://docs.docker.com/engine/admin/systemd/
11-09-2016
03:36 AM
@Marcia Hon I'm glad you were able to get things working!
11-08-2016
07:08 PM
@Roberto Sancho I've seen reports that Tez will remove the http-commons jar from the classpath. You can try adding it back into your session like this (update path and version numbers as appropriate): add jar /usr/hdp/2.3.0.0-2557/hive/lib/commons-httpclient-3.0.1.jar
Another thing you can do is set the execution engine in your session to see if the error goes away: set hive.execution.engine=mr;
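If adding the jar helps, typing it into every session gets tedious. One possible approach, sketched here as an assumption rather than a tested recommendation, is to keep the statement in a small initialization file for the Hive CLI. The helper name write_hive_init is hypothetical; the jar path is the one from the example above and should be adjusted to your version.

```shell
#!/bin/sh
# Write a Hive initialization file that re-adds a jar at session start.
# usage: write_hive_init FILE JAR_PATH
write_hive_init() {
    printf 'add jar %s;\n' "$2" > "$1"
}

init_file=$(mktemp)
write_hive_init "$init_file" /usr/hdp/2.3.0.0-2557/hive/lib/commons-httpclient-3.0.1.jar
cat "$init_file"
# prints: add jar /usr/hdp/2.3.0.0-2557/hive/lib/commons-httpclient-3.0.1.jar;

# The file could then be passed to the Hive CLI at startup:
#   hive -i "$init_file"
```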