Modern data-driven applications require a "Connected Platform" capable of moving data to and from the Internet of Things, mobile users, and social media in real time. To monetize that real-time data, the platform must be able to process petabytes of data to create adaptive learning algorithms and apply those algorithms in real time as the data streams in and out of the platform. However, the modern data application cannot be effectively utilized or operated without an application tier that allows the business to visualize, interact with, and act on the massive volumes of data and insight arriving in real time and accumulating in storage. The Hortonworks Connected Platform "HDP+HDF" can act as a PaaS that hosts the application tier of the modern data application alongside all of the data processing.

It is possible to use Slider to run a dockerized application managed by Yarn inside of the Hadoop cluster similar to an application PaaS. This can be accomplished as follows:

1. Create a web application project that includes the application server embedded in the package. The resulting package should be runnable, much like a Java runnable jar. This can be accomplished using Maven. Here is an example of the application packaging:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>BigData</groupId>
	<artifactId>ShopFloorUI</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>
	<properties>
		<docker.registry.name></docker.registry.name>
		<docker.repository.name>${docker.registry.name}vvaks/biologicsmanufacturingui</docker.repository.name>
		<tomcat.version>7.0.57</tomcat.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.apache.tomcat.embed</groupId>
			<artifactId>tomcat-embed-core</artifactId>
			<version>${tomcat.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.tomcat.embed</groupId>
			<artifactId>tomcat-embed-logging-juli</artifactId>
			<version>${tomcat.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.tomcat.embed</groupId>
			<artifactId>tomcat-embed-jasper</artifactId>
			<version>${tomcat.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.tomcat</groupId>
			<artifactId>tomcat-jasper</artifactId>
			<version>${tomcat.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.tomcat</groupId>
			<artifactId>tomcat-jasper-el</artifactId>
			<version>${tomcat.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.tomcat</groupId>
			<artifactId>tomcat-jsp-api</artifactId>
			<version>${tomcat.version}</version>
		</dependency>
		<dependency>
			<groupId>org.eclipse.jetty</groupId>
			<artifactId>jetty-client</artifactId>
			<version>9.3.6.v20151106</version>
		</dependency>
		<dependency>
			<groupId>org.eclipse.jetty</groupId>
			<artifactId>jetty-util</artifactId>
			<version>9.3.6.v20151106</version>
		</dependency>
		<dependency>
			<groupId>org.cometd.java</groupId>
			<artifactId>cometd-api</artifactId>
			<version>1.1.5</version>
		</dependency>
		<dependency>
			<groupId>org.cometd.java</groupId>
			<artifactId>cometd-java-client</artifactId>
			<version>3.0.7</version>
		</dependency>
		<dependency>
			<groupId>javax.el</groupId>
			<artifactId>el-api</artifactId>
			<version>2.2</version>
		</dependency>
		<dependency>
			<groupId>javax.servlet</groupId>
			<artifactId>javax.servlet-api</artifactId>
			<version>3.1.0</version>
		</dependency>
		<dependency>
			<groupId>javax.servlet.jsp</groupId>
			<artifactId>jsp-api</artifactId>
			<version>2.2</version>
		</dependency>
		<dependency>
			<groupId>javax.servlet.jsp.jstl</groupId>
			<artifactId>javax.servlet.jsp.jstl-api</artifactId>
			<version>1.2.1</version>
		</dependency>
		<dependency>
			<groupId>org.codehaus.jackson</groupId>
			<artifactId>jackson-core-asl</artifactId>
			<version>1.9.13</version>
		</dependency>
		<dependency>
			<groupId>org.codehaus.jackson</groupId>
			<artifactId>jackson-mapper-asl</artifactId>
			<version>1.9.13</version>
		</dependency>
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-simple</artifactId>
			<version>1.7.13</version>
		</dependency>
	</dependencies>
	<build>
		<finalName>${project.artifactId}</finalName>
		<sourceDirectory>src/</sourceDirectory>
		<resources>
			<resource>
				<directory>src/main/webapp</directory>
				<targetPath>META-INF/resources</targetPath>
			</resource>
			<resource>
				<directory>src/main/resources</directory>
				<targetPath>META-INF/resources</targetPath>
			</resource>
		</resources>
		<outputDirectory>classes/</outputDirectory>
		<plugins>
			<plugin>
				<artifactId>maven-assembly-plugin</artifactId>
				<configuration>
					<descriptorRefs>
					      <descriptorRef>jar-with-dependencies</descriptorRef>
					</descriptorRefs>
					<archive>
						<manifest>
							<mainClass>com.hortonworks.iot.shopfloorui.ShopFloorUIMain</mainClass>
						</manifest>
					</archive>
				</configuration>
				<executions>
					<execution>
						<phase>package</phase>
						<goals>
							<goal>single</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-war-plugin</artifactId>
				<version>2.1.1</version>
				<configuration>
					<webappDirectory>webapp/</webappDirectory>
					<finalName>ShopFloorUI</finalName>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.jolokia</groupId>
				<artifactId>docker-maven-plugin</artifactId>
				<version>0.13.3</version>
				<configuration>
					<images>
						<image>
							<!-- <alias>${project.artifactId}</alias>
							<name>${docker.repository.name}:${project.version}</name> -->
							<alias>biologicsmanufacturingui</alias>
							<name>${docker.repository.name}</name>
							<build>
								<from>java:8-jre</from>
								<maintainer>vvaks</maintainer>
								<assembly>
									<descriptor>docker-assembly.xml</descriptor>
								</assembly>
								<ports>
									<port>8090</port>
								</ports>
								<cmd>
									<shell>java -jar \
										/maven/ShopFloorUI-jar-with-dependencies.jar server \
										/maven/docker-config.yml</shell>
								</cmd>
							</build>
						</image>
					</images>
				</configuration>
			</plugin>
	</plugins>
	</build>
</project>

2. Create a docker container that contains the runnable package and runs the command to start that package on startup. The docker-maven-plugin automates the creation of a docker container from the runnable jar produced by the Maven assembly plugin. For the plugin to work, docker must be installed and accessible from the Eclipse session.

3. Create an account on https://hub.docker.com/. You will need this account to publish the docker container that you created locally. This is important because the Slider client will attempt to download the docker container from Docker Hub, not from the local repository. Otherwise it would be necessary to distribute the docker container to every single node in the cluster, since Yarn can decide to start it on any node manager.

4. Create the Slider configuration files:

appConfig.json - This file contains the command that the node manager will execute to start the docker container locally as well as the command to run periodically to check the health of the container. The example below starts two docker containers, one called MAPUI and another called COMETD.

{
    "schema": "http://example.org/specification/v2.0.0",
    "metadata": {},
    "global": {},
    "components": {
            "MAPUI": {
                "mapui.commandPath": "/usr/bin/docker", 
                "mapui.options":"-d --net=host",
                "mapui.statusCommand":"docker inspect -f {{.State.Running}} ${CONTAINER_ID} | grep true"
            },
            "COMETD": {
                "cometd.commandPath": "/usr/bin/docker",
                "cometd.options":"-d --net=host",
                "cometd.statusCommand":"docker inspect -f {{.State.Running}} ${CONTAINER_ID} | grep true"
            }
    }
}
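
To make the statusCommand concrete, here is a minimal Python sketch of the same health check: it asks `docker inspect` whether a container is running and reports the answer as a boolean. The function name `container_is_running` and its injectable `run` parameter are illustrative assumptions, not part of Slider; Slider itself simply runs the shell command from appConfig.json.

```python
import subprocess


def container_is_running(container_id, run=subprocess.run):
    """Return True if `docker inspect` reports the container as running.

    Mirrors the statusCommand in appConfig.json:
        docker inspect -f {{.State.Running}} ${CONTAINER_ID} | grep true
    `run` is injectable (hypothetical convenience) so the check can be
    exercised without a Docker daemon.
    """
    result = run(
        ["docker", "inspect", "-f", "{{.State.Running}}", container_id],
        capture_output=True,
        text=True,
    )
    # docker prints "true" or "false"; a non-zero exit code means the
    # container could not be inspected (for example, it does not exist).
    return result.returncode == 0 and result.stdout.strip() == "true"
```

Note that this compares the whole output to "true", which is slightly stricter than the substring match performed by `grep true`.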

metainfo.json - This file contains the image to download from docker hub as well as the ports that the container is listening on. The component names must match up across all three configuration files.

{
    "schemaVersion": "2.1",
    "application": {
        "name": "MAPUI",
        "components": [
            {
                "name": "MAPUI",
                "type": "docker",
                "dockerContainers": [
                    {
                        "name": "mapui",
                        "commandPath": "/usr/bin/docker",
                        "image": "vvaks/mapui",
                        "ports": [{"containerPort" : "8091", "hostPort" : "8091"}]
                    }
                ]
            },
            {
                "name": "COMETD",
                "type": "docker",
                "dockerContainers": [
                    {
                        "name": "cometd",
                        "commandPath": "/usr/bin/docker",
                        "image": "vvaks/cometd",
                        "ports": [{"containerPort" : "8090", "hostPort" : "8090"}]
                    }
                ]
            }
        ]
    }
}


resources.json - This file contains the resources required by the application. Slider will use these specifications to request the required resources from Yarn. The component names must match up across all three configuration files.

{
    "schema": "http://example.org/specification/v2.0.0", 
    "metadata": { }, 
    "global": { }, 
    "components": {"slider-appmaster": { }, 
        "MAPUI": {
            "yarn.role.priority": "1", 
            "yarn.component.instances": "1", 
            "yarn.memory": "256"
        },
        "COMETD": {
            "yarn.role.priority": "2",
            "yarn.component.instances": "1",
            "yarn.memory": "256"
        }
    }
}
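
Because the component names must match across all three files, a quick check before launching can save a failed deployment. The following is a minimal Python sketch, not a Slider feature: the function names are hypothetical, and it assumes that `slider-appmaster` appears only in resources.json, as in the example above.

```python
import json

# Assumption: Slider's own application master is declared only in
# resources.json, so it is excluded from the comparison.
APPMASTER = "slider-appmaster"


def load(path):
    """Parse one of the Slider JSON configuration files."""
    with open(path) as handle:
        return json.load(handle)


def component_names(app_config, metainfo, resources):
    """Return the component names declared in each parsed config file."""
    return {
        "appConfig.json": set(app_config.get("components", {})),
        "metainfo.json": {
            c["name"]
            for c in metainfo.get("application", {}).get("components", [])
        },
        "resources.json": set(resources.get("components", {})) - {APPMASTER},
    }


def find_mismatches(app_config, metainfo, resources):
    """Map each file name to the component names it is missing."""
    names = component_names(app_config, metainfo, resources)
    expected = set().union(*names.values())
    return {
        fname: sorted(expected - found)
        for fname, found in names.items()
        if expected - found
    }
```

For example, `find_mismatches(load("appConfig.json"), load("metainfo.json"), load("resources.json"))` returns an empty dictionary when the three files agree, and otherwise lists the component names each file lacks.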

5. Make sure that a Slider client is available on the host from which you will launch the request and that the Slider client is configured to point at the target Yarn cluster's Resource Manager. Then create the application:

slider create mapui --template /home/docker/dockerbuild/mapui/appConfig.json --metainfo /home/docker/dockerbuild/mapui/metainfo.json --resources /home/docker/dockerbuild/mapui/resources.json

Slider will reach out to Yarn, request the containers specified in resources.json, and then instruct Yarn to run the command specified in appConfig.json with the details specified in metainfo.json. At this point you should see the application listed as a Slider type application in the Yarn Resource Manager UI. You should be able to click on the application link and view the logs being generated by the containers as the application starts up. Of course, Docker must be installed and running on the nodes that make up the queue where Slider will request the application to start.
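
If the launch is scripted, the command line can be assembled from the application name and the directory holding the three files. This is a minimal Python sketch under that assumption; the helper name `slider_create_command` is hypothetical, and only the `--template`, `--metainfo`, and `--resources` flags shown above are used.

```python
from pathlib import PurePosixPath


def slider_create_command(app_name, config_dir):
    """Build the `slider create` argument list for one application.

    Assumes the three configuration files live in `config_dir` under
    the file names used in this article.
    """
    base = PurePosixPath(config_dir)
    return [
        "slider", "create", app_name,
        "--template", str(base / "appConfig.json"),
        "--metainfo", str(base / "metainfo.json"),
        "--resources", str(base / "resources.json"),
    ]
```

The resulting list can be passed to `subprocess.run(...)` on a host where the Slider client is installed and configured.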

It should be noted that this approach does not solve all of the problems that a PaaS does. The issue of application instance registry still has to be dealt with. There is no out-of-the-box approach that allows discovery and routing of the client to the application after it starts or upon container failure. The following link addresses how to deal with this issue:

https://slider.incubator.apache.org/design/registry/a_YARN_service_registry.html

All of these issues will be solved by the Yarn.Next initiative. The HDP engineering team is hard at work making this happen. Yarn.Next will embed all of the capabilities described above as part of core Yarn. This will allow the creation of a Modern Data Application, including all components like Storm, HBase, and the application tier, by simply providing Yarn with a JSON descriptor. The application will start with all of the required components pre-integrated and discoverable via standard DNS resolution. Stay tuned for the next installment.

For working examples, check out these Repos. Each of these is a working example of a modern data application running on the Hortonworks Connected Platform, including the application tier.

https://community.hortonworks.com/content/repo/27236/credit-fraud-prevention-demo.html

https://community.hortonworks.com/content/repo/29196/biologics-manufacturing-optimization-demo.html

https://community.hortonworks.com/content/repo/26288/telecom-predictive-maintenance.html

Comments
"There is no, out of the box approach, that allows discovery and routing of the client to the application after it starts or upon container failure."

Is there any way apart from node labels to tell Slider to request containers on specific nodes of the cluster? I fear otherwise this is not very useful.

However, it would be quite useful if you could say: start containers on datanodes 1-4 and try to keep them up. You could have a load balancer in front of them for high availability. Without that I do not see the use cases. I suppose you could do that with node labels, but it would be a big effort.


@Benjamin Leonhardi

With the release of Yarn.Next, the containers will receive their own IP address and get registered in DNS. The FQDN will be available via a REST call to Yarn. If the current Yarn container dies, the docker container will start in a different Yarn container somewhere in the cluster. As long as all clients are pointing at the FQDN of the application, the outage will be nearly transparent. In the meantime, there are several options using only Slider, but they require some scripting or registration in ZooKeeper. If you run:

slider lookup --id application_1462448051179_0002
2016-05-08 01:55:51,676 [main] INFO  impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-05-08 01:55:53,847 [main] WARN  shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2016-05-08 01:55:53,868 [main] INFO  client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
{
  "applicationId" : "application_1462448051179_0002",
  "applicationAttemptId" : "appattempt_1462448051179_0002_000001",
  "name" : "biologicsmanufacturingui",
  "applicationType" : "org-apache-slider",
  "user" : "root",
  "queue" : "default",
  "host" : "sandbox.hortonworks.com",
  "rpcPort" : 1024,
  "state" : "RUNNING",
  "diagnostics" : "",
  "url" : "http://sandbox.hortonworks.com:8088/proxy/application_1462448051179_0002/",
  "startTime" : 1462454411514,
  "finishTime" : 0,
  "finalStatus" : "UNDEFINED",
  "origTrackingUrl" : "http://sandbox.hortonworks.com:1025",
  "progress" : 1.0
}
2016-05-08 01:55:54,542 [main] INFO  util.ExitUtil - Exiting with status 0

You do get the host the container is currently bound to. Since the instructions bind the docker container to the host IP, this would allow URL discovery, but as I said, not out of the box. This article is merely the harbinger of Yarn.Next, as that will integrate the PaaS capabilities into Yarn itself, including application registration and discovery.
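
The scripting mentioned above could start from something like the following Python sketch, which pulls the JSON report out of `slider lookup` output (log lines and all) and builds a URL from the reported host. The function names are hypothetical. It also assumes that the `host` field is where the docker container runs, which holds on a single-node sandbox; on a multi-node cluster that field may instead name the node running the Slider application master, so verify before relying on it.

```python
import json


def parse_lookup_output(text):
    """Extract the JSON report from `slider lookup` output mixed with log lines."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it (for example, trailing log lines).
            report, _ = decoder.raw_decode(text, start)
            return report
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in slider lookup output")


def application_url(report, port):
    """Build a URL from the reported host and a hostPort from metainfo.json."""
    if report.get("state") != "RUNNING":
        raise RuntimeError(f"application is {report.get('state')}, not RUNNING")
    return f"http://{report['host']}:{port}/"
```

With the output shown above, `application_url(parse_lookup_output(output), 8090)` yields a URL on `sandbox.hortonworks.com`, which a script could then publish to clients or to a load balancer.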