Member since
02-12-2016
33
Posts
44
Kudos Received
3
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5877 | 11-10-2018 12:59 AM | |
2191 | 10-30-2018 12:47 AM | |
3552 | 04-25-2016 08:06 PM |
05-08-2020
09:30 AM
In Part 1, I shared an alternative architecture for Prometheus that increases scalability and flexibility of time series metric monitoring. In part 2, I will walk through an extension of the architecture that unites metric and log data with a unified, scalable data pipeline.
Part 1 Architecture
Part 2 Architecture
By adding log collection agents (e.g. MiNiFi) and a search cluster (e.g. Solr), the solution architecture can be extended to support log data in addition to time series metrics. This has the advantage of reducing duplicated infrastructure components for a more efficient and supportable solution.
Additional NiFi processors can be added to the flow for pre-processing (e.g. filtering, routing, scoring) the incoming data (e.g. ERROR vs INFO messages). Rulesets (e.g. Drools) from expert systems can be embedded directly into the flow, while ML models can be either directly embedded or hosted as a service that NiFi calls. Further downstream, Flink can be used to apply stateful stream processing (e.g. joins, windowing).
By applying these advanced analytics to metrics and logs in-stream, before the data lands, operations teams can shift from digging through charts and graphs to acting on intelligent, targeted alerts with the full context necessary to resolve any issue that may arise.
The journey to streaming analytics with ML and expert systems requires rethinking architectures, but the value gained from the timely insights that otherwise would not be possible is well worth the upfront refactoring and results in a much more stable and efficient system in the long run.
... View more
11-27-2019
08:58 AM
2 Kudos
Traditional Prometheus Architecture (Image Courtesy: https://prometheus.io/docs/introduction/overview/) Prometheus is great. It has a huge number of integrations and provides a great metric monitoring platform, especially when working with Kubernetes. However, it does have a few shortcomings. The aim of this alternative architecture is to preserve the best parts of Prometheus while augmenting its weaker points with more powerful technologies. The service discovery and metric scraping framework in Prometheus is its greatest strength, but it is greatly limited by its tight coupling to the TSDB system inside Prometheus. While it is possible to replace the TSDB inside Prometheus with an external database, the data retrieval process only supports writing into this one database. Maui Architecture The greatest strength of Prometheus, the service discovery and metric scraping framework, can now be used within Apache NiFi with the introduction of the GetPrometheusMetrics processor. This processor uses cgo and JNI to leverage the actual Prometheus libraries for service discovery and metric scraping. The standard Prometheus YML configurations are provided to the processor and JSON data is output as it scrapes metrics from configured and/or discovered endpoints. When combined with NiFi’s HTTP listener processors, the entire data ingestion portion of Prometheus can be embedded within NiFi. The advantage of NiFi for data ingestion is that it comes with a rich set of processors for transforming, filtering, routing, and publishing data, potentially to many different places. The ability to load data into the data store (or data stores) of choice increases extensibility and enables more advanced analytics. One good option for the datastore is Apache Druid. Druid was built for both real-time and historical analytics at scale (ingest of millions of events per second plus petabytes of history). It is supported by many dashboarding tools natively (such as Grafana or Superset), and it supports SQL through JDBC, making it accessible from a wide array of tools (such as Tableau). Druid addresses the scalability issues of the built-in TSDB while still providing a similar user experience and increasing extensibility to more user interfaces. The option of sending scraped data to many locations provides an easy way to integrate with other monitoring frameworks, or to perform advanced analytics and machine learning. For example, loading metrics into Kafka makes it accessible in real-time to stream processing engines (like Apache Flink), function as a service engines (like OpenWhisk), and custom microservices. With this architecture it is now possible to apply ML to Prometheus-scaped metrics in real-time and to activate functions when anomalies are found. Part 2 of this article can be found here. Artifacts The GetPrometheusMetrics processor can be found in this repository: https://github.com/SamHjelmfelt/nifi-prometheus-metrics A sample NiFi template using GetPrometheusMetrics to write into both Druid and Kafka can be found here: https://gist.github.com/SamHjelmfelt/f04aae5489fa88bdedd4bba211d083e0
... View more
Labels:
01-09-2019
02:31 AM
2 Kudos
Docker on YARN is relatively easy to set up on an existing cluster, but full clusters are not always available. My Ember Project was created to simplify dev/test for technologies managed by Ambari and Cloudera Manager. It provides utilities for creating dockerized clusters that use far fewer resources than a full bare metal or VM-based cluster. Additionally, by using pre-built images, the time it takes to get a cluster up and running can be reduced to less than 10 minutes. The following four commands are all that is necessary to download and run a ~5GB image that comes preinstalled with Ambari, Zookeeper, HDFS, and YARN with Docker on YARN pre-configured. Docker containers spawned by YARN will be created on the host machine as peers to the container with YARN inside. All containers are launched into the "ember" docker network by default. Once the container is downloaded, it takes less than 5 minutes to start all services. curl -L https://github.com/SamHjelmfelt/Ember/archive/v1.1.zip -o Ember_1.1.zip
unzip Ember_1.1.zip
cd Ember-1.1/
./ember.sh createFromPrebuiltSample samples/yarnquickstart/yarnquickstart-sample.ini The Ambari UI can be found at http://localhost:8080 The YARN Resource Manager UI can be found at http://localhost:8088 Usage The YARN service REST API documentation can be found here: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html The YARN app CLI documentation can be found here: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application_or_app Testing Place the following service definition into a file (e.g. redis.json) {
"name": "redis-service",
"version": "1.0.0",
"description": "redis example",
"components" :
[
{
"name": "redis",
"number_of_containers": 1,
"artifact": {
"id": "library/redis",
"type": "DOCKER"
},
"launch_command": "",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
}
}
}
]
} Submit the service with the following curl command. YARN should respond back with the applicationId curl -X POST -H "Content-Type: application/json" http://localhost:8088/app/v1/services?user.name=ambari-qa -d @redis.json The service status can be viewed on the YARN UI or through the REST API (python makes it easier to read): curl http://localhost:8088/app/v1/services/redis-service?user.name=ambari-qa | python -m json.tool The service name must be unique in the cluster. If you need to delete your service, the following command can be used: curl -X DELETE http://localhost:8088/app/v1/services/redis-service?user.name=ambari-qa
... View more
Labels:
11-10-2018
01:51 AM
5 Kudos
Update: See here for a Docker on YARN sandbox solution: https://community.hortonworks.com/articles/232540/docker-on-yarn-sandbox.html Overview This guide has been tested with and without Kerberos on HDP 3.0.1. YARN offers a DNS service backed by Zookeeper for service discovery, but that can be challenging to setup. For a quickstart scenario, I will use docker swarm and an overlay network instead. If your environment is a single host, the networking is even simpler. This configuration is not recommended for production. I will use pssh to run commands in parallel across the cluster based on a hostlist file and a workerlist file. The hostlist file should contain every host in the cluster, and the workerlist file should include every node except for the one chosen to be the docker swarm master node. Prerequisites Install HDP 3.0.1 with or without Kerberos Install Docker on every host in the cluster #pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem -o 'StrictHostKeyChecking no'" "echo hostname"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo yum install -y yum-utils device-mapper-persistent-data lvm2"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo";
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo yum install -y docker-ce"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo systemctl start docker"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo systemctl enable docker" Configure docker swarm and create an overlay network ssh -i ~/cloudbreak.pem cloudbreak@<masternode> "sudo docker swarm init"
pssh -i -h workerlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo <output from last command: docker swarm join ...>"
ssh -i ~/cloudbreak.pem cloudbreak@<masternode> "sudo docker network create -d overlay --attachable yarnnetwork" If Kerberos is not enabled, create a default user for containers: pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo useradd dockeruser" Ambari In the YARN general settings tab, toggle the Docker Runtime button to "Enabled". This should change the following setting in Advanced YARN-Site: yarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor In Advanced YARN-Site, change the following, so all YARN docker containers use the overlay network we created by default: yarn.nodemanager.runtime.linux.docker.default-container-network=yarnnetwork
yarn.nodemanager.runtime.linux.docker.allowed-container-networks=host,none,bridge,yarnnetwork In Custom YARN-Site, add the following if kerberos is not enabled: yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=dockeruser In Advanced Container Executor: Note that this allows any image from docker hub to be run. to limit the that docker images that can be run, set this property to a comma separated list of trusted registries. Docker images have the form <registry>/<imageName>:<tag>. docker_trusted_registries=* Alternatively, the following Ambari blueprint encapsulates these configurations: {
"configurations" : [
{
"yarn-site" : {
"properties" : {
"yarn.nodemanager.container-executor.class" : "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
"yarn.nodemanager.runtime.linux.docker.default-container-network" : "yarnnetwork",
"yarn.nodemanager.runtime.linux.docker.allowed-container-networks" : "host,none,bridge,yarnnetwork",
"yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" : "dockeruser"
}
}
},
{
"container-executor" : {
"properties" : {
"docker_trusted_registries" : "library",
"docker_module_enabled" : "true"
}
}
}],
"host_groups" : [
{
"name" : "all",
"components" : [
{"name" : "HISTORYSERVER"},
{"name" : "NAMENODE"},
{"name" : "APP_TIMELINE_SERVER"},
{"name" : "NODEMANAGER"},
{"name" : "DATANODE"},
{"name" : "RESOURCEMANAGER"},
{"name" : "ZOOKEEPER_SERVER"},
{"name" : "SECONDARY_NAMENODE"},
{"name" : "HDFS_CLIENT"},
{"name" : "ZOOKEEPER_CLIENT"},
{"name" : "YARN_CLIENT"},
{"name" : "MAPREDUCE2_CLIENT"}
],
"cardinality" : "1"
}
],
"Blueprints" : {
"blueprint_name" : "yarn sample",
"stack_name" : "HDP",
"stack_version" : "3.0"
}
} Save the configurations and restart YARN Usage The YARN service REST API documentation can be found here: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html The YARN app CLI documentation can be found here: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application_or_app Testing without Kerberos Place the following service definition into a file (e.g. yarnservice.json) {
"name": "redis-service",
"version": "1.0.0",
"description": "redis example",
"components" :
[
{
"name": "redis",
"number_of_containers": 1,
"artifact": {
"id": "library/redis",
"type": "DOCKER"
},
"launch_command": "",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
}
}
}
]
} Submit the service with the following curl command. YARN should respond back with the applicationId The user will need write permission on their HDFS home directory(e.g. hdfs:/user/user1). ambari-qa has it by default. curl -X POST -H "Content-Type: application/json" http://<resource manager>:8088/app/v1/services?user.name=ambari-qa -d @yarnservice.json The service status can be viewed on the YARN UI, or through the REST APIs (python makes it easier to read): curl http://<resource manager>:8088/app/v1/services/redis-service?user.name=ambari-qa | python -m json.tool The service name must be unique in the cluster. If you need to delete your service, the following command can be used: curl -X DELETE http://<resource manager>:8088/app/v1/services/redis-service?user.name=ambari-qa Testing with Kerberos Create a kerberos principal of the format <username>/<hostname>@<realm> The hostname portion of the principal is required. Create a keytab for the principal and upload it to HDFS kadmin.local
>addprinc user1/host1.example.com@EXAMPLE.COM
...
>xst -k user1_host1.keytab user1/host1.example.com@EXAMPLE.COM
...
>exit
hadoop fs -put user1_host1.keytab hdfs:/user/user1/
hadoop fs -chown user1 hdfs:/user/user1/ Place the following service definition into a file (e.g. yarnservice.json) {
"name": "redis-service",
"version": "1.0.0",
"description": "redis example",
"components" :
[
{
"name": "redis",
"number_of_containers": 1,
"artifact": {
"id": "library/redis",
"type": "DOCKER"
},
"launch_command": "",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
}
}
}
],
"kerberos_principal": {
"principal_name": "user1/host1.example.com@EXAMPLE.COM",
"keytab": "hdfs:/user/user1/user1_host1.keytab"
}
} Submit the service with the following curl command. YARN should respond back with the applicationId User1 will need permission to write into their HDFS home directory: (hdfs:/user/user1) curl --negotiate -u : -X POST -H "Content-Type: application/json" http://<resource manager>:8088/app/v1/services -d @yarnservice.json The service status can be viewed on the YARN UI, or through the REST APIs (python makes it easier to read): curl --negotiate -u : http://<resource manager>:8088/app/v1/services/redis-service | python -m json.tool The service name must be unique in the cluster. If you need to delete your service, the following command can be used: curl --negotiate -u : -X DELETE http://<resource manager>:8088/app/v1/services/redis-service Adding a local docker registry Each node in the cluster needs a way of downloading docker images when a service is run. It is possible to just use the public docker hub, but that is not always an option. Similar to creating a local repo for yum, a local registry can be created for Docker. Here is a quickstart that skips the security steps. In production, security best practices should be followed: On a master node, create an instance of the docker registry container. This will bind the registry to port 5000 on the host machine. docker run -d -p 5000:5000 --restart=always --name registry -v /mnt/registry:/var/lib/registry registry:2 Configure each machine to skip HTTPS checks: https://docs.docker.com/registry/insecure/. Here are commands for CentOS 7: pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo echo '{\"insecure-registries\": [\"<registryHost>:5000\"]}' | sudo tee --append /etc/docker/daemon.json"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo systemctl restart docker" The YARN service configuration, docker_trusted_registries, needs to be set to star ( * ) or needs to have this local registry in its list (e.g. library,<registryHost>:5000). Restart YARN Testing Local Docker Registry Build, tag, and push an image to the registry docker build -t myImage:1 .
docker tag myImage:1 <registryHost>:5000/myImage:1
docker push <registryHost>:5000/myImage:1 View image via REST curl <registryHost>:5000/v2/_catalog
curl <registryHost>:5000/v2/_catalog/myImage/tags/list Download image to all hosts in cluster (only necessary to demonstrate connectivity. Docker and YARN do this automatically) pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo docker pull <registryHost>:5000/myImage:1" Now, when an image with this registry prefix (e.g. <registryHost>:5000/myImage:1) is used in a YARN service definition, YARN will use the image from this local registry instead of trying to pull from the default public location.
... View more
Labels:
11-10-2018
01:08 AM
That worked. I just uploaded one of the keytabs into hdfs:/user/user1/user1_host1.keytab and updated the "kerberos_principal" section as follows. Is there a plan to remove the hostname requirement? Thanks, @Gour Saha! "kerberos_principal": { "principal_name": "user1/host1.example.com@EXAMPLE.COM", "keytab": "hdfs:/user/user1/user1_host1.keytab" }
... View more
11-10-2018
12:59 AM
Turns out, the guide that I was following was outdated. I tried this again on a different cluster and it worked perfectly with the Ambari default for yarn.nodemanager.linux-container-executor.cgroups.mount-path ("/cgroup")
... View more
11-10-2018
12:35 AM
I have been able to run Dockerized YARN services on a kerberized HDP 3.0.1 cluster using the following service configuration. However, this requires a service principal to be created for every node in the cluster in the format user1/hostname@EXAMPLE.COM. Additionally, the keytab for each of these principals must be distributed to their respective hosts. Is there a way around this? {
"name": "hello-world",
"version": "1.0.0",
"description": "hello world example",
"components" :
[
{
"name": "hello",
"number_of_containers": 5,
"artifact": {
"id": "library/redis",
"type": "DOCKER"
},
"launch_command": "",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
}
}
}
],
"kerberos_principal": {
"principal_name": "user1/_HOST@EXAMPLE.COM",
"keytab": "file:///etc/security/keytabs/user1.keytab"
}
} If I leave out the "kerberos_principal" section completely, I receive this error at service submission: {"diagnostics":"Kerberos principal or keytab is missing."} If I use a principal without the "_HOST" portion, I receive this error at service submission: {"diagnostics":"Kerberos principal (user1@EXAMPLE.COM) does not contain a hostname."} If the keytab does not exist on the worker node, I receive this error in the application log: org.apache.hadoop.service.ServiceStateException: java.io.IOException:
SASL is configured for registry, but neither keytab/principal nor
java.security.auth.login.config system property are specified
... View more
Labels:
- Labels:
-
Apache YARN
-
Docker
11-08-2018
08:32 PM
Comma separating the launch_command fields and setting YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true in the service definition rather than yarn-site.env allowed me to use my custom entrypoint as expected in HDP 3.0.1. Strangely, exporting YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true in the yarn-site.env worked fine in my Apache Hadoop 3.1.1 environment, but not in my Ambari-installed HDP 3.0.1 environment. The stdout and stderr redirects are not included in the docker run command in the Apache release. Must be some other setting involved, but I am past my issue, so I will leave it here. Thanks, @Tarun Parimi!
... View more
11-05-2018
08:29 PM
@Tarun Parimi Thanks for the tip. I set "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE" in the yarn-env.sh, because I did not know I could set it on the service itself. This is a better solution, but does not address the main issue. Unfortunately, comma separating the input parameters did not help either. Here is the output: Docker run command: /usr/bin/docker run --name=container_e06_1541194419811_0015_01_000002 --user=1015:1015 --net=yarnnetwork -v /hadoop/yarn/local/filecache:/hadoop/yarn/local/filecache:ro -v /hadoop/yarn/local/usercache/shjelmfelt/filecache:/hadoop/yarn/local/usercache/admin/filecache:ro -v /hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002:/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002 -v /hadoop/yarn/local/usercache/admin/appcache/application_1541194419811_0015:/hadoop/yarn/local/usercache/admin/appcache/application_1541194419811_0015 --cgroup-parent=/hadoop-yarn/container_e06_1541194419811_0015_01_000002 --cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP --cap-add=SETPCAP --cap-add=DAC_READ_SEARCH --cap-add=FSETID --cap-add=SYS_PTRACE --cap-add=CHOWN --cap-add=SYS_ADMIN --cap-add=AUDIT_WRITE --cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID --cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE --hostname=myappcontainers-0.myapp.admin.EXAMPLE.COM --group-add 1015 --env-file /hadoop/yarn/local/nmPrivate/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/docker.container_e06_1541194419811_0015_01_0000022942289027318724111.env 172.26.224.119:5000/myapp:1.0-SNAPSHOT input1 input2 1>/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/stdout.txt 2>/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/stderr.txt
Received input: input1,input2 1>/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/stdout.txt 2>/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/stderr.txt
... View more