Member since: 02-12-2016
Posts: 33
Kudos Received: 44
Solutions: 3

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 5815 | 11-10-2018 12:59 AM |
|  | 2161 | 10-30-2018 12:47 AM |
|  | 3499 | 04-25-2016 08:06 PM |
05-08-2020
09:30 AM
In Part 1, I shared an alternative architecture for Prometheus that increases the scalability and flexibility of time series metric monitoring. In Part 2, I walk through an extension of the architecture that unites metric and log data with a unified, scalable data pipeline.
Part 1 Architecture
Part 2 Architecture
By adding log collection agents (e.g. MiNiFi) and a search cluster (e.g. Solr), the solution architecture can be extended to support log data in addition to time series metrics. This has the advantage of reducing duplicated infrastructure components for a more efficient and supportable solution.
Additional NiFi processors can be added to the flow for pre-processing (e.g. filtering, routing, scoring) the incoming data (e.g. ERROR vs INFO messages). Rulesets (e.g. Drools) from expert systems can be embedded directly into the flow, while ML models can be either directly embedded or hosted as a service that NiFi calls. Further downstream, Flink can be used to apply stateful stream processing (e.g. joins, windowing).
By applying these advanced analytics to metrics and logs in-stream, before the data lands, operations teams can shift from digging through charts and graphs to acting on intelligent, targeted alerts with the full context necessary to resolve any issue that may arise.
The journey to streaming analytics with ML and expert systems requires rethinking architectures, but the timely insights it enables would otherwise not be possible. That value is well worth the upfront refactoring, and the result is a much more stable and efficient system in the long run.
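To make this concrete, once log events land in the search cluster they can be queried directly by operators or by downstream alerting jobs. The following is a minimal sketch against Solr's select API; the host, port, collection name (logs), and field name (level) are assumptions that depend on how the MiNiFi/NiFi flow indexes the data:

curl "http://<solr-host>:8983/solr/logs/select?q=level:ERROR&rows=10"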
11-27-2019
08:58 AM
2 Kudos
Traditional Prometheus Architecture (Image Courtesy: https://prometheus.io/docs/introduction/overview/)

Prometheus is great. It has a huge number of integrations and provides a great metric monitoring platform, especially when working with Kubernetes. However, it does have a few shortcomings. The aim of this alternative architecture is to preserve the best parts of Prometheus while augmenting its weaker points with more powerful technologies.

The service discovery and metric scraping framework in Prometheus is its greatest strength, but it is greatly limited by its tight coupling to the TSDB inside Prometheus. While it is possible to replace that TSDB with an external database, the data retrieval process only supports writing into this one database.

Maui Architecture

The greatest strength of Prometheus, the service discovery and metric scraping framework, can now be used within Apache NiFi with the introduction of the GetPrometheusMetrics processor. This processor uses cgo and JNI to leverage the actual Prometheus libraries for service discovery and metric scraping. The standard Prometheus YML configurations are provided to the processor, and JSON data is output as it scrapes metrics from configured and/or discovered endpoints. When combined with NiFi's HTTP listener processors, the entire data ingestion portion of Prometheus can be embedded within NiFi.

The advantage of NiFi for data ingestion is that it comes with a rich set of processors for transforming, filtering, routing, and publishing data, potentially to many different places. The ability to load data into the data store (or data stores) of choice increases extensibility and enables more advanced analytics.

One good option for the data store is Apache Druid. Druid was built for both real-time and historical analytics at scale (ingest of millions of events per second plus petabytes of history). It is supported natively by many dashboarding tools (such as Grafana or Superset), and it supports SQL through JDBC, making it accessible from a wide array of tools (such as Tableau). Druid addresses the scalability issues of the built-in TSDB while still providing a similar user experience and increasing extensibility to more user interfaces.

The option of sending scraped data to many locations provides an easy way to integrate with other monitoring frameworks, or to perform advanced analytics and machine learning. For example, loading metrics into Kafka makes them accessible in real time to stream processing engines (like Apache Flink), function-as-a-service engines (like OpenWhisk), and custom microservices. With this architecture, it is now possible to apply ML to Prometheus-scraped metrics in real time and to activate functions when anomalies are found.

Part 2 of this article can be found here.

Artifacts

The GetPrometheusMetrics processor can be found in this repository: https://github.com/SamHjelmfelt/nifi-prometheus-metrics

A sample NiFi template using GetPrometheusMetrics to write into both Druid and Kafka can be found here: https://gist.github.com/SamHjelmfelt/f04aae5489fa88bdedd4bba211d083e0
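As an illustration of the SQL access path, here is a minimal sketch of querying Druid's SQL-over-HTTP endpoint with curl; the router host and port, the datasource name (prometheus_metrics), and the metric_name column are assumptions that depend on how the NiFi flow ingests the data:

# Hypothetical example: list the ten most frequent metrics via Druid SQL over HTTP
curl -X POST -H "Content-Type: application/json" http://<druid-router>:8888/druid/v2/sql -d '{"query": "SELECT metric_name, COUNT(*) AS samples FROM prometheus_metrics GROUP BY metric_name ORDER BY samples DESC LIMIT 10"}'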
01-09-2019
02:31 AM
2 Kudos
Docker on YARN is relatively easy to set up on an existing cluster, but full clusters are not always available. My Ember Project was created to simplify dev/test for technologies managed by Ambari and Cloudera Manager. It provides utilities for creating dockerized clusters that use far fewer resources than a full bare metal or VM-based cluster. Additionally, by using pre-built images, the time it takes to get a cluster up and running can be reduced to less than 10 minutes.

The following four commands are all that is necessary to download and run a ~5GB image that comes preinstalled with Ambari, Zookeeper, HDFS, and YARN with Docker on YARN pre-configured. Docker containers spawned by YARN will be created on the host machine as peers to the container with YARN inside. All containers are launched into the "ember" docker network by default. Once the container is downloaded, it takes less than 5 minutes to start all services.

curl -L https://github.com/SamHjelmfelt/Ember/archive/v1.1.zip -o Ember_1.1.zip
unzip Ember_1.1.zip
cd Ember-1.1/
./ember.sh createFromPrebuiltSample samples/yarnquickstart/yarnquickstart-sample.ini

The Ambari UI can be found at http://localhost:8080
The YARN Resource Manager UI can be found at http://localhost:8088

Usage

The YARN service REST API documentation can be found here: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html
The YARN app CLI documentation can be found here: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application_or_app

Testing

Place the following service definition into a file (e.g. redis.json):

{
"name": "redis-service",
"version": "1.0.0",
"description": "redis example",
"components" :
[
{
"name": "redis",
"number_of_containers": 1,
"artifact": {
"id": "library/redis",
"type": "DOCKER"
},
"launch_command": "",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
}
}
}
]
}

Submit the service with the following curl command. YARN should respond back with the applicationId.

curl -X POST -H "Content-Type: application/json" http://localhost:8088/app/v1/services?user.name=ambari-qa -d @redis.json

The service status can be viewed on the YARN UI or through the REST API (python makes it easier to read):

curl http://localhost:8088/app/v1/services/redis-service?user.name=ambari-qa | python -m json.tool

The service name must be unique in the cluster. If you need to delete your service, the following command can be used:

curl -X DELETE http://localhost:8088/app/v1/services/redis-service?user.name=ambari-qa
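The same lifecycle can also be driven through the YARN app CLI referenced in the Usage section above; a minimal sketch, assuming the commands are run where the YARN client is installed (e.g. inside the Ember container) as a user with an HDFS home directory:

yarn app -launch redis-service redis.json
yarn app -status redis-service
yarn app -destroy redis-service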
11-10-2018
01:51 AM
5 Kudos
Update: See here for a Docker on YARN sandbox solution: https://community.hortonworks.com/articles/232540/docker-on-yarn-sandbox.html

Overview

This guide has been tested with and without Kerberos on HDP 3.0.1. YARN offers a DNS service backed by Zookeeper for service discovery, but that can be challenging to set up. For a quickstart scenario, I will use docker swarm and an overlay network instead. If your environment is a single host, the networking is even simpler. This configuration is not recommended for production.

I will use pssh to run commands in parallel across the cluster based on a hostlist file and a workerlist file. The hostlist file should contain every host in the cluster, and the workerlist file should include every node except for the one chosen to be the docker swarm master node.

Prerequisites

Install HDP 3.0.1 with or without Kerberos.

Install Docker on every host in the cluster:

#pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem -o 'StrictHostKeyChecking no'" "echo hostname"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo yum install -y yum-utils device-mapper-persistent-data lvm2"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo";
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo yum install -y docker-ce"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo systemctl start docker"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo systemctl enable docker"

Configure docker swarm and create an overlay network:

ssh -i ~/cloudbreak.pem cloudbreak@<masternode> "sudo docker swarm init"
pssh -i -h workerlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo <output from last command: docker swarm join ...>"
ssh -i ~/cloudbreak.pem cloudbreak@<masternode> "sudo docker network create -d overlay --attachable yarnnetwork"

If Kerberos is not enabled, create a default user for containers:

pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo useradd dockeruser"

Ambari

In the YARN general settings tab, toggle the Docker Runtime button to "Enabled". This should change the following setting in Advanced YARN-Site:

yarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor

In Advanced YARN-Site, change the following so that all YARN docker containers use the overlay network we created by default:

yarn.nodemanager.runtime.linux.docker.default-container-network=yarnnetwork
yarn.nodemanager.runtime.linux.docker.allowed-container-networks=host,none,bridge,yarnnetwork

In Custom YARN-Site, add the following if Kerberos is not enabled:

yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user=dockeruser

In Advanced Container Executor, set the trusted registries. Note that a value of * allows any image from Docker Hub to be run. To limit the docker images that can be run, set this property to a comma-separated list of trusted registries. Docker images have the form <registry>/<imageName>:<tag>.

docker_trusted_registries=*

Alternatively, the following Ambari blueprint encapsulates these configurations:

{
"configurations" : [
{
"yarn-site" : {
"properties" : {
"yarn.nodemanager.container-executor.class" : "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor",
"yarn.nodemanager.runtime.linux.docker.default-container-network" : "yarnnetwork",
"yarn.nodemanager.runtime.linux.docker.allowed-container-networks" : "host,none,bridge,yarnnetwork",
"yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user" : "dockeruser"
}
}
},
{
"container-executor" : {
"properties" : {
"docker_trusted_registries" : "library",
"docker_module_enabled" : "true"
}
}
}],
"host_groups" : [
{
"name" : "all",
"components" : [
{"name" : "HISTORYSERVER"},
{"name" : "NAMENODE"},
{"name" : "APP_TIMELINE_SERVER"},
{"name" : "NODEMANAGER"},
{"name" : "DATANODE"},
{"name" : "RESOURCEMANAGER"},
{"name" : "ZOOKEEPER_SERVER"},
{"name" : "SECONDARY_NAMENODE"},
{"name" : "HDFS_CLIENT"},
{"name" : "ZOOKEEPER_CLIENT"},
{"name" : "YARN_CLIENT"},
{"name" : "MAPREDUCE2_CLIENT"}
],
"cardinality" : "1"
}
],
"Blueprints" : {
"blueprint_name" : "yarn sample",
"stack_name" : "HDP",
"stack_version" : "3.0"
}
}

Save the configurations and restart YARN.

Usage

The YARN service REST API documentation can be found here: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html
The YARN app CLI documentation can be found here: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application_or_app

Testing without Kerberos

Place the following service definition into a file (e.g. yarnservice.json):

{
"name": "redis-service",
"version": "1.0.0",
"description": "redis example",
"components" :
[
{
"name": "redis",
"number_of_containers": 1,
"artifact": {
"id": "library/redis",
"type": "DOCKER"
},
"launch_command": "",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
}
}
}
]
}

Submit the service with the following curl command. YARN should respond back with the applicationId. The user will need write permission on their HDFS home directory (e.g. hdfs:/user/user1); ambari-qa has it by default.

curl -X POST -H "Content-Type: application/json" http://<resource manager>:8088/app/v1/services?user.name=ambari-qa -d @yarnservice.json

The service status can be viewed on the YARN UI, or through the REST APIs (python makes it easier to read):

curl http://<resource manager>:8088/app/v1/services/redis-service?user.name=ambari-qa | python -m json.tool

The service name must be unique in the cluster. If you need to delete your service, the following command can be used:

curl -X DELETE http://<resource manager>:8088/app/v1/services/redis-service?user.name=ambari-qa

Testing with Kerberos

Create a kerberos principal of the format <username>/<hostname>@<realm>. The hostname portion of the principal is required.

Create a keytab for the principal and upload it to HDFS:

kadmin.local
>addprinc user1/host1.example.com@EXAMPLE.COM
...
>xst -k user1_host1.keytab user1/host1.example.com@EXAMPLE.COM
...
>exit
hadoop fs -put user1_host1.keytab hdfs:/user/user1/
hadoop fs -chown user1 hdfs:/user/user1/

Place the following service definition into a file (e.g. yarnservice.json):

{
"name": "redis-service",
"version": "1.0.0",
"description": "redis example",
"components" :
[
{
"name": "redis",
"number_of_containers": 1,
"artifact": {
"id": "library/redis",
"type": "DOCKER"
},
"launch_command": "",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
}
}
}
],
"kerberos_principal": {
"principal_name": "user1/host1.example.com@EXAMPLE.COM",
"keytab": "hdfs:/user/user1/user1_host1.keytab"
}
}

Submit the service with the following curl command. YARN should respond back with the applicationId. User1 will need permission to write into their HDFS home directory (hdfs:/user/user1).

curl --negotiate -u : -X POST -H "Content-Type: application/json" http://<resource manager>:8088/app/v1/services -d @yarnservice.json

The service status can be viewed on the YARN UI, or through the REST APIs (python makes it easier to read):

curl --negotiate -u : http://<resource manager>:8088/app/v1/services/redis-service | python -m json.tool

The service name must be unique in the cluster. If you need to delete your service, the following command can be used:

curl --negotiate -u : -X DELETE http://<resource manager>:8088/app/v1/services/redis-service

Adding a local docker registry

Each node in the cluster needs a way of downloading docker images when a service is run. It is possible to just use the public docker hub, but that is not always an option. Similar to creating a local repo for yum, a local registry can be created for Docker. Here is a quickstart that skips the security steps. In production, security best practices should be followed.

On a master node, create an instance of the docker registry container. This will bind the registry to port 5000 on the host machine:

docker run -d -p 5000:5000 --restart=always --name registry -v /mnt/registry:/var/lib/registry registry:2

Configure each machine to skip HTTPS checks (https://docs.docker.com/registry/insecure/). Here are commands for CentOS 7:

pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo echo '{\"insecure-registries\": [\"<registryHost>:5000\"]}' | sudo tee --append /etc/docker/daemon.json"
pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo systemctl restart docker"

The YARN service configuration, docker_trusted_registries, needs to be set to star ( * ) or needs to have this local registry in its list (e.g. library,<registryHost>:5000). Restart YARN.

Testing Local Docker Registry

Build, tag, and push an image to the registry (note that docker repository names must be lowercase):

docker build -t myimage:1 .
docker tag myimage:1 <registryHost>:5000/myimage:1
docker push <registryHost>:5000/myimage:1

View the image via REST:

curl <registryHost>:5000/v2/_catalog
curl <registryHost>:5000/v2/myimage/tags/list

Download the image to all hosts in the cluster (only necessary to demonstrate connectivity; Docker and YARN do this automatically):

pssh -i -h hostlist -l cloudbreak -x "-i ~/cloudbreak.pem" "sudo docker pull <registryHost>:5000/myimage:1"

Now, when an image with this registry prefix (e.g. <registryHost>:5000/myimage:1) is used in a YARN service definition, YARN will use the image from this local registry instead of trying to pull from the default public location.
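As an optional sanity check after submitting a service, the Docker side can be inspected directly on a NodeManager host; a small sketch, assuming YARN's container_<id> naming for launched containers:

sudo docker ps --filter "name=container_"
sudo docker network inspect yarnnetwork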
11-10-2018
01:08 AM
That worked. I just uploaded one of the keytabs into hdfs:/user/user1/user1_host1.keytab and updated the "kerberos_principal" section as follows. Is there a plan to remove the hostname requirement? Thanks, @Gour Saha!

"kerberos_principal": {
  "principal_name": "user1/host1.example.com@EXAMPLE.COM",
  "keytab": "hdfs:/user/user1/user1_host1.keytab"
}
11-10-2018
12:59 AM
It turns out the guide I was following was outdated. I tried this again on a different cluster, and it worked perfectly with the Ambari default for yarn.nodemanager.linux-container-executor.cgroups.mount-path ("/cgroup").
11-10-2018
12:35 AM
I have been able to run Dockerized YARN services on a kerberized HDP 3.0.1 cluster using the following service configuration. However, this requires a service principal to be created for every node in the cluster in the format user1/hostname@EXAMPLE.COM. Additionally, the keytab for each of these principals must be distributed to their respective hosts. Is there a way around this?

{
"name": "hello-world",
"version": "1.0.0",
"description": "hello world example",
"components" :
[
{
"name": "hello",
"number_of_containers": 5,
"artifact": {
"id": "library/redis",
"type": "DOCKER"
},
"launch_command": "",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
}
}
}
],
"kerberos_principal": {
"principal_name": "user1/_HOST@EXAMPLE.COM",
"keytab": "file:///etc/security/keytabs/user1.keytab"
}
}

If I leave out the "kerberos_principal" section completely, I receive this error at service submission:

{"diagnostics":"Kerberos principal or keytab is missing."}

If I use a principal without the "_HOST" portion, I receive this error at service submission:

{"diagnostics":"Kerberos principal (user1@EXAMPLE.COM) does not contain a hostname."}

If the keytab does not exist on the worker node, I receive this error in the application log:

org.apache.hadoop.service.ServiceStateException: java.io.IOException:
SASL is configured for registry, but neither keytab/principal nor
java.security.auth.login.config system property are specified
Labels: Apache YARN, Docker
11-08-2018
08:32 PM
Comma-separating the launch_command fields and setting YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true in the service definition rather than in yarn-env.sh allowed me to use my custom entrypoint as expected in HDP 3.0.1. Strangely, exporting YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true in yarn-env.sh worked fine in my Apache Hadoop 3.1.1 environment, but not in my Ambari-installed HDP 3.0.1 environment. The stdout and stderr redirects are not included in the docker run command in the Apache release, so there must be some other setting involved, but I am past my issue, so I will leave it here. Thanks, @Tarun Parimi!
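For reference, a sketch of the relevant component in the service definition; the image name and arguments mirror the docker run output in the thread below and are placeholders rather than a general recommendation:

{
  "name": "myapp",
  "number_of_containers": 1,
  "artifact": {
    "id": "<registryHost>:5000/myapp:1.0-SNAPSHOT",
    "type": "DOCKER"
  },
  "launch_command": "input1,input2",
  "resource": {
    "cpus": 1,
    "memory": "256"
  },
  "configuration": {
    "env": {
      "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE": "true"
    }
  }
}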
11-05-2018
08:29 PM
@Tarun Parimi Thanks for the tip. I set "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE" in the yarn-env.sh, because I did not know I could set it on the service itself. This is a better solution, but does not address the main issue. Unfortunately, comma separating the input parameters did not help either. Here is the output:

Docker run command: /usr/bin/docker run --name=container_e06_1541194419811_0015_01_000002 --user=1015:1015 --net=yarnnetwork -v /hadoop/yarn/local/filecache:/hadoop/yarn/local/filecache:ro -v /hadoop/yarn/local/usercache/shjelmfelt/filecache:/hadoop/yarn/local/usercache/admin/filecache:ro -v /hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002:/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002 -v /hadoop/yarn/local/usercache/admin/appcache/application_1541194419811_0015:/hadoop/yarn/local/usercache/admin/appcache/application_1541194419811_0015 --cgroup-parent=/hadoop-yarn/container_e06_1541194419811_0015_01_000002 --cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP --cap-add=SETPCAP --cap-add=DAC_READ_SEARCH --cap-add=FSETID --cap-add=SYS_PTRACE --cap-add=CHOWN --cap-add=SYS_ADMIN --cap-add=AUDIT_WRITE --cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID --cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE --hostname=myappcontainers-0.myapp.admin.EXAMPLE.COM --group-add 1015 --env-file /hadoop/yarn/local/nmPrivate/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/docker.container_e06_1541194419811_0015_01_0000022942289027318724111.env 172.26.224.119:5000/myapp:1.0-SNAPSHOT input1 input2 1>/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/stdout.txt 2>/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/stderr.txt
Received input: input1,input2 1>/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/stdout.txt 2>/hadoop/yarn/log/application_1541194419811_0015/container_e06_1541194419811_0015_01_000002/stderr.txt