Member since: 02-12-2016
Posts: 33
Kudos Received: 44
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5829 | 11-10-2018 12:59 AM
 | 2167 | 10-30-2018 12:47 AM
 | 3504 | 04-25-2016 08:06 PM
11-02-2018
11:11 PM
When running a Dockerized YARN service, YARN is not providing the correct input arguments. The service is defined as follows. The entry point in the Dockerfile is ["java", "-jar", "myapp.jar"]. For debugging, it prints the incoming arguments and exits.

{
"name": "myapp",
"version": "1.0.0",
"description": "myapp",
"components" :
[
{
"name": "myappcontainers",
"number_of_containers": 1,
"artifact": {
"id": "myapp:1.0-SNAPSHOT",
"type": "DOCKER"
},
"launch_command": "input1 input2",
"resource": {
"cpus": 1,
"memory": "256"
}
}
]
}

Here is the output from YARN:

Launching docker container...
Docker run command: /usr/bin/docker run --name=container_e06_1541194419811_0006_01_000026 --user=1015:1015 --net=yarnnetwork -v /hadoop/yarn/local/filecache:/hadoop/yarn/local/filecache:ro -v /hadoop/yarn/local/usercache/admin/filecache:/hadoop/yarn/local/usercache/admin/filecache:ro -v /hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026:/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026 -v /hadoop/yarn/local/usercache/admin/appcache/application_1541194419811_0006:/hadoop/yarn/local/usercache/admin/appcache/application_1541194419811_0006 --cgroup-parent=/hadoop-yarn/container_e06_1541194419811_0006_01_000026 --cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP --cap-add=SETPCAP --cap-add=DAC_READ_SEARCH --cap-add=FSETID --cap-add=SYS_PTRACE --cap-add=CHOWN --cap-add=SYS_ADMIN --cap-add=AUDIT_WRITE --cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID --cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE --hostname=myappcontainers-3.myapp.admin.EXAMPLE.COM --group-add 1015 --env-file /hadoop/yarn/local/nmPrivate/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/docker.container_e06_1541194419811_0006_01_0000264842430064377299975.env myapp:1.0-SNAPSHOT input1 input2 1>/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/stdout.txt 2>/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/stderr.txt
Received input: input1 input2 1>/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/stdout.txt 2>/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/stderr.txt

The program itself is given the redirection commands. Is there a way to disable this behavior? The only two workarounds I have identified are:
1. Change the ENTRYPOINT in the Dockerfile to ["sh", "-c"] and the launch_command to "java -jar myapp.jar".
2. Change the program to use or ignore the "1>" and "2>" inputs.

Both of these solutions require repackaging in a way that does not conform to Docker best practices.
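For reference, a minimal sketch of the first workaround (the base image and jar path are placeholders, and the input arguments presumably move into the launch_command):

# Dockerfile
FROM openjdk:8-jre
COPY myapp.jar /myapp.jar
ENTRYPOINT ["sh", "-c"]

# service definition excerpt
"launch_command": "java -jar /myapp.jar input1 input2",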
Labels:
- Apache YARN
- Docker
11-02-2018
07:14 PM
1 Kudo
I was able to work around this error by running:

sudo mkdir /sys/fs/cgroup/blkio/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/blkio/hadoop-yarn

I then received a very similar message for "/sys/fs/cgroup/memory/hadoop-yarn" and "/sys/fs/cgroup/cpu/hadoop-yarn". After creating these directories as well, the node managers came up. Here is the full work-around that was run on each node:

sudo mkdir /sys/fs/cgroup/blkio/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/blkio/hadoop-yarn
sudo mkdir /sys/fs/cgroup/memory/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/memory/hadoop-yarn
sudo mkdir /sys/fs/cgroup/cpu/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/cpu/hadoop-yarn
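The same commands can be expressed as a small loop (a sketch; it assumes the same three controllers and that the NodeManager runs as the yarn user):

for controller in blkio memory cpu; do
  sudo mkdir -p /sys/fs/cgroup/$controller/hadoop-yarn
  sudo chown -R yarn:yarn /sys/fs/cgroup/$controller/hadoop-yarn
done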
11-02-2018
07:10 PM
I am receiving the following message from each node manager when attempting to start YARN after enabling Docker. What is the root cause?

2018-11-02 18:28:50,974 INFO recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:checkVersion(1662)) - Loaded NM state version info 1.2
2018-11-02 18:28:51,174 INFO resources.ResourceHandlerModule (ResourceHandlerModule.java:initNetworkResourceHandler(182)) - Using traffic control bandwidth handler
2018-11-02 18:28:51,193 WARN resources.CGroupsBlkioResourceHandlerImpl (CGroupsBlkioResourceHandlerImpl.java:checkDiskScheduler(101)) - Device vda does not use the CFQ scheduler; disk isolation using CGroups will not work on this partition.
2018-11-02 18:28:51,199 INFO resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:initializePreMountedCGroupController(410)) - Initializing mounted controller blkio at /sys/fs/cgroup/blkio/hadoop-yarn
2018-11-02 18:28:51,199 INFO resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:initializePreMountedCGroupController(420)) - Yarn control group does not exist. Creating /sys/fs/cgroup/blkio/hadoop-yarn
2018-11-02 18:28:51,200 ERROR nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:init(323)) - Failed to bootstrap configured resource subsystems!
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Unexpected: Cannot create yarn cgroup Subsystem:blkio Mount points:/proc/mounts User:yarn Path:/sys/fs/cgroup/blkio/hadoop-yarn
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializePreMountedCGroupController(CGroupsHandlerImpl.java:425)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:377)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsBlkioResourceHandlerImpl.bootstrap(CGroupsBlkioResourceHandlerImpl.java:123)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:320)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
2018-11-02 18:28:51,205 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:393)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:324)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
... 3 more
2018-11-02 18:28:51,207 ERROR nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(936)) - Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:393)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:324)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
... 3 more
Labels:
- Apache YARN
- Docker
10-30-2018
12:47 AM
This error was resolved by explicitly specifying the content type as JSON:

curl ... -H "Content-Type: application/json"
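For example, the full request would look something like this (the ResourceManager host, port, and file name are placeholders):

curl -X POST -H "Content-Type: application/json" -d @myapp.json http://<resourcemanager-host>:<port>/app/v1/services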
10-30-2018
12:39 AM
I receive the following generic error when attempting to POST a YARN service definition using the API "/app/v1/services":

2018-10-27 09:33:24,440 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException
at com.sun.jersey.server.impl.uri.rules.TerminatingRule.accept(TerminatingRule.java:66)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
Labels:
- Apache YARN
- Docker
10-04-2018
08:11 PM
2 Kudos
Most data movement use cases do not require a “shuffle phase” for redistributing FlowFiles across a NiFi cluster, but there are a few cases where it is useful. For example:

- ListFile -> FetchFile
- ListHDFS -> FetchHDFS
- ListFTP -> FetchFTP
- GenerateTableFetch -> ExecuteSQL
- GetSQS -> FetchS3

In each case, the flow starts with a processor that generates tasks to run (e.g. filenames), followed by the actual execution of those tasks. To scale, tasks need to run on each node in the NiFi cluster, but for consistency, the task generation should only run on the primary node. The solution is to introduce a shuffle (aka load balancing) step in between task generation and task execution.

Processors can be configured to run on the primary node by going to “View Configuration” -> “Scheduling” and selecting “Primary node only” under “Execution”.

The shuffle step is not an explicit component on the NiFi canvas, but rather the combination of a Remote Input Port and a Remote Process Group pointing at the local cluster. FlowFiles that are sent to the Remote Process Group will be load balanced over Site-to-Site and come back into the flow via the Remote Input Port. Under “Manage Remote Ports” on the Remote Process Group there are batch settings that help control the load balancing.

Here are two example flows that use this design pattern:
05-14-2018
08:51 PM
7 Kudos
A quick glance at NiFi’s 252+ processors shows that it can solve a wide array of use cases out of the box. What is not immediately obvious is the flexibility that its attributes and expression language can provide. This allows it to quickly, easily, and efficiently solve complex use cases that would require significant customization in other solutions. For example, consider sending all of the incoming data to both Kafka and HDFS while sending 10% to a dev environment and a portion to a partner system based on the content of the data (e.g. CustomerName=ABC). These more complex routing scenarios are easily accommodated using UpdateAttribute, RouteOnAttribute, and RouteOnContent.

Another example of NiFi’s flexibility is the ability to multiplex data flows. In traditional ETL systems, the schema is tightly coupled to the data as it moves between systems, because transformations occur in transit. In more modern ELT scenarios, the data is often loaded into the destination with minimal transformations before the complex transformation step is kicked off. This has many advantages and allows NiFi to focus on the EL portion of the flow. When focused on EL, there is far less need for the movement engine to be schema aware, since it is generally focused on simple routing, filtering, format translation, and concatenation. One common scenario is loading data from many Kafka topics into their respective HDFS directories and/or Hive tables with only simple transformations. In traditional systems, this would require one flow per topic, but by parameterizing flows, one flow can be used for all topics.

In the image below you can see the configurations and attributes that make this possible. The ConsumeKafka processor can use a list of topics or a regular expression to consume from many topics at once. Each FlowFile (e.g. a batch of Kafka messages) has an attribute added called "kafka.topic" to identify its source topic.

Next, in order to load streaming data into HDFS or Hive, it is recommended to use MergeContent to combine records into large files (e.g. every 1 GB or every 15 minutes). In MergeContent, setting the “correlation attribute” configuration to “kafka.topic” ensures that only records from the same Kafka topic are combined (similar to a group-by clause). After the files are merged, the “directory” configuration in HDFS can be parameterized (e.g. /myDir/${kafka.topic}) in order to load the data into the correct directory based on the Kafka topic name.
Note that this diagram includes a retry-and-notify-on-failure process group. This type of solution is highly recommended for production flows. More information can be found here.

This example could easily be extended to include file format translation (e.g. ConvertAvroToORC), filtering (e.g. RouteOnContent), or Kafka-topic to HDFS-directory mapping (e.g. UpdateAttribute). It can even trigger downstream processing (e.g. ExecuteSparkInteractive, PutHiveQL, ExecuteStreamCommand, etc.) or periodically update metrics and logging solutions such as Graphite, Druid, or Solr. Of course, this solution also applies to many more data stores than just Kafka and HDFS.

Overall, parameterizing flows in NiFi for multiplexing can reduce complexity for EL use cases and simplify administration. This design is straightforward to implement and uses core NiFi features. It is also easily extended to a variety of use cases.
02-27-2018
08:25 PM
Hi Mitthu,

Here is an article I wrote about handling failures in NiFi: https://community.hortonworks.com/articles/76598/nifi-error-handling-design-pattern-1.html

It describes how to retry failures X times, then send an email, then wait for administrative input. This might help you address the requirements of your solution. You could also add a PutEmail processor on the "Success" relationship to send an email after processing succeeds.
01-05-2017
10:20 PM
9 Kudos
Many process groups have a success and failure output relationship. A common question is how to best handle these failures. For invalid data, it makes sense to output the flow files to an HDFS directory for analysis, but not when the failure was caused by an external dependency (e.g. HDFS, Kafka, FTP). A simple solution might be to loop the failures back to retry, but then the flow may fail repeatedly without notifying an administrator. A better solution is to retry three times and then, if it still has not succeeded, notify an administrator and have the flow file wait before trying again. This gives the administrator time to resolve the issue and the ability to quickly and easily retry the flow files.

Below (and attached) is a simple process group that implements this logic. The failed flow files come in through the input port. The UpdateAttribute processor sets the retryCount attribute to one, or increments it if it has already been set. The RouteOnAttribute processor determines whether the retryCount attribute is over a threshold (e.g. three). If it is not over the threshold, the flow file is routed out through the retry port. If it is over the threshold, the flow file is routed to a PutEmail processor.

The last UpdateAttribute processor should be disabled at all times so that the flow files will queue up after the PutEmail processor and wait for the administrator to resolve the issue. Once the issue is resolved, the administrator simply enables, starts, stops, and disables this last processor. The retryCount attribute will be set to zero and the flow file will go out through the retry port. If the flow file still does not succeed, it will go back into this process group and the administrator will get another email. Note that a MergeContent processor could be used to reduce the number of emails, if necessary.
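As a rough sketch of the expression language involved (the attribute and rule names here are illustrative; the threshold of three matches the example above):

UpdateAttribute (on entry) -> retryCount = ${retryCount:replaceNull(0):plus(1)}
RouteOnAttribute -> over.threshold = ${retryCount:gt(3)}
UpdateAttribute (normally disabled) -> retryCount = 0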
11-08-2016
07:59 PM
1 Kudo
You will have the option to select which services you want to install, similar to HDP. You can select only Zookeeper and NiFi, but I would recommend LogSearch, Ambari Metrics, and Ranger as well, as they really augment the solution.