Member since: 02-12-2016
Posts: 33
Kudos Received: 44
Solutions: 3
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 5829 | 11-10-2018 12:59 AM
 | 2166 | 10-30-2018 12:47 AM
 | 3504 | 04-25-2016 08:06 PM
11-02-2018 11:11 PM
When running a Dockerized YARN service, YARN does not pass the expected input arguments to the container. The ENTRYPOINT in the Dockerfile is ["java", "-jar", "myapp.jar"]; for debugging, the application prints its incoming arguments and exits. The service is defined as follows:

{
"name": "myapp",
"version": "1.0.0",
"description": "myapp",
"components" :
[
{
"name": "myappcontainers",
"number_of_containers": 1,
"artifact": {
"id": "myapp:1.0-SNAPSHOT",
"type": "DOCKER"
},
"launch_command": "input1 input2",
"resource": {
"cpus": 1,
"memory": "256"
}
}
]
}

Here is the output from YARN:

Launching docker container...
Docker run command: /usr/bin/docker run --name=container_e06_1541194419811_0006_01_000026 --user=1015:1015 --net=yarnnetwork -v /hadoop/yarn/local/filecache:/hadoop/yarn/local/filecache:ro -v /hadoop/yarn/local/usercache/admin/filecache:/hadoop/yarn/local/usercache/admin/filecache:ro -v /hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026:/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026 -v /hadoop/yarn/local/usercache/admin/appcache/application_1541194419811_0006:/hadoop/yarn/local/usercache/admin/appcache/application_1541194419811_0006 --cgroup-parent=/hadoop-yarn/container_e06_1541194419811_0006_01_000026 --cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP --cap-add=SETPCAP --cap-add=DAC_READ_SEARCH --cap-add=FSETID --cap-add=SYS_PTRACE --cap-add=CHOWN --cap-add=SYS_ADMIN --cap-add=AUDIT_WRITE --cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID --cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE --hostname=myappcontainers-3.myapp.admin.EXAMPLE.COM --group-add 1015 --env-file /hadoop/yarn/local/nmPrivate/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/docker.container_e06_1541194419811_0006_01_0000264842430064377299975.env myapp:1.0-SNAPSHOT input1 input2 1>/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/stdout.txt 2>/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/stderr.txt
Received input: input1 input2 1>/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/stdout.txt 2>/hadoop/yarn/log/application_1541194419811_0006/container_e06_1541194419811_0006_01_000026/stderr.txt

The redirection operators are passed to the program itself as arguments. Is there a way to disable this behavior? The only two workarounds I have identified are:
- Change the ENTRYPOINT in the Dockerfile to ["sh", "-c"] and the launch_command to "java -jar myapp.jar" (sketched below).
- Change the program to handle or ignore the "1>" and "2>" arguments.

Both workarounds require repackaging in a way that does not conform to Docker best practices.
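For reference, a minimal sketch of the first workaround as described above (the jar name is the one used in the question's ENTRYPOINT):

# Dockerfile: hand the launch command to a shell so that the arguments
# appended by YARN are consumed by the shell rather than reaching the app
ENTRYPOINT ["sh", "-c"]

with the corresponding change in the service definition:

"launch_command": "java -jar myapp.jar"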
Labels:
- Apache YARN
- Docker
11-02-2018 07:14 PM
1 Kudo
I was able to work around this error by running:

sudo mkdir /sys/fs/cgroup/blkio/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/blkio/hadoop-yarn

I then received a very similar message for "/sys/fs/cgroup/memory/hadoop-yarn" and "/sys/fs/cgroup/cpu/hadoop-yarn". After creating those directories as well, the NodeManagers came up. Here is the full workaround that was run on each node:

sudo mkdir /sys/fs/cgroup/blkio/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/blkio/hadoop-yarn
sudo mkdir /sys/fs/cgroup/memory/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/memory/hadoop-yarn
sudo mkdir /sys/fs/cgroup/cpu/hadoop-yarn
sudo chown -R yarn:yarn /sys/fs/cgroup/cpu/hadoop-yarn
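The same fix, written as a loop (a sketch; the set of controllers needed may differ depending on which YARN resource handlers are enabled):

for controller in blkio memory cpu; do
  sudo mkdir -p /sys/fs/cgroup/$controller/hadoop-yarn
  sudo chown -R yarn:yarn /sys/fs/cgroup/$controller/hadoop-yarn
done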
11-02-2018 07:10 PM
I am receiving the following message from each NodeManager when attempting to start YARN after enabling Docker. What is the root cause?

2018-11-02 18:28:50,974 INFO recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:checkVersion(1662)) - Loaded NM state version info 1.2
2018-11-02 18:28:51,174 INFO resources.ResourceHandlerModule (ResourceHandlerModule.java:initNetworkResourceHandler(182)) - Using traffic control bandwidth handler
2018-11-02 18:28:51,193 WARN resources.CGroupsBlkioResourceHandlerImpl (CGroupsBlkioResourceHandlerImpl.java:checkDiskScheduler(101)) - Device vda does not use the CFQ scheduler; disk isolation using CGroups will not work on this partition.
2018-11-02 18:28:51,199 INFO resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:initializePreMountedCGroupController(410)) - Initializing mounted controller blkio at /sys/fs/cgroup/blkio/hadoop-yarn
2018-11-02 18:28:51,199 INFO resources.CGroupsHandlerImpl (CGroupsHandlerImpl.java:initializePreMountedCGroupController(420)) - Yarn control group does not exist. Creating /sys/fs/cgroup/blkio/hadoop-yarn
2018-11-02 18:28:51,200 ERROR nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:init(323)) - Failed to bootstrap configured resource subsystems!
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: Unexpected: Cannot create yarn cgroup Subsystem:blkio Mount points:/proc/mounts User:yarn Path:/sys/fs/cgroup/blkio/hadoop-yarn
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializePreMountedCGroupController(CGroupsHandlerImpl.java:425)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.initializeCGroupController(CGroupsHandlerImpl.java:377)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsBlkioResourceHandlerImpl.bootstrap(CGroupsBlkioResourceHandlerImpl.java:123)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.bootstrap(ResourceHandlerChain.java:58)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:320)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
2018-11-02 18:28:51,205 INFO service.AbstractService (AbstractService.java:noteFailure(267)) - Service NodeManager failed in state INITED
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:393)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:324)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
... 3 more
2018-11-02 18:28:51,207 ERROR nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(936)) - Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:393)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:933)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1013)
Caused by: java.io.IOException: Failed to bootstrap configured resource subsystems!
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:324)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:391)
... 3 more
Labels:
- Apache YARN
- Docker
10-30-2018 12:47 AM
This error was resolved by explicitly specifying the content type as JSON:

curl ... -H "Content-Type: application/json"
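For completeness, a full request along these lines might look like the following (the host, file name, and user are placeholders, assuming an unsecured cluster with simple authentication):

curl -X POST \
  -H "Content-Type: application/json" \
  -d @myapp-service.json \
  "http://<resourcemanager-host>:8088/app/v1/services?user.name=<user>"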
10-30-2018 12:39 AM
I receive the following generic error when attempting to POST a YARN service definition to the API endpoint "/app/v1/services":

2018-10-27 09:33:24,440 WARN webapp.GenericExceptionHandler (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException
at com.sun.jersey.server.impl.uri.rules.TerminatingRule.accept(TerminatingRule.java:66)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:89)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178)
at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:98)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
Labels:
- Apache YARN
- Docker
10-04-2018 08:11 PM
2 Kudos
Most data movement use cases do not require a "shuffle phase" to redistribute FlowFiles across a NiFi cluster, but there are a few cases where it is useful. For example:

- ListFile -> FetchFile
- ListHDFS -> FetchHDFS
- ListFTP -> FetchFTP
- GenerateTableFetch -> ExecuteSQL
- GetSQS -> FetchS3

In each case, the flow starts with a processor that generates tasks to run (e.g. filenames), followed by the actual execution of those tasks. To scale, the tasks need to run on every node in the NiFi cluster, but for consistency, the task generation should run only on the primary node. The solution is to introduce a shuffle (a.k.a. load balancing) step between task generation and task execution.

Processors can be configured to run on the primary node by going to "View Configuration" -> "Scheduling" and selecting "Primary node only" under "Execution".

The shuffle step is not an explicit component on the NiFi canvas, but rather the combination of a Remote Input Port and a Remote Process Group pointing at the local cluster. FlowFiles sent to the Remote Process Group are load balanced over Site-to-Site and come back into the flow via the Remote Input Port. Under "Manage Remote Ports" on the Remote Process Group, there are batch settings that help control the load balancing.

Here are two example flows that use this design pattern:
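As a textual sketch of one such flow (settings abbreviated; the same shape applies to the other listed pairs):

ListHDFS (Scheduling -> Execution: Primary node only)
  -> Remote Process Group (pointing at this cluster's own URL)
Remote Input Port (FlowFiles arrive load balanced across all nodes)
  -> FetchHDFS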
05-14-2018 08:51 PM
7 Kudos
A quick glance at NiFi's 252+ processors shows that it can solve a wide array of use cases out of the box. What is not immediately obvious is the flexibility that its attributes and expression language provide. This allows NiFi to quickly, easily, and efficiently solve complex use cases that would require significant customization in other solutions. For example, sending all of the incoming data to both Kafka and HDFS while sending 10% to a dev environment and a portion to a partner system based on the content of the data (e.g. CustomerName=ABC). These more complex routing scenarios are easily accommodated using UpdateAttribute, RouteOnAttribute, and RouteOnContent.

Another example of NiFi's flexibility is the ability to multiplex data flows. In traditional ETL systems, the schema is tightly coupled to the data as it moves between systems, because transformations occur in transit. In more modern ELT scenarios, the data is often loaded into the destination with minimal transformations before the complex transformation step is kicked off. This has many advantages and allows NiFi to focus on the EL portion of the flow. When focused on EL, there is far less need for the movement engine to be schema aware, since it is generally focused on simple routing, filtering, format translation, and concatenation. One common scenario is loading data from many Kafka topics into their respective HDFS directories and/or Hive tables with only simple transformations. In traditional systems, this would require one flow per topic, but by parameterizing flows, one flow can be used for all topics.

In the image below you can see the configurations and attributes that make this possible. The ConsumeKafka processor can use a list of topics or a regular expression to consume from many topics at once. Each FlowFile (e.g. a batch of Kafka messages) has an attribute added called "kafka.topic" to identify its source topic.

Next, in order to load streaming data into HDFS or Hive, it is recommended to use MergeContent to combine records into large files (e.g. every 1 GB or every 15 minutes). In MergeContent, setting the "correlation attribute" configuration to "kafka.topic" ensures that only records from the same Kafka topic are combined (similar to a group-by clause). After the files are merged, the "directory" configuration in the PutHDFS processor can be parameterized (e.g. /myDir/${kafka.topic}) to load the data into the correct directory based on the Kafka topic name.

Note that this diagram includes a retry-and-notify-on-failure process group. This type of solution is highly recommended for production flows. More information can be found here.

This example could easily be extended to include file format translation (e.g. ConvertAvroToORC), filtering (e.g. RouteOnContent), or Kafka-topic-to-HDFS-directory mapping (e.g. UpdateAttribute). It can even trigger downstream processing (e.g. ExecuteSparkInteractive, PutHiveQL, ExecuteStreamCommand, etc.) or periodically update metrics and logging solutions such as Graphite, Druid, or Solr. Of course, this solution also applies to many more data stores than just Kafka and HDFS.

Overall, parameterizing flows in NiFi for multiplexing can reduce complexity for EL use cases and simplify administration. This design is straightforward to implement, uses core NiFi features, and is easily extended to a variety of use cases.
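As a rough summary of the key settings described above (property names from memory; they may differ slightly between processor versions):

ConsumeKafka    Topic Name(s): a list of topics or a regular expression
MergeContent    Correlation Attribute Name: kafka.topic
PutHDFS         Directory: /myDir/${kafka.topic}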
02-27-2018 08:25 PM
Hi Mitthu,

Here is an article I wrote about handling failures in NiFi: https://community.hortonworks.com/articles/76598/nifi-error-handling-design-pattern-1.html It describes how to retry failures X times, then send an email, then wait for administrative input. This might help you address the requirements of your solution. You could also add a PutEmail processor on the "Success" relationship to send an email after processing succeeds.
01-05-2017 10:20 PM
9 Kudos
Many process groups have a success and failure output relationship, and a common question is how to best handle these failures. For invalid data, it makes sense to output the flow files to an HDFS directory for analysis, but not when the failure was caused by an external dependency (e.g. HDFS, Kafka, FTP). A simple solution might be to loop the failures back to retry, but then a flow file may fail repeatedly without notifying an administrator. A better solution is to retry three times and then, if the flow file still has not succeeded, notify an administrator and have the flow file wait before trying again. This gives the administrator time to resolve the issue and the ability to quickly and easily retry the flow files.

Below (and attached) is a simple process group that implements this logic. The failed flow files come in through the input port. The UpdateAttribute processor sets the retryCount attribute to one, or increments it if it has already been set. The RouteOnAttribute processor determines whether the retryCount attribute is over a threshold (e.g. three). If it is not over the threshold, the flow file is routed out through the retry port. If it is over the threshold, the flow file is routed to a PutEmail processor.

The last UpdateAttribute processor should be disabled at all times so that the flow files queue up after the PutEmail processor and wait for the administrator to resolve the issue. Once the issue is resolved, the administrator simply enables, starts, stops, and disables this last processor. The retryCount attribute will be set to zero and the flow file will go out through the retry port. If the flow file still does not succeed, it will go back into this process group and the administrator will get another email. Note that a MergeContent processor could be used to reduce the number of emails, if necessary.
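One possible way to express the attribute logic described above in NiFi Expression Language (the expressions are illustrative, not taken from the attached template):

UpdateAttribute                     retryCount = ${retryCount:replaceNull(0):plus(1)}
RouteOnAttribute                    over.threshold = ${retryCount:gt(3)}
UpdateAttribute (normally disabled) retryCount = 0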
11-08-2016 07:59 PM
1 Kudo
You will have the option to select which services you want to install, similar to HDP. You can select only ZooKeeper and NiFi, but I would recommend LogSearch, Ambari Metrics, and Ranger as well, since they really augment the solution.