Member since: 01-25-2017
Posts: 396
Kudos Received: 28
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 837 | 10-19-2023 04:36 PM |
| | 4371 | 12-08-2018 06:56 PM |
| | 5463 | 10-05-2018 06:28 AM |
| | 19885 | 04-19-2018 02:27 AM |
| | 19907 | 04-18-2018 09:40 AM |
04-03-2017
11:03 AM
Hi @neerjakhattar
1- Removed all instances from the Spark gateway, deployed the configuration, then added the servers back to the gateway and deployed the configuration again, but this didn't help.
2- Checked all the cluster instances and the new Spark History Server appears in every spark-defaults.conf; no server has any other value:
cat /etc/spark/conf/spark-defaults.conf | grep spark.yarn.historyServer.address
spark.yarn.historyServer.address=http://avvr-ahc101.lpdomain.com:18088
3- The safety valve config in Cloudera Manager is empty.
Where else could this configuration be getting overwritten?
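To double-check that, it helps to compare what each node actually has on disk. A minimal sketch, assuming passwordless SSH and using hypothetical hostnames:

```bash
#!/usr/bin/env bash
# Hypothetical host list -- replace with the real Spark gateway / NodeManager hosts.
hosts="host01.lpdomain.com host02.lpdomain.com host03.lpdomain.com"

for h in $hosts; do
  echo "== $h =="
  # Show where /etc/spark/conf resolves (a stale alternatives link could explain an old value)
  # and which history server address this node's client config actually contains.
  ssh "$h" 'readlink -f /etc/spark/conf; grep spark.yarn.historyServer.address /etc/spark/conf/spark-defaults.conf'
done
```

Any host that prints a different address, or whose /etc/spark/conf resolves to an old directory, would show where the setting is still being picked up from.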
04-03-2017
08:29 AM
"This issue can only occur if spark-defaults.conf has the wrong config and is getting overwritten from the safety valve, or it was changed manually from the command line." Which servers should I check spark-defaults.conf on for this config? I checked both Resource Managers and the parameter is right, and the safety valve in Cloudera Manager is empty. "Maybe try deleting the role, adding it on a new host, deploying the client configuration and seeing what happens." Deleting the Spark role means I would need to delete Oozie and Hive first, and I don't think that's the right solution for this case.
04-02-2017
09:24 PM
1 Kudo
Thanks. Indeed, in my case the memory I assigned to the executor was overridden by the memory passed in the workflow, so the executors were running with 1 GB instead of 8 GB. I fixed it by passing the memory in the workflow XML.
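For anyone hitting the same thing, here is a minimal sketch of the kind of change meant above, assuming the job is launched from an Oozie spark action (the action name, class, and jar are placeholders):

```xml
<!-- Hypothetical Oozie spark action: only the <spark-opts> line is the actual fix. -->
<action name="spark-step">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>example-spark-job</name>
        <class>com.example.Main</class>
        <jar>${appJar}</jar>
        <!-- Without an explicit value here, the executors fall back to the 1 GB default. -->
        <spark-opts>--executor-memory 8G --driver-memory 2G</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```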
04-01-2017
11:02 AM
@TheKishore432 Hi, were you able to solve the issue?
03-31-2017
11:23 AM
Has anyone faced the same issue, or does anyone have any insight into how to solve it?
03-21-2017
01:37 PM
2 Kudos
Hi, I have an MR job running with 30 reducers. One of the reducers fails with the error below when it reaches a specific percentage. I increased the reducer memory but with no success, and I am investigating the data to find whether a specific key has a lot of values and is causing this. My problem is that I can't tell from the error below whether it is an IO issue (perhaps limited connections between nodes, in which case I need to increase the ulimit) or a memory issue, in which case I need to increase the memory further.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/liveperson/hadoop/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/liveperson/data/server_hdfs/data/disk6/yarn/nm/usercache/lereports/appcache/application_1486847749225_242069/filecache/10/job.jar/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Halting due to Out Of Memory Error...
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "DataStreamer for file /liveperson/data/server_live-engage-mr/output/1490104329046-bi_contribution_xsess/_temporary/1/_temporary/attempt_1486847749225_242069_r_000001_0/RAWDATA-b_default-RPT_FA_CONTRIBUTION_XSESSION-r-00001 block BP-1370881566-172.16.144.147-1434971434689:blk_1283227391_209512600"
Mar 21, 2017 10:34:10 AM com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
at com.datastax.shaded.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
at com.datastax.shaded.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
at com.datastax.shaded.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at com.datastax.shaded.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
at com.datastax.shaded.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
at com.datastax.shaded.netty.channel.socket.nio.NioWorker.read(NioWorker.java:80)
at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at com.datastax.shaded.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at com.datastax.shaded.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at com.datastax.shaded.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at com.datastax.shaded.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Mar 21, 2017 11:02:11 AM com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
Mar 21, 2017 11:08:50 AM com.datastax.shaded.netty.util.HashedWheelTimer
WARNING: An exception was thrown by TimerTask.
java.lang.OutOfMemoryError: Java heap space
Halting due to Out Of Memory Error...
Mar 21, 2017 11:11:10 AM com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
Mar 21, 2017 11:27:11 AM com.datastax.shaded.netty.channel.socket.nio.AbstractNioSelector
WARNING: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space

Log Type: stdout
Log Upload Time: Tue Mar 21 13:49:54 -0400 2017
Log Length: 0

Log Type: syslog
Log Upload Time: Tue Mar 21 13:49:54 -0400 2017
Log Length: 3997692
Showing 4096 bytes of 3997692 total.

enewer.renew(LeaseRenewer.java:423)
at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.protobuf.ServiceException: java.lang.RuntimeException: unexpected checked exception
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:244)
at com.sun.proxy.$Proxy13.renewLease(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
... 12 more
Caused by: java.lang.RuntimeException: unexpected checked exception
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1059)
at org.apache.hadoop.ipc.Client.call(Client.java:1445)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
... 14 more
Caused by: java.lang.OutOfMemoryError: Java heap space
2017-03-21 11:01:36,323 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "svpr-dhc024.lpdomain.com/172.16.144.172"; destination host is: "svpr-dhc012.lpdomain.com":45935;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:243)
at com.sun.proxy.$Proxy8.ping(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:782)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't set up IO streams
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:790)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
at org.apache.hadoop.ipc.Client.call(Client.java:1442)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
2017-03-21 11:06:36,820 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "svpr-dhc024.lpdomain.com/172.16.144.172"; destination host is: "svpr-dhc012.lpdomain.com":45935;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:243)
at com.sun.proxy.$Proxy8.ping(Unknown Source)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:782)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't set up IO streams
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:790)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1519)
at org.apache.hadoop.ipc.Client.call(Client.java:1442)
... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
2017-03-21 11:11:10,785 INFO [LeaseRenewer:lereports@VAProd] org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking renewLease of class ClientNamenodeProtocolTranslatorPB over svpr-mhc102.lpdomain.com/172.16.144.148:8020 after 1 fail over attempts. Trying to fail over after sleeping for 593ms.
2017-03-21 11:16:47,403 ERROR [LeaseRenewer:lereports@VAProd] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LeaseRenewer:lereports@VAProd,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: Java heap space
2017-03-21 11:34:56,644 INFO [LeaseRenewer:lereports@VAProd] org.apache.hadoop.util.ExitUtil: Halt with status -1
Message: HaltException
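In case it is heap rather than file descriptors, the reducer container size and JVM heap can be raised together. A minimal sketch, assuming the driver goes through ToolRunner so that -D options are applied (the jar, class, paths, and values are only examples):

```bash
# Hypothetical invocation; mapreduce.reduce.memory.mb sizes the YARN container,
# mapreduce.reduce.java.opts sizes the JVM heap inside it (keep it around 80% of the container).
hadoop jar my-mr-job.jar com.example.MyDriver \
  -D mapreduce.reduce.memory.mb=8192 \
  -D mapreduce.reduce.java.opts=-Xmx6553m \
  /input/path /output/path
```

For what it's worth, in the log above the "Couldn't set up IO streams" IOException is itself "Caused by: java.lang.OutOfMemoryError: Java heap space", so the connection failures look like a symptom of the heap running out rather than a ulimit problem.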
Labels:
- MapReduce
03-10-2017
11:37 AM
Hi, any help with this Impala error is much appreciated:
Memory limit exceeded
Failed to pin block for fixed-length data needed for sorting. Reducing query concurrency or increasing the memory limit may help this query to complete successfully.
The Impala daemon memory is 16 GB on 20 servers.
Memory Limit Exceeded
Query(6f4bb710145f8780:301965b0c0457a87) Limit: Consumption=2.18 GB
Fragment 6f4bb710145f8780:301965b0c0457a88: Consumption=8.00 KB
EXCHANGE_NODE (id=20): Consumption=0
DataStreamRecvr: Consumption=0
Block Manager: Limit=12.80 GB Consumption=2.17 GB
Fragment 6f4bb710145f8780:301965b0c0457a9b: Consumption=10.53 MB
SORT_NODE (id=11): Consumption=0
HASH_JOIN_NODE (id=10): Consumption=10.52 MB
EXCHANGE_NODE (id=18): Consumption=0
DataStreamRecvr: Consumption=0
EXCHANGE_NODE (id=19): Consumption=0
DataStreamRecvr: Consumption=0
DataStreamSender: Consumption=200.00 B
Fragment 6f4bb710145f8780:301965b0c0457b07: Consumption=138.62 MB
HASH_JOIN_NODE (id=9): Consumption=138.58 MB
EXCHANGE_NODE (id=14): Consumption=0
DataStreamRecvr: Consumption=0
EXCHANGE_NODE (id=15): Consumption=0
DataStreamRecvr: Consumption=0
DataStreamSender: Consumption=19.41 KB
Fragment 6f4bb710145f8780:301965b0c0457b30: Consumption=56.02 MB
ANALYTIC_EVAL_NODE (id=4): Consumption=0
SORT_NODE (id=3): Consumption=48.01 MB
AGGREGATION_NODE (id=13): Consumption=8.00 MB
EXCHANGE_NODE (id=12): Consumption=0
WARNING: The following tables are missing relevant table and/or column statistics.
default.analytics_customerinfo_v4p_tbl
Read 95.12 MB of data across network that was expected to be local. Block locality metadata for table 'default.analytics_customerinfo_v4p_tbl' may be stale. Consider running "INVALIDATE METADATA `default`.`analytics_customerinfo_v4p_tbl`".
(The "Read 95.12 MB ..." warning is repeated several more times in the original output.)
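The warning block itself points at the first two statements below; a minimal sketch (the session-level memory setting is only a hypothetical example, not a recommendation):

```sql
-- Refresh the stale block-locality metadata called out in the warning.
INVALIDATE METADATA `default`.`analytics_customerinfo_v4p_tbl`;

-- Give the planner table/column statistics so the sort and joins get realistic memory estimates.
COMPUTE STATS default.analytics_customerinfo_v4p_tbl;

-- Hypothetical: raise the per-query memory limit for this session if the daemons have headroom.
SET MEM_LIMIT=8g;
```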
Labels:
- Apache Impala
03-05-2017
02:26 PM
When I checked the jobs/queries that ran prior to the alert on the JN (JournalNode), I found one Hive query that runs over 6 months of data and recreates the Hive table from scratch, which accounted for a good percentage of the edit logs. I contacted the query owner and he reduced his running window from 6 months to 2 months, which solved the issue for us.
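For illustration only, a hypothetical sketch of what "reducing the running window" looks like for a rebuild query of this kind (the table and column names are made up):

```sql
-- Hypothetical rebuild: the date filter is what shrinks the amount of data rewritten,
-- and with it the volume of NameNode edits the JournalNodes have to absorb.
INSERT OVERWRITE TABLE reports_db.contribution_summary
SELECT *
FROM reports_db.contribution_raw
WHERE event_date >= date_sub(current_date, 60);  -- roughly 2 months instead of 6
```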