<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Spark fails during persist() in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-s-faill-durring-persist/m-p/29416#M6521</link>
    <description>&lt;P&gt;Hi dear experts!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm running the following simple test script on my Spark cluster (yarn-client mode):&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.storage.StorageLevel
val input = sc.textFile("/user/hive/warehouse/tpc_ds_3T/...");
val result = input.coalesce(600).persist(StorageLevel.MEMORY_AND_DISK_SER)
result.count()&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The RDD is much larger than memory, but I specified the disk option.&lt;/P&gt;&lt;P&gt;After some time I start observing warnings like this:&lt;/P&gt;&lt;PRE&gt;15/07/08 23:20:38 WARN TaskSetManager: Lost task 33.1 in stage 0.0 (TID 104, host4: ExecutorLostFailure (executor 15 lost)&lt;/PRE&gt;&lt;P&gt;and finally I got:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;15/07/08 23:14:41 INFO BlockManagerMasterActor: Registering block manager SomeHost2:16768 with 2.8 GB RAM, BlockManagerId(58, SomeHost2, 16768)
15/07/08 23:14:43 WARN TaskSetManager: Lost task 41.2 in stage 0.0 (TID 208, scaj43bda03.us.oracle.com): java.io.IOException: Failed to connect to SomeHost2/192.168.42.92:37305
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: SomeHost2/192.168.42.92:37305
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        ... 1 more&lt;/PRE&gt;&lt;P&gt;Honestly, I don't know where to start debugging...&lt;/P&gt;&lt;P&gt;I will appreciate any advice!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 09:33:43 GMT</pubDate>
    <dc:creator>fil</dc:creator>
    <dc:date>2022-09-16T09:33:43Z</dc:date>
    <item>
      <title>Spark fails during persist()</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-s-faill-durring-persist/m-p/29416#M6521</link>
      <description>&lt;P&gt;Hi dear experts!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm running the following simple test script on my Spark cluster (yarn-client mode):&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.storage.StorageLevel
val input = sc.textFile("/user/hive/warehouse/tpc_ds_3T/...");
val result = input.coalesce(600).persist(StorageLevel.MEMORY_AND_DISK_SER)
result.count()&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The RDD is much larger than memory, but I specified the disk option.&lt;/P&gt;&lt;P&gt;After some time I start observing warnings like this:&lt;/P&gt;&lt;PRE&gt;15/07/08 23:20:38 WARN TaskSetManager: Lost task 33.1 in stage 0.0 (TID 104, host4: ExecutorLostFailure (executor 15 lost)&lt;/PRE&gt;&lt;P&gt;and finally I got:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;15/07/08 23:14:41 INFO BlockManagerMasterActor: Registering block manager SomeHost2:16768 with 2.8 GB RAM, BlockManagerId(58, SomeHost2, 16768)
15/07/08 23:14:43 WARN TaskSetManager: Lost task 41.2 in stage 0.0 (TID 208, scaj43bda03.us.oracle.com): java.io.IOException: Failed to connect to SomeHost2/192.168.42.92:37305
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: SomeHost2/192.168.42.92:37305
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        ... 1 more&lt;/PRE&gt;&lt;P&gt;Honestly, I don't know where to start debugging...&lt;/P&gt;&lt;P&gt;I will appreciate any advice!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:33:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-s-faill-durring-persist/m-p/29416#M6521</guid>
      <dc:creator>fil</dc:creator>
      <dc:date>2022-09-16T09:33:43Z</dc:date>
    </item>
    <item>
      <title>Re: Spark fails during persist()</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-s-faill-durring-persist/m-p/29417#M6522</link>
      <description>&lt;P&gt;I found something in the YARN logs:&lt;/P&gt;&lt;PRE&gt;15/07/08 23:24:28 WARN spark.CacheManager: Persisting partition rdd_4_174 to disk instead.
15/07/08 23:24:29 INFO executor.Executor: Executor is trying to kill task 170.0 in stage 0.0 (TID 235)
15/07/08 23:24:29 INFO executor.Executor: Executor is trying to kill task 171.0 in stage 0.0 (TID 236)
15/07/08 23:24:29 INFO executor.Executor: Executor is trying to kill task 173.0 in stage 0.0 (TID 238)
15/07/08 23:24:29 INFO executor.Executor: Executor is trying to kill task 174.0 in stage 0.0 (TID 239)
15/07/08 23:24:29 WARN storage.BlockManager: Putting block rdd_4_174 failed
15/07/08 23:24:29 INFO executor.Executor: Executor killed task 174.0 in stage 0.0 (TID 239)
15/07/08 23:24:29 INFO executor.Executor: Executor killed task 173.0 in stage 0.0 (TID 238)
15/07/08 23:24:30 INFO storage.MemoryStore: ensureFreeSpace(255696059) called with curMem=418412, maxMem=2222739947
15/07/08 23:24:30 INFO storage.MemoryStore: Block rdd_4_170 stored as bytes in memory (estimated size 243.9 MB, free 1875.5 MB)
15/07/08 23:24:30 INFO storage.BlockManagerMaster: Updated info of block rdd_4_170
15/07/08 23:24:30 INFO executor.Executor: Executor killed task 170.0 in stage 0.0 (TID 235)
15/07/08 23:24:30 INFO storage.MemoryStore: ensureFreeSpace(255621319) called with curMem=256114471, maxMem=2222739947
15/07/08 23:24:30 INFO storage.MemoryStore: Block rdd_4_171 stored as bytes in memory (estimated size 243.8 MB, free 1631.7 MB)
15/07/08 23:24:30 INFO storage.BlockManagerMaster: Updated info of block rdd_4_171
15/07/08 23:24:30 INFO executor.Executor: Executor killed task 171.0 in stage 0.0 (TID 236)&lt;BR /&gt;&lt;BR /&gt;&lt;/PRE&gt;&lt;P&gt;But I still have no idea why the executor started killing tasks...&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jul 2015 03:44:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-s-faill-durring-persist/m-p/29417#M6522</guid>
      <dc:creator>fil</dc:creator>
      <dc:date>2015-07-09T03:44:23Z</dc:date>
    </item>
    <item>
      <title>Re: Spark fails during persist()</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-s-faill-durring-persist/m-p/29599#M6523</link>
      <description>It's possible that you are overwhelming the CPU on the hosts by using StorageLevel.MEMORY_AND_DISK_SER, as this is a CPU-intensive storage strategy:

&lt;A href="https://spark.apache.org/docs/1.3.0/programming-guide.html#rdd-persistence" target="_blank"&gt;https://spark.apache.org/docs/1.3.0/programming-guide.html#rdd-persistence&lt;/A&gt;

Are you able to use deserialized objects instead? Using StorageLevel.MEMORY_AND_DISK will be less CPU-intensive.</description>
      <pubDate>Wed, 15 Jul 2015 12:50:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-s-faill-durring-persist/m-p/29599#M6523</guid>
      <dc:creator>dcote</dc:creator>
      <dc:date>2015-07-15T12:50:03Z</dc:date>
    </item>
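The reply above suggests swapping the serialized storage level for the deserialized one. A minimal sketch of that change against the original test script, for illustration only (the truncated path and coalesce(600) come from the question; this is an untested rewrite, not a confirmed fix):

```scala
// Same job as in the question, but persisting with the deserialized
// MEMORY_AND_DISK level instead of MEMORY_AND_DISK_SER.
import org.apache.spark.storage.StorageLevel

val input = sc.textFile("/user/hive/warehouse/tpc_ds_3T/...")  // path truncated in the original post
val result = input.coalesce(600).persist(StorageLevel.MEMORY_AND_DISK)
result.count()
```

MEMORY_AND_DISK keeps cached partitions as deserialized JVM objects, so reading them back costs no serialization CPU, at the price of a larger memory footprint per partition than the _SER variant.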
    <item>
      <title>Re: Spark fails during persist()</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-s-faill-durring-persist/m-p/29640#M6524</link>
      <description>&lt;P&gt;Actually, the problem was very aggressive caching overfilling the&amp;nbsp;&lt;STRONG&gt;spark.yarn.executor.memoryOverhead&lt;/STRONG&gt; buffer and, as a consequence, an OOM error.&lt;/P&gt;&lt;P&gt;I just increased it and everything works now.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Jul 2015 01:32:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-s-faill-durring-persist/m-p/29640#M6524</guid>
      <dc:creator>fil</dc:creator>
      <dc:date>2015-07-16T01:32:02Z</dc:date>
    </item>
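The accepted fix above raises spark.yarn.executor.memoryOverhead, the slice of each YARN container reserved for off-heap allocations (Netty transfer buffers, serialization scratch space) on top of the executor heap. A hedged sketch of how that setting might be passed in a Spark 1.x yarn-client setup; the 2048 MB value is an illustrative assumption, not the poster's actual number:

```shell
# Reserve extra off-heap headroom per executor container (value in MB).
# In Spark 1.x the default was a small fraction of the executor memory
# with a 384 MB floor, which aggressive caching can easily exceed.
spark-shell --master yarn-client \
  --conf spark.yarn.executor.memoryOverhead=2048
```

When YARN kills containers for exceeding their physical memory limit, this overhead setting, rather than the executor heap size, is usually the knob to turn.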
  </channel>
</rss>

