<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Kudu Tablet Server - Leak Memory ? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80750#M12582</link>
    <description>Cloudera Community Support Questions thread: a Kudu user reports tablet server memory that climbs and never comes back down; the replies walk through heap profiles, block cache settings, scan traces, and fs check / data_size diagnostics.</description>
    <pubDate>Fri, 05 Oct 2018 20:58:17 GMT</pubDate>
    <dc:creator>wdberkeley</dc:creator>
    <dc:date>2018-10-05T20:58:17Z</dc:date>
    <item>
      <title>Kudu Tablet Server - Leak Memory ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80417#M12577</link>
      <description>&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;Hi,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;We have done some successful tests on Kudu for a few months with the following configuration:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;Cluster Test - A&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3" color="#000000"&gt;*3 Kudu Masters&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3" color="#000000"&gt;*3 Tablet Servers&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3" color="#000000"&gt;*Sentry+Kerberos enabled in the cluster&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3" color="#000000"&gt;*Ubuntu 14.04&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" size="3" color="#000000"&gt;*Kudu 1.5.0&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;After that, we wanted to put it into production on our system:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;Cluster Production - B&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;*Same configuration as cluster A&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;*Ubuntu 16.04&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;*Kudu 1.7&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;But we are currently experiencing memory errors we never had on Cluster A. 
&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;After querying a small table (700k rows and 30 columns), the memory of all the tablet servers fills up, the peak never goes back down, and we can't figure out where it comes from.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;So we can't insert new rows, etc.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;The only way we found to free the memory was to restart Kudu ...&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;Our memory configuration:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;memory_limit_hard_bytes = 4GB&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;block_cache_capacity_mb = 512MB&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;For example, after a query, the memory of a tablet server:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="process_memory.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4857i14B226D84861DB2D/image-size/large?v=v2&amp;amp;px=999" role="button" title="process_memory.png" alt="process_memory.png" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;And in the log of a tablet server:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="warning.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4859i374FE8B45AD4AB76/image-size/large?v=v2&amp;amp;px=999" role="button" title="warning.png" alt="warning.png" /&gt;&lt;/span&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;We have also tried to debug the memory usage with gperftools, but the output is a bit complicated to understand (I&amp;nbsp;can send&amp;nbsp;the SVG files if needed).&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;There is also a similar problem in the link below, but we didn't find any solution: &lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;&lt;A href="https://issues.apache.org/jira/browse/KUDU-1762" target="_blank"&gt;https://issues.apache.org/jira/browse/KUDU-1762&lt;/A&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;Do you have any ideas? Are we missing something?&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;Thank you in advance,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="arial,helvetica,sans-serif" color="#000000"&gt;Vincent&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 13:45:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80417#M12577</guid>
      <dc:creator>vincenth</dc:creator>
      <dc:date>2022-09-16T13:45:23Z</dc:date>
    </item>
    <item>
      <title>Re: Kudu Tablet Server - Leak Memory ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80432#M12578</link>
      <description>&lt;P&gt;Hi Vincent. The heap profiles would be very useful. You can find instructions on how to gather them at&amp;nbsp;&lt;A href="http://kudu.apache.org/docs/troubleshooting.html#heap_sampling" target="_blank"&gt;http://kudu.apache.org/docs/troubleshooting.html#heap_sampling&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, could you provide more detail on what you did to trigger the memory issue? The schema of the table, including the column encoding and compression types, plus the query, might be helpful too. You can find the schema on the page for the table, accessible through the leader master's web UI.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Sep 2018 14:28:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80432#M12578</guid>
      <dc:creator>wdberkeley</dc:creator>
      <dc:date>2018-09-28T14:28:27Z</dc:date>
    </item>
    <item>
      <title>Re: Kudu Tablet Server - Leak Memory ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80478#M12579</link>
      <description>&lt;P&gt;&lt;FONT color="#000000"&gt;Hi,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;Many thanks for your answer.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#000000"&gt;Below is the schema of the table.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="table1.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4872iE7AF98BD86C301FE/image-size/large?v=v2&amp;amp;px=999" role="button" title="table1.png" alt="table1.png" /&gt;&lt;/span&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="table2.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4873iF0780B2486FED261/image-size/large?v=v2&amp;amp;px=999" role="button" title="table2.png" alt="table2.png" /&gt;&lt;/span&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;The memory issue happens for any kind of query, for example:&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#000000"&gt;-select * from table ORDER BY column_string ASC&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT color="#000000"&gt;-select * from table WHERE column_string = 'abc'&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;But even at rest, the RAM consumption is about 2.6 GB of the 4 GB limit for this table of 700k rows (with a total tablet size of 8 GB), which seems abnormal.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;Below are the links to the SVG files:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;A href="https://sendeyo.com/up/d/96b96e542a" target="_blank"&gt;https://sendeyo.com/up/d/96b96e542a&lt;/A&gt; (Kudu Tablet Server at rest)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;&lt;A href="https://sendeyo.com/up/d/752e5b1b98" target="_blank"&gt;https://sendeyo.com/up/d/752e5b1b98&lt;/A&gt; (Kudu Tablet Server after a query)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;Thank you in advance,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000000"&gt;Vincent&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Oct 2018 08:55:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80478#M12579</guid>
      <dc:creator>vincenth</dc:creator>
      <dc:date>2018-10-01T08:55:03Z</dc:date>
    </item>
    <item>
      <title>Re: Kudu Tablet Server - Leak Memory ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80511#M12580</link>
      <description>Looking at the profile, all of the additional memory (about 1500MB) is being used by the scanner. Of that, about 900MB is going to the block cache. Can you double-check --block_cache_capacity_mb? The profile clearly shows more than 512MB of memory allocated there. The other 600MB is allocated for parsing CFile footers (the footers of files containing data for a single column). You don't have very many columns, so probably there are a lot of CFiles that need to be open. That's certainly true for ORDER BY queries on columns that aren't part of the primary key.&lt;BR /&gt;&lt;BR /&gt;The best thing to do immediately is to allocate more memory to Kudu. 4GB is not very much.&lt;BR /&gt;&lt;BR /&gt;Another thing that might help investigate further is to get the trace metrics for a scan. They will be on the /rpcz page after you run a scan.&lt;BR /&gt;&lt;BR /&gt;Finally, when you make these memory measurements, is your application holding the scanner open? The 600MB allocated for the scanner but not in the block cache should be released when the scanner is closed. The 900MB in the block cache will be evicted if new blocks need to be cached.</description>
      <pubDate>Mon, 01 Oct 2018 17:50:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80511#M12580</guid>
      <dc:creator>wdberkeley</dc:creator>
      <dc:date>2018-10-01T17:50:22Z</dc:date>
    </item>
    <item>
      <title>Re: Kudu Tablet Server - Leak Memory ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80738#M12581</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your answer.&lt;BR /&gt;We have checked: block_cache_capacity_mb is 512 MiB. And below is an example of the scan trace metrics:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;"trace": "1004 15:36:51.672528 (+     0us) service_pool.cc:163] Inserting onto call queue\n1004 15:36:51.672553 (+    25us) service_pool.cc:222] Handling call\n1004 15:36:51.673895 (+  1342us) inbound_call.cc:157] Queueing success response\nRelated trace 'txn':\n1004 15:36:51.672634 (+     0us) write_transaction.cc:101] PREPARE: Starting\n1004 15:36:51.672651 (+    17us) write_transaction.cc:268] Acquiring schema lock in shared mode\n1004 15:36:51.672652 (+     1us) write_transaction.cc:271] Acquired schema lock\n1004 15:36:51.672652 (+     0us) tablet.cc:400] PREPARE: Decoding operations\n1004 15:36:51.672662 (+    10us) tablet.cc:422] PREPARE: Acquiring locks for 1 operations\n1004 15:36:51.672666 (+     4us) tablet.cc:426] PREPARE: locks acquired\n1004 15:36:51.672666 (+     0us) write_transaction.cc:126] PREPARE: finished.\n1004 15:36:51.672674 (+     8us) write_transaction.cc:136] Start()\n1004 15:36:51.672675 (+     1us) write_transaction.cc:141] Timestamp: P: 1538667411672672 usec, L: 0\n1004 15:36:51.672694 (+    19us) log.cc:582] Serialized 1538 byte log entry\n1004 15:36:51.673861 (+  1167us) write_transaction.cc:149] APPLY: Starting\n1004 15:36:51.673876 (+    15us) tablet_metrics.cc:365] ProbeStats: bloom_lookups=0,key_file_lookups=0,delta_file_lookups=0,mrs_lookups=0\n1004 15:36:51.673879 (+     3us) log.cc:582] Serialized 25 byte log entry\n1004 15:36:51.673886 (+     7us) write_transaction.cc:309] Releasing row and schema locks\n1004 15:36:51.673888 (+     2us) write_transaction.cc:277] Released schema lock\n1004 15:36:51.673889 (+     1us) write_transaction.cc:196] FINISH: updating metrics\n1004 15:36:51.673918 (+    29us) write_transaction.cc:309] Releasing row 
and schema locks\n1004 15:36:51.673919 (+     1us) write_transaction.cc:277] Released schema lock\n",&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But nothing seems abnormal here, does it?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For the scanner, we use Impala via Hue, and when we check in Cloudera Manager the query state is "FINISHED", so I think the scanner must be closed.&lt;BR /&gt;Is there a way to track the block cache, or to refresh it?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In fact, we have removed this table and are trying to reproduce the same situation.&lt;BR /&gt;But we have noticed that when we fill the table, the tablet server memory keeps increasing (slowly) over time. To fill the table, we have a loop reading from Kafka, and we insert the new messages into Kudu with impyla. And once again, to get the tablet servers back to their "default" memory we have to restart Kudu.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are trying to see whether it's because we don't close the impyla cursor, but are we missing something else?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advance,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Vincent&lt;/P&gt;</description>
      <pubDate>Fri, 05 Oct 2018 17:02:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80738#M12581</guid>
      <dc:creator>vincenth</dc:creator>
      <dc:date>2018-10-05T17:02:28Z</dc:date>
    </item>
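The block cache question above can be answered from the tablet server's embedded web UI, which exposes per-component memory trackers. A minimal sketch, assuming the default tablet server web UI port 8050 and using "tserver-host" as a placeholder hostname:

```shell
# Sketch: list the memory trackers on a running tablet server and filter
# for the block cache entry. "tserver-host" is a placeholder; 8050 is the
# default Kudu tablet server web UI port.
curl -s http://tserver-host:8050/mem-trackers | grep -i block_cache
```

The same page, viewed in a browser, shows current and peak usage for each tracked component, which makes it easy to see whether the block cache or something else is holding the memory.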
    <item>
      <title>Re: Kudu Tablet Server - Leak Memory ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80750#M12582</link>
      <description>Those are metrics for a write, not a scan. A scan RPC trace looks like&lt;BR /&gt;&lt;BR /&gt;{&lt;BR /&gt;"method_name": "kudu.tserver.ScanRequestPB",&lt;BR /&gt;"samples": [&lt;BR /&gt;{&lt;BR /&gt;"header": {&lt;BR /&gt;"call_id": 7,&lt;BR /&gt;"remote_method": {&lt;BR /&gt;"service_name": "kudu.tserver.TabletServerService",&lt;BR /&gt;"method_name": "Scan"&lt;BR /&gt;},&lt;BR /&gt;"timeout_millis": 29999&lt;BR /&gt;},&lt;BR /&gt;"trace": "1005 10:27:46.216542 (+ 0us) service_pool.cc:162] Inserting onto call queue\n1005 10:27:46.216573 (+ 31us) service_pool.cc:221] Handling call\n1005 10:27:46.216712 (+ 139us) tablet_service.cc:1796] Created scanner 9c3aaa87517f4832aa81ff0dc0d71284 for tablet 42483058124f48c685943bef52f3b625\n1005 10:27:46.216839 (+ 127us) tablet_service.cc:1872] Creating iterator\n1005 10:27:46.216874 (+ 35us) tablet_service.cc:2209] Waiting safe time to advance\n1005 10:27:46.216894 (+ 20us) tablet_service.cc:2217] Waiting for operations to commit\n1005 10:27:46.216917 (+ 23us) tablet_service.cc:2231] All operations in snapshot committed. 
Waited for 32 microseconds\n1005 10:27:46.216937 (+ 20us) tablet_service.cc:1902] Iterator created\n1005 10:27:46.217231 (+ 294us) tablet_service.cc:1916] Iterator init: OK\n1005 10:27:46.217250 (+ 19us) tablet_service.cc:1965] has_more: true\n1005 10:27:46.217258 (+ 8us) tablet_service.cc:1980] Continuing scan request\n1005 10:27:46.217291 (+ 33us) tablet_service.cc:2033] Found scanner 9c3aaa87517f4832aa81ff0dc0d71284 for tablet 42483058124f48c685943bef52f3b625\n1005 10:27:46.218143 (+ 852us) inbound_call.cc:162] Queueing success response\n",&lt;BR /&gt;"duration_ms": 1,&lt;BR /&gt;"metrics": [&lt;BR /&gt;{&lt;BR /&gt;"key": "rowset_iterators",&lt;BR /&gt;"value": 1&lt;BR /&gt;},&lt;BR /&gt;{&lt;BR /&gt;"key": "threads_started",&lt;BR /&gt;"value": 1&lt;BR /&gt;},&lt;BR /&gt;{&lt;BR /&gt;"key": "thread_start_us",&lt;BR /&gt;"value": 64&lt;BR /&gt;},&lt;BR /&gt;{&lt;BR /&gt;"key": "compiler_manager_pool.run_cpu_time_us",&lt;BR /&gt;"value": 117013&lt;BR /&gt;},&lt;BR /&gt;{&lt;BR /&gt;"key": "compiler_manager_pool.run_wall_time_us",&lt;BR /&gt;"value": 127378&lt;BR /&gt;},&lt;BR /&gt;{&lt;BR /&gt;"key": "compiler_manager_pool.queue_time_us",&lt;BR /&gt;"value": 114&lt;BR /&gt;}&lt;BR /&gt;]&lt;BR /&gt;}&lt;BR /&gt;]&lt;BR /&gt;},&lt;BR /&gt;&lt;BR /&gt;The profile is pointing to this server having a lot of data blocks. What is your workload like? Does it involve a lot of updates and deletes? How many tablet replicas are on this server?&lt;BR /&gt;&lt;BR /&gt;Attaching the output of the following commands will help investigate further. All should be run on the tablet server where you're seeing the memory problem:&lt;BR /&gt;&lt;BR /&gt;sudo -u kudu kudu fs check --fs_wal_dir=&amp;lt;wal dir&amp;gt; --fs_data_dirs=&amp;lt;data dirs&amp;gt;&lt;BR /&gt;&lt;BR /&gt;This should be fine to run while the server is running. 
You'll see a benign error message about not being able to acquire a lock and proceeding in read only mode.&lt;BR /&gt;&lt;BR /&gt;sudo -u kudu kudu local_replica data_size &amp;lt;tablet id&amp;gt; --fs_wal_dir=&amp;lt;wal dir&amp;gt; --fs_data_dirs=&amp;lt;data dirs&amp;gt;&lt;BR /&gt;&lt;BR /&gt;Try running this for a few tablets of your most active tables.</description>
      <pubDate>Fri, 05 Oct 2018 20:58:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80750#M12582</guid>
      <dc:creator>wdberkeley</dc:creator>
      <dc:date>2018-10-05T20:58:17Z</dc:date>
    </item>
    <item>
      <title>Re: Kudu Tablet Server - Leak Memory ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80833#M12583</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your answer.&lt;BR /&gt;Below is a scan RPC trace:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2" color="#000000"&gt;"method_name": "kudu.tserver.ScanRequestPB",&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"samples": [&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;{&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"header": {&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"call_id": 1421646,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"remote_method": {&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"service_name": "kudu.tserver.TabletServerService",&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"method_name": "Scan"&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;},&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"timeout_millis": 179999&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;},&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"trace": "1008 12:09:55.490616 (+ 0us) service_pool.cc:163] Inserting onto call queue\n1008 12:09:55.490648 (+ 32us) service_pool.cc:222] Handling call\n1008 12:09:55.490657 (+ 9us) tablet_service.cc:1901] Found scanner 4f0b899b9c7d469a942854dfe3e1d921\n1008 12:09:55.494442 (+ 3785us) inbound_call.cc:157] Queueing success response\n",&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"duration_ms": 3,&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"metrics": [&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;{&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"key": "cfile_cache_miss",&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"value": 455&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;},&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;{&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT 
size="2" color="#000000"&gt;"key": "cfile_cache_miss_bytes",&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"value": 402982&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;},&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;{&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"key": "lbm_reads_lt_1ms",&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"value": 455&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;},&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;{&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"key": "lbm_read_time_us",&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"value": 889&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;},&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;{&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"key": "spinlock_wait_cycles",&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;"value": 25600&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;}&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;]&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2" color="#000000"&gt;},&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our worklow is relatively light for the moment:&lt;BR /&gt;Kafka -&amp;gt; python consumer-&amp;gt; Kudu&lt;BR /&gt;And the python consumer read about 25 messages per second with an average of 150 kb per second, and insert the data into kudu via impyla. 
(There are no updates and no deletes)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the output for the following commands on the tablet server (the 3 tablet servers have the same memory problem):&lt;BR /&gt;sudo -u kudu kudu fs check --fs_wal_dir=&amp;lt;wal dir&amp;gt; --fs_data_dirs=&amp;lt;data dirs&amp;gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="s1_tablet_server.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4916iA0E002D2EBB3D889/image-size/large?v=v2&amp;amp;px=999" role="button" title="s1_tablet_server.png" alt="s1_tablet_server.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;We didn't expect to have so many missing blocks ...&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For a tablet with the command:&lt;BR /&gt;sudo -u kudu kudu local_replica data_size 6a1b491538e24709808172aabd4cedae --fs_wal_dir=&amp;lt;wal dir&amp;gt; --fs_data_dirs=&amp;lt;data dirs&amp;gt;:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="s2_tablet_0013adb5495e4566bf9acdd0286cb398.png" style="width: 290px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4917i5F51006D31EBFFBE/image-size/large?v=v2&amp;amp;px=999" role="button" title="s2_tablet_0013adb5495e4566bf9acdd0286cb398.png" alt="s2_tablet_0013adb5495e4566bf9acdd0286cb398.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And very often we couldn't use the command because of missing block:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="s3_tablet_5ca30173bfba4461bd7618d0c38f43c5.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4918i710B0AE421B541A1/image-size/large?v=v2&amp;amp;px=999" role="button" title="s3_tablet_5ca30173bfba4461bd7618d0c38f43c5.png" 
alt="s3_tablet_5ca30173bfba4461bd7618d0c38f43c5.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Finally we have also notice the "Write buffer memory usage" on tablets. It seems a bit high for our workload isn't it ?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="s4.png" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4919i822605E440BF375A/image-size/large?v=v2&amp;amp;px=999" role="button" title="s4.png" alt="s4.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advance,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Vincent&lt;/P&gt;</description>
      <pubDate>Mon, 08 Oct 2018 12:55:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/80833#M12583</guid>
      <dc:creator>vincenth</dc:creator>
      <dc:date>2018-10-08T12:55:04Z</dc:date>
    </item>
    <item>
      <title>Re: Kudu Tablet Server - Leak Memory ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/81245#M12584</link>
      <description>Hi Vincent. Sorry for the delay in responding. You might try running the fs check with the --repair option to see if it can fix the problems. Additionally, everything we've seen so far is consistent with the explanation that your tablet servers have a very large number of small data blocks, and that this is responsible for the increased memory usage. It will also affect your scan performance: you can see it in the metrics, where 455 cfiles were missed from the cache (all of the blocks read) but only about 400KB of data was read. Since each cfile (roughly, one block) involves some fixed cost to read, this slows down scans. I think the reason this happened is that your workload slowly streams writes into Kudu; I'm guessing inserts arrive roughly in order of increasing primary key? Unfortunately, there's no easy process to fix the state the table is in. Rewriting it (using a CTAS and a rename, say) will make things better. In the future, raising --flush_threshold_secs so that it covers a long enough period for flushed blocks to reach a good size will help prevent this problem. The tradeoff is that the server will use somewhat more disk space and memory for WALs. KUDU-1400 tracks the lack of a compaction policy that would automatically handle the situation you're in; it's being worked on right now.</description>
      <pubDate>Thu, 18 Oct 2018 18:05:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/81245#M12584</guid>
      <dc:creator>wdberkeley</dc:creator>
      <dc:date>2018-10-18T18:05:45Z</dc:date>
    </item>
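The small-blocks diagnosis above can be sanity-checked with quick arithmetic on the two cache metrics quoted earlier in the thread (cfile_cache_miss = 455, cfile_cache_miss_bytes = 402982):

```shell
# 455 CFile blocks read from disk for only ~400 KB of data: compute the
# average block size in bytes (integer division).
echo $(( 402982 / 455 ))   # prints 885
```

Under 1 KB per block on average, which is why the fixed per-block open/read cost, rather than the data volume, dominates both memory use and scan time here.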
    <item>
      <title>Re: Kudu Tablet Server - Leak Memory ?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/81346#M12585</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;No problem for the delay.&lt;BR /&gt;Yes, to summarize: we have between 10 and 1,000 messages per second to ingest into Kudu, and each message is about 200+ bytes.&lt;BR /&gt;And using Impyla we do individual row insertions (or inserts of 5 or 10 messages at a time); does that explain all the small data blocks?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Using CTAS, it's much better, thanks.&lt;/P&gt;&lt;P&gt;But in general, do you have any recommendations for fast individual row insertion without increasing memory usage too much? And for the case of slow streaming writes? The thing is, we would like to query the table quickly with the latest data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Many thanks,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Vincent&lt;/P&gt;</description>
      <pubDate>Mon, 22 Oct 2018 09:42:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kudu-Tablet-Server-Leak-Memory/m-p/81346#M12585</guid>
      <dc:creator>vincenth</dc:creator>
      <dc:date>2018-10-22T09:42:55Z</dc:date>
    </item>
  </channel>
</rss>

