Member since: 01-09-2017
Posts: 12
Kudos Received: 0
Solutions: 0
11-13-2019
04:48 PM
Hi @Grant Henke, the timestamp predicate works after I cast it to timestamp. Thank you for your help!
11-12-2019
06:39 PM
I have a Kudu table with this schema:

create table test_table
(
  `time` timestamp not null, --
  `id` string not null, --
  .....
  primary key(`time`, `id`)
)
partition by hash(id) partitions 6
stored as kudu;

and I try to use Spark to copy the data to a Parquet table in HDFS:

val df = spark.read
  .options(Map("kudu.master" -> kuduMasters, "kudu.table" -> KuduTable))
  .format("kudu")
  .load
  .where("time > '2019-10-29 08:05:10' AND time < '2019-10-29 08:05:30'")

df.write
  .mode("append")
  .parquet("hdfs://parquet")

But the performance is low and the job seems to be doing a full table scan against the Kudu table: in the Spark UI, the row count of the "Scan Kudu impala::table" stage equals the total number of rows in the table. For comparison, I did a copy using Impala's "insert into ... select", which is much faster and the "where" predicate appears to be applied. Is this full-table-scan behavior expected, or am I missing something here? The Kudu version is 1.10.0 and the Spark client is kudu-spark2_2.11:1.10.0.
Labels:
- Apache Impala
- Apache Kudu
- Apache Spark
07-10-2019
05:39 PM
We only have two disks (data directories) per t-server in the current setup, so maintenance_manager_num_threads is only 1. Will this be a bottleneck for writes? Do you have a recommended number of data directories (I can't find one in the documentation yet)? We also observed that with fewer tablets the memory usage is lower, so the ingestion rate can be sustained. Do you think reducing the number of tablets (partitions) would also benefit the ingestion rate?
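For context, both knobs mentioned here are t-server gflags; a minimal sketch of how they might appear in the tserver flagfile (the paths and values are illustrative assumptions, not recommendations):

# one fs_data_dirs entry per physical data disk
--fs_data_dirs=/data/1/kudu,/data/2/kudu
# maintenance threads are often scaled with the number of data directories
--maintenance_manager_num_threads=2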
07-10-2019
05:13 PM
The ingestion rate drops by about 60-70 percent. Is this write rejection the backpressure you mentioned? Does this mean I need to provide more memory in order to sustain the ingestion rate without rejections?
07-10-2019
05:00 PM
Hi Adar, I ran another test with the following settings:

--memory_limit_hard_bytes=80530636800 (75G)
--memory_limit_soft_percentage=85
--memory_limit_warn_threshold_percentage=90
--memory_pressure_percentage=70

One of the t-servers starts to return "Service unavailable: Soft memory limit exceeded (at 89.17% of capacity)" after the ingestion job has been running for around 20 minutes, and the ingestion rate drops. Below is the heap sample of that t-server. Let me know if you have any thoughts.
07-10-2019
03:32 PM
Thanks for the reply, Adar! It seems the backpressure you mentioned does not take place before the t-server hits the memory limit (Service unavailable: Soft memory limit exceeded). At that point it starts to reject incoming writes, which brings the ingestion rate down a lot; that is not the backpressure you mentioned, right? I will try to paste a heap sample here.
07-10-2019
03:07 PM
Hi, we are testing heavy writes on Kudu 1.8 as shipped with CDH 6.1.1. We currently have around 350 "hot replicas" on one t-server. While we are ingesting data, the memory consumed by each tablet keeps increasing, and eventually the total used memory exceeds the memory limit we give to Kudu. From the mem-tracker, some replicas' MemRowSets use as much as 500 MB. Does that mean we need to provide up to 350 * 500 MB = 175 GB for each t-server? Is there any way/config to limit or throttle the memory consumed by the MemRowSet (the limit shown in the mem-tracker UI is "none", see the screenshot below)? Any suggestion will be appreciated, thanks!
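For reference, one knob commonly pointed at for MemRowSet growth is the t-server flush threshold; a sketch of the relevant flags, with an illustrative value rather than a tested recommendation (this is an assumption, not advice from this thread):

# a MemRowSet becomes flushable once it reaches this size;
# flush_threshold_secs additionally makes older MemRowSets flushable on a time basis
--flush_threshold_mb=128
--flush_threshold_secs=120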
Labels:
- Apache Impala
- Apache Kudu
01-24-2017
05:47 PM
Hi Vinithra, our on-premise software will depend on some services from CDH. As one of the prerequisites, we want to provide some kind of installer (or script) to easily provision a CDH cluster. If there is no good solution for automatically deploying a CDH cluster, we can only refer customers to the instructions and ask them to install CDH themselves, as you mentioned.
01-24-2017
04:40 PM
Hi Mike, I understand the BYON plugin is for showcase purposes. But say you (Cloudera) have a client who wants to deploy CDH to their private datacenter: is manual deployment the only option then, or do you have a separate solution for them?
01-23-2017
05:06 PM
Hi Jadair, we are evaluating Director with the BYON plugin too. On the production-usage part: since the Cloudera Director Service Provider Interface is hierarchically structured, can you elaborate on which parts of the BYON plugin are not production-ready (e.g., compared to the AWS plugin)?