05-09-2018
06:14 PM
Kudu does not use HDFS at all; it requires its own storage space. With the default 3x replication and no compression, Kudu will take 3x the amount of space you ingest. However, Kudu tends to encode and compress data efficiently, so you will have to evaluate how much space it actually takes based on your schema and ingestion patterns.

The more RAM you give Kudu, the better it will perform... treat Kudu like a database (think MySQL or Vertica).

Right now there is no way to specify a quota. The only available settings related to that are --fs_wal_dir_reserved_bytes ( https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_fs_wal_dir_reserved_bytes ) and --fs_data_dirs_reserved_bytes ( https://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_fs_data_dirs_reserved_bytes ).

If you need to closely control the amount of space Kudu uses, consider putting it on its own partitions or machines. That said, it is possible to run Kudu on the same machines that have HDFS running on them if you want to.

Hope that helps!
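As a sketch of how those two flags might be used, here is a tablet server flags fragment that reserves headroom on each volume for other processes. The byte values are illustrative only, not recommendations; the flags themselves are the real ones linked above, typically set in the tserver's gflagfile (or the corresponding safety valve in Cloudera Manager):

```
# Keep ~10 GiB free on the WAL volume (value is illustrative)
--fs_wal_dir_reserved_bytes=10737418240
# Keep ~10 GiB free on each configured data directory's volume
--fs_data_dirs_reserved_bytes=10737418240
```

Note these flags reserve space *for other users of the disk* rather than capping Kudu's own usage, which is why they are not a true quota.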
02-27-2018
08:06 AM
Kudu can evaluate simple filters natively, e.g. those against a table's primary index, so Impala pushes such filters directly down to Kudu. More complex filters (e.g. those involving UDFs) are evaluated by Impala after it receives rows from Kudu. The explain plan clearly distinguishes the filters evaluated by Kudu from those evaluated by Impala.
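As a sketch (the table and column names here are hypothetical: `metrics` with primary-key column `id` and a user-defined function `my_udf`), you can see the distinction by comparing explain plans:

```sql
-- A simple comparison on the primary-key column can be pushed to Kudu;
-- it shows up under the scan node's "kudu predicates:" in the plan.
EXPLAIN SELECT * FROM metrics WHERE id < 1000;

-- A filter involving a UDF cannot be pushed down; Impala evaluates it
-- after the scan, and it is listed under "predicates:" instead.
EXPLAIN SELECT * FROM metrics WHERE my_udf(payload) = 'x';
```

Checking which line a filter appears under is a quick way to confirm whether pushdown happened for your query.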