Member since: 07-01-2015
Posts: 460
Kudos Received: 78
Solutions: 43
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1346 | 11-26-2019 11:47 PM |
| | 1304 | 11-25-2019 11:44 AM |
| | 9474 | 08-07-2019 12:48 AM |
| | 2178 | 04-17-2019 03:09 AM |
| | 3487 | 02-18-2019 12:23 AM |
01-29-2018
07:29 AM
Hi, I could not find any documentation describing how Spark tasks are assigned to executors when data is read from Kudu into DataFrames. I noticed that in some cases (I did not have enough time to test thoroughly) Spark reads data ONLY from the leaders of the tablets, so data is moved across the network. Is there any setting or configuration to co-locate a Spark task in an executor with a Kudu tablet? Based on the Kudu documentation, the LEADER is for writes, but the FOLLOWERs can serve reads too. Thanks
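For illustration, a minimal kudu-spark read sketch. The master address and table name are placeholders, and the kudu.scanLocality option is an assumption that depends on the kudu-spark version in use, not a setting confirmed in this thread:

import org.apache.spark.sql.SparkSession

// Minimal sketch; "kudu-master:7051" and "impala::work.sales" are placeholders.
val spark = SparkSession.builder().appName("kudu-locality").getOrCreate()

val df = spark.read
  .format("org.apache.kudu.spark.kudu")
  .option("kudu.master", "kudu-master:7051")
  .option("kudu.table", "impala::work.sales")
  // Assumed option: "closest_replica" lets a task read from a follower replica on the
  // same host as the executor, while "leader_only" forces all reads through the leaders.
  .option("kudu.scanLocality", "closest_replica")
  .load()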
Labels:
- Apache Kudu
- Apache Spark
01-29-2018
01:52 AM
So the correct answer is: tables with range partitions defined via upper and lower boundaries cannot be extended. Tables with partitions defined as a single value can be extended.
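For context, a hedged sketch of what Impala's ADD RANGE PARTITION maps to in the Kudu client API (the master address is a placeholder, and the "impala::" table-name prefix assumes the table was created through Impala). Kudu rejects any new range that overlaps an existing partition, which is why, against the work.sales_by_year table from the question below, this call would raise the same NonRecoverableException: the existing unbounded partition 2016 <= VALUES already covers [2017, 2018).

import org.apache.kudu.client.{AlterTableOptions, KuduClient}

// Minimal sketch; "kudu-master:7051" is a placeholder.
val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()
val table  = client.openTable("impala::work.sales_by_year")
val schema = table.getSchema

// Bounds for the range [2017, 2018), i.e. what Impala's PARTITION VALUE = 2017 means.
val lower = schema.newPartialRow()
lower.addInt("year", 2017)
val upper = schema.newPartialRow()
upper.addInt("year", 2018)

// Kudu throws NonRecoverableException if this range overlaps an existing partition,
// which is the error Impala surfaces in the question below.
client.alterTable("impala::work.sales_by_year",
  new AlterTableOptions().addRangePartition(lower, upper))
client.close()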
01-28-2018
05:02 AM
And also on p. 29, under "New Features in Kudu 0.10.0":
- Users may now manually manage the partitioning of a range-partitioned table. When a table is created, the user may specify a set of range partitions that do not cover the entire available key space. A user may add or drop range partitions to existing tables. This feature can be particularly helpful with time series workloads in which new partitions can be created on an hourly or daily basis. Old partitions may be efficiently dropped if the application does not need to retain historical data past a certain point.
01-28-2018
05:00 AM
It is confusing. Apache Kudu User Guide, p. 27, "Partitioning Limitations":
- Tables must be manually pre-split into tablets using simple or compound primary keys. Automatic splitting is not yet possible. Range partitions may be added or dropped after a table has been created. See Schema Design for more information.
01-20-2018
09:56 AM
Hi, I have a simple table with range partitions defined by upper and lower bounds:

CREATE TABLE work.sales_by_year (
  year INT,
  sale_id INT,
  amount INT,
  PRIMARY KEY (sale_id, year)
)
PARTITION BY RANGE (year) (
  PARTITION VALUES < 2015,
  PARTITION 2015 <= VALUES < 2016,
  PARTITION 2016 <= VALUES
)
STORED AS KUDU;

So this table has three partitions:

+--------+-----------+----------+----------------+------------+
| # Rows | Start Key | Stop Key | Leader Replica | # Replicas |
+--------+-----------+----------+----------------+------------+
| -1     |           | 800007DF | host1:7050     | 3          |
| -1     | 800007DF  | 800007E0 | host2:7050     | 3          |
| -1     | 800007E0  |          | host3:7050     | 3          |
+--------+-----------+----------+----------------+------------+

Now I would like to end the last range with 2017 and have another interval for values >= 2017. I tried multiple syntaxes, but it does not work:

alter table work.sales_by_year add range partition 2016 <= VALUES < 2017;

Query: alter table work.sales_by_year add range partition 2016 <= VALUES < 2017
ERROR: ImpalaRuntimeException: Error adding range partition in table sales_by_year
CAUSED BY: NonRecoverableException: New range partition conflicts with existing range partition: 2016 <= VALUES < 2017

alter table work.sales_by_year add range partition VALUE = 2017;

Query: alter table work.sales_by_year add range partition VALUE = 2017
ERROR: ImpalaRuntimeException: Error adding range partition in table sales_by_year
CAUSED BY: NonRecoverableException: New range partition conflicts with existing range partition: 2017 <= VALUES < 2018

These error messages are misleading: if I run SHOW PARTITIONS, I still see the original three intervals, so there is no 2017 or 2018 partition. Any hints on how to extend the range partitions? Thanks
Labels:
- Apache Kudu
01-19-2018
08:00 AM
Hi, can somebody give a hint or guideline on how to maximize Kudu scan (read from a Kudu table) performance from Spark? I tried a simple DataFrame read, and I also tried to create multiple DataFrames, each with a different filter on one of the primary key columns, then union the DataFrames and write to HDFS. But it seems to me that each tablet server hands out the data via one scanner, so with 5 tablet servers there are 5 scanners and 5 tasks in 5 executors. Is it possible to trigger more scanners via Spark? Thanks
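For reference, a minimal kudu-spark read sketch (master address and table name are placeholders). In this connector each Kudu tablet is exposed as one Spark partition, so the scan parallelism is bounded by the number of tablets rather than by the number of executors; checking the partition count makes that bound visible:

import org.apache.spark.sql.SparkSession

// Minimal sketch; "kudu-master:7051" and "impala::work.sales" are placeholders.
val spark = SparkSession.builder().appName("kudu-scan-parallelism").getOrCreate()

val df = spark.read
  .format("org.apache.kudu.spark.kudu")
  .option("kudu.master", "kudu-master:7051")
  .option("kudu.table", "impala::work.sales")
  .load()

// One Kudu tablet maps to one Spark partition, i.e. one scanner and one task,
// so this number is the upper bound on concurrent scanners for the table.
println(df.rdd.getNumPartitions)

Under that assumption, the practical lever is the table's own partitioning, e.g. adding hash partitioning so each tablet server hosts several tablets and can therefore serve several scanners concurrently.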
Labels:
- Apache Kudu
- Apache Spark
01-10-2018
09:15 AM
I stopped CDH and did a Kerberos configuration redeploy. The /etc/krb5.conf is more or less the same; the only difference is the last line, "[domain_realm]", which was added by CM. After the redeploy, CDH started and now everything is green. Thanks, Tomas

[libdefaults]
default_realm = MYREALM.LOCAL
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = aes256-cts aes128-cts
default_tkt_enctypes = aes256-cts aes128-cts
permitted_enctypes = aes256-cts aes128-cts
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
MYREALM.LOCAL = {
kdc = 10.197.16.197 10.197.16.88
admin_server = 10.197.16.197 10.197.16.88
}
[domain_realm]
01-05-2018
07:23 AM
Hi, after an upgrade from CM 5.11 to 5.13, Cloudera Manager complains with a red exclamation mark: "Cluster has stale Kerberos client configuration." The cluster was all green before the upgrade and had no problem with the Kerberos configs (/etc/krb5.conf). What is more concerning is that, after opening this warning, three (gateway) nodes do not require the update, but the rest of them do: "Consider stopping roles on these hosts to ensure that they are updated by this command: ip-10-197-13-169.eu-west-1.compute.internal; ip-10-197-15-82.eu-west-1.compute.internal; ip-10-197-18-[113, 248].eu-west-1.compute.internal..." But the command is not there. What should I do? Stop the whole CDH cluster and then rerun the deploy? Thanks for the advice, T.
Labels:
- Cloudera Manager
12-06-2017
10:17 PM
No. Then I don't know. Can you paste here all the commands you used to generate the keystore and keys?
12-06-2017
06:07 AM
I don't know the solution, but I think the Cloudera Manager agent, which runs under root, starts these processes and sets the correct permissions. I could imagine that the cloudera-scm-agent is not running under root, or that the permissions are set wrongly. I would check the exact process directory of the Hive Metastore. I would also verify in the process list that only one copy of the Hive Metastore is running (to rule out two processes running at the same time). Try to restart the cloudera-scm-agent.