Member since
07-29-2015
535
Posts
140
Kudos Received
103
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6039 | 12-18-2020 01:46 PM |
| | 3925 | 12-16-2020 12:11 PM |
| | 2773 | 12-07-2020 01:47 PM |
| | 1985 | 12-07-2020 09:21 AM |
| | 1276 | 10-14-2020 11:15 AM |
12-18-2020
06:33 AM
We have restarted nearly every component of the affected HDFS cluster, and Impala performance has improved. Unfortunately, that doesn't explain the underlying issue.
12-16-2020
12:11 PM
1 Kudo
In that case - scheduling of remote reads - for Kudu it's based on distributing the work for each scan across nodes as evenly as possible. For Kudu we randomize the assignment somewhat to even things out, but its distribution is not based on resource availability. I.e. we generate the schedule and then wait for the resources to become available on the nodes we picked. I understand that reversing that (i.e. finding available nodes first, then distributing work onto them) would be desirable in some cases, but there are pros and cons to doing that. For remote reads from filesystems/object stores, in more recent versions we do something a bit different: each file has affinity to a set of executors, and we try to schedule it on those so that we're more likely to get hits in the remote data cache.
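This is not Impala's actual implementation, but the file-to-executor affinity described above can be sketched with rendezvous (highest-random-weight) hashing, which deterministically maps each file to a small, stable subset of executors (all names below are hypothetical):

```python
import hashlib

def executors_for_file(file_path, executors, replicas=3):
    """Pick a deterministic subset of executors for a file by scoring
    each (file, executor) pair with a hash and taking the top scores.
    Repeated scans of the same file land on the same nodes, which makes
    remote-data-cache hits likely."""
    scored = sorted(
        executors,
        key=lambda e: hashlib.sha256(f"{file_path}|{e}".encode()).hexdigest(),
        reverse=True,
    )
    return scored[:replicas]

nodes = ["exec1", "exec2", "exec3", "exec4", "exec5"]
# Same file, same executor subset, every time:
assignment = executors_for_file("s3a://bucket/part-0.parq", nodes)
```

A nice property of rendezvous hashing here is that adding or removing one executor only moves the files that hashed to that executor, so most of the cache stays warm.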
12-15-2020
08:59 PM
You need to run COMPUTE STATS on the base tables referenced by the views - running COMPUTE STATS directly on a view isn't supported.
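As a sketch (the table and view names are hypothetical), the stats belong on the underlying tables, not the view:

```sql
-- sales_v is a view defined over the base table base_sales.
COMPUTE STATS base_sales;   -- supported: stats on the base table
-- COMPUTE STATS sales_v;   -- not supported: views don't hold stats
```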
12-10-2020
04:28 PM
Great news!
12-08-2020
09:36 AM
Glad to help! I'm excited about the S3 changes just because they simplify ingestion so much. I'll add a disclaimer here in case other people read the solution: there's *some* potential for a performance impact when disabling S3Guard for S3-based tables with large partition counts, simply because of the difference in implementation - retrieving the listing from DynamoDB may be quicker than retrieving it from S3 in some scenarios.
12-07-2020
09:21 AM
Slide 17 here has some rules of thumb: https://blog.cloudera.com/latest-impala-cookbook/ Can you mention what version you're running and whether you have any other non-standard configs set, e.g. load_catalog_in_background? We've made some improvements in this area and added some options in more recent versions.
12-07-2020
09:17 AM
These are good questions that come up frequently. https://docs.cloudera.com/runtime/7.2.2/administering-kudu/topics/kudu-security-trusted-users.html discusses the issue. In summary, Hive/Impala tables (i.e. those with entries in the Hive Metastore) are authorized in the same way regardless of whether the backing storage is HDFS, S3, Kudu, HBase, etc.: the SQL interface does the authorization to confirm that the end user has access to the table, columns, etc., then the service accesses the storage as the privileged user (Impala in this case). In this model, if you create an external Kudu table in Impala and grant a user permission to access the table via Impala, they will also have permission to access the data in the underlying Kudu table. The thing that closes the loophole here is that creating an external Kudu table requires very high privileges - ALL on SERVER - so a regular user can't create an external Kudu table pointed at an arbitrary Kudu cluster or table.
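A sketch of the flow described above (table, Kudu table, and role names are hypothetical); the CREATE EXTERNAL TABLE step is the one gated behind ALL on SERVER, after which access is granted through ordinary SQL authorization:

```sql
-- Requires ALL on SERVER: map an existing Kudu table into Impala.
CREATE EXTERNAL TABLE kudu_orders
STORED AS KUDU
TBLPROPERTIES ('kudu.table_name' = 'impala::default.orders');

-- From here on, normal SQL-layer authorization applies.
GRANT SELECT ON TABLE kudu_orders TO ROLE analysts;
```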
11-30-2020
09:14 AM
@Tim Armstrong any hints on how to configure the JDBC connection to use impersonation? Assuming I use the recommended Cloudera drivers, can you send a code snippet that invokes a simple SQL query on behalf of some user? Thanks!
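This answer isn't in the thread, but as a hedged sketch: the Cloudera Impala JDBC driver supports a DelegationUID connection property for running queries on behalf of another user. The host name, user, and helper function below are hypothetical; the snippet only assembles the JDBC URL a client would pass to the driver:

```python
def build_impala_jdbc_url(host, port, props):
    """Assemble a Cloudera Impala JDBC URL from a host, port, and a dict
    of driver properties. DelegationUID (per the driver's documentation)
    asks the driver to execute queries on behalf of the named user."""
    opts = "".join(f";{k}={v}" for k, v in sorted(props.items()))
    return f"jdbc:impala://{host}:{port}{opts}"

url = build_impala_jdbc_url(
    "impala-host.example.com", 21050,
    {"AuthMech": "1", "DelegationUID": "alice", "KrbServiceName": "impala"},
)
```

The connecting principal still authenticates as itself (Kerberos in this sketch); DelegationUID only changes which user the query is authorized and executed as, and the connecting principal must be configured as a trusted proxy user on the server side.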
11-19-2020
10:13 AM
You can use an expression instead of a query. In the expression, your query should be something like this:

="SELECT A.COL1, A.COL2 FROM schema.tableName A WHERE A.COL1 = '" & Parameters!parameterName.Value & "'"

Notice the quotation marks around the parameter (", ') and the equals sign (=) at the beginning. You should create the fields manually (use the query designer without parameters and let SSRS do the Refresh Fields task).
11-13-2020
09:35 PM
Could you give a working example of this in Spark 2.4 using a Scala DataFrame? I can't seem to find the correct syntax... val result = dataFrame.select(count(when(col("col_1") === "val_1" && col("col_2") === "val_2", 1)))