Member since
07-29-2015
535
Posts
140
Kudos Received
103
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6039 | 12-18-2020 01:46 PM |
| | 3925 | 12-16-2020 12:11 PM |
| | 2773 | 12-07-2020 01:47 PM |
| | 1985 | 12-07-2020 09:21 AM |
| | 1276 | 10-14-2020 11:15 AM |
12-18-2020
06:33 AM
We have restarted nearly every component of the affected HDFS cluster, and Impala performance has improved. Unfortunately, that doesn't explain the underlying issue.
12-16-2020
12:11 PM
1 Kudo
In that case - scheduling of remote reads - for Kudu it's based on distributing the work for each scan across nodes as evenly as possible. For Kudu we randomize the assignment somewhat to even things out, but its distribution is not based on resource availability. I.e. we generate the schedule and then wait for the resources to become available on the nodes we picked. I understand that reversing that (i.e. finding available nodes first, then distributing work onto them) would be desirable in some cases, but there are pros and cons to doing that. For remote reads from filesystems/object stores, in more recent versions we do something a bit different: each file has affinity to a set of executors, and we try to schedule it on those so that we're more likely to get hits in the remote data cache.
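This is not Impala's actual implementation, but the file-to-executor affinity described above can be sketched with rendezvous (highest-random-weight) hashing, which deterministically maps each file to a small, stable subset of executors (all names below are hypothetical):

```python
import hashlib

def executors_for_file(file_path, executors, replicas=3):
    """Pick a deterministic subset of executors for a file by scoring
    each (file, executor) pair with a hash and taking the top scores.
    Repeated scans of the same file land on the same nodes, which makes
    remote-data-cache hits likely."""
    scored = sorted(
        executors,
        key=lambda e: hashlib.sha256(f"{file_path}|{e}".encode()).hexdigest(),
        reverse=True,
    )
    return scored[:replicas]

nodes = ["exec1", "exec2", "exec3", "exec4", "exec5"]
# Same file, same executor subset, every time:
assignment = executors_for_file("s3a://bucket/part-0.parq", nodes)
```

A nice property of rendezvous hashing here is that adding or removing one executor only moves the files that hashed to that executor, so most of the cache stays warm.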
12-15-2020
08:59 PM
You need to run COMPUTE STATS on the base tables referenced by the views - running COMPUTE STATS directly on a view isn't supported.
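As a sketch (the table and view names are hypothetical), the stats belong on the underlying tables, not the view:

```sql
-- sales_v is a view defined over the base table base_sales.
COMPUTE STATS base_sales;   -- supported: stats on the base table
-- COMPUTE STATS sales_v;   -- not supported: views don't hold stats
```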
12-10-2020
04:28 PM
Great news!
12-08-2020
09:36 AM
Glad to help! I'm excited about the S3 changes just because they simplify ingestion so much. I'll add a disclaimer here in case other people read the solution: there's *some* potential for a performance impact when disabling S3Guard for S3-based tables with large partition counts, simply because of the difference in implementation - retrieving the listing from DynamoDB may be quicker than retrieving it from S3 in some scenarios.
12-07-2020
09:21 AM
Slide 17 here has some rules of thumb: https://blog.cloudera.com/latest-impala-cookbook/ Can you mention what version you're running and whether you have any other non-standard configs set, e.g. load_catalog_in_background? We've made some improvements in this area and added some options in more recent versions.
12-07-2020
09:17 AM
These are good questions that come up frequently. https://docs.cloudera.com/runtime/7.2.2/administering-kudu/topics/kudu-security-trusted-users.html discusses the issue. In summary, Hive/Impala tables (i.e. those with entries in the Hive Metastore) are authorized in the same way regardless of whether the backing storage is HDFS, S3, Kudu, HBase, etc.: the SQL interface does the authorization to confirm that the end user has access to the table, columns, etc., then the service accesses the storage as the privileged user (Impala in this case). In this model, if you create an external Kudu table in Impala and grant a user permission to access the table via Impala, they will also have permission to access the data in the underlying Kudu table. The thing that closes the loophole here is that creating an external Kudu table requires very high privileges - ALL on SERVER - so a regular user can't create an external Kudu table pointed at an arbitrary Kudu cluster or table.
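A sketch of the flow described above (table, Kudu table, and role names are hypothetical); the CREATE EXTERNAL TABLE step is the one gated behind ALL on SERVER, after which access is granted through ordinary SQL authorization:

```sql
-- Requires ALL on SERVER: map an existing Kudu table into Impala.
CREATE EXTERNAL TABLE kudu_orders
STORED AS KUDU
TBLPROPERTIES ('kudu.table_name' = 'impala::default.orders');

-- From here on, normal SQL-layer authorization applies.
GRANT SELECT ON TABLE kudu_orders TO ROLE analysts;
```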
11-30-2020
09:14 AM
@Tim Armstrong any hints on how to configure the JDBC connection to use impersonation? Assuming I use the recommended Cloudera drivers, can you send a code snippet that invokes a simple SQL query on behalf of some user? Thanks!
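This answer isn't in the thread, but as a hedged sketch: the Cloudera Impala JDBC driver supports a DelegationUID connection property for running queries on behalf of another user. The host name, user, and helper function below are hypothetical; the snippet only assembles the JDBC URL a client would pass to the driver:

```python
def build_impala_jdbc_url(host, port, props):
    """Assemble a Cloudera Impala JDBC URL from a host, port, and a dict
    of driver properties. DelegationUID (per the driver's documentation)
    asks the driver to execute queries on behalf of the named user."""
    opts = "".join(f";{k}={v}" for k, v in sorted(props.items()))
    return f"jdbc:impala://{host}:{port}{opts}"

url = build_impala_jdbc_url(
    "impala-host.example.com", 21050,
    {"AuthMech": "1", "DelegationUID": "alice", "KrbServiceName": "impala"},
)
```

The connecting principal still authenticates as itself (Kerberos in this sketch); DelegationUID only changes which user the query is authorized and executed as, and the connecting principal must be configured as a trusted proxy user on the server side.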
11-19-2020
10:13 AM
You can use an expression instead of a query. In the expression, your query should be something like this:

="SELECT A.COL1, A.COL2 FROM schema.tableName A WHERE A.COL1 = '" & Parameters!parameterName.Value & "'"

Notice the quotation marks around the parameter (", ') and the equals sign (=) at the beginning. You should create the fields manually (use the query designer without parameters and let SSRS do the Refresh Fields task).
11-13-2020
09:35 PM
Could you give a working example of this in Spark 2.4 using a Scala DataFrame? I can't seem to find the correct syntax... val result = dataFrame.select(count(when(col("col_1") === "val_1" && col("col_2") === "val_2", 1)))