About Tim Armstrong

Tim Armstrong · ‎01-17-2018

@spurusothaman usually we go through a couple of steps to troubleshoot issues like this. The two most likely solutions are:
1. Give the query more memory by increasing mem_limit or reducing the number of concurrent queries.
2. Adjust the SQL by rewriting the query or adding hints to get a different query plan that avoids having so many duplicate values on the right side of the join.
Depending on the exact scenario, the solution might be 1, 2, or both. straight_join is only useful if you use it to force a plan with a different join order. If you want input on whether you have a bad plan and what a better join order might be, please provide a query profile.
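As a rough sketch of those two options in impala-shell (the table and column names are hypothetical, and the 8g limit is an arbitrary placeholder rather than a recommendation):

```sql
-- Option 1: give the query more memory for this session.
-- 8g is an arbitrary example value; size it to your cluster.
SET MEM_LIMIT=8g;

-- Option 2: force the join order with STRAIGHT_JOIN so that the
-- table with many duplicate join keys (hypothetical: events) is on
-- the left, and the table with mostly unique keys (hypothetical:
-- customers) is on the right side of the join.
SELECT STRAIGHT_JOIN e.customer_id, c.name, e.amount
FROM events e
JOIN customers c ON e.customer_id = c.customer_id;
```

Whether this particular order is actually better depends on the data, which is why the query profile is the thing to look at first.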

Plop564 · ‎01-12-2018

Hi @Tim Armstrong, thanks for the quality answer 🙂 As you mention, we don't use the latest Impala version in production, so it is indeed possible that there are bugs in Impala or in the UDFs/UDAFs. I will check the changelog and evaluate possible issues with those very large requests. Regarding our self-made UDF, the good news is that, after reviewing our log history, this error was also triggered before the UDF was deployed. So if there are memory leaks at the moment, they are probably unrelated to our work, and it may be a minor issue, as we just have to wait for the cluster (and Impala version) upgrade. Otherwise, many thanks for the implementation details you gave me, they help me understand this better!

Tim Armstrong · ‎01-11-2018

That is a good suggestion, I went ahead and did it.

Plop564 · ‎12-10-2017

Thanks again!

Plop564 · ‎12-10-2017

Hi @Tim Armstrong, thank you very much for the reply!

JulienMaria · ‎11-29-2017

Hello, many thanks for the answer! The mt_dop option is exactly what we need. I hope this development will be available with Impala 2.8. The use case is that we are migrating from a "many small servers" cluster to a "fewer bigger servers" cluster, with a 6x reduction in the number of servers. Even with the same overall hardware performance, we end up with too few fragment instances to use all the CPU. Regards, Julien
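For reference, mt_dop is set as a query option. A minimal sketch, assuming a build where it is supported for the query in question (the value 4 and the table name are placeholders):

```sql
-- Request a per-node degree of parallelism of 4 for queries in this session.
-- 4 is an arbitrary example; which query types honor MT_DOP depends on
-- the Impala version.
SET MT_DOP=4;

SELECT count(*) FROM some_large_table;  -- hypothetical table
```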

ABaaya · ‎11-07-2017

Below is the error message I received.

2017-11-07 03:55:11,536 [INFO ] There are no more tasks to run at this time
Starting Impala Shell without Kerberos authentication
Server version: impalad version 2.6.0-cdh5.8.4 RELEASE (build 207450616f75adbe082a4c2e1145a2384da83fa6)
Invalidating Metadata
Query: invalidate metadata

Fetched 0 row(s) in 4.11s
Query: use `DBNAME`
Query: insert overwrite table Table partition(recordtype) select adid,seg,profile,livecount,
count(distinct mc) as nofs,stbnt,1 from table1 where livecount<>0 group by adid,seg,profile,livecount,stbnt
WARNINGS:
CatalogException: Table 'dbname.table' was modified while operation was in progress, aborting execution.

nkonzo · ‎10-24-2017

This has been resolved; UDFs are the best. I wrote Java and C++ code that checks the current user. If the user is not classified, the data is masked; for classified users it appears as is.

-- The final step is to create the UDFs, e.g. hrmask
-- Create a view with the sensitive columns wrapped in the mask function.

[localhost.localdomain:21000] > create view redact_m as select Title,GivenName,Surname,hrmask(CCNumber),hrmask(idNumber) from redact;
Query: create view redact_m as select Title,GivenName,Surname,hrmask(CCNumber),hrmask(idNumber) from redact

Fetched 0 row(s) in 0.41s
[localhost.localdomain:21000] > select * from redact_m limit 2;
Query: select * from redact_m limit 2
+-------+-----------+---------+------------------+---------------+
| title | givenname | surname | _c3              | _c4           |
+-------+-----------+---------+------------------+---------------+
| Title | GivenName | Surname | NULL             | NULL          |
| Ms.   | Eva       | Howard  | 5163458320525980 | 6345832052598 |
+-------+-----------+---------+------------------+---------------+
WARNINGS: Error converting column: 3 TO BIGINT (Data is: CCNumber)
Error converting column: 4 TO BIGINT (Data is: idNumber)
file: hdfs://localhost:8020/test/hive/fake.csv
record: Title,GivenName,Surname,CCNumber,idNumber

Fetched 2 row(s) in 4.74s
[localhost.localdomain:21000] >

The results are in the clear because I was logged in as admin, but once I log in as another user, the sensitive columns are masked.

[localhost.localdomain:21000] > select * from redact_m limit 2;
Query: select * from redact_m limit 2
+-------+-----------+---------+----------+----------+
| title | givenname | surname | _c3      | _c4      |
+-------+-----------+---------+----------+----------+
| Title | GivenName | Surname | NULL     | NULL     |
| Ms.   | Eva       | Howard  | 99999999 | 99999999 |
+-------+-----------+---------+----------+----------+
WARNINGS: Error converting column: 3 TO BIGINT (Data is: CCNumber)
Error converting column: 4 TO BIGINT (Data is: idNumber)
file: hdfs://localhost:8020/test/hive/fake.csv
record: Title,GivenName,Surname,CCNumber,idNumber

Fetched 2 row(s) in 0.15s
[localhost.localdomain:21000] >

I wish this could be a built-in function that ships with Sentry by default.

shiyu17 · ‎10-23-2017

Got it, thanks Tim.

Tim Armstrong · ‎09-29-2017

Yeah I agree - I'd like to spend some time cleaning that up 🙂

Member Since	‎07-29-2015 04:07 PM
Last Visited	‎02-11-2021 06:07 PM
Posts	535
Kudos received	141

Cloudera Community

Re: Impala Queries which were previously working a...

Re: Impala queries are not distributing to all the...

Re: impala - `recover partitions` points to old da...

Re: impala catalog server JVM

Re: Impala - On-demand metadata

Re: Memory limit exceeded cannot perform hash join

Re: FunctionContextImpl::AllocateLocal's allocatio...

Re: Impala: "Cancelled due to unreachable impalad(...

Re: Impala UDF C++ : if an error occurs, is CLOSE_...

Re: Impala UDF C++ : risks to deploy in a producti...

Re: Impala: Control fragment number

Re: Errors while running alter table and compute s...

Re: masking UFD function for impala

Re: Understand Impalad memz Breakdown

Re: Memory handling in Impala UDA functions