Member since 07-29-2015
535 Posts
141 Kudos Received
103 Solutions
My Accepted Solutions
Views | Posted |
---|---|
7611 | 12-18-2020 01:46 PM |
4985 | 12-16-2020 12:11 PM |
3799 | 12-07-2020 01:47 PM |
2472 | 12-07-2020 09:21 AM |
1613 | 10-14-2020 11:15 AM |
01-17-2018
02:14 PM
@spurusothaman usually we go through a couple of steps to troubleshoot issues like this. The two most likely solutions are:
1. Give the query more memory by increasing mem_limit or reducing the number of concurrent queries.
2. Adjust the SQL by rewriting the query or adding hints to get a different query plan that avoids having so many duplicate values on the right side of the join.
Depending on the exact scenario, the solution might be 1, 2, or both. straight_join is only useful if you use it to force a plan with a different join order. If you want input on whether you have a bad plan and what a better join order might be, please provide a query profile.
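As a rough illustration of the two options above, here is a minimal sketch in Impala SQL. The table names (fact_tbl, dim_tbl), column names, and the 8g value are hypothetical placeholders, not taken from the original question; the right settings depend on the actual query and cluster.

```sql
-- Option 1: raise the per-query memory limit for the current session.
-- The 8g value is only an example; choose a limit that fits your cluster.
SET MEM_LIMIT=8g;

-- Option 2: STRAIGHT_JOIN makes Impala join the tables in the order written
-- instead of the order chosen by the planner. It only helps if the written
-- order differs from the planner's choice, e.g. by moving the table with
-- many duplicate join keys off the right-hand side of the join.
-- fact_tbl and dim_tbl are hypothetical tables.
SELECT STRAIGHT_JOIN f.id, d.name
FROM fact_tbl f
JOIN dim_tbl d ON f.dim_id = d.id;
```

Running EXPLAIN on both the original and the rewritten query is one way to confirm that the join order actually changed before re-running anything expensive.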
01-12-2018
02:34 AM
Hi @Tim Armstrong Thanks for the quality answer 🙂 As you mention, we don't run the latest Impala version in production, so it is indeed possible that there are bugs in Impala or in the UDF/UDAF. I will check the changelog and evaluate possible issues with those very large requests. Regarding our self-made UDF, the good news is that, after reviewing our log history, this error was also triggered before the UDF was deployed. So if there are memory leaks currently, they are probably unrelated to our work, and it may be a minor issue since we just have to wait for the cluster (and Impala version) upgrade. Otherwise, many thanks for the implementation details you gave me; they help me understand this better!
01-11-2018
04:07 PM
That is a good suggestion, I went ahead and did it.
12-10-2017
11:17 AM
Thanks again !
12-10-2017
11:14 AM
Hi @Tim Armstrong Thank you very much for the reply !
11-29-2017
01:04 AM
Hello, Many thanks for the answer! The mt_dop option is exactly what we need. I hope this development will be available in Impala 2.8. The use case is that we are migrating from a "many small servers" cluster to a "fewer bigger servers" cluster, with a sixfold reduction in the number of servers. Even with the same overall hardware performance, we end up with too few fragment instances to use all the CPUs. Regards, Julien
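For readers unfamiliar with the option, a minimal sketch of how mt_dop is typically set per session is shown here. The value 4 and the table name are hypothetical, and which query shapes honor the option depends on the Impala version, so treat this as an assumption to verify against your release.

```sql
-- Ask Impala to run up to 4 instances of each fragment per node
-- (the value 4 is illustrative, not a recommendation).
SET MT_DOP=4;

-- events is a hypothetical table; a scan plus aggregation is the kind of
-- query that benefits when each node has many more cores than fragments.
SELECT event_type, COUNT(*) AS n
FROM events
GROUP BY event_type;
```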
11-07-2017
02:37 PM
Below is the error message I received.

2017-11-07 03:55:11,536 [INFO ] There are no more tasks to run at this time
Starting Impala Shell without Kerberos authentication
Server version: impalad version 2.6.0-cdh5.8.4 RELEASE (build 207450616f75adbe082a4c2e1145a2384da83fa6)
Invalidating Metadata
Query: invalidate metadata

Fetched 0 row(s) in 4.11s
Query: use `DBNAME`
Query: insert overwrite table Table partition(recordtype) select adid,seg,profile,livecount,
count(distinct mc) as nofs,stbnt,1 from table1 where livecount<>0 group by adid,seg,profile,livecount,stbnt
WARNINGS:
CatalogException: Table 'dbname.table' was modified while operation was in progress, aborting execution.
10-24-2017
12:45 AM
This has been resolved; UDFs are the best. I wrote Java and C++ code that checks the current user. If the user is not classified, the data is masked; for classified users it appears as is.

-- The final step is to create a UDF, e.g. hrmask
-- Create a view with the sensitive columns wrapped in the mask function.

[localhost.localdomain:21000] > create view redact_m as select Title,GivenName,Surname,hrmask(CCNumber),hrmask(idNumber) from redact;
Query: create view redact_m as select Title,GivenName,Surname,hrmask(CCNumber),hrmask(idNumber) from redact
Fetched 0 row(s) in 0.41s
[localhost.localdomain:21000] > select * from redact_m limit 2;
Query: select * from redact_m limit 2
+-------+-----------+---------+------------------+---------------+
| title | givenname | surname | _c3              | _c4           |
+-------+-----------+---------+------------------+---------------+
| Title | GivenName | Surname | NULL             | NULL          |
| Ms.   | Eva       | Howard  | 5163458320525980 | 6345832052598 |
+-------+-----------+---------+------------------+---------------+
WARNINGS: Error converting column: 3 TO BIGINT (Data is: CCNumber)
Error converting column: 4 TO BIGINT (Data is: idNumber)
file: hdfs://localhost:8020/test/hive/fake.csv
record: Title,GivenName,Surname,CCNumber,idNumber
Fetched 2 row(s) in 4.74s
[localhost.localdomain:21000] >

The results are in the clear because I was logged in as admin, but once I log in as another user, the sensitive columns are masked.

[localhost.localdomain:21000] > select * from redact_m limit 2;
Query: select * from redact_m limit 2
+-------+-----------+---------+----------+----------+
| title | givenname | surname | _c3      | _c4      |
+-------+-----------+---------+----------+----------+
| Title | GivenName | Surname | NULL     | NULL     |
| Ms.   | Eva       | Howard  | 99999999 | 99999999 |
+-------+-----------+---------+----------+----------+
WARNINGS: Error converting column: 3 TO BIGINT (Data is: CCNumber)
Error converting column: 4 TO BIGINT (Data is: idNumber)
file: hdfs://localhost:8020/test/hive/fake.csv
record: Title,GivenName,Surname,CCNumber,idNumber
Fetched 2 row(s) in 0.15s
[localhost.localdomain:21000] >

I wish this could be a built-in function that comes with Sentry by default.
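The post does not show how hrmask was registered, so the following is only a sketch of what the registration and view could look like for a native (C++) Impala UDF. The library path, the symbol name, the BIGINT signature (guessed from the "TO BIGINT" warnings), and the view name redact_m_named are all assumptions for illustration.

```sql
-- Register the masking UDF. The HDFS path and symbol are hypothetical;
-- the BIGINT argument and return types are assumed from the warnings above.
CREATE FUNCTION hrmask(BIGINT) RETURNS BIGINT
LOCATION '/user/impala/udfs/libhrmask.so'
SYMBOL='HrMask';

-- Same view as in the post, but with explicit aliases so the masked
-- columns keep readable names instead of _c3 and _c4.
-- redact_m_named is a hypothetical name chosen to avoid clashing with redact_m.
CREATE VIEW redact_m_named AS
SELECT Title,
       GivenName,
       Surname,
       hrmask(CCNumber) AS CCNumber,
       hrmask(idNumber) AS idNumber
FROM redact;
```

Users who should only see masked values would then be granted access to the view rather than to the underlying redact table.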
09-29-2017
05:29 PM
Yeah I agree - I'd like to spend some time cleaning that up 🙂