Member since 07-29-2015. 535 Posts, 141 Kudos Received, 103 Solutions.
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7636 | 12-18-2020 01:46 PM |
| | 4994 | 12-16-2020 12:11 PM |
| | 3804 | 12-07-2020 01:47 PM |
| | 2476 | 12-07-2020 09:21 AM |
| | 1621 | 10-14-2020 11:15 AM |
09-25-2017
10:03 AM
1 Kudo
I'd expect ORDER BY trunc(ts, "DD") to work. E.g. on my system this works:

[localhost:21000] > select timestamp_col, tinyint_col from functional_hbase.alltypestiny order by trunc(timestamp_col, 'DD'), tinyint_col desc;
+---------------------+-------------+
| timestamp_col | tinyint_col |
+---------------------+-------------+
| 2009-01-01 00:01:00 | 1 |
| 2009-01-01 00:00:00 | 0 |
| 2009-02-01 00:01:00 | 1 |
| 2009-02-01 00:00:00 | 0 |
| 2009-03-01 00:01:00 | 1 |
| 2009-03-01 00:00:00 | 0 |
| 2009-04-01 00:01:00 | 1 |
| 2009-04-01 00:00:00 | 0 |
+---------------------+-------------+
09-25-2017
07:43 AM
If you want to implement a C++ UDF though, I'd recommend starting with the docs here: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_udf.html. There are some examples of string manipulation UDFs on that page.
09-25-2017
07:43 AM
[localhost:21000] > select concat(substring(l_comment, 1, 3), regexp_replace(substring(l_comment, 4, length(l_comment) - 3), '[^ ]', '*'), substring(l_comment, length(l_comment) - 3)) from tpch.lineitem limit 5;
Query: select concat(substring(l_comment, 1, 3), regexp_replace(substring(l_comment, 4, length(l_comment) - 3), '[^ ]', '*'), substring(l_comment, length(l_comment) - 3)) from tpch.lineitem limit 5
Query submitted at: 2017-09-25 07:38:09 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=34f1a993e3cb99a:51d89bf800000000
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| concat(substring(l_comment, 1, 3), regexp_replace(substring(l_comment, 4, length(l_comment) - 3), '[^ ]', '*'), substring(l_comment, length(l_comment) - 3)) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| egu*** ****** ***** *** the |
| ly ***** ************* ***** **** old |
| rio***** ******** ******* *** dep |
| lit*** ******** **** **n de |
| pe***** ****** ***** **y re |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+

I'd recommend doing it with builtin functions since it will be easier to maintain. I included an example above of how you might do it using regexp_replace. I expect it will be quite fast - Impala's query compilation can inline functions like length() and substring() so those are essentially free in Impala (unlike many other SQL engines). The main cost is regexp_replace() but I'd expect that to be quite fast too.
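To make it easier to see how the three pieces of the expression above fit together, here is a minimal Python sketch that mirrors it outside of Impala. The function name `mask_comment` and the sample input are assumptions made for illustration (the input was chosen so the result matches the first output row), and the sketch only models strings longer than four characters.

```python
import re


def mask_comment(s: str) -> str:
    """Mirror of the concat/substring/regexp_replace expression above.

    Assumes len(s) > 4. Impala's handling of zero or negative substring
    positions is not modelled here.
    """
    head = s[:3]                         # substring(s, 1, 3)
    rest = s[3:]                         # substring(s, 4, length(s) - 3)
    tail = s[len(s) - 4:]                # substring(s, length(s) - 3)
    masked = re.sub(r"[^ ]", "*", rest)  # regexp_replace(rest, '[^ ]', '*')
    return head + masked + tail


# Hypothetical input, chosen to reproduce the first row of the output above.
print(mask_comment("egular courts above the"))
```

Note that the middle substring runs to the end of the string, so the last four characters appear twice: once masked and once in the clear. If the result should have the same length as the input, the length argument of the middle substring would need to be shortened accordingly.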
09-25-2017
07:30 AM
Impala doesn't have a date data type, but it does have a timestamp type: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_timestamp.html. to_date() converts a timestamp to a string: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_datetime_functions.html.

If you want to remove the time portion of the timestamp, you can use trunc(ts, "DD") to get the timestamp of midnight on that day. trunc() is documented on the same datetime functions page.

I don't understand your question about "order by". ORDER BY should work for all scalar data types in Impala (timestamp, double, etc). What do you want it to do and what is it doing now?
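The difference between to_date() and trunc(ts, "DD") described above is mostly about the type of the result. The following Python sketch is only an analogy, not Impala code: `strftime` stands in for to_date() and `replace` stands in for trunc(), and the sample timestamp is made up.

```python
from datetime import datetime

ts = datetime(2009, 1, 1, 0, 1)

# Analogue of to_date(ts): a 'YYYY-MM-DD' string, no longer a timestamp.
as_string = ts.strftime("%Y-%m-%d")

# Analogue of trunc(ts, 'DD'): still a timestamp, set to midnight of that day.
as_midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)

print(repr(as_string))
print(repr(as_midnight))
```

Either form gives the same value for every row that falls on the same day, so both can be used as a per-day sort or grouping key. The truncated form keeps timestamp semantics for any later date arithmetic.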
09-20-2017
12:19 AM
1 Kudo
If you're starting Impala from the command line like that, you can configure flags and environment variables in /etc/default/impala: https://www.cloudera.com/documentation/enterprise/latest/topics/impala_processes.html#starting_via_cmdline. The relevant variable in that file is IMPALA_SERVER_ARGS. (If anyone else reads this: if you're using Cloudera Manager, you can configure the scratch directories through the UI. You probably won't have to, since CM does a pretty good job of autoconfiguring scratch directories.)
09-19-2017
09:25 AM
The user that Impala is running under needs to be able to remove and recreate the scratch directory at startup (i.e. /tmp/impala-scratch). This is done to ensure that the directory is free of old files and that Impala has ownership of the directory. Based on the log message, that user doesn't have the required permissions to do that. I suspect that if you just delete that directory and let Impala create it at startup, that will solve your problem.
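As a rough way to check the permission requirement described above before restarting, a sketch like the following can be run as the user that Impala runs under. The function name `can_remove_and_recreate` is hypothetical, and the check is deliberately simplified: it looks only at the directory and its parent (including the sticky bit that /tmp normally has) and does not inspect files inside the scratch directory.

```python
import os
import stat


def can_remove_and_recreate(path: str) -> bool:
    """Best-effort check that the current user could delete and recreate `path`.

    Simplified sketch: ignores ACLs and the contents of the directory.
    """
    parent = os.path.dirname(os.path.abspath(path)) or "/"

    # Removing or creating an entry requires write + search on the parent.
    if not os.access(parent, os.W_OK | os.X_OK):
        return False

    # Nothing to remove, so creating it is all that is needed.
    if not os.path.lexists(path):
        return True

    parent_st = os.stat(parent)
    if parent_st.st_mode & stat.S_ISVTX:
        # Sticky parent (e.g. /tmp): only root, the entry's owner, or the
        # parent's owner may remove the entry.
        uid = os.getuid()
        return uid in (0, os.lstat(path).st_uid, parent_st.st_uid)
    return True


if __name__ == "__main__":
    print(can_remove_and_recreate("/tmp/impala-scratch"))
```

If this prints False for /tmp/impala-scratch, the usual cause is that the directory was created earlier by a different user, which matches the suggestion to delete it and let Impala recreate it.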
09-18-2017
06:00 PM
We've seen this before when a bug caused a zombie impalad process to get stuck listening on port 22000. It's worth checking whether one is still hanging around and, if so, running kill -9 on it.
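Before hunting for the process, it can help to confirm that something is actually still listening on the port. This is a minimal sketch, assuming the stuck process accepts connections on the loopback address; the helper name `is_port_listening` is made up for illustration.

```python
import socket


def is_port_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 22000 is the port mentioned in the post above.
    print(is_port_listening(22000))
```

This only tells you that the port is taken, not which process holds it. Finding the process ID still needs a tool such as lsof or ss, after which kill -9 can be used as suggested.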
09-18-2017
05:15 PM
The Impala daemon wasn't able to set up the scratch directories during startup. The reason will be logged in one of the impalad*.WARNING logs, probably one of the first messages in there.
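To pull out the first few messages from those logs, something like the sketch below may be enough. It rests on two assumptions that are not stated in the post: that the logs use the glog line format (a severity letter W, E or F followed by a four-digit month and day), and that you pass in the correct log directory for your installation. The function name `first_warnings` is hypothetical.

```python
import glob
import os
import re
from typing import List

# Assumed glog-style prefix, e.g. "W0918 17:15:01.123456 ...".
_SEVERE = re.compile(r"^[WEF]\d{4} ")


def first_warnings(log_dir: str, limit: int = 5) -> List[str]:
    """Return up to `limit` warning/error lines from impalad*.WARNING files."""
    hits: List[str] = []
    for path in sorted(glob.glob(os.path.join(log_dir, "impalad*.WARNING"))):
        with open(path, errors="replace") as f:
            for line in f:
                # Skip the file header; keep only lines with a severity prefix.
                if _SEVERE.match(line):
                    hits.append(f"{os.path.basename(path)}: {line.rstrip()}")
                    if len(hits) >= limit:
                        return hits
    return hits
```

Calling `first_warnings` with your log directory and printing the result should surface the scratch directory error near the top, since it is expected to be one of the first messages logged.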
09-14-2017
02:42 PM
1 Kudo
Very astute questions! The version of StringConcatUpdate() in impala-udf-samples is correct. The use of the "local" allocation in the second version of StringConcatUpdate() is incorrect. I filed a bug to correct that: https://issues.apache.org/jira/browse/IMPALA-5939. The problem is that the StringVal() constructor and StringVal::CopyFrom() use AllocateLocal() behind the scenes. Your UDA does not own the memory returned by AllocateLocal(), and it will be automatically cleaned up by Impala at some point after your Update function returns.

It's a bit unfortunate that the two sets of examples have diverged. I recommend looking at https://github.com/cloudera/impala-udf-samples/ because that's intended to be the public-facing version and I think it is more up to date. You may also be interested in this PR, https://github.com/cloudera/impala-udf-samples/pull/18, which improves the UDF examples to better handle failed memory allocations.

With regard to 2): that is a builtin aggregate function that uses some internal functionality that we added recently. Some builtin functions only require a fixed-size intermediate value, so there's a way to declare this and have it preallocated by the Impala runtime. That functionality isn't exposed to UDAs for now.
08-24-2017
08:32 AM
I'm not sure there are risks specifically. The best practice is to use Cloudera Manager to configure memory limits for different services, so this is the right way to configure things. Cloudera Manager does have support to help set up memory limits for applications: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_autoconfig.html#concept_xjy_vb3_rn. For a production system, it's important to put thought into how much memory your system needs and how it's allocated between different services. E.g. as an earlier poster saw, 256MB is not enough memory to do much of interest with Impala.