Member since 07-31-2017
Posts: 7 · Kudos Received: 0 · Solutions: 0
09-14-2017
09:35 PM
Thanks, Tim, for the detailed and concise reply! Special thanks for pointing out the pull request. It's really nice to see one of the Impala committers answer my query 🙂

The source of all the confusion was this comment in the impala-udf-samples header file, https://github.com/cloudera/impala-udf-samples/blob/master/uda-sample.h:

// Note: As of Impala 1.2, UDAs must have the same intermediate and result types (see the
// udf.h header for the full Impala UDA specification, which can be found at
// https://github.com/cloudera/impala/blob/master/be/src/udf/udf.h).
...

This made me believe that the repo is outdated. Also, the .h and .cc files in impala-udf-samples still don't use the intermediate types that the samples in https://github.com/cloudera/Impala/blob/cdh5-trunk/be/src/udf_samples/uda-sample.cc use. It would be nice to have impala-udf-samples updated with intermediate types.
09-13-2017
11:06 AM
I am writing a UDA in Impala and have gone through the documentation, but I cannot clearly understand how memory is supposed to be allocated in Impala UDAs. udf.h (links to source) says:

For allocations that are not returned to Impala, the UDA should use
the FunctionContext::Allocate()/Free() methods.
For StringVal allocations returned to Impala (e.g. returned by UdaSerialize()), the UDA should allocate the result via StringVal(FunctionContext*, int) ctor or the function StringVal::CopyFrom(FunctionContext*, const uint8_t*, size_t)

I'm looking at the StringConcat example in the samples and am confused by its two variants. In one place (links to source), Allocate and Free are used:

void StringConcatUpdate(FunctionContext* context, const StringVal& str,
const StringVal& separator, StringVal* result) {
...
uint8_t* copy = context->Allocate(str.len);
...
}

whereas in another place (links to source), StringVal::CopyFrom and the StringVal constructor are used:

void StringConcatUpdate(FunctionContext* context, const StringVal& arg1,
const StringVal& arg2, StringVal* val) {
...
*val = StringVal::CopyFrom(context, arg1.ptr, arg1.len);
...
StringVal new_val(context, new_len);
...
}

Both of these are exactly the same example! According to the comment in udf.h, Allocate/Free is to be used when allocations are not returned to Impala. The Update function doesn't return an allocation to Impala, yet the second example uses the StringVal methods while the first one uses Allocate/Free.

Question 1) Can Allocate/Free and the StringVal constructor be used interchangeably? Which of these two examples is correct from a memory allocation perspective?

Another function, "Avg", is also implemented differently with respect to memory allocation. In the first example, Allocate is used:

void AvgInit(FunctionContext* context, StringVal* val) {
val->is_null = false;
val->len = sizeof(AvgStruct);
val->ptr = context->Allocate(val->len);
memset(val->ptr, 0, val->len);
}

whereas in the second example nothing is used to allocate memory (not even the constructor):

void AvgInit(FunctionContext* context, BufferVal* val) {
static_assert(sizeof(AvgStruct) == 16, "AvgStruct is an unexpected size");
memset(*val, 0, sizeof(AvgStruct));
}

Question 2) How does this one work without any memory allocation?
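
To make the contrast between the two AvgInit variants concrete, here is a minimal sketch. It rests on an assumption that is not confirmed anywhere in this post: that with a fixed-size intermediate type, the caller hands Init a buffer that already exists, so Init only has to zero it. MockContext, MockStringVal, AvgInitAllocating, AvgInitFixed and RunFixedInit are hypothetical stand-ins written for this illustration; they are not the real udf.h types, and the layout of AvgStruct is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for the udf.h types; NOT the real Impala headers.
typedef uint8_t* BufferVal;

struct MockStringVal {
  bool is_null = true;
  int len = 0;
  uint8_t* ptr = nullptr;
};

// Hypothetical context that only counts outstanding allocations.
struct MockContext {
  int outstanding = 0;
  uint8_t* Allocate(size_t n) {
    ++outstanding;
    return static_cast<uint8_t*>(std::malloc(n));
  }
  void Free(uint8_t* p) {
    --outstanding;
    std::free(p);
  }
};

// Assumed layout: two 8-byte fields, 16 bytes in total.
struct AvgStruct {
  double sum;
  int64_t count;
};

// Variant 1: the UDA allocates its own intermediate state and is
// therefore responsible for freeing it later.
void AvgInitAllocating(MockContext* context, MockStringVal* val) {
  val->is_null = false;
  val->len = sizeof(AvgStruct);
  val->ptr = context->Allocate(val->len);
  std::memset(val->ptr, 0, val->len);
}

// Variant 2: *val already points at sizeof(AvgStruct) bytes owned by the
// caller (the assumption stated above), so Init only zeroes them.
void AvgInitFixed(MockContext* context, BufferVal* val) {
  (void)context;  // unused in this variant
  static_assert(sizeof(AvgStruct) == 16, "AvgStruct is an unexpected size");
  std::memset(*val, 0, sizeof(AvgStruct));
}

// Hypothetical caller playing the engine's role for variant 2: it owns
// the storage, passes a pointer to it, and reads the state back.
int64_t RunFixedInit(MockContext* context) {
  uint8_t storage[sizeof(AvgStruct)];
  std::memset(storage, 0xFF, sizeof(storage));  // garbage before Init
  BufferVal buffer = storage;
  AvgInitFixed(context, &buffer);
  AvgStruct state;
  std::memcpy(&state, buffer, sizeof(state));
  return state.count;
}
```

In this sketch, variant 1 leaves one outstanding allocation that the UDA must release with Free, while variant 2 never touches the allocator because the storage belongs to the caller. Whether Impala really behaves like RunFixedInit for BufferVal intermediates is exactly what Question 2 asks.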
Labels: Apache Impala
09-11-2017
08:17 AM
It has been a long time since this question was answered. Has Cloudera made any further recommendation on this? Does Cloudera support using the STRING data type to store BINARY data, or is it something that works today and may stop working tomorrow?