Created 01-23-2017 10:03 AM
We recently installed and configured CDH 5.9.0 on 4 high-memory CentOS Linux nodes on Google's Compute Cloud. For the most part, CDH (Impala and Hive) is working as expected, with the exception of UDFs.
Following the instructions here (as much as possible):
http://www.cloudera.com/documentation/enterprise/latest/topics/impala_udf.html
Successfully created UDF (as per section: Using Hive UDFs with Impala)
>> create function udfs.myDayOfMonth(string) returns string location '/tmp/hive-udf2.jar' symbol='org.apache.hadoop.hive.ql.udf.UDFDayOfMonth';
However, when I attempt:
>> select udfs.myDayOfMonth("2015-03-05");
Built-in functions work fine:
>> select DayOfMonth("2015-03-05");
return
error:
Before this, I was attempting to create and use a custom C++ UDF that worked fine in CDH 5.4.7, created as:
>> create function if not exists default.getcleanurl (string) returns string location '/user/impala/udfs/libudfibi.so' symbol='GetCleanUrl';
>> select default.getcleanurl(' hTtp://www.investopedia.com/a-BB-c/file.asp-more/stuff') as clearurl;
however, it does the following:
takes 30 seconds to return this error code:
hadp-inv-ibi-a.c.investopedia-1062.internal is 1 of my 3 worker nodes
Any suggestions? Doesn't seem like these errors are related except that they are both UDF related.
Created 01-23-2017 03:02 PM
Hi Gord,
The problem with the Java UDF is probably the return type - it's declared as "returns string" but the Java functions all return an IntWritable.
The C++ UDF problem looks like it may be crashing Impala somehow. Are you able to share the definition of the UDF?
Thanks,
Tim
Created on 01-23-2017 06:50 PM - edited 01-23-2017 07:23 PM
Thanks Tim - I overlooked that return type when trying a different function other than the example. However, when I tried to execute the following impala query:
create function udfs.myDayOfMonth(string) returns IntWritable location '/tmp/hive-udf2.jar' symbol='org.apache.hadoop.hive.ql.udf.UDFDayOfMonth';
It would not run and returned the following error:
AnalysisException: Syntax error in line 1:
...ayOfMonth(string) returns IntWritable location '/tmp/h...
                             ^
Encountered: IDENTIFIER
Expected: ARRAY, BIGINT, BINARY, BOOLEAN, CHAR, DATE, DATETIME, DECIMAL, REAL, FLOAT, INTEGER, MAP, SMALLINT, STRING, STRUCT, TIMESTAMP, TINYINT, VARCHAR
CAUSED BY: Exception: Syntax error
The example on the page I was following makes no mention of using only IntWritable return types. But the page does say:
Here is our custom function code:
// Copyright 2012 Cloudera Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "udf-ibi.h"

#include <cctype>
#include <cmath>
#include <string>
#include <algorithm>
#include <vector>
#include <sstream>

// trim from start
static inline std::string &ltrim(std::string &s) {
  s.erase(s.begin(), std::find_if(s.begin(), s.end(), std::not1(std::ptr_fun<int, int>(std::isspace))));
  return s;
}

// trim from end
static inline std::string &rtrim(std::string &s) {
  s.erase(std::find_if(s.rbegin(), s.rend(), std::not1(std::ptr_fun<int, int>(std::isspace))).base(), s.end());
  return s;
}

// trim from both ends
static inline std::string &trim(std::string &s) {
  return ltrim(rtrim(s));
}

// replace strings
void replace_all(std::string& str, const std::string& from, const std::string& to) {
  if (from.empty()) return;
  size_t start_pos = 0;
  while ((start_pos = str.find(from, start_pos)) != std::string::npos) {
    str.replace(start_pos, from.length(), to);
    start_pos += to.length();
  }
}

// split string by character into vector
std::vector<std::string> split_string(std::string str, char delimiter) {
  std::vector<std::string> internal;
  std::stringstream ss(str);
  std::string tok;
  while (std::getline(ss, tok, delimiter)) {
    internal.push_back(tok);
  }
  return internal;
}

// clean a URL to allow consistent joining across data sources
StringVal GetCleanUrl(FunctionContext* context, const StringVal& arg1) {
  if (arg1.is_null) return StringVal::null();
  std::string raw_url((const char *)arg1.ptr, arg1.len);
  std::string clean_url("");

  // trim whitespace
  clean_url = trim(raw_url);

  // lower case
  std::transform(clean_url.begin(), clean_url.end(), clean_url.begin(), ::tolower);

  // remove the domain (only keep relative paths)
  int find_domain = clean_url.find("://");
  if (find_domain != std::string::npos) {
    int next_slash = clean_url.find("/", find_domain + 3);
    if (next_slash != std::string::npos) {
      clean_url = clean_url.substr(next_slash, clean_url.length() - next_slash);
    }
  }

  // remove text after: ?&#
  clean_url = clean_url.substr(0, clean_url.find("?", 0));
  clean_url = clean_url.substr(0, clean_url.find("&", 0));
  clean_url = clean_url.substr(0, clean_url.find("#", 0));

  // ensure starting slash
  clean_url = "/" + clean_url;

  // ensure trailing slash if folder path
  std::vector<std::string> url_parts = split_string(clean_url, '/');
  std::string last_url_part = url_parts.back();
  if (last_url_part.find(".") == std::string::npos) {
    clean_url = clean_url + "/";
  }

  // remove all duplicate slashes
  while (clean_url.find("//") != std::string::npos) {
    replace_all(clean_url, "//", "/");
  }

  // The modified string is stored in 'clean_url', which is destroyed when this function
  // ends. We need to make a string val and copy the contents.
  // NB: Only the version of the ctor that takes a context object allocates new memory.
  StringVal result(context, clean_url.size());
  memcpy(result.ptr, clean_url.c_str(), clean_url.size());
  return result;
}

// Return part of the URL (scheme, domain, path, query, fragment)
StringVal GetUrlPart(FunctionContext* context, const StringVal& arg1, const StringVal& arg2) {
  if (arg1.is_null) return StringVal::null();
  if (arg2.is_null) return StringVal::null();

  // parse input params
  std::string raw_url((const char *)arg1.ptr, arg1.len);
  std::string part_name((const char *)arg2.ptr, arg2.len);

  // declare variables for parts
  std::string url_scheme("");
  std::string url_path("");
  std::string url_domain("");
  std::string url_query("");
  std::string url_fragment("");

  // get the querystring
  int pos_querystring = raw_url.find("?");
  if (pos_querystring != std::string::npos) {
    url_query = raw_url.substr(pos_querystring + 1, raw_url.length() - pos_querystring - 1);
    raw_url = raw_url.substr(0, pos_querystring);
  }

  // get the fragment (from querystring is not empty otherwise from the url)
  if (!url_query.empty()) {
    int pos_fragment = url_query.find("#");
    if (pos_fragment != std::string::npos) {
      url_fragment = url_query.substr(pos_fragment + 1, url_query.length() - pos_fragment - 1);
      url_query = url_query.substr(0, pos_fragment);
    }
  } else {
    int pos_fragment = raw_url.find("#");
    if (pos_fragment != std::string::npos) {
      url_fragment = raw_url.substr(pos_fragment + 1, raw_url.length() - pos_fragment - 1);
      raw_url = raw_url.substr(0, pos_fragment);
    }
  }

  // get the scheme
  int pos_scheme = raw_url.find("://");
  if (pos_scheme != std::string::npos) {
    url_scheme = raw_url.substr(0, pos_scheme);
    raw_url = raw_url.substr(pos_scheme + 3, raw_url.length() - pos_scheme - 3);

    // get the domain
    int pos_slash = raw_url.find("/");
    if (pos_slash != std::string::npos) {
      url_domain = raw_url.substr(0, pos_slash);
      raw_url = raw_url.substr(pos_slash + 1, raw_url.length() - pos_scheme - 1);
    }
  }

  // get the path
  url_path = raw_url;

  // return part name
  part_name = trim(part_name);
  std::transform(part_name.begin(), part_name.end(), part_name.begin(), ::tolower);
  if (part_name.compare("scheme") == 0) {
    StringVal result(context, url_scheme.size());
    memcpy(result.ptr, url_scheme.c_str(), url_scheme.size());
    return result;
  } else if (part_name.compare("domain") == 0) {
    StringVal result(context, url_domain.size());
    memcpy(result.ptr, url_domain.c_str(), url_domain.size());
    return result;
  } else if (part_name.compare("path") == 0) {
    StringVal result(context, url_path.size());
    memcpy(result.ptr, url_path.c_str(), url_path.size());
    return result;
  } else if (part_name.compare("query") == 0) {
    StringVal result(context, url_query.size());
    memcpy(result.ptr, url_query.c_str(), url_query.size());
    return result;
  } else if (part_name.compare("fragment") == 0) {
    StringVal result(context, url_fragment.size());
    memcpy(result.ptr, url_fragment.c_str(), url_fragment.size());
    return result;
  } else
    return StringVal::null();
}
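One way to check whether the string handling itself misbehaves is to run the same steps outside Impala. The sketch below is a hypothetical standalone helper, CleanUrlSketch (an illustrative name, not part of the posted code or the UDF SDK), that follows the GetCleanUrl steps on a plain std::string with no dependency on udf.h. It keeps the results of find() in std::string::size_type rather than int. It is an approximation for local testing under those assumptions, not a drop-in replacement, and it does not by itself explain the crash.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical standalone helper (not part of the Impala UDF SDK): mirrors the
// GetCleanUrl steps on a plain std::string so they can be run outside impalad.
std::string CleanUrlSketch(std::string url) {
  typedef std::string::size_type pos_t;

  // Trim ASCII whitespace from both ends; an all-whitespace input becomes "/".
  const char* ws = " \t\r\n\f\v";
  const pos_t first = url.find_first_not_of(ws);
  if (first == std::string::npos) return "/";
  const pos_t last = url.find_last_not_of(ws);
  url = url.substr(first, last - first + 1);

  // Lower-case the whole string.
  std::transform(url.begin(), url.end(), url.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  // Drop "scheme://domain" and keep only the path that follows it.
  const pos_t scheme = url.find("://");
  if (scheme != std::string::npos) {
    const pos_t slash = url.find('/', scheme + 3);
    if (slash != std::string::npos) url = url.substr(slash);
  }

  // Cut at the first '?', '&' or '#' (substr accepts npos as "to the end").
  url = url.substr(0, url.find_first_of("?&#"));

  // Ensure a leading slash.
  url = "/" + url;

  // Add a trailing slash when the last path segment contains no '.'.
  const pos_t last_slash = url.rfind('/');
  if (url.find('.', last_slash) == std::string::npos) url += "/";

  // Collapse runs of '/' into a single '/'.
  pos_t dup;
  while ((dup = url.find("//")) != std::string::npos) url.erase(dup, 1);

  return url;
}
```

Called with the URL from the failing query, ' hTtp://www.investopedia.com/a-BB-c/file.asp-more/stuff', this should return /a-bb-c/file.asp-more/stuff/. If a small driver program produces that result without crashing, the problem is more likely in how the library is built or loaded than in the string logic.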
Created on 01-27-2017 07:51 AM - edited 01-27-2017 08:26 AM
Hi Tim. I work with Gord and have offered to take over this issue as he's busy with other things at the moment.
In summary, we are trying to get past one linker error trying to compile a C++ UDF function.
First, I just wanted to clarify a few things:
/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/ld: /usr/lib/../lib64/libImpalaUdf.a(udf.cc.o)(.text+0x3): unresolvable R_X86_64_NONE relocation against symbol `_ZNSs4_Rep20_S_empty_rep_storageE@@GLIBCXX_3.4'
/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/ld: final link failed: Nonrepresentable section on output
Here are the steps:
vagrant init kaorimatz/centos-6.8-x86_64; vagrant up --provider virtualbox
sudo yum install wget
sudo wget http://people.centos.org/tru/devtools-2/devtools-2.repo -O /etc/yum.repos.d/devtools-2.repo
sudo yum upgrade
sudo yum install devtoolset-2-gcc devtoolset-2-binutils devtoolset-2-gcc-c++
sudo yum install cmake boost-devel
sudo wget http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo -O /etc/yum.repos.d/cloudera-cdh5.repo
sudo yum install impala-udf-devel
#old: SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g")
#new: SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -ggdb -std=c++11")
scl enable devtoolset-2 bash
cmake .
make
/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/ld: /usr/lib/../lib64/libImpalaUdf.a(udf.cc.o)(.text+0x3): unresolvable R_X86_64_NONE relocation against symbol `_ZNSs4_Rep20_S_empty_rep_storageE@@GLIBCXX_3.4'
/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/ld: final link failed: Nonrepresentable section on output
Again, this is on a fresh CentOS 6.8 VM with the Impala UDF sample code, so hopefully it's easy for anyone to reproduce this issue. Thanks!
Update: I have just tried this on a CentOS v7.3.1611 VM using:
vagrant init centos/7; vagrant up --provider virtualbox
and installing the standard C++ compiler via:
sudo yum install gcc-c++ cmake boost-devel
I get the same linker error there.
Created 02-08-2017 03:06 PM
Sorry for the slow reply. It looks like we made a mistake in including the "noexcept" specifiers in the UDF SDK when we switched to building Impala with C++11 support. The UDF SDK should still be built with C++11 disabled.
If you use an older udf.h or manually remove the noexcept specifiers from udf.h I think that may solve your problem.
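For anyone editing the header by hand, the sketch below shows one common way to make noexcept specifiers disappear when a file is compiled without C++11, instead of deleting each one. SKETCH_NOEXCEPT and ExampleVal are hypothetical names used only for illustration; this is not how the shipped udf.h is written, and it only addresses compiling the header, not the linker error itself.

```cpp
// Hypothetical macro (not from the Impala UDF SDK): expands to `noexcept` only
// when the translation unit is compiled as C++11 or newer, and to nothing
// otherwise, so the same header builds in both modes.
#if __cplusplus >= 201103L
#define SKETCH_NOEXCEPT noexcept
#else
#define SKETCH_NOEXCEPT
#endif

// Hypothetical value type standing in for an SDK struct such as StringVal.
struct ExampleVal {
  bool is_null;
  int len;

  ExampleVal() SKETCH_NOEXCEPT : is_null(true), len(0) {}
  explicit ExampleVal(int n) SKETCH_NOEXCEPT : is_null(false), len(n) {}

  // Factory for the null value, analogous in spirit to StringVal::null().
  static ExampleVal null() SKETCH_NOEXCEPT { return ExampleVal(); }
};
```

With this pattern, replacing each noexcept in a local copy of the header with the macro keeps a single file usable with and without -std=c++11.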
Created on 02-15-2017 05:06 PM - edited 02-15-2017 06:45 PM
Hi Tim. I tried going back to g++ v4.4.7 and commenting out all the 'noexcept' keywords in /usr/include/impala_udf/udf.h.
When running make (against the UDF sample code), I get a bunch more errors.
It's a bit large so I put them in a pastebin: http://pastebin.com/qv9EbS5h
Any other ideas?
Created 02-17-2017 01:22 AM
We are still investigating the linking error. In the meantime, would you mind giving an older version of the UDF SDK a try? It should be mostly compatible with 5.9.0.
Sorry for the trouble.