Member since: 10-22-2016 · Posts: 28 · Kudos Received: 5 · Solutions: 0
05-04-2018 02:32 PM
Further digging around in the Hive source code, I have found https://github.com/apache/hive/commit/8ce0118ffe517f0c622571778251cbd9f760c4f5#diff-a0e344e574e0fe542ad8a98e64c967cf and in particular https://github.com/apache/hive/blob/1eea5a80ded2df33d57b2296b3bed98cb18383fd/ql/src/test/queries/clientpositive/reloadJar.q, which leads me to believe that HDFS should be supported:
--! qt:dataset:src
dfs -mkdir ${system:test.tmp.dir}/aux;
dfs -cp ${system:hive.root}/data/files/identity_udf.jar ${system:test.tmp.dir}/aux/udfexample.jar;
SET hive.reloadable.aux.jars.path=${system:test.tmp.dir}/aux;
RELOAD;
CREATE TEMPORARY FUNCTION example_iden AS 'IdentityStringUDF';
EXPLAIN
SELECT example_iden(key)
FROM src LIMIT 1;
SELECT example_iden(key)
FROM src LIMIT 1;
DROP TEMPORARY FUNCTION example_iden;
dfs -rm -r ${system:test.tmp.dir}/aux;
EDIT: It appears that CREATE TEMPORARY FUNCTION example_iden AS 'IdentityStringUDF'; throws a warning: "WARN ... permanent functions created without USING clause will not be replicated", so I assume the USING /path/to/jar.jar clause is mandatory for permanent UDFs even when the reloadable flag is set.
05-03-2018 05:04 PM
Hi, I want to update Hive UDFs without requiring a restart of Hive. According to
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_hive_udf.html#concept_zb2_rxr_lw
setting hive.reloadable.aux.jars.path is required. I have set it to /user/hive/libs/udf (which resides on HDFS). However, their documentation shows
file:///usr/lib/hive/lib/foo.jar
which confuses me. Does this property only work for files residing on the local file system? Do I understand correctly that I should execute Beeline's RELOAD manually? Also, in case this property works for HDFS, does it automatically pick up (load) the classes in the jar, so that specifying the jar via CREATE FUNCTION foo AS ... USING JAR 'my/path/to/jar-1.jar' is no longer required?
Desired behaviour:
1. copy jar to HDFS
/user/hive/lib/udf/foo-1.jar
2. add the function to Hive:
DROP FUNCTION IF EXISTS foo;
CREATE FUNCTION foo AS 'my.class.path.in.jar.FooUDF' USING JAR '/user/hive/lib/udf/foo-1.jar';
3. add a new jar to HDFS
/user/hive/lib/udf/foo-2.jar
4. update the function in Hive:
DROP FUNCTION IF EXISTS foo;
CREATE FUNCTION foo AS 'my.class.path.in.jar.FooUDF' USING JAR '/user/hive/lib/udf/foo-2.jar';
This currently does not work and requires a restart of Hive. Without one, the result is round-robin behaviour: some queries see the updated UDF while others still see the old one. How can I get Hive to not require a restart when updating a UDF? Also, I do not want to put the UDF into a local directory; it should reside on HDFS. Best,
Georg
Labels: Apache Hive
04-16-2018 07:21 AM
Confirmed that it is set to false, and that Maximum Total Concurrent Queries > 0. The problem seems to have resolved itself after restarting a couple of times.
04-12-2018 08:18 AM
When executing the following query on LLAP:
WITH first AS (SELECT 1 AS a, 1 AS b)
SELECT * FROM first AS f JOIN first AS f2 ON f.a = f2.a
Hive fails with:
IllegalArgumentException: max parallelism must be positive, currently is 0
Labels: Apache Hive
02-19-2018 08:44 AM
I am facing problems with a Hive UDF and Java dependency hell:
2018-02-19 10:11:49,328 [ERROR] [main] |app.DAGAppMaster|: Error starting DAGAppMaster
java.lang.VerifyError: Bad return type
Exception Details:
Location:
org/apache/hadoop/hdfs/DFSClient.getQuotaUsage(Ljava/lang/String;)Lorg/apache/hadoop/fs/QuotaUsage; @94: areturn
Reason:
Type 'org/apache/hadoop/fs/ContentSummary' (current frame, stack[0]) is not assignable to 'org/apache/hadoop/fs/QuotaUsage' (from method signature)
Obviously, dependencies are somehow clashing with wrong versions.
When looking at the jar via jar tf myjar.jar | grep hdfs, no contents are returned; the same for fs. This seems strange to me, as the exception clearly states that these classes should be involved in the problem. The error occurs only on Tez, not in regular Hive.
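For what it's worth, here is a quick Python equivalent of the jar tf check above (myjar.jar is assumed to sit in the working directory), grouping the bundled classes by package prefix so that wrongly versioned Hadoop classes stand out:

# Minimal sketch: list what the jar actually bundles, grouped by package
# prefix, to spot Hadoop classes pulled in with the wrong version.
import zipfile
from collections import Counter

with zipfile.ZipFile("myjar.jar") as jar:  # path assumed, as in jar tf above
    entries = jar.namelist()

# Count .class entries per four-level package prefix, most common first.
prefixes = Counter("/".join(e.split("/")[:4])
                   for e in entries if e.endswith(".class"))
for prefix, count in prefixes.most_common(20):
    print(f"{count:6d}  {prefix}")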
Labels: Apache Hive, Apache Tez
07-05-2017 01:59 PM
I want to use Spark's JDBC connection to write a DataFrame to Oracle. My DataFrame has a string column which is very long, at least longer than the default 255 characters that Spark will allocate when creating the schema.
How can I still write to the Oracle table? Does it work if I manually create the schema first with CLOB datatypes? If yes, how can I get Spark to only `TRUNCATE` the table instead of overwriting it?
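For context, a minimal PySpark sketch of what I am attempting; the JDBC URL, table, and credentials are made up, and I am assuming the createTableColumnTypes (Spark 2.2+) and truncate options of the JDBC writer:

# Minimal sketch with a made-up Oracle URL/table. createTableColumnTypes
# should make Spark create the long column as CLOB instead of the default
# VARCHAR2(255); truncate=true makes mode("overwrite") TRUNCATE the table
# instead of dropping and recreating it (so a manually created schema survives).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-clob-write").getOrCreate()
df = spark.createDataFrame([(1, "x" * 10000)], ["id", "long_text"])

(df.write
   .format("jdbc")
   .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # assumed URL
   .option("dbtable", "MYSCHEMA.MYTABLE")                  # assumed table
   .option("user", "myuser")
   .option("password", "mypassword")
   .option("createTableColumnTypes", "long_text CLOB")     # override 255-char default
   .option("truncate", "true")                             # keep existing table/schema
   .mode("overwrite")
   .save())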
Labels: Apache Spark
04-24-2017 05:38 PM
Thanks a lot for the great tutorial. How could this be extended to not only listen to a WebSocket, but also periodically send control commands, for example `{"op":"unconfirmed_sub"}` for https://blockchain.info/api/api_websocket?
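To illustrate what I mean outside of NiFi, a minimal Python sketch using the websocket-client package against the wss://ws.blockchain.info/inv endpoint from the API page above (the message fields are my assumption from that page's examples):

# Minimal sketch: connect, send the control command, then print incoming
# unconfirmed transactions. Requires: pip install websocket-client
import json
import websocket

ws = websocket.create_connection("wss://ws.blockchain.info/inv")
ws.send(json.dumps({"op": "unconfirmed_sub"}))  # the control command in question

try:
    while True:
        message = json.loads(ws.recv())  # one JSON message per event
        print(message.get("op"), message.get("x", {}).get("hash"))
finally:
    ws.close()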
04-07-2017 08:00 PM
I see that HDP will only ship the default Spark build? I mean, it is just a compile switch for Spark, and having it enabled would be nice. Can I just replace the HDP binaries?
04-07-2017 04:52 PM
Well, at least Spark has the option to include the machine learning optimizations. I am just interested in whether HDP deploys these optimizations or whether manual work is required.
04-06-2017 05:18 PM
1 Kudo
Hi, is there any possibility in the newly released HDP 2.6 to install NiFi in cluster mode via Ambari?
Labels: Cloudera DataFlow (CDF)