Member since: 10-22-2016
Posts: 28
Kudos Received: 5
Solutions: 0
05-04-2018
02:32 PM
Further digging around in the Hive source code turned up https://github.com/apache/hive/commit/8ce0118ffe517f0c622571778251cbd9f760c4f5#diff-a0e344e574e0fe542ad8a98e64c967cf, in particular https://github.com/apache/hive/blob/1eea5a80ded2df33d57b2296b3bed98cb18383fd/ql/src/test/queries/clientpositive/reloadJar.q, which leads me to believe that HDFS should be supported:
--! qt:dataset:src
dfs -mkdir ${system:test.tmp.dir}/aux;
dfs -cp ${system:hive.root}/data/files/identity_udf.jar ${system:test.tmp.dir}/aux/udfexample.jar;
SET hive.reloadable.aux.jars.path=${system:test.tmp.dir}/aux;
RELOAD;
CREATE TEMPORARY FUNCTION example_iden AS 'IdentityStringUDF';
EXPLAIN
SELECT example_iden(key)
FROM src LIMIT 1;
SELECT example_iden(key)
FROM src LIMIT 1;
DROP TEMPORARY FUNCTION example_iden;
dfs -rm -r ${system:test.tmp.dir}/aux;
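For context, the kind of UDF class such a test jar would provide looks roughly like the following. This is only a sketch using the classic Hive UDF API; I have not checked the actual contents of identity_udf.jar, so the class body is my assumption.
```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical reconstruction of the IdentityStringUDF referenced above:
// a UDF that simply returns its string argument unchanged.
public class IdentityStringUDF extends UDF {
  public Text evaluate(Text input) {
    // Return the input as-is (null stays null).
    return input;
  }
}
```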
EDIT: It appears that CREATE TEMPORARY FUNCTION example_iden AS 'IdentityStringUDF'; throws a warning: WARN ... permanent functions created without USING clause will not be replicated. So I assume the USING '/path/to/jar.jar' clause is mandatory for permanent UDFs even when the reloadable flag is set.
05-03-2018
05:04 PM
Hi, I want to update Hive UDFs without requiring a restart of Hive. According to:
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_hive_udf.html#concept_zb2_rxr_lw
setting hive.reloadable.aux.jars.path
is required. I have set it to
/user/hive/libs/udf
(which resides on HDFS). However, following their documentation I see:
file:///usr/lib/hive/lib/foo.jar
which is confusing me. Does this property only work for files residing on the local file system? Do I understand correctly that I should execute Beeline's RELOAD manually? Also, in case this property works for HDFS, does it automatically pick up (load) the classes in the jar, so that specifying the jar in CREATE FUNCTION foo AS ... USING JAR 'my/path/to/jar-1.jar' is no longer required?
Desired behaviour:
1. copy jar to HDFS
/user/hive/lib/udf/foo-1.jar
2. add the function to Hive:
DROP FUNCTION IF EXISTS foo;
CREATE FUNCTION foo AS 'my.class.path.in.jar.FooUDF' USING JAR '/user/hive/lib/udf/foo-1.jar';
3. add a new jar to HDFS
/user/hive/lib/udf/foo-2.jar
4. update the function in Hive:
DROP FUNCTION IF EXISTS foo;
CREATE FUNCTION foo AS 'my.class.path.in.jar.FooUDF' USING JAR '/user/hive/lib/udf/foo-2.jar';
This currently does not work and requires a restart of Hive. Without the restart, queries see either the updated UDF or still the old one, seemingly in round-robin fashion. How can I get Hive to not require a restart when updating a UDF? Also, I do not want to put the UDF into a local directory; it should reside on HDFS. Best,
Georg
Labels:
- Apache Hive
04-16-2018
07:21 AM
Confirmed that it is set to false, and Maximum Total Concurrent Queries is > 0. The problem seems to have resolved itself after restarting a couple of times.
04-12-2018
08:18 AM
When executing the following query on LLAP:
WITH first as (SELECT 1 as a, 1 as b) SELECT * FROM first as f JOIN first as f2 ON f.a=f2.a
Hive fails with:
IllegalArgumentException: max parallelism must be positive, currently is 0
Labels:
- Apache Hive
02-19-2018
08:44 AM
I face problems with a Hive UDF and Java dependency hell:
2018-02-19 10:11:49,328 [ERROR] [main] |app.DAGAppMaster|: Error starting DAGAppMaster
java.lang.VerifyError: Bad return type
Exception Details:
Location:
org/apache/hadoop/hdfs/DFSClient.getQuotaUsage(Ljava/lang/String;)Lorg/apache/hadoop/fs/QuotaUsage; @94: areturn
Reason:
Type 'org/apache/hadoop/fs/ContentSummary' (current frame, stack[0]) is not assignable to 'org/apache/hadoop/fs/QuotaUsage' (from method signature)
Obviously, dependencies with wrong versions are somehow clashing.
When looking at the jar via jar tf myjar.jar | grep hdfs, no contents are returned; the same for fd. This seems strange to me, as the exception clearly states that these classes should be involved in the problem.
The error occurs only on Tez, not on regular Hive.
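For what it's worth, one way to check at runtime which jar actually supplies the classes named in the VerifyError is plain reflection. This is only a diagnostic sketch and assumes the Hadoop client classes are on the classpath of the JVM you run it in:
```java
// Diagnostic sketch: print the jar that provides each class involved in the
// VerifyError. Differing locations usually point at the version clash.
public class WhichJar {
  public static void main(String[] args) throws Exception {
    String[] names = {
        "org.apache.hadoop.hdfs.DFSClient",
        "org.apache.hadoop.fs.QuotaUsage",
        "org.apache.hadoop.fs.ContentSummary"
    };
    for (String name : names) {
      Class<?> c = Class.forName(name);
      System.out.println(name + " -> "
          + c.getProtectionDomain().getCodeSource().getLocation());
    }
  }
}
```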
Labels:
- Apache Hive
- Apache Tez
07-05-2017
01:59 PM
I want to use Spark's JDBC connection to write a data frame to Oracle. My data frame has a string column which is very long, at least longer than the default 255 characters that Spark will allocate when creating the schema.
How can I still write to the Oracle table? Does it work when I manually create the schema first with CLOB datatypes? If yes, how can I get Spark to only `TRUNCATE` instead of overwriting the table?
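What I have in mind is roughly the following (a sketch only, not verified against Oracle; table, column, and connection details are placeholders). It assumes the table is created manually up front with a CLOB column, so Spark only has to truncate and refill it via the `truncate` writer option (available since Spark 2.1):
```java
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class OracleWriteSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("oracle-write-sketch").getOrCreate();

    // Placeholder source; in reality this is the data frame with the long string column.
    Dataset<Row> df = spark.read().option("header", "true").csv("/tmp/input.csv");

    Properties props = new Properties();
    props.setProperty("user", "scott");                     // placeholder credentials
    props.setProperty("password", "tiger");
    props.setProperty("driver", "oracle.jdbc.OracleDriver");

    df.write()
      .mode(SaveMode.Overwrite)
      // With a manually pre-created table, 'truncate' makes Overwrite issue a
      // TRUNCATE TABLE instead of DROP/CREATE, so the CLOB column definition survives.
      .option("truncate", "true")
      .jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "MY_SCHEMA.MY_TABLE", props);
  }
}
```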
Labels:
- Apache Spark
04-24-2017
05:38 PM
Thanks a lot for the great tutorial. How could this be extended to not only listen to a web socket, but also periodically send control commands, like those documented at https://blockchain.info/api/api_websocket, for example `{"op":"unconfirmed_sub"}`?
04-07-2017
08:00 PM
I see, so HDP will only ship the default Spark build? It is just a compile switch for Spark, so having that enabled would be nice. Can I just replace the HDP binaries?
04-07-2017
04:52 PM
Well, at least Spark has the option to include these machine-learning optimizations. I am just interested in whether HDP deploys these optimizations or whether manual work is required.
04-06-2017
05:18 PM
1 Kudo
Hi, is there any possibility in the newly released HDP 2.6 to install NiFi in cluster mode via Ambari?
Labels:
- Cloudera DataFlow (CDF)
04-04-2017
02:33 PM
There is a similar question, https://community.hortonworks.com/questions/52359/netlib-java-and-anaconda-for-spark-ml.html, but the netlib-lgpl part was not answered there.
04-04-2017
02:32 PM
Hi, I wonder if the Spark version provided by HDP actually ships with the optimizations mentioned in http://spark.apache.org/docs/latest/ml-guide.html compiled in: "MLlib uses the linear algebra package Breeze, which depends on netlib-java for optimised numerical processing. If native libraries are not available at runtime, you will see a warning message and a pure JVM implementation will be used instead."
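A quick way to check which implementation actually gets picked up at runtime is to ask netlib-java directly. This is only a sketch; it assumes the netlib-java classes that Breeze pulls in are on the classpath. If the output shows F2jBLAS, the pure JVM fallback is being used rather than a native library:
```java
import com.github.fommil.netlib.BLAS;
import com.github.fommil.netlib.LAPACK;

public class BlasCheck {
  public static void main(String[] args) {
    // Prints e.g. com.github.fommil.netlib.NativeSystemBLAS (native) or
    // com.github.fommil.netlib.F2jBLAS (pure JVM fallback).
    System.out.println("BLAS:   " + BLAS.getInstance().getClass().getName());
    System.out.println("LAPACK: " + LAPACK.getInstance().getClass().getName());
  }
}
```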
Labels:
- Apache Spark
03-27-2017
10:39 AM
How can I batch-read data from a source (let's say CSV or JDBC) into NiFi to query a web service? I want to read only every k-th row (let's say every 10th row) and read 50,000 records into a single batch. However, I must keep state, i.e. know which records have already been processed by NiFi.
Labels:
- Apache NiFi
02-20-2017
09:32 AM
Where is the environment variable $ACCUMULO_HOME defined for HDP 2.5.3? In the config I can only find references to this variable for further directories, but not the actual definition. In /usr/hdp/current/ there is no longer a single accumulo folder; rather I see several folders (accumulo-master, accumulo-gc, accumulo-tracer, ...).
02-16-2017
08:04 AM
Great. Do you know when 2.5.4 should be released?
02-15-2017
06:24 AM
Ambari supports the installation of Spark 2.0.0 as a technical preview. This version of Spark contains a lot of bugs which are fixed in 2.0.2 or 2.1.0. How can I install either of these versions via Ambari?
Labels:
- Apache Ambari
- Apache Spark
02-14-2017
01:23 PM
Is it possible to export a blueprint of the current cluster configuration?
02-14-2017
01:03 PM
2 Kudos
Initially, I deployed a blueprint to Ambari. Having used the nice Ambari UI to make some configuration changes, I would like to know how to export the current cluster configuration as a blueprint. If this is not possible, how can I access the "default" configuration to know the config values which need to be passed in a blueprint?
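If I remember correctly, Ambari's REST API can render the running cluster as a blueprint via a GET with format=blueprint. A minimal sketch of what I would try (host, cluster name, and credentials are placeholders, and I have not verified this against every Ambari version):
```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class ExportBlueprint {
  public static void main(String[] args) throws Exception {
    // Hypothetical host, cluster name, and credentials; adjust to your setup.
    URL url = new URL("http://ambari-host:8080/api/v1/clusters/mycluster?format=blueprint");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    String auth = Base64.getEncoder().encodeToString("admin:admin".getBytes("UTF-8"));
    conn.setRequestProperty("Authorization", "Basic " + auth);
    conn.setRequestProperty("X-Requested-By", "ambari");

    try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // blueprint JSON of the current cluster
      }
    }
  }
}
```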
Labels:
- Apache Ambari
02-13-2017
07:06 PM
I used
```
"policymgr_external_url" : "https://{% if isSingleNode %} {{ groups[cluster_name+'_mn01'][0] }} {% else %} {{ groups[cluster_name+'_mn03'][0] }} {% endif %}:6182",
{% else %}
"policymgr_external_url" : "http://{% if isSingleNode %} {{ groups[cluster_name+'_mn01'][0] }} {% else %} {{ groups[cluster_name+'_mn03'][0] }}{% endif %}:6080",
{% endif %}
```
to create the JSON. It is adapted from https://github.com/bushnoh/ansible-hadoop-asap/blob/master/blueprints/bare_cluster.bp.j2. After removing the spaces in all the places needed, Ambari finally installs Ranger successfully.
02-13-2017
05:27 PM
When installing Ranger via a blueprint I get the following exception:
2017-02-13 18:22:10,754 [E] create_dbversion_catalog.sql file import failed!
2017-02-13 18:22:40,763 [JISQL] /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64/bin/java -cp /usr/hdp/current/ranger-admin/ews/lib/mysql-connector-java.jar:/usr/hdp/current/ranger-admin/jisql/lib/* org.apache.util.sql.Jisql -driver mysqlconj -cstring jdbc:mysql://mn01.vagrant :3306/ranger -u 'ranger' -p '********' -noheader -trim -c \; -query "show tables like 'x_db_version_h';"
SQLException : SQL state: 3D000 java.sql.SQLException: No database selected ErrorCode: 1046
SQLException : SQL state: 3D000 java.sql.SQLException: No database selected ErrorCode: 1046
2017-02-13 18:22:41,149 [I] Table x_db_version_h does not exist in database ranger
2017-02-13 18:22:41,149 [JISQL] /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64/bin/java -cp /usr/hdp/current/ranger-admin/ews/lib/mysql-connector-java.jar:/usr/hdp/current/ranger-admin/jisql/lib/* org.apache.util.sql.Jisql -driver mysqlconj -cstring jdbc:mysql://mn01.vagrant :3306/ranger -u 'ranger' -p '********' -noheader -trim -c \; -input /usr/hdp/current/ranger-admin/db/mysql/create_dbversion_catalog.sql
Error executing: create table if not exists x_db_version_h ( id bigint not null auto_increment primary key, version varchar(64) not null, inst_at timestamp not null default current_timestamp, inst_by varchar(256) not null, updated_at timestamp null default null, updated_by varchar(256) not null, active ENUM('Y', 'N') default 'Y' ) ;
java.sql.SQLException: No database selected
SQLException : SQL state: 3D000 java.sql.SQLException: No database selected ErrorCode: 1046
2017-02-13 18:22:41,556 [E] create_dbversion_catalog.sql file import failed!
The blueprint's JSON can be found here: https://gist.github.com/geoHeil/bbe4eb9cef4f4e6c2feca743f2b19bc8, the complete input JSON here: https://gist.github.com/geoHeil/b54181d35c0d4549c0da25465cc93e29, and the full output txt here: https://gist.github.com/geoHeil/6b3d08d748e03703b35ddce424b108c1.
Labels:
- Apache Ambari
- Apache Ranger
02-08-2017
11:47 AM
@Ashnee Sharma is there any documentation on how to perform this task?
02-08-2017
07:28 AM
Can I use Ambari to install the very latest, still unsupported-by-HDP versions of e.g. Spark or Accumulo? I don't mean the technical previews here, but rather the current versions of these services.
Labels:
- Apache Ambari
- Apache Spark
02-02-2017
01:37 PM
1 Kudo
I recently read this news announcement, http://hortonworks.com/partner/syncsort/, and am wondering why Syncsort is proposed instead of NiFi.
Labels:
- Apache NiFi