About mbigelow

JB0000000000001 · ‎01-26-2018

I have the same issue. I am following both the documentation at https://www.bmc.com/blogs/how-to-write-a-hive-user-defined-function-udf-in-java/ and the link mentioned in the previous post: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_mc_hive_udf.html . These are the steps I have taken:

1) The goal is to create a temporary user-defined function from FNV.java. I have put the following code in /src/main/java/com/company/hive/udf/FNV.java:

    package com.company.hive.udf;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.io.Text;
    import java.math.BigInteger;
    public final class FNV extends UDF{
        <...all the java code...>
    }

2) I added the 2 required JARs for the imports to the CLASSPATH, compiled, and built a jar from this: /src/main/java/com/company/hive/udf/FNV.jar. The jar is present on the host where the Hive metastore and HiveServer2 are running. I checked with jar tvf FNV.jar and see that my class src/main/java/com/company/hive/udf/FNV.class is present.

3) I put the FNV.jar file on HDFS and did a chown hive:hive and a chmod with full 777 rights.

4) I changed the configuration for 'Hive Auxiliary JARs Directory' in Hive to the path of the jar: /src/main/java/com/company/hive/udf/

5) I redeployed the client config and restarted Hive. Here I noticed that the 2nd HiveServer2 (on a different node, not where the JAR is located) has trouble restarting. The host with the Hive metastore, HiveServer2 and the jar is up and running.

6) I granted access to the HDFS location and the file on the local host to a role called 'hive_jar'. This is done by logging into beeline:

    !connect jdbc:hive2://node009.cluster.local:10000/default
    GRANT ALL ON URI 'file:///src/main/java/com/company/hive/udf/FNV.jar' TO ROLE HIVE_JAR;
    GRANT ALL ON URI 'hdfs:///user/name/FNV.jar' TO ROLE HIVE_JAR;

I do notice that SHOW CURRENT ROLES in beeline for the hive user does give the HIVE_JAR role as wanted.

7) I start hive and add the jar using the local host's path:

    add jar /src/main/java/com/company/hive/udf/FNV.jar;

I check with list jars that the jar is present.

8) In the same session I try to create the temporary function:

    create temporary function FNV as 'com.company.hive.udf.FNV';

I keep getting this error:

    FAILED: Class com.company.hive.udf.FNV not found
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask

Any clue what I am missing? Thanks for the feedback!
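One thing that may be worth checking in the steps above, offered as a guess rather than a confirmed diagnosis: step 2 reports the class inside the jar as src/main/java/com/company/hive/udf/FNV.class, while a Java class loader looks up com.company.hive.udf.FNV under the entry com/company/hive/udf/FNV.class. The sketch below uses only Python's standard zipfile module to compare the two. The helper names and the in-memory demo jar are illustrative and not part of any Hive tooling.

```python
import io
import zipfile


def expected_entry(class_name):
    """Map a fully qualified Java class name to the jar entry a class loader looks for."""
    return class_name.replace(".", "/") + ".class"


def find_class_entries(jar, class_name):
    """Return (found_at_expected_path, misplaced_entries) for class_name.

    `jar` may be a filesystem path or a file-like object.
    """
    wanted = expected_entry(class_name)
    with zipfile.ZipFile(jar) as zf:
        names = zf.namelist()
    found = wanted in names
    # Entries that end with the right path but carry an extra leading directory.
    misplaced = [n for n in names if n != wanted and n.endswith("/" + wanted)]
    return found, misplaced


def build_demo_jar(entry_name):
    """Build a tiny in-memory jar with one empty entry (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(entry_name, b"")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = build_demo_jar("src/main/java/com/company/hive/udf/FNV.class")
    print(find_class_entries(demo, "com.company.hive.udf.FNV"))
```

If the class turns out to be misplaced, building the jar from the directory that contains the top-level com/ folder would normally give the expected layout.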

maziyar · ‎01-22-2018

Unfortunately this happened again in 5.13.1 when I ran "Update Hive Metastore NameNodes": it added the port twice.

abbdelben · ‎01-22-2018

Hello everybody, I have the same problem. Have you solved yours? If so, could you please share your solution? Any suggestions are welcome. @MSharma

alpertankut · ‎01-10-2018

That means you have many services running on your server, so the server cannot provide the memory that Java requests. I don't know your cluster's setup, but it is possible that many services are running on that specific server. Try reducing the number of running services.

David M. · ‎01-09-2018

Since this is an external table (EXTERNAL_TABLE), Hive will not keep any stats on the table since it is assumed that another application is changing the underlying data at will. Why keep stats if we can't trust that the data will be the same in another 5 minutes? For a managed (non-external) table, data is manipulated through Hive SQL statements (LOAD DATA, INSERT, etc.) so the Hive system will know about any changes to the underlying data and can update the stats accordingly. Using the HDFS utilities to check the directory file sizes will give you the most accurate answer.
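As a rough sketch of the HDFS check mentioned above: `hdfs dfs -du -s <path>` prints a summary line whose first field is the size in bytes. The helper below parses that line and wraps the call. The function names and the warehouse path are hypothetical, and the wrapper assumes the hdfs client is on the PATH.

```python
import subprocess


def parse_du_summary(output):
    """Return the size in bytes from the first line of `hdfs dfs -du -s` output.

    The first whitespace-separated field is the size in bytes; newer Hadoop
    versions add a second numeric column before the path.
    """
    line = output.strip().splitlines()[0]
    return int(line.split()[0])


def table_dir_size(path):
    """Run `hdfs dfs -du -s <path>` and return the reported size in bytes."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_du_summary(result.stdout)


if __name__ == "__main__":
    # Hypothetical table location; substitute the LOCATION of your own table.
    print(table_dir_size("/user/hive/warehouse/my_external_table"))
```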

David M. · ‎01-09-2018

If performing an ADD JAR statement in the HQL file, please reconsider and install the JAR into HiveServer2 as a permanent UDF. https://www.cloudera.com/documentation/enterprise/5-12-x/topics/cm_mc_hive_udf.html https://issues.apache.org/jira/browse/HADOOP-13809 https://issues.apache.org/jira/browse/HIVE-11681

bgooley · ‎01-03-2018

@DataYogi, IPv4 is a function of your OS networking, so that is a matter for your host and network configuration. My point is that if you are unfamiliar with how database servers and other servers interact over IPv6, perhaps it would be best to only use IPv4 for now. See the following postgres information regarding addresses (including IPv6): https://www.postgresql.org/docs/9.3/static/auth-pg-hba-conf.html

It appears you were missing the /64 subnet portion of the IP, as your interface shows:

    inet6 addr: 2402:1f00:8001:281::/64 Scope:Global

I believe either of the following in the pg_hba.conf file would allow access from that one host:

    host hue hue 2402:1f00:8001:281::/64 md5

or

    host hue hue 2402:1f00:8001:281::/128 md5

Unless you need to restrict access, you can add a line like the following to allow access from any IPv6 host:

    host hue hue ::0/0 md5

NOTE: Make sure there are no servers connecting to the embedded postgres database, then restart it from the command line with "service cloudera-scm-server-db restart" after making any changes so that they take effect.
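To see which client addresses each of those CIDR ranges would cover, here is a small sketch using Python's standard ipaddress module. It assumes pg_hba.conf applies ordinary prefix matching to the address field. The client address ending in ::1 is made up for illustration.

```python
import ipaddress


def matches(client, cidr):
    """Return True if the client address falls inside the CIDR range."""
    return ipaddress.ip_address(client) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    client = "2402:1f00:8001:281::1"  # hypothetical host on that subnet
    for cidr in ("2402:1f00:8001:281::/64",
                 "2402:1f00:8001:281::/128",
                 "::0/0"):
        print(cidr, matches(client, cidr))
```

Under that assumption, a /128 entry covers exactly one address, so it only helps if the connecting host's address is the one written in the line.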

David M. · ‎12-19-2017

The way things are implemented, a MapJoin optimization will always use a local task operation. If you would like to remove all instances of local tasks, you will have to disable MapJoins. Please examine these two explain plans (first with MapJoin enabled, second with it disabled):

    STAGE PLANS:
      Stage: Stage-5
        Map Reduce Local Work
          Alias -> Map Local Tables:
            s07
              Fetch Operator
                limit: -1
          Alias -> Map Local Operator Tree:
            s07
              TableScan
                alias: s07
                filterExpr: code is not null (type: boolean)
                Statistics: Num rows: 225 Data size: 46055 Basic stats: COMPLETE Column stats: NONE
                Filter Operator
                  predicate: code is not null (type: boolean)
                  Statistics: Num rows: 113 Data size: 23129 Basic stats: COMPLETE Column stats: NONE
                  HashTable Sink Operator
                    keys:
                      0 code (type: string)
                      1 code (type: string)

    STAGE PLANS:
      Stage: Stage-1
        Map Reduce
          Map Operator Tree:
              TableScan
                alias: s07
                filterExpr: code is not null (type: boolean)
                Statistics: Num rows: 225 Data size: 46055 Basic stats: COMPLETE Column stats: NONE
                Filter Operator
                  predicate: code is not null (type: boolean)
                  Statistics: Num rows: 113 Data size: 23129 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: code (type: string)
                    sort order: +
                    Map-reduce partition columns: code (type: string)
                    Statistics: Num rows: 113 Data size: 23129 Basic stats: COMPLETE Column stats: NONE
                    value expressions: description (type: string), salary (type: int)
              TableScan
                alias: s08
                filterExpr: code is not null (type: boolean)
                Statistics: Num rows: 442 Data size: 46069 Basic stats: COMPLETE Column stats: NONE
                Filter Operator
                  predicate: code is not null (type: boolean)
                  Statistics: Num rows: 221 Data size: 23034 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: code (type: string)
                    sort order: +
                    Map-reduce partition columns: code (type: string)
                    Statistics: Num rows: 221 Data size: 23034 Basic stats: COMPLETE Column stats: NONE
                    value expressions: salary (type: int)

We can see that the first one uses "Map Reduce Local Work" and the second one does not. To disable the MapJoin optimization:

    set hive.auto.convert.join=false;

See https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

This can be important because I'm seeing a case where the Local Job Runners are leaking their log file output into the HS2's /tmp directory in the following format:

    /tmp/hive_20171219184242_3ecaf468-51c7-4ced-99b3-6bd9eaaa980a.log

Disable the MapJoin optimization and these log files are not generated.
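If you want to check many queries for local work without reading each plan by eye, a sketch along these lines might help. It scans saved EXPLAIN output for stages that contain "Map Reduce Local Work". The function name is my own, and it assumes the plan text looks like the examples above, with or without the beeline "|" borders.

```python
def local_work_stages(explain_text):
    """Return the names of stages whose plan contains 'Map Reduce Local Work'."""
    stages = []
    current = None
    for raw in explain_text.splitlines():
        # Drop beeline table borders and surrounding whitespace, if present.
        line = raw.strip().strip("|").strip()
        if line.startswith("Stage: "):
            current = line[len("Stage: "):]
        elif line == "Map Reduce Local Work" and current is not None:
            stages.append(current)
    return stages


if __name__ == "__main__":
    plan = "STAGE PLANS:\n  Stage: Stage-5\n    Map Reduce Local Work\n"
    print(local_work_stages(plan))
```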

Awisawe · ‎11-10-2017

Hello,

Basically, desktop.middleware is missing. To check, run the commands below:

1. $ hue-3.9.0/build/env/bin/python

In the Python prompt, type the following:

> import sys
> 'desktop.middleware' in sys.modules
(The output for this would be False)
> import desktop.middleware
> 'desktop.middleware' in sys.modules
(The output for this would be True if the lib exists)
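The same check can be scripted instead of typed at the prompt. This is a minimal sketch using the standard importlib.util.find_spec call, which reports whether a module can be located without importing the module itself (it does import the parent package). The function name is illustrative, and the script would need to run with Hue's own Python, as in the command above, for the result to mean anything.

```python
import importlib.util


def module_available(name):
    """Return True if the named module can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'desktop') is itself missing.
        return False


if __name__ == "__main__":
    print(module_available("desktop.middleware"))
```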

Fawze · ‎11-05-2017

@mbigelow Hi mbigelow, I tried to use LIKE in the CM API with no success. I have a request like this:

curl -u 'xxxx':'xxxx' 'http://CM_server.domain.com:7180/api/v11/clusters/cluster/services/impala/impalaQueries?from=2017-10-10T00:00:00&to2017-10-11T00:00:00&limit=1000&filter=statement RLIKE ".*fawzea.*"' >>f.json

Can you help?
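Two details in the request above may be worth ruling out before looking at the filter syntax itself: the `to` parameter has no `=` after it, and the filter value contains spaces and double quotes that are not URL-encoded. The sketch below builds the same query string with Python's standard urllib so that every parameter is encoded. The function names are my own, the host, cluster and service names are copied from the post, and whether the CM API accepts RLIKE in this filter is not something this sketch verifies.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def impala_queries_url(base, cluster, service, start, end, statement_regex, limit=1000):
    """Build an impalaQueries URL with all query parameters URL-encoded."""
    params = {
        "from": start,
        "to": end,
        "limit": limit,
        "filter": 'statement RLIKE "{}"'.format(statement_regex),
    }
    return "{}/api/v11/clusters/{}/services/{}/impalaQueries?{}".format(
        base, cluster, service, urlencode(params)
    )


def decoded_params(url):
    """Decode the query string back into a dict, to see what the server would receive."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


if __name__ == "__main__":
    url = impala_queries_url(
        "http://CM_server.domain.com:7180", "cluster", "impala",
        "2017-10-10T00:00:00", "2017-10-11T00:00:00", ".*fawzea.*",
    )
    print(url)
    print(decoded_params(url))
```

The printed URL can be passed to curl in single quotes, with the same -u credentials as before.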

Last Visited	‎03-25-2019 05:55 PM

Member Since	‎08-16-2016 08:51 PM
Posts	642
Kudos received	129

Cloudera Community

Re: Configuring the HDFS superuser in Kerberos

Re: Hive process crash

Re: Upgrade from CDH 5.11 Express to Enterprise

Re: Adding user to Cloudera Manager using REST AP...

Re: Running in non-interactive mode, and data appe...

Re: SemanticException Error retrieving udf

Re: [CDH 5.10 upgrade] Wrong FS Hive tables

Re: java.lang.IllegalArgumentException: Illegal pr...

Re: Error with memory

Re: Can we check size of Hive tables? If so - how?

Re: java.lang.IllegalStateException(zip file close...

Re: Hue: The Cloudera Manager Agent is not able to...

Re: HiveServer2 - disable local task execution

Re: Unable to access Impala app from HUE Web UI

Re: How to filter based on query/statement in CM