Member since
04-06-2016
47
Posts
7
Kudos Received
4
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2458 | 12-02-2016 10:20 PM |
| | 1324 | 11-23-2016 08:59 PM |
| | 258 | 07-26-2016 03:11 AM |
10-04-2019
04:19 PM
In case you get the error below, make sure you use the NiFi host FQDN in the API call and NOT the IP address. Also, make sure DNS is configured correctly. <body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /nifi-api/access/kerberos. Reason:
<pre> Unauthorized</pre>
02-17-2017
12:11 PM
Try increasing the heap size for the Metastore. Also make sure the DB connection is working fine.
02-16-2017
05:59 AM
@rahul gulati Which version of Ambari? Is your cluster Kerberized? Is Ambari SSL enabled? Is this on a local Ambari cluster? Can you share all the settings from your File View?
01-26-2017
04:05 PM
I guess the easiest option would be to have a lookup table with all processor name records; then you can use something like the below.

SELECT b.processor_name, count(*) cnt
FROM process_types_with_count_table a, processor_lookup_table b
WHERE instr(b.processor_name, a.processor_name) > 0
GROUP BY b.processor_name;

This is not very efficient since you are joining another table here, but it will do the job. The other option, if your processor names follow a pattern, is to use a regex in the Hive query to get only the processor_name you want.
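As a sketch of the regex option (the table and column names here are the hypothetical ones from the query above), regexp_extract can pull out just the leading alphabetic token as the processor name:

```sql
-- Sketch: keep only the leading alphabetic token of processor_name
SELECT regexp_extract(processor_name, '^([A-Za-z]+)', 1) AS processor_name,
       count(*) AS cnt
FROM process_types_with_count_table
GROUP BY regexp_extract(processor_name, '^([A-Za-z]+)', 1);
```

Adjust the pattern to whatever naming convention your processor names actually follow.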
01-26-2017
02:18 PM
Is the processor name always a 4-letter word? If yes, you can use substr to get only the first letters from the processor name and group by that. If the processor name length varies, then look for the set of characters that comes after the processor name and eliminate anything after those characters. Here you can use a mix of the instr and substr Hive functions.
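A minimal sketch of the instr/substr combination, assuming a hypothetical table and assuming an underscore marks the end of each processor name:

```sql
-- Hypothetical: names like 'ABCD_123'; trim everything from the '_' onward
SELECT substr(processor_name, 1, instr(processor_name, '_') - 1) AS name_only,
       count(*) AS cnt
FROM process_types_with_count_table
GROUP BY substr(processor_name, 1, instr(processor_name, '_') - 1);
```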
12-02-2016
10:20 PM
1 Kudo
@Manish Gupta try adding hive-metastore.jar as well to the SQuirreL jar list.
11-30-2016
02:18 AM
You can try a Hive windowing function. Something like the below.

select tdate, var,
       max(var) over (order by tdate rows between current row and 30 following) maxvar
from testwindow;

You can also include a "PARTITION BY" clause if you need to group it by some other column(s). HTH
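For the PARTITION BY variant, here is a sketch with a hypothetical grouping column added:

```sql
-- 30-row forward-looking max, computed separately per category (hypothetical column)
SELECT tdate, var,
       max(var) OVER (PARTITION BY category
                      ORDER BY tdate
                      ROWS BETWEEN CURRENT ROW AND 30 FOLLOWING) AS maxvar
FROM testwindow;
```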
11-23-2016
08:59 PM
It depends on what arguments you are providing to the hash function. If your argument values are unique, you will most likely get a unique value from hash. Keep in mind the Hive hash function returns an int (which is 32-bit), so you may see negative numbers as well. You can use something like reflect('java.util.UUID','randomUUID') to generate a unique ID, or come up with some unique code of your own. I would not suggest using the hash function if you want to generate unique IDs.
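A minimal sketch of the reflect approach (the table name is hypothetical):

```sql
-- reflect invokes the static Java method and returns its result as a string,
-- so each row gets a random UUID
SELECT reflect('java.util.UUID', 'randomUUID') AS row_uuid, t.*
FROM some_table t;
```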
11-23-2016
06:06 PM
I assume you already have the jar file that you want to use. You can use "add jar <jarfilename>" in a Hive session, or you can:
1. Create /usr/hdp/<hdp-version>/hive/auxlib/ on all HiveServer2 nodes.
2. Copy your .jar files to the new folder on all HiveServer2 nodes.
3. Restart HiveServer2.
09-19-2016
11:20 PM
Thanks @Constantin Stanca. You won't be able to RENAME a Hive table if there are queries using that table, as it would be locked. So this won't work. I am looking at some other approaches as well.
09-15-2016
04:03 PM
2 Kudos
I am working on a very common problem: refreshing reference tables in Hive using a wipe-and-load methodology. I want to know if there is a way to refresh the data without killing all read locks (queries reading data) or waiting for all reads to finish in order to acquire an exclusive lock. The use case is that some queries may run for a few hours and the refresh job needs to run in between. Is there a way that all the running queries keep using the current snapshot of the data while the refresh job completes without waiting? New queries should get the new data set. These tables are not partitioned. I was thinking about using ACID, but it doesn't support MERGE yet, or values from another table in an UPDATE statement.
08-04-2016
03:29 AM
Good point. So we can ask every user to submit jobs to a specific queue. There is no other way, right?
08-04-2016
03:16 AM
Assuming most of the users will be running Hive queries to access the data, is it possible to control resource utilization by different queues in a multi-tenant environment when Hive doAs is set to false? Since all the Hive queries will be running as the hive user and in one queue, does setting up multiple queues to control resource utilization make any sense? I read http://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/, but this makes me believe we should set doAs=true and handle authorization through both Ranger Hive + HDFS policies, NOT only Hive policies. And we can't even use "SQL Standard Authorization" if we need to set up multiple queues to control resource utilization. Am I missing anything here?
08-03-2016
03:49 AM
Try running the set; command. It should display the values of all the variables in the current session.
07-29-2016
04:27 PM
Hi Upendra, the recommendation is to use VARCHAR and integer types (TINYINT, SMALLINT, INT, BIGINT) wherever possible instead of STRING. In Hive, STRING is treated as VARCHAR(32762). So if your data is never more than, say, 50 characters long, using STRING carries some overhead. The same reasoning applies to the integer types. Hope this helps.
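A minimal DDL sketch following this recommendation (the table and columns are hypothetical):

```sql
-- Bounded and integer types instead of STRING where the domain is known
CREATE TABLE customers (
  customer_id BIGINT,
  status_code TINYINT,
  name        VARCHAR(50)
);
```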
07-29-2016
04:11 PM
You can go to the users list and filter on the group; it will show only members of that group.
07-28-2016
07:00 PM
The error says "XAAUDIT.DB.USER_NAME" is not defined in the file /usr/hdp/2.2.4.10-3/ranger-hdfs-plugin/install.properties. Did you check whether you have defined this property for the Ranger audit DB?
07-28-2016
04:47 PM
I just installed everything using the HDP 2.3.2 repo. Nothing custom, so I would hope everything has the correct version. Thoughts?
07-28-2016
04:14 PM
I am getting the error below when trying to run Pig using Tez. Any pointers would be appreciated. The same script runs successfully if I use MR as the execution engine. I checked tez.lib.uris and it is correctly set to the HDFS tez.tar.gz file location. Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.VerifyError: class org.apache.tez.dag.api.records.DAGProtos$DAGPlan overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
07-28-2016
01:45 AM
I am just starting to understand Spark memory management on YARN and have a few questions that I thought would be better to ask the experts here.
1. Is there a way to restrict the max size that users can use for the Spark executor and driver when submitting jobs on a YARN cluster?
2. What is the best practice for determining the number of executors required for a job? Is there a max limit that users can be restricted to?
3. How does the RM handle resource allocation if most of the resources in a queue are consumed by Spark jobs? How is preemption handled?
07-26-2016
03:11 AM
2 Kudos
You can deploy a master/slave KDC; that will provide HA. I have done this before. You can set up replication between the master and the slave: http://www.tldp.org/HOWTO/Kerberos-Infrastructure-HOWTO/server-replication.html HTH
07-26-2016
02:57 AM
This looks like an Isilon HDFS log error. Are you getting any error when you try to start HS2 through Ambari? Did you check the HS2 logs on the node where HS2 is installed?
07-15-2016
03:29 AM
The table is using TextInputFormat.
07-14-2016
05:00 PM
The group by is used to dedup and count distinct serial_number values.
07-14-2016
04:47 PM
1 Kudo
I am trying to run the query below in Hive using Tez, and it is failing with a NullPointerException, whereas the same query runs fine using the MR execution engine. We are using HDP 2.3.2.

select count(*) from (select serial_number from hive_demo.gdwi_test group by serial_number) q;
select serial_number from hive_demo.gdwi_test group by serial_number;

Both of these queries throw a NullPointerException. Vertex failed, vertexName=Map 1, vertexId=vertex_1468116141308_1525_2_00, diagnostics=[Vertex vertex_1468116141308_1525_2_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gdwi_test initializer failed, vertex=vertex_1468116141308_1525_2_00 [Map 1], java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:450)
at org.apache.hadoop.io.Text.encode(Text.java:431)
at org.apache.hadoop.io.Text.writeString(Text.java:480)
at org.apache.hadoop.mapred.split.TezGroupedSplit.write(TezGroupedSplit.java:101)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createSplitProto(MRInputHelpers.java:249)
at org.apache.tez.mapreduce.hadoop.InputSplitInfoMem.createSplitsProto(InputSplitInfoMem.java:168)
at org.apache.tez.mapreduce.hadoop.InputSplitInfoMem.getSplitsProto(InputSplitInfoMem.java:117)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.createEventList(HiveSplitGenerator.java:200)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:180)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
07-14-2016
02:40 PM
If this fixed your issue, could you accept this as the answer? It would help others in the community.
07-13-2016
04:45 PM
Here is the link that has the syntax to compute statistics on a Hive partitioned table: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_dataintegration/content/cost-based-opt.html ANALYZE TABLE employees PARTITION (dt) COMPUTE STATISTICS
07-13-2016
04:28 PM
Run analyze table <table name> compute statistics;
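A sketch that also gathers column-level statistics, which the cost-based optimizer uses (the table name is a placeholder):

```sql
-- Table-level statistics (row count, size, etc.)
ANALYZE TABLE my_table COMPUTE STATISTICS;
-- Column-level statistics for the cost-based optimizer
ANALYZE TABLE my_table COMPUTE STATISTICS FOR COLUMNS;
```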
07-11-2016
03:34 PM
Are you able to ping each host from the Ambari server node and vice versa? The error says "org.apache.ambari.server.HostNotFoundException: Host not found, hostname=ctsc00675971901.cts.com"