Member since: 05-22-2017
Posts: 14
Kudos Received: 0
Solutions: 0
08-22-2019
01:08 AM
Hi, I am actually using Amazon EMR, which does not yet support Hive 3.x in any EMR cluster version, so this might make me drop this idea :(. Is there any workaround? Perhaps something like a set of libs we could import into Hive 2.x so it supports HiveKafkaStorageHandler? Thanks again!
08-14-2019
04:35 PM
Hi @Manohar Vanam, Thanks a lot for your quick answer. I tried to test the KafkaStorageHandler, but I have not been able to get it to work yet. I am getting the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/StorageHandlerInfo
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.hive.ql.parse.ParseUtils.ensureClassExists(ParseUtils.java:261)
at org.apache.hadoop.hive.ql.parse.StorageFormat.fillStorageFormat(StorageFormat.java:64)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:11907)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:11040)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11153)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.StorageHandlerInfo
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
I think this might be due to the version of Hive I am using. My setup uses Hive 2.3.x, but apparently the KafkaStorageHandler lib targets 3.1.x. Is this the issue? Have you had the same problem? Thanks!
08-14-2019
01:15 AM
Hi all, I've been looking for solutions to expose Kafka data in Hive. I found a few options; however, HiveKafkaStorageHandler caught my attention :). I wanted to try it out, but I cannot find anywhere in the documentation whether it supports connecting to Kafka over SSL (which is a requirement for my setup). Can someone confirm whether this storage handler supports a broker listener that uses SSL? Thanks a lot in advance!
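For context, this is roughly the kind of DDL I was planning to try. The table, topic, column names, and truststore path are made up, and I am assuming that Kafka consumer settings can be passed through table properties with the kafka.consumer. prefix; that assumption is exactly what I am hoping someone can confirm:
-- sketch only: columns are assumed to map to fields of a JSON message payload
CREATE EXTERNAL TABLE kafka_events_ssl (
  event_id string,
  payload string
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  'kafka.topic' = 'my-topic',
  -- bootstrap servers pointing at the broker's SSL listener (port is illustrative)
  'kafka.bootstrap.servers' = 'broker1:9093',
  -- assumption: consumer settings pass through via the kafka.consumer. prefix
  'kafka.consumer.security.protocol' = 'SSL',
  'kafka.consumer.ssl.truststore.location' = '/path/to/kafka.client.truststore.jks',
  'kafka.consumer.ssl.truststore.password' = 'changeit'
);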
Labels:
- Apache Hive
- Apache Kafka
12-13-2017
02:12 PM
Hi Mark, Did you manage to solve this problem? I am facing the exact same situation.
07-28-2017
10:32 AM
I will try to reproduce the situation and then apply a fix! Thanks a lot for your feedback.
07-27-2017
09:15 AM
So just updating Phoenix to version 4.7 would solve this issue? Is it possible to update just the Phoenix version without upgrading the whole cluster? Thank you.
07-27-2017
09:12 AM
Hi Josh, I am using HDP 2.4.0. The query I am using looks like the following:
upsert into TABLE_HBASE_DENORM
select
TABLE_A.MPK,
TABLE_A.AAAA,
TABLE_A.BBBB,
TABLE_A.CCCC,
TABLE_A.DDDD,
TABLE_A.EEEE,
TABLE_B.AAA,
TABLE_B.BBB,
TABLE_B.CCC,
... 26 other parameters ...
TABLE_A.M4,
TABLE_A.M3,
TABLE_A.M2,
TABLE_A.M1,
TABLE_A.Q4,
TABLE_A.Q3,
TABLE_A.Q2,
TABLE_A.Q1
from
TABLE_A,
TABLE_B,
TABLE_C,
TABLE_F,
TABLE_E,
TABLE_D
where
TABLE_A.AAAA = TABLE_B.AAAA AND
TABLE_A.BBBB = TABLE_C.BBBB AND
TABLE_A.CCCC = TABLE_D.CCCC AND
TABLE_A.DDDD = TABLE_E.DDDD AND
TABLE_A.EEEE = TABLE_F.EEEE AND
TABLE_A.Q3 >= TO_TIMESTAMP('2017-03-04 10:40:05') AND TABLE_A.Q3 < TO_TIMESTAMP('2017-03-05 10:40:05')
;
07-22-2017
08:21 PM
Hi all, I am getting an ArrayIndexOutOfBoundsException on a Phoenix query and I would like to know if you have any suggestions to solve this problem. The error is thrown when I select data from some tables to upsert into another. The select query reads all the data from a table with a considerable amount of data (3,374,590 rows) and joins it with data from 5 smaller tables. I have 4 DataNode/RegionServer/PhoenixQS (almost dedicated) nodes in the cluster (6 vCPU; 32 GB RAM per node), so I believe resources are not the problem. As a workaround I am filtering by date in order to be able to transfer the data between the tables. Oddly, sometimes I am able to upsert 50k rows in one query (a few days of data), and sometimes I am limited to 9k rows (around 2 days) or less. I get the error even when using a hint to change the join algorithm. An example of the errors I am getting:
Error: java.lang.ArrayIndexOutOfBoundsException: -6 (state=08000,code=101)
org.apache.phoenix.exception.PhoenixIOException: java.lang.ArrayIndexOutOfBoundsException: -6
at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:538)
at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:510)
at org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
at org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
at org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:737)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:305)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:297)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:295)
at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1255)
at sqlline.Commands.execute(Commands.java:822)
at sqlline.Commands.sql(Commands.java:732)
at sqlline.SqlLine.dispatch(SqlLine.java:808)
at sqlline.SqlLine.begin(SqlLine.java:681)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:292)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -6
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:202)
at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:534)
... 16 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -6
at org.apache.phoenix.util.ByteUtil.vlongFromBytes(ByteUtil.java:329)
at org.apache.phoenix.util.ByteUtil.vintFromBytes(ByteUtil.java:316)
at org.apache.phoenix.schema.KeyValueSchema.next(KeyValueSchema.java:208)
at org.apache.phoenix.schema.KeyValueSchema.iterator(KeyValueSchema.java:165)
at org.apache.phoenix.schema.KeyValueSchema.iterator(KeyValueSchema.java:171)
at org.apache.phoenix.schema.KeyValueSchema.iterator(KeyValueSchema.java:175)
at org.apache.phoenix.expression.ProjectedColumnExpression.evaluate(ProjectedColumnExpression.java:112)
at org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:69)
at org.apache.phoenix.jdbc.PhoenixResultSet.getObject(PhoenixResultSet.java:515)
at org.apache.phoenix.compile.UpsertCompiler.upsertSelect(UpsertCompiler.java:164)
at org.apache.phoenix.compile.UpsertCompiler.access$000(UpsertCompiler.java:105)
at org.apache.phoenix.compile.UpsertCompiler$UpsertingParallelIteratorFactory.mutate(UpsertCompiler.java:221)
at org.apache.phoenix.compile.MutatingParallelIteratorFactory.newIterator(MutatingParallelIteratorFactory.java:61)
at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:99)
at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:90)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I am using HDP 2.4.0, so Phoenix 4.4. The configurations I am using are:
HBase RegionServer Maximum Memory=12288 (12GB)
HBase Master Maximum Memory=12288 (12GB)
Number of Handlers per RegionServer=30
Memstore Flush Size=128MB
Maximum Record Size=1MB
Maximum Region File Size=10GB
% of RegionServer Allocated to Read Buffers=40%
% of RegionServer Allocated to Write Buffers=40%
HBase RPC Timeout=6min
Zookeeper Session Timeout=6min
Phoenix Query Timeout=6min
Number of Fetched Rows when Scanning from Disk=10000
dfs.client.read.shortcircuit=true
dfs.client.read.shortcircuit.buffer.size=131072
hbase.hstore.min.locality.to.skip.major.compact=0.7
hbase.ipc.server.callqueue.read.ratio=0.8
hbase.ipc.server.callqueue.scan.ratio=0.8
phoenix.coprocessor.maxServerCacheTimeToLiveMs=30000
phoenix.mutate.batchSize=100000
phoenix.query.maxServerCacheBytes=8589934592
phoenix.query.queueSize=7500
phoenix.query.threadPoolSize=512
The other configurations are the defaults. What should I change in order to make a query of this size work properly? Thanks in advance.
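For reference, the date-filter workaround I mentioned is just the same upsert restricted to a narrower window on TABLE_A.Q3 and run repeatedly over consecutive ranges. The sketch below uses illustrative dates and elides most of the column list; only the narrowed time window differs from the full query above:
-- batch 1: roughly a 2-day window; the next run advances both bounds by the same amount
upsert into TABLE_HBASE_DENORM
select
    TABLE_A.MPK,
    TABLE_A.AAAA,
    -- ... same column list as the full query above ...
    TABLE_A.Q1
from
    TABLE_A, TABLE_B, TABLE_C, TABLE_F, TABLE_E, TABLE_D
where
    TABLE_A.AAAA = TABLE_B.AAAA AND
    TABLE_A.BBBB = TABLE_C.BBBB AND
    TABLE_A.CCCC = TABLE_D.CCCC AND
    TABLE_A.DDDD = TABLE_E.DDDD AND
    TABLE_A.EEEE = TABLE_F.EEEE AND
    TABLE_A.Q3 >= TO_TIMESTAMP('2017-03-04 10:40:05') AND
    TABLE_A.Q3 <  TO_TIMESTAMP('2017-03-06 10:40:05');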
Labels:
- Apache HBase
- Apache Phoenix
07-10-2017
11:32 AM
Hi, were you able to solve this issue? I am having the same problem.
06-03-2017
04:40 PM
I will keep it working with the SSH session then. Thank you very much for your feedback.
06-03-2017
04:29 PM
Hi Namit, Thank you for your answer. Yes, I can run the command: [nosuser@RHTPINEC008 ~]$ ps -ef | grep namenode
nosuser 7201 6867 0 16:01 pts/0 00:00:00 grep --color=auto namenode
hdfs 39395 1 5 May31 ? 04:01:49 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_namenode -Xmx1024m -Dhdp.version=2.4.0.0-169 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.4.0.0-169/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.0.0-169 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-RHTPINEC008.corporativo.pt.log -Dhadoop.home.dir=/usr/hdp/2.4.0.0-169/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native:/usr/hdp/2.4.0.0-169/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.0.0-169/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=512m -XX:MaxNewSize=512m -Xloggc:/var/log/hadoop/hdfs/gc.log-201705311529 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=512m -XX:MaxNewSize=512m -Xloggc:/var/log/hadoop/hdfs/gc.log-201705311529 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=512m -XX:MaxNewSize=512m -Xloggc:/var/log/hadoop/hdfs/gc.log-201705311529 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xms4096m -Xmx4096m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
As you suggested, I killed the process:
[nosuser@RHTPINEC008 ~]$ sudo kill -9 39395
and started it again through Ambari, which took a while but ended successfully. A few seconds later the NameNode went down again in the Ambari interface; however, I am still able to run:
[hdfs@RHTPINEC008 ~]$ jps
13494 Jps
9832 NameNode
Any ideas? Could it be the Ambari server or agent having problems collecting the NameNode status? Thanks
06-01-2017
05:08 PM
Hi all, I am having a problem with the NameNode status Ambari shows. The following points are verifiable in the system:
- The NameNode keeps going down a few seconds after I start it through Ambari (it looks like it never really goes up, even though the start process runs successfully);
- Despite being DOWN according to Ambari, if I run jps on the server where the NameNode is hosted, it shows that the service is running:
[hdfs@RHTPINEC008 ~]$ jps
39395 NameNode
4463 Jps
and I can access the NameNode UI properly;
- I already restarted both the NameNode and the ambari-agent manually, but the behavior stays the same;
- This problem started after some heavy HBase/Phoenix queries that caused the NameNode to go down (not sure if this is actually related, but the exact same configuration was working well before this episode);
- I've been digging for some hours and I have not been able to find error details in the NameNode logs or the ambari-agent logs that would allow me to understand the problem.
I am using HDP 2.4.0 with no HA options. Can someone help with this? Thanks in advance
Labels:
- Apache Ambari
- Apache Hadoop
05-22-2017
05:57 PM
Dear all, I am working on a cluster with several VMs and I need to periodically run some PySpark code through Oozie on a specific cluster machine; however, I am not able to find a configuration that allows me to do that. My workaround so far is to run an SSH client session from Oozie that spark-submits the script. Is this the only way? Thanks in advance
Labels:
- Apache Oozie