Member since: 04-24-2017
Posts: 106
Kudos Received: 13
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1419 | 11-25-2019 12:49 AM |
|  | 2502 | 11-14-2018 10:45 AM |
|  | 2253 | 10-15-2018 03:44 PM |
|  | 2123 | 09-25-2018 01:54 PM |
|  | 1947 | 08-03-2018 09:47 AM |
11-25-2019
12:49 AM
1 Kudo
To answer my own question: since I'm now using multiple partitions for the Kafka topic, Spark uses more executors to process the data. Likewise, Hive/Tez creates as many worker containers as the topic has partitions, so the partition count determines the read parallelism in both cases.
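For reference, partitions are typically set when the topic is created; a minimal sketch using the Kafka AdminClient (the count of 8 partitions is just an example value, not a recommendation, and the broker list is the one from my cluster):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "server1:6667,server2:6667,server3:6667");
        try (AdminClient admin = AdminClient.create(props)) {
            // 8 partitions (example value), replication factor 3 as before.
            // The partition count caps the Spark/Tez read parallelism.
            NewTopic topic = new NewTopic("testtopic", 8, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}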
11-24-2019
11:18 PM
I wrote a Kafka producer that sends simulated data to a Kafka topic (replication factor 3, one partition).
Now I want to access this data using Hive and/or Spark Streaming.
First approach: Using an external Hive table with KafkaStorageHandler:
CREATE EXTERNAL TABLE mydb.kafka_timeseriestest (
  description string,
  version int,
  ts timestamp,
  varname string,
  varvalue float
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "testtopic",
  "kafka.bootstrap.servers" = "server1:6667,server2:6667,server3:6667"
);

-- e.g. SELECT max(varvalue) FROM mydb.kafka_timeseriestest;
-- takes too long, and only one Tez task is running
Second approach: Writing a Spark Streaming app, that accesses the Kafka topic:
// started with 10 executors, but only one executor is active
...
JavaInputDStream<ConsumerRecord<String, String>> stream =
    KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
...
In both cases, only one Tez/Spark worker is active, so reading all the data (~500 million entries) takes a very long time. How can I increase the performance? Is the issue caused by the one-partition topic? If so, is there a rule of thumb for choosing the number of partitions?
I'm using a HDP 3.1 cluster, running Spark, Hive and Kafka on multiple nodes:
dataNode1 - dataNode3: Hive + Spark + Kafka broker
dataNode4 - dataNode8: Hive + Spark
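As a workaround, I also considered repartitioning the received stream in Spark, which should parallelize the processing (though not the Kafka read itself); a sketch based on the stream above:

// Redistribute the records of the single-partition read across 10 Spark partitions,
// so that downstream transformations can run on up to 10 executors.
JavaDStream<ConsumerRecord<String, String>> repartitioned = stream.repartition(10);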
08-29-2019
05:28 AM
1 Kudo
I've upgraded to HDP 3.1 and now want to read a Hive external table in my Spark application. The following table shows the compatibilities: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_configure_a_spark_hive_connection.html

I don't have LLAP activated, so it seems I'm restricted in the Spark -> Hive access and vice versa, right? But the compatibility table says that I can access external Hive tables from Spark without using the HWC (and also without LLAP), with the hint that the table must be defined in the Spark catalog. What do I have to do here? I tried the following code, but it says "Table not found"!

SparkSession session = SparkSession.builder()
    .config("spark.executor.instances", "4")
    .master("yarn-client")
    .appName("Spark LetterCount")
    .config("hive.metastore.uris", "thrift://myhost.com:9083")
    .config("hive.metastore.warehouse.dir", "/warehouse/tablespace/managed/hive")
    .config("hive.metastore.warehouse.external.dir", "/warehouse/tablespace/external/hive")
    .config("spark.sql.warehouse.dir", new File("spark-warehouse").getAbsolutePath())
    .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://localhost:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=student30")
    .enableHiveSupport()
    .getOrCreate(); // getOrCreate() is needed to obtain the SparkSession from the builder

Dataset<Row> dsRead = session.sql("SELECT * FROM hivedb.external_table");
System.out.println(dsRead.count());

This results in the following exception:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: `hivedb`.`external_table`; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation `hivedb`.`external_table`
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:84)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:92)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at main.SparkSQLExample.main(SparkSQLExample.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Can someone help me solve this issue? Thank you!
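For context, my current guess at what "defined in the Spark catalog" means is registering a table in Spark that points at the external data; a sketch only, where the column list and LOCATION path are made-up placeholders, not my real schema:

// Hypothetical sketch: register the external data as a table in the Spark catalog.
// Schema (id, value) and the LOCATION path are placeholders -- adjust to the real table.
session.sql(
    "CREATE TABLE IF NOT EXISTS hivedb.external_table (id INT, value STRING) " +
    "STORED AS ORC " +
    "LOCATION '/warehouse/tablespace/external/hive/hivedb.db/external_table'");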
11-14-2018
10:45 AM
I found the following Java-based solution: use the Dataset.filter method with a FilterFunction: https://spark.apache.org/docs/2.3.0/api/java/index.html?org/apache/spark/sql/Dataset.html My code now looks like this:

Dataset<Row> dsResult = sqlC.read()
    .format("org.apache.phoenix.spark")
    .option("table", tableName)
    .option("zkUrl", hbaseUrl).load()
    .where("OTHER_COLUMN = " + inputId) // pushed down to Phoenix
    .filter(row -> { // FilterFunction<Row>, evaluated row by row in Spark
        // sdf is the SimpleDateFormat for the user input, defined elsewhere
        long readTime = row.getTimestamp(row.fieldIndex("TABLE_TS_COL")).getTime();
        long tsFrom = new Timestamp(sdf.parse(dateFrom).getTime()).getTime();
        long tsTo = new Timestamp(sdf.parse(dateTo).getTime()).getTime();
        return readTime >= tsFrom && readTime <= tsTo;
    });
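One refinement worth noting (a sketch under the same assumptions, untested): the parse calls don't depend on the row, so they can be computed once on the driver instead of once per row, and the Timestamp wrapper is redundant since only the epoch millis are used:

// Parse the user input once on the driver; the primitives are captured by the lambda.
final long tsFrom = sdf.parse(dateFrom).getTime();
final long tsTo = sdf.parse(dateTo).getTime();
Dataset<Row> dsResult = sqlC.read()
    .format("org.apache.phoenix.spark")
    .option("table", tableName)
    .option("zkUrl", hbaseUrl).load()
    .where("OTHER_COLUMN = " + inputId)
    .filter(row -> {
        long readTime = row.getTimestamp(row.fieldIndex("TABLE_TS_COL")).getTime();
        return readTime >= tsFrom && readTime <= tsTo;
    });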
11-14-2018
08:10 AM
I have a Phoenix table that I can access via SparkSQL (with the Phoenix Spark plugin). The table also has a Timestamp column, which I have to filter by a user input like 2018-11-14 01:02:03. So I want to filter my Dataset (representing the read Phoenix table) with the where / filter methods. My current Java code looks like this:

Timestamp t1 = new Timestamp(sdf.parse(dateFrom).getTime());
Timestamp t2 = new Timestamp(sdf.parse(dateTo).getTime());
Column c1 = new Column("TABLE_TS_COL").geq(t1);
Column c2 = new Column("TABLE_TS_COL").leq(t2);
Dataset<Row> dsResult = sqlContext.read()
    .format("org.apache.phoenix.spark")
    .option("table", tableName)
    .option("zkUrl", hbaseUrl).load()
    .where("OTHER_COLUMN = " + inputId) // This works
    .where(c1) // Problem!
    .where(c2); // Problem!
But this leads to the following exception:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "06" at line 1, column 474.

My Spark History UI shows the following select statement:

...
18/11/14 08:54:58 INFO PhoenixInputFormat: Select Statement: SELECT "OTHER_COLUMN", "TABLE_TS_COL" FROM HBASE_TEST3 WHERE ( "OTHER_COLUMN" = 0 AND "OTHER_COLUMN" IS NOT NULL AND "TABLE_TS_COL" IS NOT NULL AND "TABLE_TS_COL" >= 2018-09-24 06:49:01.0 AND "TABLE_TS_COL" <= 2018-09-24 06:49:01.0)
To me it looks like the quotation marks around the timestamp values are missing (but I'm not sure about that). How can I filter a Timestamp column by a user input in Java and SparkSQL?
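One alternative I could try (a sketch, untested): building the bounds with Spark's own to_timestamp function, so that Catalyst knows the literal's type before the filter is handed to Phoenix, instead of passing java.sql.Timestamp objects through Column:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.to_timestamp;

// dateFrom/dateTo are the raw user-input strings, e.g. "2018-11-14 01:02:03"
Dataset<Row> dsResult = sqlContext.read()
    .format("org.apache.phoenix.spark")
    .option("table", tableName)
    .option("zkUrl", hbaseUrl).load()
    .where("OTHER_COLUMN = " + inputId)
    .where(col("TABLE_TS_COL").geq(to_timestamp(lit(dateFrom))))
    .where(col("TABLE_TS_COL").leq(to_timestamp(lit(dateTo))));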
Labels: Apache Phoenix, Apache Spark
10-15-2018
03:44 PM
Solved it: Phoenix arrays are 1-based, so the following query works:

SELECT REGEXP_SPLIT(ROWKEY, ':')[1] AS test, count(1) FROM "my_view" GROUP BY REGEXP_SPLIT(ROWKEY, ':')[1]
10-15-2018
03:40 PM
I have a Phoenix view with a row key column ROWKEY whose layout looks like this: <hash>:<attributeA>_<attributeB>

I want to count the rows for each <hash> value of my table. Therefore I need to group my view by the <hash> value, which I get by splitting the ROWKEY column. I tried to use the REGEXP_SPLIT function of Phoenix, but I get an exception:

%jdbc(phoenix)
SELECT REGEXP_SPLIT(ROWKEY, ':')[0] as test, count(1) FROM "my_view" GROUP BY REGEXP_SPLIT(ROWKEY, ':')[0]
The exception:

org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
... 11 more
at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:117)
at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:780)
at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:721)
at org.apache.phoenix.iterate.MergeSortResultIterator.getMinHeap(MergeSortResultIterator.java:72)
at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:93)
at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:58)
at org.apache.phoenix.iterate.BaseGroupedAggregatingResultIterator.next(BaseGroupedAggregatingResultIterator.java:64)
at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
at org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)
at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:778)
at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
at org.apache.zeppelin.jdbc.JDBCInterpreter.getResults(JDBCInterpreter.java:510)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:694)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:763)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:101)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:502)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
... 11 more
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:206)
at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:775)
... 24 more
Caused by: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
... 11 more
at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:117)
at org.apache.phoenix.iterate.TableResultIterator.initScanner(TableResultIterator.java:252)
at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:113)
at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:108)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
... 3 more
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
... 11 more
at sun.reflect.GeneratedConstructorAccessor38.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:335)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:391)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:208)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:63)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:211)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:396)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:370)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:136)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
... 3 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
... 11 more
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1227)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:218)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:292)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32831)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:383)
... 10 more
What is the correct query?
Labels: Apache Phoenix
09-25-2018
01:54 PM
1 Kudo
The problem was solved after changing the MySQL database URL from

jdbc:mysql://xxxx.yyyy/hive?createDatabaseIfNotExist=true

to

jdbc:mysql://xxxx.yyyy/hive?createDatabaseIfNotExist=true&serverTimezone=Europe/Berlin

I found the relevant information here: https://community.hortonworks.com/questions/218023/error-setting-up-hive-on-hdp-265timezone-on-mysql.html
09-25-2018
01:27 PM
I'm just setting up a Hortonworks Data Platform 3.0 installation. When I start the services for the first time, the Hive Metastore startup throws an exception:

Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : The server time zone value 'CEST' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specifc time zone value if you want to utilize time zone support.
SQL Error code: 0
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
at org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getConnectionToMetastore(HiveSchemaHelper.java:94)
at org.apache.hive.beeline.HiveSchemaTool.getConnectionToMetastore(HiveSchemaTool.java:169)
at org.apache.hive.beeline.HiveSchemaTool.testConnectionToMetastore(HiveSchemaTool.java:475)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:581)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:567)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1539)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
Caused by: java.sql.SQLException: The server time zone value 'CEST' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specifc time zone value if you want to utilize time zone support.
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:76)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:832)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:456)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:240)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:207)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.hadoop.hive.metastore.tools.HiveSchemaHelper.getConnectionToMetastore(HiveSchemaHelper.java:88)
... 11 more

On my CentOS system I set the timezone to Europe/Berlin:

ls -l /etc/localtime
lrwxrwxrwx. 1 root root 35 Sep 25 10:45 /etc/localtime -> ../usr/share/zoneinfo/Europe/Berlin
timedatectl | grep -i 'time zone'
Time zone: Europe/Berlin (CEST, +0200)
Does anyone know how to solve this problem? Thank you!
09-14-2018
10:27 AM
@Felix Albani Thank you for your help! Without the LIMIT clause, the job runs perfectly (and in parallel).