Phoenix Query with Split operation on String (Key) column

I have a Phoenix view whose row key column, ROWKEY, has the following layout:

<hash>:<attributeA>_<attributeB>

I want to count the rows for each <hash> value in my table. To do that, I need to group the view by the <hash> part, which I can get by splitting the ROWKEY column. I tried Phoenix's REGEXP_SPLIT function, but the query throws an exception:

%jdbc(phoenix)
SELECT REGEXP_SPLIT(ROWKEY, ':')[0] as test, count(1) FROM "my_view" GROUP BY REGEXP_SPLIT(ROWKEY, ':')[0]

The exception:

org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
	at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
	at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
	at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
	at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
	... 11 more
	at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:117)
	at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:780)
	at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:721)
	at org.apache.phoenix.iterate.MergeSortResultIterator.getMinHeap(MergeSortResultIterator.java:72)
	at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:93)
	at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:58)
	at org.apache.phoenix.iterate.BaseGroupedAggregatingResultIterator.next(BaseGroupedAggregatingResultIterator.java:64)
	at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
	at org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)
	at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:778)
	at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
	at org.apache.commons.dbcp2.DelegatingResultSet.next(DelegatingResultSet.java:191)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.getResults(JDBCInterpreter.java:510)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:694)
	at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:763)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:101)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:502)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
	at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
	at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
	at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
	at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
	at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
	... 11 more
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:206)
	at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:775)
	... 24 more
Caused by: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
	at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
	at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
	at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
	at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
	... 11 more
	at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:117)
	at org.apache.phoenix.iterate.TableResultIterator.initScanner(TableResultIterator.java:252)
	at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:113)
	at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:108)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
	... 3 more
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
	at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
	at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
	at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
	at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
	... 11 more
	at sun.reflect.GeneratedConstructorAccessor38.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:335)
	at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:391)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:208)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:63)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:211)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:396)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:370)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:136)
	at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
	... 3 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: my_view,000,1539582312877.93d8d6e785eae60fedac3c6088b4e556.: 32767
	at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:93)
	at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:59)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:271)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1301)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1699)
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1296)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2404)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 32767
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:418)
	at org.apache.phoenix.schema.types.PArrayDataType.positionAtArrayElement(PArrayDataType.java:379)
	at org.apache.phoenix.expression.function.ArrayIndexFunction.evaluate(ArrayIndexFunction.java:64)
	at org.apache.phoenix.util.TupleUtil.getConcatenatedValue(TupleUtil.java:101)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:418)
	at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:162)
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:237)
	... 11 more
	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1227)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:218)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:292)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32831)
	at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:383)
	... 10 more

What is the correct query?

Accepted solution:

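Solved it: Phoenix arrays are 1-based, so index [0] does not address any element (hence the ArrayIndexOutOfBoundsException thrown from ArrayIndexFunction). Using index [1] returns the <hash> part: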

SELECT REGEXP_SPLIT(ROWKEY, ':')[1] as test, count(1) FROM "my_view" GROUP BY REGEXP_SPLIT(ROWKEY, ':')[1]
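To illustrate why [1] rather than [0] picks out the <hash> part, here is a small plain-Java sketch. It is not Phoenix code and does not call Phoenix: the class PhoenixSplitDemo, the helpers regexpSplit, elementAt and countPerHash, and the sample row keys are all hypothetical. It only mimics the 1-based subscript on the result of splitting ROWKEY on ':' and the per-<hash> COUNT(1) that the GROUP BY query above computes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java illustration (NOT Phoenix source code) of
// REGEXP_SPLIT(ROWKEY, ':')[n] with 1-based array indexing.
class PhoenixSplitDemo {

    // Rough stand-in for REGEXP_SPLIT: split on a regex, keeping trailing empty parts.
    static String[] regexpSplit(String value, String regex) {
        return value.split(regex, -1);
    }

    // Rough stand-in for a 1-based array subscript: index 1 is the first element.
    // This sketch rejects any index outside 1..length; it does not try to
    // reproduce Phoenix's exact out-of-range behavior.
    static String elementAt(String[] array, int oneBasedIndex) {
        if (oneBasedIndex < 1 || oneBasedIndex > array.length) {
            throw new ArrayIndexOutOfBoundsException(oneBasedIndex);
        }
        return array[oneBasedIndex - 1];
    }

    // Counts rows per <hash>, like:
    // SELECT REGEXP_SPLIT(ROWKEY, ':')[1], COUNT(1) ... GROUP BY REGEXP_SPLIT(ROWKEY, ':')[1]
    static Map<String, Long> countPerHash(List<String> rowKeys) {
        Map<String, Long> counts = new TreeMap<>(); // sorted by hash for stable output
        for (String rowKey : rowKeys) {
            String hash = elementAt(regexpSplit(rowKey, ":"), 1);
            counts.merge(hash, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample row keys in the <hash>:<attributeA>_<attributeB> layout.
        List<String> rowKeys = Arrays.asList("000:a1_b1", "000:a2_b2", "001:a1_b3");
        System.out.println(countPerHash(rowKeys)); // prints {000=2, 001=1}

        // Index 2 is the "<attributeA>_<attributeB>" part; index 0 is rejected.
        System.out.println(elementAt(regexpSplit("000:a1_b1", ":"), 2)); // prints a1_b1
    }
}
```

Calling elementAt(..., 0) throws here, which loosely mirrors the failure of the original [0] query; the actual Phoenix error comes from its own array-handling code, as shown in the stack trace.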