Problem writing from Spark Streaming to HBase

We have an application that reads messages from specific Kafka topics and processes them. After it reads messages from a topic, it saves the consumed offsets to an HBase table.

After running for a while (anywhere from 30 minutes to 15 hours), the application fails, and the driver's stderr shows the following log entries:

18/04/17 17:31:15 WARN client.AsyncProcess: #3121, the task was rejected by the pool. This is unexpected. Server is ***hostname masked***,60020,1523949367813
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@3f377224 rejected from java.util.concurrent.ThreadPoolExecutor@639d4dae[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:1013)
at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$000(AsyncProcess.java:600)
at org.apache.hadoop.hbase.client.AsyncProcess.submitMultiActions(AsyncProcess.java:449)
at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:429)
at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:344)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:190)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1495)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1098)
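
The warning above says the task was rejected by a ThreadPoolExecutor that is already in the Terminated state. The following minimal JDK-only sketch (no HBase involved; the class and method names are hypothetical) produces the same kind of message by submitting work to a pool after it has been shut down. If that reading is right, it would suggest, though these logs do not confirm it, that the thread pool backing the HTable was shut down (for example because the table or its connection had been closed) while the streaming job kept calling put() on it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class TerminatedPoolDemo {
    // Submits one task, shuts the pool down, waits until it is Terminated,
    // then submits again. Returns the rejection message, or null if the
    // second submit was unexpectedly accepted.
    static String rejectionMessageAfterShutdown() {
        // newFixedThreadPool returns a ThreadPoolExecutor with the default AbortPolicy,
        // the same rejection policy that appears in the HBase stack trace.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> { }); // counted as "completed tasks = 1"
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                return null; // pool did not reach Terminated in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        try {
            // Analogous to AsyncProcess.sendMultiAction submitting to a closed pool.
            pool.submit(() -> { });
            return null;
        } catch (RejectedExecutionException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectionMessageAfterShutdown());
    }
}
```

The printed message has the same shape as the one in the log: "Task ... rejected from java.util.concurrent.ThreadPoolExecutor@...[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]".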

Some time later, these errors appear:

18/04/17 17:31:15 ERROR client.AsyncProcess: Cannot get replica 0 location for {"totalColumns":1,"row":"predictor_passport_ru_number_gold","families":{"cf":[{"qualifier":"\x00\x00\x00\x00","vlen":8,"tag":[],"timestamp":9223372036854775807}]}}
18/04/17 17:31:15 ERROR spark.Utils: Error saving offsets [OffsetRange(topic: 'predictor_passport_ru_number_gold', partition: 0, range: [2536631 -> 2536718])]
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IOException: 1 time,
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:247)
at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:227)
at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1766)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:240)
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:190)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1495)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1098)
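
The Put that fails in the first ERROR entry can be decoded: the row is the topic name, the qualifier \x00\x00\x00\x00 is four bytes, consistent with partition 0 encoded as a big-endian int, and "vlen":8 is consistent with an offset stored as a long. The timestamp 9223372036854775807 is Long.MAX_VALUE, which HBase uses as HConstants.LATEST_TIMESTAMP (the server assigns the actual timestamp). The cell itself therefore looks well-formed, which suggests the failure lies in locating the region ("Cannot get replica 0 location") rather than in the data. The sketch below rebuilds that layout with java.nio.ByteBuffer, which uses the same big-endian byte order as HBase's Bytes.toBytes. The class name is hypothetical, and storing untilOffset (2536718) is an assumption, since the post does not say which end of the OffsetRange is saved.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffsetCellLayout {
    // Big-endian 4-byte encoding, the same byte layout as HBase's Bytes.toBytes(int).
    static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Big-endian 8-byte encoding, the same byte layout as HBase's Bytes.toBytes(long).
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Simplified rendering in the spirit of Bytes.toStringBinary:
    // printable ASCII (except backslash) is kept, everything else becomes \xNN.
    static String toStringBinary(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF;
            if (ch >= 0x20 && ch < 0x7F && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String topic = "predictor_passport_ru_number_gold"; // row key seen in the log
        int partition = 0;                                // from the OffsetRange in the log
        long untilOffset = 2536718L;                      // assumption: the upper bound is stored

        byte[] row = topic.getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = intToBytes(partition);
        byte[] value = longToBytes(untilOffset);

        System.out.println("row       = " + toStringBinary(row));
        System.out.println("qualifier = " + toStringBinary(qualifier)); // \x00\x00\x00\x00
        System.out.println("vlen      = " + value.length);              // 8
        System.out.println("timestamp = " + Long.MAX_VALUE);            // 9223372036854775807
    }
}
```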

The HBase logs show a gap in messages during that period; see the attached screenshot, memstoreflush.png.

The full driver log is attached in index.zip.

Please help investigate and resolve this issue.