
InvalidProtocolBufferException when trying to import data from csv to hbase table

New Contributor

I am trying to import data from a CSV file into an HBase table through a shell script. When running it against a large dataset, I get the exception below in the terminal.

2017-01-30 10:44:51,393 INFO [main] mapreduce.Job: Task Id : attempt_1482215937032_0326_m_000000_1, Status : FAILED
Error: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
    at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
    at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
    at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
    at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$QualifierValue.<init>(ClientProtos.java:8599)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$QualifierValue.<init>(ClientProtos.java:8563)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$QualifierValue$1.parsePartialFrom(ClientProtos.java:8672)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$QualifierValue$1.parsePartialFrom(ClientProtos.java:8667)
    at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue.<init>(ClientProtos.java:8462)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue.<init>(ClientProtos.java:8404)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$1.parsePartialFrom(ClientProtos.java:8498)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$ColumnValue$1.parsePartialFrom(ClientProtos.java:8493)
    at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto.<init>(ClientProtos.java:7959)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto.<init>(ClientProtos.java:7890)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$1.parsePartialFrom(ClientProtos.java:8045)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto$1.parsePartialFrom(ClientProtos.java:8040)
    at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
    at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
    at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutationProto.parseDelimitedFrom(ClientProtos.java:10468)
    at org.apache.hadoop.hbase.mapreduce.MutationSerialization$MutationDeserializer.deserialize(MutationSerialization.java:60)
    at org.apache.hadoop.hbase.mapreduce.MutationSerialization$MutationDeserializer.deserialize(MutationSerialization.java:50)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:146)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1651)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
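For context, the load runs as a MapReduce job kicked off from the shell script. A minimal command of the kind that exercises this code path, using HBase's bundled ImportTsv tool, looks like the following (the column qualifier and input path are placeholders, not my actual values):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=',' \
  -Dimporttsv.columns=HBASE_ROW_KEY,rec:value \
  dev_clickstreamrecord_denorm_new_a \
  /path/to/input.csv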

I tried setting a larger block size on the column family, as shown below, but I am still getting the same exception.

alter 'dev_clickstreamrecord_denorm_new_a', {NAME => 'rec', BLOCKSIZE => '2048756000'}

How can I address this issue? Any help would be much appreciated.

3 REPLIES

Re: InvalidProtocolBufferException when trying to import data from csv to hbase table

What version of HBase are you using?

This has likely already been fixed via https://issues.apache.org/jira/browse/HBASE-13825
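The gist of that fix is to deserialize each Mutation by merging from a CodedInputStream whose size limit has been lifted, rather than through parseDelimitedFrom(), which keeps protobuf's 64 MB default. A rough sketch of the technique (not the exact HBase patch):

import java.io.IOException;
import java.io.InputStream;

import com.google.protobuf.CodedInputStream;
import com.google.protobuf.Message;

// Rough sketch of the approach in HBASE-13825, not the exact patch:
// merge from a CodedInputStream with its size limit lifted, instead of
// the parse*From() convenience methods that keep the 64 MB default.
public final class ProtobufMergeSketch {

    public static void mergeFrom(Message.Builder builder, InputStream in)
            throws IOException {
        CodedInputStream cis = CodedInputStream.newInstance(in);
        cis.setSizeLimit(Integer.MAX_VALUE); // lift the default 64 MB cap
        builder.mergeFrom(cis);              // parse the whole message
        cis.checkLastTagWas(0);              // verify the stream ended cleanly
    }
}

Note that the limit is enforced by the deserializer between map and reduce tasks, which is also why changing the column family's block size didn't help: BLOCKSIZE controls HFile block sizing, not protobuf parsing.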

Re: InvalidProtocolBufferException when trying to import data from csv to hbase table

New Contributor

Hey Josh, the version I am using is 0.98. So would this be resolved if it's updated to 1.0.2?

Re: InvalidProtocolBufferException when trying to import data from csv to hbase table

Per the fixVersion field on that JIRA issue, 0.98.14, 1.0.2, 1.1.2, 1.2.0, and 1.3.0 all have this fix. Any later version in the same release line works as well (e.g. 0.98.15 is fine, since 0.98.14 is).
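If you're unsure which exact release you're running, the CLI will print it:

hbase version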
