Member since 10-13-2016 | 9 Posts | 2 Kudos Received | 0 Solutions
04-14-2020 12:10 AM
You can use .repartition(1) on the DataFrame: DF.repartition(1) ...
02-08-2017 03:36 PM (1 Kudo)
As for blobs larger than 2 GB, as far as I know neither Hive STRING nor BINARY can handle them. That is only based on googling, though, so Hive experts, please add your thoughts. Note that the "InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit." part of your stack trace tells you that you hit a Protocol Buffers limit, not a Hive field type limit. That could explain the 500 MB limit you found in your investigations. In the Hive code, in the ORC input stream implementation, I could see a 1 GB protobuf size limit being set, but that limit applies to the whole message, and the blob is only one part of it.
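To make the "whole message, not just the blob" point concrete, here is a small arithmetic sketch. The 1 GB figure is the limit mentioned above; the 600 MB of "other" data and the helper names are hypothetical, and the sketch ignores the few bytes of tag and length-prefix overhead each protobuf field adds. It only shows that the space left for a blob is the message limit minus everything else serialized into the same message.

```python
MB = 1024 ** 2
GB = 1024 ** 3

# The limit guards the whole serialized message, not a single field.
MESSAGE_LIMIT = 1 * GB


def fits_in_message(blob_size: int, other_bytes: int, limit: int = MESSAGE_LIMIT) -> bool:
    """True if the blob plus the rest of the message stays within the limit.

    Ignores per-field tag and length-prefix overhead (a few bytes each).
    """
    return blob_size + other_bytes <= limit


def max_blob_size(other_bytes: int, limit: int = MESSAGE_LIMIT) -> int:
    """Largest blob that still fits once the other contents are accounted for."""
    return max(0, limit - other_bytes)


# Hypothetical: 600 MB of other data shares the message with the blob.
other = 600 * MB
print(max_blob_size(other) // MB)          # 424
print(fits_in_message(500 * MB, other))    # False
```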