<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to fix size limit error when working with hive table in pyspark in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-fix-size-limit-error-when-working-with-hive-table-in/m-p/194945#M157004</link>
    <description>&lt;P&gt;
	I have a Hive table with 4 billion rows that I need to work with in pyspark:&lt;/P&gt;&lt;PRE&gt;my_table = sqlContext.table('my_hive_table')&lt;/PRE&gt;&lt;P&gt;
	When I try to run any action against that table, such as a count, I get the following exception (followed by &lt;CODE&gt;TaskKilled&lt;/CODE&gt; exceptions):&lt;/P&gt;
&lt;PRE&gt;my_table.count()
Py4JJavaError: An error occurred while calling o89.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 6732 in stage 13.0 failed
4 times, most recent failure: Lost task 6732.3 in stage 13.0 (TID 30759, some_server.XX.net, executor 38): org.apache.hive.com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
&lt;/PRE&gt;&lt;P&gt;
	Is there a way to work around this issue without upgrading anything, perhaps by setting an environment variable or a configuration property somewhere, or by passing an argument to pyspark on the command line?&lt;/P&gt;</description>
    <pubDate>Wed, 26 Jul 2017 20:55:54 GMT</pubDate>
    <dc:creator>maya_tydykov</dc:creator>
    <dc:date>2017-07-26T20:55:54Z</dc:date>
  </channel>
</rss>

