<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question How to fix size limit error when working with hive table in pyspark in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-to-fix-size-limit-error-when-working-with-hive-table-in/m-p/194945#M157004</link>
    <description>&lt;P&gt;
	I have a Hive table with 4 billion rows that I need to work with in pyspark:&lt;/P&gt;&lt;PRE&gt;my_table = sqlContext.table('my_hive_table')&lt;/PRE&gt;&lt;P&gt;
	When I try to run any action against that table, such as a count, I get the following exception (followed by &lt;CODE&gt;TaskKilled&lt;/CODE&gt; exceptions):&lt;/P&gt;
&lt;PRE&gt;my_table.count()
Py4JJavaError: An error occurred while calling o89.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 6732 in stage 13.0 failed
4 times, most recent failure: Lost task 6732.3 in stage 13.0 (TID 30759, some_server.XX.net, executor 38): org.apache.hive.com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
&lt;/PRE&gt;&lt;P&gt;
	Is there a way to work around this issue without upgrading anything, perhaps by setting an environment variable or a configuration property somewhere, or by passing an argument to pyspark on the command line?&lt;/P&gt;</description>
    <pubDate>Wed, 26 Jul 2017 20:55:54 GMT</pubDate>
    <dc:creator>maya_tydykov</dc:creator>
    <dc:date>2017-07-26T20:55:54Z</dc:date>
  </channel>
</rss>

