Cannot access Parquet tables with computed stats after upgrading to CDH 5.2.0

Hello,

Something very strange has happened to our data after upgrading to CDH 5.2.0. I haven't been able to access table data or metadata for any Parquet-formatted table with computed stats; every command fails with an identical exception. Everything is still accessible in Hive, but not in Impala. At first I thought I had corrupted our Hive metastore during the upgrade, but I've been able to reproduce this using a freshly created metastore DB and newly created data. Everything is fine until COMPUTE STATS is issued (or when accessing existing Parquet data whose stats were computed on an earlier CDH5 release). I've created a pastebin illustrating the problem (http://pastebin.com/PQUHJ2Nq). The only useful info in the Hive and Impala log files is a truncated stack trace of the exception (from catalogd.INFO):

 

I1018 21:08:23.137027 32731 TableLoader.java:60] Loading metadata for: default.pqtest2
I1018 21:08:23.137629 32731 HiveMetaStoreClient.java:308] Trying to connect to metastore with URI thrift://582540-master1.vmmplatform.com:9083
I1018 21:08:23.138365 32731 HiveMetaStoreClient.java:396] Connected to metastore.
E1018 21:08:23.218286 32731 HiveMetaStoreClient.java:411] Unable to shutdown local metastore client
Java exception follows:
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at com.facebook.fb303.FacebookService$Client.send_shutdown(FacebookService.java:431)
at com.facebook.fb303.FacebookService$Client.shutdown(FacebookService.java:425)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.close(HiveMetaStoreClient.java:408)
at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.release(MetaStoreClientPool.java:89)
at com.cloudera.impala.catalog.TableLoader.load(TableLoader.java:102)
at com.cloudera.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:232)
at com.cloudera.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:229)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
... 12 more
I1018 21:09:48.626922 31619 catalog-server.cc:228] Catalog Version: 12 Last Catalog Version: 12

 

Any help? Fortunately, we are using external tables and have been able to work around this by dropping and re-adding tables/partitions, but we're at a loss as to how a simple COMPUTE STATS statement could render both data and metadata completely inaccessible.
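
For anyone hitting the same thing, a rough sketch of the drop-and-re-add workaround follows. Only the table name comes from the log above; the column list, partition column, partition value, and paths are placeholders, not the real schema. The sketch also assumes the DDL is run from Hive (where the tables are still accessible) and that Impala is then told to reload the table with INVALIDATE METADATA.

```sql
-- Sketch only: columns, partition column/value, and paths are placeholders.
-- Run in Hive. Dropping an EXTERNAL table removes only the metastore entry
-- (including its stored stats); the files under LOCATION are left in place.
DROP TABLE default.pqtest2;

CREATE EXTERNAL TABLE default.pqtest2 (
  id  BIGINT,
  val STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/path/to/pqtest2';

-- Re-register each existing partition directory (repeat per partition).
ALTER TABLE default.pqtest2 ADD PARTITION (dt = 'YYYY-MM-DD')
LOCATION '/path/to/pqtest2/dt=YYYY-MM-DD';

-- Then, in impala-shell, reload the metadata for the re-created table.
INVALIDATE METADATA default.pqtest2;
```

This only works without data loss because the tables are external; doing the same to a managed table would delete the underlying files.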

 

Thanks,

Charlie

Re: Cannot access Parquet tables with computed stats after upgrading to CDH 5.2.0

Quick follow-up: this is not specific to Parquet. We're seeing the same problem with text-format tables that have stats.

Re: Cannot access Parquet tables with computed stats after upgrading to CDH 5.2.0

I'm having the same problem. Any ideas?