Explorer
Posts: 10
Registered: ‎05-20-2014

Cannot access Parquet tables with computed stats after upgrading to CDH 5.2.0

Hello,

Something very strange has happened to our data after upgrading to CDH 5.2.0. I haven't been able to access table data or metadata for any Parquet-formatted table with computed stats; every command fails with an identical exception. Everything is still accessible in Hive, but not in Impala. At first I thought I had corrupted our Hive metastore during the upgrade, but I've been able to reproduce this with a freshly created metastore DB and newly created data. Everything is fine until COMPUTE STATS is issued (or until we access existing Parquet data whose stats were computed on an earlier CDH5 release). I've created a pastebin illustrating the problem (http://pastebin.com/PQUHJ2Nq). The only useful info in the Hive and Impala log files is a truncated stack trace of the exception (from catalogd.INFO):

I1018 21:08:23.137027 32731 TableLoader.java:60] Loading metadata for: default.pqtest2
I1018 21:08:23.137629 32731 HiveMetaStoreClient.java:308] Trying to connect to metastore with URI thrift://582540-master1.vmmplatform.com:9083
I1018 21:08:23.138365 32731 HiveMetaStoreClient.java:396] Connected to metastore.
E1018 21:08:23.218286 32731 HiveMetaStoreClient.java:411] Unable to shutdown local metastore client
Java exception follows:
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:161)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at com.facebook.fb303.FacebookService$Client.send_shutdown(FacebookService.java:431)
at com.facebook.fb303.FacebookService$Client.shutdown(FacebookService.java:425)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.close(HiveMetaStoreClient.java:408)
at com.cloudera.impala.catalog.MetaStoreClientPool$MetaStoreClient.release(MetaStoreClientPool.java:89)
at com.cloudera.impala.catalog.TableLoader.load(TableLoader.java:102)
at com.cloudera.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:232)
at com.cloudera.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:229)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at org.apache.thrift.transport.TIOStreamTransport.flush(TIOStreamTransport.java:159)
... 12 more
I1018 21:09:48.626922 31619 catalog-server.cc:228] Catalog Version: 12 Last Catalog Version: 12

Any help? Fortunately we are using external tables and have been able to work around this by dropping and re-adding tables/partitions, but we're at a loss as to how a simple COMPUTE STATS statement could render both data and metadata completely inaccessible.

Thanks,

Charlie

Explorer
Posts: 10
Registered: ‎05-20-2014

Re: Cannot access Parquet tables with computed stats after upgrading to CDH 5.2.0

Quick follow-up: this is unrelated to Parquet. We're seeing the same problem with text-format tables that have stats.

New Contributor
Posts: 4
Registered: ‎01-27-2014

Re: Cannot access Parquet tables with computed stats after upgrading to CDH 5.2.0

I'm having the same problem. Any ideas?