Created 10-12-2018 10:32 AM
I tried two options:
1. When trying to create a Parquet table in Hive 3.1 through Spark 2.3, Spark throws the error below.

df.write.format("parquet").mode("overwrite").saveAsTable("database_name.test1")

Error: pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Table datamart.test1 failed strict managed table checks due to the following reason: Table is marked as a managed table but is not transactional.);'
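A workaround I have seen suggested for this check (an assumption on my part, not verified on this specific cluster) is to supply an explicit path so Spark creates an EXTERNAL table, which is not subject to the strict managed-table check:

```python
# Hypothetical workaround: passing an explicit "path" option makes Spark
# register the table as EXTERNAL instead of managed, sidestepping the
# "managed table but not transactional" check. The path is a placeholder;
# adjust it to your environment.
df.write.format("parquet") \
    .mode("overwrite") \
    .option("path", "/warehouse/external/database_name/test1") \
    .saveAsTable("database_name.test1")
```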
2. Successfully able to insert data into an existing Parquet table and retrieve it through Spark.

df.write.format("parquet").mode("overwrite").insertInto("database_name.test2") --> This works fine; the loaded data can be retrieved from Spark but NOT from Hive
spark.sql("select * from database_name.test2").show() --> This works fine
spark.read.parquet("/path-to-table-dir/part-00000.snappy.parquet").show() --> This works fine
But when I try to read the same table through Hive, the Hive session gets disconnected and throws the error below.
SELECT * FROM database_name.test2;

Error: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:567)
    at org.apache.hive.service.rpc.thrift.TCLIService$Client.FetchResults(TCLIService.java:554)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1572)
    at com.sun.proxy.$Proxy22.FetchResults(Unknown Source)
    at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)
    at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:56)
    at org.apache.hive.beeline.IncrementalRowsWithNormalization.<init>(IncrementalRowsWithNormalization.java:50)
    at org.apache.hive.beeline.BeeLine.print(BeeLine.java:2250)
    at org.apache.hive.beeline.Commands.executeInternal(Commands.java:1026)
    at org.apache.hive.beeline.Commands.execute(Commands.java:1201)
    at org.apache.hive.beeline.Commands.sql(Commands.java:1130)
    at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1425)
    at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1287)
    at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1071)
    at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:538)
    at org.apache.hive.beeline.BeeLine.main(BeeLine.java:520)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
Unknown HS2 problem when communicating with Thrift server.
Error: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed) (state=08S01,code=0)
After this error the Hive session gets disconnected and I have to reconnect. All other queries work fine; only this query throws the above error and disconnects the session.
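One way to check whether the target table was created as a transactional managed table (a plausible cause of the Hive-side failure, given HDP 3.0 defaults, though I have not confirmed it here) is to inspect its metadata from the same PySpark session:

```python
# Diagnostic sketch: look for "transactional=true" (and the table Type) in
# the table metadata. If Spark wrote plain Parquet files into a table Hive
# considers transactional, Hive reads can fail.
spark.sql("DESCRIBE FORMATTED database_name.test2").show(100, truncate=False)
```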
Environment details:
Hortonworks HDP 3.0
Spark 2.3.1
Hive 3.1
Created 10-14-2018 03:35 PM
There is an architecture change in HDP 3.0: since all Hive tables are transactional (ACID) by default, Spark and Hive are integrated differently, and the Hive Warehouse Connector needs to be used to connect to Hive managed tables.
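For reference, a minimal PySpark sketch of going through the Hive Warehouse Connector, assuming the HWC jar and pyspark_llap zip are on the Spark classpath/PYTHONPATH and the HiveServer2 Interactive JDBC URL is configured (see the HDP documentation for the exact cluster setup):

```python
# Sketch of Hive Warehouse Connector usage on HDP 3.x; the table names are
# the ones from this thread, everything else is environment-dependent.
from pyspark_llap import HiveWarehouseSession

hive = HiveWarehouseSession.session(spark).build()

# Read a managed (transactional) Hive table through HWC.
df = hive.executeQuery("SELECT * FROM database_name.test2")
df.show()

# Write a DataFrame to a Hive managed table through HWC.
df.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector") \
    .option("table", "database_name.test1") \
    .save()
```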
Hope this helps.
Created 11-08-2018 08:41 AM
@Shantanu Sharma If this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.