Hi,
I'm on CDH 5.7 with Kerberos, Sentry, Hive and Spark.
I tried to create a table in Hive from a DataFrame in Spark. The table was created, but nothing except the Spark sqlContext can read it back.
During creation I get this warning:
scala> val df = sqlContext.sql("SELECT * FROM myschema.mytab")
df: org.apache.spark.sql.DataFrame = [browserid: int, browser: string]

scala> df.write.format("parquet").saveAsTable("myschema.mytab_v2")
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/jars/hive-exec-1.1.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/jars/hive-jdbc-1.1.0-cdh5.7.0-standalone.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/jars/parquet-format-2.1.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/jars/parquet-hadoop-bundle-1.5.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/jars/parquet-pig-bundle-1.5.0-cdh5.7.0.jar!/shaded/parquet/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [shaded.parquet.org.slf4j.helpers.NOPLoggerFactory]
16/12/12 16:40:28 WARN hive.HiveContext$$anon$2: Could not persist `myschema`.`mytab_v2` in a Hive compatible way. Persisting it into Hive metastore in Spark SQL specific format.
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:User isegrim does not have privileges for CREATETABLE)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:759)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:716)
	at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply$mcV$sp(ClientWrapper.scala:415)
	...
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: MetaException(message:User isegrim does not have privileges for CREATETABLE)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:29992)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:29960)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:29886)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	...
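For reference, the workaround I'm considering is to write the Parquet files myself and then declare the table with explicit DDL; a rough, untested sketch (the path is a placeholder, and the column list just mirrors the DataFrame schema above):

// Untested sketch of a possible workaround: write plain Parquet files, then
// declare an external Hive table over them, so the metastore entry uses
// ordinary Hive DDL rather than the Spark SQL specific format.
val path = "hdfs://myns/user/hive/warehouse/myschema.db/mytab_v2"
df.write.format("parquet").save(path)

// The DDL could also be run from beeline instead, where the Sentry grants apply.
sqlContext.sql(s"""
  CREATE EXTERNAL TABLE myschema.mytab_v2 (browserid INT, browser STRING)
  STORED AS PARQUET
  LOCATION '$path'
""")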
The table is written to the Hive metastore and can be read by Spark through sqlContext:
scala> sqlContext.sql("select count(1) from myschema.mytable_v2").show(200, false)
+---+
|_c0|
+---+
|107|
+---+
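Presumably the underlying files are plain Parquet, so as a sanity check they should also be readable directly, bypassing the metastore definition (a sketch; the path is copied from the table's LOCATION shown below):

// Sketch: read the warehouse files directly as Parquet, ignoring the
// metastore SerDe/format definition entirely.
val raw = sqlContext.read.parquet("hdfs://myns/user/hive/warehouse/myschema.db/mytable_v2")
raw.printSchema()
raw.count()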
But it cannot be read from Beeline or Hue:
0: jdbc:hive2://myhs2:10> select count(1) from myschema.mytable_v2;
...
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
The table schema has some Spark-specific properties:
SHOW CREATE TABLE myschema.mytable_v2;
| CREATE TABLE `mytable_v2`(
|   `col` array<string> COMMENT 'from deserializer')
| ROW FORMAT SERDE
|   'org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe'
| WITH SERDEPROPERTIES (
|   'path'='hdfs://myns/user/hive/warehouse/myschema.db/mytable_v2')
| STORED AS INPUTFORMAT
|   'org.apache.hadoop.mapred.SequenceFileInputFormat'
| OUTPUTFORMAT
|   'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
| LOCATION
|   'hdfs://myns/user/hive/warehouse/myschema.db/mytable_v2'
| TBLPROPERTIES (
|   'COLUMN_STATS_ACCURATE'='false',
|   'EXTERNAL'='FALSE',
|   'numFiles'='2',
|   'numRows'='-1',
|   'rawDataSize'='-1',
|   'spark.sql.sources.provider'='parquet',
|   'spark.sql.sources.schema.numParts'='1',
|   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"myfield\"}]}',
|   'totalSize'='2300',
|   'transient_lastDdlTime'='1481557228')
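So the metastore definition points at MetadataTypedColumnsetSerDe over SequenceFile input/output formats, while the actual data files are Parquet; I assume that mismatch is why Hive's own execution fails. If I end up having to rebuild a Hive-readable copy, a CTAS from the Spark shell might do it (untested sketch; I'm not sure whether the direct metastore call would hit the same Sentry denial):

// Untested sketch: materialize a Hive-compatible copy via CTAS, so the new
// table gets ordinary Hive Parquet DDL. The _hive suffix is a placeholder.
sqlContext.sql("""
  CREATE TABLE myschema.mytable_v2_hive
  STORED AS PARQUET
  AS SELECT * FROM myschema.mytable_v2
""")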
Even when I set Spark as the execution engine for Hive, it doesn't work:
0: jdbc:hive2://myhs2:10> set hive.execution.engine=spark;
0: jdbc:hive2://myhs2:10> select count(1) from myschema.mytable_v2;
...
Status: Running (Hive on Spark job[0])
INFO  : Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
INFO  : 2016-12-12 18:00:43,189 Stage-0_0: 0(+2)/2  Stage-1_0: 0/1
INFO  : 2016-12-12 18:00:45,205 Stage-0_0: 0(+2,-4)/2  Stage-1_0: 0/1
ERROR : Status: Failed
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
INFO  : Completed executing command(queryId=hive_20161212180000_4d80266f-15f5-4fe2-b044-c9324fd75ba6); Time taken: 25.74 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=3)
User isegrim is in the LDAP group mygroup:
# id isegrim
uid=1001(isegrim) gid=501(mygroup) groups=501(mygroup)
Sentry has a role mygroup, to which the group mygroup is attached:
0: jdbc:hive2://myhs2:10> show role grant group mygroup;
INFO  : OK
+----------+---------------+-------------+----------+--+
|   role   | grant_option  | grant_time  | grantor  |
+----------+---------------+-------------+----------+--+
| mygroup  | false         | NULL        | --       |
+----------+---------------+-------------+----------+--+
and this role mygroup has been granted the ALL privilege on myschema:
0: jdbc:hive2://myhs2:10> show grant role mygroup;
INFO  : OK
+-----------+--------+------------+---------+-----------------+-----------------+------------+---------------+--------------------+----------+--+
| database  | table  | partition  | column  | principal_name  | principal_type  | privilege  | grant_option  |     grant_time     | grantor  |
+-----------+--------+------------+---------+-----------------+-----------------+------------+---------------+--------------------+----------+--+
| default   |        |            |         | mygroup         | ROLE            | select     | false         | 1473364008245000   | --       |
| myschema  |        |            |         | mygroup         | ROLE            | *          | false         | 1473364207568000   | --       |
+-----------+--------+------------+---------+-----------------+-----------------+------------+---------------+--------------------+----------+--+
I suppose everything would be fine if (as the warning says) I could persist `myschema`.`mytable_v2` in a Hive-compatible way, instead of having it persisted into the Hive metastore in a Spark SQL specific format.
What can I do to store tables from a Spark DataFrame in a Hive-compatible way?
And why does Sentry deny CREATETABLE for the Hive-compatible way, yet allow it for the Spark SQL specific format?