
Do not permit all users to see all Spark databases


Hello

 

We run an HDP-3.1 stack on a cluster with Spark 2.3 and Hive 3.x installed.

 

Goal

Certain Spark users shall only be able to access (list, query, create, modify) certain databases. In no case shall they see any databases they do not have access to.
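
For illustration (database names are hypothetical), this is the behaviour we are after when a restricted user opens a pyspark shell:

# Sketch only -- "sales_dev" and "finance" are hypothetical database names.
# A restricted user should see nothing but the databases they are authorised for:
spark.sql("SHOW DATABASES").show()
# +------------+
# |databaseName|
# +------------+
# |   sales_dev|
# +------------+

# and access to anything outside that set should be denied outright:
spark.sql("SELECT * FROM finance.salaries")   # expected: authorisation error, not an empty result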

 

What we have tried

1. We attempted to hide Spark databases from less privileged users. This cannot be done via Ranger Hive table policies, as Spark's Hive integration has known issues [1] and Ranger integration for Spark is apparently not being actively pursued [2].

 

2. We tried to remove the execute permission (x) on the Spark warehouse directory hdfs:/apps/spark/warehouse via Ranger HDFS policies. With that in place, Spark refuses to list databases at all, for any user. Worse, it even denies access to a database that is specified by name, where no listing would be necessary. The Spark developers are aware of this behaviour and have marked it as "Not a Problem" [3]; therefore, HDFS policies cannot be used to restrict access to Spark databases.

Explicitly specifying the warehouse directory as suggested in [4] is not an option either, as users cannot be expected to deal with HDFS paths in their DDL.
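
For illustration, this is roughly the kind of DDL that the approach in [4] would push onto users (table name and path are hypothetical):

# Sketch only -- hypothetical table and HDFS path. Isolating users this way would
# force every CREATE statement to carry an explicit location:
spark.sql("""
    CREATE TABLE teamdb.events (id BIGINT, payload STRING)
    USING ORC
    LOCATION 'hdfs://daha/data/teamdb/events'
""")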

 

3. Using the HiveWarehouseConnector to funnel Spark workloads through Hive is not an option, as the HiveWarehouseConnector prevents proper splitting:

"The warehouse connector is not a drop-in replacement. I.e. sparks SQL optimization is broken at the boundaries:

hwx.sql("SELECT * from db.table").count

might run out of memory, whereas sparks native counterpart will work just fine." [1]

Therefore, we have to go for native Spark.
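
For context, the two code paths look roughly like this in a pyspark shell (sketch only; the HWC module and method names reflect our understanding of the HDP 3.1 bindings and are not verified here):

# HiveWarehouseConnector path (the one we are ruling out):
from pyspark_llap import HiveWarehouseSession
hive = HiveWarehouseSession.session(spark).build()
hive.executeQuery("SELECT * FROM db.table").count()   # may run out of memory on large tables, per [1]

# Native Spark path (the one we want to keep):
spark.sql("SELECT * FROM db.table").count()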

 

4. As a last resort, we copied the Spark configuration files as well as hive-site.xml from the HDP-3.1 stack to a location outside of the HDP stack, then configured and started a separate Hive Metastore server bound to a dedicated port, as well as a separate Spark Thrift server hooked up to that separate Metastore. The Metastore data is stored in a MySQL 5.7 database.

 

The configuration was set up as follows (changes only; see below for the full configuration).

 

spark-defaults.conf:

spark.sql.warehouse.dir /apps/spark2/warehouse

hive-site.xml:

<property>
<name>metastore.catalog.default</name>
<value>spark2</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>

<property>
<name>ambari.hive.db.schema.name</name>
<value>spark2</value>
</property>

<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>

<property>
<name>hive.metastore.uris</name>
<value>thrift://annamaster.lan.riscsw.shp:57002</value>
</property>

deleted:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/warehouse/tablespace/managed/hive</value>
</property>

deleted:
<property>
<name>hive.metastore.warehouse.external.dir</name>
<value>/warehouse/tablespace/external/hive</value>
</property>

<property>
<name>hive.server2.thrift.http.port</name>
<value>22002</value>
</property>

<property>
<name>hive.server2.thrift.port</name>
<value>21002</value>
</property>

<property>
<name>hive.users.in.admin.role</name>
<value>root,hive</value>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://damaster/spark2?createDatabaseIfNotExist=true</value>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>

<property>
<name>metastore.create.as.acid</name>
<value>false</value>
</property>

spark-env.sh:

 

export SPARK_PID_DIR=${SPARK_PID_DIR:-/var/run/spark2}


The MetaStore server is started like this under user hive:

export HIVE_CONF_DIR=/etc/custom-spark2
export HADOOP_CLASSPATH=$HIVE_CONF_DIR:$(hadoop classpath)
/usr/hdp/current/hive-metastore/bin/hive --service metastore -p 20002 &

The Thrift server is started like this under user spark:

export SPARK_CONF_DIR=/etc/custom-spark2
export SPARK_DIST_CLASSPATH=$SPARK_CONF_DIR:$(hadoop classpath)
export SPARK_PID_DIR=/var/run/custom-spark2


The log exhibits the following error:

...
20/03/10 11:20:40 INFO HiveClientImpl: Warehouse location for Hive client (version 3.0.0) is /apps/spark2/warehouse
20/03/10 11:20:41 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://damaster:21002
20/03/10 11:20:41 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
20/03/10 11:20:41 INFO HiveMetaStoreClient: Connected to metastore.
20/03/10 11:20:41 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=spark (auth:SIMPLE) retries=24 delay=5 lifetime=0
20/03/10 11:20:41 INFO HiveMetaStore: 7: source:10.32.0.91 get_all_functions
20/03/10 11:20:41 INFO audit: ugi=spark ip=10.32.0.91 cmd=source:10.32.0.91 get_all_functions
20/03/10 11:20:41 INFO HiveMetaStore: 7: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
20/03/10 11:20:41 INFO ObjectStore: RawStore: org.apache.hadoop.hive.metastore.ObjectStore@30b90086, with PersistenceManager: null will be shutdown
20/03/10 11:20:41 INFO ObjectStore: ObjectStore, initialize called
20/03/10 11:20:41 INFO ObjectStore: RawStore: org.apache.hadoop.hive.metastore.ObjectStore@30b90086, with PersistenceManager: org.datanucleus.api.jdo.JDOPersistenceManager@7f803264 created in the thread with id: 349
20/03/10 11:20:41 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
20/03/10 11:20:41 INFO ObjectStore: Initialized ObjectStore
20/03/10 11:20:41 INFO HiveMetaStore: Created RawStore: org.apache.hadoop.hive.metastore.ObjectStore@30b90086 from thread id: 349
20/03/10 11:20:41 INFO HiveMetaStore: 7: source:10.32.0.91 get_database: @spark2#default
20/03/10 11:20:41 INFO audit: ugi=spark ip=10.32.0.91 cmd=source:10.32.0.91 get_database: @spark2#default
20/03/10 11:20:41 WARN ObjectStore: Failed to get database spark2.default, returning NoSuchObjectException
20/03/10 11:20:41 INFO HiveMetaStore: 7: source:10.32.0.91 create_database: Database(name:default, description:default database, locationUri:/apps/spark2/warehouse, parameters:{}, catalogName:spark2)
20/03/10 11:20:41 INFO audit: ugi=spark ip=10.32.0.91 cmd=source:10.32.0.91 create_database: Database(name:default, description:default database, locationUri:/apps/spark2/warehouse, parameters:{}, catalogName:spark2)
20/03/10 11:20:41 WARN ObjectStore: Failed to get database spark2.default, returning NoSuchObjectException
20/03/10 11:20:41 ERROR HiveMetaStore: No such catalog spark2
...

The Thrift server does start up, though.

 

Running pyspark as an unprivileged user (one who currently has full access to all databases) like this:

export SPARK_CONF_DIR=/etc/custom-spark2
export SPARK_DIST_CLASSPATH=$SPARK_CONF_DIR:$(hadoop classpath)
pyspark

spark.sql("show databases").show(10000, False)

yields the same error:

...
py4j.protocol.Py4JJavaError: An error occurred while calling o66.sql.
: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: InvalidObjectException(message:No such catalog spark2);
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
...
Caused by: InvalidObjectException(message:No such catalog spark2)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_database_result$create_database_resultStandardScheme.read(ThriftHiveMetastore.java:40426)


This is where I'm puzzled! We specified the correct database connection and schema with

javax.jdo.option.ConnectionURL=jdbc:mysql://damaster/spark2?createDatabaseIfNotExist=true

and initialised the schema (as user hive) with

export HIVE_CONF_DIR=/usr/local/etc/spark2
unset HADOOP_CLASSPATH
/usr/hdp/current/hive-metastore/bin/schematool -initOrUpgradeSchema -dbType mysql

which did in fact create a schema "spark2", with spark2.DBS containing:

*************************** 1. row ***************************
DB_ID: 1
DESC: Default Hive database
DB_LOCATION_URI: hdfs://damaster/user/hive/warehouse
NAME: default
OWNER_NAME: public
OWNER_TYPE: ROLE
CTLG_NAME: hive
1 row in set (0.00 sec)


Even though /user/hive/warehouse does not exist, the error message "No such catalog spark2" suggests that the HiveMetaStore cannot find the MySQL database at all.
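
One thing worth noting: the DBS row above carries CTLG_NAME = hive, while metastore.catalog.default is set to spark2. A quick way to check whether a catalog named spark2 exists at all in the backing schema is to query the CTLGS table (sketch only; assumes the Hive 3 metastore layout, and pymysql is used purely for illustration -- any MySQL client works):

# Sketch only. In the Hive 3 metastore schema, catalogs are rows in CTLGS and
# DBS.CTLG_NAME references them, so the catalog named in metastore.catalog.default
# has to exist in CTLGS.
import pymysql

conn = pymysql.connect(host="damaster", user="hive", password="...", database="spark2")
with conn.cursor() as cur:
    cur.execute("SELECT CTLG_ID, NAME, LOCATION_URI FROM CTLGS")
    for row in cur.fetchall():
        print(row)
conn.close()
# If no row with NAME = 'spark2' comes back, a "No such catalog spark2" error is
# what the metastore would be expected to raise for this configuration.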

Question

What are we missing? Which settings should be applied to hive-site.xml and/or spark-defaults.conf in order to make HiveMetaStore connect to the "spark2" database properly?


[1] https://georgheiler.com/2019/12/10/spark-and-hive-3/
[2] https://issues.apache.org/jira/browse/SPARK-24503
[3] https://issues.apache.org/jira/browse/SPARK-29078
[4] https://community.cloudera.com/t5/Support-Questions/How-can-I-specify-table-output-directory-using-S...


-- spark-defaults.conf

# Generated by Apache Ambari. Wed Mar 4 10:13:38 2020

spark.datasource.hive.warehouse.load.staging.dir /tmp
spark.datasource.hive.warehouse.metastoreUri thrift://damaster:9083
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir hdfs://daha/spark2-history/
spark.eventLog.enabled true
spark.executor.extraJavaOptions -XX:+UseNUMA
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.hadoop.hive.zookeeper.quorum damaster:2181,daslave01:2181,daslave01:2181,daslave03:2181
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 7d
spark.history.fs.cleaner.maxAge 90d
spark.history.fs.logDirectory hdfs://daha/spark2-history/
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.store.path /var/lib/spark2/shs_db
spark.history.ui.port 18081
spark.io.compression.lz4.blockSize 128kb
spark.jars hdfs:///user/spark/jars/elasticsearch-hadoop-7.3.2.jar,hdfs:///user/spark/jars/spark-avro_2.11-4.0.0.jar,file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar,hdfs:///user/spark/jars/sqlite-jdbc-3.27.2.1.jar
spark.master yarn
spark.shuffle.file.buffer 1m
spark.shuffle.io.backLog 8192
spark.shuffle.io.serverThreads 128
spark.shuffle.unsafe.file.output.buffer 5m
spark.sql.autoBroadcastJoinThreshold 26214400
spark.sql.hive.convertMetastoreOrc true
spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://damaster:2181,daslave01:2181,daslave02:2181,daslave03:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
spark.sql.hive.metastore.jars /usr/hdp/current/spark2-client/standalone-metastore/*
spark.sql.hive.metastore.version 3.0
spark.sql.orc.filterPushdown true
spark.sql.orc.impl native
spark.sql.statistics.fallBackToHdfs true
spark.sql.warehouse.dir /apps/spark2/warehouse
spark.submit.pyFiles /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip
spark.unsafe.sorter.spill.reader.buffer.size 1m
spark.yarn.historyServer.address damaster:18081
spark.yarn.queue default
#spark.executor.extraJavaOptions -Dhdp.version=3.1.0.0-78
#spark.driver.extraJavaOptions -Dhdp.version=3.1.0.0-78
#spark.yarn.am.extraJavaOptions -Dhdp.version=3.1.0.0-78
spark.sql.warehouse.dir /apps/spark2/warehouse
#spark.sql.catalogImplementation hive

-- hive-site.xml

<configuration xmlns:xi="http://www.w3.org/2001/XInclude">

<property>
<name>metastore.catalog.default</name>
<value>spark2</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>

<property>
<name>ambari.hive.db.schema.name</name>
<value>spark2</value>
</property>

<property>
<name>atlas.hook.hive.maxThreads</name>
<value>1</value>
</property>

<property>
<name>atlas.hook.hive.minThreads</name>
<value>1</value>
</property>

<property>
<name>credentialStoreClassPath</name>
<value>/var/lib/ambari-agent/cred/lib/*</value>
</property>

<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>

<property>
<name>datanucleus.cache.level2.type</name>
<value>none</value>
</property>

<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>

<property>
<name>hadoop.security.credential.provider.path</name>
<value>jceks://file/usr/hdp/current/hive-server2/conf/hive-site.jceks</value>
</property>

<property>
<name>hive.auto.convert.join</name>
<value>true</value>
</property>

<property>
<name>hive.auto.convert.join.noconditionaltask</name>
<value>true</value>
</property>

<property>
<name>hive.auto.convert.join.noconditionaltask.size</name>
<value>2004318071</value>
</property>

<property>
<name>hive.auto.convert.sortmerge.join</name>
<value>true</value>
</property>

<property>
<name>hive.auto.convert.sortmerge.join.to.mapjoin</name>
<value>true</value>
</property>

<property>
<name>hive.cbo.enable</name>
<value>true</value>
</property>

<property>
<name>hive.cli.print.header</name>
<value>false</value>
</property>

<!--
-->
<property>
<name>hive.cluster.delegation.token.store.class</name>
<value>org.apache.hadoop.hive.thrift.ZooKeeperTokenStore</value>
</property>

<property>
<name>hive.cluster.delegation.token.store.zookeeper.connectString</name>
<value>damaster:2181,daslave01:2181,daslave02:2181,daslave03:2181,daslave04:2181,daslave05:2181,daslave06:2181</value>
</property>

<property>
<name>hive.cluster.delegation.token.store.zookeeper.znode</name>
<value>/hive/cluster/delegation</value>
</property>
<!--
-->

<property>
<name>hive.compactor.abortedtxn.threshold</name>
<value>1000</value>
</property>

<property>
<name>hive.compactor.check.interval</name>
<value>300</value>
</property>

<property>
<name>hive.compactor.delta.num.threshold</name>
<value>10</value>
</property>

<property>
<name>hive.compactor.delta.pct.threshold</name>
<value>0.1f</value>
</property>

<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>

<property>
<name>hive.compactor.worker.threads</name>
<value>1</value>
</property>

<property>
<name>hive.compactor.worker.timeout</name>
<value>86400</value>
</property>

<property>
<name>hive.compute.query.using.stats</name>
<value>true</value>
</property>

<property>
<name>hive.conf.restricted.list</name>
<value>hive.security.authenticator.manager,hive.security.authorization.manager,hive.users.in.admin.role</value>
</property>

<property>
<name>hive.convert.join.bucket.mapjoin.tez</name>
<value>false</value>
</property>

<property>
<name>hive.create.as.insert.only</name>
<value>true</value>
</property>

<property>
<name>hive.default.fileformat</name>
<value>TextFile</value>
</property>

<property>
<name>hive.default.fileformat.managed</name>
<value>ORC</value>
</property>

<property>
<name>hive.driver.parallel.compilation</name>
<value>true</value>
</property>

<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>

<property>
<name>hive.enforce.sorting</name>
<value>true</value>
</property>

<property>
<name>hive.enforce.sortmergebucketmapjoin</name>
<value>true</value>
</property>

<property>
<name>hive.exec.compress.intermediate</name>
<value>false</value>
</property>

<property>
<name>hive.exec.compress.output</name>
<value>false</value>
</property>

<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>

<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>

<property>
<name>hive.exec.failure.hooks</name>
<value></value>
</property>

<property>
<name>hive.exec.max.created.files</name>
<value>100000</value>
</property>

<property>
<name>hive.exec.max.dynamic.partitions</name>
<value>5000</value>
</property>

<property>
<name>hive.exec.max.dynamic.partitions.pernode</name>
<value>2000</value>
</property>

<property>
<name>hive.exec.orc.compression.strategy</name>
<value>SPEED</value>
</property>

<property>
<name>hive.exec.orc.default.compress</name>
<value>ZLIB</value>
</property>

<property>
<name>hive.exec.orc.default.stripe.size</name>
<value>67108864</value>
</property>

<property>
<name>hive.exec.orc.encoding.strategy</name>
<value>SPEED</value>
</property>

<property>
<name>hive.exec.orc.split.strategy</name>
<value>HYBRID</value>
</property>

<property>
<name>hive.exec.parallel</name>
<value>false</value>
</property>

<property>
<name>hive.exec.parallel.thread.number</name>
<value>8</value>
</property>

<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.hadoop.hive.ql.hooks.HiveProtoLoggingHook</value>
</property>

<property>
<name>hive.exec.pre.hooks</name>
<value></value>
</property>

<property>
<name>hive.exec.reducers.bytes.per.reducer</name>
<value>67108864</value>
</property>

<property>
<name>hive.exec.reducers.max</name>
<value>1009</value>
</property>

<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
</property>

<property>
<name>hive.exec.submit.local.task.via.child</name>
<value>true</value>
</property>

<property>
<name>hive.exec.submitviachild</name>
<value>false</value>
</property>

<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>

<property>
<name>hive.execution.mode</name>
<value>container</value>
</property>

<property>
<name>hive.fetch.task.aggr</name>
<value>false</value>
</property>

<property>
<name>hive.fetch.task.conversion</name>
<value>more</value>
</property>

<property>
<name>hive.fetch.task.conversion.threshold</name>
<value>1073741824</value>
</property>

<property>
<name>hive.heapsize</name>
<value>1024</value>
</property>

<property>
<name>hive.hook.proto.base-directory</name>
<value>/warehouse/tablespace/external/hive/sys.db/query_data/</value>
</property>

<property>
<name>hive.limit.optimize.enable</name>
<value>false</value>
</property>

<property>
<name>hive.limit.pushdown.memory.usage</name>
<value>0.04</value>
</property>

<property>
<name>hive.load.data.owner</name>
<value>hive</value>
</property>

<property>
<name>hive.lock.manager</name>
<value></value>
</property>

<property>
<name>hive.map.aggr</name>
<value>true</value>
</property>

<property>
<name>hive.map.aggr.hash.force.flush.memory.threshold</name>
<value>0.9</value>
</property>

<property>
<name>hive.map.aggr.hash.min.reduction</name>
<value>0.5</value>
</property>

<property>
<name>hive.map.aggr.hash.percentmemory</name>
<value>0.5</value>
</property>

<property>
<name>hive.mapjoin.bucket.cache.size</name>
<value>10000</value>
</property>

<property>
<name>hive.mapjoin.hybridgrace.hashtable</name>
<value>false</value>
</property>

<property>
<name>hive.mapjoin.optimized.hashtable</name>
<value>true</value>
</property>

<property>
<name>hive.mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>

<property>
<name>hive.materializedview.rewriting.incremental</name>
<value>false</value>
</property>

<property>
<name>hive.merge.mapfiles</name>
<value>true</value>
</property>

<property>
<name>hive.merge.mapredfiles</name>
<value>false</value>
</property>

<property>
<name>hive.merge.orcfile.stripe.level</name>
<value>true</value>
</property>

<property>
<name>hive.merge.rcfile.block.level</name>
<value>true</value>
</property>

<property>
<name>hive.merge.size.per.task</name>
<value>256000000</value>
</property>

<property>
<name>hive.merge.smallfiles.avgsize</name>
<value>16000000</value>
</property>

<property>
<name>hive.merge.tezfiles</name>
<value>false</value>
</property>

<property>
<name>hive.metastore.authorization.storage.checks</name>
<value>false</value>
</property>

<property>
<name>hive.metastore.cache.pinobjtypes</name>
<value>Table,Database,Type,FieldSchema,Order</value>
</property>

<property>
<name>hive.metastore.client.connect.retry.delay</name>
<value>5s</value>
</property>

<property>
<name>hive.metastore.client.socket.timeout</name>
<value>1800s</value>
</property>

<property>
<name>hive.metastore.connect.retries</name>
<value>24</value>
</property>

<property>
<name>hive.metastore.db.type</name>
<value>MYSQL</value>
</property>

<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>

<property>
<name>hive.metastore.event.listeners</name>
<value></value>
</property>

<property>
<name>hive.metastore.execute.setugi</name>
<value>true</value>
</property>

<property>
<name>hive.metastore.failure.retries</name>
<value>24</value>
</property>

<property>
<name>hive.metastore.kerberos.keytab.file</name>
<value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
<name>hive.metastore.kerberos.principal</name>
<value>hive/_HOST@EXAMPLE.COM</value>
</property>

<property>
<name>hive.metastore.pre.event.listeners</name>
<value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>

<property>
<name>hive.metastore.sasl.enabled</name>
<value>false</value>
</property>

<property>
<name>hive.metastore.server.max.threads</name>
<value>100000</value>
</property>

<property>
<name>hive.metastore.uris</name>
<value>thrift://damaster:20002</value>
</property>

<property>
<name>hive.optimize.bucketmapjoin</name>
<value>true</value>
</property>

<property>
<name>hive.optimize.bucketmapjoin.sortedmerge</name>
<value>false</value>
</property>

<property>
<name>hive.optimize.constant.propagation</name>
<value>true</value>
</property>

<property>
<name>hive.optimize.dynamic.partition.hashjoin</name>
<value>true</value>
</property>

<property>
<name>hive.optimize.index.filter</name>
<value>true</value>
</property>

<property>
<name>hive.optimize.metadataonly</name>
<value>true</value>
</property>

<property>
<name>hive.optimize.null.scan</name>
<value>true</value>
</property>

<property>
<name>hive.optimize.reducededuplication</name>
<value>true</value>
</property>

<property>
<name>hive.optimize.reducededuplication.min.reducer</name>
<value>4</value>
</property>

<property>
<name>hive.optimize.sort.dynamic.partition</name>
<value>false</value>
</property>

<property>
<name>hive.orc.compute.splits.num.threads</name>
<value>10</value>
</property>

<property>
<name>hive.orc.splits.include.file.footer</name>
<value>false</value>
</property>

<property>
<name>hive.prewarm.enabled</name>
<value>false</value>
</property>

<property>
<name>hive.prewarm.numcontainers</name>
<value>3</value>
</property>

<property>
<name>hive.security.authenticator.manager</name>
<value>org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator</value>
</property>

<property>
<name>hive.security.authorization.enabled</name>
<value>true</value>
</property>

<property>
<name>hive.security.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory</value>
</property>

<property>
<name>hive.security.metastore.authenticator.manager</name>
<value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>

<property>
<name>hive.security.metastore.authorization.auth.reads</name>
<value>true</value>
</property>

<property>
<name>hive.security.metastore.authorization.manager</name>
<value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>

<property>
<name>hive.server2.allow.user.substitution</name>
<value>true</value>
</property>

<property>
<name>hive.server2.authentication</name>
<value>NONE</value>
</property>

<property>
<name>hive.server2.authentication.spnego.keytab</name>
<value>HTTP/_HOST@EXAMPLE.COM</value>
</property>

<property>
<name>hive.server2.authentication.spnego.principal</name>
<value>/etc/security/keytabs/spnego.service.keytab</value>
</property>

<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>

<property>
<name>hive.server2.idle.operation.timeout</name>
<value>6h</value>
</property>

<property>
<name>hive.server2.idle.session.timeout</name>
<value>1d</value>
</property>

<property>
<name>hive.server2.logging.operation.enabled</name>
<value>true</value>
</property>

<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/tmp/hive/operation_logs</value>
</property>

<property>
<name>hive.server2.max.start.attempts</name>
<value>5</value>
</property>

<property>
<name>hive.server2.support.dynamic.service.discovery</name>
<value>true</value>
</property>

<property>
<name>hive.server2.table.type.mapping</name>
<value>CLASSIC</value>
</property>

<property>
<name>hive.server2.tez.default.queues</name>
<value>default</value>
</property>

<property>
<name>hive.server2.tez.initialize.default.sessions</name>
<value>false</value>
</property>

<property>
<name>hive.server2.tez.sessions.per.default.queue</name>
<value>1</value>
</property>

<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
</property>

<property>
<name>hive.server2.thrift.http.port</name>
<value>22002</value>
</property>

<property>
<name>hive.server2.thrift.max.worker.threads</name>
<value>500</value>
</property>

<property>
<name>hive.server2.thrift.port</name>
<value>21002</value>
</property>

<property>
<name>hive.server2.thrift.sasl.qop</name>
<value>auth</value>
</property>

<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>

<property>
<name>hive.server2.use.SSL</name>
<value>false</value>
</property>

<property>
<name>hive.server2.webui.cors.allowed.headers</name>
<value>X-Requested-With,Content-Type,Accept,Origin,X-Requested-By,x-requested-by</value>
</property>

<property>
<name>hive.server2.webui.enable.cors</name>
<value>true</value>
</property>

<property>
<name>hive.server2.webui.port</name>
<value>10002</value>
</property>

<property>
<name>hive.server2.webui.use.ssl</name>
<value>false</value>
</property>

<property>
<name>hive.server2.zookeeper.namespace</name>
<value>hiveserver2</value>
</property>

<property>
<name>hive.service.metrics.codahale.reporter.classes</name>
<value>org.apache.hadoop.hive.common.metrics.metrics2.JsonFileMetricsReporter,org.apache.hadoop.hive.common.metrics.metrics2.JmxMetricsReporter,org.apache.hadoop.hive.common.metrics.metrics2.Metrics2Reporter</value>
</property>

<property>
<name>hive.smbjoin.cache.rows</name>
<value>10000</value>
</property>

<property>
<name>hive.start.cleanup.scratchdir</name>
<value>false</value>
</property>

<property>
<name>hive.stats.autogather</name>
<value>true</value>
</property>

<property>
<name>hive.stats.dbclass</name>
<value>fs</value>
</property>

<property>
<name>hive.stats.fetch.column.stats</name>
<value>true</value>
</property>

<property>
<name>hive.stats.fetch.partition.stats</name>
<value>true</value>
</property>

<property>
<name>hive.strict.managed.tables</name>
<value>true</value>
</property>

<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>

<property>
<name>hive.tez.auto.reducer.parallelism</name>
<value>true</value>
</property>

<property>
<name>hive.tez.bucket.pruning</name>
<value>true</value>
</property>

<property>
<name>hive.tez.cartesian-product.enabled</name>
<value>true</value>
</property>

<property>
<name>hive.tez.container.size</name>
<value>7168</value>
</property>

<property>
<name>hive.tez.cpu.vcores</name>
<value>-1</value>
</property>

<property>
<name>hive.tez.dynamic.partition.pruning</name>
<value>true</value>
</property>

<property>
<name>hive.tez.dynamic.partition.pruning.max.data.size</name>
<value>104857600</value>
</property>

<property>
<name>hive.tez.dynamic.partition.pruning.max.event.size</name>
<value>1048576</value>
</property>

<property>
<name>hive.tez.exec.print.summary</name>
<value>true</value>
</property>

<property>
<name>hive.tez.input.format</name>
<value>org.apache.hadoop.hive.ql.io.HiveInputFormat</value>
</property>

<property>
<name>hive.tez.input.generate.consistent.splits</name>
<value>true</value>
</property>

<property>
<name>hive.tez.java.opts</name>
<value>-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps</value>
</property>

<property>
<name>hive.tez.log.level</name>
<value>INFO</value>
</property>

<property>
<name>hive.tez.max.partition.factor</name>
<value>2.0</value>
</property>

<property>
<name>hive.tez.min.partition.factor</name>
<value>0.25</value>
</property>

<property>
<name>hive.tez.smb.number.waves</name>
<value>0.5</value>
</property>

<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>

<property>
<name>hive.txn.max.open.batch</name>
<value>1000</value>
</property>

<property>
<name>hive.txn.strict.locking.mode</name>
<value>false</value>
</property>

<property>
<name>hive.txn.timeout</name>
<value>300</value>
</property>

<property>
<name>hive.user.install.directory</name>
<value>/user/</value>
</property>

<property>
<name>hive.users.in.admin.role</name>
<value>root,hive</value>
</property>

<property>
<name>hive.vectorized.execution.enabled</name>
<value>true</value>
</property>

<property>
<name>hive.vectorized.execution.mapjoin.minmax.enabled</name>
<value>true</value>
</property>

<property>
<name>hive.vectorized.execution.mapjoin.native.enabled</name>
<value>true</value>
</property>

<property>
<name>hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled</name>
<value>true</value>
</property>

<property>
<name>hive.vectorized.execution.reduce.enabled</name>
<value>true</value>
</property>

<property>
<name>hive.vectorized.groupby.checkinterval</name>
<value>4096</value>
</property>

<property>
<name>hive.vectorized.groupby.flush.percent</name>
<value>0.1</value>
</property>

<property>
<name>hive.vectorized.groupby.maxentries</name>
<value>100000</value>
</property>

<property>
<name>hive.warehouse.subdir.inherit.perms</name>
<value>true</value>
</property>

<!--
-->
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>

<property>
<name>hive.zookeeper.namespace</name>
<value>hive_zookeeper_namespace</value>
</property>

<property>
<name>hive.zookeeper.quorum</name>
<value>damaster:2181,daslave01:2181,daslave02:2181,daslave03:2181,daslave04:2181,daslave05:2181,daslave06:2181</value>
</property>
<!--
-->

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://damaster/spark2?createDatabaseIfNotExist=true</value>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>

<property>
<name>metastore.create.as.acid</name>
<value>false</value>
</property>

</configuration>

 
