
This topic describes installing and configuring Splice Machine on a Hortonworks Ambari-managed cluster. Follow these steps:

  1. Verify Prerequisites
  2. Download and Install Splice Machine
  3. Configure Hadoop Services
  4. Make any needed Optional Configuration Modifications
  5. Restart the Cluster
  6. Verify your Splice Machine Installation

Verify Prerequisites

Before starting your Splice Machine installation, please make sure that your cluster contains the prerequisite software components:

  • A cluster running HDP 2.4.2
  • HBase installed
  • HDFS installed
  • YARN installed
  • ZooKeeper installed

Download and Install Splice Machine

Perform the following steps on each node in your cluster:

1. Create the splice installation directory:

sudo mkdir -p /opt/splice

2. Download the Splice Machine package into the splice directory on the node:

sudo curl 'https://s3.amazonaws.com/20snapshot/installer/2.0.1.23/hdp2.4.2/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz' -o /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz

3. Extract the Splice Machine package:

sudo tar -xf SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz --directory /opt/splice

4. Create symbolic links:

sudo ln -sf /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108 /opt/splice/default

sudo ln -sf /opt/splice/default/bin/sqlshell.sh /usr/bin/sqlshell.sh

NOTE: This means that you can always access the Splice Machine command line by simply entering sqlshell.sh at your shell prompt.

sudo ln -sf /opt/splice/default/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar /usr/hdp/2.4.2.0-258/hadoop-yarn/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar
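Steps 1 through 4 must be repeated on every node. The layout they create is a versioned release directory plus a stable default symlink; the following is a minimal local sketch of that same pattern in a throwaway temp directory (no sudo, and the 2.0.1.24 directory here is hypothetical, used only to show a later upgrade):

```shell
# Sketch of the install layout: a versioned release directory plus a stable
# "default" symlink, demonstrated in a throwaway directory (no sudo needed).
BASE="$(mktemp -d)"
mkdir -p "$BASE/SPLICEMACHINE-2.0.1.23"
ln -sfn "$BASE/SPLICEMACHINE-2.0.1.23" "$BASE/default"   # stable entry point
# A later (hypothetical) upgrade only repoints the symlink; nothing else changes:
mkdir -p "$BASE/SPLICEMACHINE-2.0.1.24"
ln -sfn "$BASE/SPLICEMACHINE-2.0.1.24" "$BASE/default"
readlink "$BASE/default"
```

Because the sqlshell.sh and spark-assembly links reference /opt/splice/default rather than the versioned path, a later upgrade only requires repointing that one symlink.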

Configure Hadoop Services

Now it's time to make a few modifications in the Hadoop services configurations:

Configure and Restart ZooKeeper

Configure and Restart HDFS

Configure and Restart YARN

Configure MapReduce2

Configure and Restart HBASE

Configure and Restart ZooKeeper

To edit the ZooKeeper configuration, select the Services tab at the top of the Ambari dashboard screen, then click ZooKeeper in the left pane of the screen.

1. Select the Configs tab to configure ZooKeeper

2. Make configuration changes:

Scroll down to Custom zoo.cfg and click Add Property twice: once to add the maxClientCnxns property and once to add the maxSessionTimeout property, with these values:

maxClientCnxns = 0

maxSessionTimeout = 120000
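Once saved, Ambari renders these entries into each node's zoo.cfg alongside the existing properties; the added lines look like this:

maxClientCnxns=0
maxSessionTimeout=120000

Setting maxClientCnxns to 0 removes ZooKeeper's per-host connection limit, and maxSessionTimeout raises the maximum negotiable session timeout to two minutes.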

3. Save Changes

Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated ZooKeeper configuration for Splice Machine. Click Save again.

4. Restart ZooKeeper

After you save your changes, you'll land back on the ZooKeeper Service Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart ZooKeeper. Wait for the restart to complete.

Configure and Restart HDFS

To edit the HDFS configuration, select the Services tab at the top of the Ambari dashboard screen, then click HDFS in the left pane of the screen. Finally, click the Configs tab.

1. Edit the HDFS configuration as follows:

Setting New Value
NameNode Java heap size 4 GB
DataNode maximum Java heap size 2 GB
Block replication 2 (for clusters with fewer than 8 nodes); 3 (for clusters with 8 or more nodes)
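The block-replication rule above is simple enough to express as a helper; this is purely illustrative (the replication_for function is hypothetical, not part of any Splice Machine or Hadoop tooling):

```shell
# Choose dfs.replication from cluster size, per the rule above:
# fewer than 8 nodes -> 2, otherwise -> 3.
replication_for() {
  if [ "$1" -lt 8 ]; then echo 2; else echo 3; fi
}
replication_for 5    # prints 2
replication_for 12   # prints 3
```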

2. Update the custom hdfs-site.xml file, adding this property:

dfs.datanode.handler.count = 20

3. Save Changes

Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated HDFS configuration for Splice Machine. Click Save again.

4. Create directories for the hbase user and the Splice Machine YARN application:

Use your terminal window to create these directories:

sudo -iu hdfs hadoop fs -mkdir -p hdfs:///user/hbase hdfs:///user/splice/history

sudo -iu hdfs hadoop fs -chown -R hbase:hbase hdfs:///user/hbase hdfs:///user/splice

sudo -iu hdfs hadoop fs -chmod 1777 hdfs:///user/splice hdfs:///user/splice/history
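Note the 1777 mode on the splice directories: that is world-writable plus the sticky bit, so any user can create files there but only a file's owner can delete it. The same mode behaves identically on a local POSIX filesystem, which makes it easy to inspect (local demo only; the hadoop fs -chmod above applies it inside HDFS, and stat -c is GNU coreutils syntax):

```shell
# Demonstrate mode 1777 (sticky bit + rwx for all) on a local directory.
D="$(mktemp -d)"
chmod 1777 "$D"
stat -c '%a' "$D"   # prints 1777
```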

5. Restart HDFS

Return to the HDFS Configs tab in Ambari. Then click the Restart drop-down in the upper right corner and select the Restart All action to restart HDFS. Confirm your action and then wait for the restart to complete.

Configure and Restart YARN

To edit the YARN configuration, select the Services tab at the top of the Ambari dashboard screen, then click YARN in the left pane of the screen. Finally, click the Configs tab.

1. Update these configuration values:

Setting New Value
yarn.application.classpath $HADOOP_CONF_DIR, /usr/hdp/current/hadoop-client/*, /usr/hdp/current/hadoop-client/lib/*, /usr/hdp/current/hadoop-hdfs-client/*, /usr/hdp/current/hadoop-hdfs-client/lib/*, /usr/hdp/current/hadoop-yarn-client/*, /usr/hdp/current/hadoop-yarn-client/lib/*, /usr/hdp/current/hadoop-mapreduce-client/*, /usr/hdp/current/hadoop-mapreduce-client/lib/*, /usr/hdp/current/hbase-regionserver/*, /usr/hdp/current/hbase-regionserver/lib/*, /opt/splice/default/lib/*
yarn.nodemanager.aux-services.spark_shuffle.class org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.delete.debug-delay-sec 86400
Memory allocated for all YARN containers on a node 30 GB (based on node specs)
Minimum Container Size (Memory) 1 GB (based on node specs)
Maximum Container Size (Memory) 30 GB (based on node specs)
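As a sanity check on the memory settings above, the per-node container budget works out as follows (assuming the 30 GB figure; scale both numbers to your node specs):

```shell
# With 30 GB allocated to YARN containers and a 1 GB minimum container size,
# a node can run at most 30 minimum-size containers (or one 30 GB container).
NODE_MB=$((30 * 1024))
MIN_MB=$((1 * 1024))
echo $((NODE_MB / MIN_MB))   # prints 30
```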

2. Save Changes

Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated YARN configuration for Splice Machine. Click Save again.

3. Restart YARN

Return to the YARN Configs tab in Ambari. Then click the Restart drop-down in the upper right corner and select the Restart All action to restart YARN. Confirm your action and then wait for the restart to complete.

Configure MapReduce2

Ambari automatically sets these values for you:

  • Map Memory
  • Reduce Memory
  • Sort Allocation Memory
  • AppMaster Memory
  • MR Map Java Heap Size
  • MR Reduce Java Heap Size

Modify the HDP Version Information

Replace ${hdp.version} with the actual version number (e.g. 2.4.2.0-258) in these property values:

  • mapreduce.admin.map.child.java.opts
  • mapreduce.admin.reduce.child.java.opts
  • mapreduce.admin.user.env
  • mapreduce.application.classpath
  • mapreduce.application.framework.path
  • yarn.app.mapreduce.am.admin-command-opts
  • MR AppMaster Java Heap Size
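These edits are made in the Ambari configuration UI, but the substitution itself is just a literal token replacement. The sed invocation below only illustrates the change; it is not something you run against Ambari-managed files:

```shell
# Replace the ${hdp.version} token with the concrete version string.
echo '-Dhdp.version=${hdp.version}' | sed 's/\${hdp\.version}/2.4.2.0-258/g'
# prints: -Dhdp.version=2.4.2.0-258
```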

Configure and Restart HBASE

To edit the HBASE configuration, select the Services tab at the top of the Ambari dashboard screen, then click HBase in the left pane of the screen. Finally, click the Configs tab and make these changes:

1. Change the values of these settings

Setting New Value
% of RegionServer Allocated to Write Buffer (hbase.regionserver.global.memstore.size) 0.25
HBase RegionServer Maximum Memory (hbase_regionserver_heapsize) 24 GB
% of RegionServer Allocated to Read Buffers (hfile.block.cache.size) 0.25
HBase Master Maximum Memory (hbase_master_heapsize) 5 GB
Number of Handlers per RegionServer (hbase.regionserver.handler.count) 400
HBase RPC Timeout 1200000 (20 minutes)
Zookeeper Session Timeout 120000 (2 minutes)
hbase.coprocessor.master.classes com.splicemachine.hbase.SpliceMasterObserver
hbase.coprocessor.region.classes The value of this property is shown below, in Step 2
Maximum Store Files before Minor Compaction (hbase.hstore.compactionThreshold) 5
Number of Fetched Rows when Scanning from Disk (hbase.client.scanner.caching) 1000
hstore blocking storefiles (hbase.hstore.blockingStoreFiles) 20
Advanced hbase-env The value of this property is shown below, in Step 3
Custom hbase-site The value of this property is shown below, in Step 4

2. Set the value of the hbase.coprocessor.region.classes property to the following:

com.splicemachine.hbase.MemstoreAwareObserver,com.splicemachine.derby.hbase.SpliceIndexObserver,com.splicemachine.derby.hbase.SpliceIndexEndpoint,com.splicemachine.hbase.RegionSizeEndpoint,com.splicemachine.si.data.hbase.coprocessor.TxnLifecycleEndpoint,com.splicemachine.si.data.hbase.coprocessor.SIObserver,com.splicemachine.hbase.BackupEndpointObserver

3. Replace the Advanced hbase-env property with the following:

# Set environment variables here.
# The java implementation to use. Java 1.6 required.
export JAVA_HOME={{java64_home}}
# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-{{hbase_conf_dir}}}
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=${HBASE_CLASSPATH}
# add Splice Machine to the HBase classpath
SPLICELIBDIR="/opt/splice/default/lib"
APPENDSTRING=$(echo $(find ${SPLICELIBDIR} -maxdepth 1 -name \*.jar | sort) | sed 's/ /:/g')
export HBASE_CLASSPATH="${HBASE_CLASSPATH}:${APPENDSTRING}"
# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:{{log_dir}}/gc.log-`date +'%Y%m%d%H%M'`"
# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# If you want to configure BucketCache, specify '-XX: MaxDirectMemorySize=' with proper direct memory size
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR={{log_dir}}
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR={{pid_dir}}
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
{% if java_version < 8 %}
JDK_DEPENDED_OPTS="-XX:PermSize=512m -XX:MaxPermSize=512m"
{% endif %}
export HBASE_OPTS="${HBASE_OPTS} -XX:ErrorFile={{log_dir}}/hs_err_pid%p.log -Djava.io.tmpdir={{java_io_tmpdir}}"
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Xms{{master_heapsize}} -Xmx{{master_heapsize}} ${JDK_DEPENDED_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=10101 -Dsplice.spark.enabled=true -Dsplice.spark.app.name=SpliceMachine -Dsplice.spark.master=yarn-client -Dsplice.spark.logConf=true -Dsplice.spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory -Dsplice.spark.driver.maxResultSize=1g -Dsplice.spark.driver.memory=1g -Dsplice.spark.dynamicAllocation.enabled=true -Dsplice.spark.dynamicAllocation.executorIdleTimeout=600 -Dsplice.spark.dynamicAllocation.minExecutors=0 -Dsplice.spark.io.compression.lz4.blockSize=32k -Dsplice.spark.kryo.referenceTracking=false -Dsplice.spark.kryo.registrator=com.splicemachine.derby.impl.SpliceSparkKryoRegistrator -Dsplice.spark.kryoserializer.buffer.max=512m -Dsplice.spark.kryoserializer.buffer=4m -Dsplice.spark.locality.wait=100 -Dsplice.spark.scheduler.mode=FAIR -Dsplice.spark.serializer=org.apache.spark.serializer.KryoSerializer -Dsplice.spark.shuffle.compress=false -Dsplice.spark.shuffle.file.buffer=128k -Dsplice.spark.shuffle.memoryFraction=0.7 -Dsplice.spark.shuffle.service.enabled=true -Dsplice.spark.storage.memoryFraction=0.1 -Dsplice.spark.yarn.am.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native -Dsplice.spark.yarn.am.waitTime=10s -Dsplice.spark.yarn.executor.memoryOverhead=2048 -Dsplice.spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties -Dsplice.spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native -Dsplice.spark.driver.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar 
-Dsplice.spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties -Dsplice.spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native -Dsplice.spark.executor.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar -Dsplice.spark.ui.retainedJobs=100 -Dsplice.spark.ui.retainedStages=100 -Dsplice.spark.worker.ui.retainedExecutors=100 -Dsplice.spark.worker.ui.retainedDrivers=100 -Dsplice.spark.streaming.ui.retainedBatches=100 -Dsplice.spark.executor.cores=4 -Dsplice.spark.executor.memory=8g -Dspark.compaction.reserved.slots=4 -Dsplice.spark.eventLog.enabled=true -Dsplice.spark.eventLog.dir=hdfs:///user/splice/history -Dsplice.spark.local.dir=/diska/tmp,/diskb/tmp,/diskc/tmp,/diskd/tmp"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Xmn{{regionserver_xmn_size}} -Xms{{regionserver_heapsize}} -Xmx{{regionserver_heapsize}} ${JDK_DEPENDED_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxNewSize=4g -XX:InitiatingHeapOccupancyPercent=60 -XX:ParallelGCThreads=24 -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=5000 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=10102"
{% if security_enabled %}
export HBASE_OPTS="${HBASE_OPTS} -Djava.security.auth.login.config={{client_jaas_config_file}}"
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Djava.security.auth.login.config={{master_jaas_config_file}}"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Djava.security.auth.login.config={{regionserver_jaas_config_file}}"
{% endif %}
# HBase off-heap MaxDirectMemorySize
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS {% if hbase_max_direct_memory_size %} -XX:MaxDirectMemorySize={{hbase_max_direct_memory_size}}m {% endif %}"
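The classpath-building lines near the top of that file (SPLICELIBDIR / APPENDSTRING) collect every jar in /opt/splice/default/lib, sort them, and join them with colons. The same idiom can be tried locally against a throwaway directory:

```shell
# Build a colon-separated classpath from the jars in a directory, sorted,
# exactly as the hbase-env snippet above does for /opt/splice/default/lib.
LIBDIR="$(mktemp -d)"
touch "$LIBDIR/b.jar" "$LIBDIR/a.jar" "$LIBDIR/notes.txt"   # notes.txt is ignored
APPENDSTRING=$(echo $(find "$LIBDIR" -maxdepth 1 -name \*.jar | sort) | sed 's/ /:/g')
echo "$APPENDSTRING"   # the two jar paths joined by ':', a.jar first
```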

4. Add the Custom hbase-site properties, setting their values to the following:

dfs.client.read.shortcircuit.buffer.size=131072
hbase.balancer.period=60000
hbase.client.ipc.pool.size=10
hbase.client.max.perregion.tasks=100
hbase.coprocessor.regionserver.classes=com.splicemachine.hbase.RegionServerLifecycleObserver
hbase.hstore.compaction.max.size=260046848
hbase.hstore.compaction.min.size=16777216
hbase.hstore.compaction.min=5
hbase.hstore.defaultengine.compactionpolicy.class=com.splicemachine.compactions.SpliceDefaultCompactionPolicy
hbase.hstore.defaultengine.compactor.class=com.splicemachine.compactions.SpliceDefaultCompactor
hbase.htable.threads.max=96
hbase.ipc.warn.response.size=-1
hbase.ipc.warn.response.time=-1
hbase.master.loadbalance.bytable=TRUE
hbase.mvcc.impl=org.apache.hadoop.hbase.regionserver.SIMultiVersionConsistencyControl
hbase.regions.slop=0.01
hbase.regionserver.global.memstore.size.lower.limit=0.9
hbase.regionserver.lease.period=1200000
hbase.regionserver.maxlogs=48
hbase.regionserver.thread.compaction.large=1
hbase.regionserver.thread.compaction.small=4
hbase.regionserver.wal.enablecompression=TRUE
hbase.splitlog.manager.timeout=3000
hbase.status.multicast.port=16100
hbase.wal.disruptor.batch=TRUE
hbase.wal.provider=multiwal
hbase.wal.regiongrouping.numgroups=16
hbase.zookeeper.property.tickTime=6000
hfile.block.bloom.cacheonwrite=TRUE
io.storefile.bloom.error.rate=0.005
splice.authentication.native.algorithm=SHA-512
splice.authentication=NATIVE
splice.client.numConnections=1
splice.client.write.maxDependentWrites=60000
splice.client.write.maxIndependentWrites=60000
splice.compression=snappy
splice.marshal.kryoPoolSize=1100
splice.olap_server.clientWaitTime=900000
splice.ring.bufferSize=131072
splice.splitBlockSize=67108864
splice.timestamp_server.clientWaitTime=120000
splice.txn.activeTxns.cacheSize=10240
splice.txn.completedTxns.concurrency=128
splice.txn.concurrencyLevel=4096
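For reference, Ambari renders each Custom hbase-site entry into the managed hbase-site.xml file as a property element (you don't edit that file by hand). The first entry above, for example, ends up as:

<property>
  <name>dfs.client.read.shortcircuit.buffer.size</name>
  <value>131072</value>
</property>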

5. Save Changes

Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated HBase configuration for Splice Machine. Click Save again.

6. Restart HBASE

Return to the HBASE Configs tab in Ambari. Then click the Restart drop-down in the upper right corner and select the Restart All action to restart HBASE. Confirm your action and then wait for the restart to complete.

Optional Configuration Modifications

There are a few configuration modifications you might want to make:

Modify the Authentication Mechanism if you want to authenticate users with something other than the default native authentication mechanism.

Adjust the Replication Factor if you have a small cluster and need to improve resource usage or performance.

Modify the Authentication Mechanism

Splice Machine installs with Native authentication configured; native authentication uses the sys.sysusers table in the splice schema for configuring user names and passwords.

You can disable authentication or change the authentication mechanism that Splice Machine uses to LDAP by following the instructions in our Configuring Splice Machine Authentication topic.

Verify your Splice Machine Installation

Now you can start using the Splice Machine command line interpreter (referred to as the splice prompt or simply splice>) by launching the sqlshell.sh script on any node in your cluster that is running an HBase region server.

NOTE: The command line interpreter defaults to connecting on port 1527 on localhost, with username splice, and password admin. You can override these defaults when starting the interpreter, as described in the Command Line (splice>) Reference topic in our Developer’s Guide.

Then try a few sample commands to verify that everything is working with your Splice Machine installation:

Operation Command to perform operation
Display tables splice> show tables;
Create a table splice> create table test (i int);
Add data to the table splice> insert into test values 1,2,3,4,5;
Query data in the table splice> select * from test;
Drop the table splice> drop table test;
Exit the command line interpreter splice> exit;
Make sure you end each command with a semicolon (;), followed by the Enter or Return key.

See the Command Line (splice>) Reference section of our Developer's Guide for information about our commands and command syntax.

Last update: 10-12-2016 02:34 AM