Installing Splice Machine on a Hortonworks Ambari-Managed Cluster
This topic describes installing and configuring Splice Machine on a Hortonworks Ambari-managed cluster. Follow these steps:

  1. Verify Prerequisites
  2. Download and Install Splice Machine
  3. Configure Hadoop Services
  4. Make any needed Optional Configuration Modifications
  5. Restart the Cluster
  6. Verify your Splice Machine Installation

Verify Prerequisites

Before starting your Splice Machine installation, please make sure that your cluster contains the prerequisite software components:

  • A cluster running HDP 2.4.2
  • HBase installed
  • HDFS installed
  • YARN installed
  • ZooKeeper installed

Download and Install Splice Machine

Perform the following steps on each node in your cluster:

1. Create the splice installation directory:

sudo mkdir -p /opt/splice

2. Download the Splice Machine package into the splice directory on the node:

sudo curl 'https://s3.amazonaws.com/20snapshot/installer/2.0.1.23/hdp2.4.2/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2...' -o /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz

3. Extract the Splice Machine package:

sudo tar -xf /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz --directory /opt/splice

4. Create symbolic links:

sudo ln -sf /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108 /opt/splice/default

sudo ln -sf /opt/splice/default/bin/sqlshell.sh /usr/bin/sqlshell.sh

NOTE: This means that you can always start Splice Machine by simply entering sqlshell.sh on your command line.

sudo ln -sf /opt/splice/default/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar /usr/hdp/2.4.2.0-258/hadoop-yarn/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar
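
If you're installing on more than a few nodes, you can script these steps. The following is a minimal sketch, assuming passwordless SSH and sudo on each node and a hypothetical hosts.txt file listing one hostname per line; because the download URL above is abbreviated, substitute the full package URL for <package-url>:

# Hypothetical helper: repeat the install steps on every node in hosts.txt.
# Adapt to your own tooling (pdsh, Ansible, etc.) as needed.
PKG=SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108
while read -r node; do
  # -n keeps ssh from consuming the hosts.txt input stream
  ssh -n "$node" "sudo mkdir -p /opt/splice && \
    sudo curl '<package-url>' -o /opt/splice/$PKG.tar.gz && \
    sudo tar -xf /opt/splice/$PKG.tar.gz --directory /opt/splice && \
    sudo ln -sf /opt/splice/$PKG /opt/splice/default && \
    sudo ln -sf /opt/splice/default/bin/sqlshell.sh /usr/bin/sqlshell.sh && \
    sudo ln -sf /opt/splice/default/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar /usr/hdp/2.4.2.0-258/hadoop-yarn/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar"
done < hosts.txt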

Configure Hadoop Services

Now it's time to make a few modifications to the Hadoop service configurations:

  • Configure and Restart ZooKeeper
  • Configure and Restart HDFS
  • Configure and Restart YARN
  • Configure MapReduce2
  • Configure and Restart HBase

Configure and Restart ZooKeeper

To edit the ZooKeeper configuration, select the Services tab at the top of the Ambari dashboard screen, then click ZooKeeper in the left pane of the screen.

1. Select the Configs tab to configure ZooKeeper.

2. Make configuration changes:

Scroll down to where you see Custom zoo.cfg and click Add Property to add the maxClientCnxns property and then again to add the maxSessionTimeout property, with these values:

maxClientCnxns = 0

maxSessionTimeout = 120000

3. Save Changes

Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated ZooKeeper configuration for Splice Machine. Click Save again.

4. Restart ZooKeeper

After you save your changes, you'll land back on the ZooKeeper Service Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart ZooKeeper. Wait for the restart to complete.
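
To confirm the new values took effect, you can query ZooKeeper's runtime configuration with the conf four-letter command, shown here via nc against the default client port (adjust the host and port for your cluster):

echo conf | nc localhost 2181 | grep -E 'maxClientCnxns|maxSessionTimeout'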

Configure and Restart HDFS

To edit the HDFS configuration, select the Services tab at the top of the Ambari dashboard screen, then click HDFS in the left pane of the screen. Finally, click the Configs tab.

1. Edit the HDFS configuration as follows:

Setting New Value
NameNode Java heap size 4 GB
DataNode maximum Java heap size 2 GB
Block replication 2 (for clusters with fewer than 8 nodes); 3 (for clusters with 8 or more nodes)

2. Update the custom hdfs-site.xml file, adding this property:

dfs.datanode.handler.count = 20

3. Save Changes

Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated HDFS configuration for Splice Machine. Click Save again.

4. Create directories for the hbase user and the Splice Machine YARN application:

Use your terminal window to create these directories:

sudo -iu hdfs hadoop fs -mkdir -p hdfs:///user/hbase hdfs:///user/splice/history

sudo -iu hdfs hadoop fs -chown -R hbase:hbase hdfs:///user/hbase hdfs:///user/splice

sudo -iu hdfs hadoop fs -chmod 1777 hdfs:///user/splice hdfs:///user/splice/history

5. Restart HDFS

Return to the HDFS Configs tab in Ambari. Then click the Restart drop-down in the upper right corner and select the Restart All action to restart HDFS. Confirm your action and then wait for the restart to complete.
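
Once HDFS is back up, a quick listing confirms the directories, ownership, and permissions created in step 4:

sudo -iu hdfs hadoop fs -ls hdfs:///user
sudo -iu hdfs hadoop fs -ls hdfs:///user/splice

You should see /user/hbase and /user/splice owned by hbase:hbase, with /user/splice and /user/splice/history shown as drwxrwxrwt (mode 1777).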

Configure and Restart YARN

To edit the YARN configuration, select the Services tab at the top of the Ambari dashboard screen, then click YARN in the left pane of the screen. Finally, click the Configs tab.

1. Update these configuration values:

Setting New Value
yarn.application.classpath $HADOOP_CONF_DIR, /usr/hdp/current/hadoop-client/*, /usr/hdp/current/hadoop-client/lib/*, /usr/hdp/current/hadoop-hdfs-client/*, /usr/hdp/current/hadoop-hdfs-client/lib/*, /usr/hdp/current/hadoop-yarn-client/*, /usr/hdp/current/hadoop-yarn-client/lib/*, /usr/hdp/current/hadoop-mapreduce-client/*, /usr/hdp/current/hadoop-mapreduce-client/lib/*, /usr/hdp/current/hbase-regionserver/*, /usr/hdp/current/hbase-regionserver/lib/*, /opt/splice/default/lib/*
yarn.nodemanager.aux-services.spark_shuffle.class org.apache.spark.network.yarn.YarnShuffleService
yarn.nodemanager.delete.debug-delay-sec 86400
Memory allocated for all YARN containers on a node 30 GB (based on node specs)
Minimum Container Size (Memory) 1 GB (based on node specs)
Maximum Container Size (Memory) 30 GB (based on node specs)

2. Save Changes

Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated YARN configuration for Splice Machine. Click Save again.

3. Restart YARN

Return to the YARN Configs tab in Ambari. Then click the Restart drop-down in the upper right corner and select the Restart All action to restart YARN. Confirm your action and then wait for the restart to complete.
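
If you want to double-check that the new settings reached a worker node, you can grep the rendered yarn-site.xml there (the path below is the usual HDP client configuration directory; yours may differ):

grep -A1 'yarn.nodemanager.aux-services.spark_shuffle.class' /etc/hadoop/conf/yarn-site.xml
grep -A1 'yarn.nodemanager.delete.debug-delay-sec' /etc/hadoop/conf/yarn-site.xml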

Configure MapReduce2

Ambari automatically sets these values for you:

  • Map Memory
  • Reduce Memory
  • Sort Allocation Memory
  • AppMaster Memory
  • MR Map Java Heap Size
  • MR Reduce Java Heap Size

Modify the HDP Version Information

Replace ${hdp.version} with the actual version number (e.g. 2.4.2.0-258) in these property values:

  • mapreduce.admin.map.child.java.opts
  • mapreduce.admin.reduce.child.java.opts
  • mapreduce.admin.user.env
  • mapreduce.application.classpath
  • mapreduce.application.framework.path
  • yarn.app.mapreduce.am.admin-command-opts
  • MR AppMaster Java Heap Size
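
For example, the mapreduce.application.framework.path value would typically change from:

/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework

to:

/hdp/apps/2.4.2.0-258/mapreduce/mapreduce.tar.gz#mr-framework

If you're not sure of the exact version string for your cluster, running hdp-select versions on any node lists the installed HDP versions.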

Configure and Restart HBase

To edit the HBase configuration, select the Services tab at the top of the Ambari dashboard screen, then click HBase in the left pane of the screen. Finally, click the Configs tab and make these changes:

1. Change the values of these settings:

Setting New Value
% of RegionServer Allocated to Write Buffer (hbase.regionserver.global.memstore.size) 0.25
HBase RegionServer Maximum Memory (hbase_regionserver_heapsize) 24 GB
% of RegionServer Allocated to Read Buffers (hfile.block.cache.size) 0.25
HBase Master Maximum Memory (hbase_master_heapsize) 5 GB
Number of Handlers per RegionServer (hbase.regionserver.handler.count) 400
HBase RPC Timeout 1200000 (20 minutes)
Zookeeper Session Timeout 120000 (2 minutes)
hbase.coprocessor.master.classes com.splicemachine.hbase.SpliceMasterObserver
hbase.coprocessor.region.classes The value of this property is shown below, in Step 2
Maximum Store Files before Minor Compaction (hbase.hstore.compactionThreshold) 5
Number of Fetched Rows when Scanning from Disk (hbase.client.scanner.caching) 1000
hstore blocking storefiles (hbase.hstore.blockingStoreFiles) 20
Advanced hbase-env The value of this property is shown below, in Step 3
Custom hbase-site The value of this property is shown below, in Step 4

2. Set the value of the hbase.coprocessor.region.classes property to the following:

com.splicemachine.hbase.MemstoreAwareObserver,com.splicemachine.derby.hbase.SpliceIndexObserver,com.splicemachine.derby.hbase.SpliceIndexEndpoint,com.splicemachine.hbase.RegionSizeEndpoint,com.splicemachine.si.data.hbase.coprocessor.TxnLifecycleEndpoint,com.splicemachine.si.data.hbase.coprocessor.SIObserver,com.splicemachine.hbase.BackupEndpointObserver

3. Replace the Advanced hbase-env property with the following:

# Set environment variables here.
# The java implementation to use. Java 1.6 required.
export JAVA_HOME={{java64_home}}
# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-{{hbase_conf_dir}}}
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=${HBASE_CLASSPATH}
# add Splice Machine to the HBase classpath
SPLICELIBDIR="/opt/splice/default/lib"
APPENDSTRING=$(echo $(find ${SPLICELIBDIR} -maxdepth 1 -name \*.jar | sort) | sed 's/ /:/g')
export HBASE_CLASSPATH="${HBASE_CLASSPATH}:${APPENDSTRING}"
# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:{{log_dir}}/gc.log-`date +'%Y%m%d%H%M'`"
# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# If you want to configure BucketCache, specify '-XX:MaxDirectMemorySize=' with proper direct memory size
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR={{log_dir}}
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR={{pid_dir}}
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
{% if java_version < 8 %}
JDK_DEPENDED_OPTS="-XX:PermSize=512m -XX:MaxPermSize=512m"
{% endif %}
export HBASE_OPTS="${HBASE_OPTS} -XX:ErrorFile={{log_dir}}/hs_err_pid%p.log -Djava.io.tmpdir={{java_io_tmpdir}}"
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Xms{{master_heapsize}} -Xmx{{master_heapsize}} ${JDK_DEPENDED_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=10101 -Dsplice.spark.enabled=true -Dsplice.spark.app.name=SpliceMachine -Dsplice.spark.master=yarn-client -Dsplice.spark.logConf=true -Dsplice.spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory -Dsplice.spark.driver.maxResultSize=1g -Dsplice.spark.driver.memory=1g -Dsplice.spark.dynamicAllocation.enabled=true -Dsplice.spark.dynamicAllocation.executorIdleTimeout=600 -Dsplice.spark.dynamicAllocation.minExecutors=0 -Dsplice.spark.io.compression.lz4.blockSize=32k -Dsplice.spark.kryo.referenceTracking=false -Dsplice.spark.kryo.registrator=com.splicemachine.derby.impl.SpliceSparkKryoRegistrator -Dsplice.spark.kryoserializer.buffer.max=512m -Dsplice.spark.kryoserializer.buffer=4m -Dsplice.spark.locality.wait=100 -Dsplice.spark.scheduler.mode=FAIR -Dsplice.spark.serializer=org.apache.spark.serializer.KryoSerializer -Dsplice.spark.shuffle.compress=false -Dsplice.spark.shuffle.file.buffer=128k -Dsplice.spark.shuffle.memoryFraction=0.7 -Dsplice.spark.shuffle.service.enabled=true -Dsplice.spark.storage.memoryFraction=0.1 -Dsplice.spark.yarn.am.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native -Dsplice.spark.yarn.am.waitTime=10s -Dsplice.spark.yarn.executor.memoryOverhead=2048 -Dsplice.spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties -Dsplice.spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native -Dsplice.spark.driver.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar -Dsplice.spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties -Dsplice.spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native -Dsplice.spark.executor.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar -Dsplice.spark.ui.retainedJobs=100 -Dsplice.spark.ui.retainedStages=100 -Dsplice.spark.worker.ui.retainedExecutors=100 -Dsplice.spark.worker.ui.retainedDrivers=100 -Dsplice.spark.streaming.ui.retainedBatches=100 -Dsplice.spark.executor.cores=4 -Dsplice.spark.executor.memory=8g -Dspark.compaction.reserved.slots=4 -Dsplice.spark.eventLog.enabled=true -Dsplice.spark.eventLog.dir=hdfs:///user/splice/history -Dsplice.spark.local.dir=/diska/tmp,/diskb/tmp,/diskc/tmp,/diskd/tmp"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Xmn{{regionserver_xmn_size}} -Xms{{regionserver_heapsize}} -Xmx{{regionserver_heapsize}} ${JDK_DEPENDED_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxNewSize=4g -XX:InitiatingHeapOccupancyPercent=60 -XX:ParallelGCThreads=24 -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=5000 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=10102"
{% if security_enabled %}
export HBASE_OPTS="${HBASE_OPTS} -Djava.security.auth.login.config={{client_jaas_config_file}}"
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Djava.security.auth.login.config={{master_jaas_config_file}}"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Djava.security.auth.login.config={{regionserver_jaas_config_file}}"
{% endif %}
# HBase off-heap MaxDirectMemorySize
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS {% if hbase_max_direct_memory_size %} -XX:MaxDirectMemorySize={{hbase_max_direct_memory_size}}m {% endif %}"

4. Add the Custom hbase-site properties, setting their values to the following:

dfs.client.read.shortcircuit.buffer.size=131072
hbase.balancer.period=60000
hbase.client.ipc.pool.size=10
hbase.client.max.perregion.tasks=100
hbase.coprocessor.regionserver.classes=com.splicemachine.hbase.RegionServerLifecycleObserver
hbase.hstore.compaction.max.size=260046848
hbase.hstore.compaction.min.size=16777216
hbase.hstore.compaction.min=5
hbase.hstore.defaultengine.compactionpolicy.class=com.splicemachine.compactions.SpliceDefaultCompactionPolicy
hbase.hstore.defaultengine.compactor.class=com.splicemachine.compactions.SpliceDefaultCompactor
hbase.htable.threads.max=96
hbase.ipc.warn.response.size=-1
hbase.ipc.warn.response.time=-1
hbase.master.loadbalance.bytable=TRUE
hbase.mvcc.impl=org.apache.hadoop.hbase.regionserver.SIMultiVersionConsistencyControl
hbase.regions.slop=0.01
hbase.regionserver.global.memstore.size.lower.limit=0.9
hbase.regionserver.lease.period=1200000
hbase.regionserver.maxlogs=48
hbase.regionserver.thread.compaction.large=1
hbase.regionserver.thread.compaction.small=4
hbase.regionserver.wal.enablecompression=TRUE
hbase.splitlog.manager.timeout=3000
hbase.status.multicast.port=16100
hbase.wal.disruptor.batch=TRUE
hbase.wal.provider=multiwal
hbase.wal.regiongrouping.numgroups=16
hbase.zookeeper.property.tickTime=6000
hfile.block.bloom.cacheonwrite=TRUE
io.storefile.bloom.error.rate=0.005
splice.authentication.native.algorithm=SHA-512
splice.authentication=NATIVE
splice.client.numConnections=1
splice.client.write.maxDependentWrites=60000
splice.client.write.maxIndependentWrites=60000
splice.compression=snappy
splice.marshal.kryoPoolSize=1100
splice.olap_server.clientWaitTime=900000
splice.ring.bufferSize=131072
splice.splitBlockSize=67108864
splice.timestamp_server.clientWaitTime=120000
splice.txn.activeTxns.cacheSize=10240
splice.txn.completedTxns.concurrency=128
splice.txn.concurrencyLevel=4096

5. Save Changes

Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated HBase configuration for Splice Machine. Click Save again.

6. Restart HBase

Return to the HBase Configs tab in Ambari. Then click the Restart drop-down in the upper right corner and select the Restart All action to restart HBase. Confirm your action and then wait for the restart to complete.
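
As a quick sanity check after the restart, you can print any of the new properties from the active HBase configuration using the HBaseConfTool utility that ships with HBase:

hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.coprocessor.region.classes

This should echo back the Splice Machine coprocessor list you configured in Step 2.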

Optional Configuration Modifications

There are a few configuration modifications you might want to make:

Modify the Authentication Mechanism if you want to authenticate users with something other than the default native authentication mechanism.

Adjust the Replication Factor if you have a small cluster and need to improve resource usage or performance.
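
Note that the Block replication setting only applies to files written after the change. If you also want existing data stored at the new factor, HDFS provides the setrep command; for example, to re-replicate everything at a factor of 2:

sudo -iu hdfs hadoop fs -setrep -R 2 /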

Modify the Authentication Mechanism

Splice Machine installs with native authentication configured; native authentication uses the sys.sysusers table in the splice database for configuring user names and passwords.

You can disable authentication or change the authentication mechanism that Splice Machine uses to LDAP by following the instructions in Configuring Splice Machine Authentication.
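
For example, with native authentication you manage users from the splice> prompt; a minimal sketch, assuming the Derby-style SYSCS_UTIL system procedures that Splice Machine inherits (see the linked topic for the authoritative syntax):

splice> call syscs_util.syscs_create_user('myuser', 'mypassword');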

Verify your Splice Machine Installation

Now start using the Splice Machine command line interpreter, referred to as the splice prompt or simply splice>, by launching the sqlshell.sh script on any node in your cluster that is running an HBase region server.

NOTE: The command line interpreter defaults to connecting on port 1527 on localhost, with username splice, and password admin. You can override these defaults when starting the interpreter, as described in the Command Line (splice>) Reference topic in our Developer’s Guide.

Try a few sample commands to verify that your Splice Machine installation is working:

Operation Command to perform operation
Display tables splice> show tables;
Create a table splice> create table test (i int);
Add data to the table splice> insert into test values 1,2,3,4,5;
Query data in the table splice> select * from test;
Drop the table splice> drop table test;
Exit the command line interpreter splice> exit;
Make sure you end each command with a semicolon (;), followed by the Enter or Return key.

See the Command Line (splice>) Reference section of our Developer's Guide for information about our commands and command syntax.
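
Putting the sample commands together, a minimal smoke-test session looks like this:

sqlshell.sh
splice> create table test (i int);
splice> insert into test values 1,2,3,4,5;
splice> select * from test;
splice> drop table test;
splice> exit;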
