This topic describes installing and configuring Splice Machine on a Hortonworks Ambari-managed cluster. Follow these steps:
- Verify Prerequisites
- Download and Install Splice Machine
- Configure Hadoop Services
- Make any needed Optional Configuration Modifications
- Verify your Splice Machine Installation
Verify Prerequisites
Before starting your Splice Machine installation, please make sure that your cluster contains the prerequisite software components (a quick shell check follows this list):
- A cluster running HDP 2.4.2
- HBase installed
- HDFS installed
- YARN installed
- ZooKeeper installed
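If you'd rather confirm these from a shell than from the Ambari dashboard, the hdp-select utility that ships with HDP reports the installed stack components; a quick check from any cluster node:
hdp-select status | grep -E 'hadoop-hdfs|hbase|zookeeper'
# each matching component should report a 2.4.2.0 build rather than None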
Download and Install Splice Machine
Perform the following steps on each node in your cluster:
1. Create the Splice Machine installation directory:
sudo mkdir -p /opt/splice
2. Download the Splice Machine package into the splice directory on the node:
sudo curl 'https://s3.amazonaws.com/20snapshot/installer/2.0.1.23/hdp2.4.2/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2...' -o /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz
3. Extract the Splice Machine package:
sudo tar -xf SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz --directory /opt/splice
4. Create symbolic links:
sudo ln -sf /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108 /opt/splice/default
sudo ln -sf /opt/splice/default/bin/sqlshell.sh /usr/bin/sqlshell.sh
NOTE: This means that you can always access Splice Machine by simply entering sqlshell.sh on your command line.
sudo ln -sf /opt/splice/default/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar /usr/hdp/2.4.2.0-258/hadoop-yarn/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar
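Before moving on, you can sanity-check the installation on each node. These commands only confirm that the links created above resolve correctly:
readlink -f /opt/splice/default      # should print the versioned /opt/splice/SPLICEMACHINE-... directory
readlink -f /usr/bin/sqlshell.sh     # should resolve into the versioned bin directory
ls /opt/splice/default/lib | head    # the Splice Machine jars added to classpaths in later steps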
Configure Hadoop Services
Now it's time to make a few modifications in the Hadoop services configurations:
Configure and Restart ZooKeeper
To edit the ZooKeeper configuration, select the Services tab at the top of the Ambari dashboard screen, then click ZooKeeper in the left pane of the screen.
1. Select the Configs tab to configure ZooKeeper.
2. Make configuration changes:
Scroll down to the Custom zoo.cfg section and click Add Property twice: once to add the maxClientCnxns property and again to add the maxSessionTimeout property, with these values:
maxClientCnxns = 0
maxSessionTimeout = 120000
3. Save Changes
Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated ZooKeeper configuration for Splice Machine. Click Save again.
4. Restart ZooKeeper
After you save your changes, you'll land back on the ZooKeeper Service Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart ZooKeeper. Wait for the restart to complete.
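To confirm the new settings took effect, you can query ZooKeeper's runtime configuration with its conf four-letter command; a quick check, assuming nc is available and run against a ZooKeeper node on the default client port:
echo conf | nc localhost 2181 | grep -E 'maxClientCnxns|maxSessionTimeout'
# note: maxClientCnxns=0 disables the per-IP connection limit rather than allowing zero connections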
Configure and Restart HDFS
To edit the HDFS configuration, select the Services tab at the top of the Ambari dashboard screen, then click HDFS in the left pane of the screen. Finally, click the Configs tab.
1. Edit the HDFS configuration as follows:
Setting | New Value |
NameNode Java heap size | 4 GB |
DataNode maximum Java heap size | 2 GB |
Block replication | 2 (for clusters with fewer than 8 nodes); 3 (for clusters with 8 or more nodes) |
2. Update the custom hdfs-site.xml configuration with this property setting:
dfs.datanode.handler.count = 20
3. Save Changes
Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated HDFS configuration for Splice Machine. Click Save again.
4. Create directories for the hbase user and the Splice Machine YARN application:
Use your terminal window to create these directories:
sudo -iu hdfs hadoop fs -mkdir -p hdfs:///user/hbase hdfs:///user/splice/history
sudo -iu hdfs hadoop fs -chown -R hbase:hbase hdfs:///user/hbase hdfs:///user/splice
sudo -iu hdfs hadoop fs -chmod 1777 hdfs:///user/splice hdfs:///user/splice/history
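A quick listing verifies the results; the user directories should be owned by hbase:hbase, and the splice directories should carry the sticky-bit permissions (drwxrwxrwt) set by chmod 1777:
sudo -iu hdfs hadoop fs -ls hdfs:///user
sudo -iu hdfs hadoop fs -ls hdfs:///user/splice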
5. Restart HDFS
Return to the HDFS Configs tab in Ambari. Then click the Restart drop-down in the upper right corner and select the Restart All action to restart HDFS. Confirm your action and then wait for the restart to complete.
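Once HDFS is back up, you can read the updated values back from the deployed client configuration on any node:
hdfs getconf -confKey dfs.replication                # block replication: 2 or 3
hdfs getconf -confKey dfs.datanode.handler.count     # should print 20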
Configure and Restart YARN
To edit the YARN configuration, select the Services tab at the top of the Ambari dashboard screen, then click YARN in the left pane of the screen. Finally, click the Configs tab.
1. Update these configuration values (a check for the Spark shuffle service jar follows the table):
Setting | New Value |
yarn.application.classpath | $HADOOP_CONF_DIR, /usr/hdp/current/hadoop-client/*, /usr/hdp/current/hadoop-client/lib/*, /usr/hdp/current/hadoop-hdfs-client/*, /usr/hdp/current/hadoop-hdfs-client/lib/*, /usr/hdp/current/hadoop-yarn-client/*, /usr/hdp/current/hadoop-yarn-client/lib/*, /usr/hdp/current/hadoop-mapreduce-client/*, /usr/hdp/current/hadoop-mapreduce-client/lib/*, /usr/hdp/current/hbase-regionserver/*, /usr/hdp/current/hbase-regionserver/lib/*, /opt/splice/default/lib/* |
yarn.nodemanager.aux-services.spark_shuffle.class | org.apache.spark.network.yarn.YarnShuffleService |
yarn.nodemanager.delete.debug-delay-sec | 86400 |
Memory allocated for all YARN containers on a node | 30 GB (based on node specs) |
Minimum Container Size (Memory) | 1 GB (based on node specs) |
Maximum Container Size (Memory) | 30 GB (based on node specs) |
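The yarn.nodemanager.aux-services.spark_shuffle.class value above is loaded from the spark-assembly jar you symlinked into the YARN lib directory during installation, so it's worth confirming the jar is present on each NodeManager node before saving:
ls -l /usr/hdp/2.4.2.0-258/hadoop-yarn/lib/spark-assembly-*.jar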
2. Save Changes
Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated YARN configuration for Splice Machine. Click Save again.
3. Restart YARN
Return to the YARN Configs tab in Ambari. Then click the Restart drop-down in the upper right corner and select the Restart All action to restart YARN. Confirm your action and then wait for the restart to complete.
Configure MapReduce2
Ambari automatically sets these values for you:
- Map Memory
- Reduce Memory
- Sort Allocation Memory
- AppMaster Memory
- MR Map Java Heap Size
- MR Reduce Java Heap Size
Modify the HDP Version Information
Replace ${hdp.version} with the actual version number (e.g. 2.4.2.0-258) in these property values (a command for finding your cluster's version string follows the list):
- mapreduce.admin.map.child.java.opts
- mapreduce.admin.reduce.child.java.opts
- mapreduce.admin.user.env
- mapreduce.application.classpath
- mapreduce.application.framework.path
- yarn.app.mapreduce.am.admin-command-opts
- MR AppMaster Java Heap Size
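If you're unsure of the exact build string on your cluster, you can print it from any node:
hdp-select versions
# the output (e.g. 2.4.2.0-258) is the value to substitute for ${hdp.version}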
Configure and Restart HBase
To edit the HBase configuration, select the Services tab at the top of the Ambari dashboard screen, then click HBase in the left pane of the screen. Finally, click the Configs tab and make these changes:
1. Change the values of these settings:
Setting | New Value |
% of RegionServer Allocated to Write Buffer (hbase.regionserver.global.memstore.size) | 0.25 |
HBase RegionServer Maximum Memory (hbase_regionserver_heapsize) | 24 GB |
% of RegionServer Allocated to Read Buffers (hfile.block.cache.size) | 0.25 |
HBase Master Maximum Memory (hbase_master_heapsize) | 5 GB |
Number of Handlers per RegionServer (hbase.regionserver.handler.count) | 400 |
HBase RPC Timeout | 1200000 (20 minutes) |
Zookeeper Session Timeout | 120000 (2 minutes) |
hbase.coprocessor.master.classes | com.splicemachine.hbase.SpliceMasterObserver |
hbase.coprocessor.region.classes | The value of this property is shown below, in Step 2 |
Maximum Store Files before Minor Compaction (hbase.hstore.compactionThreshold) | 5 |
Number of Fetched Rows when Scanning from Disk (hbase.client.scanner.caching) | 1000 |
hstore blocking storefiles (hbase.hstore.blockingStoreFiles) | 20 |
Advanced hbase-env | The value of this property is shown below, in Step 3 |
Custom hbase-site | The value of this property is shown below, in Step 4 |
2. Set the value of the hbase.coprocessor.region.classes property to the following:
com.splicemachine.hbase.MemstoreAwareObserver,com.splicemachine.derby.hbase.SpliceIndexObserver,com.splicemachine.derby.hbase.SpliceIndexEndpoint,com.splicemachine.hbase.RegionSizeEndpoint,com.splicemachine.si.data.hbase.coprocessor.TxnLifecycleEndpoint,com.splicemachine.si.data.hbase.coprocessor.SIObserver,com.splicemachine.hbase.BackupEndpointObserver
3. Replace the Advanced hbase-env property with the following:
# Set environment variables here.
# The java implementation to use. Java 1.6 required.
export JAVA_HOME={{java64_home}}
# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-{{hbase_conf_dir}}}
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=${HBASE_CLASSPATH}
# add Splice Machine to the HBase classpath
SPLICELIBDIR="/opt/splice/default/lib"
APPENDSTRING=$(echo $(find ${SPLICELIBDIR} -maxdepth 1 -name \*.jar | sort) | sed 's/ /:/g')
export HBASE_CLASSPATH="${HBASE_CLASSPATH}:${APPENDSTRING}"
# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:{{log_dir}}/gc.log-`date +'%Y%m%d%H%M'`"
# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# If you want to configure BucketCache, specify '-XX:MaxDirectMemorySize=' with proper direct memory size
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR={{log_dir}}
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR={{pid_dir}}
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
{% if java_version < 8 %}
JDK_DEPENDED_OPTS="-XX:PermSize=512m -XX:MaxPermSize=512m"
{% endif %}
export HBASE_OPTS="${HBASE_OPTS} -XX:ErrorFile={{log_dir}}/hs_err_pid%p.log -Djava.io.tmpdir={{java_io_tmpdir}}"
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Xms{{master_heapsize}} -Xmx{{master_heapsize}} ${JDK_DEPENDED_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=10101 -Dsplice.spark.enabled=true -Dsplice.spark.app.name=SpliceMachine -Dsplice.spark.master=yarn-client -Dsplice.spark.logConf=true -Dsplice.spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory -Dsplice.spark.driver.maxResultSize=1g -Dsplice.spark.driver.memory=1g -Dsplice.spark.dynamicAllocation.enabled=true -Dsplice.spark.dynamicAllocation.executorIdleTimeout=600 -Dsplice.spark.dynamicAllocation.minExecutors=0 -Dsplice.spark.io.compression.lz4.blockSize=32k -Dsplice.spark.kryo.referenceTracking=false -Dsplice.spark.kryo.registrator=com.splicemachine.derby.impl.SpliceSparkKryoRegistrator -Dsplice.spark.kryoserializer.buffer.max=512m -Dsplice.spark.kryoserializer.buffer=4m -Dsplice.spark.locality.wait=100 -Dsplice.spark.scheduler.mode=FAIR -Dsplice.spark.serializer=org.apache.spark.serializer.KryoSerializer -Dsplice.spark.shuffle.compress=false -Dsplice.spark.shuffle.file.buffer=128k -Dsplice.spark.shuffle.memoryFraction=0.7 -Dsplice.spark.shuffle.service.enabled=true -Dsplice.spark.storage.memoryFraction=0.1 -Dsplice.spark.yarn.am.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native -Dsplice.spark.yarn.am.waitTime=10s -Dsplice.spark.yarn.executor.memoryOverhead=2048 -Dsplice.spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties -Dsplice.spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native -Dsplice.spark.driver.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar -Dsplice.spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties -Dsplice.spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native -Dsplice.spark.executor.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar -Dsplice.spark.ui.retainedJobs=100 -Dsplice.spark.ui.retainedStages=100 -Dsplice.spark.worker.ui.retainedExecutors=100 -Dsplice.spark.worker.ui.retainedDrivers=100 -Dsplice.spark.streaming.ui.retainedBatches=100 -Dsplice.spark.executor.cores=4 -Dsplice.spark.executor.memory=8g -Dspark.compaction.reserved.slots=4 -Dsplice.spark.eventLog.enabled=true -Dsplice.spark.eventLog.dir=hdfs:///user/splice/history -Dsplice.spark.local.dir=/diska/tmp,/diskb/tmp,/diskc/tmp,/diskd/tmp"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Xmn{{regionserver_xmn_size}} -Xms{{regionserver_heapsize}} -Xmx{{regionserver_heapsize}} ${JDK_DEPENDED_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxNewSize=4g -XX:InitiatingHeapOccupancyPercent=60 -XX:ParallelGCThreads=24 -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=5000 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=10102"
{% if security_enabled %}
export HBASE_OPTS="${HBASE_OPTS} -Djava.security.auth.login.config={{client_jaas_config_file}}"
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Djava.security.auth.login.config={{master_jaas_config_file}}"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Djava.security.auth.login.config={{regionserver_jaas_config_file}}"
{% endif %}
# HBase off-heap MaxDirectMemorySize
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS {% if hbase_max_direct_memory_size %} -XX:MaxDirectMemorySize={{hbase_max_direct_memory_size}}m {% endif %}"
4. Add the following properties and values to the Custom hbase-site settings:
dfs.client.read.shortcircuit.buffer.size=131072
hbase.balancer.period=60000
hbase.client.ipc.pool.size=10
hbase.client.max.perregion.tasks=100
hbase.coprocessor.regionserver.classes=com.splicemachine.hbase.RegionServerLifecycleObserver
hbase.hstore.compaction.max.size=260046848
hbase.hstore.compaction.min.size=16777216
hbase.hstore.compaction.min=5
hbase.hstore.defaultengine.compactionpolicy.class=com.splicemachine.compactions.SpliceDefaultCompactionPolicy
hbase.hstore.defaultengine.compactor.class=com.splicemachine.compactions.SpliceDefaultCompactor
hbase.htable.threads.max=96
hbase.ipc.warn.response.size=-1
hbase.ipc.warn.response.time=-1
hbase.master.loadbalance.bytable=TRUE
hbase.mvcc.impl=org.apache.hadoop.hbase.regionserver.SIMultiVersionConsistencyControl
hbase.regions.slop=0.01
hbase.regionserver.global.memstore.size.lower.limit=0.9
hbase.regionserver.lease.period=1200000
hbase.regionserver.maxlogs=48
hbase.regionserver.thread.compaction.large=1
hbase.regionserver.thread.compaction.small=4
hbase.regionserver.wal.enablecompression=TRUE
hbase.splitlog.manager.timeout=3000
hbase.status.multicast.port=16100
hbase.wal.disruptor.batch=TRUE
hbase.wal.provider=multiwal
hbase.wal.regiongrouping.numgroups=16
hbase.zookeeper.property.tickTime=6000
hfile.block.bloom.cacheonwrite=TRUE
io.storefile.bloom.error.rate=0.005
splice.authentication.native.algorithm=SHA-512
splice.authentication=NATIVE
splice.client.numConnections=1
splice.client.write.maxDependentWrites=60000
splice.client.write.maxIndependentWrites=60000
splice.compression=snappy
splice.marshal.kryoPoolSize=1100
splice.olap_server.clientWaitTime=900000
splice.ring.bufferSize=131072
splice.splitBlockSize=67108864
splice.timestamp_server.clientWaitTime=120000
splice.txn.activeTxns.cacheSize=10240
splice.txn.completedTxns.concurrency=128
splice.txn.concurrencyLevel=4096
5. Save Changes
Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated HBase configuration for Splice Machine. Click Save again.
6. Restart HBase
Return to the HBase Configs tab in Ambari. Then click the Restart drop-down in the upper right corner and select the Restart All action to restart HBase. Confirm your action and then wait for the restart to complete.
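As a sanity check after the restart, you can confirm from any node with the HBase client installed that the cluster came back up and that the Splice Machine jars from the hbase-env change in Step 3 landed on the HBase classpath:
echo "status 'simple'" | hbase shell
hbase classpath | tr ':' '\n' | grep -i splice | head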
Optional Configuration Modifications
There are a few configuration modifications you might want to make:
- Modify the Authentication Mechanism if you want to authenticate users with something other than the default native authentication mechanism.
- Adjust the Replication Factor if you have a small cluster and need to improve resource usage or performance.
Modify the Authentication Mechanism
Splice Machine installs with Native authentication configured; native authentication uses the sys.sysusers table in the splice schema for configuring user names and passwords.
You can disable authentication, or change the authentication mechanism that Splice Machine uses to LDAP, by following the instructions in Configuring Splice Machine Authentication.
Verify your Splice Machine Installation
Now verify your installation by starting the Splice Machine command line interpreter, referred to as the splice prompt or simply splice>: launch the sqlshell.sh script on any node in your cluster that is running an HBase region server.
NOTE: The command line interpreter defaults to connecting on port 1527 on localhost, with username splice, and password admin. You can override these defaults when starting the interpreter, as described in the Command Line (splice>) Reference topic in our Developer’s Guide.
Try entering a few sample commands to verify that everything is working with your Splice Machine installation (a scripted version follows the table):
Operation | Command to perform operation |
Display tables | splice> show tables; |
Create a table | splice> create table test (i int); |
Add data to the table | splice> insert into test values 1,2,3,4,5; |
Query data in the table | splice> select * from test; |
Drop the table | splice> drop table test; |
Exit the command line interpreter | splice> exit; |
Make sure you end each command with a semicolon (;), followed by the Enter or Return key.
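If you'd like a scripted version of the same smoke test, the statements can be fed to sqlshell.sh in one shot. This is a sketch that assumes sqlshell.sh, like the underlying ij interpreter it wraps, reads SQL from standard input; if your version does not, simply paste the statements at the splice> prompt instead:
sqlshell.sh <<'SQL'
create table test (i int);
insert into test values 1,2,3,4,5;
select * from test;
drop table test;
exit;
SQL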
See the Command Line (splice>) Reference section of our Developer's Guide for information about our commands and command syntax.