Posted 10-12-2016 02:34 AM
This topic describes installing and configuring Splice Machine on a Hortonworks Ambari-managed cluster. Follow these steps:

1. Verify Prerequisites
2. Download and Install Splice Machine
3. Configure Hadoop Services
4. Make any needed Optional Configuration Modifications
5. Restart the Cluster
6. Verify your Splice Machine Installation

Verify Prerequisites

Before starting your Splice Machine installation, please make sure that your cluster contains the prerequisite software components:

- A cluster running HDP 2.4.2
- HBase installed
- HDFS installed
- YARN installed
- ZooKeeper installed

Download and Install Splice Machine

Perform the following
steps on each node in your cluster:

1. Create the splice installation directory:

sudo mkdir -p /opt/splice

2. Download the Splice Machine package into the splice directory on the node:

sudo curl 'https://s3.amazonaws.com/20snapshot/installer/2.0.1.23/hdp2.4.2/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz' -o /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz

3. Extract the Splice Machine package:

sudo tar -xf /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz --directory /opt/splice

4. Create symbolic links:

sudo ln -sf /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108 /opt/splice/default
sudo ln -sf /opt/splice/default/bin/sqlshell.sh /usr/bin/sqlshell.sh

NOTE: This means that you can always access splice by simply entering sqlshell.sh on your command line.

sudo ln -sf /opt/splice/default/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar /usr/hdp/2.4.2.0-258/hadoop-yarn/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar

Configure Hadoop Services

Now it's time to make a few modifications in the Hadoop services configurations:

- Configure and Restart ZooKeeper
- Configure and Restart HDFS
- Configure and Restart YARN
- Configure MapReduce2
- Configure and Restart HBase

Configure and Restart ZooKeeper

To edit the ZooKeeper
configuration, select the Services tab at the top of the Ambari dashboard screen, then click ZooKeeper in the left pane of the screen.

1. Select the Configs tab to configure ZooKeeper.

2. Make configuration changes: Scroll down to Custom zoo.cfg and click Add Property to add the maxClientCnxns property, then again to add the maxSessionTimeout property, with these values:

maxClientCnxns = 0
maxSessionTimeout = 120000

3. Save changes: Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated ZooKeeper configuration for Splice Machine. Click Save again.

4. Restart ZooKeeper: After you save your changes, you'll land back on the ZooKeeper Service Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart ZooKeeper. Wait for the restart to complete.

Configure and Restart HDFS

To edit the
HDFS configuration, select the Services tab at the top of the Ambari dashboard screen, then click HDFS in the left pane of the screen. Finally, click the Configs tab.

1. Edit the HDFS configuration as follows:

- NameNode Java heap size: 4 GB
- DataNode maximum Java heap size: 2 GB
- Block replication: 2 (for clusters with fewer than 8 nodes) or 3 (for clusters with 8 or more nodes)

2. Update the custom hdfs-site.xml file with this property:

dfs.datanode.handler.count = 20

3. Save changes: Click the Save button
to save your changes. You'll be prompted to optionally add a note such as Updated HDFS configuration for Splice Machine. Click Save again.

4. Create directories for the hbase user and the Splice Machine YARN application: Use your terminal window to create these directories:

sudo -iu hdfs hadoop fs -mkdir -p hdfs:///user/hbase hdfs:///user/splice/history
sudo -iu hdfs hadoop fs -chown -R hbase:hbase hdfs:///user/hbase hdfs:///user/splice
sudo -iu hdfs hadoop fs -chmod 1777 hdfs:///user/splice hdfs:///user/splice/history

5. Restart HDFS: Return to the HDFS Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart HDFS. Confirm your action and then wait for the restart to complete.

Configure and Restart YARN

To edit the YARN
configuration, select the Services tab at the top of the Ambari dashboard screen, then click YARN in the left pane of the screen. Finally, click the Configs tab.

1. Update these configuration values:

- yarn.application.classpath:
$HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*,/usr/hdp/current/hadoop-mapreduce-client/*,/usr/hdp/current/hadoop-mapreduce-client/lib/*,/usr/hdp/current/hbase-regionserver/*,/usr/hdp/current/hbase-regionserver/lib/*,/opt/splice/default/lib/*
- yarn.nodemanager.aux-services.spark_shuffle.class: org.apache.spark.network.yarn.YarnShuffleService
- yarn.nodemanager.delete.debug-delay-sec: 86400
- Memory allocated for all YARN containers on a node: 30 GB (based on node specs)
- Minimum Container Size (Memory): 1 GB (based on node specs)
- Maximum Container Size (Memory): 30 GB (based on node specs)
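For reference, the spark_shuffle class setting above only takes effect if spark_shuffle is also listed in the yarn.nodemanager.aux-services property, which the steps above do not call out explicitly. A sketch of how these entries typically look in yarn-site.xml, per standard Hadoop and Spark-on-YARN configuration (verify the aux-services value against your own stack before applying):

```xml
<!-- Sketch only: confirm yarn.nodemanager.aux-services on your cluster
     already contains mapreduce_shuffle before appending spark_shuffle. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>86400</value>
</property>
```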
2. Save changes: Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated YARN configuration for Splice Machine. Click Save again.

3. Restart YARN: Return to the YARN Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart YARN. Confirm your action and then wait for the restart to complete.

Configure MapReduce2

Ambari automatically sets
these values for you:

- Map Memory
- Reduce Memory
- Sort Allocation Memory
- AppMaster Memory
- MR Map Java Heap Size
- MR Reduce Java Heap Size

Modify the HDP Version Information

Replace ${hdp.version} with the actual version number (e.g. 2.4.2.0-258) in these property values:

- mapreduce.admin.map.child.java.opts
- mapreduce.admin.reduce.child.java.opts
- mapreduce.admin.user.env
- mapreduce.application.classpath
- mapreduce.application.framework.path
- yarn.app.mapreduce.am.admin-command-opts
- MR AppMaster Java Heap Size

Configure and Restart HBase

To edit the HBase configuration, select the Services tab at the top of the Ambari dashboard screen, then click HBase in the left pane of the screen. Finally, click the Configs tab and make these changes:

1. Change the values of these settings:
- % of RegionServer Allocated to Write Buffer (hbase.regionserver.global.memstore.size): 0.25
- HBase RegionServer Maximum Memory (hbase_regionserver_heapsize): 24 GB
- % of RegionServer Allocated to Read Buffers (hfile.block.cache.size): 0.25
- HBase Master Maximum Memory (hbase_master_heapsize): 5 GB
- Number of Handlers per RegionServer (hbase.regionserver.handler.count): 400
- HBase RPC Timeout: 1200000 (20 minutes)
- Zookeeper Session Timeout: 120000 (2 minutes)
- hbase.coprocessor.master.classes: com.splicemachine.hbase.SpliceMasterObserver
- hbase.coprocessor.region.classes: the value of this property is shown below, in Step 2
- Maximum Store Files before Minor Compaction (hbase.hstore.compactionThreshold): 5
- Number of Fetched Rows when Scanning from Disk (hbase.client.scanner.caching): 1000
- hstore blocking storefiles (hbase.hstore.blockingStoreFiles): 20
- Advanced hbase-env: the value of this property is shown below, in Step 3
- Custom hbase-site: the value of this is shown below, in Step 4
2. Set the value of the hbase.coprocessor.region.classes property to the following:
com.splicemachine.hbase.MemstoreAwareObserver,com.splicemachine.derby.hbase.SpliceIndexObserver,com.splicemachine.derby.hbase.SpliceIndexEndpoint,com.splicemachine.hbase.RegionSizeEndpoint,com.splicemachine.si.data.hbase.coprocessor.TxnLifecycleEndpoint,com.splicemachine.si.data.hbase.coprocessor.SIObserver,com.splicemachine.hbase.BackupEndpointObserver
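If you ever need to apply this outside the Ambari UI, note that in raw hbase-site.xml form the class list must be a single comma-separated value with no embedded whitespace. A sketch of the property stanza (value copied from the list above):

```xml
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>com.splicemachine.hbase.MemstoreAwareObserver,com.splicemachine.derby.hbase.SpliceIndexObserver,com.splicemachine.derby.hbase.SpliceIndexEndpoint,com.splicemachine.hbase.RegionSizeEndpoint,com.splicemachine.si.data.hbase.coprocessor.TxnLifecycleEndpoint,com.splicemachine.si.data.hbase.coprocessor.SIObserver,com.splicemachine.hbase.BackupEndpointObserver</value>
</property>
```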
3. Replace the Advanced hbase-env property with the following:
# Set environment variables here.

# The java implementation to use. Java 1.6 required.
export JAVA_HOME={{java64_home}}

# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-{{hbase_conf_dir}}}

# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=${HBASE_CLASSPATH}

# Add Splice Machine to the HBase classpath
SPLICELIBDIR="/opt/splice/default/lib"
APPENDSTRING=$(echo $(find ${SPLICELIBDIR} -maxdepth 1 -name \*.jar | sort) | sed 's/ /:/g')
export HBASE_CLASSPATH="${HBASE_CLASSPATH}:${APPENDSTRING}"

# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000

# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:{{log_dir}}/gc.log-`date +'%Y%m%d%H%M'`"

# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# If you want to configure BucketCache, specify '-XX:MaxDirectMemorySize=' with proper direct memory size
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"

# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers

# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR={{log_dir}}

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR={{pid_dir}}

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false

{% if java_version < 8 %}
JDK_DEPENDED_OPTS="-XX:PermSize=512m -XX:MaxPermSize=512m"
{% endif %}

export HBASE_OPTS="${HBASE_OPTS} -XX:ErrorFile={{log_dir}}/hs_err_pid%p.log -Djava.io.tmpdir={{java_io_tmpdir}}"

export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Xms{{master_heapsize}} -Xmx{{master_heapsize}} ${JDK_DEPENDED_OPTS}
 -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.port=10101
 -Dsplice.spark.enabled=true
 -Dsplice.spark.app.name=SpliceMachine
 -Dsplice.spark.master=yarn-client
 -Dsplice.spark.logConf=true
 -Dsplice.spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory
 -Dsplice.spark.driver.maxResultSize=1g
 -Dsplice.spark.driver.memory=1g
 -Dsplice.spark.dynamicAllocation.enabled=true
 -Dsplice.spark.dynamicAllocation.executorIdleTimeout=600
 -Dsplice.spark.dynamicAllocation.minExecutors=0
 -Dsplice.spark.io.compression.lz4.blockSize=32k
 -Dsplice.spark.kryo.referenceTracking=false
 -Dsplice.spark.kryo.registrator=com.splicemachine.derby.impl.SpliceSparkKryoRegistrator
 -Dsplice.spark.kryoserializer.buffer.max=512m
 -Dsplice.spark.kryoserializer.buffer=4m
 -Dsplice.spark.locality.wait=100
 -Dsplice.spark.scheduler.mode=FAIR
 -Dsplice.spark.serializer=org.apache.spark.serializer.KryoSerializer
 -Dsplice.spark.shuffle.compress=false
 -Dsplice.spark.shuffle.file.buffer=128k
 -Dsplice.spark.shuffle.memoryFraction=0.7
 -Dsplice.spark.shuffle.service.enabled=true
 -Dsplice.spark.storage.memoryFraction=0.1
 -Dsplice.spark.yarn.am.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
 -Dsplice.spark.yarn.am.waitTime=10s
 -Dsplice.spark.yarn.executor.memoryOverhead=2048
 -Dsplice.spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties
 -Dsplice.spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
 -Dsplice.spark.driver.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar
 -Dsplice.spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties
 -Dsplice.spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
 -Dsplice.spark.executor.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar
 -Dsplice.spark.ui.retainedJobs=100
 -Dsplice.spark.ui.retainedStages=100
 -Dsplice.spark.worker.ui.retainedExecutors=100
 -Dsplice.spark.worker.ui.retainedDrivers=100
 -Dsplice.spark.streaming.ui.retainedBatches=100
 -Dsplice.spark.executor.cores=4
 -Dsplice.spark.executor.memory=8g
 -Dspark.compaction.reserved.slots=4
 -Dsplice.spark.eventLog.enabled=true
 -Dsplice.spark.eventLog.dir=hdfs:///user/splice/history
 -Dsplice.spark.local.dir=/diska/tmp,/diskb/tmp,/diskc/tmp,/diskd/tmp"

export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Xmn{{regionserver_xmn_size}} -Xms{{regionserver_heapsize}} -Xmx{{regionserver_heapsize}} ${JDK_DEPENDED_OPTS}
 -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxNewSize=4g
 -XX:InitiatingHeapOccupancyPercent=60 -XX:ParallelGCThreads=24 -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=5000
 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.port=10102"

{% if security_enabled %}
export HBASE_OPTS="${HBASE_OPTS} -Djava.security.auth.login.config={{client_jaas_config_file}}"
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Djava.security.auth.login.config={{master_jaas_config_file}}"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Djava.security.auth.login.config={{regionserver_jaas_config_file}}"
{% endif %}

# HBase off-heap MaxDirectMemorySize
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS {% if hbase_max_direct_memory_size %} -XX:MaxDirectMemorySize={{hbase_max_direct_memory_size}}m {% endif %}"
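The classpath-append logic near the top of that script (find piped through sort and sed) can be sanity-checked in isolation before touching the cluster. A minimal sketch, using a throwaway directory in place of /opt/splice/default/lib:

```shell
#!/bin/sh
# Sanity-check the HBASE_CLASSPATH append logic from hbase-env
# using a temporary directory instead of /opt/splice/default/lib.
SPLICELIBDIR=$(mktemp -d)
touch "${SPLICELIBDIR}/b.jar" "${SPLICELIBDIR}/a.jar" "${SPLICELIBDIR}/notes.txt"

# Same construction as the hbase-env snippet: jar files only,
# sorted, then joined with ':' separators.
APPENDSTRING=$(echo $(find ${SPLICELIBDIR} -maxdepth 1 -name \*.jar | sort) | sed 's/ /:/g')

# Expect two colon-separated .jar paths, a.jar before b.jar;
# notes.txt is excluded by the -name \*.jar filter.
echo "${APPENDSTRING}"

rm -rf "${SPLICELIBDIR}"
```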
4. Add the following properties to the Custom hbase-site configuration, with these values:
dfs.client.read.shortcircuit.buffer.size=131072
hbase.balancer.period=60000
hbase.client.ipc.pool.size=10
hbase.client.max.perregion.tasks=100
hbase.coprocessor.regionserver.classes=com.splicemachine.hbase.RegionServerLifecycleObserver
hbase.hstore.compaction.max.size=260046848
hbase.hstore.compaction.min.size=16777216
hbase.hstore.compaction.min=5
hbase.hstore.defaultengine.compactionpolicy.class=com.splicemachine.compactions.SpliceDefaultCompactionPolicy
hbase.hstore.defaultengine.compactor.class=com.splicemachine.compactions.SpliceDefaultCompactor
hbase.htable.threads.max=96
hbase.ipc.warn.response.size=-1
hbase.ipc.warn.response.time=-1
hbase.master.loadbalance.bytable=TRUE
hbase.mvcc.impl=org.apache.hadoop.hbase.regionserver.SIMultiVersionConsistencyControl
hbase.regions.slop=0.01
hbase.regionserver.global.memstore.size.lower.limit=0.9
hbase.regionserver.lease.period=1200000
hbase.regionserver.maxlogs=48
hbase.regionserver.thread.compaction.large=1
hbase.regionserver.thread.compaction.small=4
hbase.regionserver.wal.enablecompression=TRUE
hbase.splitlog.manager.timeout=3000
hbase.status.multicast.port=16100
hbase.wal.disruptor.batch=TRUE
hbase.wal.provider=multiwal
hbase.wal.regiongrouping.numgroups=16
hbase.zookeeper.property.tickTime=6000
hfile.block.bloom.cacheonwrite=TRUE
io.storefile.bloom.error.rate=0.005
splice.authentication.native.algorithm=SHA-512
splice.authentication=NATIVE
splice.client.numConnections=1
splice.client.write.maxDependentWrites=60000
splice.client.write.maxIndependentWrites=60000
splice.compression=snappy
splice.marshal.kryoPoolSize=1100
splice.olap_server.clientWaitTime=900000
splice.ring.bufferSize=131072
splice.splitBlockSize=67108864
splice.timestamp_server.clientWaitTime=120000
splice.txn.activeTxns.cacheSize=10240
splice.txn.completedTxns.concurrency=128
splice.txn.concurrencyLevel=4096
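If you ever need these entries outside Ambari (for review, or to paste into a raw hbase-site.xml), a small helper can turn key=value lines into XML property stanzas. A sketch; the props.txt filename in the usage note is our own choice, not part of the install:

```shell
#!/bin/sh
# Convert "key=value" lines (one per line, read from stdin)
# into hbase-site.xml <property> stanzas.
while IFS='=' read -r key value; do
  [ -n "$key" ] || continue    # skip blank lines
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
    "$key" "$value"
done
```

Usage would be along the lines of `sh kv2xml.sh < props.txt`, where props.txt holds the key=value lines above.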
5. Save changes: Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated HBase configuration for Splice Machine. Click Save again.

6. Restart HBase: Return to the HBase Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart HBase. Confirm your action and then wait for the restart to complete.

Optional Configuration Modifications

There are a few
configuration modifications you might want to make:

- Modify the Authentication Mechanism if you want to authenticate users with something other than the default native authentication mechanism.
- Adjust the Replication Factor if you have a small cluster and need to improve resource usage or performance.

Modify the Authentication Mechanism

Splice Machine installs with native authentication configured; native authentication uses the sys.sysusers table in the splice schema for configuring user names and passwords. You can disable authentication or change the authentication mechanism that Splice Machine uses to LDAP by following the simple instructions in Configuring Splice Machine Authentication.

Verify your Splice
Machine Installation

Start using the Splice Machine command line interpreter, which is referred to as the splice prompt or simply splice>, by launching the sqlshell.sh script on any node in your cluster that is running an HBase region server.

NOTE: The command line interpreter defaults to connecting on port 1527 on localhost, with username splice and password admin. You can override these defaults when starting the interpreter, as described in the Command Line (splice>) Reference topic in our Developer's Guide.

Here are a few sample commands you can run to verify that everything is working with your Splice Machine installation:
- Display tables: splice> show tables;
- Create a table: splice> create table test (i int);
- Add data to the table: splice> insert into test values 1,2,3,4,5;
- Query data in the table: splice> select * from test;
- Drop the table: splice> drop table test;
- Exit the command line interpreter: splice> exit;
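The same checks can be scripted rather than typed. A sketch that writes the verification commands to a file for feeding into sqlshell.sh; the /tmp/smoke-test.sql name is our choice, and whether your build's sqlshell.sh accepts redirected input this way should be verified before relying on it:

```shell
#!/bin/sh
# Write the verification commands to a scratch file
# (the file name is arbitrary).
cat > /tmp/smoke-test.sql <<'EOF'
show tables;
create table test (i int);
insert into test values 1,2,3,4,5;
select * from test;
drop table test;
exit;
EOF

# On a region server node, you would then run something like:
# sqlshell.sh < /tmp/smoke-test.sql
# (assumption: confirm your sqlshell.sh build reads SQL from stdin)
```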
Make sure you end each command with a semicolon (;), followed by the Enter key or Return key.
See the Command Line (splice>) Reference section
of our Developer's Guide for information about our commands
and command syntax.