Posted 10-12-2016 02:34 AM
This topic describes installing and configuring Splice Machine on a Hortonworks Ambari-managed cluster. Follow these steps:

1. Verify Prerequisites
2. Download and Install Splice Machine
3. Configure Hadoop Services
4. Make any needed Optional Configuration Modifications
5. Restart the Cluster
6. Verify your Splice Machine Installation

Verify Prerequisites

Before starting your Splice Machine installation, please make sure that your cluster contains the prerequisite software components:

- A cluster running HDP 2.4.2
- HBase installed
- HDFS installed
- YARN installed
- ZooKeeper installed

Download and Install Splice Machine

Perform the following
steps on each node in your cluster:

1. Create the splice installation directory:

sudo mkdir -p /opt/splice

2. Download the Splice Machine package into the splice directory on the node:

sudo curl 'https://s3.amazonaws.com/20snapshot/installer/2.0.1.23/hdp2.4.2/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz' -o /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz

3. Extract the Splice Machine package:

sudo tar -xf /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108.tar.gz --directory /opt/splice

4. Create symbolic links:

sudo ln -sf /opt/splice/SPLICEMACHINE-2.0.1.23-SNAPSHOT.hdp2.4.2.p0.108 /opt/splice/default
sudo ln -sf /opt/splice/default/bin/sqlshell.sh /usr/bin/sqlshell.sh

NOTE: This means that you can always access splice by simply entering sqlshell.sh on your command line.

sudo ln -sf /opt/splice/default/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar /usr/hdp/2.4.2.0-258/hadoop-yarn/lib/spark-assembly-hadoop2.7.1.2.4.2.0-258-1.6.2.jar

Configure Hadoop Services

Now it's time to make a few modifications in the Hadoop services configurations:

- Configure and Restart ZooKeeper
- Configure and Restart HDFS
- Configure and Restart YARN
- Configure MapReduce2
- Configure and Restart HBase

Configure and Restart ZooKeeper

To edit the ZooKeeper
configuration, select the Services tab at the top of the Ambari dashboard screen, then click ZooKeeper in the left pane of the screen.

1. Select the Configs tab to configure ZooKeeper.

2. Make configuration changes: Scroll down to Custom zoo.cfg and click Add Property to add the maxClientCnxns property, then again to add the maxSessionTimeout property, with these values:

maxClientCnxns = 0
maxSessionTimeout = 120000

3. Save changes: Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated ZooKeeper configuration for Splice Machine. Click Save again.

4. Restart ZooKeeper: After you save your changes, you'll land back on the ZooKeeper Service Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart ZooKeeper. Wait for the restart to complete.

Configure and Restart HDFS

To edit the
HDFS configuration, select the Services tab at the top of the Ambari dashboard screen, then click HDFS in the left pane of the screen. Finally, click the Configs tab.

1. Edit the HDFS configuration as follows:

- NameNode Java heap size: 4 GB
- DataNode maximum Java heap size: 2 GB
- Block replication: 2 (for clusters with fewer than 8 nodes) or 3 (for clusters with 8 or more nodes)

2. Update the custom hdfs-site.xml file with this property:

dfs.datanode.handler.count = 20

3. Save changes: Click the Save button
to save your changes. You'll be prompted to optionally add a note such as Updated HDFS configuration for Splice Machine. Click Save again.

4. Create directories for the hbase user and the Splice Machine YARN application: Use your terminal window to create these directories:

sudo -iu hdfs hadoop fs -mkdir -p hdfs:///user/hbase hdfs:///user/splice/history
sudo -iu hdfs hadoop fs -chown -R hbase:hbase hdfs:///user/hbase hdfs:///user/splice
sudo -iu hdfs hadoop fs -chmod 1777 hdfs:///user/splice hdfs:///user/splice/history

5. Restart HDFS: Return to the HDFS Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart HDFS. Confirm your action and then wait for the restart to complete.

Configure and Restart YARN

To edit the YARN
configuration, select the Services tab at the top of the Ambari dashboard screen, then click YARN in the left pane of the screen. Finally, click the Configs tab.

1. Update these configuration values:

- yarn.application.classpath:
$HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*,/usr/hdp/current/hadoop-mapreduce-client/*,/usr/hdp/current/hadoop-mapreduce-client/lib/*,/usr/hdp/current/hbase-regionserver/*,/usr/hdp/current/hbase-regionserver/lib/*,/opt/splice/default/lib/*
- yarn.nodemanager.aux-services.spark_shuffle.class: org.apache.spark.network.yarn.YarnShuffleService
- yarn.nodemanager.delete.debug-delay-sec: 86400
- Memory allocated for all YARN containers on a node: 30 GB (based on node specs)
- Minimum Container Size (Memory): 1 GB (based on node specs)
- Maximum Container Size (Memory): 30 GB (based on node specs)
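For reference, the spark_shuffle class setting above only takes effect if spark_shuffle is also listed in the yarn.nodemanager.aux-services property, which the steps above do not call out explicitly. A sketch of how these entries typically look in yarn-site.xml, per standard Hadoop and Spark-on-YARN configuration (verify the aux-services value against your own stack before applying):

```xml
<!-- Sketch only: confirm yarn.nodemanager.aux-services on your cluster
     already contains mapreduce_shuffle before appending spark_shuffle. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>86400</value>
</property>
```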
2. Save changes: Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated YARN configuration for Splice Machine. Click Save again.

3. Restart YARN: Return to the YARN Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart YARN. Confirm your action and then wait for the restart to complete.

Configure MapReduce2

Ambari automatically sets
these values for you:

- Map Memory
- Reduce Memory
- Sort Allocation Memory
- AppMaster Memory
- MR Map Java Heap Size
- MR Reduce Java Heap Size

Modify the HDP Version Information

Replace ${hdp.version} with the actual version number (e.g. 2.4.2.0-258) in these property values:

- mapreduce.admin.map.child.java.opts
- mapreduce.admin.reduce.child.java.opts
- mapreduce.admin.user.env
- mapreduce.application.classpath
- mapreduce.application.framework.path
- yarn.app.mapreduce.am.admin-command-opts
- MR AppMaster Java Heap Size

Configure and Restart HBase

To edit the HBase configuration, select the Services tab at the top of the Ambari dashboard screen, then click HBase in the left pane of the screen. Finally, click the Configs tab and make these changes:

1. Change the values of these settings:
- % of RegionServer Allocated to Write Buffer (hbase.regionserver.global.memstore.size): 0.25
- HBase RegionServer Maximum Memory (hbase_regionserver_heapsize): 24 GB
- % of RegionServer Allocated to Read Buffers (hfile.block.cache.size): 0.25
- HBase Master Maximum Memory (hbase_master_heapsize): 5 GB
- Number of Handlers per RegionServer (hbase.regionserver.handler.count): 400
- HBase RPC Timeout: 1200000 (20 minutes)
- Zookeeper Session Timeout: 120000 (2 minutes)
- hbase.coprocessor.master.classes: com.splicemachine.hbase.SpliceMasterObserver
- hbase.coprocessor.region.classes: the value of this property is shown below, in Step 2
- Maximum Store Files before Minor Compaction (hbase.hstore.compactionThreshold): 5
- Number of Fetched Rows when Scanning from Disk (hbase.client.scanner.caching): 1000
- hstore blocking storefiles (hbase.hstore.blockingStoreFiles): 20
- Advanced hbase-env: the value of this property is shown below, in Step 3
- Custom hbase-site: the value of this is shown below, in Step 4
2. Set the value of the hbase.coprocessor.region.classes property to the following:
com.splicemachine.hbase.MemstoreAwareObserver,com.splicemachine.derby.hbase.SpliceIndexObserver,com.splicemachine.derby.hbase.SpliceIndexEndpoint,com.splicemachine.hbase.RegionSizeEndpoint,com.splicemachine.si.data.hbase.coprocessor.TxnLifecycleEndpoint,com.splicemachine.si.data.hbase.coprocessor.SIObserver,com.splicemachine.hbase.BackupEndpointObserver
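If you ever need to apply this outside the Ambari UI, note that in raw hbase-site.xml form the class list must be a single comma-separated value with no embedded whitespace. A sketch of the property stanza (value copied from the list above):

```xml
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>com.splicemachine.hbase.MemstoreAwareObserver,com.splicemachine.derby.hbase.SpliceIndexObserver,com.splicemachine.derby.hbase.SpliceIndexEndpoint,com.splicemachine.hbase.RegionSizeEndpoint,com.splicemachine.si.data.hbase.coprocessor.TxnLifecycleEndpoint,com.splicemachine.si.data.hbase.coprocessor.SIObserver,com.splicemachine.hbase.BackupEndpointObserver</value>
</property>
```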
3. Replace the Advanced hbase-env property with the following:
# Set environment variables here.

# The java implementation to use. Java 1.6 required.
export JAVA_HOME={{java64_home}}

# HBase Configuration directory
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-{{hbase_conf_dir}}}

# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=${HBASE_CLASSPATH}

# Add Splice Machine to the HBase classpath
SPLICELIBDIR="/opt/splice/default/lib"
APPENDSTRING=$(echo $(find ${SPLICELIBDIR} -maxdepth 1 -name \*.jar | sort) | sed 's/ /:/g')
export HBASE_CLASSPATH="${HBASE_CLASSPATH}:${APPENDSTRING}"

# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000

# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:{{log_dir}}/gc.log-`date +'%Y%m%d%H%M'`"

# Uncomment below to enable java garbage collection logging.
# export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# If you want to configure BucketCache, specify '-XX:MaxDirectMemorySize=' with proper direct memory size
# export HBASE_THRIFT_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"

# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers

# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR={{log_dir}}

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR={{pid_dir}}

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false

{% if java_version < 8 %}
JDK_DEPENDED_OPTS="-XX:PermSize=512m -XX:MaxPermSize=512m"
{% endif %}

export HBASE_OPTS="${HBASE_OPTS} -XX:ErrorFile={{log_dir}}/hs_err_pid%p.log -Djava.io.tmpdir={{java_io_tmpdir}}"

export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Xms{{master_heapsize}} -Xmx{{master_heapsize}} ${JDK_DEPENDED_OPTS}
 -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.port=10101
 -Dsplice.spark.enabled=true
 -Dsplice.spark.app.name=SpliceMachine
 -Dsplice.spark.master=yarn-client
 -Dsplice.spark.logConf=true
 -Dsplice.spark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory
 -Dsplice.spark.driver.maxResultSize=1g
 -Dsplice.spark.driver.memory=1g
 -Dsplice.spark.dynamicAllocation.enabled=true
 -Dsplice.spark.dynamicAllocation.executorIdleTimeout=600
 -Dsplice.spark.dynamicAllocation.minExecutors=0
 -Dsplice.spark.io.compression.lz4.blockSize=32k
 -Dsplice.spark.kryo.referenceTracking=false
 -Dsplice.spark.kryo.registrator=com.splicemachine.derby.impl.SpliceSparkKryoRegistrator
 -Dsplice.spark.kryoserializer.buffer.max=512m
 -Dsplice.spark.kryoserializer.buffer=4m
 -Dsplice.spark.locality.wait=100
 -Dsplice.spark.scheduler.mode=FAIR
 -Dsplice.spark.serializer=org.apache.spark.serializer.KryoSerializer
 -Dsplice.spark.shuffle.compress=false
 -Dsplice.spark.shuffle.file.buffer=128k
 -Dsplice.spark.shuffle.memoryFraction=0.7
 -Dsplice.spark.shuffle.service.enabled=true
 -Dsplice.spark.storage.memoryFraction=0.1
 -Dsplice.spark.yarn.am.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
 -Dsplice.spark.yarn.am.waitTime=10s
 -Dsplice.spark.yarn.executor.memoryOverhead=2048
 -Dsplice.spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties
 -Dsplice.spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
 -Dsplice.spark.driver.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar
 -Dsplice.spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/etc/spark/conf/log4j.properties
 -Dsplice.spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native
 -Dsplice.spark.executor.extraClassPath=/usr/hdp/current/hbase-regionserver/conf:/usr/hdp/current/hbase-regionserver/lib/htrace-core-3.1.0-incubating.jar
 -Dsplice.spark.ui.retainedJobs=100
 -Dsplice.spark.ui.retainedStages=100
 -Dsplice.spark.worker.ui.retainedExecutors=100
 -Dsplice.spark.worker.ui.retainedDrivers=100
 -Dsplice.spark.streaming.ui.retainedBatches=100
 -Dsplice.spark.executor.cores=4
 -Dsplice.spark.executor.memory=8g
 -Dspark.compaction.reserved.slots=4
 -Dsplice.spark.eventLog.enabled=true
 -Dsplice.spark.eventLog.dir=hdfs:///user/splice/history
 -Dsplice.spark.local.dir=/diska/tmp,/diskb/tmp,/diskc/tmp,/diskd/tmp"

export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Xmn{{regionserver_xmn_size}} -Xms{{regionserver_heapsize}} -Xmx{{regionserver_heapsize}} ${JDK_DEPENDED_OPTS}
 -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=2g -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:MaxNewSize=4g
 -XX:InitiatingHeapOccupancyPercent=60 -XX:ParallelGCThreads=24 -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=5000
 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.port=10102"

{% if security_enabled %}
export HBASE_OPTS="${HBASE_OPTS} -Djava.security.auth.login.config={{client_jaas_config_file}}"
export HBASE_MASTER_OPTS="${HBASE_MASTER_OPTS} -Djava.security.auth.login.config={{master_jaas_config_file}}"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} -Djava.security.auth.login.config={{regionserver_jaas_config_file}}"
{% endif %}

# HBase off-heap MaxDirectMemorySize
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS {% if hbase_max_direct_memory_size %} -XX:MaxDirectMemorySize={{hbase_max_direct_memory_size}}m {% endif %}"
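The classpath-append logic near the top of that script (find piped through sort and sed) can be sanity-checked in isolation before touching the cluster. A minimal sketch, using a throwaway directory in place of /opt/splice/default/lib:

```shell
#!/bin/sh
# Sanity-check the HBASE_CLASSPATH append logic from hbase-env
# using a temporary directory instead of /opt/splice/default/lib.
SPLICELIBDIR=$(mktemp -d)
touch "${SPLICELIBDIR}/b.jar" "${SPLICELIBDIR}/a.jar" "${SPLICELIBDIR}/notes.txt"

# Same construction as the hbase-env snippet: jar files only,
# sorted, then joined with ':' separators.
APPENDSTRING=$(echo $(find ${SPLICELIBDIR} -maxdepth 1 -name \*.jar | sort) | sed 's/ /:/g')

# Expect two colon-separated .jar paths, a.jar before b.jar;
# notes.txt is excluded by the -name \*.jar filter.
echo "${APPENDSTRING}"

rm -rf "${SPLICELIBDIR}"
```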
4. Add the following properties to the Custom hbase-site configuration, with these values:
dfs.client.read.shortcircuit.buffer.size=131072
hbase.balancer.period=60000
hbase.client.ipc.pool.size=10
hbase.client.max.perregion.tasks=100
hbase.coprocessor.regionserver.classes=com.splicemachine.hbase.RegionServerLifecycleObserver
hbase.hstore.compaction.max.size=260046848
hbase.hstore.compaction.min.size=16777216
hbase.hstore.compaction.min=5
hbase.hstore.defaultengine.compactionpolicy.class=com.splicemachine.compactions.SpliceDefaultCompactionPolicy
hbase.hstore.defaultengine.compactor.class=com.splicemachine.compactions.SpliceDefaultCompactor
hbase.htable.threads.max=96
hbase.ipc.warn.response.size=-1
hbase.ipc.warn.response.time=-1
hbase.master.loadbalance.bytable=TRUE
hbase.mvcc.impl=org.apache.hadoop.hbase.regionserver.SIMultiVersionConsistencyControl
hbase.regions.slop=0.01
hbase.regionserver.global.memstore.size.lower.limit=0.9
hbase.regionserver.lease.period=1200000
hbase.regionserver.maxlogs=48
hbase.regionserver.thread.compaction.large=1
hbase.regionserver.thread.compaction.small=4
hbase.regionserver.wal.enablecompression=TRUE
hbase.splitlog.manager.timeout=3000
hbase.status.multicast.port=16100
hbase.wal.disruptor.batch=TRUE
hbase.wal.provider=multiwal
hbase.wal.regiongrouping.numgroups=16
hbase.zookeeper.property.tickTime=6000
hfile.block.bloom.cacheonwrite=TRUE
io.storefile.bloom.error.rate=0.005
splice.authentication.native.algorithm=SHA-512
splice.authentication=NATIVE
splice.client.numConnections=1
splice.client.write.maxDependentWrites=60000
splice.client.write.maxIndependentWrites=60000
splice.compression=snappy
splice.marshal.kryoPoolSize=1100
splice.olap_server.clientWaitTime=900000
splice.ring.bufferSize=131072
splice.splitBlockSize=67108864
splice.timestamp_server.clientWaitTime=120000
splice.txn.activeTxns.cacheSize=10240
splice.txn.completedTxns.concurrency=128
splice.txn.concurrencyLevel=4096
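If you ever need these entries outside Ambari (for review, or to paste into a raw hbase-site.xml), a small helper can turn key=value lines into XML property stanzas. A sketch; the props.txt filename in the usage note is our own choice, not part of the install:

```shell
#!/bin/sh
# Convert "key=value" lines (one per line, read from stdin)
# into hbase-site.xml <property> stanzas.
while IFS='=' read -r key value; do
  [ -n "$key" ] || continue    # skip blank lines
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
    "$key" "$value"
done
```

Usage would be along the lines of `sh kv2xml.sh < props.txt`, where props.txt holds the key=value lines above.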
5. Save changes: Click the Save button to save your changes. You'll be prompted to optionally add a note such as Updated HBase configuration for Splice Machine. Click Save again.

6. Restart HBase: Return to the HBase Configs tab in Ambari. Click the Restart drop-down in the upper right corner and select the Restart All action to restart HBase. Confirm your action and then wait for the restart to complete.

Optional Configuration Modifications

There are a few
configuration modifications you might want to make:

- Modify the Authentication Mechanism if you want to authenticate users with something other than the default native authentication mechanism.
- Adjust the Replication Factor if you have a small cluster and need to improve resource usage or performance.

Modify the Authentication Mechanism

Splice Machine installs with native authentication configured; native authentication uses the sys.sysusers table in the splice schema for configuring user names and passwords. You can disable authentication or change the authentication mechanism that Splice Machine uses to LDAP by following the simple instructions in Configuring Splice Machine Authentication.

Verify your Splice
Machine Installation

Start using the Splice Machine command line interpreter, which is referred to as the splice prompt or simply splice>, by launching the sqlshell.sh script on any node in your cluster that is running an HBase region server.

NOTE: The command line interpreter defaults to connecting on port 1527 on localhost, with username splice and password admin. You can override these defaults when starting the interpreter, as described in the Command Line (splice>) Reference topic in our Developer's Guide.

Here are a few sample commands you can run to verify that everything is working with your Splice Machine installation:
- Display tables: splice> show tables;
- Create a table: splice> create table test (i int);
- Add data to the table: splice> insert into test values 1,2,3,4,5;
- Query data in the table: splice> select * from test;
- Drop the table: splice> drop table test;
- Exit the command line interpreter: splice> exit;
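The same checks can be scripted rather than typed. A sketch that writes the verification commands to a file for feeding into sqlshell.sh; the /tmp/smoke-test.sql name is our choice, and whether your build's sqlshell.sh accepts redirected input this way should be verified before relying on it:

```shell
#!/bin/sh
# Write the verification commands to a scratch file
# (the file name is arbitrary).
cat > /tmp/smoke-test.sql <<'EOF'
show tables;
create table test (i int);
insert into test values 1,2,3,4,5;
select * from test;
drop table test;
exit;
EOF

# On a region server node, you would then run something like:
# sqlshell.sh < /tmp/smoke-test.sql
# (assumption: confirm your sqlshell.sh build reads SQL from stdin)
```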
Make sure you end each command with a semicolon (;), followed by the Enter key or Return key.
See the Command Line (splice>) Reference section
of our Developer's Guide for information about our commands
and command syntax.