Member since: 02-02-2016
Posts: 583
Kudos Received: 518
Solutions: 98
My Accepted Solutions
Views | Posted
---|---
3173 | 09-16-2016 11:56 AM
1354 | 09-13-2016 08:47 PM
5343 | 09-06-2016 11:00 AM
3083 | 08-05-2016 11:51 AM
5166 | 08-03-2016 02:58 PM
05-13-2016
06:44 PM
Page 20 of the PDF explains how to enable further logging: ODBC User Guide for Hive.
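From memory, driver-side logging boils down to two keys in the driver's .ini configuration; the exact file path and key names can differ by driver version, so treat this as a sketch and confirm against the guide:
# [Driver] section of hortonworks.hiveodbc.ini (install path varies by version)
LogLevel=6        # 0 = logging off, 6 = most verbose (trace)
LogPath=/tmp/hive-odbc-logs   # directory must exist and be writable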
05-12-2016
08:37 AM
OK, thanks! Adding this parameter works for me. Here is the resulting spark-env.sh:
#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
MASTER="yarn-cluster"
# Options read in YARN client mode
SPARK_EXECUTOR_INSTANCES="3" #Number of workers to start (Default: 2)
#SPARK_EXECUTOR_CORES="1" #Number of cores for the workers (Default: 1).
#SPARK_EXECUTOR_MEMORY="1G" #Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
#SPARK_DRIVER_MEMORY="512 Mb" #Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
#SPARK_YARN_APP_NAME="spark" #The name of your application (Default: Spark)
#SPARK_YARN_QUEUE="default" #The hadoop queue to use for allocation requests (Default: 'default')
#SPARK_YARN_DIST_FILES="" #Comma separated list of files to be distributed with the job.
#SPARK_YARN_DIST_ARCHIVES="" #Comma separated list of archives to be distributed with the job.
# Generic options for the daemons used in the standalone deploy mode
# Alternate conf dir. (Default: ${SPARK_HOME}/conf)
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-{{spark_home}}/conf}
# Where log files are stored.(Default:${SPARK_HOME}/logs)
#export SPARK_LOG_DIR=${SPARK_HOME:-{{spark_home}}}/logs
export SPARK_LOG_DIR={{spark_log_dir}}
# Where the pid file is stored. (Default: /tmp)
export SPARK_PID_DIR={{spark_pid_dir}}
# A string representing this instance of spark.(Default: $USER)
SPARK_IDENT_STRING=$USER
# The scheduling priority for daemons. (Default: 0)
SPARK_NICENESS=0
export HADOOP_HOME=${HADOOP_HOME:-{{hadoop_home}}}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-{{hadoop_conf_dir}}}
# The java implementation to use.
export JAVA_HOME={{java_home}}
if [ -d "/etc/tez/conf/" ]; then
export TEZ_CONF_DIR=/etc/tez/conf
else
export TEZ_CONF_DIR=
fi
PS: it works well, but the parameters passed on the command line (e.g. --num-executors 8 --executor-cores 4 --executor-memory 2G) do not seem to be taken into consideration. Instead, if I set the executors in the "spark-env template" field of Ambari, the parameters are taken into consideration. Anyway, it works now 🙂 Thanks a lot.
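For comparison, a minimal spark-submit invocation that passes the same settings on the command line might look like this (the application class and jar are placeholders):
spark-submit \
  --master yarn-cluster \
  --num-executors 8 \
  --executor-cores 4 \
  --executor-memory 2G \
  --class com.example.MyApp \
  /tmp/my-app.jar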
05-12-2016
04:18 PM
@JR Cao please accept my post as an answer if you are good with the provided information.
04-26-2016
09:55 AM
@JR Cao Thanks for the update. I think you don't need to specify spark-env since you already have --deploy-mode client.
04-14-2016
05:04 PM
I don't use any compression in the Sqoop command, yet it still stores the output in .deflate format. This only happens with Teradata, as I am using the Teradata connector for HDP 2.3.4.0.
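For context, a hedged sketch of a Sqoop import with MapReduce output compression explicitly disabled (the connection string, credentials, table, and target directory are placeholders, and the Teradata connection-manager class name may vary by connector version):
sqoop import \
  -Dmapreduce.output.fileoutputformat.compress=false \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --connect jdbc:teradata://td-host/DATABASE=mydb \
  --username etl_user -P \
  --table MY_TABLE \
  --target-dir /data/my_table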
04-17-2016
06:59 PM
@Benjamin Leonhardi - This was indeed part of the reason. Thank you very much for your help!
11-03-2016
04:48 AM
@Saurabh Try doing: set hive.exec.scratchdir=/new_dir;
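This can also be passed when starting Beeline for a per-session override (a sketch; the JDBC URL and /new_dir are placeholders, and the directory must be writable by the Hive user):
beeline -u jdbc:hive2://localhost:10000 --hiveconf hive.exec.scratchdir=/new_dir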
03-30-2016
11:40 PM
I got it to work with the following, using the repo I linked earlier:
hdfs dfs -put drivers/* /tmp/udfs
beeline
!connect jdbc:hive2://localhost:10000 "" ""
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-hive-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-core-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongodb-driver-3.0.4.jar;
DROP TABLE IF EXISTS bars;
CREATE EXTERNAL TABLE bars
(
objectid STRING,
Symbol STRING,
TS STRING,
Day INT,
Open DOUBLE,
High DOUBLE,
Low DOUBLE,
Close DOUBLE,
Volume INT
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"objectid":"_id",
"Symbol":"Symbol", "TS":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minibars');
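Once the table exists, a quick sanity check from the shell (same connection string as above; column names follow the DDL):
beeline -u jdbc:hive2://localhost:10000 -e "SELECT Symbol, TS, Close FROM bars LIMIT 5;"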
03-16-2016
07:06 AM
3 Kudos
In general, ZooKeeper doesn't require huge drives because it only stores metadata for the services it coordinates. I have seen customers use 100 GB to 250 GB partitions for the ZooKeeper data directory and logs, which is fine for many cluster deployments. Administrators should also configure the automatic purging policy for the snapshot and transaction log directories so that local storage doesn't fill up. Please refer to the doc below for more info. http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html
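For reference, a hedged sketch of the relevant zoo.cfg settings (the paths and retention values below are placeholders, not recommendations):
# zoo.cfg
dataDir=/hadoop/zookeeper          # snapshot directory
dataLogDir=/hadoop/zookeeper/logs  # transaction log directory
# keep the 30 most recent snapshots and purge the rest every 24 hours
autopurge.snapRetainCount=30
autopurge.purgeInterval=24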