Member since 02-02-2016

583 Posts | 518 Kudos Received | 98 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4182 | 09-16-2016 11:56 AM |
|  | 1728 | 09-13-2016 08:47 PM |
|  | 6915 | 09-06-2016 11:00 AM |
|  | 4154 | 08-05-2016 11:51 AM |
|  | 6227 | 08-03-2016 02:58 PM |

05-13-2016 06:44 PM

Page 20 of the PDF explains how to further enable logging: ODBC user guide for HIVE.
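
For reference, driver-side logging is usually switched on in the driver's .ini file. A minimal sketch, assuming the Simba-based Hortonworks Hive ODBC driver; the ini path below is a guess for a default HDP install, so confirm both the path and the key names against the guide for your version:

```bash
# Hypothetical sketch: the log directory must exist and be writable.
mkdir -p /tmp/hiveodbc-logs
# Merge these keys into the existing [Driver] section of the driver's ini
# file; the path below varies by install and is an assumption here.
sudo tee -a /usr/lib/hive/lib/native/Linux-amd64-64/hortonworks.hiveodbc.ini <<'EOF'
[Driver]
# 0 = logging off ... 6 = trace (logs every driver call)
LogLevel=6
# Directory where the driver writes its log files
LogPath=/tmp/hiveodbc-logs
EOF
```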

05-12-2016 08:37 AM

Ok, thanks! Adding this param seems to work for me.

```bash
#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
MASTER="yarn-cluster"
# Options read in YARN client mode
SPARK_EXECUTOR_INSTANCES="3" #Number of workers to start (Default: 2)
#SPARK_EXECUTOR_CORES="1" #Number of cores for the workers (Default: 1).
#SPARK_EXECUTOR_MEMORY="1G" #Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
#SPARK_DRIVER_MEMORY="512m" #Memory for Master (e.g. 1000M, 2G) (Default: 512m)
#SPARK_YARN_APP_NAME="spark" #The name of your application (Default: Spark)
#SPARK_YARN_QUEUE="default" #The hadoop queue to use for allocation requests (Default: 'default')
#SPARK_YARN_DIST_FILES="" #Comma separated list of files to be distributed with the job.
#SPARK_YARN_DIST_ARCHIVES="" #Comma separated list of archives to be distributed with the job.
# Generic options for the daemons used in the standalone deploy mode
# Alternate conf dir. (Default: ${SPARK_HOME}/conf)
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-{{spark_home}}/conf}
# Where log files are stored.(Default:${SPARK_HOME}/logs)
#export SPARK_LOG_DIR=${SPARK_HOME:-{{spark_home}}}/logs
export SPARK_LOG_DIR={{spark_log_dir}}
# Where the pid file is stored. (Default: /tmp)
export SPARK_PID_DIR={{spark_pid_dir}}
# A string representing this instance of spark.(Default: $USER)
SPARK_IDENT_STRING=$USER
# The scheduling priority for daemons. (Default: 0)
SPARK_NICENESS=0
export HADOOP_HOME=${HADOOP_HOME:-{{hadoop_home}}}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-{{hadoop_conf_dir}}}
# The java implementation to use.
export JAVA_HOME={{java_home}}
if [ -d "/etc/tez/conf/" ]; then
  export TEZ_CONF_DIR=/etc/tez/conf
else
  export TEZ_CONF_DIR=
fi
```

PS: it works well, but the params passed on the command line (e.g. --num-executors 8 --executor-cores 4 --executor-memory 2G) don't seem to be taken into consideration. Instead, if I set the executors in the "spark-env template" field of Ambari, the params are picked up. Anyway, now it works 🙂 Thanks a lot.
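
For comparison, a minimal spark-submit invocation passing the same resources on the command line (the class and jar names are placeholders); such flags normally take precedence over spark-env.sh defaults:

```bash
# Hypothetical sketch: request the same executor resources at submit time.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 8 \
  --executor-cores 4 \
  --executor-memory 2G \
  --class com.example.MyApp \
  my-app.jar
```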

05-12-2016 04:18 PM

@JR Cao Please accept my post as the answer if the provided information works for you.

04-26-2016 09:55 AM

@JR Cao Thanks for the update. I think you don't need to specify spark-env, since you already have --deploy-mode client.

04-14-2016 05:04 PM

I don't use any compression in the Sqoop command, yet it still stores the output in .deflate format. This only happens with Teradata, as I am using the Teradata connector for HDP 2.3.4.0.
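
For context, a minimal sketch of the kind of import in question (the host, database, table, and credentials are placeholders; the connection-manager class is the one I believe ships with the Hortonworks Connector for Teradata):

```bash
# Hypothetical sketch: no --compress flag is given, yet with the Teradata
# connector the files written to --target-dir still come out as .deflate.
sqoop import \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --connect jdbc:teradata://td-host/DATABASE=mydb \
  --username user -P \
  --table MY_TABLE \
  --target-dir /data/my_table
```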

04-17-2016 06:59 PM

@Benjamin Leonhardi - This was indeed part of the reason. Thank you very much for your help!

11-03-2016 04:48 AM

@Saurabh Try doing: set hive.exec.scratchdir=/new_dir
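
For example, the same setting can be passed per-connection from Beeline (a sketch; /new_dir is a placeholder and must exist in HDFS with permissions the querying users can write to):

```bash
# Hypothetical sketch: create the new scratch dir, then open a session
# that uses it. A session-level "set" as in the post works the same way.
hdfs dfs -mkdir -p /new_dir
hdfs dfs -chmod 733 /new_dir
beeline -u jdbc:hive2://localhost:10000 --hiveconf hive.exec.scratchdir=/new_dir
```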

03-30-2016 11:40 PM

I got it to work with the following, from the repo I linked earlier:

```
hdfs dfs -put drivers/* /tmp/udfs
beeline
!connect jdbc:hive2://localhost:10000 "" ""
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-hive-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-core-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongodb-driver-3.0.4.jar;
DROP TABLE IF EXISTS bars;
CREATE EXTERNAL TABLE bars
(
    objectid STRING,
    Symbol STRING,
    TS STRING,
    Day INT,
    Open DOUBLE,
    High DOUBLE,
    Low DOUBLE,
    Close DOUBLE,
    Volume INT
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"objectid":"_id",
 "Symbol":"Symbol", "TS":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minibars');
```
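
A quick smoke test that the Mongo-backed table is queryable (a sketch, using the same connection URL as in the snippet above):

```bash
# Hypothetical check: read a few rows through the MongoStorageHandler table.
beeline -u jdbc:hive2://localhost:10000 \
  -e "SELECT Symbol, TS, Close FROM bars LIMIT 5;"
```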

03-16-2016 07:06 AM

3 Kudos

In general, ZooKeeper doesn't actually require huge drives, because it only stores metadata for the services it coordinates. I have seen customers use 100G to 250G partitions for the ZooKeeper data directory and logs, which is fine for many cluster deployments. Administrators should also configure an automatic purging policy for the snapshot and log directories so the local storage doesn't fill up. Please refer to the doc below for more info:

http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html
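
The purging policy mentioned above maps to ZooKeeper's autopurge settings in zoo.cfg. A minimal sketch (the config path below is the usual HDP location; adjust for your install, and restart the ZooKeeper service afterwards):

```bash
# Keep the 3 most recent snapshots and purge older ones every 24 hours.
# autopurge.purgeInterval is in hours; its default of 0 disables purging.
sudo tee -a /etc/zookeeper/conf/zoo.cfg <<'EOF'
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
EOF
```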