Member since
10-24-2015
171
Posts
379
Kudos Received
23
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2622 | 06-26-2018 11:35 PM
 | 4335 | 06-12-2018 09:19 PM
 | 2869 | 02-01-2018 08:55 PM
 | 1432 | 01-02-2018 09:02 PM
 | 6729 | 09-06-2017 06:29 PM
03-22-2017
05:33 AM
1 Kudo
@Mateusz Grabowski, ideally other jobs running in a separate queue (for example, streaming) should not affect your Zeppelin processes. You can set a maximum-capacity limit on the q_apr_general queue to make sure that at least 60% of the resources stay reserved for the default queue (set yarn.scheduler.capacity.root.q_apr_general.maximum-capacity=40). Reference for capacity scheduler configuration: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_performance_tuning/content/section_create_configure_yarn_capacity_scheduler_queues.html
Regarding Spark SQL execution time, there have been a few reports of slow execution with Spark SQL via Zeppelin. The Apache JIRA tracking this issue is ZEPPELIN-323: https://community.hortonworks.com/questions/33484/spark-sql-query-execution-is-very-very-slow-when-c.html
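As a sketch, the setting above would go into capacity-scheduler.xml (the q_apr_general queue name comes from your setup; 40 is the value quoted above, leaving 60% for the default queue):

```xml
<!-- Cap q_apr_general at 40% of the cluster so other queues keep at least 60% -->
<property>
  <name>yarn.scheduler.capacity.root.q_apr_general.maximum-capacity</name>
  <value>40</value>
</property>
```

After changing the config, refresh the queues (e.g. via yarn rmadmin -refreshQueues) for it to take effect.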
03-22-2017
12:55 AM
1 Kudo
@Sree Kupp,
1. "Both the Spark Thrift Servers keep failing suddenly out of the blue. I am not sure if it is some configuration issue (like not having enough heap size, so even though it starts up when I start it, eventually it fails)." -- A cluster can have the Spark 1 and Spark 2 Thrift Servers running together. Are the Spark 1 and Spark 2 Thrift Servers deployed on the same host? Can you please check the error message for the Spark Thrift Server failure?
2. "Can I have both the Sparks running simultaneously? Or will that cause any memory overload on the cluster?" -- Yes, you can have both Spark versions running simultaneously. Regarding memory overload: if you are using yarn-client or yarn-cluster mode to run the Spark applications, they will not overload the client machine's memory.
3. "In the ODBC Driver DSN setup, when I click on the 'Test' option, sometimes it fails even when the Thrift Server is up and running. The error is: '[Hortonworks][Hardy] (34) Error from server: connect() failed: errno = 10061.'" -- I found a few good links for handling this issue; it seems many people have hit something similar. I hope this helps.
http://kb.tableau.com/articles/issue/error-connect-failed-hadoop-hive
https://community.hortonworks.com/questions/33046/hortonworks-hive-odbc-driver-dsn-setup.html
https://community.hortonworks.com/questions/10192/facing-issue-with-odbc-connection.html
03-19-2017
08:25 PM
4 Kudos
@shiremath, I found a few blogs that may help: "Fault Injection and Elastic Partitioning", "Hadoop code injection", and "distributed fault injection".
03-19-2017
08:03 PM
5 Kudos
@Ward Bekker, first find the correct Spark configuration to occupy the full cluster. You will need to tune the number of executors, executor cores, executor memory, driver memory, etc. References:
https://community.hortonworks.com/questions/56240/spark-num-executors-setting.html
http://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory
After figuring out the correct configs, you can use one of the approaches below to set up the Zeppelin and Livy interpreters.
1) Set SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh to specify the number of executors, executor cores, memory, driver memory, etc. (this config applies to all Spark and Livy interpreters):
export SPARK_SUBMIT_OPTIONS="--num-executors X --executor-cores Y --executor-memory Z"
2) Set the configs in the Livy interpreter. Open the Livy interpreter page and add the configs below:
livy.spark.executor.instances X
livy.spark.executor.cores Y
livy.spark.executor.memory Z
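For illustration, a filled-in zeppelin-env.sh entry might look like the sketch below. The numbers are placeholders, not recommendations -- size them for your cluster using the references above:

```shell
# Placeholder sizing: 10 executors, 4 cores and 8g of memory each.
# Replace with values derived from your cluster's node count and RAM.
export SPARK_SUBMIT_OPTIONS="--num-executors 10 --executor-cores 4 --executor-memory 8g"
echo "$SPARK_SUBMIT_OPTIONS"
```

Restart the interpreter after editing zeppelin-env.sh so the new options are picked up.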
03-17-2017
12:30 AM
1 Kudo
I found one tutorial for Azure VMs. There they use "ssh root@127.0.0.1 -p 2222". Can you try that? https://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
03-16-2017
11:59 PM
1 Kudo
yes, that sounds right.
03-16-2017
10:54 PM
1 Kudo
@Suzanne Dimant, also make sure you SSH into the Docker container, not the VM. Refer to:
https://community.hortonworks.com/questions/68334/-bash-ambari-admin-password-reset-command-not-foun.html
https://community.hortonworks.com/questions/58247/hdp-25-sandboxvm-commandsscripts-are-not-found.html
03-16-2017
10:51 PM
1 Kudo
In order for ambari-agent-password-reset to work, the agent should be running fine. Can you please check the Ambari agent logs? You can find them at /var/log/ambari-agent. Let's check whether they contain any errors or exceptions.
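A quick way to surface errors is to grep the agent log. The sample log below is fabricated purely to demonstrate the pattern; on the sandbox the real file lives under /var/log/ambari-agent:

```shell
# Fabricated sample log, only to illustrate the grep pattern below.
LOG=/tmp/ambari-agent-sample.log
cat > "$LOG" <<'EOF'
INFO 2017-03-16 22:40:01 main.py:74 - Agent started
ERROR 2017-03-16 22:40:05 security.py:93 - Connection to server failed
EOF

# Pull errors and exceptions out of the agent log.
grep -E 'ERROR|Exception' "$LOG"
```

On the real system, point the grep at /var/log/ambari-agent/ambari-agent.log instead.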
03-16-2017
09:50 PM
1 Kudo
@Suzanne Dimant, from the output of ps -ef | grep ambari, it seems the Ambari server (pid=4772) and agent (pid=7473) are running. A few issues regarding Ambari login have been noticed in the HDP 2.5 sandbox. Please follow the HCC thread below:
https://community.hortonworks.com/questions/57064/hdp25-on-virtualbox-and-ambari-login-url.html
03-16-2017
09:09 PM
6 Kudos
@Faisal R Ahamed, you should use spark-submit to run this application. When running it, specify --master yarn and --deploy-mode cluster. Setting this in the Spark conf is too late to switch to yarn-cluster mode:
spark-submit --class <classname> --master yarn --deploy-mode cluster <jars> <args>
https://www.mail-archive.com/user@spark.apache.org/msg57869.html