Member since: 04-20-2016
Posts: 61
Kudos Received: 17
Solutions: 13
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 886 | 06-30-2017 01:16 PM |
 | 890 | 06-30-2017 01:03 PM |
 | 1129 | 06-30-2017 12:50 PM |
 | 1132 | 06-30-2017 12:40 PM |
 | 13183 | 06-30-2017 12:36 PM |
07-13-2016
12:45 PM
Did your Spark job fail? These messages can be caused by Spark dynamic allocation, possibly the release of an executor. Maybe resources are not free on YARN, or containers timed out. Are there any other error messages in the log? You can pull the full YARN application logs with the command below.
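A quick way to get the aggregated container logs for the run (the application id is a placeholder to replace with your own):
# pull the aggregated container logs for the failed job
yarn logs -applicationId <application_id>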
07-13-2016
11:52 AM
It happens during a pause in long-running jobs on a large data set. As per the logs, during a shuffle step an executor fails and doesn't report its output; then, during the reduce step, that output can't be found where expected, and rather than rerunning the failed execution, Spark goes down.
Try reducing the parallelism to executors x cores, as in the sketch below.
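A minimal sketch of what that looks like at submit time, assuming 5 executors with 4 cores each; all the numbers and the jar name are placeholders:
# 5 executors x 4 cores => parallelism of 20 (placeholder numbers)
spark-submit --master yarn \
  --num-executors 5 --executor-cores 4 --executor-memory 4G \
  --conf spark.default.parallelism=20 \
  --conf spark.sql.shuffle.partitions=20 \
  your_app.jar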
06-07-2016
06:47 AM
Good info @Rajkumar Singh. Since HBase provides a storage handler for Hive, what other storage handlers do we have for Hive? As per the docs: Cassandra, JDBC, MongoDB, and Google Spreadsheets. A minimal HBase example is sketched below.
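For reference, a minimal sketch of a Hive table backed by the HBase storage handler; the table name, columns, and the column-family mapping are placeholders:
CREATE TABLE hbase_users (id INT, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "users");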
06-07-2016
06:23 AM
I was reading about Presto, where a single Presto query can process data from multiple sources, e.g. HDFS, MySQL, Cassandra, or even Kafka.
In Presto you can define objects called 'catalogs' which point to remote data sources (see the sketch below).
Do we have such a mechanism in Hive to process data from multiple sources? Also, can we access another Hive table (from a remote source) in the same beeline connection?
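For comparison, a Presto catalog is just a properties file under etc/catalog/; a minimal sketch for a MySQL source, where the host, port, and credentials are placeholders:
# etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://mysql-host:3306
connection-user=presto_user
connection-password=secret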
Labels:
- Apache Hive
05-30-2016
05:03 AM
What is the replication factor defined? One datanode being down should not cause corruption. Were other datanodes also down at the same time? The commands below may help you check.
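A few commands that can help confirm the replication factor and block health; the paths are placeholders:
# check replication and list any corrupt or under-replicated blocks
hdfs fsck /user/data -files -blocks -locations
# show the replication factor of a single file
hdfs dfs -stat %r /user/data/part-00000
# list live and dead datanodes
hdfs dfsadmin -report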
05-28-2016
10:59 AM
Yes @Jitendra Yadav, I can see the same in the logs: spark.executor.instances overrides the dynamic allocation properties. But my question is where we should define the dynamic allocation settings, in spark-defaults.conf or spark-thrift-sparkconf.conf?
05-27-2016
10:53 AM
Hi All, I was testing dynamic resource allocation in Spark. By default I see that "spark-thrift-sparkconf.conf" contains all the dynamic allocation properties. But when I ran the Spark job "spark-shell --master yarn --num-executors 5 --executor-memory 3G", I expected it to complain, as I requested the number of executors in the job itself. Then I modified the custom spark-defaults.conf and added the dynamic allocation properties:
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.maxExecutors 5
spark.dynamicAllocation.minExecutors 1
And when I run the same job, I see the message below:
16/05/23 09:18:54 WARN SparkContext: Dynamic Allocation and num executors both set, thus dynamic allocation disabled.
It also prints the messages below when more resources are needed. My doubt is: is dynamic allocation enabled by default? In which config should we define the dynamic allocation properties?
16/05/23 09:39:47 INFO ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 4)
16/05/23 09:39:48 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 5)
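Based on that warning, a minimal sketch of letting dynamic allocation size the executors, assuming the properties live in spark-defaults.conf and the external shuffle service is enabled; the memory value is a placeholder:
# spark-defaults.conf
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 5
spark.shuffle.service.enabled true
# then launch without --num-executors, so spark.executor.instances is not set
# and dynamic allocation stays enabled:
spark-shell --master yarn --executor-memory 3G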
Labels:
- Apache Spark
05-25-2016
11:54 AM
HDFS can be accessed in R by specifying the namenode host (hdfs://<hostname>:/user/test), but in case of a namenode failover that won't work. So we should set the HDFS config directory in R so it picks up the Hadoop namenode and failover configuration. Set the Spark home and Hadoop config directories in the R environment as below:
# set up SPARK_HOME
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
# set up the Hadoop config dirs so the HA nameservice can be resolved
Sys.setenv(YARN_CONF_DIR="/usr/hdp/current/hadoop-client/conf")
Sys.setenv(HADOOP_CONF_DIR="/usr/hdp/current/hadoop-client/conf")
# initialize SparkR with the spark-csv package
sc = sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
sqlContext = sparkRSQL.init(sc)
# read the data file via the HA nameservice instead of a single namenode host
people = read.df(sqlContext, "hdfs://HDFS-HA/users/people.json", "json")
head(people)
05-05-2016
07:32 AM
3 Kudos
Hi Simarn, it seems like you have defined "org.apache.spark.yarn.network.YarnShuffleService" in your yarn-site.xml, but the jar containing that class, "spark-<version>-yarn-shuffle.jar", is missing from your NodeManager classpath. Please add it, similar to the mapreduce_shuffle class; see the sketch below.
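For reference, a minimal sketch of the NodeManager side of that setup; the class name below is the one documented for the Spark YARN shuffle service, and the jar location is an assumption to adjust for your install:
<!-- yarn-site.xml on each NodeManager -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<!-- copy spark-<version>-yarn-shuffle.jar onto the NodeManager classpath,
     e.g. /usr/hdp/current/hadoop-yarn-nodemanager/lib/ (assumed path), then restart the NodeManagers -->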
04-29-2016
07:21 AM
Can you please provide the JDBC connection string?
Are you giving something like this: jdbc:sqlserver://xx.xx.x.xxx:1433;databaseName=Test