Member since: 04-20-2016
Posts: 61
Kudos Received: 17
Solutions: 13
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 886 | 06-30-2017 01:16 PM |
 | 890 | 06-30-2017 01:03 PM |
 | 1129 | 06-30-2017 12:50 PM |
 | 1132 | 06-30-2017 12:40 PM |
 | 13183 | 06-30-2017 12:36 PM |
07-13-2016
12:45 PM
Did your Spark job fail? These messages can be caused by Spark dynamic allocation, possibly the release of an executor. Maybe resources are not free on YARN, or containers timed out. Are there any other error messages in the log? You can pull the full YARN application logs with the command below.
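A quick way to get the aggregated container logs for the run (the application id is a placeholder to replace with your own):
# pull the aggregated container logs for the failed job
yarn logs -applicationId <application_id>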
07-13-2016
11:52 AM
It happens during a pause in long-running jobs on a large data set. As per the logs, during a shuffle step an executor fails and doesn't report its output; then, during the reduce step, that output can't be found where expected, and rather than rerunning the failed execution, Spark goes down.
Try reducing the parallelism to executors x cores, as in the sketch below.
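A minimal sketch of what that looks like at submit time, assuming 5 executors with 4 cores each; all the numbers and the jar name are placeholders:
# 5 executors x 4 cores => parallelism of 20 (placeholder numbers)
spark-submit --master yarn \
  --num-executors 5 --executor-cores 4 --executor-memory 4G \
  --conf spark.default.parallelism=20 \
  --conf spark.sql.shuffle.partitions=20 \
  your_app.jar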
06-07-2016
06:47 AM
Good info @Rajkumar Singh. Since HBase provides a storage handler for Hive, what other storage handlers do we have for Hive? As per the docs: Cassandra, JDBC, MongoDB, and Google Spreadsheets. A minimal HBase example is sketched below.
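For reference, a minimal sketch of a Hive table backed by the HBase storage handler; the table name, columns, and the column-family mapping are placeholders:
CREATE TABLE hbase_users (id INT, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "users");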
06-07-2016
06:23 AM
I was reading about Presto, where a single Presto query can process data from multiple sources, e.g. HDFS, MySQL, Cassandra, or even Kafka.
In Presto you can define objects called 'catalogs' which point to remote data sources (see the sketch below).
Do we have such a mechanism in Hive to process data from multiple sources? Also, can we access another Hive table (from a remote source) in the same beeline connection?
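For comparison, a Presto catalog is just a properties file under etc/catalog/; a minimal sketch for a MySQL source, where the host, port, and credentials are placeholders:
# etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://mysql-host:3306
connection-user=presto_user
connection-password=secret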
Labels:
- Apache Hive
05-30-2016
05:03 AM
What is the replication factor defined? One datanode being down should not cause corruption. Were other datanodes also down at the same time? The commands below may help you check.
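A few commands that can help confirm the replication factor and block health; the paths are placeholders:
# check replication and list any corrupt or under-replicated blocks
hdfs fsck /user/data -files -blocks -locations
# show the replication factor of a single file
hdfs dfs -stat %r /user/data/part-00000
# list live and dead datanodes
hdfs dfsadmin -report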
05-28-2016
10:59 AM
Yes @Jitendra Yadav, I can see the same in the logs: spark.executor.instances overrides the dynamic allocation properties. But my question is where we should define the dynamic allocation settings, in spark-defaults.conf or spark-thrift-sparkconf.conf?
05-27-2016
10:53 AM
Hi All, I was testing dynamic resource allocation in Spark. By default I see that "spark-thrift-sparkconf.conf" contains all the dynamic allocation properties. But when I ran the Spark job "spark-shell --master yarn --num-executors 5 --executor-memory 3G", I expected it to complain, as I requested the number of executors in the job itself. Then I modified the custom spark-defaults.conf and added the dynamic allocation properties:
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 1
spark.dynamicAllocation.maxExecutors 5
spark.dynamicAllocation.minExecutors 1
And when I run the same job, I see the message below:
16/05/23 09:18:54 WARN SparkContext: Dynamic Allocation and num executors both set, thus dynamic allocation disabled.
It also prints the messages below when more resources are needed. My doubt is: is dynamic allocation enabled by default? In which config should we define the dynamic allocation properties?
16/05/23 09:39:47 INFO ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 4)
16/05/23 09:39:48 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 5)
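Based on that warning, a minimal sketch of letting dynamic allocation size the executors, assuming the properties live in spark-defaults.conf and the external shuffle service is enabled; the memory value is a placeholder:
# spark-defaults.conf
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.dynamicAllocation.maxExecutors 5
spark.shuffle.service.enabled true
# then launch without --num-executors, so spark.executor.instances is not set
# and dynamic allocation stays enabled:
spark-shell --master yarn --executor-memory 3G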
Labels:
- Apache Spark
05-25-2016
11:54 AM
HDFS can be accessed in R by specifying the namenode host (hdfs://<hostname>:/user/test), but in case of a namenode failover that won't work. So we should set the HDFS config directory in R so it picks up the Hadoop namenode and failover configuration. Set the Spark home and Hadoop config directories in the R environment as below:
# set up SPARK_HOME
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
# set up the Hadoop config dirs so the HA nameservice can be resolved
Sys.setenv(YARN_CONF_DIR="/usr/hdp/current/hadoop-client/conf")
Sys.setenv(HADOOP_CONF_DIR="/usr/hdp/current/hadoop-client/conf")
# initialize SparkR with the spark-csv package
sc = sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
sqlContext = sparkRSQL.init(sc)
# read the data file via the HA nameservice instead of a single namenode host
people = read.df(sqlContext, "hdfs://HDFS-HA/users/people.json", "json")
head(people)
05-05-2016
07:32 AM
3 Kudos
Hi Simarn, it seems like you have defined "org.apache.spark.yarn.network.YarnShuffleService" in your yarn-site.xml, but the jar containing that class, "spark-<version>-yarn-shuffle.jar", is missing from your NodeManager classpath. Please add it, similar to the mapreduce_shuffle class; see the sketch below.
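For reference, a minimal sketch of the NodeManager side of that setup; the class name below is the one documented for the Spark YARN shuffle service, and the jar location is an assumption to adjust for your install:
<!-- yarn-site.xml on each NodeManager -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<!-- copy spark-<version>-yarn-shuffle.jar onto the NodeManager classpath,
     e.g. /usr/hdp/current/hadoop-yarn-nodemanager/lib/ (assumed path), then restart the NodeManagers -->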
04-29-2016
07:21 AM
Can you please provide the JDBC connection string?
Are you giving something like this: jdbc:sqlserver://xx.xx.x.xxx:1433;databaseName=Test