Member since 06-25-2016 · 11 Posts · 2 Kudos Received · 0 Solutions
08-30-2016 07:09 PM · 1 Kudo
Hi @Kuldeep Kulkarni.
For the DataNode and NameNode directory paths I am giving "/mnt/resource/hadoop/hdfs/namenode" and "/mnt/resource/hadoop/hdfs/data". Is this right? Please refer to the attached screenshot of the available partitions (image.png). Thank you.
08-30-2016 06:30 PM · 1 Kudo
The NameNode and DataNode directories in HDFS point by default to sda1, which has less space.
I want to point the directories to sdb1, which has more space. What is the procedure to change the NameNode and DataNode directories? As of now I do not have any data on HDFS, so there is no risk of losing data while changing the directories.
Please let me know the steps. Below is the screenshot (image.png).
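A minimal sketch of the relevant hdfs-site.xml change, assuming the mount paths quoted later in this thread (on an Ambari-managed cluster you would set the same properties under HDFS > Configs instead of editing the file by hand). The directories must exist on the sdb1 mount and be owned by the hdfs user before restarting HDFS; since there is no data yet, the NameNode directory can simply be formatted with `hdfs namenode -format` after the change.

```xml
<!-- hdfs-site.xml fragment: point NameNode and DataNode storage at the
     larger sdb1 mount. Paths are the ones from this thread; adjust to
     your actual mount points. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/mnt/resource/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/resource/hadoop/hdfs/data</value>
</property>
```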
08-17-2016 12:43 AM
I figured out the cause of the issue I was facing. In the above piece of code I am doing a nested transformation, which is not supported by Spark. What approach should I follow to avoid this nested transformation?
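One common way around the nested transformation is to stop issuing SQL queries from inside `foreach` (executors have no access to `sqlContext`) and instead co-locate each application's rows and process them as a group. A sketch against the Spark 1.6 API, reusing the `preProcessedDataFrame` and `ApplicationId` names from this thread; the per-group body is a placeholder for the actual processing:

```scala
// Repartition by the key so each ApplicationId's rows land together,
// then group and process each application's rows on the executors --
// no nested sqlContext call needed.
val perApplication = preProcessedDataFrame
  .repartition(preProcessedDataFrame("ApplicationId"))
  .rdd
  .groupBy(_.getAs[String]("ApplicationId"))   // (id, Iterable[Row]) per application
  .map { case (applicationId, rows) =>
    // FURTHER PROCESSING on this application's rows happens here, in parallel
    (applicationId, rows.size)
  }
```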
08-15-2016 07:05 PM
It is the same code as above, without collect():

preProcessedDataFrame.registerTempTable("tApplicationId")
preProcessedDataFrame.select(ApplicationId).distinct().foreach(record => {
  val applicationId = record.getAs[String](ApplicationId)
  // ApplicationId is a String here, so the SQL literal needs quotes
  val selectedApplicationDataFrame = sqlContext.sql("SELECT * FROM tApplicationId WHERE ApplicationId = '" + applicationId + "'")
  logger.info("selectedApplicationId: " + applicationId)
  // DO FURTHER PROCESSING on selectedApplicationDataFrame ...
})
08-15-2016 06:51 PM
Yes, I have one fat jar that contains all the dependencies. The thing is, when I use collect() in the code below, it works on yarn-cluster. But using collect() removes parallelism from further operations, so I do not want to use it. Without the collect() statement I get the "java.lang.NoClassDefFoundError" exception mentioned above.
The code below works fine in local mode without the collect statement. Please help me understand this behavior.

preProcessedDataFrame.registerTempTable("tApplication")
preProcessedDataFrame.select(ApplicationId).distinct().collect().foreach(record => {
  val applicationId = record.getAs[String](ApplicationId)
  // ApplicationId is a String here, so the SQL literal needs quotes
  val selectedApplicationDataFrame = sqlContext.sql("SELECT * FROM tApplication WHERE ApplicationId = '" + applicationId + "'")
  logger.info("selectedApplicationId: " + applicationId)
  // DO FURTHER PROCESSING on selectedApplicationDataFrame ...
})
08-15-2016 05:22 AM
Yes, I am doing the same. Below is my spark-submit command.
spark-submit --class com.BreakpointPredictionDriver --master yarn-cluster --num-executors 4 --driver-memory 8g --executor-memory 8g --executor-cores 4 --name BreakPointPrediction --jars /usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar hdfs://PROD1/testdata/spark-jobs/bp-preprocessing-assembly-1.0.0.jar --hbaseZookeeperQuorum 127.0.0.1 --hbaseZookeeperPort 2181
08-15-2016 04:57 AM
Below are the exception details. The same piece of code above runs in LOCAL mode; on YARN-CLUSTER I see the error below during iteration.

java.lang.NoClassDefFoundError: Could not initialize class com.sg.ae.bp.BreakpointPredictionDriver$
at com.sg.ae.bp.BreakpointPredictionDriver$anonfun$com$sg$ae$bp$BreakpointPredictionDriver$run$1.apply(BreakpointPredictionDriver.scala:126)
at com.sg.ae.bp.BreakpointPredictionDriver$anonfun$com$sg$ae$bp$BreakpointPredictionDriver$run$1.apply(BreakpointPredictionDriver.scala:124)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$anonfun$foreach$1$anonfun$apply$34.apply(RDD.scala:919)
at org.apache.spark.rdd.RDD$anonfun$foreach$1$anonfun$apply$34.apply(RDD.scala:919)
at org.apache.spark.SparkContext$anonfun$runJob$5.apply(SparkContext.scala:1881)
at org.apache.spark.SparkContext$anonfun$runJob$5.apply(SparkContext.scala:1881)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
08-15-2016 02:35 AM
I am working with Spark version 1.6.1 and have a requirement to fetch the distinct values of a column using Spark DataFrames.
The column contains ~50 million records, and doing a collect() operation slows down further operations on the result DataFrame, with no parallelism.
The piece of code below works fine in local mode, but in yarn-cluster mode I get "java.lang.NoClassDefFoundError".

preProcessedDataFrame.registerTempTable("tTempTable")
preProcessedDataFrame.distinct().foreach(record => {
  val applicationId = record.getAs[Int]("ApplicationId")
  val selectedApplicationDataFrame = sqlContext.sql("SELECT * FROM tTempTable WHERE ApplicationId = " + applicationId)
  selectedApplicationDataFrame.show(20)
  // FURTHER DO SOME MORE CALC BASED ON EACH APPLICATION-ID
})

Can someone tell me the reason for the error, or a better approach to achieve the same result?
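One point worth noting: even with ~50 million rows in the column, the set of *distinct* ApplicationId values is usually small, so collecting only the distinct keys to the driver and then issuing one query per key keeps each per-key query fully distributed. A sketch under that assumption, reusing the names from the post above:

```scala
// Collect only the distinct keys (small) to the driver; each subsequent
// sqlContext.sql(...) still runs as a normal distributed Spark job.
val applicationIds = preProcessedDataFrame
  .select("ApplicationId").distinct().collect()
  .map(_.getAs[Int]("ApplicationId"))

applicationIds.foreach { applicationId =>
  val selected = sqlContext.sql(
    s"SELECT * FROM tTempTable WHERE ApplicationId = $applicationId")
  selected.show(20)
  // per-application processing here, each pass parallel across executors
}
```

The original NoClassDefFoundError arises because the closure passed to foreach runs on executors, which cannot use the driver-side sqlContext; keeping the sqlContext call on the driver, as above, avoids that.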
Labels: Apache Spark