Member since 10-01-2016
156 Posts
8 Kudos Received
6 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 8507 | 04-04-2019 09:41 PM |
 | 3239 | 06-04-2018 08:34 AM |
 | 1513 | 05-23-2018 01:03 PM |
 | 3024 | 05-21-2018 07:12 AM |
 | 1865 | 05-08-2018 10:48 AM |
08-27-2018 06:36 PM
Hi again @Felix Albani, I use IntelliJ IDEA. I put the arguments into Run -> Edit Configurations -> Program arguments as below, but it didn't work.
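One way to see whether a stack-size flag such as -Xss4g actually reached the JVM is to print the JVM's own input arguments at startup. The following is a minimal sketch that uses only the JDK; the object name `JvmArgsCheck` and the helper `stackSizeFlag` are hypothetical names chosen for illustration. Values entered under Program arguments are handed to `main(args)` rather than to the JVM, so they would not appear in the JVM argument list; JVM flags normally go into the run configuration's VM options field.

```scala
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

object JvmArgsCheck {
  // Hypothetical helper: returns the first -Xss flag found in a list of JVM arguments.
  def stackSizeFlag(jvmArgs: Seq[String]): Option[String] =
    jvmArgs.find(_.startsWith("-Xss"))

  def main(args: Array[String]): Unit = {
    // Flags the JVM itself was started with (for example -Xss4g or -Xmx8g).
    val jvmArgs = ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.toList

    println(s"JVM arguments:     ${jvmArgs.mkString(" ")}")
    println(s"Program arguments: ${args.mkString(" ")}")

    stackSizeFlag(jvmArgs) match {
      case Some(flag) => println(s"Thread stack size flag in effect: $flag")
      case None       => println("No -Xss flag reached the JVM")
    }
  }
}
```

If the flag only shows up under "Program arguments", it was passed to the application and the JVM never saw it.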
08-27-2018 01:26 PM
Hi @Felix Albani, I applied your suggestion before compiling, but I don't think it makes any difference:
val spark = SparkSession.builder()
  .master("local[4]")
  .appName("SparkALS")
  .config("spark.executor.extraJavaOptions","-Xss4g")
  .config("driver-java-options","-Xss4g")
  .getOrCreate()
Unfortunately, when I use TrainValidationSplit, even with only one parameter combination in the grid, I get the same error. It works fine without TrainValidationSplit.
08-26-2018 12:31 PM
I am trying to build a model for MovieLens rating data with Spark ALS. I use Spark 2.3.1 on a Windows host. The data has just over 100,000 rows and three columns: userId, movieId, and rating. My machine has an Intel i7 and 32 GB of memory. I have increased executor memory to 10 GB. I get a java.lang.StackOverflowError. My code is below:
import org.apache.spark.SparkConf
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
import org.apache.spark.sql.SparkSession

object ErkansALS {
  def main(args: Array[String]): Unit = {
    // Spark properties are set with set(); setExecutorEnv() only defines
    // environment variables for executors and does not change Spark settings.
    // Note: in local mode the driver JVM is already running at this point, so
    // spark.driver.memory set here comes too late to resize its heap.
    val sparkConf = new SparkConf()
      .setMaster("local[4]")
      .setAppName("SparkALS")
      .set("spark.driver.memory", "8g")
      .set("spark.executor.memory", "10g")
      .set("spark.sql.broadcastTimeout", "1200")
    val spark = SparkSession.builder()
      .config(sparkConf)
      .getOrCreate()
    // Load the ratings and drop the unused timestamp column
    val movieRatings = spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("ratings.csv")
      .drop("timestamp")
    // Create training and test sets
    val Array(training, test) = movieRatings.randomSplit(Array(0.8, 0.2), seed = 142)
    training.cache()
    // Create ALS model
    val alsObject = new ALS()
      .setUserCol("userId")
      .setItemCol("movieId")
      .setRatingCol("rating")
      .setColdStartStrategy("drop")
      .setNonnegative(true)
    // Parameter grid: 2 x 2 x 2 = 8 combinations
    val paramGridObject = new ParamGridBuilder()
      .addGrid(alsObject.rank, Array(12, 14))
      .addGrid(alsObject.maxIter, Array(18, 20))
      .addGrid(alsObject.regParam, Array(.17, .19))
      .build()
    // Define evaluator as RMSE
    val evaluator = new RegressionEvaluator()
      .setMetricName("rmse")
      .setLabelCol("rating")
      .setPredictionCol("prediction")
    // Tune with TrainValidationSplit
    val tvs = new TrainValidationSplit()
      .setEstimator(alsObject)
      .setEstimatorParamMaps(paramGridObject)
      .setEvaluator(evaluator)
    val model = tvs.fit(training)
    val bestModel = model.bestModel
    // Generate predictions and evaluate RMSE
    val predictions = bestModel.transform(test)
    val rmse = evaluator.evaluate(predictions)
    predictions.show()
    // String interpolation; println("RMSE = ", rmse) would print a tuple
    println(s"RMSE = $rmse")
    println("Best Model")
  }
}
Errors are attached. But when I try without TrainValidationSplit, it works:
package spark.ml.recommendation.als

import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{TrainValidationSplit, ParamGridBuilder}
import org.apache.spark.sql.{SparkSession}
import org.apache.spark.{SparkConf, SparkContext}

object ErkansALS {
  def main(args: Array[String]): Unit = {
    /* val sparkConf = new SparkConf()
      .setExecutorEnv("spark.driver.memory","4g")
      .setExecutorEnv("spark.executor.memory","8g")
      .setExecutorEnv("spark.sql.broadcastTimeout","1200")
      .setExecutorEnv("spark.eventLog.enabled","false")*/
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SparkALS")
      .getOrCreate()
    val movieRatings = spark.read.format("csv")
      .option("header","true")
      .option("inferSchema","true")
      .load("C:\\Users\\toshiba\\SkyDrive\\veribilimi.co\\Datasets\\ml-latest-small\\ratings.csv")
      .drop("timestamp")
      // .sample(0.1,142)
    movieRatings.show()
    println(movieRatings.count()) // there are 100,004 ratings
    // Create training and test set
    val Array(training, test) = movieRatings.randomSplit(Array(0.8, 0.2),seed = 142)
    training.cache()
    // Create ALS model
    val alsObject = new ALS()
      .setUserCol("userId")
      .setItemCol("movieId")
      .setRatingCol("rating")
      .setColdStartStrategy("drop")
      .setNonnegative(true)
    /* // Tune model using ParamGridBuilder
    val paramGridObject = new ParamGridBuilder()
      .addGrid(alsObject.rank, Array(14))
      .addGrid(alsObject.maxIter, Array(20))
      .addGrid(alsObject.regParam, Array(.19))
      .build()*/
    // Define evaluator as RMSE
    val evaluator = new RegressionEvaluator()
      .setMetricName("rmse")
      .setLabelCol("rating")
      .setPredictionCol("prediction")
    /* // Build cross validation using TrainValidationSplit
    val tvs = new TrainValidationSplit()
      .setEstimator(alsObject)
      .setEstimatorParamMaps(paramGridObject)
      .setEvaluator(evaluator)*/
    // Fit ALS model to training set
    val model = alsObject.fit(training)
    /* // Take best model
    val bestModel = model.bestModel*/
    // Generate predictions and evaluate RMSE
    val predictions = model.transform(test)
    val rmse = evaluator.evaluate(predictions)
    predictions.show()
    // Print evaluation metrics and model parameters
    println(s"RMSE = $rmse")
  }
}
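A frequently reported cause of java.lang.StackOverflowError with ALS is the long RDD lineage that builds up over many iterations. ALS can truncate that lineage by checkpointing, but its checkpointInterval setting only takes effect when a checkpoint directory has been set on the SparkContext. The sketch below assumes that this is what happens here, which the post cannot confirm because the attached errors are not shown. The directory name `als-checkpoint`, the interval of 5, and the names `AlsCheckpointSketch` and `buildAls` are assumptions for illustration.

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

object AlsCheckpointSketch {
  // Same ALS settings as in the post, plus an explicit checkpoint interval.
  // The value 5 is an assumption; the ALS default is 10.
  def buildAls(): ALS = new ALS()
    .setUserCol("userId")
    .setItemCol("movieId")
    .setRatingCol("rating")
    .setColdStartStrategy("drop")
    .setNonnegative(true)
    .setCheckpointInterval(5)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("SparkALS")
      .getOrCreate()

    // Without a checkpoint directory, ALS ignores checkpointInterval.
    // "als-checkpoint" is a hypothetical local path.
    spark.sparkContext.setCheckpointDir("als-checkpoint")

    val als = buildAls()
    // `als` can now be passed to TrainValidationSplit.setEstimator as in the post.
    println(s"checkpointInterval = ${als.getCheckpointInterval}")

    spark.stop()
  }
}
```

The rest of the pipeline (parameter grid, evaluator, TrainValidationSplit) would stay as in the post; only the checkpoint directory and interval are new.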
Labels: Apache Spark
08-05-2018 03:01 AM
I had a similar problem with the Sandbox on a Windows host. After I commented out all other 127.0.0.1 entries in the hosts file and left only the line
127.0.0.1 localhost sandbox.hortonworks.com sandbox-hdp.hortonworks.com sandbox-hdf.hortonworks.com
I was able to see the Ambari login page.
07-19-2018 02:41 PM
I copied /usr/hdp/current/atlas-client/hook/storm/atlas-storm-plugin-impl/storm-bridge-xxx.xxxxxxxxx.jar to /usr/hdp/current/storm-client/lib and /usr/hdp/current/storm-client/extlib, but it didn't work.
07-19-2018 01:49 PM
A week later I had a similar problem again. After I stopped some other services, Atlas was able to start. I think insufficient resources were preventing Atlas from starting.
07-18-2018 03:02 PM
Thanks @sunile.manjee, your advice was helpful.
07-18-2018 07:11 AM
@Anshul Sisodia Should X and Y ports be different?
07-17-2018 06:48 AM
Hi @William Brooks, thank you. Yes, it works when I type the whole username, but after the username I had to press the space bar instead of Enter.
07-16-2018 10:48 AM
Hi, in a multi-node, non-Kerberized HDP 2.6.3 environment, I have integrated Zeppelin authentication with Microsoft AD. Data scientists can log in and use Zeppelin properly, but when they want to share notes, Zeppelin doesn't populate the drop-down list when they enter the first three letters of a username.
Labels: Apache Zeppelin