I tried to fit a random forest classifier in PySpark, but I'm getting this error:
Py4JJavaError: An error occurred while calling o767.fit. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 30.0 failed 1 times, most recent failure: Lost task 0.0 in stage 30.0 (TID 853, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
Can anyone help me, please?
My code:
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

rf = RandomForestClassifier(labelCol="label", featuresCol="features")
paramGrid = ParamGridBuilder().addGrid(rf.numTrees, [100]).build()
crossval = CrossValidator(estimator=rf,
                          estimatorParamMaps=paramGrid,
                          evaluator=BinaryClassificationEvaluator(),
                          numFolds=10)
cvModel = crossval.fit(trainingData)        # this is the call that fails
predictions = cvModel.transform(testData)   # transform with the fitted model, not the CrossValidator
predictions.printSchema()
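
I'm guessing the heap error means the driver doesn't have enough memory for the 10-fold cross-validation. Would raising the driver memory when creating the SparkSession help? Below is a minimal sketch of what I mean; the "8g" value and the local-mode setup are just guesses on my part, and I know the driver memory can also be set via spark-submit --driver-memory before the JVM starts.

from pyspark.sql import SparkSession

# Sketch only: trying to give the local driver more heap.
# "8g" is a guessed value, not something I have tested.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("rf-cv")
         .config("spark.driver.memory", "8g")
         .getOrCreate())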