I tried to fit a random forest classifier in PySpark, but I'm getting this error:
Py4JJavaError: An error occurred while calling o767.fit. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 30.0 failed 1 times, most recent failure: Lost task 0.0 in stage 30.0 (TID 853, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
Can anyone help me, please?
My code:
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

rf = RandomForestClassifier(labelCol="label", featuresCol="features")
paramGrid = ParamGridBuilder().addGrid(rf.numTrees, [100]).build()
crossval = CrossValidator(estimator=rf,
                          estimatorParamMaps=paramGrid,
                          evaluator=BinaryClassificationEvaluator(),
                          numFolds=10)
cvModel = crossval.fit(trainingData)        # this is the call that fails
predictions = cvModel.transform(testData)   # transform with the fitted model, not the CrossValidator
predictions.printSchema()
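
I'm guessing the heap error means the driver doesn't have enough memory for the 10-fold cross-validation. Would raising the driver memory when creating the SparkSession help? Below is a minimal sketch of what I mean; the "8g" value and the local-mode setup are just guesses on my part, and I know the driver memory can also be set via spark-submit --driver-memory before the JVM starts.

from pyspark.sql import SparkSession

# Sketch only: trying to give the local driver more heap.
# "8g" is a guessed value, not something I have tested.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("rf-cv")
         .config("spark.driver.memory", "8g")
         .getOrCreate())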