In spark-shell, using weightCol with RandomForestClassifier throws the error: value weightCol is not a member of org.apache.spark.ml.classification.RandomForestClassifier

Solved

Even though I have already imported all the necessary libraries for using RandomForestClassifier with the weightCol parameter, I still get the following error: value weightCol is not a member of org.apache.spark.ml.classification.RandomForestClassifier. I am currently using Spark 1.6.1.

Here is my code:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
import org.apache.spark.ml.evaluation.{BinaryClassificationEvaluator, MulticlassClassificationEvaluator}
import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorAssembler, VectorIndexer}
import org.apache.spark.ml.param.shared.HasWeightCol
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._

// HiveContext (which extends SQLContext) is required to query the Hive table
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._

val raw = sqlContext.sql("SELECT * FROM fraudegt.sample_cdr_train_v2")
val mod = raw.withColumn("id", raw("id").cast("string"))
val mod1 = mod.na.fill(0)

val assembler = new VectorAssembler()
  .setInputCols(Array("hora_del_dia", "dia_mes", "duracion", "duracion_dia", "duracion_24h",
    "avg_duracion_dia", "avg_duracion_24h", "avg_duracion_historica",
    "celdas_iniciales_distintas_dia", "celdas_iniciales_distintas_historico",
    "celdas_finales_distintas_dia", "celdas_finales_distintas_historico",
    "pmc_dia", "pmc_historico", "imcd_dia", "imcd_historico", "llamadas_en_dia"))
  .setOutputCol("features")
val df_all = assembler.transform(mod1)

val labelIndexer = new StringIndexer().setInputCol("fraude").setOutputCol("label")
val df = labelIndexer.fit(df_all).transform(df_all)

val splits = df.randomSplit(Array(0.7, 0.3))
val (trainingData, testData) = (splits(0), splits(1))

val classifier = new RandomForestClassifier()
  .setImpurity("gini")
  .setMaxDepth(4)
  .setNumTrees(100)
  .setFeatureSubsetStrategy("auto")
  .setSeed(5043)
val model = classifier.fit(trainingData)

// This line fails to compile with:
// error: value weightCol is not a member of org.apache.spark.ml.classification.RandomForestClassifier
val model2 = classifier.fit(trainingData, classifier.weightCol -> "weight")
1 ACCEPTED SOLUTION

Re: In spark-shell, using weightCol with RandomForestClassifier throws the error: value weightCol is not a member of org.apache.spark.ml.classification.RandomForestClassifier

Explorer

weightCol is not part of RandomForestClassifier; it is part of LogisticRegression. In the Spark 1.6.1 source, RandomForestClassifier does not mix in HasWeightCol, while LogisticRegression does:

@inherit_doc
class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasSeed,
                             HasRawPredictionCol, HasProbabilityCol,
                             RandomForestParams, TreeClassifierParams, HasCheckpointInterval):

@inherit_doc
class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
                         HasRegParam, HasTol, HasProbabilityCol, HasRawPredictionCol,
                         HasElasticNetParam, HasFitIntercept, HasStandardization, HasThresholds,
                         HasWeightCol):

http://spark.apache.org/docs/1.6.1/api/python/_modules/pyspark/ml/classification.html
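In Spark 1.6.1 instance weights can therefore only be used with an estimator that mixes in HasWeightCol, such as LogisticRegression. A minimal sketch, assuming the same trainingData DataFrame from the question with "features" and "label" columns plus a numeric "weight" column (a hypothetical column name for illustration):

```scala
import org.apache.spark.ml.classification.LogisticRegression

// LogisticRegression in Spark 1.6.x mixes in HasWeightCol,
// so per-row instance weights can be set via setWeightCol.
val lr = new LogisticRegression()
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setWeightCol("weight") // column holding per-row instance weights

val lrModel = lr.fit(trainingData)
```

RandomForestClassifier only gained weightCol support in much later Spark releases, so on 1.6.1 the call classifier.fit(trainingData, classifier.weightCol -> "weight") cannot compile.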


