Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Error in executing Logistic Regression Algorithm in scala

Error in executing Logistic Regression Algorithm in scala

New Contributor
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql._
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.ml.feature.VectorAssembler
//for labelled points
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
//To Concatenate 2 columns of DataFrame
object TupleUDFs {
  import org.apache.spark.sql.functions.udf      
  // type tag is required, as we have a generic udf
  import scala.reflect.runtime.universe.{TypeTag, typeTag}


  def toTuple2[S: TypeTag, T: TypeTag] = 
    udf[(S, T), S, T]((x: S, y: T) => (x, y))
}


object DataCleaning {


  def main(args: Array[String]): Unit = {
    //Defining Spark Context
    val conf = new
    SparkConf().setAppName("csvDataFrame").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext=new SQLContext(sc)
import sqlContext.implicits._
    //Loaded CSV File
    val options=Map("header"->"true", "path"->"/home/abhi/pigspark/SampleData/PlayerData.csv")
    val players=sqlContext.load("com.databricks.spark.csv",options)
   
    //Type Casting to Integer
    val players1 = players.selectExpr("File","FirstName","LastName","Position","TeamName","cast(Season as double) Season","cast(AerialDuelslost as double) AerialDuelslost","cast(AerialDuelswon as double) AerialDuelswon","cast(Appearances as double) Appearances","cast(BackwardPasses as double) BackwardPasses","cast(BlockedShots as double) BlockedShots","cast(Blocks as double) Blocks","cast(Catches as double) Catches","cast(CleanSheets as double) CleanSheets","cast(ClearancesOfftheLine as double) ClearancesOfftheLine","cast(CornersTaken as double) CornersTaken","cast(CornersWon as double) CornersWon","cast(CrossesnotClaimed as double) CrossesnotClaimed","cast(Drops as double) Drops","cast(Duelslost as double) Duelslost","cast(Duelswon as double) Duelswon","cast(FiftyFifty as double) FiftyFifty","cast(ForwardPasses as double) ForwardPasses","cast(FoulWonPenalty as double) FoulWonPenalty","cast(GamesPlayed as double) GamesPlayed","cast(GoalAssists as double) GoalAssists","cast(Goals as double) Goals","cast(GoalsConceded as double) GoalsConceded","cast(GoalsConcededInsideBox as double) GoalsConcededInsideBox","cast(GoalsConcededOutsideBox as double) GoalsConcededOutsideBox","cast(GoalsfromInsideBox as double) GoalsfromInsideBox","cast(GoalsfromOutsideBox as double) GoalsfromOutsideBox","cast(HeadedGoals as double) HeadedGoals","cast(Interceptions as double) Interceptions","cast(KeyPasses_AttemptAssists as double) KeyPasses_AttemptAssists","cast(LeftFootGoals as double) LeftFootGoals","cast(LeftsidePasses as double) LeftsidePasses","cast(Offsides as double) Offsides","cast(OtherGoals as double) OtherGoals","cast(PassingAccuracy as double) PassingAccuracy","cast(Penalties as double) Penalties","cast(PenaltiesConceded as double) PenaltiesConceded","cast(PenaltiesSaved as double) PenaltiesSaved","cast(PenaltiesTaken as double) PenaltiesTaken","cast(PenaltyGoals as double) PenaltyGoals","cast(Punches as double) Punches","cast(PutThroughOrBlockedDistribution as double) PutThroughOrBlockedDistribution","cast(PutThroughOrBlockedDistributionWon as double) PutThroughOrBlockedDistributionWon","cast(Recoveries as double) Recoveries","cast(RedCardsBy2ndYellow as double) RedCardsBy2ndYellow","cast(RightFootGoals as double) RightFootGoals","cast(RightsidePasses as double) RightsidePasses","cast(SavesfromPenalty as double) SavesfromPenalty","cast(SavesMade as double) SavesMade","cast(SetPiecesGoals as double) SetPiecesGoals","cast(ShootingAccuracy as double) ShootingAccuracy","cast(ShotsOffTarget_IncWoodwork as double) ShotsOffTarget_IncWoodwork","cast(ShotsOnTarget_IncGoals as double) ShotsOnTarget_IncGoals","cast(Starts as double) Starts","cast(StraightRedCards as double) StraightRedCards","cast(SubstituteOff as double) SubstituteOff","cast(SubstituteOn as double) SubstituteOn","cast(SuccessfulCrosses_Corners as double) SuccessfulCrosses_Corners","cast(SuccessfulCrossesOpenplay as double)SuccessfulCrossesOpenplay","cast(SuccessfulDribbles as double) SuccessfulDribbles","cast(SuccessfulFiftyFifty as double) SuccessfulFiftyFifty","cast(SuccessfulLongPasses as double) SuccessfulLongPasses","cast(SuccessfulPassesOppositionHalf as double) SuccessfulPassesOppositionHalf","cast(SuccessfulPassesOwnHalf as double) SuccessfulPassesOwnHalf","cast(SuccessfulShortPasses as double) SuccessfulShortPasses","cast(TacklesLost as double) TacklesLost","cast(TacklesWon as double) TacklesWon","cast(Throughballs as double) Throughballs","cast(TimePlayed as double) TimePlayed","cast(TotalClearances as double) TotalClearances","cast(TotalFoulsConceded as double) TotalFoulsConceded","cast(TotalFoulsWon as double) TotalFoulsWon","cast(TotalPasses as double) TotalPasses","cast(TotalShots as double) TotalShots","cast(TotalSuccessfulPasses_ExclCrosses_Corners as double) TotalSuccessfulPasses_ExclCrosses_Corners","cast(UnsuccessfulCrosses_Corners as double) UnsuccessfulCrosses_Corners","cast(UnsuccessfulCrossesopenplay as double) UnsuccessfulCrossesopenplay","cast(UnsuccessfulLongPasses as double) UnsuccessfulLongPasses","cast(UnsuccessfulPassesOppositionHalf as double) UnsuccessfulPassesOppositionHalf","cast(UnsuccessfulPassesOwnHalf as double) UnsuccessfulPassesOwnHalf","cast(YellowCards as double) YellowCards","cast(AerialDuels as double) AerialDuels","cast(Duels as double) Duels","cast(TotalTackles as double) TotalTackles","cast(TotalRedCards as double) TotalRedCards")
    players1.printSchema


    //Data Cleaning and Preperation, 
    val players2 = players1.withColumn(
      "FullName", TupleUDFs.toTuple2[String, String].apply(players1("FirstName"), players1("LastName"))
     ).drop(players1.col("FirstName")).drop(players1.col("LastName")).drop(players1.col("File"))
     println("After Concatenation")
    val players3 = players2.selectExpr("cast(FullName as String) FullName","Position" , "TeamName" , "Season" , "AerialDuelslost" , "AerialDuelswon" , "Appearances" , "BackwardPasses" , "BlockedShots" , "Blocks" , "Catches" , "CleanSheets" , "ClearancesOfftheLine" , "CornersTaken" , "CornersWon" , "CrossesnotClaimed" , "Drops" , "Duelslost" , "Duelswon" , "FiftyFifty" , "ForwardPasses" , "FoulWonPenalty" , "GamesPlayed" , "GoalAssists" , "Goals" , "GoalsConceded" , "GoalsConcededInsideBox" , "GoalsConcededOutsideBox" , "GoalsfromInsideBox" , "GoalsfromOutsideBox" , "HeadedGoals" , "Interceptions" , "KeyPasses_AttemptAssists" , "LeftFootGoals" , "LeftsidePasses" , "Offsides" , "OtherGoals" , "PassingAccuracy" , "Penalties" , "PenaltiesConceded" , "PenaltiesSaved" , "PenaltiesTaken" , "PenaltyGoals" , "Punches" , "PutThroughOrBlockedDistribution" , "PutThroughOrBlockedDistributionWon" , "Recoveries" , "RedCardsBy2ndYellow" , "RightFootGoals" , "RightsidePasses" , "SavesfromPenalty" , "SavesMade" , "SetPiecesGoals" , "ShootingAccuracy" , "ShotsOffTarget_IncWoodwork" , "ShotsOnTarget_IncGoals" , "Starts" , "StraightRedCards" , "SubstituteOff" , "SubstituteOn" , "SuccessfulCrosses_Corners" , "SuccessfulCrossesOpenplay" , "SuccessfulDribbles" , "SuccessfulFiftyFifty" , "SuccessfulLongPasses" , "SuccessfulPassesOppositionHalf" , "SuccessfulPassesOwnHalf" , "SuccessfulShortPasses" , "TacklesLost" , "TacklesWon" , "Throughballs" , "TimePlayed" , "TotalClearances" , "TotalFoulsConceded" , "TotalFoulsWon" , "TotalPasses" , "TotalShots" , "TotalSuccessfulPasses_ExclCrosses_Corners" , "UnsuccessfulCrosses_Corners" , "UnsuccessfulCrossesopenplay" , "UnsuccessfulLongPasses" , "UnsuccessfulPassesOppositionHalf" , "UnsuccessfulPassesOwnHalf" , "YellowCards" , "AerialDuels" , "Duels" , "TotalTackles" , "TotalRedCards")
    players3.printSchema()
    players3.show()
    //Registering Table
    players3.registerTempTable ("players")
   
    //Spliting Players wrt their position
    val trimmed_midplay = sqlContext.sql("SELECT * FROM players WHERE Position = 'Midfielder' ")
    val trimmed_forplay = sqlContext.sql("SELECT * FROM players WHERE Position = 'Forward' ")
    val trimmed_defplay = sqlContext.sql("SELECT * FROM players WHERE Position = 'Defender' ")
    val trimmed_gkplay = sqlContext.sql("SELECT * FROM players WHERE Position = 'Goalkeeper' ")
    //Converting Categorical features to double
    val indexer = new StringIndexer()
    .setInputCol("FullName")
    .setOutputCol("label")


    val indexed = indexer.fit(trimmed_midplay).transform(trimmed_midplay)


    indexed.show()
    indexed.printSchema()
    
    val indexer1 = new StringIndexer()
    .setInputCol("TeamName")
    .setOutputCol("categoryTeam")
    val indexed1 = indexer1.fit(indexed).transform(indexed)
    
    indexed1.show()
    indexed1.printSchema()
    
    val indexer2 = new StringIndexer()
    .setInputCol("Position")
    .setOutputCol("categoryPosition")


    val indexed2 = indexer2.fit(indexed1).transform(indexed1)
    
    indexed2.show()
    indexed2.printSchema()
    
    //Dropping categorical features
    val indexed3 = indexed2.drop(indexed2.col("FullName")).drop(indexed2.col("Position")).drop(indexed2.col("TeamName"))
    indexed3.show()
    indexed3.printSchema()
    
    //creating a vector
    val assembler = new VectorAssembler()
    .setInputCols(Array("label","AerialDuelslost" , "AerialDuelswon" , "Appearances" , "BackwardPasses" , "BlockedShots" , "Blocks" , "Catches" , "CleanSheets" , "ClearancesOfftheLine" , "CornersTaken" , "CornersWon" , "CrossesnotClaimed" , "Drops" , "Duelslost" , "Duelswon" , "FiftyFifty" , "ForwardPasses" , "FoulWonPenalty" , "GamesPlayed" , "GoalAssists" , "Goals" , "GoalsConceded" , "GoalsConcededInsideBox" , "GoalsConcededOutsideBox" , "GoalsfromInsideBox" , "GoalsfromOutsideBox" , "HeadedGoals" , "Interceptions" , "KeyPasses_AttemptAssists" , "LeftFootGoals" , "LeftsidePasses" , "Offsides" , "OtherGoals" , "PassingAccuracy" , "Penalties" , "PenaltiesConceded" , "PenaltiesSaved" , "PenaltiesTaken" , "PenaltyGoals" , "Punches" , "PutThroughOrBlockedDistribution" , "PutThroughOrBlockedDistributionWon" , "Recoveries" , "RedCardsBy2ndYellow" , "RightFootGoals" , "RightsidePasses" , "SavesfromPenalty" , "SavesMade" , "SetPiecesGoals" , "ShootingAccuracy" , "ShotsOffTarget_IncWoodwork" , "ShotsOnTarget_IncGoals" , "Starts" , "StraightRedCards" , "SubstituteOff" , "SubstituteOn" , "SuccessfulCrosses_Corners" , "SuccessfulCrossesOpenplay" , "SuccessfulDribbles" , "SuccessfulFiftyFifty" , "SuccessfulLongPasses" , "SuccessfulPassesOppositionHalf" , "SuccessfulPassesOwnHalf" , "SuccessfulShortPasses" , "TacklesLost" , "TacklesWon" , "Throughballs" , "TimePlayed" , "TotalClearances" , "TotalFoulsConceded" , "TotalFoulsWon" , "TotalPasses" , "TotalShots" , "TotalSuccessfulPasses_ExclCrosses_Corners" , "UnsuccessfulCrosses_Corners" , "UnsuccessfulCrossesopenplay" , "UnsuccessfulLongPasses" , "UnsuccessfulPassesOppositionHalf" , "UnsuccessfulPassesOwnHalf" , "YellowCards" , "AerialDuels" , "Duels" , "TotalTackles" , "TotalRedCards"))
    .setOutputCol("features")
    
    val output = assembler.transform(indexed3)
    println(output.select("features").first())
    
    val realout = output.select("label","features")
    val label = 0;
   
    val labeled = realout.map(row => LabeledPoint(row.getDouble(0), row(4).asInstanceOf[Vector]))
    val splits = labeled.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0).cache()
    val test = splits(1)
    
    val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(10)
  .run(training)
  val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
   val prediction = model.predict(features) 
  (prediction, label)
}


    val metrics = new MulticlassMetrics(predictionAndLabels)
    val accuracy = metrics.precision
    println(s"Accuracy = $accuracy")
   sc.stop()
  }
}


[error] /home/abhi/Football_Prediction/src/main/scala/DataCleaning.scala:120: type mismatch;
[error]  found   : org.apache.spark.sql.Dataset[org.apache.spark.mllib.regression.LabeledPoint]
[error]  required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint]
[error]   .run(training)
[error]        ^
[error] /home/abhi/Football_Prediction/src/main/scala/DataCleaning.scala:121: ambiguous implicit values:
[error]  both method newIntEncoder in class SQLImplicits of type => org.apache.spark.sql.Encoder[Int]
[error]  and method newLongEncoder in class SQLImplicits of type => org.apache.spark.sql.Encoder[Long]
[error]  match expected type org.apache.spark.sql.Encoder[U]
[error]   val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
[error]                                      ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 2 s, completed Nov 10, 2016 11:39:13 P
5 REPLIES 5
Highlighted

Re: Error in executing Logistic Regression Algorithm in scala

@Abhishek Srinivas

First, it's nearly impossible to read the code you posted. If you copy and past the code inside of a code block (use the CODE) icon in the bar above the post entry, it should be much easier to read.

Second, what kind of problem are you experiencing?

Highlighted

Re: Error in executing Logistic Regression Algorithm in scala

New Contributor

Hello @Michael Young, i have formatted the question properly . i have added the error which i was facing.. !! Kindly help me out !!

Sorry for the earlier unformatted version !!

Highlighted

Re: Error in executing Logistic Regression Algorithm in scala

Super Guru

can you post some example data? Seems it can't tell if something is long or int.

Highlighted

Re: Error in executing Logistic Regression Algorithm in scala

Super Guru

what version of Scala? Spark? SBT? JDK? Can you post your build.sbt?

I was able to run this in the spark shell. Have you tried to run local, spark shell or zeppelin?

what version of databricks CSV?

Highlighted

Re: Error in executing Logistic Regression Algorithm in scala

New Contributor

@Timothy Spann Below i have posted some sample data on which i am working on,

File,FirstName,LastName,Position,TeamName,Season,AerialDuelslost,AerialDuelswon,Appearances,BackwardPasses,BlockedShots,Blocks,Catches,CleanSheets,ClearancesOfftheLine,CornersTaken,CornersWon,CrossesnotClaimed,Drops,Duelslost,Duelswon,FiftyFifty,ForwardPasses,FoulWonPenalty,GamesPlayed,GoalAssists,Goals,GoalsConceded,GoalsConcededInsideBox,GoalsConcededOutsideBox,GoalsfromInsideBox,GoalsfromOutsideBox,HeadedGoals,Interceptions,KeyPasses_AttemptAssists,LeftFootGoals,LeftsidePasses,Offsides,OtherGoals,PassingAccuracy,Penalties,PenaltiesConceded,PenaltiesSaved,PenaltiesTaken,PenaltyGoals,Punches,PutThroughOrBlockedDistribution,PutThroughOrBlockedDistributionWon,Recoveries,RedCardsBy2ndYellow,RightFootGoals,RightsidePasses,SavesfromPenalty,SavesMade,SetPiecesGoals,ShootingAccuracy,ShotsOffTarget_IncWoodwork,ShotsOnTarget_IncGoals,Starts,StraightRedCards,SubstituteOff,SubstituteOn,SuccessfulCrosses_Corners,SuccessfulCrossesOpenplay,SuccessfulDribbles,SuccessfulFiftyFifty,SuccessfulLongPasses,SuccessfulPassesOppositionHalf,SuccessfulPassesOwnHalf,SuccessfulShortPasses,TacklesLost,TacklesWon,Throughballs,TimePlayed,TotalClearances,TotalFoulsConceded,TotalFoulsWon,TotalPasses,TotalShots,TotalSuccessfulPasses_ExclCrosses_Corners,UnsuccessfulCrosses_Corners,UnsuccessfulCrossesopenplay,UnsuccessfulLongPasses,UnsuccessfulPassesOppositionHalf,UnsuccessfulPassesOwnHalf,YellowCards,AerialDuels,Duels,TotalTackles,TotalRedCards
PlayerData_2013_Redefined_CSV.csv,Julian,Baumgartlinger,Midfielder,1. FSV Mainz 05,2013,7,15,9,43,1,0,0,1,0,0,1,0,0,26,37,17,92,0,9,0,0,9,8,1,0,0,0,9,3,0,89,0,0,0,0,0,0,0,0,0,26,10,11,0,0,91,0,0,0,0,0,0,6,0,2,3,0,0,6,11,24,91,188,255,0,9,1,491,5,5,10,315,0,279,0,0,5,26,10,0,22,63,9,0
PlayerData_2013_Redefined_CSV.csv,Stefan,Bell,Defender,1. FSV Mainz 05,2013,68,130,27,73,4,18,0,8,1,0,3,0,0,107,194,69,499,0,27,4,0,39,32,7,0,0,0,53,7,0,326,2,0,0,0,1,0,0,0,0,83,41,47,0,0,104,0,0,0,0,11,2,25,0,1,2,3,3,9,46,152,229,539,613,10,35,0,2272,196,19,16,1002,13,765,2,2,103,173,66,5,198,301,45,0
PlayerData_2013_Redefined_CSV.csv,Niko,Bungert,Defender,1. FSV Mainz

I am working on SBT version of scala, Below is my .build file,

name := "Football_Prediction"


version := "V_1.0"


scalaVersion := "2.11.8"


val sparkVersion="2.0.1"


libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion,
"com.databricks" %% "spark-csv" % "1.1.0"
)



From the build file, we can notice that i am using 1.1.0 version of CSV Databricks.

Don't have an account?
Coming from Hortonworks? Activate your account here