Created 08-03-2016 01:26 PM
Hi
After recently upgrading to Spark 2.0, I cannot get anything that relies on Spark SQL to run in IntelliJ IDEA on Windows. My code is provided below.
import org.apache.spark.sql.SparkSession

object PropertyInvestmentCalcs {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder()
      .appName("Spark PropertyInvestmentCalcs")
      .master("local[6]")
      .config("spark.sql.warehouse.dir", "\\\\TJVRLAPTOP\\Users\\tjoha\\Google Drive\\Programming\\IntelliJ\\PropertyInvestmentCalcs\\spark-warehouse")
      //.config("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
      //.config("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
      .getOrCreate()

    //val sqlContext = new org.apache.spark.sql.SparkSession(spark)

    // Get number of data records in the table
    val nrRecordsDF = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "(SELECT COUNT(*) AS nrRecords FROM test.propertydb) AS nrRecords_tmp")
      .option("user", "tjohannvr")
      .option("password", "[5010083]")
      .load()
    nrRecordsDF.show()
    nrRecordsDF.printSchema()
    val nrRecords = nrRecordsDF.head().getLong(0)
    println("nrRecords = " + nrRecords)

    // Select data from MySQL, with a specific number of records at a time
    val NrRecordsAtATime = nrRecords * 0 + 100000
    println("NrRecordsAtATime = " + NrRecordsAtATime)

    spark.stop()
  }
}
The error that occurs when trying to set nrRecordsDF (i.e. on the load() call) is shown below:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: null
Is it perhaps unable to load the Hadoop classes, specifically those related to HDFS? If so, why would that be the case?
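To sanity-check that idea, I can run a small classpath probe like the one below (a sketch only; the class names are assumed from the hadoop-common and hadoop-hdfs jars that spark-core 2.0.0 pulls in transitively):

// Quick classpath probe: can the Hadoop FileSystem implementations be loaded at all?
object FsClassProbe {
  def main(args: Array[String]): Unit = {
    Seq(
      "org.apache.hadoop.fs.LocalFileSystem",
      "org.apache.hadoop.hdfs.DistributedFileSystem"
    ).foreach { name =>
      // Try to resolve each class by name and report whether it is visible on the classpath.
      val found = scala.util.Try(Class.forName(name)).isSuccess
      println(s"$name -> " + (if (found) "found" else "NOT found"))
    }
  }
}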
Below is the contents of my build.sbt file.
import sbt._

name := "PropertyInvestmentCalcs"
version := "0.1.0-SNAPSHOT"
organization := "TJVR"
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" //% "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" //% "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" //% "provided"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.39"

lazy val commonSettings = Seq(
  version := "0.1-SNAPSHOT",
  organization := "TJVR",
  scalaVersion := "2.11.8"
)

lazy val app = (project in file("app")).
  settings(commonSettings: _*).
  settings(
    // your settings here
  )

artifact in (Compile, assembly) := {
  val art = (artifact in (Compile, assembly)).value
  art.copy(`classifier` = Some("assembly"))
}

addArtifact(artifact in (Compile, assembly), assembly)
Thanks in advance for any help.
Created 08-03-2016 04:29 PM
Try enabling Spark debug logging and, if possible, get the complete stack trace. Does the error only happen on Windows?
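One minimal way to do that on a local run, assuming the failure happens on the nrRecordsDF load (i.e. after the SparkSession has already been created), is to raise the log level on the SparkContext right after getOrCreate():

// Immediately after SparkSession.getOrCreate() in main(), before the JDBC read:
spark.sparkContext.setLogLevel("DEBUG")  // switch logging on this JVM from INFO to DEBUG

Alternatively, a log4j.properties on the classpath with log4j.rootCategory=DEBUG, console does the same for the whole JVM, including session construction.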
Created 08-04-2016 11:39 AM
Hi @vshukla
Thanks for the answer. I don't intend to set up a Linux or other machine to test on, as I prefer to stick with Windows.
Here is the stack trace:
16/08/04 13:08:58 INFO SharedState: Warehouse path is '\\SERVER\Users\USER\STORAGE\Programming\IntelliJ\PropertyInvestmentCalcs\spark-warehouse'.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: null
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:115)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
    at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
    at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
    at org.apache.spark.sql.internal.SessionState$anon$1.<init>(SessionState.scala:112)
    at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
    at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
    at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
    at PropertyInvestmentCalcs$.main(PropertyInvestmentCalcs.scala:27)
    at PropertyInvestmentCalcs.main(PropertyInvestmentCalcs.scala)
16/08/04 13:09:00 INFO SparkContext: Invoking stop() from shutdown hook
16/08/04 13:09:00 INFO SparkUI: Stopped Spark web UI at http://192.168.1.100:4040
16/08/04 13:09:00 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/04 13:09:00 INFO MemoryStore: MemoryStore cleared
16/08/04 13:09:00 INFO BlockManager: BlockManager stopped
16/08/04 13:09:00 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/04 13:09:00 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/04 13:09:00 INFO SparkContext: Successfully stopped SparkContext
16/08/04 13:09:00 INFO ShutdownHookManager: Shutdown hook called
16/08/04 13:09:00 INFO ShutdownHookManager: Deleting directory C:\Users\USER\AppData\Local\Temp\spark-523c95ef-a46c-4b16-88c6-3da8f6f2a801
Does this give sufficient additional information?
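For what it is worth, the SessionCatalog.makeQualifiedPath frame makes me suspect the warehouse path itself, since the UNC path I pass in has no URI scheme. One thing I plan to try is giving spark.sql.warehouse.dir an explicit file: URI; a sketch of the changed builder call is below (same as in my main() above, path shortened here and not yet verified):

val spark = SparkSession
  .builder()
  .appName("Spark PropertyInvestmentCalcs")
  .master("local[6]")
  // Explicit file: scheme so FileSystem.getFileSystemClass has a scheme other than null to resolve.
  // Path shortened for the example; the real one would point at the project's spark-warehouse directory.
  .config("spark.sql.warehouse.dir", "file:///C:/tmp/spark-warehouse")
  .getOrCreate()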
Thanks in advance for any help!