Created 08-03-2016 01:26 PM
Hi
After recently upgrading to Spark 2.0, I cannot get anything that relies on Spark SQL to run in IntelliJ IDEA on Windows. My code is provided below.
import org.apache.spark.sql.SparkSession

object PropertyInvestmentCalcs {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder()
      .appName("Spark PropertyInvestmentCalcs")
      .master("local[6]")
      .config("spark.sql.warehouse.dir", "\\\\TJVRLAPTOP\\Users\\tjoha\\Google Drive\\Programming\\IntelliJ\\PropertyInvestmentCalcs\\spark-warehouse")
      //.config("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
      //.config("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
      .getOrCreate()

    //val sqlContext = new org.apache.spark.sql.SparkSession(spark)

    // Get number of data records in the table
    val nrRecordsDF = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "(SELECT COUNT(*) AS nrRecords FROM test.propertydb) AS nrRecords_tmp")
      .option("user", "tjohannvr")
      .option("password", "[5010083]")
      .load()
    nrRecordsDF.show()
    nrRecordsDF.printSchema()
    val nrRecords = nrRecordsDF.head().getLong(0)
    println("nrRecords = " + nrRecords)

    // Select data from MySQL, with a specific number of records at a time
    val NrRecordsAtATime = nrRecords * 0 + 100000
    println("NrRecordsAtATime = " + NrRecordsAtATime)

    spark.stop()
  }
}
The error that occurs when trying to set nrRecordsDF (i.e. on the load() call) is shown below:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: null
Is it perhaps unable to load the Hadoop classes, specifically those related to HDFS? If so, why would that be the case?
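To sanity-check that idea, I can run a small classpath probe like the one below (a sketch only; the class names are assumed from the hadoop-common and hadoop-hdfs jars that spark-core 2.0.0 pulls in transitively):

// Quick classpath probe: can the Hadoop FileSystem implementations be loaded at all?
object FsClassProbe {
  def main(args: Array[String]): Unit = {
    Seq(
      "org.apache.hadoop.fs.LocalFileSystem",
      "org.apache.hadoop.hdfs.DistributedFileSystem"
    ).foreach { name =>
      // Try to resolve each class by name and report whether it is visible on the classpath.
      val found = scala.util.Try(Class.forName(name)).isSuccess
      println(s"$name -> " + (if (found) "found" else "NOT found"))
    }
  }
}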
Below is the contents of my build.sbt file.
import sbt._

name := "PropertyInvestmentCalcs"
version := "0.1.0-SNAPSHOT"
organization := "TJVR"
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" //% "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" //% "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" //% "provided"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.39"

lazy val commonSettings = Seq(
  version := "0.1-SNAPSHOT",
  organization := "TJVR",
  scalaVersion := "2.11.8"
)

lazy val app = (project in file("app")).
  settings(commonSettings: _*).
  settings(
    // your settings here
  )

artifact in (Compile, assembly) := {
  val art = (artifact in (Compile, assembly)).value
  art.copy(`classifier` = Some("assembly"))
}

addArtifact(artifact in (Compile, assembly), assembly)
Thanks in advance for any help.
Created 08-03-2016 04:29 PM
Try enabling Spark debug logging and, if possible, get the complete stack trace. Does the error only happen on Windows?
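One minimal way to do that on a local run, assuming the failure happens on the nrRecordsDF load (i.e. after the SparkSession has already been created), is to raise the log level on the SparkContext right after getOrCreate():

// Immediately after SparkSession.getOrCreate() in main(), before the JDBC read:
spark.sparkContext.setLogLevel("DEBUG")  // switch logging on this JVM from INFO to DEBUG

Alternatively, a log4j.properties on the classpath with log4j.rootCategory=DEBUG, console does the same for the whole JVM, including session construction.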
Created 08-04-2016 11:39 AM
Hi @vshukla
Thanks for the answer. I don't intend to set up a Linux or other machine to test on, as I prefer to stick with Windows.
Here is the stack trace:
16/08/04 13:08:58 INFO SharedState: Warehouse path is '\\SERVER\Users\USER\STORAGE\Programming\IntelliJ\PropertyInvestmentCalcs\spark-warehouse'.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: null
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:115)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
    at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
    at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
    at org.apache.spark.sql.internal.SessionState$anon$1.<init>(SessionState.scala:112)
    at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
    at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
    at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
    at PropertyInvestmentCalcs$.main(PropertyInvestmentCalcs.scala:27)
    at PropertyInvestmentCalcs.main(PropertyInvestmentCalcs.scala)
16/08/04 13:09:00 INFO SparkContext: Invoking stop() from shutdown hook
16/08/04 13:09:00 INFO SparkUI: Stopped Spark web UI at http://192.168.1.100:4040
16/08/04 13:09:00 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/04 13:09:00 INFO MemoryStore: MemoryStore cleared
16/08/04 13:09:00 INFO BlockManager: BlockManager stopped
16/08/04 13:09:00 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/04 13:09:00 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/04 13:09:00 INFO SparkContext: Successfully stopped SparkContext
16/08/04 13:09:00 INFO ShutdownHookManager: Shutdown hook called
16/08/04 13:09:00 INFO ShutdownHookManager: Deleting directory C:\Users\USER\AppData\Local\Temp\spark-523c95ef-a46c-4b16-88c6-3da8f6f2a801
Does this give sufficient additional information?
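For what it is worth, the SessionCatalog.makeQualifiedPath frame makes me suspect the warehouse path itself, since the UNC path I pass in has no URI scheme. One thing I plan to try is giving spark.sql.warehouse.dir an explicit file: URI; a sketch of the changed builder call is below (same as in my main() above, path shortened here and not yet verified):

val spark = SparkSession
  .builder()
  .appName("Spark PropertyInvestmentCalcs")
  .master("local[6]")
  // Explicit file: scheme so FileSystem.getFileSystemClass has a scheme other than null to resolve.
  // Path shortened for the example; the real one would point at the project's spark-warehouse directory.
  .config("spark.sql.warehouse.dir", "file:///C:/tmp/spark-warehouse")
  .getOrCreate()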
Thanks in advance for any help!