08-04-2016
11:39 AM
Hi @vshukla, thanks for the answer. I don't intend to set up a Linux or other machine to test on, as I prefer to stick with Windows. Here is the stack trace:

16/08/04 13:08:58 INFO SharedState: Warehouse path is '\\SERVER\Users\USER\STORAGE\Programming\IntelliJ\PropertyInvestmentCalcs\spark-warehouse'.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: null
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:115)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
at org.apache.spark.sql.internal.SessionState$anon$1.<init>(SessionState.scala:112)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
at PropertyInvestmentCalcs$.main(PropertyInvestmentCalcs.scala:27)
at PropertyInvestmentCalcs.main(PropertyInvestmentCalcs.scala)
16/08/04 13:09:00 INFO SparkContext: Invoking stop() from shutdown hook
16/08/04 13:09:00 INFO SparkUI: Stopped Spark web UI at http://192.168.1.100:4040
16/08/04 13:09:00 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/04 13:09:00 INFO MemoryStore: MemoryStore cleared
16/08/04 13:09:00 INFO BlockManager: BlockManager stopped
16/08/04 13:09:00 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/04 13:09:00 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/04 13:09:00 INFO SparkContext: Successfully stopped SparkContext
16/08/04 13:09:00 INFO ShutdownHookManager: Shutdown hook called
16/08/04 13:09:00 INFO ShutdownHookManager: Deleting directory C:\Users\USER\AppData\Local\Temp\spark-523c95ef-a46c-4b16-88c6-3da8f6f2a801

Does it give sufficient additional information? Thanks in advance for any help!
08-03-2016
01:26 PM
Hi, after recently upgrading to Spark 2.0 I cannot seem to get anything that relies on Spark SQL to run in IntelliJ IDEA on Windows. My code is provided below.

import org.apache.spark.sql.SparkSession

object PropertyInvestmentCalcs {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder()
      .appName("Spark PropertyInvestmentCalcs")
      .master("local[6]")
      .config("spark.sql.warehouse.dir", "\\\\TJVRLAPTOP\\Users\\tjoha\\Google Drive\\Programming\\IntelliJ\\PropertyInvestmentCalcs\\spark-warehouse")
      //.config("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
      //.config("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
      .getOrCreate()

    //val sqlContext = new org.apache.spark.sql.SparkSession(spark)

    // Get the number of data records in the table
    val nrRecordsDF = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/test")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "(SELECT COUNT(*) AS nrRecords FROM test.propertydb) AS nrRecords_tmp")
      .option("user", "tjohannvr")
      .option("password", "[5010083]")
      .load()

    nrRecordsDF.show()
    nrRecordsDF.printSchema()

    val nrRecords = nrRecordsDF.head().getLong(0)
    println("nrRecords = " + nrRecords)

    // Select data from MySQL, with a specific number of records at a time
    val NrRecordsAtATime = nrRecords * 0 + 100000
    println("NrRecordsAtATime = " + NrRecordsAtATime)

    spark.stop()
  }
}
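For context, the "specific number of records at a time" comment above refers to a chunked JDBC read I have not written yet; NrRecordsAtATime is meant to drive it. A rough sketch of what I have in mind is below, assuming the table has an auto-increment key column called id (that column name is only an assumption):

// Sketch only: partitioned JDBC read, one partition per chunk of roughly
// NrRecordsAtATime rows; assumes an auto-increment column named "id"
val propertyDF = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "test.propertydb")
  .option("user", "tjohannvr")
  .option("password", "[5010083]")
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", nrRecords.toString)
  .option("numPartitions", (nrRecords / NrRecordsAtATime + 1).toString)
  .load()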
The error that occurs when trying to set nrRecordsDF is provided below:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: null

Is it perhaps unable to load the Hadoop classes, specifically those related to HDFS? Why would this be the case? Below are the contents of my build.sbt file.

import sbt._
name := "PropertyInvestmentCalcs"
version := "0.1.0-SNAPSHOT"
organization := "TJVR"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" //% "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" //% "provided"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" //% "provided"
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.39"
lazy val commonSettings = Seq(
  version := "0.1-SNAPSHOT",
  organization := "TJVR",
  scalaVersion := "2.11.8"
)

lazy val app = (project in file("app")).
  settings(commonSettings: _*).
  settings(
    // your settings here
  )

artifact in (Compile, assembly) := {
  val art = (artifact in (Compile, assembly)).value
  art.copy(`classifier` = Some("assembly"))
}
addArtifact(artifact in (Compile, assembly), assembly)
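Note that the assembly keys above assume the sbt-assembly plugin is on the build classpath; a typical project/plugins.sbt entry would look like this (the version number here is only an example, not necessarily what I use):

// project/plugins.sbt (plugin version is an example)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")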
Thanks in advance for any help.
Labels:
- Apache Spark