
pyspark to scala

Hi guys, I have code written in PySpark. Can anyone help me convert it to run under Scala? It is urgent, thanks!

from functools import reduce

files = ["/tmp/test_1.csv", "/tmp/test_2.csv", "/tmp/test_3.csv"]

# Read each CSV with spark-csv, then union the DataFrames into one
df = reduce(lambda x, y: x.unionAll(y),
            [sqlContext.read.format('com.databricks.spark.csv')
                       .load(f, header="true", inferSchema="true")
             for f in files])
df.show()

Re: pyspark to scala

@Maher Hattabi

You can read multiple files directly in a single sqlContext.read statement, as shown below:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/test_1.csv", "/tmp/test_2.csv", "/tmp/test_3.csv")

df.show()

If you are using Spark 2.0 or newer, the preferred syntax uses the SparkSession (the spark object) instead of the SQLContext, and CSV support is built in, so the external spark-csv package is no longer needed:

val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/test_1.csv", "/tmp/test_2.csv", "/tmp/test_3.csv")

df.show()
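
If you would rather keep the structure of your original code, a more literal Scala translation of the reduce/unionAll pattern looks roughly like this (a sketch for Spark 1.x, assuming the spark-csv package is on the classpath):

import org.apache.spark.sql.{DataFrame, SQLContext}

val sqlContext = new SQLContext(sc)

val files = Seq("/tmp/test_1.csv", "/tmp/test_2.csv", "/tmp/test_3.csv")

// Read each file into its own DataFrame, then fold them together with
// unionAll, mirroring Python's reduce(lambda x, y: x.unionAll(y), ...)
val df: DataFrame = files
  .map(f => sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(f))
  .reduce((x, y) => x.unionAll(y)) // on Spark 2.0+, use union instead

df.show()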

Please let me know if this helps.
