pyspark to scala

Hi all, I have some code written in PySpark and need help converting it to run under Scala. It's urgent. Thanks!

from functools import reduce

files = ["/tmp/test_1.csv", "/tmp/test_2.csv", "/tmp/test_3.csv"]

# Read each CSV file into its own DataFrame, then union them into one
df = reduce(lambda x, y: x.unionAll(y),
            [sqlContext.read.format('com.databricks.spark.csv')
                       .load(f, header="true", inferSchema="true")
             for f in files])
df.show()

Re: pyspark to scala

@Maher Hattabi

You should be able to read multiple files directly in a single sqlContext.read statement, as shown below:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// load() accepts multiple paths, so all three files can be read in one call
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/test_1.csv", "/tmp/test_2.csv", "/tmp/test_3.csv")

df.show()
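
If you would rather keep the reduce/unionAll structure of your original code, a direct Scala translation along these lines should also work (a sketch, assuming the same Spark 1.x and spark-csv setup as above):

import org.apache.spark.sql.DataFrame

val files = Seq("/tmp/test_1.csv", "/tmp/test_2.csv", "/tmp/test_3.csv")

// Read each file into its own DataFrame, then union them pairwise,
// mirroring reduce(lambda x, y: x.unionAll(y), ...) in the PySpark code
val df: DataFrame = files
  .map(f => sqlContext.read.format("com.databricks.spark.csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(f))
  .reduce((x, y) => x.unionAll(y))

df.show()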

If you are using Spark 2.0 or newer, this is the preferred syntax (using the built-in SparkSession, available as spark):

val df = spark.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/tmp/test_1.csv", "/tmp/test_2.csv", "/tmp/test_3.csv")

df.show()
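
As a side note, Spark 2.x ships a built-in CSV data source, so the external spark-csv package isn't strictly needed there. A minimal sketch of the same read using the native reader:

// Spark 2.x: DataFrameReader.csv() accepts multiple paths directly
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/tmp/test_1.csv", "/tmp/test_2.csv", "/tmp/test_3.csv")

df.show()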

Please let me know if this helps.
