
pyspark issue


@hema moger

Here is sample code to convert CSV to JSON using PySpark.

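from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("csv_to_json").getOrCreate()
df = spark.read.format("csv").option("header", "true").load("file:///tmp/sample.csv")
df.repartition(1).toJSON(use_unicode=True).saveAsTextFile("file:///tmp/sample_out")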

Hope this helps.

@hema moger, please accept this answer and close the thread if it addressed your query.

@hema moger

In Spark 1.6, you can use the Databricks spark-csv package to load a CSV file into a DataFrame and then write it out as JSON. The README explains how to set it up:

https://github.com/databricks/spark-csv#csv-data-source-for-apache-spark-1x

In Spark 2.0+, Spark itself provides a CSV reader that loads a file into a DataFrame, which you can then write in whatever format you want (JSON, Parquet, ORC):

https://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv

https://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter.json
