pyspark issue

Re: pyspark issue

@hema moger

Here is sample code to convert CSV to JSON using PySpark.

df = spark.read.format("CSV").option("header","true").load("file:///tmp/sample.csv")
df.repartition(1).toJSON(use_unicode=True).saveAsTextFile("file:///tmp/sample_out")

Hope this helps.

Re: pyspark issue

@hema moger, please accept this answer and close the thread if it helped address your query.

Re: pyspark issue

@hema moger

In Spark 1.6, you can use the Databricks spark-csv package to load a CSV file into a DataFrame and write it out as JSON. The README explains how:

https://github.com/databricks/spark-csv#csv-data-source-for-apache-spark-1x
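
As a rough sketch of the Spark 1.6 route, assuming the spark-csv package is already on the classpath (for example via the `--packages` option described in that README) and that `sqlContext` is the SQLContext the pyspark shell provides. The function name and the paths are only illustrative:

```python
def csv_to_json_spark16(sqlContext, src, dst):
    """Load a headered CSV through spark-csv and write it out as JSON lines."""
    df = (sqlContext.read
          .format("com.databricks.spark.csv")  # data source name used by spark-csv
          .option("header", "true")            # first row holds the column names
          .load(src))
    # DataFrameWriter.json writes a directory of part files, one JSON object per line.
    df.write.json(dst)
    return df
```

From the shell this could be called as csv_to_json_spark16(sqlContext, "file:///tmp/sample.csv", "file:///tmp/sample_out"). Without schema inference, every column comes through as a string.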

In Spark 2+, Spark itself provides a CSV loader that creates a DataFrame, which you can then write to whatever format you want (JSON, Parquet, or ORC):

https://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv

https://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter.json
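
A minimal sketch using the two methods linked above, assuming `spark` is an existing SparkSession (as in the Spark 2 pyspark shell). The function name, the paths, and the choice of "overwrite" mode are illustrative assumptions, not something from the original question:

```python
def csv_to_json(spark, src, dst, mode="overwrite"):
    """Read a headered CSV into a DataFrame and write it back out as JSON lines."""
    # DataFrameReader.csv: header=True takes the column names from the first row.
    df = spark.read.csv(src, header=True)
    # DataFrameWriter.json writes a directory of part files, one JSON object per line.
    # mode="overwrite" replaces an existing output directory; drop it to fail instead.
    df.write.json(dst, mode=mode)
    return df
```

From the shell this could be called as csv_to_json(spark, "file:///tmp/sample.csv", "file:///tmp/sample_out"). Swapping df.write.json for df.write.parquet or df.write.orc gives the other output formats mentioned.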
