
Reading/analysing a ~1 TB JSON file in Spark on an HDInsight Kafka cluster

I would like to analyze a large dataset (0.9 TB after unzipping) on a cluster with 14 nodes and 39 cores (Azure HDInsight/Kafka), but it is very slow. Here is what I do:

1. The data was downloaded from here.

2. val data = spark.read.json(path) crashes. The data is stored in HDFS.

3. val rdd = sc.textFile(path) followed by rdd.count() also crashes.

4. rdd.take(10) and similar small actions work fine.

5. It was not possible to unzip the file, so I read the compressed data.json.gz directly.

Any suggestions? How can I read it with the JSON reader?

Thanks

Accepted Solutions

@Maryam, while we welcome your question, you would be much more likely to get a useful answer if you posted it to the appropriate forum for Microsoft Azure HDInsight.

Bill Brooks, Community Manager