New Contributor
Posts: 5
Registered: ‎09-16-2018

Parquet imports in exercise 3

Hello,

In exercise 3 of the tutorial, there are the following imports:

 

import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.avro.generic.GenericRecord
import parquet.hadoop.ParquetInputFormat
import parquet.avro.AvroReadSupport
import org.apache.spark.rdd.RDD

 

I would like to know what these two are:

import parquet.hadoop.ParquetInputFormat
import parquet.avro.AvroReadSupport

 

Wherever I look, all Spark/Hadoop-related libraries have the org.apache prefix. Is there something special about these two libraries? Are they treated differently by Cloudera?

New Contributor
Posts: 5
Registered: ‎09-16-2018

Re: Parquet imports in exercise 3

OK, apparently I posted my question too quickly. I have finally found some packages whose names start with parquet, such as parquet.avro. For example, here: https://github.com/stripe/parquet-mr/blob/master/parquet-avro/src/main/java/parquet/avro/AvroReadSup....
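
For anyone with the same confusion, here is a minimal sketch of how to check which package naming is actually present on a given classpath. It relies on an assumption not confirmed in this thread: that newer parquet-mr releases ship the same class under the org.apache.parquet prefix, while older builds use the bare parquet prefix. The object name ParquetPackageProbe and the helper available are made up for illustration.

```scala
object ParquetPackageProbe {
  // Candidate fully qualified names for the same class.
  // The org.apache.parquet variant is an assumption about newer releases.
  val candidates: Seq[String] = Seq(
    "parquet.avro.AvroReadSupport",
    "org.apache.parquet.avro.AvroReadSupport"
  )

  // True if the class can be loaded; it is not initialized, only looked up.
  def isLoadable(name: String): Boolean =
    try {
      Class.forName(name, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false // e.g. a missing dependency of the class
    }

  // Returns the subset of names that resolve on the current classpath.
  def available(names: Seq[String]): Seq[String] = names.filter(isLoadable)

  def main(args: Array[String]): Unit = {
    val found = available(candidates)
    if (found.isEmpty) println("No AvroReadSupport class found on the classpath")
    else found.foreach(name => println(s"Found: $name"))
  }
}
```

Run from spark-shell or with the tutorial's jars on the classpath, this reports which prefix the installed Parquet build uses, which in turn tells you which import line to write.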

BTW, I might be doing something wrong in my googling, but I cannot find consistent API documentation for the Parquet modules, in Javadoc or any other format. Sure, there is the page parquet.apache.org with some general documentation, but I cannot see any API documentation there. Take the AvroReadSupport class, for example: is there a place where official API documentation is available?
