Parquet imports in exercise 3

Hello,

Exercise 3 of the tutorial contains the following imports:

import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.avro.generic.GenericRecord
import parquet.hadoop.ParquetInputFormat
import parquet.avro.AvroReadSupport
import org.apache.spark.rdd.RDD

I would like to know what these two are:

import parquet.hadoop.ParquetInputFormat
import parquet.avro.AvroReadSupport

Wherever I look, all Spark/Hadoop-related libraries have the org.apache prefix. Is there something special about these two libraries? Are they treated differently by Cloudera?

Re: Parquet imports in exercise 3

OK, apparently I posted my question too quickly. I have finally found some packages whose names start with parquet, such as parquet.avro. For example, here: https://github.com/stripe/parquet-mr/blob/master/parquet-avro/src/main/java/parquet/avro/AvroReadSup....
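
For reference, here is a minimal sketch of how these two classes are typically wired together to read a Parquet file into an RDD of Avro records. It assumes the pre-Apache parquet.* package names used in the tutorial (later Apache Parquet releases moved these classes under org.apache.parquet). The function name and the sc and path parameters are illustrative assumptions, not something taken from the tutorial text.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
// Assumption: pre-Apache package names, as in the tutorial imports.
import parquet.avro.AvroReadSupport
import parquet.hadoop.ParquetInputFormat

// Hypothetical helper: reads a Parquet file as an RDD of Avro GenericRecords.
def rddFromParquetFile(sc: SparkContext, path: String): RDD[GenericRecord] = {
  // The Job object is used only as a carrier for the Hadoop configuration.
  val job = Job.getInstance()
  FileInputFormat.setInputPaths(job, path)

  // AvroReadSupport tells ParquetInputFormat to materialize rows as Avro records.
  ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[GenericRecord]])

  // ParquetInputFormat yields (Void, record) pairs; keep only the record.
  sc.newAPIHadoopRDD(
    job.getConfiguration,
    classOf[ParquetInputFormat[GenericRecord]],
    classOf[Void],
    classOf[GenericRecord]
  ).map(_._2)
}
```

In spark-shell, where sc is already defined, this could be called as rddFromParquetFile(sc, "/some/path"), with the path being a placeholder for an actual Parquet location.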

BTW, I may be searching the wrong way, but I cannot find consistent API documentation for the Parquet modules, in Javadoc or any other format. There is the parquet.apache.org site, which has some general documentation, but I cannot see any API documentation there. Take the AvroReadSupport class, for example: is there a place where official API documentation is available?