Problem due to period (DOT) in column name (Apache Pig)

Hi,

I am new to Apache Pig. I am trying to load a Parquet file in Pig. The schema of the Parquet file looks like this:

message events {
  optional binary d.ingestor.year;
  optional binary d.ingestor.month;
  optional binary d.ingestor.day;
}

 

Then, I try to load it in as follows:

A = LOAD '/tmp/myFiles/' USING parquet.pig.ParquetLoader('d.ingestor.month: chararray');

 

I get an error due to the period in the column name.

Error: 

mismatched input '.' expecting EOF
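
For context, the Pig documentation describes identifiers as a letter followed by letters, digits, or underscores, which would be consistent with the parser stopping at the first '.'. Below is a minimal Python sketch of that check against the three column names from my schema. The regex and the helper name `is_valid_pig_identifier` are my own approximation for illustration, not Pig's actual grammar.

```python
import re

# Approximation of Pig's documented identifier rule: a letter followed by
# letters, digits, or underscores. Illustrative assumption, not Pig's grammar.
PIG_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_pig_identifier(name: str) -> bool:
    """Return True if the whole name matches the approximate identifier rule."""
    return PIG_IDENTIFIER.fullmatch(name) is not None


# Column names taken from the Parquet schema above.
columns = ["d.ingestor.year", "d.ingestor.month", "d.ingestor.day"]

for col in columns:
    # Each dotted name fails the check because '.' is not an allowed character.
    print(col, is_valid_pig_identifier(col))
```

Under this approximation, all three dotted names are rejected, while a name such as `d_ingestor_month` would pass.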

 

I also tried the following but got the same error:

A = LOAD '/tmp/myFiles/' USING parquet.pig.ParquetLoader();
B = FOREACH A GENERATE $1 as month;
DUMP B;

 

I tried escaping the '.' with '\' and '\\', but to no avail.

Please let me know if you know of a workaround. Thanks!