Problem due to period (DOT) in column name (Apache Pig)

Hi,

 

I am new to Apache Pig and am trying to load a Parquet file. The schema of the Parquet file looks like this:

message events {
  optional binary d.ingestor.year;
  optional binary d.ingestor.month;
  optional binary d.ingestor.day;
}

 

I then try to load it as follows:

A = LOAD '/tmp/myFiles/' USING parquet.pig.ParquetLoader('d.ingestor.month: chararray');

 

I get the following error, caused by the period in the column name:

mismatched input '.' expecting EOF

 

I also tried the following but got the same error:

A = LOAD '/tmp/myFiles/' USING parquet.pig.ParquetLoader();
B = FOREACH A GENERATE $1 AS month;
DUMP B;

 

I tried escaping the '.' with '\' and '\\' but to no avail.

 

Please let me know if you know of a workaround. Thanks!