Registered: ‎05-23-2017

Problem due to period (DOT) in column name (Apache Pig)

Hi,

 

I am new to Apache Pig and am trying to load a Parquet file in Pig. The schema of the Parquet file looks like this:

message events {
  optional binary d.ingestor.year;
  optional binary d.ingestor.month;
  optional binary d.ingestor.day;
}

 

Then I try to load it as follows:

A = LOAD '/tmp/myFiles/' USING parquet.pig.ParquetLoader('d.ingestor.month: chararray');

 

I get an error due to the period in the column name.

Error: 

mismatched input '.' expecting EOF

 

I also tried the following but got the same error:

A = LOAD '/tmp/myFiles/' USING parquet.pig.ParquetLoader();
B = FOREACH A GENERATE $1 as month;
DUMP B;

 

I also tried escaping the '.' with '\' and with '\\', but to no avail.

 

Please let me know if you know of a workaround. Thanks!

 
