Using Pig to Load Data into ORC

I have a script that loads data into an ORC table, but it seems I can only load the TEXT data type. Are other data types supported?

Or is there a better way to bulk-load CSV data into ORC tables?

Thanks.

1 ACCEPTED SOLUTION

@Timothy Spann

You can also use Pig's OrcStorage to store the data as ORC files, then create an external Hive table that points to the ORC directory.

https://pig.apache.org/docs/r0.15.0/func.html#OrcStorage

Data types

Most Orc data types have a one-to-one mapping to a Pig data type. The exceptions are:

Loader side:

  • Orc STRING/CHAR/VARCHAR all map to Pig chararray
  • Orc BYTE/BINARY both map to Pig bytearray
  • Orc TIMESTAMP/DATE both map to Pig datetime
  • Orc DECIMAL maps to Pig bigdecimal

Storer side:

  • Pig chararray maps to Orc STRING
  • Pig datetime maps to Orc TIMESTAMP
  • Pig bigdecimal/biginteger both map to Orc DECIMAL
  • Pig bytearray maps to Orc BINARY
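
As a rough illustration of the storer-side mappings, here is a minimal sketch of a Pig script that loads a CSV file with a typed schema and writes it out as ORC. The paths and field names are hypothetical, and it assumes a Pig version that ships OrcStorage (the linked docs are for 0.15.0) and datetime values in ISO-8601 form.

```pig
-- Hypothetical input path and schema; adjust both to your data.
-- PigStorage(',') splits on every comma and does not handle quoted fields.
raw = LOAD '/tmp/people.csv' USING PigStorage(',')
      AS (id:int, name:chararray, balance:bigdecimal, created:datetime);

-- On store: chararray -> STRING, bigdecimal -> DECIMAL, datetime -> TIMESTAMP.
-- The output path is also hypothetical.
STORE raw INTO '/tmp/people_orc' USING OrcStorage();
```

An external Hive table declared as STORED AS ORC, with its LOCATION set to the output directory and column types matching the mappings above, should then be able to read these files.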
