Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Using Pig to Load Data into ORC

Master Guru

I have a Pig script that loads data into an ORC table, but it seems I can only load the TEXT data type. Are other data types supported?

Or is there a better way to bulk-load CSV data into ORC tables?

Thanks.

1 ACCEPTED SOLUTION

Super Guru
@Timothy Spann

You can also use Pig's OrcStorage to store the data as ORC files, then create an external Hive table and point it at the ORC directory.

https://pig.apache.org/docs/r0.15.0/func.html#OrcStorage

Data types

Most ORC data types have a one-to-one mapping to a Pig data type. The exceptions are:

Loader side:

  • Orc STRING/CHAR/VARCHAR all map to Pig chararray
  • Orc BYTE/BINARY both map to Pig bytearray
  • Orc TIMESTAMP/DATE both map to Pig datetime
  • Orc DECIMAL maps to Pig bigdecimal

Storer side:

  • Pig chararray maps to Orc STRING
  • Pig datetime maps to Orc TIMESTAMP
  • Pig bigdecimal/biginteger both map to Orc DECIMAL
  • Pig bytearray maps to Orc BINARY
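
To make the storer-side list concrete, here is a rough Python sketch that turns a Pig schema into the matching CREATE EXTERNAL TABLE statement for the directory OrcStorage wrote to. The function name, the table name, and the path are made up for illustration. The int/long/float/double/boolean entries are my assumption about the "one to one" types; only the other five mappings come from the list above.

```python
# Storer-side mappings from the list above (Pig type -> ORC/Hive type).
PIG_TO_ORC = {
    "chararray": "STRING",
    "datetime": "TIMESTAMP",
    "bigdecimal": "DECIMAL",
    "biginteger": "DECIMAL",
    "bytearray": "BINARY",
    # Assumption: the "one to one" types that the list does not spell out.
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}


def hive_external_table_ddl(table, pig_schema, orc_dir):
    """Render Hive DDL for ORC files written by Pig.

    pig_schema is a list of (column_name, pig_type) pairs; orc_dir is the
    directory the Pig STORE statement wrote to (hypothetical in this sketch).
    """
    columns = []
    for name, pig_type in pig_schema:
        key = pig_type.lower()
        if key not in PIG_TO_ORC:
            # Complex types (map, tuple, bag) are deliberately not handled here.
            raise ValueError(f"no ORC mapping known for Pig type {pig_type!r}")
        columns.append(f"  {name} {PIG_TO_ORC[key]}")
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        + ",\n".join(columns)
        + f"\n)\nSTORED AS ORC\nLOCATION '{orc_dir}';"
    )


if __name__ == "__main__":
    schema = [("id", "long"), ("name", "chararray"), ("created", "datetime")]
    print(hive_external_table_ddl("events", schema, "/data/orc/events"))
```

Check the generated column types before running the statement in Hive. In particular, a bare DECIMAL carries no precision or scale, so set those by hand to match your data.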

