Using Pig to Load Data into ORC

I have a script that loads data into an ORC table, but it seems I can only load the TEXT data type. Are other data types supported?

Or is there a better way to bulk-load CSV data into ORC tables?

Thanks.

1 ACCEPTED SOLUTION

@Timothy Spann

You can also use Pig's OrcStorage to store the data as ORC files, then create an external Hive table and point it to the ORC directory.

https://pig.apache.org/docs/r0.15.0/func.html#OrcStorage

Data types

Most Orc data types have a one-to-one mapping to Pig data types. The exceptions are:

Loader side:

  • Orc STRING/CHAR/VARCHAR all map to Pig chararray
  • Orc BYTE/BINARY both map to Pig bytearray
  • Orc TIMESTAMP/DATE both map to Pig datetime
  • Orc DECIMAL maps to Pig bigdecimal

Storer side:

  • Pig chararray maps to Orc STRING
  • Pig datetime maps to Orc TIMESTAMP
  • Pig bigdecimal/biginteger both map to Orc DECIMAL
  • Pig bytearray maps to Orc BINARY
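
As a rough sketch of the approach described at the top of this answer, a Pig script along these lines could load a CSV file with typed columns and write it out as ORC. The paths, relation names, column names, and timestamp format are all hypothetical, and it assumes Pig 0.14 or later, where OrcStorage is built in:

```pig
-- Hypothetical input: /tmp/people.csv with rows like
--   1,alice,10.50,2016-01-31 12:00:00
-- Load each field with an explicit Pig type instead of leaving everything as text.
raw = LOAD '/tmp/people.csv' USING PigStorage(',')
      AS (id:int, name:chararray, amount:double, created:chararray);

-- Parse the timestamp string into a Pig datetime so it is stored as Orc TIMESTAMP.
-- The format string is an assumption about the input data.
typed = FOREACH raw GENERATE
            id,
            name,
            amount,
            ToDate(created, 'yyyy-MM-dd HH:mm:ss') AS created;

-- Write ORC files into a directory (hypothetical path).
STORE typed INTO '/tmp/orc/people' USING OrcStorage();

-- Afterward, in Hive (not Pig), an external table could point at that directory:
--   CREATE EXTERNAL TABLE people (id INT, name STRING, amount DOUBLE, created TIMESTAMP)
--   STORED AS ORC
--   LOCATION '/tmp/orc/people';
```

The Hive column types should match what the storer-side mapping above produces, for example chararray becomes STRING and datetime becomes TIMESTAMP.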
