How to compress in bzip2 format and insert into Hive

hi,

I am trying to insert my DataFrame as ORC with bzip2 compression, but it throws the following error:

java.lang.IllegalArgumentException: Codec [bzip2] is not available. Available codecs are uncompressed, lzo, snappy, zlib, none.
  at org.apache.spark.sql.hive.orc.OrcOptions.<init>(OrcOptions.scala:49)
  at org.apache.spark.sql.hive.orc.OrcOptions.<init>(OrcOptions.scala:32)
  at org

My code is:

fields.write.format("orc").option("compression","bzip2").saveAsTable("prasadtest.descargatest")

I am using Spark 2 for this.
3 REPLIES

Hi Prasad,

You need to import the BZip2Codec class in your code. Simply add the following line to your code and it should work fine.

import org.apache.hadoop.io.compress.BZip2Codec;

Hi, @prasad raju

Unfortunately, ORC doesn't support BZip2, so neither Hive nor Spark can write ORC files with it.

- ORC Source Code

- HIVE-5067
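The error message in the question lists the codecs that Spark 2's ORC writer accepted there. As a minimal sketch, assuming that list (copied verbatim from the error above; other Spark versions may accept a different set), a job could check the requested codec up front and fail with a clearer message. `OrcCodecCheck` and `validate` are hypothetical names, not part of Spark or Hive.

```scala
import java.util.Locale

// Hypothetical helper: checks a codec name against the ORC codecs
// listed in the Spark 2 error message from the question.
object OrcCodecCheck {
  // Assumption: this set mirrors "uncompressed, lzo, snappy, zlib, none" from the error.
  val supportedOrcCodecs: Set[String] =
    Set("uncompressed", "lzo", "snappy", "zlib", "none")

  /** Right(lower-cased codec) if ORC accepts it, otherwise Left(explanation). */
  def validate(codec: String): Either[String, String] = {
    val normalized = codec.toLowerCase(Locale.ROOT)
    if (supportedOrcCodecs.contains(normalized)) Right(normalized)
    else Left(s"Codec [$codec] is not available for ORC. Use one of: " +
      supportedOrcCodecs.toSeq.sorted.mkString(", "))
  }
}
```

For example, `OrcCodecCheck.validate("bzip2")` returns a `Left`, while `OrcCodecCheck.validate("snappy")` returns `Right("snappy")`, which could then be passed to `.option("compression", ...)`.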

Use Snappy as your compression codec.
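Applied to the code from the question, that means changing only the codec string. A minimal sketch, assuming `fields` is the same DataFrame as in the question; the wrapper `OrcSnappyWrite.save` is a hypothetical name used only for illustration:

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical wrapper around the write from the question.
object OrcSnappyWrite {
  // Same call chain as the original, but with a codec ORC accepts.
  def save(fields: DataFrame, table: String): Unit =
    fields.write
      .format("orc")
      .option("compression", "snappy") // "bzip2" is rejected by the ORC writer
      .saveAsTable(table)
}

// Usage with the table name from the question:
// OrcSnappyWrite.save(fields, "prasadtest.descargatest")
```

`zlib`, also in the error message's list, is the other general-purpose choice; it generally trades more CPU time for smaller files than Snappy.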