Support Questions

mathfish · 05-17-2017

I read that bzip2 is a good compression format to use because it is splittable, so I was trying to write a basic Java program that takes a .txt file and writes it to HDFS compressed with bzip2.

Here is my program:

But I get this stack trace when I run it (the first argument is the location of the file, the second is where to put the compressed file in HDFS, and the last is a boolean that says whether to compress):

I checked the io.compression.codecs property in core-site.xml, and bzip2 does not seem to be listed there:

I tried adding it via the configuration.set() method in my Java program, but that did not work. I also tried setting the io.native.lib.available property to false through configuration.set(), and that did not work either.

Does the HDP Sandbox not come with bzip2?

Thanks for the help.

mathfish · 05-19-2017

After some experimenting, it seems the correct way to do this, or at least the way I got it to work, is to obtain the codec from a CompressionCodecFactory by calling its getCodecByClassName("org.apache.hadoop.io.compress.BZip2Codec") method.
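
A minimal sketch of that approach, not the original program (which is not shown in this thread). It assumes hadoop-common is on the classpath. The class name Bzip2HdfsWriter, the method compressToHdfs, and the two command-line arguments are hypothetical names chosen for illustration. Appending the codec's default extension to the destination is also an assumption of this sketch.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class Bzip2HdfsWriter {

    /**
     * Copies a local file to the file system named by conf, compressing it with bzip2.
     * Hypothetical helper: returns the path that was actually written.
     */
    public static Path compressToHdfs(Configuration conf, String localFile, String dest)
            throws Exception {
        // The factory instantiates each codec with the configuration already set on it.
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec =
                factory.getCodecByClassName("org.apache.hadoop.io.compress.BZip2Codec");
        if (codec == null) {
            throw new IllegalStateException("BZip2Codec is not available to the codec factory");
        }

        // Assumption of this sketch: append the codec's default extension (".bz2")
        // so that readers can pick the codec from the file name.
        Path target = new Path(dest + codec.getDefaultExtension());
        FileSystem fs = target.getFileSystem(conf);

        // Closing the compression stream finishes the bzip2 data and closes the file.
        try (InputStream in = Files.newInputStream(Paths.get(localFile));
             OutputStream out = codec.createOutputStream(fs.create(target))) {
            IOUtils.copyBytes(in, out, 4096, false);
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical arguments: args[0] = local file, args[1] = destination path.
        // new Configuration() picks up core-site.xml if it is on the classpath.
        Path written = compressToHdfs(new Configuration(), args[0], args[1]);
        System.out.println("Wrote " + written);
    }
}
```

Because the codec comes from the factory, it does not need a separate setConf call before creating the output stream.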

vancampk · 09-19-2017

If you're using Spark, you can do this directly:

// mydataset is a Dataset with a single string column, which the text writer expects
mydataset.write().option("compression", "bzip2").text(filePath);

denis_arnaud_ho · 03-06-2019

The codec should be associated with the Hadoop configuration. In Scala:

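val hadoopConfig = new org.apache.hadoop.conf.Configuration()
val hdfs = org.apache.hadoop.fs.FileSystem.get(hadoopConfig)
val bzCodec = new org.apache.hadoop.io.compress.BZip2Codec()
// Give the codec the configuration before asking it for a stream
bzCodec.setConf(hadoopConfig)
// uriDest: destination path string on HDFS, defined by the caller
val outputFile = hdfs.create(new org.apache.hadoop.fs.Path(uriDest))
val outputStream = bzCodec.createOutputStream(outputFile)
// Write the data to outputStream, then close it to finish the bzip2 stream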

Cloudera Community

[Solved] How To Compress Using Bzip2