Why does my file size increase after writing it to an OutputStream?




I am writing a custom processor that stores files in Azure Data Lake. It reads a LAS file and uses the well name from it as the directory name under which that well's data is stored. The files are a mix of PDF, TIF, PAS, and LAS. If I read the content with "String input = IOUtils.toString(in)", only the LAS files are readable after I download them again. To work around this I switched to a ByteArrayOutputStream, and files such as PDFs are now readable. However, after one or two files have been processed, the stored files start growing to 2 or 3 times their original size, and that is how they end up in Azure Data Lake. Some TIF files also cannot be opened after downloading. How can I fix this?

I also cannot read the content after converting the bytes to a String. I have tried two approaches:

String input = new String(bytes, "UTF-8");

String input = IOUtils.toString(bytes, "UTF-8");
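One likely reason these conversions fail for PDF and TIF content is that binary data is not valid UTF-8 text. When Java decodes a byte sequence that is not valid UTF-8, each bad byte is replaced with U+FFFD, which takes three bytes when encoded back to UTF-8, so the data both changes and grows. The following is a minimal sketch of that effect; the class name Utf8RoundTripDemo and the sample bytes are made up for illustration and are not from the processor above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8RoundTripDemo {
    // Decodes raw bytes as UTF-8 text, then encodes that text back to bytes.
    static byte[] roundTrip(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "II*\0" is the little-endian TIFF signature; 0xFF and 0xFE never occur in valid UTF-8.
        byte[] raw = {0x49, 0x49, 0x2A, 0x00, (byte) 0xFF, (byte) 0xFE};
        byte[] back = roundTrip(raw);
        System.out.println(raw.length + " bytes in, " + back.length + " bytes out");
        System.out.println("identical: " + Arrays.equals(raw, back));
    }
}
```

If this is what is happening, the binary files should stay as byte[] from input to output and never pass through a String.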

Code snippets:

ByteArrayOutputStream bos = new ByteArrayOutputStream();

// Read the incoming stream in chunks; in is the input stream and buf is the byte[] read buffer
for (int readNum; (readNum = in.read(buf)) != -1;) {
    bos.write(buf, 0, readNum);
    System.out.println("read " + readNum + " bytes,");
}
byte[] bytes = bos.toByteArray();
String file = dir + "/" + filenameList.get(i);
stream = client.createFile(file, IfExists.OVERWRITE); // client is an instance of the AzureDataLakeAcessClient class
stream.write(inputList.get(i), 0, inputList.get(i).length);
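For comparison, here is a minimal sketch of a copy that treats the content as raw bytes from start to finish and checks that the size does not change. It uses only java.io, with a ByteArrayOutputStream standing in for the Azure Data Lake stream (this assumes the stream returned by createFile can be used as a plain java.io.OutputStream). The class name RawCopyDemo, the copy helper, and the buffer size are hypothetical choices for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class RawCopyDemo {
    // Copies every byte from in to out without decoding it as text; returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192]; // buffer size is an arbitrary choice
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // "%PDF" followed by two bytes that are not valid text
        byte[] raw = {0x25, 0x50, 0x44, 0x46, (byte) 0xFF, (byte) 0xFE};
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for the remote stream
        long copied = copy(new ByteArrayInputStream(raw), sink);
        System.out.println("copied " + copied + " bytes, identical: "
                + Arrays.equals(raw, sink.toByteArray()));
    }
}
```

Comparing the copied byte count with the source length for each file is a quick way to see at which step the size starts to grow.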