Created 08-13-2017 09:20 PM
When I fetch 49 records at a time, 49 flow files are converted to ORC and go into the PutHDFS processor, but the result is one small file in HDFS. When I run select count(*), the external table contains only 1 record. No failed flow files come out of the PutHDFS processor.
I then added a MergeContent processor after the convert-to-ORC step and before PutHDFS. A bigger file of merged content is put into HDFS, but select count(*) on the external table then fails with the error below. I suspect the merged file is corrupted, but putting the original 49 files into HDFS doesn't appear to work properly either. The external table is simply defined with its fields, STORED AS ORC LOCATION 'hdfspath'. I tried the DDL with and without orc.compress='SNAPPY', and even removed all compression from every processor; the result was the same. The merged file throws the error below every time I select from the external table.
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1502140036873_0283_2_00, diagnostics=[Task failed, taskId=task_1502140036873_0283_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
Caused by: java.lang.RuntimeException: java.io.IOException: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).
Created 08-14-2017 06:12 PM
Are there any failures in the PutHDFS processor? Seems to me (unless the flowfiles have the same filename and Conflict Resolution Strategy is "append") that you should have 49 small flow files in HDFS (not that that's ideal).
You won't be able to use MergeContent with ORC files as there is no strategy for that (same goes for MergeRecord until an OrcRecordSetWriter is implemented). If your flow files are Avro (going into ConvertAvroToORC), you could try MergeContent before ConvertAvroToORC and use the Avro merge strategy.
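To illustrate why byte-level concatenation breaks a format like ORC, here is a minimal Python sketch. It uses a made-up toy format, not real ORC: the only thing it shares with ORC is that the metadata describing where the data lives sits at the end of the file. The function names and the layout are assumptions for illustration only, and the exact failure in Hive will look different (for example, the protobuf error above).

```python
import struct

def write_toy_file(rows):
    """Toy layout (NOT real ORC): [data][footer][1-byte footer length].
    The footer stores the absolute offset and length of the data, plus the row count."""
    data = b"".join(struct.pack("<I", r) for r in rows)
    footer = struct.pack("<III", 0, len(data), len(rows))
    return data + footer + bytes([len(footer)])

def read_toy_file(blob):
    """Read the footer from the end of the file, then follow its offsets."""
    footer_len = blob[-1]
    footer = blob[-1 - footer_len:-1]
    offset, length, count = struct.unpack("<III", footer)
    data = blob[offset:offset + length]
    return [struct.unpack_from("<I", data, i * 4)[0] for i in range(count)]

a = write_toy_file([7])
b = write_toy_file([4, 5, 6])

# Byte-level concatenation, which is roughly what a binary merge does.
# The reader only sees the LAST footer, whose offsets now point into the first file's bytes.
broken = read_toy_file(a + b)

# Merging at the record level first, then writing one file with one footer.
# This is analogous to merging Avro before ConvertAvroToORC.
merged = read_toy_file(write_toy_file(read_toy_file(a) + read_toy_file(b)))

print(broken)  # not the four original rows
print(merged)  # [7, 4, 5, 6]
```

The point is only that the merge has to happen while the data is still in a format that has a merge strategy (Avro here), so that a single valid footer is written once at the end.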
Created 08-14-2017 11:26 PM
I moved the MergeContent processor to the Avro side (before the conversion to ORC) and made sure the filenames are unique.
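For anyone wondering why unique filenames matter, here is a toy Python model of the effect. It is not NiFi or HDFS code: the dict stands in for an HDFS directory, the function name is hypothetical, and it assumes a conflict strategy that overwrites an existing file of the same name, which may or may not match the original configuration.

```python
def put_files(filenames):
    """Toy stand-in for writing flow files to one directory.
    Assumption: a later file with the same name overwrites the earlier one."""
    directory = {}
    for i, name in enumerate(filenames):
        directory[name] = f"record-{i}"  # one record per flow file
    return directory

# 49 flow files that all carry the same filename attribute
same_name = put_files(["data.orc"] * 49)

# 49 flow files with unique filenames
unique_names = put_files([f"data-{i}.orc" for i in range(49)])

print(len(same_name))     # 1
print(len(unique_names))  # 49
```

Under that assumption, 49 successful writes with a shared name leave one small file holding one record, with nothing routed to failure.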