We are using ConvertCSVToAvro and ConvertAvroToORC.
The clickstream TSV files contain " characters, and the ConvertCSVToAvro processor uses " as the default value of its "CSV quote Character" configuration property. As a result, many tab-separated fields end up merged into the same record. We got good output by changing this property to a character that does not appear anywhere in the input files. We used ¥.
So when you use a CSV-related processor, double-check that the contents don't contain the configured quote character.
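
To see the effect outside NiFi, here is a minimal sketch using Python's standard csv module. It is not the parser NiFi uses, but it shows the same failure mode. The sample rows and the parse and contains_quote_char helpers are made up for illustration.

```python
import csv
import io

# Hypothetical clickstream sample: tab-separated, with stray double quotes
# inside two of the fields (not real data).
SAMPLE = 'u1\t"home\tGET\nu2\tcart"\tPOST\nu3\tcheckout\tGET\n'


def parse(text, quotechar):
    """Parse tab-separated text with the given quote character."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar=quotechar)
    return list(reader)


def contains_quote_char(text, quotechar):
    """Return True if the quote character occurs anywhere in the input."""
    return quotechar in text


# With " as the quote character, the stray quote opens a quoted field that
# swallows tabs and a newline, so the first two lines collapse into one record.
merged = parse(SAMPLE, '"')

# With a quote character that never occurs in the data, every line stays
# its own record and the " characters are kept as ordinary text.
clean = parse(SAMPLE, "¥")

print(len(merged), len(clean))
print(contains_quote_char(SAMPLE, '"'), contains_quote_char(SAMPLE, "¥"))
```

Running a check like contains_quote_char over a sample of the input before picking the replacement character is a cheap way to confirm that the chosen character really is unused.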