Created 06-19-2018 04:01 PM
Hi, I am using nifi to load data to hive from sqlserver. I am using convertavrotoOrc to load data in orc format via puthdfs and puthiveql. Sqlserver datatypes of decimal are converted into binary datatype by processor and data is not readable. How I can correct this.
Created 06-19-2018 04:47 PM
ConvertAvroToORC is in the Hive bundle which uses Avro 1.7.7, which does not support logical types such as decimal. This is discussed in NIFI-5079, where it was decided to add support via a PutORC processor in the upcoming Hive 3 bundle (slated for NiFi 1.7.0 and HDF 3.2).
If you are using a version of NiFi prior to 1.6.0, then upgrading may help solve the original conversion issue when fetching from SQLServer (via NIFI-4846). However if you're converting to ORC you will still run into the issue above. A workaround might be to store the Avro directly to HDFS and put the Hive table atop the Avro data vs ORC data.
Created 06-19-2018 04:47 PM
ConvertAvroToORC is in the Hive bundle which uses Avro 1.7.7, which does not support logical types such as decimal. This is discussed in NIFI-5079, where it was decided to add support via a PutORC processor in the upcoming Hive 3 bundle (slated for NiFi 1.7.0 and HDF 3.2).
If you are using a version of NiFi prior to 1.6.0, then upgrading may help solve the original conversion issue when fetching from SQLServer (via NIFI-4846). However if you're converting to ORC you will still run into the issue above. A workaround might be to store the Avro directly to HDFS and put the Hive table atop the Avro data vs ORC data.
Created 06-19-2018 05:04 PM
if we follow your approach , we can then use INSERT INTO SELECT to load from Avro table to ORC table.? reason being in some cases like ACID in Hive only works in ORC format.
Created 06-19-2018 06:17 PM
Yes I think that will work. Also if you convert from decimal to a different type then you should be able to use PutHiveStreaming, although that isn't always as performant as it could be. In the upcoming Hive 3 bundle, there is a new Streaming API and PutHive3Streaming should be much faster (and Avro logical types are supported).
Created 06-19-2018 10:07 PM
Thanks @Matt Burgess and @Saikrishna Tarapareddy. I converted upstream decimal to string to get it correctly in incremental ACID mode into hive table.