Member since
01-20-2016
1
Post
0
Kudos Received
0
Solutions
02-28-2016
03:09 AM
Hi , Starting from Hive 0.14 and Avro 1.7.1 (CDH 5.x.x) support for DECIMAL datatype has been added. Please refer the jira below for details https://issues.apache.org/jira/browse/AVRO-1402 Following is the procedure to do the conversion. Change the schema as per your requirements, I am just outlining the way by which I did the transformation. Step1: This is my test table with a column "c" in double datatype show create table test1; +-------------------------------------------------------------------------+--+ | createtab_stmt | +-------------------------------------------------------------------------+--+ | CREATE TABLE `test1`( | | `a` int COMMENT 'from deserializer', | | `b` int COMMENT 'from deserializer', | | `c` double COMMENT 'from deserializer') | | ROW FORMAT SERDE | | 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' | | LOCATION | | 'hdfs://nightly54-1.vpc.cloudera.com:8020/user/hive/warehouse/test1' | | TBLPROPERTIES ( | | 'COLUMN_STATS_ACCURATE'='true', | | 'numFiles'='1', | | 'numRows'='1', | | 'rawDataSize'='0', | | 'totalSize'='306', | | 'transient_lastDdlTime'='1456655270') | +-------------------------------------------------------------------------+--+ Step2: The header in avro data files in test1 table looks as follows a. java -jar /usr/lib/avro/avro-tools-1.7.6-cdh5.4.10-SNAPSHOT.jar getschema 000000_0 { "type" : "record", "name" : "test1", "namespace" : "default", "fields" : [ { "name" : "a", "type" : [ "null", "int" ], "doc" : "\u0000\u0000", "default" : null }, { "name" : "b", "type" : [ "null", "int" ], "default" : null }, { "name" : "c", "type" : [ "null", "double" ], "default" : null } ] } Step3: For hive to read avro data with decimal datatype, the header in the avro datafile should look as follows { "type" : "record", "name" : "test1", "namespace" : "default", "fields" : [ { "name" : "a", "type" : [ "null", "int" ], "doc" : "\u0000\u0000", "default" : null }, { "name" : "b", "type" : [ "null", "int" ], "default" : null }, { "name" : "c", "type" : [ "null", { "type" : "bytes", "logicalType" : "decimal", "precision" : 15, "scale" : 3 } ], "default" : null } ] } Hive serde reads the actual data file's header to read data so when you create external table by referring the schema file WITH SERDEPROPERTIES ('avro.schema.url'='/path-to-decimal-datatype-schema-file') it won't workout and you will be receiving errors saying that Error: java.io.IOException: org.apache.avro.AvroTypeException: Found double, expecting union (state=,code=0) Step4: So we need to create another table with desired decimal datatype and import the content of original table to new temp table, so that data with appropriate header for decimal datatype gets created a. create table temp(a int,b int,c decimal(15,3))stored as avro; //* which will create schema in avro file as mentioned in step3 *// b. insert overwrite table temp as select * from test1; Now we have the data in decimal datatype in avro format and can be confirmed with describe statement and also by looking into the data stored. Thanks
... View more