05-18-2018 10:59 AM
I'm using NiFi 1.5. I don't have the CREATE statement for the source (it's an Oracle view), but the field in question is a TIMESTAMP(6) in the source DB. The target DB is Impala; the CREATE TABLE for that is:

create table test (
  id STRING,
  my_value DOUBLE,
  time_inserted TIMESTAMP,
  time_updated TIMESTAMP
)
STORED AS PARQUET
LOCATION '/blah/test';

As the Avro input fields are in uppercase, the PutDatabaseRecord processor is using Translate Field Names, and I've also set it to quote both column and table identifiers. Unmatched fields/columns are set to ignore. I'll have a try later with 1.6 and see if that solves it; if not, I'll have a go at a 1.7 build. In the meantime I've split it out into two separate QueryDatabaseTable processors, one for null and one for not null (luckily time_updated is the only nullable timestamp!). Thanks, Toby
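Edit: for reference, the split amounts to the two processors issuing queries like the ones below (a sketch only; I'm assuming the "Additional WHERE clause" property on QueryDatabaseTable, and blah_view stands in for the real Oracle view name):

-- Processor 1: rows where the nullable timestamp is populated
-- (blah_view is a placeholder for the actual source view)
SELECT * FROM blah_view WHERE TIME_UPDATED IS NOT NULL

-- Processor 2: rows where it is null
SELECT * FROM blah_view WHERE TIME_UPDATED IS NULL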
05-17-2018 04:18 PM
I'm attempting to write to Impala using PutDatabaseRecord. Everything works fine until I hit a null timestamp, at which point it tries to cast the value to a CHAR(19) instead of a timestamp. The data input is from an Oracle DB via QueryDatabaseTable, which generates the following embedded Avro schema:

{"type":"record","name":"BLAH","namespace":"TEST","fields":[{"name":"ID","type":["null","string"]},{"name":"MY_VALUE","type":["null",{"type":"bytes","logicalType":"decimal","precision":15,"scale":4}]},{"name":"TIME_INSERTED","type":["null",{"type":"long","logicalType":"timestamp-millis"}]},{"name":"TIME_UPDATED","type":["null",{"type":"long","logicalType":"timestamp-millis"}]}]}

The actual data is then:

[ {
  "ID" : "71416037",
  "MY_VALUE" : 58.75,
  "TIME_INSERTED" : "2018-05-11T22:17:44.000Z",
  "TIME_UPDATED" : null
}, {
  "ID" : "71416525",
  "MY_VALUE" : 267.5,
  "TIME_INSERTED" : "2018-05-11T22:18:25.000Z",
  "TIME_UPDATED" : "2018-05-15T22:09:37.385Z"
} ]

Taken in isolation, the second entry (with no null values) works fine, and so does the first entry (with the null), but sending both together fails. Looking at the generated SQL, it appears the timestamp is being cast to a CHAR(19) rather than a TIMESTAMP. What's a little confusing is that the error talks about the expression null being of type CHAR(19), even though there's no cast around the null value:

Target table 'test' is incompatible with source expressions.
Expression 'null' (type: CHAR(19)) is not compatible with column 'time_updated' (type: TIMESTAMP)
), Query: INSERT INTO `test`(`id`, `my_value`, `time_inserted`, `time_updated`) VALUES (CAST('71416037' AS CHAR(8)), 58.75, '2018-05-11 18:17:44.0', null), (CAST('71416525' AS CHAR(8)), 267.5, '2018-05-11 18:18:25.0', CAST('2018-05-15 18:09:37' AS CHAR(19)))

Any ideas? It seems strange that either entry in isolation is fine but both together are not.
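Edit: for comparison, a hand-written version of the same insert with explicit TIMESTAMP casts should be accepted (a sketch using the values above; I haven't verified this is exactly what PutDatabaseRecord ought to generate):

-- Hand-written sketch: cast the literals to TIMESTAMP instead of CHAR(19)
INSERT INTO `test` (`id`, `my_value`, `time_inserted`, `time_updated`)
VALUES ('71416037', 58.75, CAST('2018-05-11 18:17:44' AS TIMESTAMP), NULL),
       ('71416525', 267.5, CAST('2018-05-11 18:18:25' AS TIMESTAMP),
        CAST('2018-05-15 18:09:37' AS TIMESTAMP));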
Labels: Apache NiFi
05-15-2018 03:51 PM
I think I've found the issue: https://issues.apache.org/jira/browse/IMPALA-2494. Other than upgrading Impala to 2.11, or trying to hack PutParquet to use fixed_len_byte_array rather than binary (is that viable?), are there any ways to work around this? Edit: I'm thinking I can maybe map the Avro decimal values to the fixed type with some schema conversion between the QueryDatabaseTable and PutParquet processors, along the lines of the sketch below. Would love to hear any other suggestions though.
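For illustration, the converted SPREAD field might end up looking something like this (a sketch only: the name SPREAD_fixed is made up, and size 5 is the smallest that holds a signed unscaled value at precision 10):

{"name":"SPREAD","type":["null",{"type":"fixed","name":"SPREAD_fixed","size":5,"logicalType":"decimal","precision":10,"scale":0}]}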
05-15-2018 02:31 PM
I'm using avro-tools 1.8.2; Impala is on 2.10.0-cdh5.13.1. I started checking the data because I was getting an error when querying the Impala table, which was created from the Parquet file produced by PutParquet. I've attached both the Avro file from QueryDatabaseTable and the Parquet file produced by PutParquet. Below is the code I used to create and then query the table (paths abbreviated for conciseness):

create external table test2 LIKE PARQUET '..../test/577533060432928'
STORED AS PARQUET
LOCATION '..../test';

select * from test2;

WARNINGS: File 'hdfs://nameservice1/.../test/577533060432928' has an incompatible Parquet schema for column 'test2.spread'. Column type: DECIMAL(10,0), Parquet schema: optional byte_array SPREAD [i:1 d:1 r:0]

avro-parquet.zip
05-15-2018 02:03 PM
I'm attempting to use NiFi to take data from an Oracle database and write it as Parquet files to HDFS, from which I'll then build tables for use in Impala. I've set up a QueryDatabaseTable processor and linked it to PutParquet. By default everything works, but all the values are strings rather than dates, ints, doubles, etc. After some quick research I realised I needed to set "Use Avro Logical Types" to true in QueryDatabaseTable, so I did. However, when I look at the Avro output from a simple test query, it doesn't look right. Running it through avro-tools tojson I get:

{"TRADEID":{"string":"71416037"},"SPREAD":{"bytes":";"}}

Why would bytes be showing ';'? I'm expecting a value of 58.75 here. The schema looks OK:

{
  "type" : "record",
  "name" : "CDS_CTC",
  "namespace" : "VOL_DATA",
  "fields" : [ {
    "name" : "TRADEID",
    "type" : [ "null", "string" ]
  }, {
    "name" : "SPREAD",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 10,
      "scale" : 0
    } ]
  } ]
}

and without using Avro logical types I get the values I'd expect:

{"TRADEID":{"string":"71416037"},"SPREAD":{"string":"58.75"}}

Any suggestions please?
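Edit: I suspect the ';' is actually the correct encoding. Avro's decimal logical type stores the unscaled value as big-endian two's-complement bytes, and avro-tools tojson prints those raw bytes as characters; byte 0x3B is 59, which looks like 58.75 rounded to fit the schema's scale of 0. A quick sanity check of the byte value, using Impala's built-in ascii() function:

-- ascii() returns the code of the first character of its argument
SELECT ascii(';');  -- 59: the unscaled decimal, i.e. 58.75 rounded at scale 0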
Labels: Apache NiFi