Explorer

pyodbc kudu number type error

Hey all!

 

Sorry if this is double posted.  It disappeared from the forum after an edit.

 

CDH 5.10

 

I realize that Kudu and pyodbc are not a supported combination, but I'm wondering if someone has run into this before. I'm using pyodbc to insert rows into a Kudu table. The DDL, the pyodbc call, and the error are below.

 

The issue: when the primary key column ("id") is type INT, the insert fails because pyodbc binds the value as BIGINT and Impala rejects the implicit narrowing cast back to INT.

 

If I define the "id" column as BIGINT, everything works. The inserted "id" values will never approach BIGINT range, so I would prefer not to use that type.

 

Is there something wrong below, or is there a way to control the conversion that pyodbc attempts?

 

DDL**************

 

CREATE TABLE genomics.pipeline_status(
id INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
experiment_id INT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,
PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 3
STORED AS KUDU
TBLPROPERTIES ('kudu.master_addresses'='<server>.choa.org')
;

 

PYODBC CALL**************

import pyodbc

# id, experiment_id, logger, and logPath are defined earlier in the script
try:
    # kerberos ticket already acquired
    conn = pyodbc.connect('DSN=IMPALA_DEV', autocommit=True)
    with conn.cursor() as cur:
        cur.execute("insert into genomics.pipeline_status (id, experiment_id) values (?,?)",
                    (int(id), int(experiment_id)))
except pyodbc.Error as e:
    logger.critical('getting sync rundb results FAILED: ' + str(e) + '\n' + logPath)
finally:
    logger.info('end inserting rows')

 

ERROR**************

getting sync rundb results FAILED: ('HY000', "[HY000] [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] : AnalysisException: Possible loss of precision for target table 'genomics.pipeline_status'.\nExpression 'cast(5 as bigint)' (type: BIGINT) would need to be cast to INT for column 'id'\n (110) (SQLExecDirectW)")
/usr/local/scripts/genomics/logs/syncRunDbResults.py-2017_09_05_17_10_15.log
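
A quick sanity check (untested here) would be to run the same insert with the integer values inlined into the SQL text instead of bound as ? placeholders, so the ODBC driver never gets to pick a parameter type. If that succeeds, the table definition itself is fine and the problem is purely in how pyodbc types the placeholders:

import pyodbc

# Untested sketch: inline trusted integer values so no ODBC parameter
# binding (and therefore no BIGINT typing) is involved.
# id / experiment_id are the same Python ints as in the script above.
conn = pyodbc.connect('DSN=IMPALA_DEV', autocommit=True)
with conn.cursor() as cur:
    cur.execute("insert into genomics.pipeline_status (id, experiment_id) "
                "values (%d, %d)" % (int(id), int(experiment_id)))
conn.close()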
Cloudera Employee

Re: pyodbc kudu number type error

It seems like this is a PyODBC-specific issue - see https://github.com/mkleehammer/pyodbc/issues/213, where people are asking for input on how to explicitly specify the size of an int placeholder column.

Maybe there is some workaround such as using:

insert into genomics.pipeline_status (id, experiment_id) values (cast(? as int), cast(? as int))

... though I have not tried that.
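
Applied to the pyodbc call above, that would look roughly like the sketch below (also untested; id and experiment_id are the same variables as in the original script):

import pyodbc

# Untested sketch of the suggested workaround: keep binding the values as
# Python ints, but ask Impala to cast the (BIGINT-bound) parameters to INT.
conn = pyodbc.connect('DSN=IMPALA_DEV', autocommit=True)
with conn.cursor() as cur:
    cur.execute("insert into genomics.pipeline_status (id, experiment_id) "
                "values (cast(? as int), cast(? as int))",
                (int(id), int(experiment_id)))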

-Todd
Explorer

Re: pyodbc kudu number type error

Thanks for responding with that URL, Todd.

 

As the issue describes, this is known behavior and they appear to be working on a fix. I tried a few variations of casting the value as INT, but pyodbc assigns a precision under the hood and Impala chokes on it.

 

CAST(? as int)

becomes

CAST(CAST(? AS BIGINT) AS INT(3))

which throws this error when passed to Impala:

 

 

'HY000', '[HY000] [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] : AnalysisException: Syntax error in line 1:\n...(CAST(16 AS BIGINT) AS int(3)), CAST(CAST(94 AS BIGINT...\n                             ^\nEncountered: (\nExpected: BLOCK_SIZE, COMMENT, COMPRESSION, DEFAULT, ENCODING, INTERMEDIATE, LOCATION, NOT, NULL, PRIMARY, COMMA\n\nCAUSED BY: Exception: Syntax error\n (110) (SQLExecDirectW)')

 

If it did not assign a precision, the statement would work:

CAST(CAST(? AS BIGINT) AS INT)
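
(Footnote for anyone finding this later: newer pyodbc releases implement Cursor.setinputsizes(), which lets the SQL type of each placeholder be declared explicitly. Whether the Cloudera Impala ODBC driver honors the declared type is untested here, but the call would look roughly like this:)

import pyodbc

# Untested sketch: requires a pyodbc version where setinputsizes() is
# actually implemented (older releases ignored it) and an ODBC driver
# that respects the declared type. id / experiment_id as in the script above.
conn = pyodbc.connect('DSN=IMPALA_DEV', autocommit=True)
with conn.cursor() as cur:
    # declare both placeholders as 32-bit SQL_INTEGER instead of SQL_BIGINT
    cur.setinputsizes([(pyodbc.SQL_INTEGER, 0, 0), (pyodbc.SQL_INTEGER, 0, 0)])
    cur.execute("insert into genomics.pipeline_status (id, experiment_id) values (?,?)",
                (int(id), int(experiment_id)))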

 

Anyway, while BIGINT is not optimal, it is acceptable and I'm ready to move on.

 

Thanks again

TD
