New Contributor
Posts: 1
Registered: ‎06-20-2017

sql function

Hi I am new here, and would like to get your help.


The data I have contains 10 diagnoses codes in a same row -- dx1, dx2,.... dx10. It is so tidious to use in SQL, how I use a string that contains all 10 dx in the SQL logic? Below is what I usually have to write


select patientid, dx1, dx2, dx3, dx4, dx5, dx6, dx7, dx8, dx9, dx10

from patient_visits

where dx1 in ('XXXXX','YYYYY','ZZZZZ') or 

           dx2 in ('XXXXX','YYYYY','ZZZZZ') or

           dx3 in ('XXXXX','YYYYY','ZZZZZ') or


           dx10 in ('XXXXX','YYYYY','ZZZZZ');


I would like to get something like the following


Select patientd, dxstring,

from patient_visits

where dxstring in 'XXXXX','YYYYY','ZZZZZ');




Cloudera Employee
Posts: 116
Registered: ‎03-23-2015

Re: sql function

I believe you can use the Map data type for your table.

CREATE TABLE test (patientid int, dx_data map<string, string>);

However, you might still need to say:

dx_data['dx1'] in ('XXXX', 'YYYY','ZZZZ') OR
dx_data['dx2'] in .... OR

There is a custom UDF that can hep you to find intersection of two arrays, so you might be able to use it and then:

SELECT patientid, dx_data FROM test WHERE
array_intersect(map_values(dx_data), array('XXXXX','YYYYY','ZZZZZ'))
) > 0;

I am not sure about performance though, you might need to test out yourself.