Support Questions

roshanbi · ‎06-25-2021

Hello Team,

can you please advise if there is an equivalent of the function on Kudu below to extract address and location from a table with XML data type?

select TMP_ACCOUNT_CODE_N,decode(EXTRACTVALUE (address_x, '//ADDRESS_DTLS/@STREET_DESC'),'.',null,EXTRACTVALUE (address_x, '//ADDRESS_DTLS/@STREET_DESC'))||' '||EXTRACTVALUE (address_x, '//ADDRESS_DTLS/@SUB_LOCALITY_DESC')
||' '||EXTRACTVALUE (address_x, '//ADDRESS_DTLS/@CITY_DESC')
New_installation_address from tmp_address_xml@cbsstandby
where address_type_n = 4

Regards,

Roshan

aakulov · ‎07-08-2021

Hi @roshanbi ,

You can simply put your sqoop command in a shell file and then have Cron run that script according to your desired schedule (depending on how frequently you expect the data to be updated in the Oracle source table).

Alternatively, you can use Oozie to orchestrate the running of the sqoop job.

Regards,

Alex

View solution in original post

roshanbi · ‎06-25-2021

Which methods do you think would be most appropriate to use? I was thinking of using beehive to read the table/run query in real time and save it in parquet. The load from parqet to Kudu?

or using Sqoop to read Oracle tables into HDFS. Then from HDFS to Kudu.

aakulov · ‎06-26-2021

Hi @roshanbi,

There is no equivalent of oracle's EXTRACTVALUE in Kudu or Impala SQL. XML is not one of the supported formats.

I think your approach of running the SQL with EXTRACTVALUE as part of the sqoop job is a good one. That way you are writing the fields you need into target Hive table and from there insert into Kudu. This would be the fastest path and would work if you are not planning to get more fields from your source XML table in the future.

An alternative approach is to use Hive with a 3rd party SerDe for XML. Then let Hive do the conversion from XML format to something like Avro, and then create your Impala table based on that Avro data. This way is more involved, but gives you flexibility in terms of the schema.

Hope that helps,

Alex

roshanbi · ‎06-26-2021

Hi @aakulov

thanks for the update.

Can you please advise how can I schedule this scoop job so that it will update the hive table with incremental changes(CDC)? for example, suppose the XML fields are updated on Oracle, how can I schedule scoop job to replicate the incremental changes on Hive and Kudu?

Regards,

Roshan

aakulov · ‎07-08-2021

Hi @roshanbi ,

You can simply put your sqoop command in a shell file and then have Cron run that script according to your desired schedule (depending on how frequently you expect the data to be updated in the Oracle source table).

Alternatively, you can use Oozie to orchestrate the running of the sqoop job.

Regards,

Alex

Cloudera Community

Support Questions

Impala query