Impala query
Labels: Apache Kudu
Created 06-25-2021 04:35 AM
Hello Team,
Can you please advise whether Kudu has an equivalent of the Oracle function below, used to extract the address and location from a table with an XML data type?

select TMP_ACCOUNT_CODE_N,
       decode(EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@STREET_DESC'), '.', null,
              EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@STREET_DESC'))
       || ' ' || EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@SUB_LOCALITY_DESC')
       || ' ' || EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@CITY_DESC') as New_installation_address
from tmp_address_xml@cbsstandby
where address_type_n = 4;
Regards,
Roshan
Created on 06-25-2021 06:53 AM - edited 06-25-2021 07:21 AM
Which method do you think would be most appropriate? I was thinking of using Hive to read the table/run the query in real time and save the result as Parquet, then load from Parquet into Kudu.
Or using Sqoop to read the Oracle tables into HDFS, then from HDFS into Kudu.
Created 06-26-2021 07:03 AM
Hi @roshanbi,
There is no equivalent of Oracle's EXTRACTVALUE in Kudu or Impala SQL; XML is not one of the supported formats.
I think your approach of running the SQL with EXTRACTVALUE as part of the sqoop job is a good one. That way you write only the fields you need into the target Hive table, and from there you can insert them into Kudu. This would be the fastest path, and it works well if you are not planning to pull more fields from your source XML table in the future.
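A minimal sketch of what that sqoop job could look like (the Oracle host, credentials, and the staging/Kudu table names are placeholders; the $CONDITIONS token is required by sqoop free-form queries, and Oracle evaluates EXTRACTVALUE on its side before the rows are shipped):

sqoop import \
  --connect jdbc:oracle:thin:@//oracle-host:1521/CBSSTANDBY \
  --username cbs_user --password-file /user/etl/.oracle_pw \
  --query "SELECT tmp_account_code_n,
                  EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@STREET_DESC') || ' ' ||
                  EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@SUB_LOCALITY_DESC') || ' ' ||
                  EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@CITY_DESC') AS new_installation_address
           FROM tmp_address_xml
           WHERE address_type_n = 4 AND \$CONDITIONS" \
  --split-by tmp_account_code_n \
  --target-dir /staging/tmp_address \
  --hive-import --hive-table staging.tmp_address

# Then populate the Kudu table through Impala (kudu_address is a placeholder):
impala-shell -q "INSERT INTO kudu_address SELECT * FROM staging.tmp_address"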
An alternative approach is to use Hive with a third-party SerDe for XML: let Hive convert the XML into a format such as Avro, and then create your Impala table on top of that Avro data. This way is more involved, but it gives you flexibility in terms of the schema.
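For illustration, a rough sketch using the open-source hivexmlserde library (the SerDe class is com.ibm.spss.hive.serde2.xml.XmlSerDe; the HDFS location, XPath expressions, and the xmlinput start/end markers are assumptions that would need to match your actual XML layout):

cat > address_xml.hql <<'EOF'
-- External table over the raw XML; each ADDRESS_DTLS element becomes one row
CREATE EXTERNAL TABLE staging.address_xml (
  street       STRING,
  sub_locality STRING,
  city         STRING
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.street"       = "/ADDRESS_DTLS/@STREET_DESC",
  "column.xpath.sub_locality" = "/ADDRESS_DTLS/@SUB_LOCALITY_DESC",
  "column.xpath.city"         = "/ADDRESS_DTLS/@CITY_DESC"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/staging/address_xml_raw'
-- adjust the start/end markers to the actual element markup in your data
TBLPROPERTIES ("xmlinput.start" = "<ADDRESS_DTLS", "xmlinput.end" = "</ADDRESS_DTLS>");

-- Materialize as Avro so Impala can create a table on top of it
CREATE TABLE staging.address_avro STORED AS AVRO
AS SELECT street, sub_locality, city FROM staging.address_xml;
EOF
hive -f address_xml.hql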
Hope that helps,
Alex
Created 06-26-2021 07:18 AM
Hi @aakulov,
Thanks for the update.
Can you please advise how I can schedule this sqoop job so that it updates the Hive table with incremental changes (CDC)? For example, if the XML fields are updated in Oracle, how can I schedule the sqoop job to replicate those incremental changes to Hive and Kudu?
Regards,
Roshan
Created 07-08-2021 02:24 PM
Hi @roshanbi ,
You can simply put your sqoop command in a shell file and then have Cron run that script according to your desired schedule (depending on how frequently you expect the data to be updated in the Oracle source table).
Alternatively, you can use Oozie to orchestrate the running of the sqoop job.
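A minimal sketch of that setup, assuming the source table has an audit column such as LAST_UPD_DATE to drive the incremental import (connection details, paths, table names, and the schedule are all placeholders):

# 1) One-time setup: a saved sqoop job in 'lastmodified' incremental mode.
#    sqoop stores the last-imported value in its metastore, so each run picks up
#    only changed rows; --merge-key replaces re-imported rows with the same key.
sqoop job --create address_sync -- import \
  --connect jdbc:oracle:thin:@//oracle-host:1521/CBSSTANDBY \
  --username cbs_user --password-file /user/etl/.oracle_pw \
  --table TMP_ADDRESS_XML \
  --incremental lastmodified --check-column LAST_UPD_DATE \
  --merge-key TMP_ACCOUNT_CODE_N \
  --target-dir /staging/tmp_address_xml

# 2) /opt/etl/sync_address.sh -- the shell script cron runs
#!/bin/bash
set -euo pipefail
sqoop job --exec address_sync
# assumes staging.tmp_address_xml is an external Hive table over the --target-dir
# above, and kudu_address is the (placeholder) Kudu table in Impala
impala-shell -q "REFRESH staging.tmp_address_xml"
impala-shell -q "UPSERT INTO kudu_address SELECT * FROM staging.tmp_address_xml"

# 3) crontab entry: run hourly at minute 15, appending output to a log
15 * * * * /opt/etl/sync_address.sh >> /var/log/etl/sync_address.log 2>&1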
Regards,
Alex
