Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

CROSS APPLY SQL Server query on Hive

Highlighted

CROSS APPLY SQL Server query on Hive

Expert Contributor

HDP-2.5.0.0 using Ambari 2.4.0.1

The Hive table ReportSetting is as follows :

id int

serializedreportsetting String

The column 'serializedreportsetting' is an XML data type in the source SQL Server db but is converted to String during Sqoop import, this is how it looks in SQL Server :

<ReportSettings4 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Trigger>
  <Manual>true</Manual>
  </Trigger>
<StartTime>
    <Year>8</Year>
    <Month>1</Month>
    <Day>1</Day>
    <Hour>0</Hour>
    <Minute>0</Minute>
  </StartTime>
  <ReportPeriod>
    <Month>0</Month>
    <Day>0</Day>
    <Hour>0</Hour>
    <Minute>5</Minute>
  </ReportPeriod>
  <Theft>
    <DigitalInput>true</DigitalInput>
    <Can>false</Can>
  </Theft>
  <SequenceNo>0</SequenceNo>
</ReportSettings4>

In Hive table :

<ReportSettings4 xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Trigger><Manual>true</Manual></Trigger><StartTime><Year>8</Year><Month>12</Month><Day>31</Day><Hour>23</Hour><Minute>34</Minute></StartTime><ReportPeriod><Month>0</Month><Day>0</Day><Hour>4</Hour><Minute>0</Minute></ReportPeriod><Theft><DigitalInput>false</DigitalInput><Can>false</Can></Theft><SequenceNo>3</SequenceNo></ReportSettings4>

The query that works fine on the SQL Server :

SELECT
r.VehicleId
,rs.value('(Trigger/Manual)[1]', 'bit') AS RS_Trigger_Manual, ,CAST(CONCAT(CASE WHEN rs.value('(StartTime/Year)[1]', 'int') < 10 THEN CONCAT('200',rs.value('(StartTime/Year)[1]', 'int')) ELSE CONCAT('20',rs.value('(StartTime/Year)[1]', 'int')) END,'-',rs.value('(StartTime/Month)[1]', 'int'),'-',rs.value('(StartTime/Day)[1]', 'int'),' ',rs.value('(StartTime/Hour)[1]', 'int'),':',rs.value('(StartTime/Minute)[1]', 'int'),':','00.000') AS datetime) AS RS_StartTime
,rs.value('(ReportPeriod/Month)[1]', 'int') AS RS_ReportPeriod_Month
,rs.value('(ReportPeriod/Day)[1]', 'int') AS RS_ReportPeriod_Day
,rs.value('(ReportPeriod/Hour)[1]', 'int') AS RS_ReportPeriod_Hour
,rs.value('(ReportPeriod/Minute)[1]', 'int') AS RS_ReportPeriod_Minute
,rs.value('(Theft/DigitalInput)[1]', 'bit') AS RS_Theft_DigitalInput
,rs.value('(Theft/Can)[1]', 'bit') AS RS_Theft_Can,rs.value('(SequenceNo)[1]', 'int') 

AS RS_SequenceNo FROM ReportSetting r
  CROSS APPLY SerializedReportSetting.nodes('/*') AS ReportSettings(rs)

I could think of/do the following :

  1. To use the CROSS APPLY, I guess lateral view needs to be used, here I don't have the serializedreportsetting as an array, so explode() won't work. Can someone validate if I'm thinking in the right direction
  2. I simply tried to get the data in serializedreportsetting as columns using the built-in xpath udf, however, I don't get any records, few trials are as follows :
select xpath(SerializedReportSetting,'/*') from ReportSetting limit 1;
+------+--+
| _c0  |
+------+--+
| []   |
+------+--+select xpath(SerializedReportSetting,'/ReportSettings4') from ReportSetting limit 1;
+------+--+
| _c0  |
+------+--+
| []   |
+------+--+select xpath(SerializedReportSetting,'/Trigger/Manual') from ReportSetting limit 1;
+------+--+
| _c0  |
+------+--+
| []   |
+------+--+
Don't have an account?
Coming from Hortonworks? Activate your account here