Support Questions
Find answers, ask questions, and share your expertise

CROSS APPLY SQL Server query on Hive

Expert Contributor

HDP-2.5.0.0 using Ambari 2.4.0.1

The Hive table ReportSetting is as follows :

id int

serializedreportsetting String

The column 'serializedreportsetting' is an XML data type in the source SQL Server db but is converted to String during Sqoop import, this is how it looks in SQL Server :

<ReportSettings4 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Trigger>
  <Manual>true</Manual>
  </Trigger>
<StartTime>
    <Year>8</Year>
    <Month>1</Month>
    <Day>1</Day>
    <Hour>0</Hour>
    <Minute>0</Minute>
  </StartTime>
  <ReportPeriod>
    <Month>0</Month>
    <Day>0</Day>
    <Hour>0</Hour>
    <Minute>5</Minute>
  </ReportPeriod>
  <Theft>
    <DigitalInput>true</DigitalInput>
    <Can>false</Can>
  </Theft>
  <SequenceNo>0</SequenceNo>
</ReportSettings4>

In Hive table :

<ReportSettings4 xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Trigger><Manual>true</Manual></Trigger><StartTime><Year>8</Year><Month>12</Month><Day>31</Day><Hour>23</Hour><Minute>34</Minute></StartTime><ReportPeriod><Month>0</Month><Day>0</Day><Hour>4</Hour><Minute>0</Minute></ReportPeriod><Theft><DigitalInput>false</DigitalInput><Can>false</Can></Theft><SequenceNo>3</SequenceNo></ReportSettings4>

The query that works fine on the SQL Server :

SELECT
r.VehicleId
,rs.value('(Trigger/Manual)[1]', 'bit') AS RS_Trigger_Manual, ,CAST(CONCAT(CASE WHEN rs.value('(StartTime/Year)[1]', 'int') < 10 THEN CONCAT('200',rs.value('(StartTime/Year)[1]', 'int')) ELSE CONCAT('20',rs.value('(StartTime/Year)[1]', 'int')) END,'-',rs.value('(StartTime/Month)[1]', 'int'),'-',rs.value('(StartTime/Day)[1]', 'int'),' ',rs.value('(StartTime/Hour)[1]', 'int'),':',rs.value('(StartTime/Minute)[1]', 'int'),':','00.000') AS datetime) AS RS_StartTime
,rs.value('(ReportPeriod/Month)[1]', 'int') AS RS_ReportPeriod_Month
,rs.value('(ReportPeriod/Day)[1]', 'int') AS RS_ReportPeriod_Day
,rs.value('(ReportPeriod/Hour)[1]', 'int') AS RS_ReportPeriod_Hour
,rs.value('(ReportPeriod/Minute)[1]', 'int') AS RS_ReportPeriod_Minute
,rs.value('(Theft/DigitalInput)[1]', 'bit') AS RS_Theft_DigitalInput
,rs.value('(Theft/Can)[1]', 'bit') AS RS_Theft_Can,rs.value('(SequenceNo)[1]', 'int') 

AS RS_SequenceNo FROM ReportSetting r
  CROSS APPLY SerializedReportSetting.nodes('/*') AS ReportSettings(rs)

I could think of/do the following :

  1. To use the CROSS APPLY, I guess lateral view needs to be used, here I don't have the serializedreportsetting as an array, so explode() won't work. Can someone validate if I'm thinking in the right direction
  2. I simply tried to get the data in serializedreportsetting as columns using the built-in xpath udf, however, I don't get any records, few trials are as follows :
select xpath(SerializedReportSetting,'/*') from ReportSetting limit 1;
+------+--+
| _c0  |
+------+--+
| []   |
+------+--+select xpath(SerializedReportSetting,'/ReportSettings4') from ReportSetting limit 1;
+------+--+
| _c0  |
+------+--+
| []   |
+------+--+select xpath(SerializedReportSetting,'/Trigger/Manual') from ReportSetting limit 1;
+------+--+
| _c0  |
+------+--+
| []   |
+------+--+