<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive and XML parsing and saving table as ORC or textFile in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-XML-parsing-and-saving-table-as-ORC-or-textFile/m-p/292809#M216329</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/76146"&gt;@saaga119&lt;/a&gt;&amp;nbsp;The solution here is to create the external table in the format you need to query the raw xml data. &amp;nbsp;Seems like you have this done already. &amp;nbsp; Next create a native hive table. &amp;nbsp;It could still be external too. The new table should be the schema for the fields you want with required ORC lines. &amp;nbsp;You can also choose some other text format; for example: csv. &amp;nbsp; Last INSERT INTO new_table SELECT * FROM external_table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The main idea here is to use external and staging tables, then INSERT INTO SELECT to fill the final table you want. &amp;nbsp;Final table could then be optimized for performance (ORC). &amp;nbsp; This kind of idea also allows multiple external/raw/staging tables able to be combined (select w/ join) into a single final optimized table (orc, compression, partitions, buckets, etc).&lt;/P&gt;</description>
    <pubDate>Sun, 29 Mar 2020 12:58:03 GMT</pubDate>
    <dc:creator>stevenmatison</dc:creator>
    <dc:date>2020-03-29T12:58:03Z</dc:date>
    <item>
      <title>Hive and XML parsing and saving table as ORC or textFile</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-XML-parsing-and-saving-table-as-ORC-or-textFile/m-p/292808#M216328</link>
      <description>&lt;P&gt;For below sample xml file:&lt;/P&gt;&lt;P&gt;&amp;lt;Company&amp;gt;&lt;BR /&gt;&amp;lt;Employee&amp;gt;&lt;BR /&gt;&amp;lt;Id&amp;gt;458790&amp;lt;/Id&amp;gt;&lt;BR /&gt;&amp;lt;Name&amp;gt;Sameer&amp;lt;/Name&amp;gt;&lt;BR /&gt;&amp;lt;Email&amp;gt;sameer@email.com&amp;lt;/Email&amp;gt;&lt;BR /&gt;&amp;lt;Address&amp;gt;&lt;BR /&gt;&amp;lt;HouseNo&amp;gt;105&amp;lt;/HouseNo&amp;gt;&lt;BR /&gt;&amp;lt;Street&amp;gt;Grand Road&amp;lt;/Street&amp;gt;&lt;BR /&gt;&amp;lt;City&amp;gt;Bangalore&amp;lt;/City&amp;gt;&lt;BR /&gt;&amp;lt;State&amp;gt;Karnataka&amp;lt;/State&amp;gt;&lt;BR /&gt;&amp;lt;Pincode&amp;gt;560068&amp;lt;/Pincode&amp;gt;&lt;BR /&gt;&amp;lt;Country&amp;gt;India&amp;lt;/Country&amp;gt;&lt;BR /&gt;&amp;lt;Passport&amp;gt;Available&amp;lt;/Passport&amp;gt;&lt;BR /&gt;&amp;lt;Visa/&amp;gt;&lt;BR /&gt;&amp;lt;Contact&amp;gt;&lt;BR /&gt;&amp;lt;Mobile&amp;gt;9909999999&amp;lt;/Mobile&amp;gt;&lt;BR /&gt;&amp;lt;Phone&amp;gt;8044552266&amp;lt;/Phone&amp;gt;&lt;BR /&gt;&amp;lt;/Contact&amp;gt;&lt;BR /&gt;&amp;lt;/Address&amp;gt;&lt;BR /&gt;&amp;lt;/Employee&amp;gt;&lt;BR /&gt;&amp;lt;Employee&amp;gt;&lt;BR /&gt;&amp;lt;Id&amp;gt;458791&amp;lt;/Id&amp;gt;&lt;BR /&gt;&amp;lt;Name&amp;gt;Gohar&amp;lt;/Name&amp;gt;&lt;BR /&gt;&amp;lt;Email&amp;gt;Gohar@email.com&amp;lt;/Email&amp;gt;&lt;BR /&gt;&amp;lt;Address&amp;gt;&lt;BR /&gt;&amp;lt;HouseNo&amp;gt;485&amp;lt;/HouseNo&amp;gt;&lt;BR /&gt;&amp;lt;Street&amp;gt;Camac Street Road&amp;lt;/Street&amp;gt;&lt;BR /&gt;&amp;lt;City&amp;gt;Mumbai&amp;lt;/City&amp;gt;&lt;BR /&gt;&amp;lt;State&amp;gt;Maharastra&amp;lt;/State&amp;gt;&lt;BR /&gt;&amp;lt;Pincode&amp;gt;400001&amp;lt;/Pincode&amp;gt;&lt;BR /&gt;&amp;lt;Country&amp;gt;India&amp;lt;/Country&amp;gt;&lt;BR /&gt;&amp;lt;Passport&amp;gt;Available&amp;lt;/Passport&amp;gt;&lt;BR /&gt;&amp;lt;Visa/&amp;gt;&lt;BR /&gt;&amp;lt;Contact&amp;gt;&lt;BR /&gt;&amp;lt;Mobile&amp;gt;9908888888&amp;lt;/Mobile&amp;gt;&lt;BR /&gt;&amp;lt;Phone/&amp;gt;&lt;BR /&gt;&amp;lt;/Contact&amp;gt;&lt;BR /&gt;&amp;lt;/Address&amp;gt;&lt;BR /&gt;&amp;lt;/Employee&amp;gt;&lt;BR /&gt;&amp;lt;/Company&amp;gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Executed this query for a table containing three columns&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;id name email&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Getting all employees information from xml into different rows like:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;458790 Sameer sameer@email.com&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;458791 Gohar Gohar@email.com&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;by executing below query&lt;BR /&gt;CREATE EXTERNAL TABLE Company (Id STRING, Name String,Email String)&lt;BR /&gt;ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'&lt;BR /&gt;WITH SERDEPROPERTIES (&lt;BR /&gt;"column.xpath.Id"="Employee/Id/text()",&lt;BR /&gt;"column.xpath.Name"="Employee/Name/text()",&lt;BR /&gt;"column.xpath.Email"="Employee/Email/text()")&lt;BR /&gt;STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'&lt;BR /&gt;OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'&lt;BR /&gt;LOCATION '/user/add/hive/xml'&lt;BR /&gt;TBLPROPERTIES ("xmlinput.start"="&amp;lt;Company","xmlinput.end"= "&amp;lt;/Company&amp;gt;")&lt;/P&gt;&lt;P&gt;but when I am storing it as irc or textfile it is giving null values&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;like &lt;STRONG&gt;stored as textfile&lt;/STRONG&gt; then&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;NULL NULL NULL&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;how&amp;nbsp; to resolve it, Thanks&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 29 Mar 2020 12:25:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-XML-parsing-and-saving-table-as-ORC-or-textFile/m-p/292808#M216328</guid>
      <dc:creator>saaga119</dc:creator>
      <dc:date>2020-03-29T12:25:31Z</dc:date>
    </item>
    <item>
      <title>Re: Hive and XML parsing and saving table as ORC or textFile</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Hive-and-XML-parsing-and-saving-table-as-ORC-or-textFile/m-p/292809#M216329</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/76146"&gt;@saaga119&lt;/a&gt;&amp;nbsp;The solution here is to create the external table in the format you need to query the raw xml data. &amp;nbsp;Seems like you have this done already. &amp;nbsp; Next create a native hive table. &amp;nbsp;It could still be external too. The new table should be the schema for the fields you want with required ORC lines. &amp;nbsp;You can also choose some other text format; for example: csv. &amp;nbsp; Last INSERT INTO new_table SELECT * FROM external_table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The main idea here is to use external and staging tables, then INSERT INTO SELECT to fill the final table you want. &amp;nbsp;Final table could then be optimized for performance (ORC). &amp;nbsp; This kind of idea also allows multiple external/raw/staging tables able to be combined (select w/ join) into a single final optimized table (orc, compression, partitions, buckets, etc).&lt;/P&gt;</description>
      <pubDate>Sun, 29 Mar 2020 12:58:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Hive-and-XML-parsing-and-saving-table-as-ORC-or-textFile/m-p/292809#M216329</guid>
      <dc:creator>stevenmatison</dc:creator>
      <dc:date>2020-03-29T12:58:03Z</dc:date>
    </item>
  </channel>
</rss>

