Created 07-13-2018 08:30 PM
Hi All,
I did load the below XML file into HIVE table” test_table.customer_details” with a schema
Hive> Create table test_table.customer_details ( Customers array<struct<customer_details:struct<first_name:string,last_name:string,DOB:string,Addr1:string,City:string,state:string,country:string>>> )
row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties
(
“Column.xpath.customerdetails” = “/FormResponse/Customers/customer_details”
)
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<FormResponse>",
"xmlinput.end"="</FormResponse>"
);
XML file
<FormResponse>
<Customers>
<customer_details>
<First_name>Anji</First_name>
<Last_name> Raju</Last_name>
<DOB>06/24/1278<DOB>
<Addr1> 14 duck st </Addr1>
<City> boston </City>
<State> OH </State>
<Country> USA </Country>
</customer_details>
<customer_details>
<First_name>Jeet</First_name>
<Last_name> Anu</Last_name>
<DOB>06/24/1279<DOB>
<Addr1> tuttles groove </Addr1>
<City> denver </City>
<State> CA </State>
<Country> USA </Country>
</customer_details>
<customer_details>
<First_name>Test1</First_name>
<Last_name> Test_last</Last_name>
<DOB>006/24/1280<DOB>
<Addr1> Sleek street </Addr1>
<City> cali </City>
<State> MA </State>
<Country> USA </Country>
</customer_details>
</Customers>
</FormResponse>
When I write the command:
Hive> Select explode(customers.customer) from test_table.customer_details;
The output is
{“First_Name”:”Anji”,”Last_name”:”Raju”,”DOB”:” 06/24/1278”,”Addr1”:” 14 duck st”,”City”:”boston”,”State”:”OH”,”Country”:”USA”}
{“First_Name”:”Jeet”,”Last_name”:”Anu”,”DOB”:” 06/24/1279”,”Addr1”:” tuttles grove”,”City”:”denver”,”State”:”CA”,”Country”:”USA”}
{“First_Name”:”Test1”,”Last_name”:”Test_last”,”DOB”:” 06/24/1280”,”Addr1”:” Sleek street”,”City”:”cali”,”State”:”MA”,”Country”:”USA”}
I did create a new table
create table test_table.new ( customer array<struct<first_name:string,last_name:string,DOB:string, Addr1:string, city:string, State:string, Country:string >>);
I did try writing a hive command something like below to get the expected output but I am getting an Error:
hive> insert into test_table.new select explode( customers.customer_details) from test_table.customer_details;
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target table because column number/types are different 'test_mess1': Cannot convert column 1 from array<struct<first_name:string,last_name:string,DOB:string,Addr1:string,city:string,State:string,Country:string >> to
array<struct<first_name:string,last_name:string,DOB:string, Addr1:string, city:string, State:string, Country:string >>
The Expected output should be:
First_name | Last_name | DOB | Addr1 | City | State | Country |
Anji | Raju | 06/24/1278 | 14 duck st | boston | OH | USA |
Jeet | Anu | 06/24/1279 | tuttles groove | denver | CA | USA |
Test1 | Test_lat | 06/24/1280 | Sleek street | cali | MA | USA |
Could someone help me to resolve this issue???
Created 07-14-2018 06:23 AM
Hello @Anji Raju!
Try this out:
#My XML content (don't forge to close your DOB TAG!)
<FormResponse> <Customers> <customer_details> <First_name>Anji</First_name> <Last_name> Raju</Last_name> <DOB>06/24/1278</DOB> <Addr1> 14 duck st </Addr1> <City> boston </City> <State> OH </State> <Country> USA </Country> </customer_details> <customer_details> <First_name>Jeet</First_name> <Last_name> Anu</Last_name> <DOB>06/24/1279</DOB> <Addr1> tuttles groove </Addr1> <City> denver </City> <State> CA </State> <Country> USA </Country> </customer_details> <customer_details> <First_name>Test1</First_name> <Last_name> Test_last</Last_name> <DOB>006/24/1280</DOB> <Addr1> Sleek street </Addr1> <City> cali </City> <State> MA </State> <Country> USA </Country> </customer_details> </Customers> </FormResponse> #DDL for the table (same as you) CREATE EXTERNAL TABLE default.customer_details ( Customers array<struct<customer_details:struct< First_name:string, Last_name:string, DOB:string, Addr1:string, City:string, State:string, Country:string>>> ) row format serde 'com.ibm.spss.hive.serde2.xml.XmlSerDe' with serdeproperties ( "column.xpath.Customers" = "/FormResponse/Customers/customer_details" ) STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION 'hdfs://Admin-TrainingNS/user/hive/warehouse/customer_details' TBLPROPERTIES ( "xmlinput.start"="<FormResponse>", "xmlinput.end"="</FormResponse>" ); #Query to extract columns select inline(Customers.customer_details) from default.customer_details; OK Anji Raju 06/24/1278 14 duck st boston OH USA Jeet Anu 06/24/1279 tuttles groove denver CA USA Test1 Test_last 006/24/1280 Sleek street cali MA USA Time taken: 0.08 seconds, Fetched: 3 row(s)
Hope this helps! 🙂
Created 07-16-2018 02:44 AM
Hi @Vinicius Higa Murakami , Thanks for the reply.
I was trying to use inline function with SELECT statement. but its throwing an Error.
select strngname, inline(Customers.customer_details) from default.customer_details;
BUT its throwing below Error:
SemanticException [Error 10081]: UDTf's are not supported outside the SELECT clause , nor nested in expressions:
Could you please help me with this.
Thanks,
Created 07-16-2018 05:57 AM
Hi @Anji Raju!
In this case, you'll need to use the LATERAL VIEW to use the inline function with another column.
hive> select c.*,'Vini' from default.customer_details lateral view inline(Customers.customer_details) c; OK Anji Raju 06/24/1278 14 duck st boston OH USA Vini Jeet Anu 06/24/1279 tuttles groove denver CA USA Vini Test1 Test_last 006/24/1280 Sleek street cali MA USA Vini Time taken: 0.094 seconds, Fetched: 3 row(s)
Hope this helps!
Created 03-08-2020 10:16 AM
Thanks, this really helps