Explorer
Posts: 8
Registered: 01-08-2015

Cannot query struct field with Hive (CDH 5.9.0)

Hi!

I just switched to CDH 5.9.0 (a fresh install on a new cluster, not an upgrade).

I have a table like this one (the real one is a bit more complex; this is an extract):

CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>)
PARTITIONED BY (`IMPORT_DATE` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://myhost.com:8020/user/hive/warehouse/dbp/products'
TBLPROPERTIES ('transient_lastDdlTime'='1482160314')

If I do:

SELECT header FROM products;

==> The query succeeds and returns all product headers (in JSON format).

But if I do:

SELECT header.PCODE FROM products;

==> It fails with the following stacktrace:

Error: java.lang.RuntimeException: Error in configuring object
                at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
                at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
                at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
                at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:422)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
                at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:498)
                at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
                ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
                at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
                at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
                at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
                at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
                ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                at java.lang.reflect.Method.invoke(Method.java:498)
                at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
                ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
                at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
                ... 22 more
Caused by: java.lang.NullPointerException
                at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61)
                at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:53)
                at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
                at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:980)
                at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:63)
                at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
                at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
                at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
                at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
                at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
                at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431)
                at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
                at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
                ... 22 more

Any idea?

Re: Cannot query struct field with Hive (CDH 5.9.0)

I replaced all the jars in /opt/cloudera/parcels/CDH/jars with the CDH 5.8.2 versions, and the query succeeds. So it looks like the problem comes from CDH 5.9.0.

Re: Cannot query struct field with Hive (CDH 5.9.0)

If the table is stored as TextFile ('org.apache.hadoop.mapred.TextInputFormat'), the query runs successfully.

So the problem seems to be related to Parquet.

Re: Cannot query struct field with Hive (CDH 5.9.0)

I also tried:

SELECT header.pcode FROM products;

but it fails too.

Then I tried something else: creating the table with the struct field names in lowercase:

CREATE TABLE `products`(`header` struct<pcode:string, pname:string>)
PARTITIONED BY (`IMPORT_DATE` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://myhost.com:8020/user/hive/warehouse/dbp/products'
TBLPROPERTIES ('transient_lastDdlTime'='1482160314')

Then I ran:

SELECT header.pcode FROM products;

==> The query runs successfully...

Here is a summary:

CREATE TABLE `products`(`header` struct<pcode:string, pname:string>) STORED AS PARQUET;

Select results:

  1. SELECT header.pcode FROM products ==> OK
  2. SELECT HEADER.pcode FROM products ==> OK
  3. SELECT header.PCODE FROM products ==> KO
  4. SELECT HEADER.PCODE FROM products ==> KO

-----------------------------------------------------------------------------------

CREATE TABLE `products`(`header` struct<PCODE:string, PNAME:string>) STORED AS PARQUET;

Select results:

  1. SELECT header.pcode FROM products ==> KO
  2. SELECT HEADER.pcode FROM products ==> KO
  3. SELECT header.PCODE FROM products ==> KO
  4. SELECT HEADER.PCODE FROM products ==> KO
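The eight results in the two lists above reduce to one empirical rule: a struct field reference works only when the struct was declared with lowercase field names and the query also uses the lowercase field name; the case of the column name (header vs HEADER) makes no difference. The Python sketch below just encodes that rule and checks it against the observed results. It is inferred from this thread, not taken from Hive's source, and the names OBSERVED and predicted_ok are hypothetical.

```python
# Observed outcomes copied from the two summary lists above:
# (case of the declared struct field names, reference used in SELECT) -> succeeded?
OBSERVED = {
    ("lower", "header.pcode"): True,
    ("lower", "HEADER.pcode"): True,
    ("lower", "header.PCODE"): False,
    ("lower", "HEADER.PCODE"): False,
    ("upper", "header.pcode"): False,
    ("upper", "HEADER.pcode"): False,
    ("upper", "header.PCODE"): False,
    ("upper", "HEADER.PCODE"): False,
}


def predicted_ok(declared_case: str, reference: str) -> bool:
    """Empirical rule inferred from the results above (NOT Hive's actual logic):
    the query works only if the struct fields were declared in lowercase AND the
    field part of the reference is lowercase. The column name's case is ignored."""
    _column, field = reference.split(".", 1)
    return declared_case == "lower" and field == field.lower()


if __name__ == "__main__":
    for (declared, ref), ok in OBSERVED.items():
        assert predicted_ok(declared, ref) == ok, (declared, ref)
    print("rule matches all 8 observed results")
```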

==> Conclusion: avoid uppercase letters in struct field names (at least for Parquet tables on CDH 5.9.0).
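If you have many table definitions to check, a small script can flag and lowercase struct field names in a Hive column type string before you recreate the table. This Python sketch is not from the thread: the function names are hypothetical, it assumes unquoted field names and no COMMENT clauses containing ':' inside the type, and it only rewrites the DDL text (it does not touch the data files).

```python
import re

# In a Hive type string, a struct field name is an identifier directly followed
# by ':' (e.g. PCODE in struct<PCODE:string>). Assumes unquoted names and no
# COMMENT clauses containing ':' inside the type string.
_FIELD_NAME = re.compile(r"(\w+)(?=\s*:)")


def uppercase_struct_fields(type_str: str) -> list[str]:
    """Return the struct field names (any nesting depth) that contain uppercase letters."""
    return [name for name in _FIELD_NAME.findall(type_str) if name != name.lower()]


def lowercase_struct_fields(type_str: str) -> str:
    """Rewrite every struct field name (any nesting depth) in lowercase."""
    return _FIELD_NAME.sub(lambda m: m.group(1).lower(), type_str)


if __name__ == "__main__":
    ddl_type = "struct<PCODE:string, PNAME:string>"
    print(uppercase_struct_fields(ddl_type))  # ['PCODE', 'PNAME']
    print(lowercase_struct_fields(ddl_type))  # struct<pcode:string, pname:string>
```

The lowercase output can then be used in the CREATE TABLE statement, as in the lowercase variant of the products table above.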