Error while inserting data into a Hive table created by Spark (Parquet)

Hi,

I am using HDP 2.4.0.

I have created a Hive table called table1 from a Spark application. The data is complex JSON stored in Parquet format, and table1 is an external table that receives incremental data from MongoDB every hour.

Now I have created table2 with the same schema as table1 and tried to INSERT into it, but the insert throws an exception.
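
For reference, the statement is essentially an INSERT ... SELECT from table1 into table2, roughly like this (simplified; the real query selects the same columns):

// simplified reproduction of the failing step; on HDP 2.4 / Spark 1.6
// sqlContext in spark-shell is a HiveContext, so HiveQL works directly
sqlContext.sql("INSERT INTO TABLE table2 SELECT * FROM table1")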

Exception:

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row

        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
        ... 8 more
Caused by: java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
        at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
        at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
        at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
        at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
        at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
        ... 9 more
Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
        at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:244)
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeArray(DataWritableWriter.java:186)
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:113)
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89)
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroup(DataWritableWriter.java:146)
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:119)
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89)
        at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60)
        ... 23 more


Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask


Note: in the JSON, some of the columns may be empty (for example, empty arrays).
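
For illustration only (this is not the real data), the problem shape looks like this; the writeArray frame in the stack trace suggests the offending value is an empty array:

// hypothetical records: Hive's Parquet writer accepts a missing or
// null field (id 3) but throws "empty fields are illegal" on the
// empty array in the second record
val sample = Seq(
  """{"id": 1, "tags": ["a", "b"]}""",
  """{"id": 2, "tags": []}""",
  """{"id": 3}"""
)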

Please help me understand how to handle this situation.

Thanks in advance.

1 Reply

Is filling the empty values with placeholder values an option?

Example:
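
First, a DataFrame like the one below can be mocked up as follows (the sample values are made up; in spark-shell on Spark 1.6, sqlContext and its implicits are already available):

import sqlContext.implicits._

// hypothetical sample data; None becomes null in the DataFrame
val df = Seq(
  (Some(12L), Some(34L), Some("wert"), Some(2.1)),
  (Some(23L), None,      None,         Some(68.0)),
  (Some(23L), None,      Some("xyz"),  None),
  (Some(23L), None,      None,         None)
).toDF("a", "b", "c", "x")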

df: org.apache.spark.sql.DataFrame = [a: bigint, b: bigint, c: string, x: double]

df.show()
+---+----+----+----+
|  a|   b|   c|   x|
+---+----+----+----+
| 12|  34|wert| 2.1|
| 23|null|null|68.0|
| 23|null| xyz|null|
| 23|null|null|null|
+---+----+----+----+

Replace "na" values with reasonable type conform values:

val df2 = df.na.fill(Map(
  "a" -> -1,
  "b" -> -1,
  "c" -> "n/a",
  "x" -> 0.0
))
df2.show()
+---+---+----+----+
|  a|  b|   c|   x|
+---+---+----+----+
| 12| 34|wert| 2.1|
| 23| -1| n/a|68.0|
| 23| -1| xyz| 0.0|
| 23| -1| n/a| 0.0|
+---+---+----+----+
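
Then df2 can be appended into the target table. One caveat, given that the data here is complex JSON: na.fill only covers primitive columns, and the writeArray frame in the stack trace points at an empty array, which Hive's Parquet writer rejects even though it accepts null. A sketch, assuming a hypothetical array column named "tags" in the real schema:

import org.apache.spark.sql.functions.{col, size, when}

// turn empty arrays into nulls; "tags" stands in for whatever array
// column the real table has (when() without otherwise() yields null
// for non-matching rows, and null arrays stay null since size() is -1)
val df3 = df2.withColumn("tags", when(size(col("tags")) > 0, col("tags")))

// append into the existing Hive table from the question
df3.write.insertInto("table2")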