Created on 07-23-2019 02:40 AM - edited 09-16-2022 07:31 AM
CDH version is 6.1.0.
1. I tried to merge small files.
insert overwrite table tdb.tb_activity partition(ymd) select * from tdb.tb_activity where ymd = '2019-07-22';
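For context, this kind of compaction query usually assumes dynamic partitioning and Hive's merge settings are enabled in the session. A sketch of the settings involved (an assumption — my actual session settings are not shown here):

```sql
-- Allow writing into partition(ymd) without a static value (assumed session settings)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Merge the small output files produced by Hive-on-Spark tasks
SET hive.merge.sparkfiles=true;
```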
2. But an exception was raised:
"UnknownReason Blacklisting behavior can be configured..." in Hue.
Below is the Spark container error log.
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:805)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
        ... 19 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
        at org.apache.hadoop.hive.serde2.avro.AvroSerializer.serialize(AvroSerializer.java:75)
        at org.apache.hadoop.hive.serde2.avro.AvroSerDe.serialize(AvroSerDe.java:212)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:725)
        ... 25 more
19/07/23 17:25:32 ERROR executor.Executor: Exception in task 4.0 in stage 0.0 (TID 2)
3. I found something strange in the EXPLAIN output.
line 24: expressions: ..... extras (type: string), ....
line 41: avro.schema.literal {"type":"record", ....... ,{"name":"extras","type":["null","string"],"default":null}, .....
line 44: columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
There is no "extras" field on line 44.
10	
11	STAGE PLANS:
12	  Stage: Stage-1
13	    Spark
14	      DagName: hive_20190723175708_8b93a3ff-d533-48cc-865e-6af87f576858:29
15	      Vertices:
16	        Map 1 
17	            Map Operator Tree:
18	                TableScan
19	                  alias: tb_activity
20	                  filterExpr: (ymd = '2019-07-22') (type: boolean)
21	                  Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
22	                  GatherStats: false
23	                  Select Operator
24	                    expressions: actiontype (type: string), contentid (type: string), contenttype (type: string), device (type: string), extras (type: string), serviceid (type: string), timestamp (type: bigint), userip (type: string), userid (type: string), '2019-07-22' (type: string)
25	                    outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
26	                    Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
27	                    File Output Operator
28	                      compressed: true
29	                      GlobalTableId: 1
30	                      directory: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10002
31	                      NumFilesPerFileSink: 1
32	                      Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
33	                      Stats Publishing Key Prefix: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10000/
34	                      table:
35	                          input format: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
36	                          output format: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
37	                          properties:
38	                            DO_NOT_UPDATE_STATS true
39	                            EXTERNAL TRUE
40	                            STATS_GENERATED TASK
41	                            avro.schema.literal {"type":"record","name":"Activity","namespace":"com.bigdata.avro","doc":"Schema for com.bigdata.avro.Activity","fields":[{"name":"actionType","type":["null","string"]},{"name":"contentId","type":["null","string"]},{"name":"contentType","type":["null","string"]},{"name":"device","type":["null","string"]},{"name":"extras","type":["null","string"],"default":null},{"name":"serviceId","type":["null","string"]},{"name":"timestamp","type":["null","long"]},{"name":"userIp","type":["null","string"]},{"name":"userid","type":["null","string"]}]}
42	                            avro.schema.url hdfs:///metadata/avro/tb_activity.avsc
43	                            bucket_count -1
44	                            columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
45	                            columns.comments 
46	                            columns.types string:string:string:string:string:bigint:string:string
47	                            file.inputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
48	                            file.outputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
49	                            impala.lastComputeStatsTime 1559636056
50	                            location hdfs://nameservice1/etl/flume/tb_activity
51	                            name tdb.tb_activity
52	                            numRows 83020631
53	                            partition_columns ymd
54	                            partition_columns.types string
55	                            serialization.ddl struct tb_activity { string actiontype, string contentid, string contenttype, string device, string serviceid, i64 timestamp, string userip, string userid}
56	                            serialization.format 1
57	                            serialization.lib org.apache.hadoop.hive.serde2.avro.AvroSerDe
58	                            totalSize 6334562388
59	                            transient_lastDdlTime 1556875047
60	                          serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
61	                          name: tdb.tb_activity
62	                      TotalFiles: 1
63	                      GatherStats: true
64	                      MultiFileSpray: false
"extras" field on avro schema have "default" property.
and other fields has no "default" property.
I have been doing avro schema changes in the past. The "extras" field was then added.
What is wrong?
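For reference, the two schema sources that disagree (the 8-column metastore list versus the 9-field Avro record) can be compared like this. The table name and the schema path are taken from the EXPLAIN output above:

```sql
-- Metastore view of the table: shows the registered column list
-- (the 8-column "columns"/serialization.ddl list without "extras")
DESCRIBE FORMATTED tdb.tb_activity;

-- The Avro schema the SerDe actually reads, via avro.schema.url
-- (the 9-field record that includes "extras"); from the Hive CLI:
dfs -cat /metadata/avro/tb_activity.avsc;
```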