Created on 07-23-2019 02:40 AM - edited 09-16-2022 07:31 AM
CDH version is 6.1.0.
1. I tried to merge small files (merge settings sketched below):
insert overwrite table tdb.tb_activity partition(ymd) select * from tdb.tb_activity where ymd = '2019-07-22';
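For reference, these are the Hive settings that usually control small-file merging; a minimal sketch, assuming Hive on Spark, with illustrative threshold values (not taken from this cluster):

-- Hedged sketch: settings commonly used so Hive merges small output files
-- (Hive on Spark; the threshold values are illustrative assumptions).
SET hive.merge.sparkfiles=true;              -- merge outputs of Hive-on-Spark jobs
SET hive.merge.smallfiles.avgsize=16000000;  -- merge if average output file size < ~16 MB
SET hive.merge.size.per.task=256000000;      -- aim for ~256 MB per merged file
-- ...then run the INSERT OVERWRITE above.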
2. But an exception was raised.
HUE showed "UnknownReason Blacklisting behavior can be configured..."
And below is the Spark container error log.
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:805)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
... 19 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
at org.apache.hadoop.hive.serde2.avro.AvroSerializer.serialize(AvroSerializer.java:75)
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.serialize(AvroSerDe.java:212)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:725)
... 25 more
19/07/23 17:25:32 ERROR executor.Executor: Exception in task 4.0 in stage 0.0 (TID 2)
3. I found something strange in the "explain query" output.
line 24: expressions: ..... extras (type: string), ....
line 41: avro.schema.literal {"type":"record", ....... ,{"name":"extras","type":["null","string"],"default":null}, .....
line 44: columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
There is no "extras" field on line 44.
10
11 STAGE PLANS:
12 Stage: Stage-1
13 Spark
14 DagName: hive_20190723175708_8b93a3ff-d533-48cc-865e-6af87f576858:29
15 Vertices:
16 Map 1
17 Map Operator Tree:
18 TableScan
19 alias: tb_activity
20 filterExpr: (ymd = '2019-07-22') (type: boolean)
21 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
22 GatherStats: false
23 Select Operator
24 expressions: actiontype (type: string), contentid (type: string), contenttype (type: string), device (type: string), extras (type: string), serviceid (type: string), timestamp (type: bigint), userip (type: string), userid (type: string), '2019-07-22' (type: string)
25 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
26 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
27 File Output Operator
28 compressed: true
29 GlobalTableId: 1
30 directory: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10002
31 NumFilesPerFileSink: 1
32 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
33 Stats Publishing Key Prefix: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10000/
34 table:
35 input format: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
36 output format: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
37 properties:
38 DO_NOT_UPDATE_STATS true
39 EXTERNAL TRUE
40 STATS_GENERATED TASK
41 avro.schema.literal {"type":"record","name":"Activity","namespace":"com.bigdata.avro","doc":"Schema for com.bigdata.avro.Activity","fields":[{"name":"actionType","type":["null","string"]},{"name":"contentId","type":["null","string"]},{"name":"contentType","type":["null","string"]},{"name":"device","type":["null","string"]},{"name":"extras","type":["null","string"],"default":null},{"name":"serviceId","type":["null","string"]},{"name":"timestamp","type":["null","long"]},{"name":"userIp","type":["null","string"]},{"name":"userid","type":["null","string"]}]}
42 avro.schema.url hdfs:///metadata/avro/tb_activity.avsc
43 bucket_count -1
44 columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
45 columns.comments
46 columns.types string:string:string:string:string:bigint:string:string
47 file.inputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
48 file.outputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
49 impala.lastComputeStatsTime 1559636056
50 location hdfs://nameservice1/etl/flume/tb_activity
51 name tdb.tb_activity
52 numRows 83020631
53 partition_columns ymd
54 partition_columns.types string
55 serialization.ddl struct tb_activity { string actiontype, string contentid, string contenttype, string device, string serviceid, i64 timestamp, string userip, string userid}
56 serialization.format 1
57 serialization.lib org.apache.hadoop.hive.serde2.avro.AvroSerDe
58 totalSize 6334562388
59 transient_lastDdlTime 1556875047
60 serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
61 name: tdb.tb_activity
62 TotalFiles: 1
63 GatherStats: true
64 MultiFileSpray: false
"extras" field on avro schema have "default" property.
and other fields has no "default" property.
I have been doing avro schema changes in the past. The "extras" field was then added.
What is wrong?
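For reference, the mismatch between the Avro schema and the metastore column list can be inspected with standard Hive commands (a minimal sketch using the table above):

-- Show the definition stored in the metastore, including TBLPROPERTIES
-- such as avro.schema.literal / avro.schema.url:
SHOW CREATE TABLE tdb.tb_activity;

-- Show the column list Hive actually resolves for the table:
DESCRIBE FORMATTED tdb.tb_activity;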
Created 07-23-2019 04:36 AM
Hi.
This issue usually occurs when you insert from a parent table into a child table and the columns you select do not match the columns of the target table at insertion time.
So I suggest you compare the parent and child table columns whenever you create the table, insert data, or load data; see the sketch below.
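For example (a sketch with hypothetical table names, just to show the comparison):

-- Hypothetical parent/child table names, for illustration only:
DESCRIBE parent_table;  -- source columns
DESCRIBE child_table;   -- target columns: the selected columns must line up with these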
Thanks
HadoopHelp
Created on 07-23-2019 06:21 PM - edited 07-23-2019 06:30 PM
Hi.
That is not a query inserting from a parent table into a child table; it inserts into the same table.
I am trying to merge Flume small files.
These are the properties of the table; I think some of these properties may be the reason.
I am still looking for a solution.
DO_NOT_UPDATE_STATS true
EXTERNAL TRUE
STATS_GENERATED TASK
impala.lastComputeStatsTime 1559634875
numRows 3035760415
totalSize 315026055870
transient_lastDdlTime 1559548433
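(For reference, table properties like these can be listed with standard Hive syntax; a minimal sketch:)

SHOW TBLPROPERTIES tdb.tb_activity;  -- lists table properties such as the ones above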
Created 08-04-2019 09:01 PM
Thanks.
I found the solution.
The cause was related to Impala's COMPUTE STATS.
I had updated the "extras" field in the Avro schema, and then I had run the Impala COMPUTE STATS command on the table; that is probably the reason.
I dropped my external table and recreated it, and it is working now; a sketch of the steps is below.
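For anyone hitting the same issue, the recovery looked roughly like this (a sketch; the DDL is reconstructed from the explain output above, so treat the paths and options as assumptions about this table):

-- The table is EXTERNAL, so dropping it removes only the metastore entry;
-- the data files under the LOCATION stay in place.
DROP TABLE tdb.tb_activity;

-- Recreate the table from the Avro schema file so the metastore column list
-- matches the schema again (paths taken from the explain output above).
CREATE EXTERNAL TABLE tdb.tb_activity
PARTITIONED BY (ymd STRING)
STORED AS AVRO
LOCATION 'hdfs://nameservice1/etl/flume/tb_activity'
TBLPROPERTIES ('avro.schema.url'='hdfs:///metadata/avro/tb_activity.avsc');

-- Re-attach the existing partition directories.
MSCK REPAIR TABLE tdb.tb_activity;

-- In Impala, refresh the metadata afterwards:
-- INVALIDATE METADATA tdb.tb_activity;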
Created on 08-06-2019 10:55 AM - edited 08-06-2019 11:05 AM
I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.