Created on 07-23-2019 02:40 AM - edited 09-16-2022 07:31 AM
CDH version is 6.1.0.
1. I tried to merge small files:
insert overwrite table tdb.tb_activity partition(ymd) select * from tdb.tb_activity where ymd = '2019-07-22';
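(For reference, these are the standard Hive settings that control small-file merging on Hive on Spark; the values below are examples/defaults, not this cluster's actual configuration:)

SET hive.exec.dynamic.partition.mode=nonstrict;  -- the PARTITION(ymd) insert above is fully dynamic
SET hive.merge.sparkfiles=true;                  -- merge small files at the end of a Hive-on-Spark job
SET hive.merge.smallfiles.avgsize=16000000;      -- merge when the average output file size is below ~16 MB
SET hive.merge.size.per.task=256000000;          -- target size of the merged files (~256 MB)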
2. But an exception was raised.
HUE showed "UnknownReason Blacklisting behavior can be configured...".
Below is the Spark container error log:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:805)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
    ... 19 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
    at org.apache.hadoop.hive.serde2.avro.AvroSerializer.serialize(AvroSerializer.java:75)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.serialize(AvroSerDe.java:212)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:725)
    ... 25 more
19/07/23 17:25:32 ERROR executor.Executor: Exception in task 4.0 in stage 0.0 (TID 2)
3. I found something strange in the EXPLAIN output (full plan below):
Line 24: expressions: ..... extras (type: string), ....
Line 41: avro.schema.literal {"type":"record", ....... ,{"name":"extras","type":["null","string"],"default":null}, .....
Line 44: columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
There is no "extras" field on line 44: the Avro schema on line 41 has nine fields, but the column list on line 44 has only eight. That matches the "in = 9 vs out = 8" in the exception.
10
11  STAGE PLANS:
12    Stage: Stage-1
13      Spark
14        DagName: hive_20190723175708_8b93a3ff-d533-48cc-865e-6af87f576858:29
15        Vertices:
16          Map 1
17              Map Operator Tree:
18                  TableScan
19                    alias: tb_activity
20                    filterExpr: (ymd = '2019-07-22') (type: boolean)
21                    Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
22                    GatherStats: false
23                    Select Operator
24                      expressions: actiontype (type: string), contentid (type: string), contenttype (type: string), device (type: string), extras (type: string), serviceid (type: string), timestamp (type: bigint), userip (type: string), userid (type: string), '2019-07-22' (type: string)
25                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
26                      Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
27                      File Output Operator
28                        compressed: true
29                        GlobalTableId: 1
30                        directory: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10002
31                        NumFilesPerFileSink: 1
32                        Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
33                        Stats Publishing Key Prefix: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10000/
34                        table:
35                            input format: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
36                            output format: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
37                            properties:
38                              DO_NOT_UPDATE_STATS true
39                              EXTERNAL TRUE
40                              STATS_GENERATED TASK
41                              avro.schema.literal {"type":"record","name":"Activity","namespace":"com.bigdata.avro","doc":"Schema for com.bigdata.avro.Activity","fields":[{"name":"actionType","type":["null","string"]},{"name":"contentId","type":["null","string"]},{"name":"contentType","type":["null","string"]},{"name":"device","type":["null","string"]},{"name":"extras","type":["null","string"],"default":null},{"name":"serviceId","type":["null","string"]},{"name":"timestamp","type":["null","long"]},{"name":"userIp","type":["null","string"]},{"name":"userid","type":["null","string"]}]}
42                              avro.schema.url hdfs:///metadata/avro/tb_activity.avsc
43                              bucket_count -1
44                              columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
45                              columns.comments
46                              columns.types string:string:string:string:string:bigint:string:string
47                              file.inputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
48                              file.outputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
49                              impala.lastComputeStatsTime 1559636056
50                              location hdfs://nameservice1/etl/flume/tb_activity
51                              name tdb.tb_activity
52                              numRows 83020631
53                              partition_columns ymd
54                              partition_columns.types string
55                              serialization.ddl struct tb_activity { string actiontype, string contentid, string contenttype, string device, string serviceid, i64 timestamp, string userip, string userid}
56                              serialization.format 1
57                              serialization.lib org.apache.hadoop.hive.serde2.avro.AvroSerDe
58                              totalSize 6334562388
59                              transient_lastDdlTime 1556875047
60                        serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
61                        name: tdb.tb_activity
62                        TotalFiles: 1
63                        GatherStats: true
64                        MultiFileSpray: false
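To see the mismatch without running EXPLAIN, the metastore column list and the Avro schema can be compared directly (standard Hive commands; the schema path is the one from line 42 above):

SHOW CREATE TABLE tdb.tb_activity;     -- the column list as stored in the metastore
DESCRIBE FORMATTED tdb.tb_activity;    -- table properties, including avro.schema.url
-- and the schema file itself: hdfs dfs -cat /metadata/avro/tb_activity.avsc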
"extras" field on avro schema have "default" property.
and other fields has no "default" property.
I have been doing avro schema changes in the past. The "extras" field was then added.
What is wrong?
Created 07-23-2019 04:36 AM
Hi.
This issue usually comes up when the columns you select from the parent (source) table do not match the columns of the child (target) table at the time of insertion.
So I suggest you check the parent and child table columns whenever you create the table or insert/load data; see the example below.
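For example (the table names below are hypothetical), listing the columns explicitly on both sides usually makes a mismatch fail during query compilation instead of deep inside the SerDe:

INSERT OVERWRITE TABLE tdb.child PARTITION(ymd)
SELECT actiontype, contentid, contenttype, device, extras,
       serviceid, `timestamp`, userip, userid, ymd
FROM tdb.parent
WHERE ymd = '2019-07-22';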
Thanks
HadoopHelp
Created on 07-23-2019 06:21 PM - edited 07-23-2019 06:30 PM
Hi.
That is not a query inserting from a parent table into a child table; it inserts into the same table.
I am trying to merge the small files written by Flume.
These are the table's properties. I think some of these properties may be the cause; I am still looking for a solution.
DO_NOT_UPDATE_STATS          true
EXTERNAL                     TRUE
STATS_GENERATED              TASK
impala.lastComputeStatsTime  1559634875
numRows                      3035760415
totalSize                    315026055870
transient_lastDdlTime        1559548433
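(They can be listed with the standard Hive command:)

SHOW TBLPROPERTIES tdb.tb_activity;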
Created 08-04-2019 09:01 PM
Thanks.
I found the solution.
The cause was related to Impala's COMPUTE STATS.
I had updated the Avro schema to add the 'extras' field, and afterwards I ran COMPUTE STATS in Impala. That is probably what left the table's column metadata stale.
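Reconstructing the sequence (the commands below are the standard ones, not copied verbatim from my history):

-- 1. Replaced the schema file that avro.schema.url points at, adding the 'extras' field
--    (hdfs dfs -put -f tb_activity.avsc /metadata/avro/tb_activity.avsc)
-- 2. Then, in impala-shell:
COMPUTE STATS tdb.tb_activity;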
I dropped my external table and recreated it, and it is working now.
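For reference, the fix was along these lines (the DDL is reconstructed from the table properties above, not the exact statements I ran):

DROP TABLE tdb.tb_activity;   -- EXTERNAL table, so the data in HDFS is kept

CREATE EXTERNAL TABLE tdb.tb_activity
PARTITIONED BY (ymd STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://nameservice1/etl/flume/tb_activity'
TBLPROPERTIES ('avro.schema.url'='hdfs:///metadata/avro/tb_activity.avsc');

MSCK REPAIR TABLE tdb.tb_activity;   -- re-register the existing ymd partitions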
Created on 08-06-2019 10:55 AM - edited 08-06-2019 11:05 AM
I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.