Created on 07-23-2019 02:40 AM - edited 09-16-2022 07:31 AM
CDH version is 6.1.0.
1. I tried to merge small files:
insert overwrite table tdb.tb_activity partition(ymd) select * from tdb.tb_activity where ymd = '2019-07-22';
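(For reference, these are the standard Hive settings that control small-file merging on Hive on Spark; the values below are examples/defaults, not this cluster's actual configuration:)

SET hive.exec.dynamic.partition.mode=nonstrict;  -- the PARTITION(ymd) insert above is fully dynamic
SET hive.merge.sparkfiles=true;                  -- merge small files at the end of a Hive-on-Spark job
SET hive.merge.smallfiles.avgsize=16000000;      -- merge when the average output file size is below ~16 MB
SET hive.merge.size.per.task=256000000;          -- target size of the merged files (~256 MB)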
2. But an exception was raised.
HUE showed "UnknownReason Blacklisting behavior can be configured...".
Below is the Spark container error log:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:805)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
    ... 19 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
    at org.apache.hadoop.hive.serde2.avro.AvroSerializer.serialize(AvroSerializer.java:75)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.serialize(AvroSerDe.java:212)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:725)
    ... 25 more
19/07/23 17:25:32 ERROR executor.Executor: Exception in task 4.0 in stage 0.0 (TID 2)
3. I found something strange in the EXPLAIN output (full plan below):
Line 24: expressions: ..... extras (type: string), ....
Line 41: avro.schema.literal {"type":"record", ....... ,{"name":"extras","type":["null","string"],"default":null}, .....
Line 44: columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
There is no "extras" field on line 44: the Avro schema on line 41 has nine fields, but the column list on line 44 has only eight. That matches the "in = 9 vs out = 8" in the exception.
10
11  STAGE PLANS:
12    Stage: Stage-1
13      Spark
14        DagName: hive_20190723175708_8b93a3ff-d533-48cc-865e-6af87f576858:29
15        Vertices:
16          Map 1
17              Map Operator Tree:
18                  TableScan
19                    alias: tb_activity
20                    filterExpr: (ymd = '2019-07-22') (type: boolean)
21                    Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
22                    GatherStats: false
23                    Select Operator
24                      expressions: actiontype (type: string), contentid (type: string), contenttype (type: string), device (type: string), extras (type: string), serviceid (type: string), timestamp (type: bigint), userip (type: string), userid (type: string), '2019-07-22' (type: string)
25                      outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
26                      Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
27                      File Output Operator
28                        compressed: true
29                        GlobalTableId: 1
30                        directory: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10002
31                        NumFilesPerFileSink: 1
32                        Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
33                        Stats Publishing Key Prefix: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10000/
34                        table:
35                            input format: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
36                            output format: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
37                            properties:
38                              DO_NOT_UPDATE_STATS true
39                              EXTERNAL TRUE
40                              STATS_GENERATED TASK
41                              avro.schema.literal {"type":"record","name":"Activity","namespace":"com.bigdata.avro","doc":"Schema for com.bigdata.avro.Activity","fields":[{"name":"actionType","type":["null","string"]},{"name":"contentId","type":["null","string"]},{"name":"contentType","type":["null","string"]},{"name":"device","type":["null","string"]},{"name":"extras","type":["null","string"],"default":null},{"name":"serviceId","type":["null","string"]},{"name":"timestamp","type":["null","long"]},{"name":"userIp","type":["null","string"]},{"name":"userid","type":["null","string"]}]}
42                              avro.schema.url hdfs:///metadata/avro/tb_activity.avsc
43                              bucket_count -1
44                              columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
45                              columns.comments
46                              columns.types string:string:string:string:string:bigint:string:string
47                              file.inputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
48                              file.outputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
49                              impala.lastComputeStatsTime 1559636056
50                              location hdfs://nameservice1/etl/flume/tb_activity
51                              name tdb.tb_activity
52                              numRows 83020631
53                              partition_columns ymd
54                              partition_columns.types string
55                              serialization.ddl struct tb_activity { string actiontype, string contentid, string contenttype, string device, string serviceid, i64 timestamp, string userip, string userid}
56                              serialization.format 1
57                              serialization.lib org.apache.hadoop.hive.serde2.avro.AvroSerDe
58                              totalSize 6334562388
59                              transient_lastDdlTime 1556875047
60                        serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
61                        name: tdb.tb_activity
62                        TotalFiles: 1
63                        GatherStats: true
64                        MultiFileSpray: false
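To see the mismatch without running EXPLAIN, the metastore column list and the Avro schema can be compared directly (standard Hive commands; the schema path is the one from line 42 above):

SHOW CREATE TABLE tdb.tb_activity;     -- the column list as stored in the metastore
DESCRIBE FORMATTED tdb.tb_activity;    -- table properties, including avro.schema.url
-- and the schema file itself: hdfs dfs -cat /metadata/avro/tb_activity.avsc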
"extras" field on avro schema have "default" property.
and other fields has no "default" property.
I have been doing avro schema changes in the past. The "extras" field was then added.
What is wrong?
Created 07-23-2019 04:36 AM
Hi.
This issue usually comes up when the columns you select from the parent (source) table do not match the columns of the child (target) table at the time of insertion.
So I suggest you check the parent and child table columns whenever you create the table or insert/load data; see the example below.
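For example (the table names below are hypothetical), listing the columns explicitly on both sides usually makes a mismatch fail during query compilation instead of deep inside the SerDe:

INSERT OVERWRITE TABLE tdb.child PARTITION(ymd)
SELECT actiontype, contentid, contenttype, device, extras,
       serviceid, `timestamp`, userip, userid, ymd
FROM tdb.parent
WHERE ymd = '2019-07-22';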
Thanks
HadoopHelp
Created on 07-23-2019 06:21 PM - edited 07-23-2019 06:30 PM
Hi.
That is not a query inserting from a parent table into a child table; it inserts into the same table.
I am trying to merge the small files written by Flume.
These are the table's properties. I think some of these properties may be the cause; I am still looking for a solution.
DO_NOT_UPDATE_STATS          true
EXTERNAL                     TRUE
STATS_GENERATED              TASK
impala.lastComputeStatsTime  1559634875
numRows                      3035760415
totalSize                    315026055870
transient_lastDdlTime        1559548433
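(They can be listed with the standard Hive command:)

SHOW TBLPROPERTIES tdb.tb_activity;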
Created 08-04-2019 09:01 PM
Thanks.
I found the solution.
The cause was related to Impala's COMPUTE STATS.
I had updated the Avro schema to add the 'extras' field, and afterwards I ran COMPUTE STATS in Impala. That is probably what left the table's column metadata stale.
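Reconstructing the sequence (the commands below are the standard ones, not copied verbatim from my history):

-- 1. Replaced the schema file that avro.schema.url points at, adding the 'extras' field
--    (hdfs dfs -put -f tb_activity.avsc /metadata/avro/tb_activity.avsc)
-- 2. Then, in impala-shell:
COMPUTE STATS tdb.tb_activity;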
I dropped my external table and recreated it, and it is working now.
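For reference, the fix was along these lines (the DDL is reconstructed from the table properties above, not the exact statements I ran):

DROP TABLE tdb.tb_activity;   -- EXTERNAL table, so the data in HDFS is kept

CREATE EXTERNAL TABLE tdb.tb_activity
PARTITIONED BY (ymd STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://nameservice1/etl/flume/tb_activity'
TBLPROPERTIES ('avro.schema.url'='hdfs:///metadata/avro/tb_activity.avsc');

MSCK REPAIR TABLE tdb.tb_activity;   -- re-register the existing ymd partitions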
Created on 08-06-2019 10:55 AM - edited 08-06-2019 11:05 AM
I'm happy to see you resolved your issue. Please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.