Member since: 08-23-2018
Posts: 36
Kudos Received: 1
Solutions: 0
07-23-2019
06:21 PM
Hi. That is not a query that inserts from a parent table into a child table; it inserts back into the same table. I am trying to merge the small files written by Flume. These are the properties of the table; I suspect one of them may be the cause, and I am still looking for a solution.
DO_NOT_UPDATE_STATS true
EXTERNAL TRUE
STATS_GENERATED TASK
impala.lastComputeStatsTime 1559634875
numRows 3035760415
totalSize 315026055870
transient_lastDdlTime 1559548433
07-23-2019
02:40 AM
The CDH version is 6.1.0.

1. I try to merge small files:
insert overwrite table tdb.tb_activity partition(ymd)
select * from tdb.tb_activity where ymd = '2019-07-22';

2. But an exception is raised: "UnknownReason Blacklisting behavior can be configured..." in Hue. Below is the Spark container error log:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:805)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
... 19 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8)
at org.apache.hadoop.hive.serde2.avro.AvroSerializer.serialize(AvroSerializer.java:75)
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.serialize(AvroSerDe.java:212)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:725)
... 25 more
19/07/23 17:25:32 ERROR executor.Executor: Exception in task 4.0 in stage 0.0 (TID 2)

3. I found something strange in the EXPLAIN output. Line 24 (expressions) includes extras (type: string), and line 41 (avro.schema.literal) includes {"name":"extras","type":["null","string"],"default":null}, but line 44 (columns) lists actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid; the "extras" field is missing there.

10
11 STAGE PLANS:
12 Stage: Stage-1
13 Spark
14 DagName: hive_20190723175708_8b93a3ff-d533-48cc-865e-6af87f576858:29
15 Vertices:
16 Map 1
17 Map Operator Tree:
18 TableScan
19 alias: tb_activity
20 filterExpr: (ymd = '2019-07-22') (type: boolean)
21 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
22 GatherStats: false
23 Select Operator
24 expressions: actiontype (type: string), contentid (type: string), contenttype (type: string), device (type: string), extras (type: string), serviceid (type: string), timestamp (type: bigint), userip (type: string), userid (type: string), '2019-07-22' (type: string)
25 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
26 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
27 File Output Operator
28 compressed: true
29 GlobalTableId: 1
30 directory: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10002
31 NumFilesPerFileSink: 1
32 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
33 Stats Publishing Key Prefix: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10000/
34 table:
35 input format: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
36 output format: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
37 properties:
38 DO_NOT_UPDATE_STATS true
39 EXTERNAL TRUE
40 STATS_GENERATED TASK
41 avro.schema.literal {"type":"record","name":"Activity","namespace":"com.bigdata.avro","doc":"Schema for com.bigdata.avro.Activity","fields":[{"name":"actionType","type":["null","string"]},{"name":"contentId","type":["null","string"]},{"name":"contentType","type":["null","string"]},{"name":"device","type":["null","string"]},{"name":"extras","type":["null","string"],"default":null},{"name":"serviceId","type":["null","string"]},{"name":"timestamp","type":["null","long"]},{"name":"userIp","type":["null","string"]},{"name":"userid","type":["null","string"]}]}
42 avro.schema.url hdfs:///metadata/avro/tb_activity.avsc
43 bucket_count -1
44 columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
45 columns.comments
46 columns.types string:string:string:string:string:bigint:string:string
47 file.inputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
48 file.outputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
49 impala.lastComputeStatsTime 1559636056
50 location hdfs://nameservice1/etl/flume/tb_activity
51 name tdb.tb_activity
52 numRows 83020631
53 partition_columns ymd
54 partition_columns.types string
55 serialization.ddl struct tb_activity { string actiontype, string contentid, string contenttype, string device, string serviceid, i64 timestamp, string userip, string userid}
56 serialization.format 1
57 serialization.lib org.apache.hadoop.hive.serde2.avro.AvroSerDe
58 totalSize 6334562388
59 transient_lastDdlTime 1556875047
60 serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
61 name: tdb.tb_activity
62 TotalFiles: 1
63 GatherStats: true
64 MultiFileSpray: false
"extras" field on avro schema have "default" property. and other fields has no "default" property. I have been doing avro schema changes in the past. The "extras" field was then added. What is wrong?
Labels: Apache Hive, Apache Spark
06-30-2019
08:50 PM
I use Hive on Spark and I made a UDF. The jar file name is 'hive-udf-20190701.jar'.

I set the Hive configuration (Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml):
hive.reloadable.aux.jars.path
/usr/local/bigdata/hive-udf

I uploaded the jar file to the HiveServer2 filesystem directory: /usr/local/bigdata/hive-udf-20190701.jar

I create the function in Hue:
reload;
drop temporary function udf_map_tojson;
create temporary function udf_map_tojson as 'bigdata.hive.udf.MapToJsonString';

I test the UDF:
select udf_map_tojson(str_to_map("k1:v1,k2:v2"));

But an exception is raised:
Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: Job aborted due to stage failure: Aborting TaskSet 3.0 because task 0 (partition 0) cannot run anywhere due to node and executor blacklist. Most recent failure: Lost task 0.0 in stage 3.0 (TID 3, worker09.example.com, executor 1): UnknownReason Blacklisting behavior can be configured via spark.blacklist.*.

What am I doing wrong?
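For context, a hedged sketch of what a reflection-based UDF for this map-to-JSON conversion could look like is below. This is only an assumption about bigdata.hive.udf.MapToJsonString (the actual class is not shown in the post), it uses deliberately naive JSON quoting, and it assumes that a simple UDF with a java.util.Map argument works in your Hive version; otherwise a GenericUDF would be needed.

import java.util.Map;
import org.apache.hadoop.hive.ql.exec.UDF;

// Hypothetical sketch of a map<string,string> -> JSON string UDF.
// The real bigdata.hive.udf.MapToJsonString implementation may differ.
public class MapToJsonString extends UDF {
    public String evaluate(Map<String, String> input) {
        if (input == null) {
            return null;
        }
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> entry : input.entrySet()) {
            if (!first) {
                json.append(",");
            }
            // Naive quoting; does not escape quotes or control characters.
            json.append("\"").append(entry.getKey()).append("\":\"")
                .append(entry.getValue()).append("\"");
            first = false;
        }
        return json.append("}").toString();
    }
}

Note that the "UnknownReason ... blacklist" message from Spark is usually only the generic failure wrapper; the underlying exception tends to show up in the executor container logs, as in the other thread above.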
Labels: Apache Hive, Apache Spark
03-31-2019
06:50 PM
Thank you for testing. Have you ever submitted it as a workflow? The workflow still doesn't work. Look at this:

first) I saved an HQL document. It contained the test parameter value '76710'. http://demo.gethue.com/hue/editor?editor=292133 (see the "hql document" capture)

second) I clicked the "Submit" button, set the parameter value to '92801', and passed it (see the "param" capture). But the zipcode in the HQL results is still '76710'. The results are empty if I save the HQL document with no parameter value.

I have tested many cases. I think the problem is the HQL, because it no longer has the variable. Please look at the attached capture of the Oozie log: Oozie ran an HQL query that did not contain the variable "$zip". This is a very necessary function for me. Please help me find the solution.
03-23-2019
07:26 AM
http://gethue.com/drag-drop-saved-hive-queries-into-your-workflows/#comment-78368 Does this not work in the current version (e.g. Hue 4.3)? The parameter name in the submit-workflow input form is 'zip', not 'zip_code'. I expected it to look like the image in that post, but it did not work. Could you please check my demo workflow? http://demo.gethue.com/hue/oozie/editor/workflow/edit/?workflow=292135 I am using Cloudera CDH 6.1.0.
Labels: Apache Hive, Cloudera Hue
01-28-2019
05:33 PM
Thanks. I'll try it the way you told me.
01-16-2019
11:16 PM
I want to create a table with the complex type removed from the Avro data, keeping the rest of the schema the same. This is because Impala does not skip complex types. The platform is CDH 6.0.1.
For Example :
Employee(raw data)
- name : string
- age : int
- additional-info : map<string, string>
Employee(Hive table 1)
- name : string
- age : int
- additional-info : map<string, string>
Employee_For_Impala(Hive table 2)
- name : string
- age : int
Pipeline :
KafkaProducer(Avro Bytes) - Kafka - Flume - HDFS - Hive(Impala)
Flume : KafkaSource - Channel - Sink(AvroEventSerializer$Builder)
I tried changing the sink (serializer.schemaURL, removing the complex type field) but it failed.
I am now trying to use Morphlines, but this is also failing.
Is there a better way?
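One hedged idea, at the Avro level: schema resolution lets a reader schema omit fields that exist in the writer schema, and the extra writer fields are simply ignored. The sketch below illustrates this with in-memory serialization only; the field name additional_info and both schemas are hypothetical stand-ins (Avro field names cannot contain a hyphen, so additional-info would need renaming anyway).

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AvroProjectionSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical writer schema: full Employee record including the map field.
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"},"
            + "{\"name\":\"additional_info\",\"type\":{\"type\":\"map\",\"values\":\"string\"}}]}");

        // Hypothetical reader schema: same record without the map field.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

        // Serialize one record with the full schema.
        GenericRecord full = new GenericData.Record(writerSchema);
        full.put("name", "alice");
        full.put("age", 30);
        full.put("additional_info", java.util.Collections.singletonMap("team", "data"));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(full, encoder);
        encoder.flush();

        // Deserialize with the reduced reader schema; the map field is ignored.
        Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord projected =
            new GenericDatumReader<GenericRecord>(writerSchema, readerSchema).read(null, decoder);
        System.out.println(projected);  // {"name": "alice", "age": 30}
    }
}

If this mechanism applies in your pipeline, a second external Hive table pointing at the same HDFS location but with a reduced avro.schema.url might give the Impala-friendly view without rewriting the data, though I have not verified that on CDH 6.0.1.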
Labels: Apache Flume, Apache Kafka, HDFS
01-16-2019
06:40 PM
My CDH cluster version is CDH 6.0.1; the services are Kafka, HDFS, Hive, Impala, and Hue. I tested it in the following order.

1. Set the idle timeout values: Cloudera Manager > Impala > snippet (safety valve):
-idle_query_timeout=30
-idle_session_timeout=120

2. Check the timeout values: Cloudera Manager > Impala > Instances > Impala Daemon Web UI > http://[impalad host]:25000/varz shows idle_session_timeout set to 120.

3. Monitor the server's TCP connections: ssh to the impalad host and run
$ watch -n 1 -d 'netstat -anpt | grep 21050'

4. Java client application (JDBC):
Connection connection = DriverManager.getConnection("jdbc:impala://....");
Statement statement = connection.createStatement();
System.exit(0);

5. On the impalad server the TCP connection stays established and is not terminated even after several minutes.

6. Check the Impala Daemon Web UI: the session is visible at http://[impalad host]:25000/sessions and I perform the 'Close' action. However, the TCP connection is still established.
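For what it's worth, a hedged sketch of the client-side fix: explicitly closing the Statement and Connection (for example with try-with-resources) before the JVM exits, so the driver can tear down the TCP connection instead of relying on the server-side idle timeout. The JDBC URL and query below are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaClientCloseSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; use your real impalad host and port 21050.
        String url = "jdbc:impala://impalad-host.example.com:21050";

        // try-with-resources closes the statement and connection even on exceptions,
        // which lets the driver shut down the underlying TCP connection cleanly.
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("select 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
        // Only exit after the resources have been closed.
        System.exit(0);
    }
}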
01-16-2019
05:25 PM
My CDH cluster version is CDH 6.0.1; the services are Kafka, HDFS, Hive, Impala, and Hue. I tested it in the following order.

1. Set the idle timeout values: Cloudera Manager > Impala > snippet (safety valve):
-idle_query_timeout=30
-idle_session_timeout=120

2. Check the timeout values: Cloudera Manager > Impala > Instances > Impala Daemon Web UI > http://[impalad host].example.com:25000/varz shows idle_session_timeout set to 120.

3. Monitor the server's TCP connections: ssh to the impalad host and run
$ watch -n 1 -d 'netstat -anpt | grep 21050'

4. Java client application (JDBC):
Connection connection = DriverManager.getConnection("jdbc:impala://....");
Statement statement = connection.createStatement();
System.exit(0);

5. On the impalad server the TCP connection stays established and is not terminated even after several minutes.

6. Check the Impala Daemon Web UI: the session is visible at http://[impalad host].example.com:25000/sessions and I perform the 'close' action. However, the TCP connection is still established.
01-14-2019
09:05 PM
If the client application exits before the Impala connection is closed, the Impala hosts are left with zombie TCP connections.

Step 1: connect to impalad using JDBC.
Step 2: run a query.
Step 3: shut down the application without closing the JDBC connection.

I closed the session in the impalad daemon web UI (<impalad host>:25000/sessions), but the TCP connection does not disappear. I also set 'idle_session_timeout', but the TCP connection still does not disappear.
Labels: Apache Impala