Member since: 08-23-2018
Posts: 36
Kudos Received: 1
Solutions: 0
07-23-2019
06:21 PM
Hi. That is not a query that inserts from a parent table into a child table; it inserts back into the same table. I am trying to merge the small files written by Flume. These are the properties of the table; I suspect one of them may be the cause, and I am still looking for a solution.
DO_NOT_UPDATE_STATS true
EXTERNAL TRUE
STATS_GENERATED TASK
impala.lastComputeStatsTime 1559634875
numRows 3035760415
totalSize 315026055870
transient_lastDdlTime 1559548433
07-23-2019
02:40 AM
The CDH version is 6.1.0.

1. I try to merge small files:
insert overwrite table tdb.tb_activity partition(ymd)
select * from tdb.tb_activity where ymd = '2019-07-22';

2. But an exception is raised: "UnknownReason Blacklisting behavior can be configured..." in Hue. Below is the Spark container error log:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:805)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
... 19 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8)
at org.apache.hadoop.hive.serde2.avro.AvroSerializer.serialize(AvroSerializer.java:75)
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.serialize(AvroSerDe.java:212)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:725)
... 25 more
19/07/23 17:25:32 ERROR executor.Executor: Exception in task 4.0 in stage 0.0 (TID 2)

3. I found something strange in the EXPLAIN output. Line 24 (expressions) includes extras (type: string), and line 41 (avro.schema.literal) includes {"name":"extras","type":["null","string"],"default":null}, but line 44 (columns) lists actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid; the "extras" field is missing there.

10
11 STAGE PLANS:
12 Stage: Stage-1
13 Spark
14 DagName: hive_20190723175708_8b93a3ff-d533-48cc-865e-6af87f576858:29
15 Vertices:
16 Map 1
17 Map Operator Tree:
18 TableScan
19 alias: tb_activity
20 filterExpr: (ymd = '2019-07-22') (type: boolean)
21 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
22 GatherStats: false
23 Select Operator
24 expressions: actiontype (type: string), contentid (type: string), contenttype (type: string), device (type: string), extras (type: string), serviceid (type: string), timestamp (type: bigint), userip (type: string), userid (type: string), '2019-07-22' (type: string)
25 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
26 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
27 File Output Operator
28 compressed: true
29 GlobalTableId: 1
30 directory: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10002
31 NumFilesPerFileSink: 1
32 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
33 Stats Publishing Key Prefix: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10000/
34 table:
35 input format: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
36 output format: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
37 properties:
38 DO_NOT_UPDATE_STATS true
39 EXTERNAL TRUE
40 STATS_GENERATED TASK
41 avro.schema.literal {"type":"record","name":"Activity","namespace":"com.bigdata.avro","doc":"Schema for com.bigdata.avro.Activity","fields":[{"name":"actionType","type":["null","string"]},{"name":"contentId","type":["null","string"]},{"name":"contentType","type":["null","string"]},{"name":"device","type":["null","string"]},{"name":"extras","type":["null","string"],"default":null},{"name":"serviceId","type":["null","string"]},{"name":"timestamp","type":["null","long"]},{"name":"userIp","type":["null","string"]},{"name":"userid","type":["null","string"]}]}
42 avro.schema.url hdfs:///metadata/avro/tb_activity.avsc
43 bucket_count -1
44 columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
45 columns.comments
46 columns.types string:string:string:string:string:bigint:string:string
47 file.inputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
48 file.outputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
49 impala.lastComputeStatsTime 1559636056
50 location hdfs://nameservice1/etl/flume/tb_activity
51 name tdb.tb_activity
52 numRows 83020631
53 partition_columns ymd
54 partition_columns.types string
55 serialization.ddl struct tb_activity { string actiontype, string contentid, string contenttype, string device, string serviceid, i64 timestamp, string userip, string userid}
56 serialization.format 1
57 serialization.lib org.apache.hadoop.hive.serde2.avro.AvroSerDe
58 totalSize 6334562388
59 transient_lastDdlTime 1556875047
60 serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
61 name: tdb.tb_activity
62 TotalFiles: 1
63 GatherStats: true
64 MultiFileSpray: false
"extras" field on avro schema have "default" property. and other fields has no "default" property. I have been doing avro schema changes in the past. The "extras" field was then added. What is wrong?
Labels: Apache Hive, Apache Spark
06-30-2019
08:50 PM
I use Hive on Spark and I made a UDF. The jar file name is 'hive-udf-20190701.jar'.

I set the Hive configuration (Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml):
hive.reloadable.aux.jars.path
/usr/local/bigdata/hive-udf

I uploaded the jar file to the HiveServer2 filesystem directory: /usr/local/bigdata/hive-udf-20190701.jar

I create the function in Hue:
reload;
drop temporary function udf_map_tojson;
create temporary function udf_map_tojson as 'bigdata.hive.udf.MapToJsonString';

I test the UDF:
select udf_map_tojson(str_to_map("k1:v1,k2:v2"));

But an exception is raised:
Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: Job aborted due to stage failure: Aborting TaskSet 3.0 because task 0 (partition 0) cannot run anywhere due to node and executor blacklist. Most recent failure: Lost task 0.0 in stage 3.0 (TID 3, worker09.example.com, executor 1): UnknownReason Blacklisting behavior can be configured via spark.blacklist.*.

What am I doing wrong?
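For context, a hedged sketch of what a reflection-based UDF for this map-to-JSON conversion could look like is below. This is only an assumption about bigdata.hive.udf.MapToJsonString (the actual class is not shown in the post), it uses deliberately naive JSON quoting, and it assumes that a simple UDF with a java.util.Map argument works in your Hive version; otherwise a GenericUDF would be needed.

import java.util.Map;
import org.apache.hadoop.hive.ql.exec.UDF;

// Hypothetical sketch of a map<string,string> -> JSON string UDF.
// The real bigdata.hive.udf.MapToJsonString implementation may differ.
public class MapToJsonString extends UDF {
    public String evaluate(Map<String, String> input) {
        if (input == null) {
            return null;
        }
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> entry : input.entrySet()) {
            if (!first) {
                json.append(",");
            }
            // Naive quoting; does not escape quotes or control characters.
            json.append("\"").append(entry.getKey()).append("\":\"")
                .append(entry.getValue()).append("\"");
            first = false;
        }
        return json.append("}").toString();
    }
}

Note that the "UnknownReason ... blacklist" message from Spark is usually only the generic failure wrapper; the underlying exception tends to show up in the executor container logs, as in the other thread above.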
Labels: Apache Hive, Apache Spark
03-31-2019
06:50 PM
Thank you for testing. Have you ever submitted it as a workflow? The workflow still doesn't work. Look at this:

first) I saved an HQL document. It contained the test parameter value '76710'. http://demo.gethue.com/hue/editor?editor=292133 (see the "hql document" capture)

second) I clicked the "Submit" button, set the parameter value to '92801', and passed it (see the "param" capture). But the zipcode in the HQL results is still '76710'. The results are empty if I save the HQL document with no parameter value.

I have tested many cases. I think the problem is the HQL, because it no longer has the variable. Please look at the attached capture of the Oozie log: Oozie ran an HQL query that did not contain the variable "$zip". This is a very necessary function for me. Please help me find the solution.
03-23-2019
07:26 AM
http://gethue.com/drag-drop-saved-hive-queries-into-your-workflows/#comment-78368 Does this not work in the current version (e.g. Hue 4.3)? The parameter name in the submit-workflow input form is 'zip', not 'zip_code'. I expected it to look like the image in that post, but it did not work. Could you please check my demo workflow? http://demo.gethue.com/hue/oozie/editor/workflow/edit/?workflow=292135 I am using Cloudera CDH 6.1.0.
Labels: Apache Hive, Cloudera Hue
01-28-2019
05:33 PM
Thanks. I'll try it the way you told me.
01-16-2019
11:16 PM
I want to create a table with the complex type removed from the Avro data, keeping the rest of the schema the same. This is because Impala does not skip complex types. The platform is CDH 6.0.1.
For Example :
Employee(raw data)
- name : string
- age : int
- additional-info : map<string, string>
Employee(Hive table 1)
- name : string
- age : int
- additional-info : map<string, string>
Employee_For_Impala(Hive table 2)
- name : string
- age : int
Pipeline :
KafkaProducer(Avro Bytes) - Kafka - Flume - HDFS - Hive(Impala)
Flume : KafkaSource - Channel - Sink(AvroEventSerializer$Builder)
I tried changing the sink (serializer.schemaURL, removing the complex type field) but it failed.
I am now trying to use Morphlines, but this is also failing.
Is there a better way?
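One hedged idea, at the Avro level: schema resolution lets a reader schema omit fields that exist in the writer schema, and the extra writer fields are simply ignored. The sketch below illustrates this with in-memory serialization only; the field name additional_info and both schemas are hypothetical stand-ins (Avro field names cannot contain a hyphen, so additional-info would need renaming anyway).

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AvroProjectionSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical writer schema: full Employee record including the map field.
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"},"
            + "{\"name\":\"additional_info\",\"type\":{\"type\":\"map\",\"values\":\"string\"}}]}");

        // Hypothetical reader schema: same record without the map field.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Employee\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

        // Serialize one record with the full schema.
        GenericRecord full = new GenericData.Record(writerSchema);
        full.put("name", "alice");
        full.put("age", 30);
        full.put("additional_info", java.util.Collections.singletonMap("team", "data"));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(full, encoder);
        encoder.flush();

        // Deserialize with the reduced reader schema; the map field is ignored.
        Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord projected =
            new GenericDatumReader<GenericRecord>(writerSchema, readerSchema).read(null, decoder);
        System.out.println(projected);  // {"name": "alice", "age": 30}
    }
}

If this mechanism applies in your pipeline, a second external Hive table pointing at the same HDFS location but with a reduced avro.schema.url might give the Impala-friendly view without rewriting the data, though I have not verified that on CDH 6.0.1.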
Labels: Apache Flume, Apache Kafka, HDFS
01-16-2019
06:40 PM
My CDH cluster version is CDH 6.0.1; the services are Kafka, HDFS, Hive, Impala, and Hue. I tested it in the following order.

1. Set the idle timeout values: Cloudera Manager > Impala > snippet (safety valve):
-idle_query_timeout=30
-idle_session_timeout=120

2. Check the timeout values: Cloudera Manager > Impala > Instances > Impala Daemon Web UI > http://[impalad host]:25000/varz shows idle_session_timeout set to 120.

3. Monitor the server's TCP connections: ssh to the impalad host and run
$ watch -n 1 -d 'netstat -anpt | grep 21050'

4. Java client application (JDBC):
Connection connection = DriverManager.getConnection("jdbc:impala://....");
Statement statement = connection.createStatement();
System.exit(0);

5. On the impalad server the TCP connection stays established and is not terminated even after several minutes.

6. Check the Impala Daemon Web UI: the session is visible at http://[impalad host]:25000/sessions and I perform the 'Close' action. However, the TCP connection is still established.
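For what it's worth, a hedged sketch of the client-side fix: explicitly closing the Statement and Connection (for example with try-with-resources) before the JVM exits, so the driver can tear down the TCP connection instead of relying on the server-side idle timeout. The JDBC URL and query below are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaClientCloseSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; use your real impalad host and port 21050.
        String url = "jdbc:impala://impalad-host.example.com:21050";

        // try-with-resources closes the statement and connection even on exceptions,
        // which lets the driver shut down the underlying TCP connection cleanly.
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("select 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
        // Only exit after the resources have been closed.
        System.exit(0);
    }
}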
01-16-2019
05:25 PM
My CDH cluster version is CDH 6.0.1; the services are Kafka, HDFS, Hive, Impala, and Hue. I tested it in the following order.

1. Set the idle timeout values: Cloudera Manager > Impala > snippet (safety valve):
-idle_query_timeout=30
-idle_session_timeout=120

2. Check the timeout values: Cloudera Manager > Impala > Instances > Impala Daemon Web UI > http://[impalad host].example.com:25000/varz shows idle_session_timeout set to 120.

3. Monitor the server's TCP connections: ssh to the impalad host and run
$ watch -n 1 -d 'netstat -anpt | grep 21050'

4. Java client application (JDBC):
Connection connection = DriverManager.getConnection("jdbc:impala://....");
Statement statement = connection.createStatement();
System.exit(0);

5. On the impalad server the TCP connection stays established and is not terminated even after several minutes.

6. Check the Impala Daemon Web UI: the session is visible at http://[impalad host].example.com:25000/sessions and I perform the 'close' action. However, the TCP connection is still established.
01-14-2019
09:05 PM
If the client application exits before the Impala connection is closed, the Impala hosts are left with zombie TCP connections.

Step 1: connect to impalad using JDBC.
Step 2: run a query.
Step 3: shut down the application without closing the JDBC connection.

I closed the session in the impalad daemon web UI (<impalad host>:25000/sessions), but the TCP connection does not disappear. I also set 'idle_session_timeout', but the TCP connection still does not disappear.
Labels: Apache Impala