Member since: 08-23-2018
Posts: 36
Kudos Received: 1
Solutions: 0
01-21-2021
02:35 AM
The Java heap size of the Catalog Server in my cluster is 16 GB, and the mem_rss metric of the Impala catalogd is 15 GB. What exactly does the mem_rss metric of the Impala catalogd measure? If mem_rss is high, should I increase the Java heap size? The version is CDH 5.14.
... View more
Labels:
11-17-2020
05:21 PM
Thanks for your answer. I have read the documents you linked. How can I disable self-signed certificate verification? Like this:
[hadoop]
[[hdfs_clusters]]
[[[default]]]
ssl_cert_ca_verify=False
[[yarn_clusters]]
[[[default]]]
ssl_cert_ca_verify=False
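Or would the right place for the Impala connection be the [impala] [[ssl]] section instead? A sketch based on the commented hue.ini defaults quoted in my post below (I am not sure this is the correct knob):
[impala]
[[ssl]]
# sketch only: skip certificate validation for Hue's Impala connection
validate=false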
... View more
11-16-2020
10:54 PM
When I click an Impala query profile, Hue raises an error. The exception is caused by a self-signed certificate, and I am looking for the solution:
bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)"
Current configuration in my cluster:
Impala > Enable TLS/SSL for Impala: disabled
SSL/TLS Certificate for Impala component Webserver: enabled for Daemon, Catalog and Statestore
I want to set a PEM file path. Where are the Hue [impala] [[ssl]] properties used?
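For reference, this is the hue.ini change I am considering, based on the commented [impala] [[ssl]] defaults quoted in my post below (the PEM path is only a placeholder, and which port these settings apply to is exactly what I am unsure about):
[impala]
[[ssl]]
enabled=true
cacerts=/path/to/impala-ca-cert.pem
validate=true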
... View more
Labels:
11-15-2020
05:48 PM
Questions about configuring TLS/SSL for Impala in Hue. The relevant hue.ini section:
[impala]
# Port of the Impala Server
## server_port=21050
# URL of the Impala Coordinator Server.
## coordinator_url=localhost:25000
[[ssl]]
# SSL communication enabled for this server.
## enabled=false
# Path to Certificate Authority certificates.
## cacerts=/etc/hue/cacerts.pem
# Choose whether Hue should validate certificates received from the server.
## validate=true
Do the Impala TLS/SSL settings in Hue apply to both the 21050 and 25000 connections? In my cluster, only 'SSL/TLS Certificate for Impala component Webserver' is configured for Impala. https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/impala_ssl.html#concept_gnk_2tt_qp Port 25000 uses SSL, but port 21050 does not. Would the connection to port 21050 fail if I enable SSL for Impala in Hue? When I click an Impala query profile, Hue raises an error, and I am looking for the solution:
bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)"
... View more
Labels:
06-23-2020
04:14 AM
https://phoenix.apache.org/tracing.html
CDH 5.14.2
Hi. I am trying to set up the Apache Phoenix tracing server.
But the SYSTEM.TRACING_STATS table is empty.
I think the Phoenix tracing server is enabled.
This is the HBase log:
2020-06-23 19:32:37,363 INFO org.apache.phoenix.trace.PhoenixMetricsSink: Writing tracing metrics to phoenix table
2020-06-23 19:32:37,365 INFO org.apache.phoenix.trace.PhoenixMetricsSink: Phoenix tracing writer started
2020-06-23 19:32:37,377 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink tracing started
I set this property in the client code:
properties.setProperty("phoenix.trace.frequency", "always");
Can you give me a solution, or a guide on how to set up the Phoenix tracing server? A sketch of my client-side setup is below.
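For reference, a minimal sketch of how I pass the property on the client side (the ZooKeeper quorum and the test query are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class PhoenixTracingClient {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // ask Phoenix to trace every statement from this client
        props.setProperty("phoenix.trace.frequency", "always");

        // placeholder ZooKeeper quorum for the Phoenix JDBC URL
        String url = "jdbc:phoenix:zk1.example.com,zk2.example.com,zk3.example.com:2181";

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM SYSTEM.CATALOG")) {
            while (rs.next()) {
                System.out.println("row count: " + rs.getLong(1));
            }
        }
    }
}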
... View more
- Tags:
- cdh-5.14.2
- Phoenix
12-09-2019
05:11 PM
Thanks for the answer.
... View more
12-09-2019
03:02 AM
https://docs.cloudera.com/documentation/enterprise/6/6.1/topics/admin_ha_hiveserver2.html#concept_u4b_c5d_wv
My cluster version is Cloudera CDH 6.1 Express Edition. I am trying to set up HiveServer2 HA. I added a new HiveServer2 instance and set the address in the HiveServer2 Load Balancer property, but the load balancer server did not open the proxy port. Doesn't a managed cluster start the proxy server itself? Do I have to configure the proxy myself? Does 'managed cluster' mean the Enterprise Edition? If so, can I use an L4 load balancer instead of HAProxy? A sketch of the proxy configuration I have in mind is below.
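For reference, this is roughly the HAProxy configuration I would expect to put on the load-balancer host (a sketch only; the hostnames are placeholders and HiveServer2 is assumed to be on its default port 10000):
# /etc/haproxy/haproxy.cfg (fragment)
listen hiveserver2
    bind 0.0.0.0:10001              # the address:port set in the HiveServer2 Load Balancer property
    mode tcp
    balance source                  # keep a client session on the same HiveServer2 instance
    server hs2_1 hs2-host1.example.com:10000 check
    server hs2_2 hs2-host2.example.com:10000 check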
... View more
Labels:
12-04-2019
09:36 PM
I am using Cloudera CDH 6.1.
The cluster runs Flume, HDFS, Hive, Hue, Impala, Kafka, Oozie, Spark, YARN and ZooKeeper.
And I am using MySQL.
The databases are amon, hue, metastore, oozie, rman and scm.
I think some databases, like rman and amon, are not being used.
So, my question:
I have to move the database server to another rack.
But I want to keep ingesting data (the Kafka - Flume - HDFS pipeline).
Can you point me to a procedure?
I think only some services are using the database.
Maybe I only need to stop those, and not the other services in the cluster.
If I stop only the services that use the database, such as Hive, Hue, Impala, Oozie and Spark, can data still be written into the cluster?
Will the cluster be healthy while the database is being moved?
... View more
- Tags:
- cloudera manager
- CM
11-06-2019
10:13 PM
Thanks. However, I have already read them. I am already connecting to Hive from Zeppelin using JDBC. I want to query Hive tables with Spark SQL, and I'm wondering whether the metastore will have problems if I use Spark SQL on a cluster that runs Hive on Spark. For example:
%spark
val df = spark.read.format("csv").option("header", "true")
  .option("inferSchema", "true").load("/somefile.csv")
df.createOrReplaceTempView("csvTable")
%spark.sql
select *
from csvTable lt
join hiveTable rt
on lt.col = rt.col
... View more
11-05-2019
10:32 PM
I am using a CDH 6.1.1 cluster. The cluster is configured to use Spark as the execution engine for Hive. Is there anything wrong with using Spark SQL on this cluster? Is it OK to create Hive tables and change data using Spark SQL? Since Spark SQL uses the Hive Metastore, I suspect there may be a conflict between Spark SQL and Hive on Spark. In addition, please point me to documentation on how to integrate Cloudera CDH Hive with Apache Zeppelin's Spark interpreter. Thank you.
... View more
Labels:
08-04-2019
09:01 PM
Thanks. I found the solution. The cause was related to Impala's COMPUTE STATS: I had updated the Avro schema to add the 'extras' field, and I had also run COMPUTE STATS in Impala, and that was probably the reason. I dropped my external table and recreated it, and it is working now. A sketch of what I ran is below.
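Roughly what I ran (a sketch only; the SerDe and file-format class names, the table location and the schema URL are taken from the EXPLAIN output in my related post about this table):
-- Hive: drop and recreate the external table so it picks up the updated Avro schema.
-- It is an EXTERNAL table, so the data files in HDFS are kept.
DROP TABLE tdb.tb_activity;

CREATE EXTERNAL TABLE tdb.tb_activity
PARTITIONED BY (ymd STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://nameservice1/etl/flume/tb_activity'
TBLPROPERTIES ('avro.schema.url'='hdfs:///metadata/avro/tb_activity.avsc');

-- re-register the existing partitions
MSCK REPAIR TABLE tdb.tb_activity;

-- in impala-shell: pick up the recreated table
INVALIDATE METADATA tdb.tb_activity;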
... View more
08-04-2019
08:54 PM
1 Kudo
Thank you. Yes, it is not a problem with the UDF itself. I have two HiveServer2 hosts, but I had registered the UDF jar on only one of them. That is probably the reason.
... View more
07-23-2019
06:21 PM
Hi. That is not a query that inserts from a parent table into a child table; it inserts back into the same table. I am trying to merge the small files written by Flume. These are the properties of the table; I think some of them may be the reason, and I am still looking for a solution.
DO_NOT_UPDATE_STATS true
EXTERNAL TRUE
STATS_GENERATED TASK
impala.lastComputeStatsTime 1559634875
numRows 3035760415
totalSize 315026055870
transient_lastDdlTime 1559548433
... View more
07-23-2019
02:40 AM
The CDH version is 6.1.0.
1. I try to merge small files:
insert overwrite table tdb.tb_activity partition(ymd)
select * from tdb.tb_activity where ymd = '2019-07-22';
2. But an exception is raised: "UnknownReason Blacklisting behavior can be configured..." in Hue. Below is the Spark container error log:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:805)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
... 19 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Number of input columns was different than output columns (in = 9 vs out = 8
at org.apache.hadoop.hive.serde2.avro.AvroSerializer.serialize(AvroSerializer.java:75)
at org.apache.hadoop.hive.serde2.avro.AvroSerDe.serialize(AvroSerDe.java:212)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:725)
... 25 more
19/07/23 17:25:32 ERROR executor.Executor: Exception in task 4.0 in stage 0.0 (TID 2)
3. I found something strange in the EXPLAIN output below:
line 24: expressions: ..... extras (type: string), ....
line 41: avro.schema.literal {"type":"record", ....... ,{"name":"extras","type":["null","string"],"default":null}, .....
line 44: columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
The "extras" field is missing from the columns list on line 44.
10
11 STAGE PLANS:
12 Stage: Stage-1
13 Spark
14 DagName: hive_20190723175708_8b93a3ff-d533-48cc-865e-6af87f576858:29
15 Vertices:
16 Map 1
17 Map Operator Tree:
18 TableScan
19 alias: tb_activity
20 filterExpr: (ymd = '2019-07-22') (type: boolean)
21 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
22 GatherStats: false
23 Select Operator
24 expressions: actiontype (type: string), contentid (type: string), contenttype (type: string), device (type: string), extras (type: string), serviceid (type: string), timestamp (type: bigint), userip (type: string), userid (type: string), '2019-07-22' (type: string)
25 outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9
26 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
27 File Output Operator
28 compressed: true
29 GlobalTableId: 1
30 directory: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10002
31 NumFilesPerFileSink: 1
32 Statistics: Num rows: 16429 Data size: 13275304 Basic stats: COMPLETE Column stats: NONE
33 Stats Publishing Key Prefix: hdfs://nameservice1/etl/flume/tb_activity/.hive-staging_hive_2019-07-23_17-57-08_154_1733664706886256905-5/-ext-10000/
34 table:
35 input format: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
36 output format: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
37 properties:
38 DO_NOT_UPDATE_STATS true
39 EXTERNAL TRUE
40 STATS_GENERATED TASK
41 avro.schema.literal {"type":"record","name":"Activity","namespace":"com.bigdata.avro","doc":"Schema for com.bigdata.avro.Activity","fields":[{"name":"actionType","type":["null","string"]},{"name":"contentId","type":["null","string"]},{"name":"contentType","type":["null","string"]},{"name":"device","type":["null","string"]},{"name":"extras","type":["null","string"],"default":null},{"name":"serviceId","type":["null","string"]},{"name":"timestamp","type":["null","long"]},{"name":"userIp","type":["null","string"]},{"name":"userid","type":["null","string"]}]}
42 avro.schema.url hdfs:///metadata/avro/tb_activity.avsc
43 bucket_count -1
44 columns actiontype,contentid,contenttype,device,serviceid,timestamp,userip,userid
45 columns.comments
46 columns.types string:string:string:string:string:bigint:string:string
47 file.inputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
48 file.outputformat org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
49 impala.lastComputeStatsTime 1559636056
50 location hdfs://nameservice1/etl/flume/tb_activity
51 name tdb.tb_activity
52 numRows 83020631
53 partition_columns ymd
54 partition_columns.types string
55 serialization.ddl struct tb_activity { string actiontype, string contentid, string contenttype, string device, string serviceid, i64 timestamp, string userip, string userid}
56 serialization.format 1
57 serialization.lib org.apache.hadoop.hive.serde2.avro.AvroSerDe
58 totalSize 6334562388
59 transient_lastDdlTime 1556875047
60 serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
61 name: tdb.tb_activity
62 TotalFiles: 1
63 GatherStats: true
64 MultiFileSpray: false
"extras" field on avro schema have "default" property. and other fields has no "default" property. I have been doing avro schema changes in the past. The "extras" field was then added. What is wrong?
... View more
Labels:
06-30-2019
08:50 PM
I use Hive on Spark. I wrote a UDF; the jar file name is 'hive-udf-20190701.jar'. I set the Hive configuration (Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml):
hive.reloadable.aux.jars.path
/usr/local/bigdata/hive-udf
I uploaded the jar file to the HiveServer2 local filesystem: /usr/local/bigdata/hive-udf-20190701.jar
I created the function in Hue:
reload;
drop temporary function udf_map_tojson;
create temporary function udf_map_tojson as 'bigdata.hive.udf.MapToJsonString';
I test the UDF:
select udf_map_tojson(str_to_map("k1:v1,k2:v2"));
But an exception is raised:
Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: Job aborted due to stage failure: Aborting TaskSet 3.0 because task 0 (partition 0) cannot run anywhere due to node and executor blacklist. Most recent failure: Lost task 0.0 in stage 3.0 (TID 3, worker09.example.com, executor 1): UnknownReason Blacklisting behavior can be configured via spark.blacklist.*.
What am I doing wrong? One alternative I am considering is sketched below.
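Since Hive on Spark runs the UDF inside Spark executors on the worker hosts, the alternative I am considering is registering a permanent function from a jar on HDFS, so every executor can fetch it (a sketch only; the HDFS path is a placeholder):
-- upload the jar to HDFS first, e.g.: hdfs dfs -put hive-udf-20190701.jar /user/hive/udf/
create function udf_map_tojson
  as 'bigdata.hive.udf.MapToJsonString'
  using jar 'hdfs:///user/hive/udf/hive-udf-20190701.jar';

select udf_map_tojson(str_to_map("k1:v1,k2:v2"));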
... View more
Labels:
03-31-2019
06:50 PM
Thank you for testing. Have you ever submitted a workflow? The workflow still doesn't work. Look at this:
First: the saved HQL document contains the test parameter value '76710'. http://demo.gethue.com/hue/editor?editor=292133 (screenshot: the HQL document)
Second: I clicked the Submit button, set the parameter value to '92801', and passed it. (screenshot: the parameter dialog)
But the zipcode in the HQL results is '76710'. The results are empty if I save the HQL document with no parameter value. I have tested many cases. I think the problem is the HQL itself, because it does not contain the variable. Please look at this capture of the Oozie log (screenshot: the Oozie log): Oozie ran the HQL query without the "$zip" variable. This function is very important for me. Please help me find the solution. The kind of parameterized HQL I expect the saved document to need is sketched below.
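For clarity, this is the kind of parameterized query I expect the saved document to contain, so that the value passed at submit time is substituted (a sketch; the table and column names are placeholders, not the real demo data):
-- Hue/Oozie should substitute ${zip} with the value entered at submit time
SELECT *
FROM customers
WHERE zipcode = '${zip}'
LIMIT 100;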
... View more
03-23-2019
07:26 AM
http://gethue.com/drag-drop-saved-hive-queries-into-your-workflows/#comment-78368
Does this not work in current versions, e.g. Hue 4.3? When I submit the workflow, the parameter name is 'zip', not a 'zip_code' input form like in the blog post. I expected it to look like that image, but it didn't work. Could you please check my demo workflow? http://demo.gethue.com/hue/oozie/editor/workflow/edit/?workflow=292135 I am using Cloudera CDH 6.1.0.
... View more
01-28-2019
05:33 PM
Thanks. I'll try it the way you told me.
... View more
01-16-2019
11:16 PM
I want to create a table from the Avro data with the complex type removed but the rest of the schema the same, because Impala does not skip complex types in Avro tables. The platform is CDH 6.0.1.
For Example :
Employee(raw data)
- name : string
- age : int
- additional-info : map<string, string>
Employee(Hive table 1)
- name : string
- age : int
- additional-info : map<string, string>
Employee_For_Impala (Hive table 2)
- name : string
- age : int
Pipeline :
KafkaProducer(Avro Bytes) - Kafka - Flume - HDFS - Hive(Impala)
Flume : KafkaSource - Channel - Sink(AvroEventSerializer$Builder)
I tried changing the Flume sink (serializer.schemaURL pointing to a schema with the complex type field removed), but it failed.
I am trying to use Morphlines now, but that is also failing.
Is there a better way? One idea I am considering is sketched below.
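The idea I am considering is to keep Hive table 1 on the raw Avro data and build the Impala-friendly table 2 in Hive by selecting only the scalar columns (a sketch; the names follow the example above):
-- Hive: build a copy for Impala that drops the map column
CREATE TABLE employee_for_impala
STORED AS PARQUET
AS
SELECT name, age
FROM employee;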
... View more
01-16-2019
06:40 PM
My CDH cluster version is CDH 6.0.1; the services are Kafka, HDFS, Hive, Impala and Hue. I tested it in the following order.
1. Set the idle timeout values: Cloudera Manager > Impala > snippet (safety valve): -idle_query_timeout=30 -idle_session_timeout=120
2. Check the timeout values: Cloudera Manager > Impala > Instances > Impala Daemon Web UI > http://[impalad host]:25000/varz shows idle_session_timeout set to 120.
3. Monitor the server's TCP connections: ssh to the impalad host and run $ watch -n 1 -d 'netstat -anpt | grep 21050'
4. Java client application (JDBC):
Connection connection = DriverManager.getConnection("jdbc:impala://....");
Statement statement = connection.createStatement();
System.exit(0);
5. The impalad server keeps an established TCP connection; it is not terminated even after several minutes.
6. Check the Impala Daemon Web UI: the connection is visible at http://[impalad host]:25000/sessions and I use the 'close' action. However, the TCP connection is still established.
... View more
01-16-2019
05:25 PM
My CDH cluster version is CDH 6.0.1; the services are Kafka, HDFS, Hive, Impala and Hue. I tested it in the following order.
1. Set the idle timeout values: Cloudera Manager > Impala > snippet (safety valve): -idle_query_timeout=30 -idle_session_timeout=120
2. Check the timeout values: Cloudera Manager > Impala > Instances > Impala Daemon Web UI > http://[impalad host].example.com:25000/varz shows idle_session_timeout set to 120.
3. Monitor the server's TCP connections: ssh to the impalad host and run $ watch -n 1 -d 'netstat -anpt | grep 21050'
4. Java client application (JDBC):
Connection connection = DriverManager.getConnection("jdbc:impala://....");
Statement statement = connection.createStatement();
System.exit(0);
5. The impalad server keeps an established TCP connection; it is not terminated even after several minutes.
6. Check the Impala Daemon Web UI: the connection is visible at http://[impalad host].example.com:25000/sessions and I use the 'close' action. However, the TCP connection is still established.
... View more
01-14-2019
09:05 PM
If the client application exits before the Impala connection is closed, the Impala hosts are left with zombie TCP connections.
Step 1: connect to impalad using JDBC.
Step 2: run a query.
Step 3: the application shuts down without closing the JDBC connection.
I closed the session in the impalad web UI (<impalad host>:25000/sessions), but the TCP connection does not disappear. I also set 'idle_session_timeout', but the TCP connection still does not disappear. For comparison, a sketch of the client pattern with an explicit close is below.
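This is the client-side pattern I am comparing against, where the connection is closed explicitly via try-with-resources (the JDBC URL is a placeholder); my question is about the case where the application never reaches the close:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaClient {
    public static void main(String[] args) throws Exception {
        // placeholder URL; the real one points at an impalad host on port 21050
        String url = "jdbc:impala://impalad-host.example.com:21050";

        // try-with-resources closes the ResultSet, Statement and Connection
        // even if the query throws, so no TCP connection is left on the impalad host
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}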
... View more
Labels:
12-24-2018
02:01 AM
The data that I collect contains complex types and should guarantee a response time of less than 5 seconds.
I could use HBase, but I want to use Impala.
I know that Impala does not support complex types.
What I want is for Impala to skip the complex types.
From my checking, Impala skips complex-type columns in a Parquet format file.
How do I write Parquet format files for HDFS, Hive, Impala, etc.?
Can I write Parquet files using Flume, Morphlines, etc.?
My system's data collection flow is as follows.
Kafka -> Flume -> HDFS (Avro files) -> Hive
A conversion approach I am considering is sketched below.
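The approach I am considering is to keep ingesting Avro via Flume and then convert to Parquet in Hive, so Impala can query the scalar columns (a sketch; the table names are placeholders):
-- Hive: periodic conversion from the Avro landing table to a Parquet table for Impala
CREATE TABLE events_parquet
STORED AS PARQUET
AS
SELECT * FROM events_avro;   -- per my check above, Impala skips the complex columns in Parquet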
... View more
12-05-2018
07:41 PM
First of all, thank you for your answer. I decided to set the worker nodes' memory to 128 GB. In the picture in my question, the Utility host (1 host) is deployed with Cloudera Manager, ZooKeeper and JournalNode. What I want to know in this question is how many hosts I need: I am about to purchase hardware and have to decide how many hosts to buy. https://www.cloudera.com/documentation/enterprise/latest/topics/admin_ha.html According to this document, the Gateway roles (Hue, HiveServer2) and Utility roles (Oozie, Cloudera Manager Server) can be configured for HA. I am planning one Utility host and two Master hosts. I do not want to allocate dedicated hosts for the Hue, HiveServer2 and Oozie services, so I am considering putting them on the Master or Worker hosts.
... View more
12-04-2018
09:40 PM
CDH 6.0.1. I want to build a cluster of 10 worker hosts with high availability. I referenced the following guide documents:
https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_ig_host_allocations.html
https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_hardware_requirements.html
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
My cluster runs Kafka, HDFS, YARN, Hive on Spark, Hue, Oozie, Spark Streaming, Sqoop and Impala. I plan to deploy the hosts as shown below.
Gateway: can I place the Gateway roles on the worker hosts?
Utility: I understand HA for the NameNode, but I have not yet learned about HA for the other services. When I configure HA for all the other services in the cluster (for example Cloudera Manager, YARN, Hive, Hue, Spark, etc.), how many additional hosts do I need?
Hardware requirements: my cluster collects, ETLs and aggregates (counts) about 90 GB of text logs per day and handles about 10 workflows. If 32 GB of memory is allocated to the master nodes and 64 GB to the worker nodes, is that adequate? I look forward to your advice.
... View more
Labels:
11-28-2018
02:07 AM
The schema of the saved Avro files (raw data files, collected by Flume) contains a map type.
I want to create a table using them.
I know Impala does not support the map type.
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_avro.html#avro_map_table
So I removed the map type column from the Avro schema used by the table.
The Avro files themselves still contain the map type data.
Then I created the Hive and Impala tables.
And added the partitions.
The Hive table has rows.
But the Impala table has no rows.
Querying the Impala table raises an exception (complex type).
How do I create an Impala table over Avro files (raw data) that already contain a map type?
... View more
10-07-2018
08:44 PM
My HBase version is 0.92.1-cdh4.0.1. I do not know the past history, but the HBase version on one server is 0.94.15-cdh4.7.0. So I tried to decommission the server running 0.94.15-cdh4.7.0, but the decommission failed. Cloudera Manager shows the failure but cannot determine the cause, and the master server and region server logs contain no record of the failure.
---------------------
Cloudera Manager > HBase > Commands
Command Details: Decommission
"Failed to decommission Region Servers"
Toggle Balancer - Successfully toggled balancer
Decommission - Command completed with 0/1 successful subcommands.
Toggle Balancer - Successfully toggled balancer
------------------------------------
Is there a way I can solve this problem?
... View more
Labels:
09-27-2018
12:23 AM
I successfully deleted the .archive directory. However, after the deletion, the size of the .archive directory keeps increasing. The 192.16.1.150 server is the one server with a different version (0.94.15+114), and I think that region server is the cause. Is there an option to disable the .archive directory in this version?
... View more
09-06-2018
01:16 AM
In addition, I have seen entries in the Hadoop HDFS audit log that access the .archive directory:
... ugi=hbase(auth:SIMPLE) ip=/192.16.1.150 cmd=mkdir src=/hbase/.archive/tableName/99082b8b557...
... ugi=hbase(auth:SIMPLE) ip=/192.16.1.150 cmd=rename src=/hbase/tableName/99082b8b557... dst=/hbase/.archive/tableName/99082b8b557...
The 192.16.1.150 server is the one server with a different version (0.94.15+114).
-----------------------------
This is the result of running the list_snapshots command in the HBase shell on 192.16.1.150, as you suggested:
hbase(main):001:0> list_snapshots
SNAPSHOT TABLE + CREATION TIME
ERROR: java.io.IOException: java.io.IOException: java.lang.NoSuchMethodException: org.apache.hadoop.hbase.ipc.HMasterInterface.listSnapshots()
at java.lang.Class.getMethod(Class.java:1605)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:334)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1336)
Here is some help for this command:
List all snapshots taken (by printing the names and relative information). Optional regular expression parameter could be used to filter the output by snapshot name.
Examples:
hbase> list_snapshots
hbase> list_snapshots 'abc.*'
... View more