Member since: 10-21-2018
Posts: 12
Kudos Received: 0
Solutions: 0
06-19-2020
08:56 AM
I am new to CDH cluster setup. I have CDH 6.3.2 with HA enabled, in a 3+5 node cluster (3 masters and 5 data nodes). ZooKeeper is configured on 3 instances. For the last 5 days we have received alerts from ZOOKEEPER_CANARY_HEALTH:
1. The health test result for ZOOKEEPER_CANARY_HEALTH has become bad: Canary test failed to create an ephemeral znode.
2. The health test result for ZOOKEEPER_CANARY_HEALTH has become bad: Canary test failed to delete a znode.
How can I fix this issue? Please assist me step by step. Thank you
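In case it helps to reproduce the failure outside Cloudera Manager, here is a minimal Scala sketch, assuming the ZooKeeper Java client is on the classpath; "zk-host:2181" and the znode path are placeholders, not your actual values. It performs the same create-then-delete of an ephemeral znode that the canary test does:

import java.util.concurrent.CountDownLatch
import org.apache.zookeeper.{CreateMode, WatchedEvent, Watcher, ZooDefs, ZooKeeper}

object ManualCanary extends App {
  // "zk-host:2181" is a placeholder; substitute one of your three ZooKeeper servers.
  val connected = new CountDownLatch(1)
  val zk = new ZooKeeper("zk-host:2181", 30000, new Watcher {
    override def process(event: WatchedEvent): Unit =
      if (event.getState == Watcher.Event.KeeperState.SyncConnected) connected.countDown()
  })
  connected.await()

  // Create an ephemeral znode, mirroring what the CM canary attempts.
  val path = zk.create("/manual-canary-check", Array.emptyByteArray,
    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL)
  println(s"created $path")

  // Delete it again; version -1 matches any version.
  zk.delete(path, -1)
  println(s"deleted $path")
  zk.close()
}

If the create or delete fails here as well, the underlying cause (connection loss, auth, quorum problems) should surface as an exception rather than only as a canary alert.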
06-01-2020
05:32 AM
Hi,
I am using Spark structured streaming to read text log files from an S3 bucket and store them in Parquet format at an HDFS location.
I noticed that this job generates too many small files.
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType, TimestampType}
import java.util.Calendar
import sys.process._
val checkPointDir = "/tmp/rt/checkpoint/"
val spark = SparkSession.builder
  .config("fs.s3a.awsAccessKeyId", "zzz")
  .config("fs.s3a.awsSecretAccessKey", "aaabbb")
  .config("spark.sql.streaming.checkpointLocation", s"$checkPointDir")
  .config("spark.cleaner.referenceTracking.blocking", true)
  .config("spark.cleaner.referenceTracking.blocking.shuffle", true)
  .config("spark.cleaner.referenceTracking.cleanCheckpoints", true)
  .getOrCreate()
import spark.implicits._
val serveSchema = new StructType().add("log_type",StringType).add("time_stamp",StringType).add("host_name",StringType)
val serveDF = spark.readStream.option("delimiter", "\t").format("com.databricks.spark.csv").schema(serveSchema).load("s3a://eee/logs/vvv*.log")
serveDF.withColumn("minute", substring(col("time_stamp"), 15, 2))
  .writeStream
  .format("parquet")
  .option("path", "/tmp/serve/")
  .outputMode("Append")
  .start()
I tried the options below, but none of them helped (a corrected sketch follows the list).
1. trigger(Trigger.ProcessingTime("1 second")) : got error "not found: value Trigger"
e.g. serveDF.withColumn("minute",substring(col("time_stamp"),15,2)).writeStream.format("parquet").option("path", "/tmp/serve/").outputMode("Append").trigger(Trigger.ProcessingTime("1 second")).start
2. repartition(2) / coalesce(2) : did not produce the expected result
e.g. serveDF.withColumn("minute",substring(col("time_stamp"),15,2)).repartition(2).writeStream.format("parquet").option("path", "/tmp/serve/").outputMode("Append").start
3. config("spark.sql.files.maxRecordsPerFile", 15000000) : did not produce the expected result
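For what it's worth, the "not found: value Trigger" error in option 1 usually just means the import is missing. Below is a hedged sketch combining the attempts, with the missing import added; the 60-second interval and coalesce(2) are illustrative values under my assumptions, not recommendations:

import org.apache.spark.sql.streaming.Trigger  // fixes "not found: value Trigger"

// Fewer, larger files per micro-batch: shrink the partition count and batch less often.
val query = serveDF
  .withColumn("minute", substring(col("time_stamp"), 15, 2))
  .coalesce(2)                                    // caps output files per micro-batch at 2
  .writeStream
  .format("parquet")
  .option("path", "/tmp/serve/")
  .outputMode("Append")
  .trigger(Trigger.ProcessingTime("60 seconds"))  // longer interval => fewer micro-batches
  .start()

Note that spark.sql.files.maxRecordsPerFile only caps file size from above; it cannot merge small micro-batch outputs into bigger files, which is likely why option 3 had no visible effect here.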
05-17-2020
07:42 AM
I want to set up real-time data visualization using Zeppelin. I have CDH 6.3.2 with a 3+5 node cluster (3 masters + 5 data nodes). We use Spark Streaming to read log files, aggregate them, and store the results on HDFS. I want to build real-time dashboards on the aggregated data in Zeppelin and share them with other team members/users, so they can monitor real-time data at their end without being given CDH access. Is this possible with Zeppelin, and how? Please assist.
04-24-2020
10:57 AM
Hi lwang, as suggested I disabled the 'Hive Metastore Canary Health Test' and also reduced the heap size from 5 GiB to 2 GiB. For the last 14 hours we have not noticed any alerts from the Service Monitor. Thanks,
04-23-2020
08:04 AM
Hi lwang, I noticed that we have only 285 entries in the Service Monitor (found via Cloudera Management Service Monitored Entities). I recently increased the heap size to 5 GiB but still received the alert:
The health test result for SERVICE_MONITOR_HEAP_SIZE has become bad: Heap used: 4,991M. JVM maximum available heap size: 5,120M. Percentage of maximum heap: 97.48%. Critical threshold: 95.00%.
04-22-2020
08:46 PM
Thanks lwang, I increased the JVM heap size to 5 GiB; let's see how it works.
Version: Cloudera Express 6.3.0 (#1281944 built by jenkins on 20190719-0609 git: 5b793e9c9cb3f40b3912044aac00abb635183191)
Java VM Name: Java HotSpot(TM) 64-Bit Server VM
Java Version: 1.8.0_181
04-22-2020
07:25 AM
I am new to CDH cluster setup. I have CDH 6.3.2 with HA enabled, in a 3+5 node cluster (3 masters and 5 data nodes). For the last 2 days we have received alerts from SERVICE_MONITOR_HEAP_SIZE:
The health test result for SERVICE_MONITOR_HEAP_SIZE has become bad: Heap used: 2,001M. JVM maximum available heap size: 2,048M. Percentage of maximum heap: 97.71%. Critical threshold: 95.00%.
So I increased the heap size to 3.0 GiB, but we still receive the alert:
The health test result for SERVICE_MONITOR_HEAP_SIZE has become bad: Heap used: 3,004M. JVM maximum available heap size: 3,072M. Percentage of maximum heap: 97.79%. Critical threshold: 95.00%.
How can I estimate the required heap size? How can I fix this issue? Please assist me step by step. Thank you
01-06-2020
03:06 AM
I saved a sample query in the Hue UI (Impala editor) and tried to find the record in the MySQL 'HUE' database, table 'beeswax_savedquery'. However, the tables 'beeswax_savedquery' and 'beeswax_queryhistory' are empty, whereas other tables do store all the required information; e.g. table 'auth_user' contains all information about users. My question: where are those Hue queries stored (in MySQL, or somewhere in HDFS)? I am using CDH 6.3.2 with Impala.
01-30-2019
06:23 AM
Hi, have you found any solution to the problem in the subject above? Please let me know; I have the same problem. Thank you
01-30-2019
02:05 AM
I ran a spark2-submit command from the command prompt. It ran successfully; after some time I terminated it using CTRL+C on CentOS.
e.g. spark2-submit --class org.apache.spark.SparkProgram.simpleapp --master yarn --deploy-mode cluster /x/xx/xxx/sbt/project/simpleapp/target/scala-2.11/simpleapp_2.11-1.0.jar
After that, when I run spark2-shell, it hangs and the Spark shell does not open.
$ spark2-shell
WARNING: User-defined SPARK_HOME (/data/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2) overrides detected (/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2).
WARNING: Running spark-class from user-defined location.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
I also tried to submit a job using spark2-submit, but it was no use this time either.
Please suggest.
10-22-2018
10:58 AM
Thanks a lot, this works for me.
10-21-2018
10:52 PM
We have a situation where the whole cluster was installed and managed by CM6/CDH6: 1 machine for CM and 4 other machines for CDH. The embedded DB is not used; MySQL is deployed as an external DB. It ran well, but then the CM machine crashed due to hardware failure.
Is there a way to replace the hardware, reinstall the same version of CM, and add the existing hosts (datanodes) to the same cluster again? If there is a way to reinstall the CM machine after it crashes and then add host machines to an existing cluster previously installed/managed by the same version of CM, that will be sufficient for us.
I tried to add the existing hosts (datanodes), but the installation stopped with the message below at Cluster Installation -> Install Parcels:
Src file /opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-el6.parcel/CDH-5.15.1-1.cdh5.15.1.p0.4-el6.parcel does not exist
Any suggestions? Am I doing this the right way, or is there another correct way to achieve this?