Member since: 03-01-2017
Posts: 58
Kudos Received: 5
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1574 | 11-05-2017 10:36 PM
07-26-2022
08:47 PM
With the need to keep metrics centralized in a single spot, people have sought a way to configure Grafana for observability, collecting the wide range of metrics available in a Datahub / Datalake so that their dashboards are arranged in a centralized hub. This tutorial will walk you through, step by step, how to configure Grafana to query the metrics available in the Cloudera Manager of a Datahub cluster.

First, we need to ensure that the machine where Grafana is running has a direct connection to the CM Server. In other words, it must be able to resolve the CM FQDN and establish a direct connection to the service, bypassing Knox completely. Unlike the Grafana used by other experiences, such as DWX and CDW, which integrate through Prometheus, this Grafana has to integrate with the CM Server for authentication and, when creating a datasource, for querying the metrics. This integration is available through a plugin that must be installed after the Grafana deployment and that is not maintained or developed by Cloudera.

Installing Grafana:
* yum -y install grafana
* systemctl start grafana-server

Make sure the service has proper access to the folder "/var/lib/grafana/".
* grafana-cli plugins install foursquare-clouderamanager-datasource

The Grafana server must be restarted before the plugin can be used.
* systemctl restart grafana-server

Next, locate the machine where the CM Server is running and fetch the TLS certificate presented by the service. This step can be accomplished from any server with proper access to the Cloudera Manager. Ex.

openssl s_client -showcerts -connect <datahub-name>.dperez-a.a465-9q4k.cloudera.site:7183

Port 7183 is the secured HTTPS endpoint used by CM, so we can use openssl to extract the certificate actively used by the service, a technique widely adopted by sysadmins when dealing with TLS/SSL (a short connectivity-check sketch is included after the references below).

As a next step, toggle on the option "With CA Cert" in the datasource configuration and paste the certificate acquired in the previous step, along with the workload user and password under "Basic Auth Details". The URL should follow the pattern below, where CM_SERVER_FQDN must be replaced with the CM Server in use by your Datahub. If there are any errors along the way, you should see a pop-up displaying the exact error message; you can also inspect the Grafana logs for further clarification if the error isn't intuitive at first sight.

https://CM_SERVER_FQDN:7183

After the procedure is completed, you should be able to start setting up charts by providing a valid tsquery. The tsquery language is used to specify statements for retrieving time-series data from the Cloudera Manager time-series datastore. You can check the tsquery behind the many charts available in your CM UI and use it as a reference when building your own set of charts.

Ref: https://grafana.com/grafana/plugins/foursquare-clouderamanager-datasource/?tab=installation
Ref: https://github.com/foursquare/datasource-plugin-clouderamanager
Ref: https://docs.cloudera.com/cloudera-manager/7.4.2/monitoring-and-diagnostics/topics/cm-tsquery-language.html
Ref: https://docs.cloudera.com/cloudera-manager/7.4.2/metrics/topics/cm-metrics-reference.html
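Connectivity check: before filling in the datasource form, it can help to verify the certificate and credentials from the Grafana host itself. This is a minimal sketch, assuming CM_SERVER_FQDN and workload_user are placeholders for your own environment and cm-server.pem is just an example file name:

# Extract the certificate presented by CM on the TLS port 7183 and save it as PEM
openssl s_client -showcerts -connect CM_SERVER_FQDN:7183 </dev/null 2>/dev/null | openssl x509 -outform PEM > cm-server.pem
# Confirm the workload credentials are accepted by the CM API before configuring the datasource
curl --cacert cm-server.pem -u workload_user "https://CM_SERVER_FQDN:7183/api/version"

If the curl call returns an API version, the same URL, certificate, and credentials should work in the Grafana datasource form. For a first chart, a tsquery as simple as "select cpu_user_rate where category = HOST" (adapted from the tsquery documentation examples) is enough to confirm that data is flowing.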
05-14-2018
09:42 PM
2 Kudos
By default, HiveServer2 doesn't have an entry in hive-env.sh that provides a way to change the type of GC. The CMS collector is designed to eliminate the long pauses associated with the full GC cycles of the throughput and serial collectors. CMS still stops all application threads during a minor GC, which it performs with multiple threads, but it collects the old generation concurrently with the application.

To change the type of GC, go to Ambari > Hive > Configs > Advanced hive-env > hive-env template and add the following snippet:

if [ "$SERVICE" = "hiveserver2" ]; then
if [ -z "$DEBUG" ]; then
export HADOOP_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hive/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m"
else
export HADOOP_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hive/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m"
fi
fi
After restarting HiveServer2, make sure that the new settings have been applied successfully, as well as the heap sizes:

/usr/jdk64/jdk1.8.0_112/bin/jcmd <hiveserver_pid> VM.flags
/usr/jdk64/jdk1.8.0_112/bin/jmap -heap <hiveserver_pid>

Note: This article is not meant to provide the best JVM flags; those will vary according to your environment. The general idea is to scale out by adding more HS2 instances when the existing ones are highly utilized. Please check with a HWX consultant to fine-tune this for your workload.
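If you are not sure of the HiveServer2 PID, here is a minimal sketch for locating it and running the checks above, assuming the process command line contains "hiveserver2" and the same JDK path as above:

# Find the HiveServer2 PID (assumes its command line contains "hiveserver2")
HS2_PID=$(pgrep -f hiveserver2 | head -n 1)
# Verify the collector in effect; look for -XX:+UseConcMarkSweepGC in the output
/usr/jdk64/jdk1.8.0_112/bin/jcmd "$HS2_PID" VM.flags
# Verify the configured heap sizes and current usage
/usr/jdk64/jdk1.8.0_112/bin/jmap -heap "$HS2_PID"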
04-18-2018
08:15 PM
Hello,

As some of you already know, Solr through Knox on the HDP platform isn't fully supported yet; however, it is possible to achieve it using an IBM IOP distribution. Problems in this area are usually related to Kerberos issues. Here are some steps.

Pre-reqs: You should have already configured Knox with the desired authentication mode.

Note: In order to use the flag "-Dsun.security.krb5.rcache", JDK 1.8 or above must be used.

Root causes of the most common errors:
a) You may not have configured your browser for authentication (SPNEGO).
b) You haven't included your users in the Solr plugin on Ranger.
c) You are hitting a known issue related to the parameter "-Dsun.security.krb5.rcache=none", which is better described in this forum post: https://community.hortonworks.com/content/supportkb/150162/error-gssexception-failure-unspecified-at-gss-api.html

1) Add the following parameters to your Hadoop core-site.xml:
hadoop.proxyuser.knox.groups = *
hadoop.proxyuser.knox.hosts = *
Note: You can adjust the impersonation requirements according to your environment.

2) Because of the known issue mentioned above, add the parameter "-Dsun.security.krb5.rcache=none" under Ambari > Solr > solr.env. The configuration needs to look like this:
SOLR_OPTS="-Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.hdfs.confdir=/etc/hadoop/conf -Dsolr.hdfs.home={{fs_root}}{{solr_hdfs_home_dir}} -Dsolr.hdfs.security.kerberos.enabled={{security_enabled}} -Dsolr.hdfs.security.kerberos.keytabfile={{solr_kerberos_keytab}} -Dsolr.hdfs.security.kerberos.principal={{solr_kerberos_principal}} -Dsun.security.krb5.rcache=none -Dsolr.log4j.dir={{solr_log_dir}}"

3) Go to "Quick Links > Ranger > Ranger Admin UI > Solr" and add the user "knox".

After these steps, your Solr UI should work fine through Knox.
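To confirm the path end to end after the changes, you can hit Solr through the gateway from the command line. This is a minimal sketch, assuming a topology named "default", the default gateway port 8443, and KNOX_HOST / myuser as placeholders; adjust them to your own Knox topology, Solr service mapping, and users:

# Request the Solr endpoint through the Knox gateway with basic authentication
curl -ik -u myuser:mypassword "https://KNOX_HOST:8443/gateway/default/solr/"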
10-07-2018
09:31 AM
Thanks for the great information. I'm having trouble connecting to MongoDB with SSL (.pem configuration) from Spark and Scala via IDEA. Do you have any suggestions on this?
11-06-2017
03:19 PM
Hello,
I'm still seeing some people struggling to run their own MapReduce applications from the command line. For those who are not Java developers, here is some quick guidance.
Let's create a new directory and put our Java source file, WordCount.java, in it.
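A minimal sketch of the layout assumed by the compile and packaging commands further down; the directory name job/ comes from those commands, and the editor is your choice:

# Create a working directory and save the class below as job/WordCount.java
mkdir -p job
vi job/WordCount.java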
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
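// Mapper: splits each input line into tokens and emits (word, 1) for every token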
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
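// Reducer (also used as the combiner): sums the counts emitted for each word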
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
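// Driver: configures the job, sets the input/output paths, and submits it to the cluster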
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
From the client side, the compiler needs to be able to resolve the external classes and libraries referenced by the import lines. Let's find our Hadoop classpath and use it to resolve those dependencies:
-sh-4.1$ hadoop classpath
/usr/hdp/2.6.2.0-205/hadoop/conf:/usr/hdp/2.6.2.0-205/hadoop/lib/*:/usr/hdp/2.6.2.0-205/hadoop/.//*:/usr/hdp/2.6.2.0-205/hadoop-hdfs/./:/usr/hdp/2.6.2.0-205/hadoop-hdfs/lib/*:/usr/hdp/2.6.2.0-205/hadoop-hdfs/.//*:/usr/hdp/2.6.2.0-205/hadoop-yarn/lib/*:/usr/hdp/2.6.2.0-205/hadoop-yarn/.//*:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/lib/*:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/.//*::mysql-connector-java-5.1.17.jar:mysql-connector-java.jar:/usr/hdp/2.6.2.0-205/tez/*:/usr/hdp/2.6.2.0-205/tez/lib/*:/usr/hdp/2.6.2.0-205/tez/conf
Now compile the source against the Hadoop classpath:
/usr/jdk64/jdk1.8.0_112/bin/javac -classpath $(/usr/hdp/current/hadoop-client/bin/hadoop classpath) -d job/ job/WordCount.java
Now that all the classes have been compiled into .class files, let's package them into a single jar.
-sh-4.1$ /usr/jdk64/jdk1.8.0_112/bin/jar -cvf Test.jar -C job/ .
Execute the MapReduce program:
-sh-4.1$ hadoop jar Test.jar WordCount /tmp/sample_07.csv /tmp/output_mapred
17/11/05 23:41:50 INFO client.RMProxy: Connecting to ResourceManager at minotauro3.hostname.br/xxx.xx.xxx.xx:8050
17/11/05 23:41:51 INFO client.AHSProxy: Connecting to Application History server at minotauro3.hostname.br/xxx.xx.xxx.xx:10200
17/11/05 23:41:51 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 12603 for bob1 on ha-hdfs:cluster2
17/11/05 23:41:51 INFO security.TokenCache: Got dt for hdfs://cluster2; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster2, Ident: (HDFS_DELEGATION_TOKEN token 12603 for bob1)
......
File Input Format Counters
Bytes Read=46055
File Output Format Counters
Bytes Written=36214
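If the job completes successfully, the word counts are written to the output directory given on the command line. A quick way to inspect them, assuming the same paths used above and the default reducer output file name:

# List the job output directory and print the first lines of the reducer output
hdfs dfs -ls /tmp/output_mapred
hdfs dfs -cat /tmp/output_mapred/part-r-00000 | head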
11-11-2017
12:39 PM
Using the sandbox 2.6 solved this problem. Thank you @Danilo Perez