Member since: 03-01-2017
Posts: 18
Kudos Received: 2
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 357 | 11-05-2017 10:36 PM |
06-21-2018
08:06 PM
Hello Vishal, As of now, I don't think we have support for it. In most of the use cases I've seen, the requirement is that if one Active Directory domain controller goes down, another one should take over as a backup. If that is your case, I would recommend putting an LDAP load balancer in front of the domain controllers.
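To illustrate the idea, here is a hedged sketch (the hostnames are placeholders, and the choice of load balancer, e.g. HAProxy or an F5 VIP, is an assumption, not something from the original question): verify that each domain controller answers LDAP queries, then point your services at a single balanced endpoint so failover is transparent to them.
# Placeholders: replace the hostnames with your domain controllers / VIP.
for dc in addc1.example.com addc2.example.com; do
  ldapsearch -H "ldap://$dc:389" -x -b "" -s base namingContexts
done
# Once the load balancer is in place, services reference only the balanced endpoint:
ldapsearch -H "ldap://ldap-vip.example.com:389" -x -b "" -s base namingContexts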
05-14-2018
09:42 PM
2 Kudos
By default, HiveServer2 does not have an entry in hive-env.sh that provides a way to change the type of GC. The CMS collector is designed to eliminate the long pauses associated with the full GC cycles of the throughput and serial collectors; CMS stops all application threads during a minor GC, which it also performs with multiple threads. In order to change the type of GC, go to Ambari > Hive > Configs > Advanced hive-env > hive-env template and add the following properties:
if [ "$SERVICE" = "hiveserver2" ]; then
if [ -z "$DEBUG" ]; then
export HADOOP_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hive/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m"
else
export HADOOP_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hive/$USER/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m"
fi
fi
Make sure that the new settings, as well as the heap sizes, have been applied successfully:
/usr/jdk64/jdk1.8.0_112/bin/jcmd <hiveserver_pid> VM.flags
/usr/jdk64/jdk1.8.0_112/bin/jmap -heap <hiveserver_pid>
Note: this article is not meant to provide the best JVM flags; these will vary according to your environment. If your HiveServer2 instances are highly utilized, the idea is always to scale out the load average by adding more HS2 instances. Please check with an HWX consultant to better align this with your environment.
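To locate the HiveServer2 PID for those commands, here is a small hedged sketch (the process pattern and JDK path are assumptions; adjust them for your environment):
# Find the HiveServer2 PID, then verify the GC flags and heap settings.
HS2_PID=$(pgrep -f 'org.apache.hive.service.server.HiveServer2' | head -1)
/usr/jdk64/jdk1.8.0_112/bin/jcmd "$HS2_PID" VM.flags
/usr/jdk64/jdk1.8.0_112/bin/jmap -heap "$HS2_PID"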
- Find more articles tagged with:
- garbage-collector
- Hadoop Core
- Hive
- hiveserver2
- How-ToTutorial
- logs
04-18-2018
08:15 PM
Hello. As some of you already know, Solr through Knox on the HDP platform isn't fully supported yet; however, it is possible to achieve this using an IBM IOP distribution. Problems like this are usually related to Kerberos issues. Here are the steps.
Prerequisites: You should have already configured Knox with the desired authentication mode. Note: in order to use the "-Dsun.security.krb5.rcache" flag, JDK 1.8 or above must be used.
Common root causes:
a) You may not have configured your browser for authentication (SPNEGO).
b) You haven't included your users in the Solr plugin on Ranger.
c) You are hitting a known issue related to the parameter "-Dsun.security.krb5.rcache=none", which is better described in this forum post: https://community.hortonworks.com/content/supportkb/150162/error-gssexception-failure-unspecified-at-gss-api.html
1) Add the following parameters to your Hadoop core-site.xml:
hadoop.proxyuser.knox.groups = *
hadoop.proxyuser.knox.hosts = *
Note: you can adjust the impersonation requirements according to your environment.
2) Because of the known issue mentioned above, you will have to add the parameter "-Dsun.security.krb5.rcache=none" in Ambari > Solr > solr.env. The configuration needs to look like this:
SOLR_OPTS="-Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs -Dsolr.hdfs.confdir=/etc/hadoop/conf -Dsolr.hdfs.home={{fs_root}}{{solr_hdfs_home_dir}} -Dsolr.hdfs.security.kerberos.enabled={{security_enabled}} -Dsolr.hdfs.security.kerberos.keytabfile={{solr_kerberos_keytab}} -Dsolr.hdfs.security.kerberos.principal={{solr_kerberos_principal}} -Dsun.security.krb5.rcache=none -Dsolr.log4j.dir={{solr_log_dir}}"
3) Go to Quick Links > Ranger > Ranger Admin UI > Solr and add the user "knox".
After these steps, your Solr UI should work fine through Knox.
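A quick, hedged way to test the path end to end (the /solr path, the gateway port 8443, and the topology name are assumptions; they depend on your Knox topology and service definition):
# Placeholders throughout; expect an HTTP 200 and the Solr admin page HTML.
curl -iku <user>:<password> "https://<knox-host>:8443/gateway/<topology>/solr/"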
- Find more articles tagged with:
- How-ToTutorial
- Kerberos
- Knox
- Ranger
- Sandbox & Learning
- solr
12-03-2017
10:39 PM
Hello Michael, These messages don't seem to be the root cause of the issue; as the text suggests, they are just clients releasing their streams from ZooKeeper. A few things to verify (a quick sketch of these checks follows below):
- Is there sufficient space on all system, log, and HDFS partitions?
- Are there any heap size configuration issues?
- Does ZooKeeper have the correct permissions on its directories (including owner / group)?
- Have you tried checking the ".out" files generated by ZooKeeper?
Regards,
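As a follow-up, a hedged sketch of those checks (the data and log paths are assumptions based on common Ambari defaults; adjust them for your environment):
# Disk space on the partitions ZooKeeper and the system write to:
df -h / /var/log /hadoop
# Ownership and permissions of the ZooKeeper data directory:
ls -ld /hadoop/zookeeper && ls -l /hadoop/zookeeper/version-2 | head
# The ".out" files and a basic liveness check:
tail -100 /var/log/zookeeper/*.out
echo ruok | nc localhost 2181   # a healthy server replies "imok"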
11-24-2017
02:52 PM
Hello, This seems to be happening because your Spark interpreter is configured to use master = local.
1) Take a look at the link below: https://zeppelin.apache.org/docs/latest/manual/interpreters.html#what-is-interpreter-group
2) Try to change the interpreter's master setting from local to yarn-client if it is still set that way.
3) If your application shows up in the ResourceManager, it is likely using the YARN framework (see the quick check below).
Regards,
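A hedged way to confirm point 3 from the command line (the grep pattern is just an example; match it against your application's name):
# If the interpreter is really running on YARN, it will be listed here:
yarn application -list -appStates RUNNING | grep -i zeppelin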
11-23-2017
08:14 PM
Hello Fernando, Have you tried to capture those logs using the YARN command line?
yarn logs -applicationId <application_id>
"VERTEX FAILED" is a very generic error; it only tells you that some vertex encountered issues during its task. With ORC tables, I've seen issues with heap sizes and with orc.compress.size set too low, but it is hard to know without evidence. You could also go to the ResourceManager UI > Application_ID > History and then search for the attempt_id / container logs (see the sketch below). Regards,
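For reference, a hedged sketch of collecting and skimming the aggregated logs (the application ID is a placeholder):
# Dump the aggregated logs and look for the real error behind "VERTEX FAILED":
yarn logs -applicationId application_1234567890123_0001 > app.log
grep -iE 'error|exception|caused by' app.log | head -50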
11-07-2017
01:43 AM
I have been struggling to configure Spark to connect to MongoDB with SSL. Unfortunately, the steps are not well documented. Here is some quick guidance:
Command line ( spark-shell / spark-submit / pyspark )
1) According to the JIRA below:
https://jira.mongodb.org/browse/SPARK-115?jql=project%20%3D%20SPARK%20AND%20component%20%3D%20Documentation
Copy both the Spark Connector and the Mongo Java Driver to your datanodes (the nodes where your Spark executors / driver are supposed to run). The Spark Connector uses the Mongo Java Driver, and the driver will need to be configured to work with SSL; see the SSL tutorial in the MongoDB Java driver documentation.
Ex.
- spark_mongo-spark-connector_2.11-2.1.0.jar
- mongodb_mongo-java-driver-3.4.2.jar
Note: find the versions appropriate for your environment on the MongoDB website.
2) Go to Ambari > Spark > Custom spark-defaults and add these two parameters in order to make Spark (executors / driver) aware of the certificates.
Example from my lab:
spark.driver.extraJavaOptions=-Djavax.net.ssl.trustStore=/tmp/path/keystore.jks -Djavax.net.ssl.trustStorePassword=bigdata -Djavax.net.ssl.keyStore=/tmp/path/keystore.jks -Djavax.net.ssl.keyStorePassword=bigdata
spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=/tmp/path/keystore.jks -Djavax.net.ssl.trustStorePassword=bigdata -Djavax.net.ssl.keyStore=/tmp/path/keystore.jks -Djavax.net.ssl.keyStorePassword=bigdata
3) Afterwards, move the .jks file to a common location shared across your datanodes (executors and driver).
4) Submit your spark code
./spark-shell --master yarn-client
import com.mongodb.spark.config._
import com.mongodb.spark._
val readConfig = ReadConfig(Map("uri" -> "mongodb://user:password@host:port/<database>?ssl=true"))
val rdd = MongoSpark.load(sc, readConfig)
println(rdd.count)
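If the connector jars are not already on a classpath visible to the driver and executors, a hedged variant is to pass them explicitly when starting the shell (the paths are placeholders for the two jars listed in step 1):
./spark-shell --master yarn-client --jars /path/to/spark_mongo-spark-connector_2.11-2.1.0.jar,/path/to/mongodb_mongo-java-driver-3.4.2.jar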
5) Zeppelin (optional): if you want to use this through Zeppelin, you should also configure the interpreter (%spark) to point at the correct truststore location (where your certificate resides).
Normally the Zeppelin interpreter is a separate process spawned by Zeppelin, so we need to check whether the interpreter process has its own truststore (javax.net.ssl.trustStore), something like:
ps -ef | grep zeppelin
zeppelin 18064 1 0 Nov06 ? 00:01:13 /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.2.0-205 -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-zeppelin-minotauro1.hostname.br.log -cp ::/usr/hdp/current/zeppelin-server/lib/interpreter/*:/usr/hdp/current/zeppelin-server/lib/*:/usr/hdp/current/zeppelin-server/*::/usr/hdp/current/zeppelin-server/conf org.apache.zeppelin.server.ZeppelinServer
ps auxx | grep interpreter
zeppelin 18064 0.3 6.3 4664040 513732 ? Sl Nov06 1:13 /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.2.0-205 -Dspark.executor.memory=512m -Dspark.executor.instances=2 -Dspark.yarn.queue=default -Dfile.encoding=UTF-8 -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-zeppelin-minotauro1.hostname.br.log -cp ::/usr/hdp/current/zeppelin-server/lib/interpreter/*:/usr/hdp/current/zeppelin-server/lib/*:/usr/hdp/current/zeppelin-server/*::/usr/hdp/current/zeppelin-server/conf org.apache.zeppelin.server.ZeppelinServer
.. /interpreter/spark/zeppelin-spark-0.6.0.2.5.0.0-1245.jar 36304
If the "-Djavax.net.ssl.trustStore" option is not specified, we will need to import our certificate into the JDK's default cacerts (under $JAVA_HOME):
If you need to export the certificate from an existing truststore:
keytool -export -keystore /tmp/path/truststore.ts -alias mongodb-cert -file /tmp/mongodb-cert.crt
and now import it into the default cacerts:
keytool -import -keystore /usr/jdk64/jdk1.8.0_112/jre/lib/security/cacerts -alias mongodb-cert -file /tmp/mongodb-cert.crt
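A hedged way to confirm the import worked (the cacerts path matches the JDK used above; "changeit" is the JDK's default store password and may differ in your environment):
keytool -list -keystore /usr/jdk64/jdk1.8.0_112/jre/lib/security/cacerts -storepass changeit | grep -i mongodb-cert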
Optional: you could also convert certificates to various formats, sign certificate requests like a "mini CA", or edit certificate trust settings.
Ex. openssl x509 -in /tmp/mongodb-cert.crt -inform der -noout -text
6) Rerun the job using zeppelin.
- Find more articles tagged with:
- Data Ingestion & Streaming
- FAQ
- How-ToTutorial
- mongodb
- Spark
- ssl
- zeppelin
11-06-2017
03:19 PM
Hello,
I'm still seeing some people struggling to run their own MapReduce applications from the command line. For those who are not Java developers, here is some quick guidance.
Let's create a new directory and put our new Java source file in it (a small sketch of this step follows below).
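A minimal sketch of that step, assuming we call the directory "job" (the same name used by the javac and jar commands further down):
mkdir -p job
vi job/WordCount.java   # paste the WordCount class below into this file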
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer
extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values,
Context context
) throws IOException, InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: wordcount <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
From the client side, we need to be able to resolve the external classes / libraries referenced by the import lines. Let's find out our Hadoop classpath to resolve these dependencies:
-sh-4.1$ hadoop classpath
/usr/hdp/2.6.2.0-205/hadoop/conf:/usr/hdp/2.6.2.0-205/hadoop/lib/*:/usr/hdp/2.6.2.0-205/hadoop/.//*:/usr/hdp/2.6.2.0-205/hadoop-hdfs/./:/usr/hdp/2.6.2.0-205/hadoop-hdfs/lib/*:/usr/hdp/2.6.2.0-205/hadoop-hdfs/.//*:/usr/hdp/2.6.2.0-205/hadoop-yarn/lib/*:/usr/hdp/2.6.2.0-205/hadoop-yarn/.//*:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/lib/*:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/.//*::mysql-connector-java-5.1.17.jar:mysql-connector-java.jar:/usr/hdp/2.6.2.0-205/tez/*:/usr/hdp/2.6.2.0-205/tez/lib/*:/usr/hdp/2.6.2.0-205/tez/conf
Now compile the class, passing the Hadoop classpath to javac:
/usr/jdk64/jdk1.8.0_112/bin/javac -classpath $(/usr/hdp/current/hadoop-client/bin/hadoop classpath) -d job/ job/WordCount.java
Now that all the classes have been compiled into .class files, let's package them into a single jar:
-sh-4.1$ /usr/jdk64/jdk1.8.0_112/bin/jar -cvf Test.jar -C job/ .
Execute the mapreduce program.
-sh-4.1$ hadoop jar Test.jar WordCount /tmp/sample_07.csv /tmp/output_mapred
17/11/05 23:41:50 INFO client.RMProxy: Connecting to ResourceManager at minotauro3.hostname.br/xxx.xx.xxx.xx:8050
17/11/05 23:41:51 INFO client.AHSProxy: Connecting to Application History server at minotauro3.hostname.br/xxx.xx.xxx.xx:10200
17/11/05 23:41:51 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 12603 for bob1 on ha-hdfs:cluster2
17/11/05 23:41:51 INFO security.TokenCache: Got dt for hdfs://cluster2; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:cluster2, Ident: (HDFS_DELEGATION_TOKEN token 12603 for bob1)
......
File Input Format Counters
Bytes Read=46055
File Output Format Counters
Bytes Written=36214
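To confirm the job produced output, a quick hedged check against the output path used in the example run (the part file name follows the usual reducer naming convention; adjust it if yours differs):
hdfs dfs -ls /tmp/output_mapred
hdfs dfs -cat /tmp/output_mapred/part-r-00000 | head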
- Find more articles tagged with:
- compile
- Hadoop Core
- How-ToTutorial
- Mapreduce
11-05-2017
10:36 PM
Hello, Zeppelin supports https://handsontable.com/ starting from version 0.7.0. You can find more detailed information here: https://github.com/apache/zeppelin/commit/13d77e07f834ff375941841e5b2e8cc344702749. I've attached a GIF (13787-change-type.gif) that shows how to do it. Regards
11-05-2017
09:58 PM
Hello Aviram, I would try to execute a REST API command to check if there is anything else still using the previous version of HDP, something like this:
curl -u $AMBARI_USER:$AMBARI_PASSWD -H 'X-Requested-By: ambari' -X GET "http://<ambari-server>:8080/api/v1/clusters/<cluster_name>/hosts/host_a5" | grep 2.6.3.0-235
This would give you an idea of whether all the components were indeed upgraded (a sketch for checking every host is below). Regards, Danilo Perez
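For checking every host instead of a single one, a hedged sketch built on the same Ambari REST endpoint (the grep/cut parsing is illustrative and assumes Ambari's default pretty-printed JSON output):
# List all hosts in the cluster, then grep each one for the target stack version.
for h in $(curl -s -u $AMBARI_USER:$AMBARI_PASSWD "http://<ambari-server>:8080/api/v1/clusters/<cluster_name>/hosts" | grep -o '"host_name" : "[^"]*"' | cut -d'"' -f4); do
  echo "== $h =="
  curl -s -u $AMBARI_USER:$AMBARI_PASSWD -H 'X-Requested-By: ambari' "http://<ambari-server>:8080/api/v1/clusters/<cluster_name>/hosts/$h" | grep 2.6.3.0-235
done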