Where to put my.truststore?

Contributor

If I am using self-signed certificates, where do I put the created my.truststore? Into $JAVA_HOME/jre/lib/security/my.truststore or $JAVA_HOME/jre/lib/security/jssecacerts?

 

It is not clear to me how it is going to be used, or where I would need to specify it when configuring Levels 1-3 of TLS. It is also not clear whether both a keystore and a truststore are needed, or whether keystores alone are enough.

 

Is there a good introduction to the concepts of certificates, truststores, and keystores?
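For what it's worth, here is a minimal sketch of how a self-signed certificate typically goes into a truststore and how a JVM finds it (the alias, paths, and password below are placeholders, not CM-specific values):

# import the self-signed certificate into a new truststore
keytool -importcert -alias my-ca -file my-ca.crt \
  -keystore /path/to/my.truststore -storepass changeit -noprompt

# option 1: pass the truststore to the JVM explicitly
java -Djavax.net.ssl.trustStore=/path/to/my.truststore \
     -Djavax.net.ssl.trustStorePassword=changeit SomeApp

# option 2: copy it to $JAVA_HOME/jre/lib/security/jssecacerts,
# which JSSE consults before falling back to the bundled cacerts

Roughly: a keystore holds a service's own private key and certificate, while a truststore holds the certificates a client is willing to trust.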

 

17 Replies

Contributor

I have tested various pieces of Hadoop functionality from the command line. Mostly things are working, except for running pig or pyspark on a local machine:

 

1) hdfs commands seem to be working

2) mapreduce is working

3) pig works when submitting a job to the cluster, but runs out of memory on even a tiny job on the local machine (stack trace and a possible workaround below) when run with

pig -x local

2017-03-02 13:48:36,124 [Thread-20] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local1306469224_0004
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
       at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.lang.OutOfMemoryError: Java heap space
       at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:987)
       at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
       at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
       at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
       at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.java:745)
2017-03-02 13:48:42,128 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to
stop immediately on failure.
2017-03-02 13:48:42,128 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1306469224_0004 has failed! Stop running all dependent jobs
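As an aside, an OOM in MapOutputBuffer.init usually means the local JVM heap is smaller than the configured map-side sort buffer; in local mode the map task runs inside the client JVM. Two things that may help (the script name is a placeholder):

# raise the heap Pig's local JVM gets (value in MB)
export PIG_HEAPSIZE=4096
pig -x local myscript.pig

# or shrink the map-side sort buffer for the local run
PIG_OPTS="-Dmapreduce.task.io.sort.mb=64" pig -x local myscript.pig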

4) I can submit a spark job to a cluster with

spark-submit --master=yarn --num-executors=3 lettercount.py
but pyspark without arguments crashes:

 

[ivy2@md01 lab7_Spark]$ pyspark
Python 2.7.5 (default, Nov 20 2015, 02:00:19)  
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/shell.py", line 43, in <module>
   sc = SparkContext(pyFiles=add_files)
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 112, in __init__
   SparkContext._ensure_initialized(self, gateway=gateway)
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
   SparkContext._gateway = gateway or launch_gateway()
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
   raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
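The "Cluster deploy mode is not applicable to Spark shells" line suggests a cluster deploy mode is being picked up from somewhere by default. One place worth checking (the path is assumed for a typical CDH gateway setup):

grep -i deploy /etc/spark/conf/spark-defaults.conf   # look for spark.submit.deployMode=cluster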

 

On the other hand, it works with --master=yarn:

 

[ivy2@md01 lab7_Spark]$ pyspark --master=yarn
Python 2.7.5 (default, Nov 20 2015, 02:00:19)  
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
     ____              __
    / __/__  ___ _____/ /__
   _\ \/ _ \/ _ `/ __/  '_/
  /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
     /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkContext available as sc, HiveContext available as sqlContext.

 

5) hive works

6) HBase works

 

Why do pyspark and pig misbehave on a local node but work fine when running on the cluster?

 

Contributor

I think I misinterpreted it: pyspark and spark-submit crash on the cluster but not locally.

 

As far as I understand, --master=yarn --deploy-mode=client runs the driver locally, while --master=yarn --deploy-mode=cluster runs it on the cluster, and pyspark by default is probably trying to run in cluster mode.
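If that reading is right, the difference boils down to:

# client mode: the driver runs on the submitting machine; the only mode Spark shells support
pyspark --master yarn --deploy-mode client

# cluster mode: the driver runs inside the cluster; valid for spark-submit, but not for shells
spark-submit --master yarn --deploy-mode cluster lettercount.py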

Contributor

Also, pig actually seems to work both with -x local and -x mapreduce. I think I just messed up the directories the first time. But spark is definitely still a problem.

Contributor

As a backup solution, how do I disable TLS? Set use_tls=0 in
/etc/cloudera-scm-agent/config.ini, undo all the TLS/SSL settings enabled on the two web
configuration pages, then restart the server, the agents, and the Cloudera Management Services?
I need to have the cluster in production within a few days.
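A rough sketch of that rollback, assuming a CM 5.x install with the agent config in the usual place (the config.ini edit has to happen on every host):

# /etc/cloudera-scm-agent/config.ini
#   use_tls=0

sudo service cloudera-scm-agent restart     # on each agent host
sudo service cloudera-scm-server restart    # on the Cloudera Manager host

plus restarting the Cloudera Management Service from the CM UI, as you describe.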

 

Contributor

I undid all the TLS enabling and still had the same problem.

Eventually it occurred to me that some processes might be stuck.

So I physically rebooted all the Hadoop machines, and that resolved the problem.

After that I was able to re-enable all the steps for TLS.

 

However, I still have a problem with pyspark and spark-submit: they run with

--master=yarn --deploy-mode=client

but fail with

--master=yarn --deploy-mode=cluster

 

==============

[ivy2@md01 ~]$ pyspark --master=yarn --deploy-mode=cluster
Python 2.7.5 (default, Nov 20 2015, 02:00:19)  
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.
Run with --help for usage help or --verbose for debug output
Traceback (most recent call last):
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/shell.py", line 43, in <module>
   sc = SparkContext(pyFiles=add_files)
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 112, in __init__
   SparkContext._ensure_initialized(self, gateway=gateway)
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/context.py", line 245, in _ensure_initialized
   SparkContext._gateway = gateway or launch_gateway()
 File "/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
   raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>
==========
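pyspark itself will never run in cluster mode (hence the "not applicable to Spark shells" error above), but for spark-submit failures in cluster mode the driver runs inside YARN, so its output lands in the application logs rather than on the local console (the application ID below is a placeholder):

yarn logs -applicationId application_1488461234567_0042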

Could TLS interfere with Spark?

 

 

Contributor

It looks like YARN also needs to be told about TLS? Would it work without that if TLS is fully enabled everywhere else?

Contributor

And Oozie?

Contributor

And a lot of other components have TLS options in their Security section ...

 

Are those mandatory, or only needed for Kerberos?