Member since: 12-09-2015
Posts: 16
Kudos Received: 2
Solutions: 0
07-10-2017
10:24 AM
We are trying to read a Teradata table from Spark 2.0 over JDBC using the following code:

import sys
import os
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.1-src.zip'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/pyspark.zip'))
filename = os.path.join(spark_home, 'python/pyspark/shell.py')
print(os.environ.get('SPARK_HOME', None))
exec(compile(open(filename, "rb").read(), filename, 'exec'))
spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 2" in open(spark_release_file).read():
print("Spark is there.")
argsstr= "--master yarn-client --deploy-mode cluster pyspark-shell --driver-class-path /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar --driver-library-path /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar --jars /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar"
pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", argsstr)
if not "pyspark-shell" in pyspark_submit_args:
    pyspark_submit_args += " pyspark-shell"
print(pyspark_submit_args)
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
os.environ["SPARK_SUBMIT_ARGS"] = pyspark_submit_args
from pyspark.sql import SQLContext
from pyspark import SparkConf, SparkContext
url = 'jdbc:teradata://teradata.server.com'
user='username'
password=''
driver = 'com.teradata.jdbc.TeraDriver'
dbtable_read = 'mi_temp.bd_test_spark_read'
sqlContext = SQLContext(sc)
df = sqlContext.read.format("jdbc").options(url=url, user=user, password=password, driver=driver, dbtable=dbtable_read).load()

We get the following error:

Py4JJavaError: An error occurred while calling o48.load.
: java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)

However, if we run the same code via the command line, it works. Can you please give us some pointers?
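For reference, here is a minimal, stripped-down sketch of what we are aiming for. The jar locations and server name are placeholders, and the key assumption is that PYSPARK_SUBMIT_ARGS is set before the first SparkContext/SparkSession is created, so the extra jars actually make it onto the JVM classpath:

import os

# Placeholder paths; point these at the real Teradata driver jars.
jars = "/path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar"

# Must be set before any SparkContext/SparkSession exists, otherwise the JVM
# has already started without the jars on its classpath.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars {jars} --driver-class-path {cp} pyspark-shell".format(
        jars=jars, cp=jars.replace(",", ":")))

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("teradata-jdbc-read").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:teradata://teradata.server.com")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "mi_temp.bd_test_spark_read")
      .option("user", "username")
      .option("password", "")
      .load())
df.printSchema()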
Labels:
- Apache Spark
06-09-2016
08:48 AM
It was an edit to the /etc/hosts files on all the nodes. The hosts file was not set up correctly.
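For anyone hitting the same thing: each node's /etc/hosts needed a proper entry for every host in the cluster, along the lines of the made-up example below (addresses and host names are placeholders), with the FQDN listed before the short name:

192.168.1.11   master1.example.com   master1
192.168.1.12   worker1.example.com   worker1
192.168.1.13   worker2.example.com   worker2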
02-03-2016
12:32 PM
@Neeraj Sabharwal I tried this option, but no success there yet.
02-03-2016
12:32 PM
@Artem Ervits @Neeraj Sabharwal I have noticed a few conflicting settings in yarn-site.xml: yarn.nodemanager.container-executor.class = org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor, yet we don't have the same Linux users across the cluster. Hence we are waiting for the users to be created. Once that is done I will test and post the result.
01-25-2016
03:47 PM
@Neeraj Sabharwal Deleting the directory makes the job work once, but afterwards it fails again.
01-25-2016
01:25 PM
I have tried that. The issue is that when a new folder is created, the permissions don't apply, so the job starts failing. Some clean-up is not happening correctly, but I am unable to locate the issue 😞
01-25-2016
09:40 AM
@Artem Ervits The service checks run fine. We have also restarted the services many times, and the issue still persists. The umask value on all nodes is set to 0022. What are the mount options we should check?
01-22-2016
04:39 PM
I am trying to run a benchmark job with the following command:
yarn jar /path/to/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000 -resFile /tmp/TESTDFSio.txt
but the job fails with the following error messages:

16/01/22 15:08:47 INFO mapreduce.Job: Task Id : attempt_1453395961197_0017_m_000008_2, Status : FAILED
Application application_1453395961197_0017 initialization failed (exitCode=255) with output:
main : command provided 0
main : user is foo
main : requested yarn user is foo
Path /mnt/sdb1/yarn/local/usercache/foo/appcache/application_1453395961197_0017 has permission 700 but needs permission 750.
Path /var/hadoop/yarn/local/usercache/foo/appcache/application_1453395961197_0017 has permission 700 but needs permission 750.
Did not create any app directories

Even when I change these directories' permissions to 750, I get errors.
Also, these caches don't get cleaned up after a job finishes, and they create collisions when running the next job.
Any insights?
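For what it is worth, here is a small diagnostic sketch we can run on each node to list appcache directories whose permissions differ from 750. The local-dir paths are taken from the error above and may differ per node:

import os
import stat

# yarn.nodemanager.local-dirs as they appear in the error message above;
# adjust to the actual values configured on each NodeManager.
LOCAL_DIRS = ["/mnt/sdb1/yarn/local", "/var/hadoop/yarn/local"]
EXPECTED_MODE = 0o750

for base in LOCAL_DIRS:
    usercache = os.path.join(base, "usercache")
    if not os.path.isdir(usercache):
        print("missing: %s" % usercache)
        continue
    for user in os.listdir(usercache):
        appcache = os.path.join(usercache, user, "appcache")
        if not os.path.isdir(appcache):
            continue
        for app in os.listdir(appcache):
            path = os.path.join(appcache, app)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if mode != EXPECTED_MODE:
                print("%s has mode %o, expected %o" % (path, mode, EXPECTED_MODE))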
Labels:
- Apache Hadoop
- Apache YARN
01-18-2016
09:18 AM
We have a recently built 5-node HDP cluster; it is not in HA mode. At some point the Unix team will need to apply patches and do server maintenance, which will require rebooting the server machines. What is the best way to do this? I plan to do the following:
1) Shut down all services using Ambari.
2) Shut down the ambari-agents on all nodes.
3) Shut down the ambari-server.
4) Reboot the nodes as required.
5) Restart the ambari-server, the agents, and the services, in that order.
Is this the correct sequence, or am I missing anything? A rough sketch of how I would script step 1 is below.
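As a sketch of step 1, something along these lines should ask Ambari to stop all services through its REST API before the agents and server are shut down. The Ambari host, credentials, and cluster name below are placeholders, not our actual values:

import requests

# Placeholders: substitute the real Ambari host, admin credentials, and cluster name.
AMBARI_URL = "http://ambari-host:8080/api/v1/clusters/MYCLUSTER/services"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}

# Setting the desired state of every service to INSTALLED asks Ambari to stop them all.
payload = {
    "RequestInfo": {"context": "Stop all services before maintenance"},
    "Body": {"ServiceInfo": {"state": "INSTALLED"}},
}

resp = requests.put(AMBARI_URL, json=payload, headers=HEADERS, auth=AUTH)
print(resp.status_code, resp.text)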
Labels:
- Apache Ambari
01-18-2016
09:10 AM
1 Kudo
Hi All, we were able to solve the issue; it was a problem with the host names and IP addresses not being set up correctly. Thanks for your replies @Neeraj Sabharwal @Artem Ervits @pankaj singh