07-10-2017 10:24 AM
We are trying to read a Teradata table from Spark 2.0 over JDBC, using the following code:

import sys
import os

spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.1-src.zip'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/pyspark.zip'))
filename = os.path.join(spark_home, 'python/pyspark/shell.py')
print(os.environ.get('SPARK_HOME', None))
exec(compile(open(filename, "rb").read(), filename, 'exec'))

spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 2" in open(spark_release_file).read():
    print("Spark is there.")

argsstr = "--master yarn-client --deploy-mode cluster pyspark-shell --driver-class-path /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar --driver-library-path /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar --jars /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar"
pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", argsstr)
if "pyspark-shell" not in pyspark_submit_args:
    pyspark_submit_args += " pyspark-shell"
print(pyspark_submit_args)
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
os.environ["SPARK_SUBMIT_ARGS"] = pyspark_submit_args

from pyspark.sql import SQLContext
from pyspark import SparkConf, SparkContext

url = 'jdbc:teradata://teradata.server.com'
user = 'username'
password = ''
driver = 'com.teradata.jdbc.TeraDriver'
dbtable_read = 'mi_temp.bd_test_spark_read'

sqlContext = SQLContext(sc)
df = sqlContext.read.format("jdbc").options(url=url, user=user, password=password, driver=driver, dbtable=dbtable_read).load()

We get the following error:

Py4JJavaError: An error occurred while calling o48.load.
: java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)

However, if we run the same code via the command line, it works. Can you please give us some pointers?
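For comparison, a minimal sketch of the pattern we believe should work from a shell or notebook (paths and host are placeholders). Note that --jars takes a comma-separated list, while --driver-class-path is a classpath and is colon-separated on Linux, and that PYSPARK_SUBMIT_ARGS must be set before the first SparkContext is created, because the driver JVM classpath is fixed at launch:

import os

# Set the submit args before any SparkContext exists; the driver JVM
# picks up its classpath only at launch time.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master yarn --deploy-mode client "
    "--jars /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar "
    "--driver-class-path /path/to/teradata/terajdbc4.jar:/path/to/teradata/tdgssconfig.jar "
    "pyspark-shell"
)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("teradata-read-test").getOrCreate()
df = (spark.read.format("jdbc")
      .option("url", "jdbc:teradata://teradata.server.com")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "mi_temp.bd_test_spark_read")
      .option("user", "username")
      .option("password", "")
      .load())
df.show(5)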
07-20-2016 12:41 PM
Hi @deepak sharma, I am not sure where to look for this; can you please guide me? I can see some policies in the Ranger Access Manager. If I assign some users access there and try them in beeline, it works. E.g., in the above screenshot, the user beside ambari-qa is able to access (view tables, select, etc.) via the beeline console. However, on this page I am not able to connect to Hive. I am able to make a successful connection to YARN and HDFS.
07-19-2016 10:37 AM
I am getting this error when I try to check the connectivity to the repo. Interestingly, if I configure policies, those reflect back on the HDP cluster.
07-19-2016 09:31 AM
Hi @deepak sharma, no, it is not a secured cluster.
07-14-2016 03:29 PM
We enabled the Hive plugin for Ranger, but when I test the connection, it fails. /var/log/ranger/admin/xa_portal.log shows the following errors when I try to test the connection to Hive:

2016-07-14 17:24:32,006 [timed-executor-pool-0] INFO org.apache.ranger.plugin.client.BaseClient (BaseClient.java:104) - Init Login: security not enabled, using username
2016-07-14 17:24:32,006 [timed-executor-pool-0] INFO apache.ranger.services.hive.client.HiveClient (HiveClient.java:75) - Since Password is NOT provided, Trying to use UnSecure client with username and password
2016-07-14 17:24:32,077 [timed-executor-pool-0] ERROR apache.ranger.services.hive.client.HiveResourceMgr (HiveResourceMgr.java:51) - <== HiveResourceMgr.testConnection Error: org.apache.ranger.plugin.client.HadoopException: Unable to execute SQL [show databases like "*"].
2016-07-14 17:24:32,078 [timed-executor-pool-0] ERROR org.apache.ranger.services.hive.RangerServiceHive (RangerServiceHive.java:58) - <== RangerServiceHive.validateConfig Error:org.apache.ranger.plugin.client.HadoopException: Unable to execute SQL [show databases like "*"].
2016-07-14 17:24:32,078 [timed-executor-pool-0] ERROR org.apache.ranger.biz.ServiceMgr$TimedCallable (ServiceMgr.java:434) - TimedCallable.call: Error:org.apache.ranger.plugin.client.HadoopException: Unable to execute SQL [show databases like "*"].
2016-07-14 17:24:32,078 [http-bio-6080-exec-9] ERROR org.apache.ranger.biz.ServiceMgr (ServiceMgr.java:120) - ==> ServiceMgr.validateConfig Error:java.util.concurrent.ExecutionException: org.apache.ranger.plugin.client.HadoopException: Unable to execute SQL [show databases like "*"].
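As a sanity check outside Ranger, the same lookup query can be run directly against HiveServer2. A minimal sketch, assuming the PyHive package is installed and HiveServer2 listens on its default port 10000 (the hostname and username below are placeholders):

from pyhive import hive

# Connect the way Ranger's unsecure client does: a username, no password
conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="rangerlookup")
cursor = conn.cursor()
cursor.execute('show databases like "*"')  # the same query Ranger's test runs
print(cursor.fetchall())

If this fails as well, the problem is with HiveServer2 or the connection details rather than with Ranger itself.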
06-09-2016 08:48 AM
It was an edit to the /etc/hosts file on all the nodes. The hosts file was not set up correctly.
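For anyone who hits the same thing, a sketch of the /etc/hosts layout Hadoop expects on every node (addresses and names below are placeholders): one line per node with the IP address, the fully qualified hostname first, then the short name.

192.168.1.10   master1.example.com   master1
192.168.1.11   worker1.example.com   worker1
192.168.1.12   worker2.example.com   worker2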
06-07-2016 04:42 PM
Hi Hari, I am facing the same issue. Were you able to get it solved? I tried the method mentioned above, but the issue is not resolved.
02-03-2016 12:32 PM
@Neeraj Sabharwal I tried this option, but no success there yet.
02-03-2016 12:32 PM
@Artem Ervits @Neeraj Sabharwal I have noticed a few conflicting settings in yarn-site.xml: yarn.nodemanager.container-executor.class = org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor, and we don't have the same Linux users across the cluster. Hence we are waiting for the users to be created. Once that is done, I will test and post the result.
01-25-2016 03:47 PM
@Neeraj Sabharwal Deleting the directory makes the job work once, but afterwards it fails again.
01-25-2016 01:25 PM
I have tried that. The issue is that when a new folder is created, the permissions don't apply, hence the job starts failing. Some cleanup is not happening correctly, but I am unable to locate the issue 😞
01-25-2016 09:40 AM
@Artem Ervits The service checks run fine. Also, we have restarted the services many times, and the issue still persists. The umask value on all nodes is set to 0022. What are the mount options we should check?
01-22-2016 04:39 PM
I am trying to run a benchmark job with the following command:

yarn jar /path/to/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000 -resFile /tmp/TESTDFSio.txt

but the job fails with the following error messages:

16/01/22 15:08:47 INFO mapreduce.Job: Task Id : attempt_1453395961197_0017_m_000008_2, Status : FAILED
Application application_1453395961197_0017 initialization failed (exitCode=255) with output:
main : command provided 0
main : user is foo
main : requested yarn user is foo
Path /mnt/sdb1/yarn/local/usercache/foo/appcache/application_1453395961197_0017 has permission 700 but needs permission 750.
Path /var/hadoop/yarn/local/usercache/foo/appcache/application_1453395961197_0017 has permission 700 but needs permission 750.
Did not create any app directories

Even when I change these directories' permissions to 750, I get errors. Also, these caches don't get cleaned up after a job finishes, and they create collisions when running the next job. Any insights?
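For reference, a small sketch to spot the offending directories across the NodeManager local dirs (the two base paths below are taken from the error output above; adjust them to your yarn.nodemanager.local-dirs setting):

import os
import stat

# Base paths from yarn.nodemanager.local-dirs (taken from the error above)
LOCAL_DIRS = ["/mnt/sdb1/yarn/local", "/var/hadoop/yarn/local"]

for base in LOCAL_DIRS:
    usercache = os.path.join(base, "usercache")
    if not os.path.isdir(usercache):
        continue
    for user in os.listdir(usercache):
        appcache = os.path.join(usercache, user, "appcache")
        if not os.path.isdir(appcache):
            continue
        for app in os.listdir(appcache):
            path = os.path.join(appcache, app)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            # The container executor expects 750 on these app directories
            if mode != 0o750:
                print("%s has permission %o (expected 750)" % (path, mode))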
01-18-2016 09:18 AM
We have a recently built 5-node HDP cluster; it is not in HA mode. At some point the Unix team will need to apply patches and do server maintenance, which will require rebooting the server machines. What is the best way to do it? I plan to do the following:
1) Shut down all services using Ambari.
2) Shut down the ambari-agents on all nodes.
3) Shut down the ambari-server.
4) Reboot all nodes as required.
5) Restart the ambari-server, the agents, and the services, in that order.
Is this the correct sequence, or am I missing anything?
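For steps 2, 3, and 5, the stock Ambari commands should cover it (a sketch; run the agent commands on every node and the server commands on the Ambari host):

ambari-agent stop      # on each node, before the reboot
ambari-server stop     # on the Ambari host
# ... reboot the machines ...
ambari-server start    # on the Ambari host first
ambari-agent start     # then on each node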
01-18-2016 09:10 AM
Hi all, we were able to solve the issue; it was the hostnames and IP addresses not being set correctly. Thanks for your replies @Neeraj Sabharwal @Artem Ervits @pankaj singh
01-12-2016 10:32 AM
We are trying to install HDP via Ambari 2.1, but during the install process the Ambari server (ambari-server.log) reports that it has lost the heartbeat of the agent. Error message:

Heartbeat lost from host amabri.agent.com

The ambari-agent log reports:

Failed to connect to https://amabri-server.com:8440/connection_info due to [Errno 111] Connection refused

We are using OpenJDK 1.7 on RHEL 6.6 64-bit. Any pointers to the issue would help immensely.
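A quick way to check whether the server registration port is reachable from an agent node (a minimal sketch; the hostname and port 8440 are taken from the error above):

import socket

# Agents register with the Ambari server on port 8440; [Errno 111] usually
# means nothing is listening there or a firewall is rejecting the connection.
try:
    sock = socket.create_connection(("amabri-server.com", 8440), timeout=5)
    print("Port 8440 is reachable")
    sock.close()
except socket.error as exc:
    print("Cannot reach port 8440: %s" % exc)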