Member since: 04-10-2018
Posts: 9
Kudos Received: 0
Solutions: 0
08-29-2018
05:09 PM
I solved it by restarting IntelliJ; the dependency libraryDependencies += "org.apache.oozie" % "oozie-client" % "5.0.0" is now resolved in the project.
08-28-2018
10:53 PM
Hello all
I would like some help finding information, tutorials, blog posts, etc. about how to invoke the Oozie REST API in a Kerberized cluster. I am able to invoke the Oozie REST API from the shell with this command:
curl -i --negotiate -u : dataloaderusr -X POST -H "Content-Type: application/xml" -d @/tmp/workflow.xml http://hdfshadoop:11000/oozie/v2/jobs?action=start
I am trying to make the same request from my app, which is built with Play Framework 2.6.15. While searching for information I found two libraries: 1) pac4j, and 2) org.apache.oozie.client.AuthOozieClient. But I did not find examples of how to make that request. I think it will be easier with AuthOozieClient, but I have not been able to add the dependency with sbt. I have this:
libraryDependencies += "org.apache.oozie" % "oozie-core" % "5.0.0"
libraryDependencies += "org.apache.oozie" % "oozie-client" % "5.0.0"
class OozieWrapper {
  var wc: AuthOozieClient = _
  var OOZIE_SERVER_URL = "http://localhost:11000/oozie"
  def OozieWrapper(oozieUrlStr: String) = {
    val oozieURL = new URL(oozieUrlStr)
    wc = new AuthOozieClient(oozieURL.toString)
  }
}
I am getting this error:
OozieWrapper.scala:15:14: not found: type AuthOozieClient
I think this should be a very popular topic, but I cannot find examples, information, or documentation about how to do it. Some help would be great. Thank you so much.
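For reference, below is a minimal sketch of what a programmatic submission through AuthOozieClient could look like once the dependency resolves. It assumes the process already holds a valid Kerberos ticket (the same prerequisite as curl --negotiate); the object name and the HDFS application path are hypothetical placeholders, not taken from the original post.

import java.util.Properties
import org.apache.oozie.client.{AuthOozieClient, OozieClient}

// Minimal sketch, assuming oozie-client 5.0.0 is on the classpath and a valid
// Kerberos TGT is already in the ticket cache (e.g. obtained with kinit).
object OozieKerberosSubmitSketch {
  def main(args: Array[String]): Unit = {
    // "KERBEROS" selects SPNEGO authentication, the programmatic analogue of curl --negotiate
    val client = new AuthOozieClient("http://hdfshadoop:11000/oozie", "KERBEROS")

    val conf: Properties = client.createConfiguration()
    conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/path/to/workflow-dir") // hypothetical path
    conf.setProperty("user.name", "dataloaderusr")

    // run() submits and starts the workflow, like POST /v2/jobs?action=start
    val jobId = client.run(conf)
    println(s"Submitted Oozie job $jobId")
  }
}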
Labels:
- Apache Oozie
07-17-2018
04:25 PM
The problem was solved by adding the following line to assembly.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
and, in build.sbt:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.2" % "provided"
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.6.2" % "provided"
Thank you so much.
04-27-2018
10:18 PM
Hello all,
I am having problems running an Oozie Spark action; with spark-submit it runs OK. This is the command:
spark-submit --driver-class-path=/filesystem/path/terajdbc4.jar:/filesystem/path/tdgssconfig.jar --class myapp.MainClass my-app-1.jar
Now I created the Oozie action, which is:
<action name="cargarTD">
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>/some_path/generic/hive-site.xml</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>oozie.use.system.libpath</name>
<value>true</value>
</property>
<property>
<name>oozie.libpath</name>
<value>/some_other_path/oozie/share/lib/lib_20170127130322/hive</value>
</property>
</configuration>
<master>yarn-master</master>
<mode>cluster</mode>
<name>Test spark oozie</name>
<class>myapp.MainClass</class>
<jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/my-app-1.jar</jar>
<spark-opts>--driver-class-path hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar: --conf "spark.executor.extraClassPath=hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar" --conf "spark.driver.extraClassPath=hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar"</spark-opts>
</spark>
<ok to="End"/>
<error to="Kill"/>
</action>
I run it and get this error:
18/04/27 16:50:58 INFO ExecutorRunnable:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar spark.driver.extraClassPath=hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/current/hadoop-client/*<CPS>/usr/hdp/current/hadoop-client/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.5.0.0-1245/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-1245.jar:/etc/hadoop/conf/secure
SPARK_YARN_CACHE_ARCHIVES -> hdfs://mycompanyhdfs/user/dataloaderusr/.sparkStaging/application_1524487536165_7870/__spark_conf__1770230405461321022.zip#__spark_conf__
SPARK_LOG_URL_STDERR -> http://probighhww006:8042/node/containerlogs/container_e321_1524487536165_7870_01_000003/dataloaderusr/stderr?start=-4096
SPARK_YARN_CACHE_FILES_FILE_SIZES -> 188727178,39194
SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1524487536165_7870
SPARK_USER -> dataloaderusr
SPARK_YARN_CACHE_ARCHIVES_FILE_SIZES -> 141577
SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PRIVATE
SPARK_YARN_CACHE_ARCHIVES_TIME_STAMPS -> 1524865844229
SPARK_YARN_MODE -> true
SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1485286912562,1524773043221
SPARK_LOG_URL_STDOUT -> http://probighhww006:8042/node/containerlogs/container_e321_1524487536165_7870_01_000003/dataloaderusr/stdout?start=-4096
SPARK_YARN_CACHE_ARCHIVES_VISIBILITIES -> PRIVATE
SPARK_YARN_CACHE_FILES -> hdfs://mycompanyhdfs/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar#__spark__.jar,hdfs://mycompanyhdfs/myAppHdfsPath/jar/my-app-1.jar.jar#__app__.jar
command:
{{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m '-Dlog4j.configuration=spark-log4j.properties' -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.driver.port=46512' '-Dspark.ui.port=0' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@30.30.30.27:46512 --executor-id 2 --hostname probighhww006 --cores 1 --app-id application_1524487536165_7870 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
18/04/27 16:50:58 INFO ContainerManagementProtocolProxy: Opening proxy : probighhww006:45454
18/04/27 16:51:01 INFO AMRMClientImpl: Received new token for : probighhww003:45454
18/04/27 16:51:01 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 0 of them.
18/04/27 16:51:02 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (probighhww012:34654) with ID 1
18/04/27 16:51:02 INFO BlockManagerMasterEndpoint: Registering block manager probighhww012:39342 with 511.1 MB RAM, BlockManagerId(1, probighhww012, 39342)
18/04/27 16:51:03 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (probighhww006:41530) with ID 2
18/04/27 16:51:03 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/04/27 16:51:03 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
I am not sure why the driver is not on the executor classpath. I would rather not put the Teradata JDBC jar on the filesystem of every node. I have tried with <mode>client</mode>, but nothing changed. Does anyone know what could be happening? Is there an option to configure it to run only on the master node, so that I can put the jars in a filesystem folder of just that node? Thank you so much.
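As a diagnostic idea (not from the original thread): a tiny Spark job like the sketch below can confirm whether com.teradata.jdbc.TeraDriver is actually visible from the executors' classloaders, independently of the application logic. Object and app names are illustrative.

import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Try

// Diagnostic sketch only: runs one trivial task per default partition and
// reports how many of them could load the Teradata JDBC driver class.
object ExecutorDriverCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("executor-driver-check"))
    val results = sc.parallelize(1 to sc.defaultParallelism, sc.defaultParallelism)
      .map(_ => Try(Class.forName("com.teradata.jdbc.TeraDriver")).isSuccess)
      .collect()
    println(s"TeraDriver visible in ${results.count(identity)} of ${results.length} tasks")
    sc.stop()
  }
}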
Labels:
- Apache Oozie
- Apache Spark
- Apache YARN
04-17-2018
07:32 PM
Thank you for your help. I have changed to --output-method batch.insert and it runs OK, but it takes too long (all day). I am looking for alternatives to load 20 million rows. I am trying to create a Spark/Scala/Sqoop script so that, using JDBC and Sqoop, it performs a parallel load into multiple tables, exporting all the partitions of the table. On the Teradata side I will create a view to query all the exported tables. I will also try FastLoad for CSV files using the JDBC driver instead of Sqoop. Thank you.
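In case it helps, here is a minimal sketch of the parallel JDBC export idea described above, assuming Spark 1.6 with a HiveContext and the Teradata JDBC driver available on the driver and executor classpath; the table names, connection URL, and partition count are illustrative placeholders, not from the original post.

import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch: read a Hive table and write it to Teradata over JDBC, using the
// number of partitions to control how many connections write concurrently.
object HiveToTeradataSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-to-teradata"))
    val hive = new HiveContext(sc)

    val props = new Properties()
    props.setProperty("user", "dwh_tbda")
    props.setProperty("password", "********") // do not hard-code credentials in real jobs
    props.setProperty("driver", "com.teradata.jdbc.TeraDriver")

    hive.table("source_db.source_table")  // hypothetical Hive table
      .repartition(20)                    // ~20 concurrent JDBC writers, like --num-mappers 20
      .write
      .jdbc("jdbc:teradata://xxx.xxx.xxx.xxx/teradata_database", "teradata_table", props)

    sc.stop()
  }
}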
04-12-2018
10:02 PM
Hello all,
We are trying to export from Hive to Teradata using this command:
sqoop export -Dorg.apache.sqoop.export.text.dump_data_on_error=true -Dhadoop.security.logger=DEBUG,NullAppender -Dhadoop.root.logger=DEBUG,console -Dsqoop.export.records.per.statement=20 -Dsqoop.export.statements.per.transaction=20 \
--driver com.teradata.jdbc.TeraDriver \
--connect jdbc:teradata://xxx.xxx.xxx.xxx/teradata_database \
--username dwh_tbda \
--password Teradata_2017 \
--table teratada_table \
--export-dir "hdfs_file" \
--input-null-non-string '\\N' \
--input-null-string '\\N' \
--input-fields-terminated-by '|' \
--num-mappers 20 \
--verbose \
--direct \
-- --output-method internal.fastload
It works OK sometimes, but the results are random: sometimes it exports all the records, sometimes 0 records, or any value between 0 and the total. The YARN log looks like this:
0b1f030 sess=0 java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF$ConnectThread.run(TDNetworkIOIF.java:1229)
at org.apache.sqoop.mapreduce.ExportOutputFormat.getRecordWriter(ExportOutputFormat.java:79)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.sql.SQLException: [Teradata JDBC Driver] [TeraJDBC 16.00.00.23] [Error 1277] [SQLState 08S01] Login timeout for Connection to 10.80.4.20 Wed Apr 11 16:39:10 COT 2018 socket orig=10.80.4.20 cid=50b1f030 sess=0 java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF$ConnectThread.run(TDNetworkIOIF.java:1229)
Can someone suggest a reason that explains what is happening? Thank you.
Labels:
- Apache Sqoop