Member since 10-07-2015
107 Posts
73 Kudos Received
23 Solutions
05-03-2017
02:09 PM
Maybe check whether you can access WebHDFS via Knox to see if your kinit user is accepted by Knox
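For example, a hedged sketch of such a test (gateway host and the topology name "default" are placeholders; use --negotiate with an empty -u only if the Knox topology authenticates via SPNEGO/Kerberos, otherwise plain basic auth):
# with a valid ticket from kinit (SPNEGO-enabled topology)
curl -i -k --negotiate -u : "https://<knox-gateway>:8443/gateway/default/webhdfs/v1/?op=GETFILESTATUS"
# or, for a topology using basic authentication
curl -i -k -u '<user>:<password>' "https://<knox-gateway>:8443/gateway/default/webhdfs/v1/?op=GETFILESTATUS"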
02-22-2017
08:16 AM
What we see in a kerberized cluster is that Livy needs to be able to impersonate other users. Then, when Knox forwards the request to Livy with "doAs=<authc-user>", Livy starts the job as the authenticated user. To be on the safe side, the Knox rewrite rule also replaces the proxyUser with the authenticated user.
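A hedged sketch of the impersonation settings this typically requires (property names as documented for Livy and Hadoop; the wildcard values are placeholders to be narrowed per your security policy):
# livy.conf (in Ambari: Spark > Advanced livy-conf)
livy.impersonation.enabled = true
# core-site.xml, so that HDFS/YARN accept livy as a proxy user
hadoop.proxyuser.livy.groups = *
hadoop.proxyuser.livy.hosts = *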
12-08-2016
06:25 PM
2 Kudos
Configure Livy in Ambari
Until https://github.com/jupyter-incubator/sparkmagic/issues/285 is fixed, set
livy.server.csrf_protection.enabled ==> false
in Ambari under Spark Config - Advanced livy-conf.
Install Sparkmagic
For details, see
https://github.com/jupyter-incubator/sparkmagic
Install Jupyter, if you don't already have it:
$ sudo -H pip install jupyter notebook ipython
Install Sparkmagic:
$ sudo -H pip install sparkmagic
Install Kernels:
$ pip show sparkmagic # check path, e.g /usr/local/lib/python2.7/site-packages
$ cd /usr/local/lib/python2.7/site-packages
$ jupyter-kernelspec install --user sparkmagic/kernels/sparkkernel
$ jupyter-kernelspec install --user sparkmagic/kernels/pysparkkernel
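Optionally verify that both kernels are registered:
$ jupyter-kernelspec list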
Install Sparkmagic widgets
$ sudo -H jupyter nbextension enable --py --sys-prefix widgetsnbextension
Create local Configuration
The configuration file is a json file stored under
~/.sparkmagic/config.json
To avoid timeouts connecting to HDP 2.5 it is important to add
"livy_server_heartbeat_timeout_seconds": 0
To ensure the Spark job will run on the cluster (the Livy default is local),
spark.master needs to be set to yarn-cluster. Therefore a conf object needs to be provided (here you can also add extra jars for the session):
"session_configs": {
"driverMemory": "2G",
"executorCores": 4,
"executorMemory": "8G",
"proxyUser": "bernhard",
"conf": {
"spark.master": "yarn-cluster",
"spark.jars.packages": "com.databricks:spark-csv_2.10:1.5.0"
}
}
The proxyUser is the user the Livy session will run under.
Here is an example config.json; adapt it and copy it to ~/.sparkmagic.
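A minimal sketch (the Livy URL and credentials are placeholders; the kernel_*_credentials keys follow the Sparkmagic documentation):
{
  "kernel_python_credentials": {
    "username": "",
    "password": "",
    "url": "http://<livy-server>:8998"
  },
  "kernel_scala_credentials": {
    "username": "",
    "password": "",
    "url": "http://<livy-server>:8998"
  },
  "livy_server_heartbeat_timeout_seconds": 0,
  "session_configs": {
    "driverMemory": "2G",
    "executorCores": 4,
    "executorMemory": "8G",
    "proxyUser": "bernhard",
    "conf": {
      "spark.master": "yarn-cluster",
      "spark.jars.packages": "com.databricks:spark-csv_2.10:1.5.0"
    }
  }
}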
Start Jupyter Notebooks
1) Start Jupyter:
$ cd <project-dir>
$ jupyter notebook
In Notebook Home select
New -> Spark or New -> PySpark or New -> Python
2) Load Sparkmagic:
Add this to your notebook after the kernel has started:
In[ ]: %load_ext sparkmagic.magics
3) Create Endpoint
In[ ]: %manage_spark
This will open a connection widget
Username and password can be ignored on non-secured clusters.
4) Create a session:
When this is successful, create a session:
Note that it uses the created endpoint and, under properties, the configuration from config.json.
When you see "Spark session is successfully started", the session is ready to use.
Notes
Livy on HDP 2.5 currently does not return YARN Application ID
Jupyter session name provided under Create Session is notebook internal and not used by Livy Server on the cluster. Livy-Server will create sessions on YARN called livy-session-###, e.g. livy-session-10. The session in Jupyter will have session id ###, e.g. 10.
For multiline Scala code in the Notebook you have to add the dot at the end, as in
val df = sqlContext.read.
format("com.databricks.spark.csv").
option("header", "true").
option("inferSchema", "true").
load("/tmp/iris.csv")
For more details and example notebooks in Sparkmagic, see https://github.com/bernhard-42/Sparkmagic-on-HDP
Credits
Thanks to Alex (@azeltov) for the discussions and debugging session
12-08-2016
06:25 PM
7 Kudos
Update 10.12.2016: Added filter to rewrite proxyUser as authenticated user
Update 25.01.2017: Improved service.xml and rewrite.xml
1 Configure a new service for Livy Server in Knox
1.1 Create a service definition
$ sudo mkdir -p /usr/hdp/current/knox-server/data/services/livy/0.1.0/
$ sudo chown -R knox:knox /usr/hdp/current/knox-server/data/services/livy
Create a file
/usr/hdp/current/knox-server/data/services/livy/0.1.0/service.xml with
<service role="LIVYSERVER" name="livy" version="0.1.0">
<routes>
<route path="/livy/v1/sessions">
<rewrite apply="LIVYSERVER/livy/addusername/inbound" to="request.body"/>
</route>
<route path="/livy/v1/**?**"/>
<route path="/livy/v1"/>
<route path="/livy/v1/"/>
</routes>
</service>
Note that the name="livy" attribute and the path .../services/livy/... need to be the same. The route /livy/v1/sessions is a special treatment for the POST request that creates a Livy session. The request body looks, for example, like this: {"driverMemory":"2G","executorCores":4,"executorMemory":"8G","proxyUser":"bernhard","conf":{"spark.master":"yarn-cluster","spark.jars.packages":"com.databricks:spark-csv_2.10:1.5.0"}}
The Livy server will use proxyUser to run the Spark session. To prevent a user from supplying an arbitrary (e.g. more privileged) user here, Knox needs to rewrite the JSON body and replace whatever the value of proxyUser is with the username of the authenticated user, see the next section.
1.2 Create a rewrite rule definition
Create a file
/usr/hdp/current/knox-server/data/services/livy/0.1.0/rewrite.xml with
<rules>
<rule name="LIVYSERVER/livy/user-name">
<rewrite template="{$username}"/>
</rule>
<rule dir="IN" name="LIVYSERVER/livy/root/inbound" pattern="*://*:*/**/livy/v1">
<rewrite template="{$serviceUrl[LIVYSERVER]}"/>
</rule>
<rule dir="IN" name="LIVYSERVER/livy/path/inbound" pattern="*://*:*/**/livy/v1/{path=**}?{**}">
<rewrite template="{$serviceUrl[LIVYSERVER]}/{path=**}?{**}"/>
</rule>
<filter name="LIVYSERVER/livy/addusername/inbound">
<content type="*/json">
<apply path="$.proxyUser" rule="LIVYSERVER/livy/user-name"/>
</content>
</filter>
</rules>
Note: The "v1" is only introduced to allow calls to the Livy server without any path. (It seems to be a limitation in Knox that at least one path element needs to be present in the mapped URL.) The rule LIVYSERVER/livy/user-name and the filter LIVYSERVER/livy/addusername/inbound are responsible for injecting the authenticated user name as described in the last section.
1.3 Publish the new service via Ambari
Go to
Knox Configuration and add at the end of Advanced Topology:
<topology>
...
<service>
<role>LIVYSERVER</role>
<url>http://<livy-server>:8998</url>
</service>
</topology>
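To verify the new route before wiring up a client, a quick hedged test (the exact prefix depends on how your topology is deployed, e.g. it may require /gateway/default in front of /livy/v1; credentials per your Knox authentication provider):
curl -i -k -u '<user>:<password>' "https://<knox-gateway>:8443/livy/v1/sessions"
# should return the Livy session list, e.g. {"from":0,"total":0,"sessions":[]}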
2 Use with Sparkmagic
Sparkmagic can be configured to use Livy Server in HDP 2.5, see Using Jupyter with Sparkmagic and Livy Server on HDP 2.5
To connect via Knox, change the endpoint definition in
%manage_spark
Just replace the Livy server URL with the Knox gateway URL:
https://<knox-gateway>:8443/livy/v1
If Knox does not have a valid certificate for HTTPS requests, reconfigure Sparkmagic's config.json and set
"ignore_ssl_errors": true
Credits
Thanks to Kevin Minder for the article "Adding a service to Apache Knox".
07-07-2016
06:17 PM
4 Kudos
In order to submit jobs to Spark, so-called "fat jars" (containing all dependencies) are quite useful. If you develop your code in Scala,
"sbt"
(http://www.scala-sbt.org) is a great choice to build your project. The following relies on the newest version, sbt 0.13.
For fat jars you first need
"sbt-assembly" (https://github.com/sbt/sbt-assembly). Assuming you have the standard sbt folder structure, the easiest way is to add a file "assembly.sbt"
into the "project" folder containing one line
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
The project structure now looks like (most probably without the "target" folder which will be created upon building the project)
MyProject
+-- build.sbt
+-- project
| +-- assembly.sbt
+-- src
| +-- main
| +-- scala
| +-- MyProject.scala
+-- target
For building Spark Kafka Streaming jobs on HDP 2.4.2, this is the build file
"build.sbt"
name := "MyProject"
version := "0.1"
scalaVersion := "2.10.6"
resolvers += "Hortonworks Repository" at "http://repo.hortonworks.com/content/repositories/releases/"
resolvers += "Hortonworks Jetty Maven Repository" at "http://repo.hortonworks.com/content/repositories/jetty-hadoop/"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-streaming_2.10" % "1.6.1.2.4.2.0-258" % "provided",
"org.apache.spark" % "spark-streaming-kafka-assembly_2.10" % "1.6.1.2.4.2.0-258"
)
assemblyMergeStrategy in assembly := {
case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
case PathList("com", "squareup", xs @ _*) => MergeStrategy.last
case PathList("com", "sun", xs @ _*) => MergeStrategy.last
case PathList("com", "thoughtworks", xs @ _*) => MergeStrategy.last
case PathList("commons-beanutils", xs @ _*) => MergeStrategy.last
case PathList("commons-cli", xs @ _*) => MergeStrategy.last
case PathList("commons-collections", xs @ _*) => MergeStrategy.last
case PathList("commons-io", xs @ _*) => MergeStrategy.last
case PathList("io", "netty", xs @ _*) => MergeStrategy.last
case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
case PathList("javax", "xml", xs @ _*) => MergeStrategy.last
case PathList("org", "apache", xs @ _*) => MergeStrategy.last
case PathList("org", "codehaus", xs @ _*) => MergeStrategy.last
case PathList("org", "fusesource", xs @ _*) => MergeStrategy.last
case PathList("org", "mortbay", xs @ _*) => MergeStrategy.last
case PathList("org", "tukaani", xs @ _*) => MergeStrategy.last
case PathList("xerces", xs @ _*) => MergeStrategy.last
case PathList("xmlenc", xs @ _*) => MergeStrategy.last
case "about.html" => MergeStrategy.rename
case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
case "META-INF/mailcap" => MergeStrategy.last
case "META-INF/mimetypes.default" => MergeStrategy.last
case "plugin.properties" => MergeStrategy.last
case "log4j.properties" => MergeStrategy.last
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
1) The
"resolvers" section adds the Hortonworks repositories.
2) In
"libraryDependencies" you add Spark-Streaming (which will also load Spark-Core) and Spark-Kafka-Streaming jars. To avoid problems with Kafka dependencies it is best to use the "spark-streaming-kafka-assembly" fat jar.
Note that Spark-Streaming can be tagged as
"provided" (it is omitted from the jat jar), since it is automatically available when you submit a job .
3) Unfortunately a lot of libraries are imported twice due to the dependencies, which leads to assembly errors. To overcome the issue, the
"assemblyMergeStrategy" section tells sbt-assembly to always use the last one (which is from the Spark jars). This list is handcrafted and might change in a new version of HDP. However, the idea should be clear.
4) Assemble the project (the first time you call it, it will "download the internet" like Maven):
sbt assembly
will create "target/scala-2.10/myproject-assembly-0.1.jar".
5) You can now submit it to Spark:
spark-submit --master yarn --deploy-mode client \
--class my.package.MyProject target/scala-2.10/myproject-assembly-0.1.jar
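For reference, a minimal src/main/scala/MyProject.scala that matches the dependencies above might look like this (a sketch only; broker address and topic name are placeholders):
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MyProject {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyProject")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Direct stream against the Kafka brokers (placeholder host:port and topic)
    val kafkaParams = Map("metadata.broker.list" -> "<<kafka-broker>>:6667")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("<<topic>>"))

    // Print a few messages of each batch to the driver log
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}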
04-18-2016
05:52 PM
5 Kudos
This is an extension of Starting Spark jobs directly via YARN REST API. Assume the cluster is kerberized and the only access is via Knox. Further assume that Knox uses Basic Authentication and we have the user and password of the user that should start the Spark job. The overall idea is to call curl with <<user>>:<<password>> as Basic Authentication
==> Knox (verifying user:password against LDAP or AD)
==> Resource Manager (YARN REST API) with a Kerberos principal
1) Create and distribute a keytab
The user that should run the Spark job also needs a Kerberos principal. For this principal, create a keytab on one machine:
[root@KDC-HOST ~]$ kadmin
kadmin> xst -k /etc/security/keytabs/<<primaryName>>.keytab <<primaryName>>@<<REALM>>
Use your REALM and an appropriate primaryName for your principal. Then distribute this keytab to all other machines in the cluster, copy it to /etc/security/keytabs and set permissions:
[root@CLUSTER-HOST ~]$ chown <<user>>:hadoop /etc/security/keytabs/<<primaryName>>.keytab
[root@CLUSTER-HOST ~]$ chmod 400 /etc/security/keytabs/<<primaryName>>.keytab
Test the keytab on each machine:
[root@CLUSTER-HOST ~]$ kinit <<primaryName>>@<<REALM>> \
-k -t /etc/security/keytabs/<<primaryName>>.keytab
# There must be no password prompt!
[root@KDC-HOST ~]$ klist -l
# Principal name Cache name
# -------------- ----------
# <<primaryName>>@<<REALM>> FILE:/tmp/krb5cc_<<####>>
2) Test the connection from the workstation outside the cluster
a) HDFS:
[MacBook simple-project]$ curl -s -k -u '<<user>>:<<password>>' \
https://$KNOX_SERVER:8443/gateway/default/webhdfs/v1/?op=GETFILESTATUS
# {
# "FileStatus": {
# "accessTime": 0,
# "blockSize": 0,
# "childrenNum": 9,
# "fileId": 16385,
# "group": "hdfs",
# "length": 0,
# "modificationTime": 1458070072105,
# "owner": "hdfs",
# "pathSuffix": "",
# "permission": "755",
# "replication": 0,
# "storagePolicy": 0,
# "type": "DIRECTORY"
# }
# }
b) YARN:
[MacBook simple-project]$ curl -s -k -u '<<user>>:<<password>>' -d '' \
https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps/new-application
# {
# "application-id": "application_1460654399208_0004",
# "maximum-resource-capability": {
# "memory": 8192,
# "vCores": 3
# }
# }
3) Changes to spark-yarn.properties
The following values need to be changed or added compared to Starting Spark jobs directly via YARN REST API:
spark.history.kerberos.keytab=/etc/security/keytabs/spark.headless.keytabs
spark.history.kerberos.principal=spark-Demo@<<REALM>>
spark.yarn.keytab=/etc/security/keytabs/<<primaryName>>.keytab
spark.yarn.principal=<<primaryName>>@<<REALM>>
4) Changes to spark-yarn.json
The following properties need to be added to the command attribute (before org.apache.spark.deploy.yarn.ApplicationMaster) compared to Starting Spark jobs directly via YARN REST API:
-Dspark.yarn.keytab=/etc/security/keytabs/<<primaryName>>.keytab \
-Dspark.yarn.principal=<<primaryName>>@<<REALM>> \
-Dspark.yarn.credentials.file=hdfs://<<name-node>>:8020/tmp/simple-project/credentials_4b023f93-fbde-48ff-b2c8-516251aeed52 \
-Dspark.history.kerberos.keytab=/etc/security/keytabs/spark.headless.keytabs \
-Dspark.history.kerberos.principal=spark-Demo@<<REALM>> \
-Dspark.history.kerberos.enabled=true
credentials_4b023f93-fbde-48ff-b2c8-516251aeed52 is just a unique filename and the file does not need to exist; concatenate "credentials" with a UUID4. This is the trigger for Spark to start a delegation token refresh thread. For details, see the attachment.
5) Submit a job
Same as in Starting Spark jobs directly via YARN REST API, however one needs to provide -u <<user>>:<<password>> to the curl command to authenticate with Knox (see the sketch at the end of this article). After being authenticated by Knox, the keytabs for the following steps will be taken by YARN and Spark from the properties and job json files.
6) Known issue
After finishing the job successfully, the log aggregation status will continue to be "RUNNING" until it gets a "TIME_OUT".
7) More details
Again, more details and a python script to ease the whole process can be found in the Spark-Yarn-REST-API repo. Any comment to make this process easier is highly appreciated ...
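For step 5, a hedged sketch of the two calls through Knox (gateway URL and the topology name "default" follow the companion article; spark-yarn.json is the job file created there):
# request a new application id (Basic Authentication against Knox)
curl -s -k -u '<<user>>:<<password>>' -d '' \
  https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps/new-application
# submit the job description (note the @ so curl reads the file contents)
curl -s -k -i -u '<<user>>:<<password>>' -X POST -H "Content-Type: application/json" \
  --data-binary @spark-yarn.json \
  https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps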
04-18-2016
05:20 PM
12 Kudos
There are situations when one might want to submit a Spark job via a REST API:
If you want to submit Spark jobs from your IDE on your workstation outside the cluster
If the cluster can only be accessed via Knox (perimeter security)
One possibility is to use the Oozie REST API and the Oozie Spark action.
However, this article looks into the option of using the YARN REST API directly. Starting with the Cluster Applications API I tried to come up with an approach that resembles the spark-submit command.
1) Copy Spark assembly jar to HDFS
By default the Spark assembly jar file is not available in HDFS, but we will need it there for remote access. Some standard locations in HDP are:
HDP 2.3.2:
Version: 2.3.2.0-2950
Spark Jar: /usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
HDP 2.4.0:
Version: 2.4.0.0-169
Spark Jar: /usr/hdp/2.4.0.0-169/spark/lib/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
This is a one time preparation step, for example for HDP 2.4 it would be:
sudo su - hdfs
HDP_VERSION=2.4.0.0-169
SPARK_JAR=spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
hdfs dfs -mkdir "/hdp/apps/${HDP_VERSION}/spark/"
hdfs dfs -put "/usr/hdp/${HDP_VERSION}/spark/lib/$SPARK_JAR" "/hdp/apps/${HDP_VERSION}/spark/spark-hdp-assembly.jar"
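To optionally verify the upload (a trivial check while the variables above are still set):
hdfs dfs -ls "/hdp/apps/${HDP_VERSION}/spark/"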
2) Upload your spark application jar file to HDFS
Upload your Spark application jar file packaged by sbt to the project folder in HDFS via WebHDFS (maybe use something better than "/tmp"):
export APP_FILE=simple-project_2.10-1.0.jar
curl -X PUT "${WEBHDFS_HOST}:50070/webhdfs/v1/tmp/simple-project?op=MKDIRS"
curl -i -X PUT "${WEBHDFS_HOST}:50070/webhdfs/v1/tmp/simple-project/${APP_FILE}?op=CREATE&overwrite=true"
# take Location header from the response and issue a PUT request
LOCATION="http://..."
curl -i -X PUT -T "target/scala-2.10/${APP_FILE}" "${LOCATION}"
3) Create spark property file and upload to HDFS
spark.yarn.submit.file.replication=3
spark.yarn.executor.memoryOverhead=384
spark.yarn.driver.memoryOverhead=384
spark.master=yarn
spark.submit.deployMode=cluster
spark.eventLog.enabled=true
spark.yarn.scheduler.heartbeat.interval-ms=5000
spark.yarn.preserve.staging.files=true
spark.yarn.queue=default
spark.yarn.containerLauncherMaxThreads=25
spark.yarn.max.executor.failures=3
spark.executor.instances=2
spark.eventLog.dir=hdfs\:///spark-history
spark.history.kerberos.enabled=true
spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port=18080
spark.history.fs.logDirectory=hdfs\:///spark-history
spark.executor.memory=2G
spark.executor.cores=2
spark.history.kerberos.keytab=none
spark.history.kerberos.principal=none
and upload it via WebHDFS as spark-yarn.properties to your simple-project folder as before.
4) Create a Spark Job json file
a) We need to construct the command to start the Spark ApplicationMaster:
java -server -Xmx1024m -Dhdp.version=2.4.0.0-169 \
-Dspark.yarn.app.container.log.dir=/hadoop/yarn/log/rest-api \
-Dspark.app.name=SimpleProject \
org.apache.spark.deploy.yarn.ApplicationMaster \
--class IrisApp --jar __app__.jar \
--arg '--class' --arg 'SimpleProject' \
1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
It is important to provide the Spark application name and the HDP version. Spark will resolve <LOG_DIR>.
b) We need to set some general environment variables:
JAVA_HOME="/usr/jdk64/jdk1.8.0_60/"
SPARK_YARN_MODE=true
HDP_VERSION="2.4.0.0-169"
Then we need to tell Spark which files to distribute across all Spark executors. Therefore we need to set 4 variables. One variable is of the format "<hdfs path1>#<cache name 1>,<hdfs path2>#<cache name 2>, ...", and the three others contain comma-separated timestamps, file sizes and visibility of each file (same order):
SPARK_YARN_CACHE_FILES: "hdfs://<<name-node>>:8020/tmp/simple-project/simple-project.jar#__app__.jar,hdfs://<<name-node>>:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar#__spark__.jar"
SPARK_YARN_CACHE_FILES_FILE_SIZES: "10588,191724610"
SPARK_YARN_CACHE_FILES_TIME_STAMPS: "1460990579987,1460219553714"
SPARK_YARN_CACHE_FILES_VISIBILITIES: "PUBLIC,PRIVATE"
Replace <<name-node>> with the correct address. File size and timestamp can be retrieved from HDFS via WebHDFS.
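For example (a hedged sketch; the jar path matches the cache entry above, and the length and modificationTime fields of the response provide the size and timestamp):
curl -s "${WEBHDFS_HOST}:50070/webhdfs/v1/tmp/simple-project/simple-project.jar?op=GETFILESTATUS"
# use FileStatus.length for SPARK_YARN_CACHE_FILES_FILE_SIZES
# use FileStatus.modificationTime for SPARK_YARN_CACHE_FILES_TIME_STAMPS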
Next, construct the classpath
CLASSPATH="{{PWD}}<CPS>__spark__.jar<CPS>{{PWD}}/__app__.jar<CPS>{{PWD}}/__app__.properties<CPS>{{HADOOP_CONF_DIR}}<CPS>/usr/hdp/current/hadoop-client/*<CPS>/usr/hdp/current/hadoop-client/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/common/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/common/lib/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/yarn/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/yarn/lib/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/hdfs/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/hdfs/lib/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/tools/lib/*<CPS>/usr/hdp/2.4.0.0-169/hadoop/lib/hadoop-lzo-0.6.0.2.4.0.0-169.jar<CPS>/etc/hadoop/conf/secure<CPS>"
Notes:
- __spark__.jar and __app__.jar are the same as provided in SPARK_YARN_CACHE_FILES
- Spark will resolve <CPS> to `:`
c) Create the Spark job json file
The information above will be added to the Spark json file as the command and environment attributes (for details see the attachment - remove the .txt ending).
The last missing piece is the so-called local_resources section, which describes all files in HDFS necessary for the Spark job:
- Spark assembly jar (as in the caching environment variable)
- Spark application jar for this project (as in the caching environment variable)
- Spark properties file for this project (only for Application Master, no caching necessary)
All three need to be given in the form
{
"key": "__app__.jar",
"value": {
"resource": "hdfs://<<name-node>>:8020/tmp/simple-project/simple-project.jar",
"size": 10588,
"timestamp": 1460990579987,
"type": "FILE",
"visibility": "APPLICATION"
}
},
Again, replace <<name-node>>. Timestamp, hdfs path, size and key need to be the same as for the caching environment variables.
Save it as spark-yarn.json (for details see the attachment - remove the .txt ending).
5) Submit the job
First request an application ID from YARN:
curl -s -X POST -d '' \
https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps/new-application
# {
# "application-id": "application_1460195242962_0054",
# "maximum-resource-capability": {
# "memory": 8192,
# "vCores": 3
# }
# }
Edit the "application-id" in spark-yarn.json and then submit the job:
curl -s -i -X POST -H "Content-Type: application/json" ${HADOOP_RM}/ws/v1/cluster/apps \
--data-binary @spark-yarn.json
# HTTP/1.1 100 Continue
#
# HTTP/1.1 202 Accepted
# Cache-Control: no-cache
# Expires: Sun, 10 Apr 2016 13:02:47 GMT
# Date: Sun, 10 Apr 2016 13:02:47 GMT
# Pragma: no-cache
# Expires: Sun, 10 Apr 2016 13:02:47 GMT
# Date: Sun, 10 Apr 2016 13:02:47 GMT
# Pragma: no-cache
# Content-Type: application/json
# Location: http://<<resource-manager>>:8088/ws/v1/cluster/apps/application_1460195242962_0054
# Content-Length: 0
# Server: Jetty(6.1.26.hwx)
6) Track the job
curl -s "http://<<resource-manager>>:8088/ws/v1/cluster/apps/application_1460195242962_0054"
# {
# "app": {
# "id": "application_1460195242962_0054",
# "user": "dr.who",
# "name": "IrisApp",
# "queue": "default",
# "state": "FINISHED",
# "finalStatus": "SUCCEEDED",
# "progress": 100,
# "trackingUI": "History",
# "trackingUrl": "http://<<ResourceManager>>:8088/proxy/application_1460195242962_0054/",
# "diagnostics": "",
# "clusterId": 1460195242962,
# "applicationType": "YARN",
# "applicationTags": "",
# "startedTime": 1460293367576,
# "finishedTime": 1460293413568,
# "elapsedTime": 45992,
# "amContainerLogs": "http://<<node-manager>>:8042/node/containerlogs/container_e29_1460195242962_0054_01_000001/dr.who",
# "amHostHttpAddress": "<<node-manager>>:8042",
# "allocatedMB": -1,
# "allocatedVCores": -1,
# "runningContainers": -1,
# "memorySeconds": 172346,
# "vcoreSeconds": 112,
# "queueUsagePercentage": 0,
# "clusterUsagePercentage": 0,
# "preemptedResourceMB": 0,
# "preemptedResourceVCores": 0,
# "numNonAMContainerPreempted": 0,
# "numAMContainerPreempted": 0,
# "logAggregationStatus": "SUCCEEDED"
# }
# }
7) Using Knox (without Kerberos)
The whole process works with Knox; just replace the WebHDFS and Resource Manager URLs with their Knox substitutes:
a) Resource Manager:
http://<<resource-manager>>:8088/ws/v1 ==> https://<<knox-gateway>>:8443/gateway/default/resourcemanager/v1
b) WebHDFS host:
http://<<webhdfs-host>>:50070/webhdfs/v1 ==> https://<<knox-gateway>>:8443/gateway/default/webhdfs/v1
Additionally you need to provide Knox credentials (e.g. Basic Authentication <<user>>:<<password>>).
8) More details
More details and a python script to ease the whole process can be found in the Spark-Yarn-REST-API repo. Any comment to make this process easier is highly appreciated ...
10-16-2015
11:18 AM
4 Kudos
I ran into some issues using the latest iODBC 3.52.10 version from www.iodbc.org (mxkozzzz.dmg). Instead of
~/.odbc.ini
~/.odbcinst.ini
~/.hortonworks.hiveodbc.ini
it uses (without leading dots):
~/Library/ODBC/odbc.ini
~/Library/ODBC/odbcinst.ini
~/Library/ODBC/hortonworks.hiveodbc.ini
Note: iODBC will link ~/.odbc.ini and ~/.odbcinst.ini to the ~/Library/ODBC/ versions.
My setup:
~/Library/ODBC/odbc.ini - edit at least host, port and user
~/Library/ODBC/odbcinst.ini - keep as is
~/Library/ODBC/hortonworks.hiveodbc.ini - Change ODBCInstLib to the fully qualified path of iODBC's libiodbcinst.dylib under /usr/local/iODBC/lib/ [Driver]
## - Note that this default DriverManagerEncoding of UTF-32 is for iODBC.
DriverManagerEncoding=UTF-32
ErrorMessagesPath=/usr/lib/hive/lib/native/hiveodbc/ErrorMessages/
LogLevel=0
LogPath=
SwapFilePath=/tmp
# iODBC
ODBCInstLib=/usr/local/iODBC/lib/libiodbcinst.dylib
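For reference, a minimal ~/Library/ODBC/odbc.ini DSN might look like the following (a sketch only: the driver path, host, port and user are placeholders/assumptions, so check where the .dmg actually installed the dylib; the key names follow the Hortonworks Hive ODBC driver documentation):
[ODBC Data Sources]
HiveDSN=Hortonworks Hive ODBC Driver

[HiveDSN]
## Driver path is an assumption - adjust to the actual install location of the dylib
Driver=/usr/lib/hive/lib/native/universal/libhortonworkshiveodbc.dylib
HOST=<<hiveserver2-host>>
PORT=10000
HiveServerType=2
AuthMech=3
UID=<<user>>
PWD=<<password>>
Schema=default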