Member since 10-07-2015
107 Posts
73 Kudos Received
23 Solutions
05-03-2017
02:09 PM
Maybe check whether you can access WebHDFS via Knox to see if your kinit user is accepted by Knox
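For example, a hedged sketch of such a test (gateway host and the topology name "default" are placeholders; use --negotiate with an empty -u only if the Knox topology authenticates via SPNEGO/Kerberos, otherwise plain basic auth):
# with a valid ticket from kinit (SPNEGO-enabled topology)
curl -i -k --negotiate -u : "https://<knox-gateway>:8443/gateway/default/webhdfs/v1/?op=GETFILESTATUS"
# or, for a topology using basic authentication
curl -i -k -u '<user>:<password>' "https://<knox-gateway>:8443/gateway/default/webhdfs/v1/?op=GETFILESTATUS"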
02-22-2017
08:16 AM
What we see in a kerberized cluster is that Livy needs to be able to impersonate other users. Then, when Knox forwards the request to Livy with "doAs=<authc-user>", Livy starts the job as the authenticated user. To be on the safe side, the Knox rewrite rule also replaces the proxyUser with the authenticated user.
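A hedged sketch of the impersonation settings this typically requires (property names as documented for Livy and Hadoop; the wildcard values are placeholders to be narrowed per your security policy):
# livy.conf (in Ambari: Spark > Advanced livy-conf)
livy.impersonation.enabled = true
# core-site.xml, so that HDFS/YARN accept livy as a proxy user
hadoop.proxyuser.livy.groups = *
hadoop.proxyuser.livy.hosts = *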
12-08-2016
06:25 PM
2 Kudos
Configure Livy in Ambari
Until https://github.com/jupyter-incubator/sparkmagic/issues/285 is fixed, set
livy.server.csrf_protection.enabled ==> false
in Ambari under Spark Config - Advanced livy-conf.
Install Sparkmagic
For details, see
https://github.com/jupyter-incubator/sparkmagic
Install Jupyter, if you don't already have it:
$ sudo -H pip install jupyter notebook ipython
Install Sparkmagic:
$ sudo -H pip install sparkmagic
Install Kernels:
$ pip show sparkmagic # check path, e.g /usr/local/lib/python2.7/site-packages
$ cd /usr/local/lib/python2.7/site-packages
$ jupyter-kernelspec install --user sparkmagic/kernels/sparkkernel
$ jupyter-kernelspec install --user sparkmagic/kernels/pysparkkernel
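Optionally verify that both kernels are registered:
$ jupyter-kernelspec list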
Install Sparkmagic widgets
$ sudo -H jupyter nbextension enable --py --sys-prefix widgetsnbextension
Create local Configuration
The configuration file is a json file stored under
~/.sparkmagic/config.json
To avoid timeouts connecting to HDP 2.5 it is important to add
"livy_server_heartbeat_timeout_seconds": 0
To ensure the Spark job will run on the cluster (the Livy default is local),
spark.master needs to be set to yarn-cluster. Therefore a conf object needs to be provided (here you can also add extra jars for the session):
"session_configs": {
"driverMemory": "2G",
"executorCores": 4,
"executorMemory": "8G",
"proxyUser": "bernhard",
"conf": {
"spark.master": "yarn-cluster",
"spark.jars.packages": "com.databricks:spark-csv_2.10:1.5.0"
}
}
The proxyUser is the user the Livy session will run under.
Here is an example config.json; adapt it and copy it to ~/.sparkmagic.
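A minimal sketch (the Livy URL and credentials are placeholders; the kernel_*_credentials keys follow the Sparkmagic documentation):
{
  "kernel_python_credentials": {
    "username": "",
    "password": "",
    "url": "http://<livy-server>:8998"
  },
  "kernel_scala_credentials": {
    "username": "",
    "password": "",
    "url": "http://<livy-server>:8998"
  },
  "livy_server_heartbeat_timeout_seconds": 0,
  "session_configs": {
    "driverMemory": "2G",
    "executorCores": 4,
    "executorMemory": "8G",
    "proxyUser": "bernhard",
    "conf": {
      "spark.master": "yarn-cluster",
      "spark.jars.packages": "com.databricks:spark-csv_2.10:1.5.0"
    }
  }
}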
Start Jupyter Notebooks
1) Start Jupyter:
$ cd <project-dir>
$ jupyter notebook
In Notebook Home select
New -> Spark or New -> PySpark or New -> Python
2) Load Sparkmagic:
Add this to your notebook after the kernel has started:
In[ ]: %load_ext sparkmagic.magics
3) Create Endpoint
In[ ]: %manage_spark
This will open a connection widget
Username and password can be ignored on non-secured clusters.
4) Create a session:
When this is successful, create a session:
Note that it uses the created endpoint and, under properties, the configuration from config.json.
When you see "Spark session is successfully started", the session is ready to use.
Notes
Livy on HDP 2.5 currently does not return YARN Application ID
Jupyter session name provided under Create Session is notebook internal and not used by Livy Server on the cluster. Livy-Server will create sessions on YARN called livy-session-###, e.g. livy-session-10. The session in Jupyter will have session id ###, e.g. 10.
For multiline Scala code in the Notebook you have to add the dot at the end, as in
val df = sqlContext.read.
format("com.databricks.spark.csv").
option("header", "true").
option("inferSchema", "true").
load("/tmp/iris.csv")
For more details and example notebooks in Sparkmagic, see https://github.com/bernhard-42/Sparkmagic-on-HDP
Credits
Thanks to Alex (@azeltov) for the discussions and debugging session
12-08-2016
06:25 PM
7 Kudos
Update 10.12.2016: Added filter to rewrite proxyUser as authenticated user
Update 25.01.2017: Improved service.xml and rewrite.xml
1 Configure a new service for Livy Server in Knox
1.1 Create a service definition
$ sudo mkdir -p /usr/hdp/current/knox-server/data/services/livy/0.1.0/
$ sudo chown -R knox:knox /usr/hdp/current/knox-server/data/services/livy
Create a file
/usr/hdp/current/knox-server/data/services/livy/0.1.0/service.xml with
<service role="LIVYSERVER" name="livy" version="0.1.0">
<routes>
<route path="/livy/v1/sessions">
<rewrite apply="LIVYSERVER/livy/addusername/inbound" to="request.body"/>
</route>
<route path="/livy/v1/**?**"/>
<route path="/livy/v1"/>
<route path="/livy/v1/"/>
</routes>
</service>
Note that the name="livy" attribute and the path .../services/livy/... need to be the same. The route /livy/v1/sessions is a special treatment for the POST request that creates a Livy session. The request body looks, for example, like this: {"driverMemory":"2G","executorCores":4,"executorMemory":"8G","proxyUser":"bernhard","conf":{"spark.master":"yarn-cluster","spark.jars.packages":"com.databricks:spark-csv_2.10:1.5.0"}}
The Livy server will use proxyUser to run the Spark session. To prevent a user from supplying an arbitrary (e.g. more privileged) user here, Knox needs to rewrite the JSON body and replace whatever the value of proxyUser is with the username of the authenticated user, see the next section.
1.2 Create a rewrite rule definition
Create a file
/usr/hdp/current/knox-server/data/services/livy/0.1.0/rewrite.xml with
<rules>
<rule name="LIVYSERVER/livy/user-name">
<rewrite template="{$username}"/>
</rule>
<rule dir="IN" name="LIVYSERVER/livy/root/inbound" pattern="*://*:*/**/livy/v1">
<rewrite template="{$serviceUrl[LIVYSERVER]}"/>
</rule>
<rule dir="IN" name="LIVYSERVER/livy/path/inbound" pattern="*://*:*/**/livy/v1/{path=**}?{**}">
<rewrite template="{$serviceUrl[LIVYSERVER]}/{path=**}?{**}"/>
</rule>
<filter name="LIVYSERVER/livy/addusername/inbound">
<content type="*/json">
<apply path="$.proxyUser" rule="LIVYSERVER/livy/user-name"/>
</content>
</filter>
</rules>
Note: The "v1" is only introduced to allow calls to the Livy server without any path. (It seems to be a limitation in Knox that at least one path element needs to be present in the mapped URL.) The rule LIVYSERVER/livy/user-name and the filter LIVYSERVER/livy/addusername/inbound are responsible for injecting the authenticated user name as described in the last section.
1.3 Publish the new service via Ambari
Go to
Knox Configuration and add at the end of Advanced Topology:
<topology>
...
<service>
<role>LIVYSERVER</role>
<url>http://<livy-server>:8998</url>
</service>
</topology>
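To verify the new route before wiring up a client, a quick hedged test (the exact prefix depends on how your topology is deployed, e.g. it may require /gateway/default in front of /livy/v1; credentials per your Knox authentication provider):
curl -i -k -u '<user>:<password>' "https://<knox-gateway>:8443/livy/v1/sessions"
# should return the Livy session list, e.g. {"from":0,"total":0,"sessions":[]}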
2 Use with Sparkmagic
Sparkmagic can be configured to use Livy Server in HDP 2.5, see Using Jupyter with Sparkmagic and Livy Server on HDP 2.5
To connect via Knox, change the endpoint definition in
%manage_spark
Just replace the Livy server URL with the Knox gateway URL:
https://<knox-gateway>:8443/livy/v1
If Knox does not have a valid certificate for HTTPS requests, reconfigure Sparkmagic's config.json and set
"ignore_ssl_errors": true
Credits
Thanks to Kevin Minder for the article "Adding a service to Apache Knox".
07-07-2016
06:17 PM
4 Kudos
In order to submit jobs to Spark, so-called "fat jars" (containing all dependencies) are quite useful. If you develop your code in Scala,
"sbt"
(http://www.scala-sbt.org) is a great choice to build your project. The following relies on the newest version, sbt 0.13.
For fat jars you first need
"sbt-assembly" (https://github.com/sbt/sbt-assembly). Assuming you have the standard sbt folder structure, the easiest way is to add a file "assembly.sbt"
into the "project" folder containing one line
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
The project structure now looks like (most probably without the "target" folder which will be created upon building the project)
MyProject
+-- build.sbt
+-- project
| +-- assembly.sbt
+-- src
| +-- main
| +-- scala
| +-- MyProject.scala
+-- target
For building Spark Kafka Streaming jobs on HDP 2.4.2, this is the build file
"build.sbt"
name := "MyProject"
version := "0.1"
scalaVersion := "2.10.6"
resolvers += "Hortonworks Repository" at "http://repo.hortonworks.com/content/repositories/releases/"
resolvers += "Hortonworks Jetty Maven Repository" at "http://repo.hortonworks.com/content/repositories/jetty-hadoop/"
libraryDependencies ++= Seq(
"org.apache.spark" % "spark-streaming_2.10" % "1.6.1.2.4.2.0-258" % "provided",
"org.apache.spark" % "spark-streaming-kafka-assembly_2.10" % "1.6.1.2.4.2.0-258"
)
assemblyMergeStrategy in assembly := {
case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
case PathList("com", "squareup", xs @ _*) => MergeStrategy.last
case PathList("com", "sun", xs @ _*) => MergeStrategy.last
case PathList("com", "thoughtworks", xs @ _*) => MergeStrategy.last
case PathList("commons-beanutils", xs @ _*) => MergeStrategy.last
case PathList("commons-cli", xs @ _*) => MergeStrategy.last
case PathList("commons-collections", xs @ _*) => MergeStrategy.last
case PathList("commons-io", xs @ _*) => MergeStrategy.last
case PathList("io", "netty", xs @ _*) => MergeStrategy.last
case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
case PathList("javax", "xml", xs @ _*) => MergeStrategy.last
case PathList("org", "apache", xs @ _*) => MergeStrategy.last
case PathList("org", "codehaus", xs @ _*) => MergeStrategy.last
case PathList("org", "fusesource", xs @ _*) => MergeStrategy.last
case PathList("org", "mortbay", xs @ _*) => MergeStrategy.last
case PathList("org", "tukaani", xs @ _*) => MergeStrategy.last
case PathList("xerces", xs @ _*) => MergeStrategy.last
case PathList("xmlenc", xs @ _*) => MergeStrategy.last
case "about.html" => MergeStrategy.rename
case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
case "META-INF/mailcap" => MergeStrategy.last
case "META-INF/mimetypes.default" => MergeStrategy.last
case "plugin.properties" => MergeStrategy.last
case "log4j.properties" => MergeStrategy.last
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
oldStrategy(x)
}
1) The
"resolvers" section adds the Hortonworks repositories.
2) In
"libraryDependencies" you add Spark-Streaming (which will also load Spark-Core) and Spark-Kafka-Streaming jars. To avoid problems with Kafka dependencies it is best to use the "spark-streaming-kafka-assembly" fat jar.
Note that Spark-Streaming can be tagged as
"provided" (it is omitted from the jat jar), since it is automatically available when you submit a job .
3) Unfortunately a lot of libraries are imported twice due to the dependencies, which leads to assembly errors. To overcome the issue, the
"assemblyMergeStrategy" section tells sbt-assembly to always use the last one (which is from the Spark jars). This list is handcrafted and might change in a new version of HDP. However, the idea should be clear.
4) Assemble the project (the first time you call it, it will "download the internet" like Maven):
sbt assembly
will create "target/scala-2.10/myproject-assembly-0.1.jar".
5) You can now submit it to Spark:
spark-submit --master yarn --deploy-mode client \
--class my.package.MyProject target/scala-2.10/myproject-assembly-0.1.jar
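For reference, a minimal src/main/scala/MyProject.scala that matches the dependencies above might look like this (a sketch only; broker address and topic name are placeholders):
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MyProject {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MyProject")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Direct stream against the Kafka brokers (placeholder host:port and topic)
    val kafkaParams = Map("metadata.broker.list" -> "<<kafka-broker>>:6667")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("<<topic>>"))

    // Print a few messages of each batch to the driver log
    stream.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}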
04-18-2016
05:52 PM
5 Kudos
This is an extension of Starting Spark jobs directly via YARN REST API. Assume the cluster is kerberized and the only access is via Knox. Further assume that Knox uses Basic Authentication and we have the user and password of the user that should start the Spark job. The overall idea is to call curl with <<user>>:<<password>> as Basic Authentication
==> Knox (verifying user:password against LDAP or AD)
==> Resource Manager (YARN REST API) with a Kerberos principal
1) Create and distribute a keytab
The user that should run the Spark job also needs a Kerberos principal. For this principal, create a keytab on one machine:
[root@KDC-HOST ~]$ kadmin
kadmin> xst -k /etc/security/keytabs/<<primaryName>>.keytab <<primaryName>>@<<REALM>>
Use your REALM and an appropriate primaryName for your principal. Then distribute this keytab to all other machines in the cluster, copy it to /etc/security/keytabs and set permissions:
[root@CLUSTER-HOST ~]$ chown <<user>>:hadoop /etc/security/keytabs/<<primaryName>>.keytab
[root@CLUSTER-HOST ~]$ chmod 400 /etc/security/keytabs/<<primaryName>>.keytab
Test the keytab on each machine:
[root@CLUSTER-HOST ~]$ kinit <<primaryName>>@<<REALM>> \
-k -t /etc/security/keytabs/<<primaryName>>.keytab
# There must be no password prompt!
[root@KDC-HOST ~]$ klist -l
# Principal name Cache name
# -------------- ----------
# <<primaryName>>@<<REALM>> FILE:/tmp/krb5cc_<<####>>
2) Test the connection from the workstation outside the cluster
a) HDFS:
[MacBook simple-project]$ curl -s -k -u '<<user>>:<<password>>' \
https://$KNOX_SERVER:8443/gateway/default/webhdfs/v1/?op=GETFILESTATUS
# {
# "FileStatus": {
# "accessTime": 0,
# "blockSize": 0,
# "childrenNum": 9,
# "fileId": 16385,
# "group": "hdfs",
# "length": 0,
# "modificationTime": 1458070072105,
# "owner": "hdfs",
# "pathSuffix": "",
# "permission": "755",
# "replication": 0,
# "storagePolicy": 0,
# "type": "DIRECTORY"
# }
# }
b) YARN:
[MacBook simple-project]$ curl -s -k -u '<<user>>:<<password>>' -d '' \
https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps/new-application
# {
# "application-id": "application_1460654399208_0004",
# "maximum-resource-capability": {
# "memory": 8192,
# "vCores": 3
# }
# }
3) Changes to spark-yarn.properties
The following values need to be changed or added compared to Starting Spark jobs directly via YARN REST API:
spark.history.kerberos.keytab=/etc/security/keytabs/spark.headless.keytabs
spark.history.kerberos.principal=spark-Demo@<<REALM>>
spark.yarn.keytab=/etc/security/keytabs/<<primaryName>>.keytab
spark.yarn.principal=<<primaryName>>@<<REALM>>
4) Changes to spark-yarn.json
The following properties need to be added to the command attribute (before org.apache.spark.deploy.yarn.ApplicationMaster) compared to Starting Spark jobs directly via YARN REST API:
-Dspark.yarn.keytab=/etc/security/keytabs/<<primaryName>>.keytab \
-Dspark.yarn.principal=<<primaryName>>@<<REALM>> \
-Dspark.yarn.credentials.file=hdfs://<<name-node>>:8020/tmp/simple-project/credentials_4b023f93-fbde-48ff-b2c8-516251aeed52 \
-Dspark.history.kerberos.keytab=/etc/security/keytabs/spark.headless.keytabs \
-Dspark.history.kerberos.principal=spark-Demo@<<REALM>> \
-Dspark.history.kerberos.enabled=true
credentials_4b023f93-fbde-48ff-b2c8-516251aeed52 is just a unique filename and the file does not need to exist; concatenate "credentials" with a UUID4. This is the trigger for Spark to start a delegation token refresh thread. For details, see the attachment.
5) Submit a job
Same as in Starting Spark jobs directly via YARN REST API, however one needs to provide -u <<user>>:<<password>> to the curl command to authenticate with Knox (see the sketch at the end of this article). After being authenticated by Knox, the keytabs for the following steps will be taken by YARN and Spark from the properties and job json files.
6) Known issue
After finishing the job successfully, the log aggregation status will continue to be "RUNNING" until it gets a "TIME_OUT".
7) More details
Again, more details and a python script to ease the whole process can be found in the Spark-Yarn-REST-API repo. Any comment to make this process easier is highly appreciated ...
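For step 5, a hedged sketch of the two calls through Knox (gateway URL and the topology name "default" follow the companion article; spark-yarn.json is the job file created there):
# request a new application id (Basic Authentication against Knox)
curl -s -k -u '<<user>>:<<password>>' -d '' \
  https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps/new-application
# submit the job description (note the @ so curl reads the file contents)
curl -s -k -i -u '<<user>>:<<password>>' -X POST -H "Content-Type: application/json" \
  --data-binary @spark-yarn.json \
  https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps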
04-18-2016
05:20 PM
12 Kudos
There are situations when one might want to submit a Spark job via a REST API:
If you want to submit Spark jobs from your IDE on your workstation outside the cluster
If the cluster can only be accessed via Knox (perimeter security)
One possibility is to use the Oozie REST API and the Oozie Spark action.
However, this article looks into the option of using the YARN REST API directly. Starting with the Cluster Applications API I tried to come up with an approach that resembles the spark-submit command.
1) Copy Spark assembly jar to HDFS
By default the Spark assembly jar file is not available in HDFS, but we will need it there for remote access. Some standard locations in HDP are:
HDP 2.3.2:
Version: 2.3.2.0-2950
Spark Jar: /usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
HDP 2.4.0:
Version: 2.4.0.0-169
Spark Jar: /usr/hdp/2.4.0.0-169/spark/lib/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
This is a one time preparation step, for example for HDP 2.4 it would be:
sudo su - hdfs
HDP_VERSION=2.4.0.0-169
SPARK_JAR=spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
hdfs dfs -mkdir "/hdp/apps/${HDP_VERSION}/spark/"
hdfs dfs -put "/usr/hdp/${HDP_VERSION}/spark/lib/$SPARK_JAR" "/hdp/apps/${HDP_VERSION}/spark/spark-hdp-assembly.jar"
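To optionally verify the upload (a trivial check while the variables above are still set):
hdfs dfs -ls "/hdp/apps/${HDP_VERSION}/spark/"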
2) Upload your spark application jar file to HDFS
Upload your Spark application jar file packaged by sbt to the project folder in HDFS via WebHDFS (maybe use something better than "/tmp"):
export APP_FILE=simple-project_2.10-1.0.jar
curl -X PUT "${WEBHDFS_HOST}:50070/webhdfs/v1/tmp/simple-project?op=MKDIRS"
curl -i -X PUT "${WEBHDFS_HOST}:50070/webhdfs/v1/tmp/simple-project/${APP_FILE}?op=CREATE&overwrite=true"
# take Location header from the response and issue a PUT request
LOCATION="http://..."
curl -i -X PUT -T "target/scala-2.10/${APP_FILE}" "${LOCATION}"
3) Create spark property file and upload to HDFS
spark.yarn.submit.file.replication=3
spark.yarn.executor.memoryOverhead=384
spark.yarn.driver.memoryOverhead=384
spark.master=yarn
spark.submit.deployMode=cluster
spark.eventLog.enabled=true
spark.yarn.scheduler.heartbeat.interval-ms=5000
spark.yarn.preserve.staging.files=true
spark.yarn.queue=default
spark.yarn.containerLauncherMaxThreads=25
spark.yarn.max.executor.failures=3
spark.executor.instances=2
spark.eventLog.dir=hdfs\:///spark-history
spark.history.kerberos.enabled=true
spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port=18080
spark.history.fs.logDirectory=hdfs\:///spark-history
spark.executor.memory=2G
spark.executor.cores=2
spark.history.kerberos.keytab=none
spark.history.kerberos.principal=none
and upload it via WebHDFS as spark-yarn.properties to your simple-project folder as before.
4) Create a Spark Job json file
a) We need to construct the command to start the Spark ApplicationMaster:
java -server -Xmx1024m -Dhdp.version=2.4.0.0-169 \
-Dspark.yarn.app.container.log.dir=/hadoop/yarn/log/rest-api \
-Dspark.app.name=SimpleProject \
org.apache.spark.deploy.yarn.ApplicationMaster \
--class IrisApp --jar __app__.jar \
--arg '--class' --arg 'SimpleProject' \
1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
It is important to provide the Spark application name and the HDP version. Spark will resolve <LOG_DIR>.
b) We need to set some general environment variables:
JAVA_HOME="/usr/jdk64/jdk1.8.0_60/"
SPARK_YARN_MODE=true
HDP_VERSION="2.4.0.0-169"
Then we need to tell Spark which files to distribute across all Spark executors. Therefore we need to set 4 variables. One variable is of the format "<hdfs path1>#<cache name 1>,<hdfs path2>#<cache name 2>, ...", and the three others contain comma-separated timestamps, file sizes and visibility of each file (same order):
SPARK_YARN_CACHE_FILES: "hdfs://<<name-node>>:8020/tmp/simple-project/simple-project.jar#__app__.jar,hdfs://<<name-node>>:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar#__spark__.jar"
SPARK_YARN_CACHE_FILES_FILE_SIZES: "10588,191724610"
SPARK_YARN_CACHE_FILES_TIME_STAMPS: "1460990579987,1460219553714"
SPARK_YARN_CACHE_FILES_VISIBILITIES: "PUBLIC,PRIVATE"
Replace <<name-node>> with the correct address. File size and timestamp can be retrieved from HDFS via WebHDFS.
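For example (a hedged sketch; the jar path matches the cache entry above, and the length and modificationTime fields of the response provide the size and timestamp):
curl -s "${WEBHDFS_HOST}:50070/webhdfs/v1/tmp/simple-project/simple-project.jar?op=GETFILESTATUS"
# use FileStatus.length for SPARK_YARN_CACHE_FILES_FILE_SIZES
# use FileStatus.modificationTime for SPARK_YARN_CACHE_FILES_TIME_STAMPS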
Next, construct the classpath
CLASSPATH="{{PWD}}<CPS>__spark__.jar<CPS>{{PWD}}/__app__.jar<CPS>{{PWD}}/__app__.properties<CPS>{{HADOOP_CONF_DIR}}<CPS>/usr/hdp/current/hadoop-client/*<CPS>/usr/hdp/current/hadoop-client/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/common/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/common/lib/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/yarn/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/yarn/lib/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/hdfs/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/hdfs/lib/*<CPS>{{PWD}}/mr-framework/hadoop/share/hadoop/tools/lib/*<CPS>/usr/hdp/2.4.0.0-169/hadoop/lib/hadoop-lzo-0.6.0.2.4.0.0-169.jar<CPS>/etc/hadoop/conf/secure<CPS>"
Notes:
- __spark__.jar and __app__.jar are the same as provided in SPARK_YARN_CACHE_FILES
- Spark will resolve <CPS> to `:`
c) Create the Spark job json file
The information above will be added to the Spark json file as the command and environment attributes (for details see the attachment - remove the .txt ending).
The last missing piece is the so-called local_resources section, which describes all files in HDFS necessary for the Spark job:
- Spark assembly jar (as in the caching environment variable)
- Spark application jar for this project (as in the caching environment variable)
- Spark properties file for this project (only for Application Master, no caching necessary)
All three need to be given in the form
{
"key": "__app__.jar",
"value": {
"resource": "hdfs://<<name-node>>:8020/tmp/simple-project/simple-project.jar",
"size": 10588,
"timestamp": 1460990579987,
"type": "FILE",
"visibility": "APPLICATION"
}
},
Again, replace <<name-node>>. Timestamp, hdfs path, size and key need to be the same as for the caching environment variables.
Save it as spark-yarn.json (for details see the attachment - remove the .txt ending).
5) Submit the job
First request an application ID from YARN:
curl -s -X POST -d '' \
https://$KNOX_SERVER:8443/gateway/default/resourcemanager/v1/cluster/apps/new-application
# {
# "application-id": "application_1460195242962_0054",
# "maximum-resource-capability": {
# "memory": 8192,
# "vCores": 3
# }
# }
Edit the "application-id" in spark-yarn.json and then submit the job:
curl -s -i -X POST -H "Content-Type: application/json" ${HADOOP_RM}/ws/v1/cluster/apps \
--data-binary @spark-yarn.json
# HTTP/1.1 100 Continue
#
# HTTP/1.1 202 Accepted
# Cache-Control: no-cache
# Expires: Sun, 10 Apr 2016 13:02:47 GMT
# Date: Sun, 10 Apr 2016 13:02:47 GMT
# Pragma: no-cache
# Expires: Sun, 10 Apr 2016 13:02:47 GMT
# Date: Sun, 10 Apr 2016 13:02:47 GMT
# Pragma: no-cache
# Content-Type: application/json
# Location: http://<<resource-manager>>:8088/ws/v1/cluster/apps/application_1460195242962_0054
# Content-Length: 0
# Server: Jetty(6.1.26.hwx)
6) Track the job
curl -s "http://<<resource-manager>>:8088/ws/v1/cluster/apps/application_1460195242962_0054"
# {
# "app": {
# "id": "application_1460195242962_0054",
# "user": "dr.who",
# "name": "IrisApp",
# "queue": "default",
# "state": "FINISHED",
# "finalStatus": "SUCCEEDED",
# "progress": 100,
# "trackingUI": "History",
# "trackingUrl": "http://<<ResourceManager>>:8088/proxy/application_1460195242962_0054/",
# "diagnostics": "",
# "clusterId": 1460195242962,
# "applicationType": "YARN",
# "applicationTags": "",
# "startedTime": 1460293367576,
# "finishedTime": 1460293413568,
# "elapsedTime": 45992,
# "amContainerLogs": "http://<<node-manager>>:8042/node/containerlogs/container_e29_1460195242962_0054_01_000001/dr.who",
# "amHostHttpAddress": "<<node-manager>>:8042",
# "allocatedMB": -1,
# "allocatedVCores": -1,
# "runningContainers": -1,
# "memorySeconds": 172346,
# "vcoreSeconds": 112,
# "queueUsagePercentage": 0,
# "clusterUsagePercentage": 0,
# "preemptedResourceMB": 0,
# "preemptedResourceVCores": 0,
# "numNonAMContainerPreempted": 0,
# "numAMContainerPreempted": 0,
# "logAggregationStatus": "SUCCEEDED"
# }
# }
7) Using Knox (without Kerberos)
The whole process works with Knox; just replace the WebHDFS and Resource Manager URLs with their Knox substitutes:
a) Resource Manager:
http://<<resource-manager>>:8088/ws/v1 ==> https://<<knox-gateway>>:8443/gateway/default/resourcemanager/v1
b) WebHDFS host:
http://<<webhdfs-host>>:50070/webhdfs/v1 ==> https://<<knox-gateway>>:8443/gateway/default/webhdfs/v1
Additionally you need to provide Knox credentials (e.g. Basic Authentication <<user>>:<<password>>).
8) More details
More details and a python script to ease the whole process can be found in the Spark-Yarn-REST-API repo. Any comment to make this process easier is highly appreciated ...
10-16-2015
11:18 AM
4 Kudos
I ran into some issues using the latest iODBC 3.52.10 version from www.iodbc.org (mxkozzzz.dmg). Instead of
~/.odbc.ini
~/.odbcinst.ini
~/.hortonworks.hiveodbc.ini
it uses (without leading dots):
~/Library/ODBC/odbc.ini
~/Library/ODBC/odbcinst.ini
~/Library/ODBC/hortonworks.hiveodbc.ini
Note: iODBC will link ~/.odbc.ini and ~/.odbcinst.ini to the ~/Library/ODBC/ versions.
My setup:
~/Library/ODBC/odbc.ini - edit at least host, port and user
~/Library/ODBC/odbcinst.ini - keep as is
~/Library/ODBC/hortonworks.hiveodbc.ini - Change ODBCInstLib to the fully qualified path of iODBC's libiodbcinst.dylib under /usr/local/iODBC/lib/ [Driver]
## - Note that this default DriverManagerEncoding of UTF-32 is for iODBC.
DriverManagerEncoding=UTF-32
ErrorMessagesPath=/usr/lib/hive/lib/native/hiveodbc/ErrorMessages/
LogLevel=0
LogPath=
SwapFilePath=/tmp
# iODBC
ODBCInstLib=/usr/local/iODBC/lib/libiodbcinst.dylib
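For reference, a minimal ~/Library/ODBC/odbc.ini DSN might look like the following (a sketch only: the driver path, host, port and user are placeholders/assumptions, so check where the .dmg actually installed the dylib; the key names follow the Hortonworks Hive ODBC driver documentation):
[ODBC Data Sources]
HiveDSN=Hortonworks Hive ODBC Driver

[HiveDSN]
## Driver path is an assumption - adjust to the actual install location of the dylib
Driver=/usr/lib/hive/lib/native/universal/libhortonworkshiveodbc.dylib
HOST=<<hiveserver2-host>>
PORT=10000
HiveServerType=2
AuthMech=3
UID=<<user>>
PWD=<<password>>
Schema=default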