Member since: 07-06-2017
Posts: 53
Kudos Received: 12
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
| 15987 | 05-03-2018 08:01 AM
| 9796 | 10-11-2017 08:17 AM
| 10627 | 07-20-2017 07:04 AM
| 1193 | 04-05-2017 07:32 AM
| 3083 | 03-09-2017 12:05 PM
07-13-2017
04:33 AM
Hi Peter, The goal is to run data analysis using Spark, where part of the data is stored in Hive; the idea is to use the whole cluster to distribute the workload. I missed the fact that I was running in local mode (as said, this is a pilot and I'm totally new to the Cloudera stack). I assumed that the workbench ran in YARN mode by default; I'll dig into the docs again. Accessing Hive through the SparkSession object in my test was returning nothing. I'll try using YARN and report back, thanks! Regards, Chris
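For reference, roughly what I plan to try next in the workbench; a minimal sketch only, assuming the Spark/Hive gateway configs (hive-site.xml) are picked up by the session, and with a placeholder app name:

import org.apache.spark.sql.SparkSession

// Same builder as before, but targeting YARN instead of local mode so the work is
// distributed over the cluster; "mapexample" is just a placeholder app name.
val spark = SparkSession.builder
  .master("yarn")
  .appName("mapexample")
  .enableHiveSupport()
  .getOrCreate()

// If the session is really talking to the Hive metastore, the tables should show up here.
spark.catalog.listTables("default").show()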
07-13-2017
03:40 AM
Hello, While running a pilot for CDSW & CDH, I'm struggling to run queries on a Hive table. Scala workbench test code:

import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.Column
val sparkSession = SparkSession.builder.master("local").appName("mapexample").
enableHiveSupport().getOrCreate()
sparkSession.catalog.listTables.show()
val sqlContext = new HiveContext(sc)
sqlContext.sql("describe database default").show
sqlContext.sql("describe formatted default.mytable").show
sc.version

Test returns:

import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.Column
val sparkSession = SparkSession.builder.master("local").appName("mapexample").
enableHiveSupport().getOrCreate()
sparkSession.catalog.listTables.show()
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
+----+--------+-----------+---------+-----------+
val sqlContext = new HiveContext(sc)
sqlContext.sql("describe database default").show
+-------------------------+--------------------------+
|database_description_item|database_description_value|
+-------------------------+--------------------------+
| Database Name| default|
| Description| default database|
| Location| /user/hive/warehouse|
+-------------------------+--------------------------+
sqlContext.sql("describe formatted default.mytable").show
Name: org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Message: Table or view 'mytable' not found in database 'default';
StackTrace: at org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:138)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:289)
at org.apache.spark.sql.execution.command.DescribeTableCommand.run(tables.scala:437)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)

Interesting to note: accessing Hive through SparkSession returns nothing, while HiveContext does return a description of the default DB, but one that does not match the actual default DB. From Hue:

default
Default Hive database
hdfs://<masked>/user/hive/warehouse
public
ROLE

The CDSW host has the right gateway roles installed, and hive-site.xml is present with the Spark config files. The Hive Metastore log does not register any access when the workbench tries to access Hive. No Kerberos is involved. I've run out of options to check, hence this post. Thanks Chris
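In the meantime, these are the sanity checks I'm running from the workbench to see which catalog the session actually uses; a rough sketch, assuming the sparkSession built in the snippet above:

// Which databases/tables does the session see, and where does it think the warehouse is?
sparkSession.catalog.listDatabases.show(false)
println(sparkSession.conf.getOption("spark.sql.warehouse.dir"))
sparkSession.sql("SHOW TABLES IN default").show(false)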
07-07-2017
01:40 AM
Found it. I interpreted "MASTER" as being the master node of the CDH cluster 😉 Using the right IP did fix the issue. Thanks
07-07-2017
01:18 AM
Hello, The last 3 pods are not starting due to some issue mounting the volumes:

Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
15h 33s 224 {kubelet workbench} Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/nfs/bee36b58-6247-11e7-9372-000d3a29b7ab-projects-share" (spec.Name: "projects-share") pod "bee36b58-6247-11e7-9372-000d3a29b7ab" (UID: "bee36b58-6247-11e7-9372-000d3a29b7ab") with: mount failed: exit status 32
Mounting arguments: 10.0.0.4:/var/lib/cdsw/current/projects /var/lib/kubelet/pods/bee36b58-6247-11e7-9372-000d3a29b7ab/volumes/kubernetes.io~nfs/projects-share nfs []
Output: mount.nfs: Connection timed out
19h 4s 502 {kubelet workbench} Warning FailedMount Unable to mount volumes for pod "web-3826671331-7xchm_default(bee36b58-6247-11e7-9372-000d3a29b7ab)": timeout expired waiting for volumes to attach/mount for pod "web-3826671331-7xchm"/"default". list of unattached/unmounted volumes=[projects-claim]
19h 4s 502 {kubelet workbench} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "web-3826671331-7xchm"/"default". list of unattached/unmounted volumes=[projects-claim]

Googling did not really help me pinpoint what the cause could be. Any pointers on where I should look? Thanks!
07-06-2017
09:17 AM
Hi Peter, It actually did the trick:

$ sudo docker pull "docker.repository.cloudera.com/cdsw/1.0.1/web:052787a"
052787a: Pulling from cdsw/1.0.1/web
b6f892c0043b: Already exists
55010f332b04: Already exists
2955fb827c94: Already exists
3deef3fcbd30: Already exists
cf9722e506aa: Already exists
72923da64564: Already exists
3101e33a625d: Already exists
c03d5fa4b8e5: Already exists
35c1e4a8663c: Already exists
a1b3940356ad: Already exists
62370be47aba: Already exists
ddb5566a99f9: Already exists
8b5b82cdf853: Already exists
0c1a28ba377b: Already exists
5911a6a3d3db: Already exists
eb2b63f33d61: Already exists
3af8b8e8dc75: Already exists
19d9e7bce45d: Pull complete
396039e72b5e: Pull complete
b1fa7de66580: Pull complete
c15cd2ff85a4: Pull complete
87916a3ab13a: Pull complete
6c2fbb95a61e: Pull complete
938edf86928e: Pull complete
e0889d759edc: Extracting [==================================================>] 526.4 MB/526.4 MB
e0889d759edc: Pull complete
319dc7c60d62: Pull complete
dd1001380640: Pull complete
Digest: sha256:ecb807b8758acdfd1c6b0ff5acb1dad947cded312b47b60012c7478a0fcd9232
Status: Downloaded newer image for docker.repository.cloudera.com/cdsw/1.0.1/web:052787a

Still a problem with 3 pods - will check that later.

Cloudera Data Science Workbench Status
Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...
Node Status
NAME STATUS AGE STATEFUL
workbench Ready 3h true
System Pod status
NAME READY STATUS RESTARTS AGE IP NODE
dummy-2088944543-uev12 1/1 Running 0 3h 10.0.0.5 workbench
etcd-workbench 1/1 Running 0 3h 10.0.0.5 workbench
kube-apiserver-workbench 1/1 Running 0 3h 10.0.0.5 workbench
kube-controller-manager-workbench 1/1 Running 3 3h 10.0.0.5 workbench
kube-discovery-1150918428-v7vu8 1/1 Running 0 3h 10.0.0.5 workbench
kube-dns-3873593988-vos07 3/3 Running 0 3h 100.66.0.2 workbench
kube-proxy-7qq63 1/1 Running 0 3h 10.0.0.5 workbench
kube-scheduler-workbench 1/1 Running 3 3h 10.0.0.5 workbench
node-problem-detector-v0.1-kngbh 1/1 Running 0 3h 10.0.0.5 workbench
weave-net-clu7s 2/2 Running 0 3h 10.0.0.5 workbench
Cloudera Data Science Workbench Pod Status
NAME READY STATUS RESTARTS AGE IP NODE ROLE
cron-2934152315-56p1n 1/1 Running 0 3h 100.66.0.8 workbench cron
db-39862959-icvq9 1/1 Running 1 3h 100.66.0.5 workbench db
db-migrate-052787a-mvb40 1/1 Running 0 3h 100.66.0.4 workbench db-migrate
engine-deps-du8cx 1/1 Running 0 3h 100.66.0.3 workbench engine-deps
ingress-controller-3138093376-l5z46 1/1 Running 0 3h 10.0.0.5 workbench ingress-controller
livelog-1900214889-qppq2 1/1 Running 0 3h 100.66.0.6 workbench livelog
reconciler-459456250-wgems 1/1 Running 0 3h 100.66.0.7 workbench reconciler
spark-port-forwarder-a31as 1/1 Running 0 3h 10.0.0.5 workbench spark-port-forwarder
web-3826671331-7xchm 0/1 ContainerCreating 0 3h <none> workbench web
web-3826671331-h3gkd 0/1 ContainerCreating 0 3h <none> workbench web
web-3826671331-vtbdh 0/1 ContainerCreating 0 3h <none> workbench web
Cloudera Data Science Workbench is not ready yet: some application pods are not ready

Thanks!
07-06-2017
07:46 AM
Hello, New to Cloudera, I'm deploying CDSW in Azure (on a Cloudera CentOS 7.2 template). The installation went OK and the init started well, but eventually not all of the pods start:

Cloudera Data Science Workbench Status
Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...
Node Status
NAME STATUS AGE STATEFUL
workbench Ready 2h true
System Pod status
NAME READY STATUS RESTARTS AGE IP NODE
dummy-2088944543-uev12 1/1 Running 0 2h 10.0.0.5 workbench
etcd-workbench 1/1 Running 0 2h 10.0.0.5 workbench
kube-apiserver-workbench 1/1 Running 0 2h 10.0.0.5 workbench
kube-controller-manager-workbench 1/1 Running 2 2h 10.0.0.5 workbench
kube-discovery-1150918428-v7vu8 1/1 Running 0 2h 10.0.0.5 workbench
kube-dns-3873593988-vos07 3/3 Running 0 2h 100.66.0.2 workbench
kube-proxy-7qq63 1/1 Running 0 2h 10.0.0.5 workbench
kube-scheduler-workbench 1/1 Running 2 2h 10.0.0.5 workbench
node-problem-detector-v0.1-kngbh 1/1 Running 0 2h 10.0.0.5 workbench
weave-net-clu7s 2/2 Running 0 2h 10.0.0.5 workbench
Cloudera Data Science Workbench Pod Status
NAME READY STATUS RESTARTS AGE IP NODE ROLE
cron-2934152315-56p1n 1/1 Running 0 2h 100.66.0.8 workbench cron
db-39862959-icvq9 1/1 Running 1 2h 100.66.0.5 workbench db
db-migrate-052787a-mvb40 0/1 ImagePullBackOff 0 2h 100.66.0.4 workbench db-migrate
engine-deps-du8cx 1/1 Running 0 2h 100.66.0.3 workbench engine-deps
ingress-controller-3138093376-l5z46 1/1 Running 0 2h 10.0.0.5 workbench ingress-controller
livelog-1900214889-qppq2 1/1 Running 0 2h 100.66.0.6 workbench livelog
reconciler-459456250-wgems 1/1 Running 0 2h 100.66.0.7 workbench reconciler
spark-port-forwarder-a31as 1/1 Running 0 2h 10.0.0.5 workbench spark-port-forwarder
web-3826671331-7xchm 0/1 ContainerCreating 0 2h <none> workbench web
web-3826671331-h3gkd 0/1 ContainerCreating 0 2h <none> workbench web
web-3826671331-vtbdh 0/1 ContainerCreating 0 2h <none> workbench web
Cloudera Data Science Workbench is not ready yet: some application pods are not ready

$ sudo journalctl -u docker
Jul 06 13:42:03 workbench docker[6669]: time="2017-07-06T13:42:03.996814534Z" level=error msg="Handler for GET /images/docker.repository.cloudera.com/cdsw/1.0.1/web:052787a/json returned error: No such image: docker.repository.cloudera.com/cdsw/1.0.1/we

Internet access is available (as most of the other pods started). Any ideas? Thanks
05-24-2017
02:16 PM
1 Kudo
Hello, After upgrading HDP to 2.6 and getting the GA release of Spark 2.1, I'm trying (with no luck so far) to add a Spark2 interpreter in Zeppelin (if that's possible at all). I did create a new spark2 interpreter in Zeppelin, which instantiates properly (%spark2); however, sc.version indicates that I'm still running Spark 1.6.3. Digging into the config and the docs, I found out that SPARK_HOME is defined in zeppelin-env.sh, pointing by default to Spark 1.6.3. Editing the config and restarting Zeppelin "works" in the sense that I can now successfully instantiate Spark 2, but Spark 1.6.3 is no longer available from the notebook (and Livy is still configured for Spark 1.6.3). Is there any way to set up the interpreters so that both Spark 1.6.3 and Spark 2 can be used from Zeppelin 0.7? Thanks Christophe
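For what it's worth, the quick check I run in each interpreter to see which Spark it is bound to; a minimal sketch, with the %spark / %spark2 names matching the bindings described above:

// Run as two separate Zeppelin paragraphs.
// %spark
sc.version   // returns "1.6.3" while SPARK_HOME points at the Spark 1.6 install

// %spark2
sc.version   // should return "2.1.x" once the spark2 interpreter picks up Spark 2's SPARK_HOME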
Labels:
- Apache Spark
- Apache Zeppelin
04-05-2017
07:32 AM
Fixed! The HDFS client was not installed on the new Oozie server.
04-04-2017
06:10 PM
Hello, I moved my Oozie server between nodes today, and while the move was broadly successful as far as Ambari is concerned, all the coordinator jobs have been failing since with Error 10005, a Java UnknownHost exception on the HA name of the NameNodes. Apparently the Oozie server is struggling to locate the NameNode, which is in HA mode. The hdfs-site.xml and core-site.xml look good and in sync with the rest of the cluster, and the proxy config in HDFS and the other services is properly set. I'm running out of ideas on where to look. Any suggestions? Thanks Christophe
Labels:
- Apache Oozie
03-23-2017
10:27 AM
Hi @Binu Mathew Thanks for your answer. I'll dive into this approach and post further if/when required. Thanks! Christophe