Member since: 07-06-2017
Posts: 53
Kudos Received: 12
Solutions: 5
My Accepted Solutions
Views | Posted |
---|---|
16006 | 05-03-2018 08:01 AM |
9816 | 10-11-2017 08:17 AM |
10632 | 07-20-2017 07:04 AM |
1193 | 04-05-2017 07:32 AM |
3089 | 03-09-2017 12:05 PM |
07-13-2017
04:33 AM
Hi Peter, the goal is to run data analysis with Spark where part of the data is stored in Hive; the idea is to use the whole cluster to distribute the workload. I missed the fact that I was running in local mode (as mentioned, this is a pilot and I'm totally new to the Cloudera stack). I assumed the workbench ran in YARN mode by default, so I'll dig into the docs again. Accessing Hive through the SparkSession object in my test was returning nothing. I'll try using YARN and report back. Thanks! Regards, Chris
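For reference, the direction I plan to try, as a minimal sketch only (reusing the names from my test code below, and assuming the CDSW engine picks up the cluster's hive-site.xml):
import org.apache.spark.sql.SparkSession
// Sketch: build the session against YARN instead of local mode, so the work is
// distributed across the cluster and Hive support is enabled for the session.
val spark = SparkSession.builder
  .appName("mapexample")
  .master("yarn")              // instead of .master("local")
  .enableHiveSupport()
  .getOrCreate()
spark.catalog.listTables("default").show()   // should now list the Hive tables in the default DB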
07-13-2017
03:40 AM
Hello, while running a pilot for CDSW & CDH, I'm struggling to run queries on a Hive table from the Scala workbench. Test code:
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.Column
val sparkSession = SparkSession.builder.master("local").appName("mapexample").
enableHiveSupport().getOrCreate()
sparkSession.catalog.listTables.show()
val sqlContext = new HiveContext(sc)
sqlContext.sql("describe database default").show
sqlContext.sql("describe formatted default.mytable").show
sc.version
Test output:
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.Column
val sparkSession = SparkSession.builder.master("local").appName("mapexample").
enableHiveSupport().getOrCreate()
sparkSession.catalog.listTables.show()
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
+----+--------+-----------+---------+-----------+
val sqlContext = new HiveContext(sc)
sqlContext.sql("describe database default").show
+-------------------------+--------------------------+
|database_description_item|database_description_value|
+-------------------------+--------------------------+
| Database Name| default|
| Description| default database|
| Location| /user/hive/warehouse|
+-------------------------+--------------------------+
sqlContext.sql("describe formatted default.mytable").show
Name: org.apache.spark.sql.catalyst.analysis.NoSuchTableException
Message: Table or view 'mytable' not found in database 'default';
StackTrace: at org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:138)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:289)
at org.apache.spark.sql.execution.command.DescribeTableCommand.run(tables.scala:437)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
Interesting to note: accessing Hive through SparkSession returns nothing, while HiveContext does return a description of the default DB, but one that does not match the actual default DB. From Hue:
default
Default Hive database
hdfs://<masked>/user/hive/warehouse
public
ROLE
The CDSW host has the right gateway roles installed, and hive-site.xml is present alongside the Spark config files. The Hive Metastore log does not register any access when the workbench tries to reach Hive. No Kerberos is involved. I have run out of options to check, hence this post. Thanks, Chris
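In case it helps with diagnosis, a couple of quick checks I can run from the same workbench session; a sketch only, the property names are standard Spark 2.x but I have not verified the output on this cluster:
// Which catalog is the session actually using? "in-memory" would explain the empty listTables()
println(sparkSession.conf.get("spark.sql.catalogImplementation"))
// Warehouse directory the session resolved; should match /user/hive/warehouse from the metastore
println(sparkSession.conf.get("spark.sql.warehouse.dir"))
sparkSession.sql("show databases").show()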
07-07-2017
01:40 AM
Found it. I had interpreted "MASTER" as meaning the master node of the CDH cluster 😉 Using the right IP fixed the issue. Thanks!
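For anyone else hitting this, the value in question lives in the CDSW config file on the workbench host. A rough sketch of the relevant lines; the key names are quoted from memory, so treat them as an assumption and check /etc/cdsw/config/cdsw.conf on your install:
# /etc/cdsw/config/cdsw.conf (excerpt, illustrative values)
# MASTER_IP is the IP of the CDSW master host itself, not the CDH cluster's master node
MASTER_IP="10.0.0.5"
DOMAIN="cdsw.example.com"    # hypothetical wildcard DNS entry for the workbench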
07-07-2017
01:18 AM
Hello, the last 3 pods are not starting due to an issue mounting volumes. Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
15h 33s 224 {kubelet workbench} Warning FailedMount MountVolume.SetUp failed for volume "kubernetes.io/nfs/bee36b58-6247-11e7-9372-000d3a29b7ab-projects-share" (spec.Name: "projects-share") pod "bee36b58-6247-11e7-9372-000d3a29b7ab" (UID: "bee36b58-6247-11e7-9372-000d3a29b7ab") with: mount failed: exit status 32
Mounting arguments: 10.0.0.4:/var/lib/cdsw/current/projects /var/lib/kubelet/pods/bee36b58-6247-11e7-9372-000d3a29b7ab/volumes/kubernetes.io~nfs/projects-share nfs []
Output: mount.nfs: Connection timed out
19h 4s 502 {kubelet workbench} Warning FailedMount Unable to mount volumes for pod "web-3826671331-7xchm_default(bee36b58-6247-11e7-9372-000d3a29b7ab)": timeout expired waiting for volumes to attach/mount for pod "web-3826671331-7xchm"/"default". list of unattached/unmounted volumes=[projects-claim]
19h 4s 502 {kubelet workbench} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "web-3826671331-7xchm"/"default". list of unattached/unmounted volumes=[projects-claim]
Googling did not really help me pin down the cause. Any pointers on where I should look? Thanks!
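One way to narrow this down is to test the NFS export by hand from the workbench host; a sketch using standard nfs-utils commands, with the server IP taken from the mount arguments above:
# List the exports the NFS server actually advertises
showmount -e 10.0.0.4
# Try the same mount kubelet is attempting; a wrong server IP times out the same way
sudo mkdir -p /mnt/projects-test
sudo mount -t nfs 10.0.0.4:/var/lib/cdsw/current/projects /mnt/projects-test
sudo umount /mnt/projects-test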
07-06-2017
09:17 AM
Hi Peter, it actually did the trick:
$ sudo docker pull "docker.repository.cloudera.com/cdsw/1.0.1/web:052787a"
052787a: Pulling from cdsw/1.0.1/web
b6f892c0043b: Already exists
55010f332b04: Already exists
2955fb827c94: Already exists
3deef3fcbd30: Already exists
cf9722e506aa: Already exists
72923da64564: Already exists
3101e33a625d: Already exists
c03d5fa4b8e5: Already exists
35c1e4a8663c: Already exists
a1b3940356ad: Already exists
62370be47aba: Already exists
ddb5566a99f9: Already exists
8b5b82cdf853: Already exists
0c1a28ba377b: Already exists
5911a6a3d3db: Already exists
eb2b63f33d61: Already exists
3af8b8e8dc75: Already exists
19d9e7bce45d: Pull complete
396039e72b5e: Pull complete
b1fa7de66580: Pull complete
c15cd2ff85a4: Pull complete
87916a3ab13a: Pull complete
6c2fbb95a61e: Pull complete
938edf86928e: Pull complete
e0889d759edc: Extracting [==================================================>] 526.4 MB/526.4 MB
e0889d759edc: Pull complete
319dc7c60d62: Pull complete
dd1001380640: Pull complete
Digest: sha256:ecb807b8758acdfd1c6b0ff5acb1dad947cded312b47b60012c7478a0fcd9232
Status: Downloaded newer image for docker.repository.cloudera.com/cdsw/1.0.1/web:052787a
There is still a problem with 3 pods; I'll check that later.
Cloudera Data Science Workbench Status
Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...
Node Status
NAME STATUS AGE STATEFUL
workbench Ready 3h true
System Pod status
NAME READY STATUS RESTARTS AGE IP NODE
dummy-2088944543-uev12 1/1 Running 0 3h 10.0.0.5 workbench
etcd-workbench 1/1 Running 0 3h 10.0.0.5 workbench
kube-apiserver-workbench 1/1 Running 0 3h 10.0.0.5 workbench
kube-controller-manager-workbench 1/1 Running 3 3h 10.0.0.5 workbench
kube-discovery-1150918428-v7vu8 1/1 Running 0 3h 10.0.0.5 workbench
kube-dns-3873593988-vos07 3/3 Running 0 3h 100.66.0.2 workbench
kube-proxy-7qq63 1/1 Running 0 3h 10.0.0.5 workbench
kube-scheduler-workbench 1/1 Running 3 3h 10.0.0.5 workbench
node-problem-detector-v0.1-kngbh 1/1 Running 0 3h 10.0.0.5 workbench
weave-net-clu7s 2/2 Running 0 3h 10.0.0.5 workbench
Cloudera Data Science Workbench Pod Status
NAME READY STATUS RESTARTS AGE IP NODE ROLE
cron-2934152315-56p1n 1/1 Running 0 3h 100.66.0.8 workbench cron
db-39862959-icvq9 1/1 Running 1 3h 100.66.0.5 workbench db
db-migrate-052787a-mvb40 1/1 Running 0 3h 100.66.0.4 workbench db-migrate
engine-deps-du8cx 1/1 Running 0 3h 100.66.0.3 workbench engine-deps
ingress-controller-3138093376-l5z46 1/1 Running 0 3h 10.0.0.5 workbench ingress-controller
livelog-1900214889-qppq2 1/1 Running 0 3h 100.66.0.6 workbench livelog
reconciler-459456250-wgems 1/1 Running 0 3h 100.66.0.7 workbench reconciler
spark-port-forwarder-a31as 1/1 Running 0 3h 10.0.0.5 workbench spark-port-forwarder
web-3826671331-7xchm 0/1 ContainerCreating 0 3h <none> workbench web
web-3826671331-h3gkd 0/1 ContainerCreating 0 3h <none> workbench web
web-3826671331-vtbdh 0/1 ContainerCreating 0 3h <none> workbench web
Cloudera Data Science Workbench is not ready yet: some application pods are not ready
Thanks!
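For the three web pods stuck in ContainerCreating, the next step I can think of is pulling the pod events. A sketch, assuming kubectl is available on the master as on a standard CDSW install (pod name copied from the status output above):
# Show why the web pod is stuck in ContainerCreating (volume mounts, image pulls, etc.)
kubectl describe pod web-3826671331-7xchm
# Recent cluster events across namespaces, for a broader view
kubectl get events --all-namespaces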
07-06-2017
07:46 AM
Hello, I'm new to Cloudera and deploying CDSW in Azure (on a Cloudera CentOS 7.2 template). The installation went OK and the init started well, but eventually not all of the pods start:
Cloudera Data Science Workbench Status
Service Status
docker: active
kubelet: active
nfs: active
Checking kernel parameters...
Node Status
NAME STATUS AGE STATEFUL
workbench Ready 2h true
System Pod status
NAME READY STATUS RESTARTS AGE IP NODE
dummy-2088944543-uev12 1/1 Running 0 2h 10.0.0.5 workbench
etcd-workbench 1/1 Running 0 2h 10.0.0.5 workbench
kube-apiserver-workbench 1/1 Running 0 2h 10.0.0.5 workbench
kube-controller-manager-workbench 1/1 Running 2 2h 10.0.0.5 workbench
kube-discovery-1150918428-v7vu8 1/1 Running 0 2h 10.0.0.5 workbench
kube-dns-3873593988-vos07 3/3 Running 0 2h 100.66.0.2 workbench
kube-proxy-7qq63 1/1 Running 0 2h 10.0.0.5 workbench
kube-scheduler-workbench 1/1 Running 2 2h 10.0.0.5 workbench
node-problem-detector-v0.1-kngbh 1/1 Running 0 2h 10.0.0.5 workbench
weave-net-clu7s 2/2 Running 0 2h 10.0.0.5 workbench
Cloudera Data Science Workbench Pod Status
NAME READY STATUS RESTARTS AGE IP NODE ROLE
cron-2934152315-56p1n 1/1 Running 0 2h 100.66.0.8 workbench cron
db-39862959-icvq9 1/1 Running 1 2h 100.66.0.5 workbench db
db-migrate-052787a-mvb40 0/1 ImagePullBackOff 0 2h 100.66.0.4 workbench db-migrate
engine-deps-du8cx 1/1 Running 0 2h 100.66.0.3 workbench engine-deps
ingress-controller-3138093376-l5z46 1/1 Running 0 2h 10.0.0.5 workbench ingress-controller
livelog-1900214889-qppq2 1/1 Running 0 2h 100.66.0.6 workbench livelog
reconciler-459456250-wgems 1/1 Running 0 2h 100.66.0.7 workbench reconciler
spark-port-forwarder-a31as 1/1 Running 0 2h 10.0.0.5 workbench spark-port-forwarder
web-3826671331-7xchm 0/1 ContainerCreating 0 2h <none> workbench web
web-3826671331-h3gkd 0/1 ContainerCreating 0 2h <none> workbench web
web-3826671331-vtbdh 0/1 ContainerCreating 0 2h <none> workbench web
Cloudera Data Science Workbench is not ready yet: some application pods are not ready
$ sudo journalctl -u docker
Jul 06 13:42:03 workbench docker[6669]: time="2017-07-06T13:42:03.996814534Z" level=error msg="Handler for GET /images/docker.repository.cloudera.com/cdsw/1.0.1/web:052787a/json returned error: No such image: docker.repository.cloudera.com/cdsw/1.0.1/we
Internet access is available (most of the other pods did start). Any ideas? Thanks
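A couple of checks that might narrow down the ImagePullBackOff on db-migrate; these are standard docker/kubectl commands, nothing CDSW-specific:
# Exact pull error Kubernetes is hitting for the stuck pod (pod name from the status output above)
kubectl describe pod db-migrate-052787a-mvb40
# Confirm the host itself can reach the registry and pull the image manually
curl -I https://docker.repository.cloudera.com
sudo docker pull docker.repository.cloudera.com/cdsw/1.0.1/web:052787a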
05-24-2017
02:16 PM
1 Kudo
Hello, after upgrading HDP to 2.6 and getting the GA release of Spark 2.1, I'm trying (with no luck so far) to add a Spark2 interpreter in Zeppelin, if that is possible at all. I did create a new spark2 interpreter in Zeppelin, which instantiates properly (%spark2); however, sc.version indicates that I'm still running Spark 1.6.3. Digging into the config and the docs, I found that SPARK_HOME is defined in zeppelin-env.sh, pointing by default to Spark 1.6.3. Editing the config and restarting Zeppelin "works" in the sense that I can now successfully instantiate Spark 2, but Spark 1.6.3 is no longer available from the notebook (and Livy is still configured for Spark 1.6.3). Is there any way to create interpreters that allow using both Spark 1.6.3 and Spark 2 from Zeppelin 0.7? Thanks, Christophe
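For reference, the change I made is sketched below; the paths are the standard HDP 2.6 client locations, so adjust to your layout:
# /etc/zeppelin/conf/zeppelin-env.sh (excerpt)
# Pointing SPARK_HOME at the Spark 2 client makes %spark2 work, but it switches every
# Spark interpreter in this Zeppelin instance, which is exactly the limitation described above
export SPARK_HOME=/usr/hdp/current/spark2-client
# export SPARK_HOME=/usr/hdp/current/spark-client   # previous default: Spark 1.6.3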
Labels:
- Apache Spark
- Apache Zeppelin
04-05-2017
07:32 AM
Fixed! The HDFS client was not installed on the new Oozie server.
04-04-2017
06:10 PM
Hello, I moved my Oozie server between nodes today. While the move was successful as far as Ambari is concerned, all the coordinator jobs have been failing since, with Error 10005 and a Java UnknownHostException on the HA name of the NameNodes. Apparently the Oozie server is struggling to locate the NameNode, which is in HA mode. The hdfs-site.xml and core-site.xml look good and in sync with the rest of the cluster, and the proxy config in HDFS and the other services is properly set. I'm running out of ideas on where to look. Any suggestions? Thanks, Christophe
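One quick check on the new Oozie host is whether the HDFS client config resolves the HA nameservice at all; a sketch with standard HDFS client commands (the path in the second command is just illustrative):
# Should print the HA nameservice name if an installed HDFS client picks up hdfs-site.xml
hdfs getconf -confKey dfs.nameservices
# Should resolve the nameservice and list the directory; fails with UnknownHostException otherwise
hdfs dfs -ls /user/oozie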
Labels:
- Apache Oozie
03-23-2017
10:27 AM
Hi @Binu Mathew, thanks for your answer. I'll dive into this approach and post further if/when required. Thanks! Christophe