Member since: 09-01-2016
Posts: 51
Kudos Received: 12
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9738 | 02-26-2018 02:37 PM
 | 1443 | 01-25-2017 07:41 PM
01-11-2023
09:18 AM
Hello everyone. Since CentOS 8 was discontinued more than a year ago, and Rocky Linux / Alma Linux have taken over its role as free RHEL-compatible distributions, I would like to know whether Cloudera has already scheduled a date in the near future to start supporting either of these distributions as a base operating system for CDP Base and related products. Thanks in advance.
Labels:
- Cloudera Data Platform (CDP)
02-23-2019
03:38 PM
The main issue here is the integration between Zeppelin/Spark and HWC. This problem is described here by @Luis Vazquez.
02-22-2019
10:39 PM
Additional comments:
- Notably, this issue represents a fundamental gap when it comes to leveraging Ranger authorization from Zeppelin.
- Impersonation is a must, since individual developers and end users need to use Zeppelin.
02-07-2019
08:18 PM
We need to configure Superset, running within HDP 3.1, to use existing LDAP. We could not find any proper documentation on how to do this. Are there any defined steps? Thanks in advance.
Labels:
- Hortonworks Data Platform (HDP)
02-05-2019
09:02 PM
We are running HDP 3.1 (the latest release as of Feb 2019). We configured Spark-to-Hive integration with Ranger as instructed here:
https://community.hortonworks.com/articles/223626/integrating-apache-hive-with-apache-spark-hive-war.html
This works fine from spark-shell: you can access Hive tables from Spark, and Ranger policies are applied correctly. However, when you do the same from Zeppelin, Ranger seems to be out of the picture... you can access all Hive tables and rows as if Ranger did not exist. forum-spark-hive.zip
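For reference, a minimal hedged sketch of what querying Hive through the Hive Warehouse Connector looks like (the path on which Ranger policies are enforced), assuming HWC is on the interpreter classpath and spark.sql.hive.hiveserver2.jdbc.url is configured; the database/table name below is a placeholder:

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession; queries executed through it
// go to HiveServer2/LLAP, where Ranger authorization is applied.
val hive = HiveWarehouseSession.session(spark).build()

// Placeholder table; a plain spark.sql("SELECT ...") that reads warehouse files directly
// would not be subject to the same Ranger policies.
hive.executeQuery("SELECT * FROM secured_db.customers").show(10)

If Zeppelin behaves differently from spark-shell here, a common suspect is the Zeppelin Spark interpreter not carrying the same HWC jars and spark.sql.* properties as the shell.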
01-14-2019
12:35 PM
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.capacity=10
yarn.scheduler.capacity.root.default.maximum-capacity=30
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=2
yarn.scheduler.capacity.root.queues=Hive,Zeppelin,default
yarn.scheduler.capacity.queue-mappings=u:zeppelin:Zeppelin,u:hdfs:Hive,g:dl-analytics-group:Zeppelin
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.root.Hive.acl_administer_queue=*
yarn.scheduler.capacity.root.Hive.acl_submit_applications=*
yarn.scheduler.capacity.root.Hive.capacity=50
yarn.scheduler.capacity.root.Hive.maximum-capacity=90
yarn.scheduler.capacity.root.Hive.minimum-user-limit-percent=25
yarn.scheduler.capacity.root.Hive.ordering-policy=fair
yarn.scheduler.capacity.root.Hive.ordering-policy.fair.enable-size-based-weight=false
yarn.scheduler.capacity.root.Hive.priority=10
yarn.scheduler.capacity.root.Hive.state=RUNNING
yarn.scheduler.capacity.root.Hive.user-limit-factor=2
yarn.scheduler.capacity.root.Zeppelin.acl_administer_queue=*
yarn.scheduler.capacity.root.Zeppelin.acl_submit_applications=*
yarn.scheduler.capacity.root.Zeppelin.capacity=40
yarn.scheduler.capacity.root.Zeppelin.maximum-capacity=80
yarn.scheduler.capacity.root.Zeppelin.minimum-user-limit-percent=20
yarn.scheduler.capacity.root.Zeppelin.ordering-policy=fair
yarn.scheduler.capacity.root.Zeppelin.ordering-policy.fair.enable-size-based-weight=false
yarn.scheduler.capacity.root.Zeppelin.priority=5
yarn.scheduler.capacity.root.Zeppelin.state=RUNNING
yarn.scheduler.capacity.root.Zeppelin.user-limit-factor=3
yarn.scheduler.capacity.root.default.minimum-user-limit-percent=25
yarn.scheduler.capacity.root.default.ordering-policy=fair
yarn.scheduler.capacity.root.default.ordering-policy.fair.enable-size-based-weight=false
yarn.scheduler.capacity.root.default.priority=0
yarn.scheduler.capacity.root.maximum-capacity=100
yarn.scheduler.capacity.root.ordering-policy=priority-utilization
yarn.scheduler.capacity.root.priority=0
01-11-2019
08:37 PM
We have defined several YARN queues. Say that you have queue Q1, where users A and B run Spark processes. If A submits a job that demands all of the queue's resources, YARN allocates them. Subsequently, when B submits a job, B is affected by resource scarcity. We need to prevent this situation by assigning resources more evenly between A and B (and all other incoming users) within Q1. We have already set the scheduler to Fair. Can this eager resource allocation behaviour be prevented?
Labels:
- Apache YARN
07-14-2018
11:39 PM
Something like that. I want to include my own Scala libraries in Zeppelin.
07-13-2018
08:43 PM
This is about running Spark/Scala from a Zeppelin notebook. In order to better modularize and reorganize code, I need to import existing Scala classes, packages, or functions into the notebook without creating a jar file (much the same as in PySpark). Is this possible? Thanks
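For context, a minimal hedged sketch of the closest workaround we know of: because all %spark paragraphs in the same interpreter group share one Scala REPL session, a helper defined in a "setup" paragraph is visible to later paragraphs without packaging a jar (the object, column and path below are hypothetical):

%spark
// "Setup" paragraph: define reusable helpers once per interpreter session.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object TextStats {
  // Tokenize a string column and count word occurrences.
  def wordCount(df: DataFrame, col: String): DataFrame =
    df.select(explode(split(lower(df(col)), "\\s+")).as("word"))
      .groupBy("word").count()
}

%spark
// Later paragraph in the same note (or another note bound to the same interpreter):
val counts = TextStats.wordCount(spark.read.text("/tmp/sample.txt"), "value")
counts.show(10)

What this does not give you is importing raw .scala source files directly; for that, the usual route still seems to be packaging a jar and adding it to the interpreter dependencies.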
Labels:
- Apache Spark
- Apache Zeppelin
03-07-2018
12:40 PM
For me, proxy settings (no matter whether they were set in IntelliJ, the SBT config file, or environment variables) did not work. A couple of considerations that solved this issue (for me at least):
- use SBT 0.13.16 (not newer than that)
- set Use Auto Import
Then, no "FAILED DOWNLOADS" messages appear.
02-26-2018
02:37 PM
A couple of considerations that solved this issue: use SBT 0.13.16 (not newer than that) and set Use Auto Import; then no download issues occur.
With this setup, no "FAILED DOWNLOADS" messages appear.
02-23-2018
07:27 PM
I am trying to use Scala/Spark from IntelliJ on Windows 7, but IntelliJ (and the SBT command line) fails to download files. I am behind a proxy server. Similar problems were already reported here. Already tried:
- SBT versions 0.13.16, 1.0.3 and 1.1.1
- setting the proxy properties in JAVA_OPTS, SBT_OPTS and sbtconfig.txt:
  -Dhttp.proxyHost=*** -Dhttp.proxyPort=*** -Dhttp.proxyUser=*** -Dhttp.proxyPassword=*** -Dhttps.proxyHost=*** -Dhttps.proxyPort=*** -Dhttps.proxyUser=*** -Dhttps.proxyPassword=***
  without success
- verified the issue in SBT:
d:\Users\user1>sbt.bat
[info] Loading project definition from D:\Users\user1\project
[info] Updating {file:/D:/Users/user1/project/}user1-build...
[warn] [FAILED ] org.apache.logging.log4j#log4j-core;2.8.1!log4j-core.jar(t
est-jar): typesafe-ivy-releases: unable to get resource for org.apache.logging.l
og4j#log4j-core;2.8.1: res=https://repo.typesafe.com/typesafe/ivy-releases/org.a
pache.logging.log4j/log4j-core/2.8.1/test-jars/log4j-core-tests.jar: java.io.IOE
xception: Failed to authenticate with proxy (353ms)
[warn] [FAILED ] org.apache.logging.log4j#log4j-core;2.8.1!log4j-core.jar(t
est-jar): sbt-plugin-releases: unable to get resource for org.apache.logging.log
4j#log4j-core;2.8.1: res=https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases
/org.apache.logging.log4j/log4j-core/2.8.1/test-jars/log4j-core-tests.jar: java.
io.IOException: Failed to authenticate with proxy (8ms)
[warn] [FAILED ] org.apache.logging.log4j#log4j-core;2.8.1!log4j-core.jar(t
est-jar): public: unable to get resource for org/apache/logging/log4j#log4j-core
;2.8.1: res=https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-core/2
.8.1/log4j-core-2.8.1-tests.jar: java.io.IOException: Failed to authenticate wit
h proxy (8ms)
[warn] Detected merged artifact: [FAILED ] org.apache.logging.log4j#log4j-c
ore;2.8.1!log4j-core.jar(test-jar): (0ms).
[warn] ==== typesafe-ivy-releases: tried
[warn] ==== sbt-plugin-releases: tried
[warn] https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/org.apache.logg
ing.log4j/log4j-core/2.8.1/test-jars/log4j-core-tests.jar
[warn] ==== local: tried
[warn] d:\Users\user1\.ivy2\local\org.apache.logging.log4j\log4j-core\2.8.1\
test-jars\log4j-core-tests.jar
[warn] ==== public: tried
[warn] https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-core/2.8.
1/log4j-core-2.8.1-tests.jar
[warn] ==== local-preloaded-ivy: tried
[warn] d:\Users\user1\.sbt\preloaded\org.apache.logging.log4j\log4j-core\2.8
.1\test-jars\log4j-core-tests.jar
[warn] ==== local-preloaded: tried
[warn] file:/d:/Users/user1/.sbt/preloaded/org/apache/logging/log4j/log4j-co
re/2.8.1/log4j-core-2.8.1-tests.jar
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: FAILED DOWNLOADS ::
[warn] :: ^ see resolution messages for details ^ ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.logging.log4j#log4j-core;2.8.1!log4j-core.jar(test-jar)
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[error] sbt.librarymanagement.ResolveException: download failed: org.apache.logg
ing.log4j#log4j-core;2.8.1!log4j-core.jar(test-jar)
[error] at sbt.internal.librarymanagement.IvyActions$.resolveAndRetrieve
(IvyActions.scala:331)
[error] at sbt.internal.librarymanagement.IvyActions$.$anonfun$updateEit
her$1(IvyActions.scala:205)
[error] at sbt.internal.librarymanagement.IvySbt$Module.$anonfun$withMod
ule$1(Ivy.scala:229)
[error] at sbt.internal.librarymanagement.IvySbt.$anonfun$withIvy$1(Ivy.
scala:190)
[error] at sbt.internal.librarymanagement.IvySbt.sbt$internal$libraryman
agement$IvySbt$$action$1(Ivy.scala:70)
[error] at sbt.internal.librarymanagement.IvySbt$$anon$3.call(Ivy.scala:
77)
[error] at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:95)
[error] at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withCh
annelRetries$1(Locks.scala:80)
[error] at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Lock
s.scala:99)
[error] at xsbt.boot.Using$.withResource(Using.scala:10)
[error] at xsbt.boot.Using$.apply(Using.scala:9)
[error] at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scal
a:60)
[error] at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:50)
[error] at xsbt.boot.Locks$.apply0(Locks.scala:31)
[error] at xsbt.boot.Locks$.apply(Locks.scala:28)
[error] at sbt.internal.librarymanagement.IvySbt.withDefaultLogger(Ivy.s
cala:77)
[error] at sbt.internal.librarymanagement.IvySbt.withIvy(Ivy.scala:185)
[error] at sbt.internal.librarymanagement.IvySbt.withIvy(Ivy.scala:182)
[error] at sbt.internal.librarymanagement.IvySbt$Module.withModule(Ivy.s
cala:228)
[error] at sbt.internal.librarymanagement.IvyActions$.updateEither(IvyAc
tions.scala:190)
[error] at sbt.librarymanagement.ivy.IvyDependencyResolution.update(IvyD
ependencyResolution.scala:20)
[error] at sbt.librarymanagement.DependencyResolution.update(DependencyR
esolution.scala:56)
[error] at sbt.internal.LibraryManagement$.resolve$1(LibraryManagement.s
cala:38)
[error] at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$12(Libr
aryManagement.scala:91)
[error] at sbt.util.Tracked$.$anonfun$lastOutput$1(Tracked.scala:68)
[error] at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$19(Libr
aryManagement.scala:104)
[error] at scala.util.control.Exception$Catch.apply(Exception.scala:224)
[error] at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$11(Libr
aryManagement.scala:104)
[error] at sbt.internal.LibraryManagement$.$anonfun$cachedUpdate$11$adap
ted(LibraryManagement.scala:87)
[error] at sbt.util.Tracked$.$anonfun$inputChanged$1(Tracked.scala:149)
[error] at sbt.internal.LibraryManagement$.cachedUpdate(LibraryManagemen
t.scala:118)
[error] at sbt.Classpaths$.$anonfun$updateTask$5(Defaults.scala:2353)
[error] at scala.Function1.$anonfun$compose$1(Function1.scala:44)
[error] at sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFuncti
ons.scala:42)
[error] at sbt.std.Transform$$anon$4.work(System.scala:64)
[error] at sbt.Execute.$anonfun$submit$2(Execute.scala:257)
[error] at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.sc
ala:16)
[error] at sbt.Execute.work(Execute.scala:266)
[error] at sbt.Execute.$anonfun$submit$1(Execute.scala:257)
[error] at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(Con
currentRestrictions.scala:167)
[error] at sbt.CompletionService$$anon$2.call(CompletionService.scala:32
)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.Executors$RunnableAdapter.call(Executors
.java:511)
[error] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolE
xecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPool
Executor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? [error] (*:update)
sbt.librarymanagement.ResolveException: download failed: org.apache.logging.log
4j#log4j-core;2.8.1!log4j-core.jar(test-jar)
Labels:
- Apache Spark
01-25-2018
06:47 PM
Not yet. As of 2.6.4 I still see the issue.
01-08-2018
02:25 PM
What do you mean by 'upgrading the Ambari'? We have installed 2.6.3 from scratch and are experiencing exactly the same issue, that is: all Zeppelin notebook custom configurations are lost when you restart the Zeppelin service. Thanks.
12-15-2017
05:53 PM
I have a cluster running Spark on YARN (HDP 2.6). Without losing YARN capabilities, I need to turn off Spark on certain nodes. How can I achieve this?
Tags:
- Hadoop Core
- Spark
- YARN
Labels:
- Apache Spark
- Apache YARN
12-11-2017
02:48 PM
When you use Spark from a Zeppelin notebook and at the end you issue a context stop, sc.stop(), it affects the context of other running Zeppelin notebooks, making them fail because there is no active Spark context. They seem to be sharing the same Spark context. How can this be avoided?
Labels:
- Apache Spark
- Apache Zeppelin
11-01-2017
04:24 PM
We are considering implementing security on our HDP data lake in a progressive manner:
1. At a first stage, basic authentication would comprise HDFS users, groups and ACLs.
2. After that we would incorporate AD/LDAP.
3. The final destination would be to add Kerberos + Knox.
Between each two steps, the data lake will continue to grow steadily, incorporating new feeders. The question is: what are the caveats of proceeding in these three separate, progressive steps? Does this approach pose any complications for existing components (Hive, Spark, Kafka, Atlas, Ranger) and data stores? Thanks in advance!
Labels:
- Apache Knox
10-30-2017
01:49 PM
No application log was generated. It looks like the Atlas hook is not working when you create objects from Hive View.
10-27-2017
07:05 PM
(This applies to HDP 2.6.1.) There seems to be an issue handling Atlas lineage for tables, depending on where they were created.
1) If you create objects using the Hive command line, lineage is displayed correctly: create table test_table_hive2 as... See attachment for the correct lineage display. No problem here.
2) If you perform the same operation from Hive View (previous version or version 2), either by typing in the CREATE statement or by uploading the table from a file, lineage is not tracked (or at least not displayed). See attached screenshot for an example.
Labels:
- Apache Atlas
10-24-2017
03:54 PM
I am trying to create a new policy in Ranger. When you click on Add, 'Error creating policy' is shown. captura.jpg
This is what /var/log/ranger/admin/xa_portal.log shows:
2017-10-24 15:51:28,710 [http-bio-6080-exec-12] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:345) - Request failed. loginId=holger_gov, logMessage=User 'holger_gov' does not have delegated-admin privilege on given resources
javax.ws.rs.WebApplicationException
    at org.apache.ranger.common.RESTErrorUtil.createRESTException(RESTErrorUtil.java:337)
The connection test to the Ranger Admin DB runs OK (jdbc:postgresql://localhost:5432/ranger). This is a 2.6.1 sandbox-based environment. What can be causing this issue?
Labels:
- Apache Ranger
10-23-2017
08:11 PM
Thanks. That check did the trick. (It is not explicitly indicated in the above-mentioned tutorial for Storm.)
10-23-2017
07:58 PM
Thanks. I did include the referenced jar as you suggested:
storm jar /root/tutoriales/crosscomponent_scripts/storm-demo/lib/storm-samples-1.0-jar-with-dependencies.jar com.dsinpractice.storm.samples.WordCountTopology --cluster true --name storm-demo-topology-03 --path /user/storm/storm-hdfs-test-01 --topic my-topic-01 --jars /usr/hdp/current/atlas-client/hook/storm/atlas-storm-plugin-impl/storm-bridge-0.8.0.2.6.0.3-8.jar
but now there is another missing class, "AtlasHook":
4930 [main] INFO o.a.s.StormSubmitter - Initializing the registered ISubmitterHook [org.apache.atlas.storm.hook.StormAtlasHook]
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/atlas/hook/AtlasHook
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
Is there another jar that needs to be referenced?
10-23-2017
07:31 PM
Yes, there is:
[root@sandbox storm-demo]# ls -l /usr/hdp/current/atlas-client/hook/storm/atlas-storm-plugin-impl/*storm-bridge*
-rw-r--r-- 1 root root 27999 Apr 1 2017 /usr/hdp/current/atlas-client/hook/storm/atlas-storm-plugin-impl/storm-bridge-0.8.0.2.6.0.3-8.jar
10-23-2017
07:14 PM
The Storm error check shows the same issue. Although this was reported in an older version (2.5.x), it seems to be the same problem. How can it be worked around for 2.6.x?
10-23-2017
07:04 PM
hdp-version-info.txt
When running the Storm-Atlas integration for the "Cross Component Lineage with Apache Atlas across Apache Sqoop, Hive, Kafka & Storm" tutorial, at step
sh 004-run-storm-job.sh
we get a class not found exception:
Exception in thread "main" org.apache.storm.hooks.SubmitterHookException: java.lang.ClassNotFoundException: org.apache.atlas.storm.hook.StormAtlasHook
    at org.apache.storm.StormSubmitter.invokeSubmitterHook(StormSubmitter.java:368)
    at org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:278)
    at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:390)
    at org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:162)
    at com.dsinpractice.storm.samples.WordCountTopology.main(WordCountTopology.java:140)
Any hints here? storm-error-dump.txt Regards
Labels:
- Apache Atlas
- Apache Storm
04-11-2017
11:58 AM
1 Kudo
I am looking for a visualization tool or framework (preferably open source), with end-user quality, capable of displaying analytics data at different geographic aggregation levels (e.g. state / county / city / neighbourhood). The data source will be Spark / Spark SQL. Zeppelin is not an option, since it is not intended for end users AFAIK. Recommendations will be appreciated.
Tags:
- GIS
- visualization
Labels:
- Apache Spark
- Apache Zeppelin
03-08-2017
05:39 PM
2 Kudos
When it comes to BI integration (e.g. consuming from Cognos/Tableau/Pentaho/SpagoBI), it is quite straightforward to see the similarity between Hive and an RDBMS. As in the old SQL-over-relational-DB times, the reporting engine just issues a query through JDBC/ODBC, and voilà. No question here. But which would be an equivalent flow using Spark / Spark SQL? How does it map to the BI engine? For example, suppose you have a data store (any Hadoop flavour like HDFS flat files, Hive or HBase) and a Spark process that grabs the data, creates RDDs from it, builds a dataframe, and then queries the latter using Spark SQL, producing analytics results. This is not just a single query to a datastore. How do you execute this from the BI engine? Thanks!
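For illustration, a hedged sketch of one common pattern (names and paths are placeholders): the Spark job materializes its analytics result as a Hive table, and the BI tool then queries the Spark Thrift Server, a HiveServer2-compatible JDBC/ODBC endpoint, as if it were an ordinary SQL database:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("analytics-for-bi")
  .enableHiveSupport()                           // so saveAsTable lands in the shared Hive metastore
  .getOrCreate()

val raw = spark.read.parquet("/data/events")     // placeholder source in HDFS
val result = raw.groupBy("region").count()       // the multi-step "analytics" part goes here

// Persist the result as a Hive table; a temp view would only be visible inside this job,
// whereas a metastore table is visible to the Spark Thrift Server and any JDBC/ODBC client.
result.write.mode("overwrite").saveAsTable("analytics.events_by_region")

The BI engine then connects to the Thrift Server (default port 10015 on HDP) with a plain jdbc:hive2://<host>:10015 URL or the Spark ODBC driver and issues ordinary SQL against analytics.events_by_region.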
Labels:
- Apache Spark
03-08-2017
03:23 PM
For JDBC, is the Simba commercial driver the only option? Thanks!
03-08-2017
03:21 PM
Yes, we could connect to the Thrift Server via ODBC:
- Download and install the ODBC driver for Spark from the HDP downloads page.
- Make sure the Thrift Server is up and running (default port 10015). Double-check with telnet to that port, for instance.
- Configure the ODBC driver like this: Driver=Hortonworks Spark ODBC Driver;Host=192.168.170.45;Port=10015;SparkServerType=3;AuthMech=2;ThriftTransport=1;
On the other hand, I still need to connect via JDBC without the Simba commercial driver. How could you do this? Regards, Fernando
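For reference, a minimal hedged sketch of the non-Simba route, assuming the open-source Hive JDBC driver can be used because the Spark Thrift Server speaks the HiveServer2 protocol; it needs hive-jdbc-<version>-standalone.jar (and, for some versions, hadoop-common) on the classpath, and reuses the host/port from the ODBC example above:

import java.sql.DriverManager

// Register the open-source Hive JDBC driver and connect to the Spark Thrift Server.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://192.168.170.45:10015/default", "hive", "")

val stmt = conn.createStatement()
val rs = stmt.executeQuery("SHOW TABLES")
while (rs.next()) println(rs.getString(1))

rs.close(); stmt.close(); conn.close()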