About lgeorge

lgeorge · ‎05-05-2017

The Spark Component Guide and Command Line Installation Guide were updated to reflect new Spark features. Here are links to several of the latest features:

- Support for Spark 2, documented in several topics including:
  - Installing Spark Using Ambari
  - Installing and Configuring Apache Spark 2 (manual installation)
  - Running Spark
  - Configuring Spark2 for Wire Encryption
  - Automating Spark Jobs with Oozie Spark Action
  - Using Livy with Spark Versions 1 and 2
- Livy API information, in Submitting Spark Applications Through Livy
- Enabling Spark SQL user impersonation for the Spark Thrift Server (doAs support), in Configuring the Spark Thrift Server

The Zeppelin Component Guide was updated with additional details and examples for configuring Zeppelin with LDAP/AD and Kerberos security; see Configuring Zeppelin Security. In addition, the documentation for interpreters and user impersonation was extended. Portions of this information that apply to HDP 2.5 were also added to the Security chapter in the HDP 2.5 Zeppelin Component Guide.

In the messaging area, the Kafka Component Guide has additional information in Configuring Kafka for a Production Environment.

lgeorge · ‎04-24-2017

@Daniel Müller this is a general comment, so it might not help, but there is a set of custom property boxes under the Spark service Configs tab, including one called Custom spark-hive-site-override. The Spark guide describes a similar custom property step (for doAs support) under the Ambari subsection of the following page: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_spark-component-guide/content/config-sts-user-imp.html

lgeorge · ‎03-13-2017

Yes, the documentation is planned to sync with the HDP 2.6 GA release.

lgeorge · ‎02-22-2017

@Dinesh Chitlangia I'd also ask about your goals. If you plan to focus more on analytics, Python should support more statistical packages/libraries. There is also a Java API for Spark, which might get you started with Spark constructs more quickly; see https://spark.apache.org/docs/0.9.1/java-programming-guide.html. When I was thinking about a similar question the following article was helpful: https://datasciencevademecum.wordpress.com/2016/01/28/6-points-to-compare-python-and-scala-for-data-science-using-apache-spark/

lgeorge · ‎01-20-2017

@Edgar Daeds, from what I understand, not yet; you need to use multiple paragraphs. @jzhang, to confirm: did you mean (for now, in HDP 2.5.x) that they can run 10 queries in parallel, in 10 separate paragraphs?

lgeorge · ‎01-17-2017

@Christian Guegi I believe there are constraints for Kafka based on versions and on whether a cluster has Kerberos enabled or not. I'll try to find someone to respond for 2.4 to 2.5.3.

lgeorge · ‎12-14-2016

In general, you can find Spark-HDP version info in the Spark Component Guide. For example, see http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_spark-component-guide/content/ch_introduction-spark.html. For HDP-Ambari version compatibility, see the stack compatibility section of the Ambari Installation Guide; for example, http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-installation/content/determine_stack_compatibility.html

lgeorge · ‎12-13-2016

@Andi Sonde, not sure if you came across the following in your research: http://kafka.apache.org/documentation.html#basic_ops_racks.

lgeorge · ‎12-01-2016

There are many ways to run Hadoop on virtual machines. Earlier this year I tried several approaches, and ended up using a helpful Quick Start Guide written by Yusaku Sako. The Quick Start uses VirtualBox, Vagrant, and predefined scripts to set up a multi-node HDP cluster. You can choose which version of Ambari to install, and then choose and install an associated version of the HDP stack.

For anyone new to virtual machines, there is now a Quick Start for New VM Users. The extended version adds background information and additional details for installing Ambari and the HDP stack. Topics include:

- Terminology
- Prerequisites
- Installing VirtualBox and Vagrant
- Starting Linux Virtual Machines
- Accessing Virtual Machines
- Installing Ambari
- Installing the HDP Stack
- Troubleshooting
- Reference information for basic Vagrant commands

lgeorge · ‎11-29-2016

It looks like you have R installed; is it on all nodes in your cluster? There is also a requirement to set JAVA_HOME. If you have access to Spark directly you might want to try accessing R from Spark first, to help isolate the issue. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/ch_spark-r.html
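
If it helps with isolating the issue, here is a rough per-node check for the two prerequisites mentioned above (R present, JAVA_HOME set). This is only a sketch: the function name check_sparkr_prereqs is made up for illustration, and looking for Rscript on the PATH is an assumption about how R is installed on your nodes, not something taken from the Spark guide.

```python
import os
import shutil


def check_sparkr_prereqs(env=None, which=shutil.which):
    """Return a list of problems found on this node (empty list means OK).

    Hypothetical helper: `env` and `which` are injectable so the check
    can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []

    # Assumption: if R is installed, the Rscript executable is on the PATH.
    if which("Rscript") is None:
        problems.append("Rscript not found on PATH")

    # JAVA_HOME must be set, and should point at an existing directory.
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append("JAVA_HOME does not point to a directory: " + java_home)

    return problems


if __name__ == "__main__":
    issues = check_sparkr_prereqs()
    print("OK" if not issues else "\n".join(issues))
```

You would need to run this on every node in the cluster, as the user that launches the interpreter, since that user's environment may differ from your login shell.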

Last Visited	‎08-08-2019 07:47 PM

Member Since	‎09-10-2015 08:46 PM
Posts	93
Kudos received	31

Cloudera Community

Re: how to install Solr using Ambari view?

Re: Dynamic allocation on Spark Standalone cluster

Re: spark2 not accessable

Re: apache spark 2.0

Re: Zeppelin: %hive and %phoenix interpreters in 2...

HDP 2.6 Documentation Updates for Data Science Com...

Re: Spark Settings with Ambari: Can't find corresp...

Re: Is Kerberos required for zeppelin hive identit...

Re: What language should I use to learn Spark?

Re: Does zeppelin support multiple hive queries in...

Re: Kafka Rolling Upgrade

Re: Spark 1.5.2 installation

Re: Can anyone explain Kafka rack awareness featur...

Ambari and HDP Installation: Quick Start for new V...

Re: HDP, Zeppelin Notebook, Livy, Spark & R: "requ...