Member since
09-29-2015
122
Posts
159
Kudos Received
26
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6717 | 11-12-2016 12:32 AM |
| | 1925 | 10-05-2016 08:08 PM |
| | 2641 | 08-02-2016 11:29 PM |
| | 23359 | 06-24-2016 11:46 PM |
| | 2062 | 05-25-2016 11:12 PM |
02-14-2022
11:59 PM
Hi @vshukla, thank you for the article. Could you share the reasoning behind the Deployment Choices, please? My customer wants to understand the justification for the recommended minimums, in particular 64 GB of memory per node and 8 cores. Thank you!
09-30-2019
10:55 AM
I am also looking for something like this. I need to convert all date-typed columns to varchar in a DataFrame with more than 300 columns. Any suggestions?
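Not an answer from the original thread, but a sketch of one way to approach this in PySpark: pick out the date/timestamp columns from the DataFrame's `.dtypes` pairs, then cast only those to string. The helper below is plain Python; the PySpark usage in the comments assumes an existing SparkSession and a DataFrame named `df` (illustrative names).

```python
# Sketch only: decide which columns to cast, given the (name, type) pairs
# that a Spark DataFrame exposes via its .dtypes attribute.
def columns_to_cast(dtypes, types=("date", "timestamp")):
    """Return the names of columns whose Spark type string is in `types`."""
    return [name for name, t in dtypes if t in types]

# Hypothetical PySpark usage (assumes a SparkSession and a DataFrame `df`):
#   from pyspark.sql import functions as F
#   to_cast = set(columns_to_cast(df.dtypes))
#   df2 = df.select([
#       F.col(c).cast("string") if c in to_cast else F.col(c)
#       for c in df.columns
#   ])
```

This keeps the 300+ non-date columns untouched in a single `select`, rather than chaining hundreds of `withColumn` calls.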
09-23-2017
06:20 AM
5 common use cases for Apache Spark:

Streaming ingest and analytics. Spark isn't the first big data tool for handling streaming ingest, but it is the first one to integrate it with the rest of the analytic environment. Spark is friendly with the rest of the streaming data ecosystem, supporting data sources including Flume, Kafka, ZeroMQ, and HDFS.

Exploratory analytics. One of the headline benefits of using Spark is that you no longer need to maintain separate environments for exploratory and production work. The relatively long execution times of a Hadoop MapReduce job make hands-on exploration of data difficult: data scientists typically still must sample data if they want to move quickly. Thanks to the speed of Spark's in-memory capabilities, interactive exploration can now happen completely within Spark, without the need for Java engineering or sampling of the data.

Model building and machine learning. Spark's status as a big data tool that data scientists find easy to use makes it ideal for building models for analytical purposes. In a pre-Spark world, big data modelers typically built their models in a language such as R or SAS, then handed them to data engineers to re-implement in Java for production on Hadoop.

Graph analysis. By incorporating the GraphX component, Spark brings all the benefits of its environment to graph computation, enabling use cases such as social network analysis, fraud detection, and recommendations.

Simpler, faster ETL. Though less glamorous than the analytical applications, ETL is often the lion's share of data workloads. If the rest of your data pipeline is based on Spark, the benefits of using Spark for ETL are obvious, with consequent increases in maintainability and code reuse.
05-25-2017
09:46 PM
This applies to a Kerberos-enabled HDP 2.5.x cluster with Zeppelin, Livy, and Spark. After a successful Kerberos setup, log in to Zeppelin and run a Spark note; the note runs fine. But running a simple sc.version from the Livy interpreter gives "Cannot start spark" in the Zeppelin UI. In the Livy log at /var/log/livy/livy-livy-server.out you may find a message similar to the following:

INFO: 17/05/25 21:24:12 INFO metastore: Trying to connect to metastore with URI thrift://vinay-hdp25-2.field.hortonworks.com:9083
May 25, 2017 9:24:12 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 17/05/25 21:24:12 ERROR TSaslTransport: SASL negotiation failure
May 25, 2017 9:24:12 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

This happens when Livy tries to connect to the Hive Metastore and fails with the above message. The fix is to configure Zeppelin's Livy interpreter to run in yarn-cluster mode instead of the default yarn-client mode. After you change any interpreter configuration, you will need to restart the interpreter. The following setting works:

livy.spark.master yarn-cluster

Starting with HDP 2.6.x, this configuration is changed out of the box to yarn-cluster.
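For reference, a minimal sketch of the fix above as it would appear among the Livy interpreter's properties in the Zeppelin UI (the comment lines are mine, not from the post):

```
# Zeppelin UI > Interpreter > livy > edit properties
# (restart the interpreter after saving)
livy.spark.master    yarn-cluster
```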
08-10-2016
04:18 PM
I found the answer. In the Spark interpreter menu there is a "zeppelin.spark.printREPLOutput" property which you can set to false.
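Sketched as it would be set in the Spark interpreter's properties (the comment is mine; the property name is from the answer above):

```
# Zeppelin UI > Interpreter > spark > edit properties
zeppelin.spark.printREPLOutput    false
```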
10-16-2016
02:34 PM
Hi, I have edited /usr/hdp/current/zeppelin-server/lib/conf/shiro.ini:
[urls]
/api/version = anon
#/** = anon
/** = authcBasic
[users]
admin = admin
hdfs = hdfs
and restarted Zeppelin, but the login page doesn't appear; I still only get the anonymous user. Do I need anything else?
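One possibility the snippet above does not cover (an assumption on my part, not from this thread): Zeppelin also has a zeppelin.anonymous.allowed property, and anonymous access stays on unless it is disabled. A sketch of that setting in zeppelin-site.xml (on HDP this would typically be changed via Ambari):

```
<!-- zeppelin-site.xml: disable anonymous access so the login page appears -->
<property>
  <name>zeppelin.anonymous.allowed</name>
  <value>false</value>
</property>
```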
05-06-2016
10:37 AM
Hi @Mike Vogt, thanks, and glad to hear it worked. Could you kindly accept the answer and thus help us manage answered questions? Thanks!