Member since
09-29-2015
122
Posts
159
Kudos Received
26
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6717 | 11-12-2016 12:32 AM |
| | 1925 | 10-05-2016 08:08 PM |
| | 2641 | 08-02-2016 11:29 PM |
| | 23359 | 06-24-2016 11:46 PM |
| | 2062 | 05-25-2016 11:12 PM |
02-14-2022
11:59 PM
Hi @vshukla, thank you for the article. Could you share the reasoning behind the Deployment Choices, please? My customer wants to understand the justification for the recommended minimums, in particular 64 GB of memory per node and 8 cores. Thank you!
09-30-2019
10:55 AM
I am also looking for something like this. I need to convert all date-typed columns to varchar in a DataFrame with more than 300 columns. Any suggestions?
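Not an answer from the original thread, but a sketch of one way to approach this in PySpark: pick out the date/timestamp columns from the DataFrame's `.dtypes` pairs, then cast only those to string. The helper below is plain Python; the PySpark usage in the comments assumes an existing SparkSession and a DataFrame named `df` (illustrative names).

```python
# Sketch only: decide which columns to cast, given the (name, type) pairs
# that a Spark DataFrame exposes via its .dtypes attribute.
def columns_to_cast(dtypes, types=("date", "timestamp")):
    """Return the names of columns whose Spark type string is in `types`."""
    return [name for name, t in dtypes if t in types]

# Hypothetical PySpark usage (assumes a SparkSession and a DataFrame `df`):
#   from pyspark.sql import functions as F
#   to_cast = set(columns_to_cast(df.dtypes))
#   df2 = df.select([
#       F.col(c).cast("string") if c in to_cast else F.col(c)
#       for c in df.columns
#   ])
```

This keeps the 300+ non-date columns untouched in a single `select`, rather than chaining hundreds of `withColumn` calls.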
09-23-2017
06:20 AM
5 common use cases for Apache Spark:

Streaming ingest and analytics. Spark isn't the first big data tool for handling streaming ingest, but it is the first one to integrate it with the rest of the analytic environment. Spark is friendly with the rest of the streaming data ecosystem, supporting data sources including Flume, Kafka, ZeroMQ, and HDFS.

Exploratory analytics. One of the headline benefits of using Spark is that you no longer need to maintain separate environments for exploratory and production work. The relatively long execution times of a Hadoop MapReduce job make hands-on exploration of data difficult: data scientists typically still must sample data if they want to move quickly. Thanks to the speed of Spark's in-memory capabilities, interactive exploration can now happen completely within Spark, without the need for Java engineering or sampling of the data.

Model building and machine learning. Spark's status as a big data tool that data scientists find easy to use makes it ideal for building models for analytical purposes. In a pre-Spark world, big data modelers typically built their models in a language such as R or SAS, then handed them to data engineers to re-implement in Java for production on Hadoop.

Graph analysis. By incorporating the GraphX component, Spark brings all the benefits of its environment to graph computation, enabling use cases such as social network analysis, fraud detection, and recommendations.

Simpler, faster ETL. Though less glamorous than the analytical applications, ETL is often the lion's share of data workloads. If the rest of your data pipeline is based on Spark, the benefits of using Spark for ETL are obvious, with consequent increases in maintainability and code reuse.
05-25-2017
09:46 PM
This applies to a Kerberos-enabled HDP 2.5.x cluster with Zeppelin, Livy, and Spark. After a successful Kerberos setup, log in to Zeppelin and run a Spark note; the note runs fine. But running a simple sc.version from the Livy interpreter gives "Cannot start spark" in the Zeppelin UI. In the Livy log at /var/log/livy/livy-livy-server.out you may find a message similar to the following:

INFO: 17/05/25 21:24:12 INFO metastore: Trying to connect to metastore with URI thrift://vinay-hdp25-2.field.hortonworks.com:9083
May 25, 2017 9:24:12 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: 17/05/25 21:24:12 ERROR TSaslTransport: SASL negotiation failure
May 25, 2017 9:24:12 PM org.apache.spark.launcher.OutputRedirector redirect
INFO: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

This happens when Livy tries to connect to the Hive Metastore and fails with the above message. The fix is to configure Zeppelin's Livy interpreter to run in yarn-cluster mode instead of the default yarn-client mode. After you change any interpreter configuration, you will need to restart the interpreter. The following setting works:

livy.spark.master yarn-cluster

Starting with HDP 2.6.x, this configuration is changed out of the box to yarn-cluster.
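For reference, a minimal sketch of the fix above as it would appear among the Livy interpreter's properties in the Zeppelin UI (the comment lines are mine, not from the post):

```
# Zeppelin UI > Interpreter > livy > edit properties
# (restart the interpreter after saving)
livy.spark.master    yarn-cluster
```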
08-10-2016
04:18 PM
I found the answer. In the Spark interpreter menu there is a "zeppelin.spark.printREPLOutput" property which you can set to false.
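Sketched as it would be set in the Spark interpreter's properties (the comment is mine; the property name is from the answer above):

```
# Zeppelin UI > Interpreter > spark > edit properties
zeppelin.spark.printREPLOutput    false
```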
10-16-2016
02:34 PM
Hi, I have edited /usr/hdp/current/zeppelin-server/lib/conf/shiro.ini:
[urls]
/api/version = anon
#/** = anon
/** = authcBasic
[users]
admin = admin
hdfs = hdfs
and restarted Zeppelin, but the login page doesn't appear; I still only get the anonymous user. Do I need anything else?
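One possibility the snippet above does not cover (an assumption on my part, not from this thread): Zeppelin also has a zeppelin.anonymous.allowed property, and anonymous access stays on unless it is disabled. A sketch of that setting in zeppelin-site.xml (on HDP this would typically be changed via Ambari):

```
<!-- zeppelin-site.xml: disable anonymous access so the login page appears -->
<property>
  <name>zeppelin.anonymous.allowed</name>
  <value>false</value>
</property>
```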
05-06-2016
10:37 AM
Hi @Mike Vogt, thanks, and glad to hear it worked. Could you kindly accept the answer and thus help us manage answered questions? Thanks!