Member since: 01-04-2016
Posts: 409
Kudos Received: 313
Solutions: 35

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 6971 | 01-16-2018 07:00 AM |
|  | 2610 | 09-13-2017 06:17 PM |
|  | 4964 | 09-13-2017 05:58 AM |
|  | 3165 | 08-28-2017 07:16 AM |
|  | 4737 | 05-11-2017 11:30 AM |
08-24-2017
05:02 AM
2 Kudos
@Shreya Gupta Check the following links; they may help you understand:
http://data-flair.training/blogs/dag-in-apache-spark/
https://www.quora.com/What-are-the-Apache-Spark-concepts-around-its-DAG-Directed-Acyclic-Graph-execution-engine-and-its-overall-architecture

Spark features an advanced Directed Acyclic Graph (DAG) execution engine. Each Spark job creates a DAG of task stages to be performed on the cluster. Compared to MapReduce, which creates a DAG with two predefined stages (Map and Reduce), DAGs created by Spark can contain any number of stages. This allows some jobs to complete faster than they would in MapReduce: simple jobs complete after just one stage, and more complex jobs complete in a single run of many stages rather than having to be split into multiple jobs.

Spark jobs perform work on Resilient Distributed Datasets (RDDs), an abstraction for a collection of elements that can be operated on in parallel. When running Spark in a Hadoop cluster, RDDs are created from files in the distributed file system in any format supported by Hadoop, such as text files, SequenceFiles, or anything else supported by a Hadoop InputFormat. Once data is read into an RDD object in Spark, a variety of operations can be performed by calling the Spark APIs. The two major types of operation available are:

- Transformations: return a new, modified RDD based on the original. Several transformations are available through the Spark API, including map(), filter(), sample(), and union().
- Actions: return a value based on some computation performed on an RDD. Examples of actions supported by the Spark API include reduce(), count(), first(), and foreach().

Some Spark jobs require several actions or transformations on a particular data set, making it highly desirable to hold the RDD in memory for rapid access. Spark exposes a simple API for this: cache(). Once it is called on an RDD, future operations on that RDD return in a fraction of the time they would take if the data were retrieved from disk.
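A minimal Scala sketch of these concepts; the input path, app name, and the "ERROR" filter are illustrative assumptions, not from the original post:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-demo"))

    // Transformations are lazy: nothing runs until an action is called.
    val lines  = sc.textFile("hdfs:///tmp/input.txt")  // RDD from a file in HDFS
    val errors = lines.filter(_.contains("ERROR"))     // transformation
    val words  = errors.flatMap(_.split("\\s+"))       // transformation

    // cache() keeps the RDD in memory, so both actions below reuse it
    // instead of recomputing from disk.
    words.cache()

    // Actions trigger computation and return values to the driver.
    println(s"error words: ${words.count()}")          // action
    words.take(5).foreach(println)                     // action

    sc.stop()
  }
}
```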
08-24-2017
04:50 AM
@Sonu sahi I have checked all the possibilities: disk size, hostname, and network. There is no issue with the disk or the network.
08-23-2017
01:36 PM
Hi, I am using HDP 2.5 on an Azure HDInsight cluster. Yesterday the cluster was working fine and I was able to put data from local disk to HDFS from one client node, but since this morning I am getting an error while putting data to HDFS:
hdfs dfs -put abc /shaz
-put: Self-suppression not permitted
Usage: hadoop fs [generic options] -put [-f] [-p] [-l] <localsrc> ... <dst>
Thanks in advance!
Labels:
Apache Hadoop
05-11-2017
11:30 AM
1 Kudo
@Bala Vignesh N V I have found the exact issue: it was a Kerberos cross-realm problem. I configured cross-realm trust following this article: https://community.hortonworks.com/articles/18686/kerberos-cross-realm-trust-for-distcp.html
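For reference, a minimal sketch of the pieces a cross-realm trust involves; all realm and host names below are placeholders, not from my clusters, and the linked article has the full procedure:

```
# krb5.conf fragment on hosts of both clusters (illustrative)
[realms]
  CLUSTER1.EXAMPLE.COM = {
    kdc = kdc.cluster1.example.com
    admin_server = kdc.cluster1.example.com
  }
  CLUSTER2.EXAMPLE.COM = {
    kdc = kdc.cluster2.example.com
    admin_server = kdc.cluster2.example.com
  }

[domain_realm]
  .cluster1.example.com = CLUSTER1.EXAMPLE.COM
  .cluster2.example.com = CLUSTER2.EXAMPLE.COM
```

In addition, both KDCs need matching cross-realm principals (e.g. krbtgt/CLUSTER2.EXAMPLE.COM@CLUSTER1.EXAMPLE.COM) created with identical passwords and encryption types, and hadoop.security.auth_to_local rules must map principals from the remote realm.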
05-10-2017
10:20 AM
1 Kudo
We have two secure clusters with Kerberos. While running distcp we get the following error:
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:710)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
... 38 more
Caused by: KrbException: Server not found in Kerberos database (7) - UNKNOWN_SERVER
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:192)
at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:203)
at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:311)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:115)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:449)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:641)
... 41 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:143)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:66)
at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:61)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
04-11-2017
08:15 AM
You can write a job and schedule it from Oozie or Azkaban.
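For example, a minimal Oozie coordinator that runs a workflow once a day might look like the following; the app name, path, and dates are placeholders:

```xml
<coordinator-app name="daily-job" frequency="${coord:days(1)}"
                 start="2017-04-11T00:00Z" end="2018-04-11T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS path to the workflow.xml that defines the actual job -->
      <app-path>hdfs:///user/oozie/apps/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```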
03-14-2017
10:29 AM
Hi, my question is: where can we find the queue priority details, and how can we find out which queue has high priority?
03-09-2017
05:43 PM
1 Kudo
How can we find which queue has high priority in MapReduce? And where can we find the queue priority details?
Labels:
Apache Hadoop
02-01-2017
06:46 AM
@Artem Ervits Is there a docs link I can go through for more details? And will this provide per-job utilization details or overall cluster-level details?
01-31-2017
10:59 AM
1 Kudo
Hi, is there any utility to find the resource utilization of a particular job, such as CPU, memory, per-queue utilization, and time?
Labels:
Apache Hadoop