Member since: 01-04-2016
Posts: 409
Kudos Received: 313
Solutions: 35

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 6971 | 01-16-2018 07:00 AM |
|  | 2610 | 09-13-2017 06:17 PM |
|  | 4964 | 09-13-2017 05:58 AM |
|  | 3165 | 08-28-2017 07:16 AM |
|  | 4737 | 05-11-2017 11:30 AM |
08-24-2017
05:02 AM
2 Kudos
@Shreya Gupta Check the following links; they may help you understand:
http://data-flair.training/blogs/dag-in-apache-spark/
https://www.quora.com/What-are-the-Apache-Spark-concepts-around-its-DAG-Directed-Acyclic-Graph-execution-engine-and-its-overall-architecture

Spark features an advanced Directed Acyclic Graph (DAG) execution engine. Each Spark job creates a DAG of task stages to be performed on the cluster. Compared to MapReduce, which creates a DAG with two predefined stages (Map and Reduce), DAGs created by Spark can contain any number of stages. This allows some jobs to complete faster than they would in MapReduce: simple jobs complete after just one stage, and more complex jobs complete in a single run of many stages rather than having to be split into multiple jobs.

Spark jobs perform work on Resilient Distributed Datasets (RDDs), an abstraction for a collection of elements that can be operated on in parallel. When running Spark in a Hadoop cluster, RDDs are created from files in the distributed file system in any format supported by Hadoop, such as text files, SequenceFiles, or anything else supported by a Hadoop InputFormat. Once data is read into an RDD object in Spark, a variety of operations can be performed by calling the Spark APIs. The two major types of operation available are:

- Transformations: return a new, modified RDD based on the original. Several transformations are available through the Spark API, including map(), filter(), sample(), and union().
- Actions: return a value based on some computation performed on an RDD. Examples of actions supported by the Spark API include reduce(), count(), first(), and foreach().

Some Spark jobs require several actions or transformations on a particular data set, making it highly desirable to hold the RDD in memory for rapid access. Spark exposes a simple API for this: cache(). Once it is called on an RDD, future operations on that RDD return in a fraction of the time they would take if the data were retrieved from disk.
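A minimal Scala sketch of these concepts; the input path, app name, and the "ERROR" filter are illustrative assumptions, not from the original post:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-demo"))

    // Transformations are lazy: nothing runs until an action is called.
    val lines  = sc.textFile("hdfs:///tmp/input.txt")  // RDD from a file in HDFS
    val errors = lines.filter(_.contains("ERROR"))     // transformation
    val words  = errors.flatMap(_.split("\\s+"))       // transformation

    // cache() keeps the RDD in memory, so both actions below reuse it
    // instead of recomputing from disk.
    words.cache()

    // Actions trigger computation and return values to the driver.
    println(s"error words: ${words.count()}")          // action
    words.take(5).foreach(println)                     // action

    sc.stop()
  }
}
```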
08-24-2017
04:50 AM
@Sonu sahi I have checked all the possibilities: disk size, hostname, and network. There is no issue with the disk or the network.
08-23-2017
01:36 PM
Hi, I am using HDP 2.5 on an Azure HDInsight cluster. Yesterday the cluster was working fine and I was able to put data from local disk to HDFS from one client node, but since this morning I am getting an error while putting data to HDFS:
hdfs dfs -put abc /shaz
-put: Self-suppression not permitted
Usage: hadoop fs [generic options] -put [-f] [-p] [-l] <localsrc> ... <dst>
Thanks in advance!
Labels:
Apache Hadoop
05-11-2017
11:30 AM
1 Kudo
@Bala Vignesh N V I have found the exact issue: it was a Kerberos cross-realm problem. I configured cross-realm trust following this article: https://community.hortonworks.com/articles/18686/kerberos-cross-realm-trust-for-distcp.html
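For reference, a minimal sketch of the pieces a cross-realm trust involves; all realm and host names below are placeholders, not from my clusters, and the linked article has the full procedure:

```
# krb5.conf fragment on hosts of both clusters (illustrative)
[realms]
  CLUSTER1.EXAMPLE.COM = {
    kdc = kdc.cluster1.example.com
    admin_server = kdc.cluster1.example.com
  }
  CLUSTER2.EXAMPLE.COM = {
    kdc = kdc.cluster2.example.com
    admin_server = kdc.cluster2.example.com
  }

[domain_realm]
  .cluster1.example.com = CLUSTER1.EXAMPLE.COM
  .cluster2.example.com = CLUSTER2.EXAMPLE.COM
```

In addition, both KDCs need matching cross-realm principals (e.g. krbtgt/CLUSTER2.EXAMPLE.COM@CLUSTER1.EXAMPLE.COM) created with identical passwords and encryption types, and hadoop.security.auth_to_local rules must map principals from the remote realm.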
05-10-2017
10:20 AM
1 Kudo
We have two secure clusters with Kerberos. While running distcp we get the following error:
Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:710)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
... 38 more
Caused by: KrbException: Server not found in Kerberos database (7) - UNKNOWN_SERVER
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:192)
at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:203)
at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:311)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:115)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:449)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:641)
... 41 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:143)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:66)
at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:61)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
04-11-2017
08:15 AM
You can write a job and schedule it from Oozie or Azkaban.
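For example, a minimal Oozie coordinator that runs a workflow once a day might look like the following; the app name, path, and dates are placeholders:

```xml
<coordinator-app name="daily-job" frequency="${coord:days(1)}"
                 start="2017-04-11T00:00Z" end="2018-04-11T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS path to the workflow.xml that defines the actual job -->
      <app-path>hdfs:///user/oozie/apps/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```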
03-14-2017
10:29 AM
Hi, my question is: where can we find the queue priority details, and how can we find out which queue has high priority?
03-09-2017
05:43 PM
1 Kudo
How can we find which queue has high priority in MapReduce? And where can we find the queue priority details?
Labels:
Apache Hadoop
02-01-2017
06:46 AM
@Artem Ervits Is there a docs link I can go through for more details? And will this provide per-job utilization details or overall cluster-level details?
01-31-2017
10:59 AM
1 Kudo
Hi, is there any utility to find the resource utilization of a particular job, such as CPU, memory, per-queue utilization, and time?
Labels:
Apache Hadoop