Member since: 01-12-2016
Posts: 123
Kudos Received: 12
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1500 | 12-12-2016 08:59 AM |
01-15-2019
10:08 AM
Hi all, any input on my clarifications? I faced this scenario one more time.
11-24-2018
11:37 AM
Which one occurs first in the MapReduce flow: shuffling or sorting? To my knowledge, shuffling occurs first and then sorting; correct me if I am wrong. Can anybody explain these two steps? Below is the statement from the Definitive Guide: "MapReduce makes the guarantee that the input to every reducer is sorted by key. The process by which the system performs the sort—and transfers the map outputs to the reducers as inputs—is known as the shuffle."
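To make the two steps concrete, here is a toy sketch, not Hadoop's actual implementation (real Hadoop interleaves the steps: maps pre-sort their spill files and reducers merge while fetching). Conceptually, map outputs are partitioned by key and routed to reducers (the transfer part of the shuffle), and each reducer's input ends up sorted by key before reduce runs:

```python
from collections import defaultdict

def shuffle_and_sort(map_outputs, num_reducers):
    """Toy model of the MapReduce shuffle-and-sort phase.

    Map outputs are partitioned by key and routed to reducers (the
    transfer part of the shuffle); each reducer then sorts its own
    partition by key before the reduce function runs.
    """
    partitions = defaultdict(list)
    for key, value in map_outputs:
        # Shuffle: hash partitioning decides which reducer gets this pair.
        partitions[hash(key) % num_reducers].append((key, value))
    # Sort: every reducer sees its input ordered by key,
    # which is the guarantee the Definitive Guide describes.
    return {r: sorted(pairs) for r, pairs in partitions.items()}

map_outputs = [("b", 2), ("a", 1), ("b", 1), ("a", 3)]
reducer_inputs = shuffle_and_sort(map_outputs, num_reducers=2)
```

All pairs with the same key land on the same reducer, and each reducer's list is sorted, regardless of how the keys hash.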
Labels:
- Apache Hadoop
11-15-2018
08:56 AM
@Aditya Sirna Do you mean that if we are familiar with Python, we can work on Spark? In real-time projects, is Python alone sufficient, or do I need to learn Scala or Java?
11-15-2018
05:47 AM
Could anybody guide me on the learning path for Spark?
I am familiar with Hadoop, Hive, Pig, Sqoop, Oozie, Python and HBase. I do not know much about Java.
Do I need to learn both Java and Scala to start with Spark?
I am completely confused about where to start with Spark.
Labels:
- Apache Spark
10-13-2018
05:13 AM
I have set the number of reducers to 2, but Hive is still executing with 1. Can anybody help with this?
set hive.exec.reducers.max=2;
hive (default)> insert overwrite directory '/input123456'
> select count(*) from partitioned_user;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201810122125_0003, Tracking URL = http://ubuntu:50030/jobdetails.jsp?jobid=job_201810122125_0003
Kill Command = /home/naresh/Work1/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201810122125_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-10-12 21:36:24,774 Stage-1 map = 0%, reduce = 0%
2018-10-12 21:36:32,825 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.12 sec
2018-10-12 21:36:41,919 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 4.12 sec
2018-10-12 21:36:42,926 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.38 sec
MapReduce Total cumulative CPU time: 6 seconds 380 msec
Ended Job = job_201810122125_0003
Moving data to: /input123456
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 6.38 sec HDFS Read: 354134 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 380 msec
OK
_c0
Time taken: 37.199 seconds
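For what it's worth, `hive.exec.reducers.max` is only an upper bound on the reducer count Hive may estimate; it does not request a specific number. A global COUNT(*) with no GROUP BY compiles to a single reducer, because one task has to combine all the partial counts into the final total, which is why the log reports "Number of reduce tasks determined at compile time: 1". A hedged sketch of what does change the count (the `country` grouping column below is hypothetical, not from the original table):

```sql
-- The log itself suggests this knob for a constant reducer count (Hadoop 1.x name):
set mapred.reduce.tasks=2;

-- Even then, a bare COUNT(*) still needs a single final reducer.
-- Multiple reducers only help when rows can be partitioned by key,
-- e.g. with a GROUP BY (the `country` column is a hypothetical example):
insert overwrite directory '/input123456'
select country, count(*) from partitioned_user group by country;
```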
Labels:
- Apache Hive
10-11-2018
02:24 AM
How can I get the list of functions available in a jar file? Let us say I have Piggybank.jar; it contains Reverse, UnixToISO(), etc. Is there any command to get the list of functions available in the jar file, rather than searching Google for it?
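A jar is an ordinary zip archive, and each Pig UDF in piggybank is a class (e.g. Reverse, UnixToISO), so listing the `.class` entries gives you the function names. A minimal sketch in Python (the jar path in the usage comment is a placeholder):

```python
import zipfile

def list_jar_classes(jar_path):
    """Return the fully qualified class names inside a jar.

    A jar file is a zip archive, so the standard zipfile module can
    read its table of contents without any Java tooling installed.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [
            name[:-len(".class")].replace("/", ".")
            for name in jar.namelist()
            if name.endswith(".class") and "$" not in name  # skip inner classes
        ]

# Usage (path is a placeholder for wherever your jar lives):
# for cls in list_jar_classes("piggybank.jar"):
#     print(cls)
```

Equivalently, from the command line, `jar tf piggybank.jar` lists the entries, and `javap -classpath piggybank.jar <classname>` shows the method signatures of a particular class.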
Labels:
- Apache Pig
03-07-2017
08:37 AM
Thanks for the comments. I will definitely do it, starting from this post.
03-03-2017
08:48 AM
Thanks for the input. What is the problem with my relation C? STRSPLIT will generate a tuple as output; here it will consist of two fields in a tuple. (a1:chararray, a1of1:chararray) is also a tuple, since it is enclosed in parentheses and also consists of two fields.
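To illustrate the point (a hedged sketch only, since relation C's original definition is not shown in this thread; the input file and comma delimiter are assumptions): STRSPLIT returns a single tuple, and that tuple can be given a two-field schema with AS:

```pig
-- Hypothetical input: one comma-delimited line per record
A = LOAD 'data.txt' AS (line:chararray);
-- STRSPLIT yields one tuple; name its two fields with a tuple schema
C = FOREACH A GENERATE STRSPLIT(line, ',', 2) AS t:(a1:chararray, a1of1:chararray);
```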