Member since: 01-12-2016
Posts: 123
Kudos Received: 12
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1531 | 12-12-2016 08:59 AM
12-14-2020
02:37 AM
echo "scan 'emp'" | $HBASE_HOME/bin/hbase shell | awk -F'=' '{print $2}' | awk -F ':' '{print $2}'|awk -F ',' '{print $1}'
08-13-2020
09:08 AM
@torafca5 Could you please try downloading the jar from the link below: http://www.congiu.net/hive-json-serde/1.3.8/hdp23/json-serde-1.3.8-jar-with-dependencies.jar Once the jar is downloaded, move it to /usr/hdp/3.0.1.0-187/hive/lib. Please place the jar on all the nodes hosting Hive services. Also, please make sure you are not using LLAP (HiveServer2 Interactive) to connect to Hive; the ADD JAR command does not work with LLAP. Implementing the above recommendation should help overcome this issue.
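In case it helps, a minimal sketch of the same steps as shell commands (the lib path is the one quoted above; adjust it to your HDP version, repeat the copy on every node hosting Hive services, and note that a HiveServer2 restart is typically needed to pick up the new jar):

# download the JSON SerDe jar
wget http://www.congiu.net/hive-json-serde/1.3.8/hdp23/json-serde-1.3.8-jar-with-dependencies.jar
# copy it into the Hive lib directory (run this on every Hive node)
cp json-serde-1.3.8-jar-with-dependencies.jar /usr/hdp/3.0.1.0-187/hive/lib/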
11-26-2018
02:13 PM
2 Kudos
@vamsi valiveti Shuffling is the process of transferring data from the mappers to the reducers; it is necessary because otherwise the reducers would have no input (or no input from every mapper). Shuffling can start even before the map phase has finished, to save some time. That is why you can see a reduce status greater than 0% (but less than 33%) while the map status is not yet 100%.

Sorting saves time for the reducer by helping it distinguish when a new reduce task should start: put simply, a new reduce task starts when the next key in the sorted input data differs from the previous one. Each reduce task takes a list of key-value pairs, but it has to call the reduce() method, which takes a key and a list of values as input, so it has to group values by key. This is easy to do if the input data is pre-sorted (locally) in the map phase and simply merge-sorted in the reduce phase (since the reducers receive data from many mappers). A good source of information for these steps is the Yahoo MapReduce tutorial.

Note that shuffling and sorting are not performed at all if you specify zero reducers (setNumReduceTasks(0)). In that case, the MapReduce job stops at the map phase, and the map phase does not include any kind of sorting (so even the map phase is faster).

Please accept the answer you found most useful.
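As a small illustration of the zero-reducers case, here is a sketch using a Hadoop streaming job run from the shell (the streaming jar path and the input/output paths are assumptions; adjust them to your cluster):

# mapreduce.job.reduces=0 is the command-line equivalent of setNumReduceTasks(0):
# the job becomes map-only, so no shuffle and no sort phase are performed.
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
  -D mapreduce.job.reduces=0 \
  -input /tmp/wordcount/input \
  -output /tmp/wordcount/map-only-output \
  -mapper /bin/cat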
11-15-2018
06:34 PM
@vamsi valiveti I have found Learning Spark to be very helpful for beginners: https://www.oreilly.com/library/view/learning-spark/9781449359034/
10-13-2018
09:00 AM
In a count(*) query, the final aggregation vertex should always be a single task, which fetches the partial counts from all of the mappers and sums them up. If my response helped with your query, please accept the answer; it might help others in the community.
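To see this in the query plan, you can run an EXPLAIN on the query (the table name 'emp' below is only an example):

hive -e "EXPLAIN SELECT COUNT(*) FROM emp;"
# the plan shows the map vertex feeding a single final reducer vertex
# that performs the global count aggregation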
10-11-2018
04:09 AM
1 Kudo
@vamsi valiveti, You need at least the class name to get all the methods available in the class. I can think of two approaches without Google.

Method 1: Run this command (replace {jar-path} with the real jar path):
jar -tf {jar-path} | grep -i class | sed -e 's/\//./g' | sed -e 's/\.class//g' | xargs javap -classpath {jar-path}

Method 2: Open the jar file, check the list of classes, and then list the methods of the class you want:
1) Check the class names using vim (not vi): vim Piggybank.jar
2) Take the class name whose methods you want to list (copy the full path including the package name) and run: javap -classpath {path-to-jar-file} {full-class-name-including-package-name}
Example: javap -classpath example.jar org.apache.hadoop.xyz.Abc (Abc is the class name).

If this helps, please take a moment to log in and accept the answer.
09-22-2018
01:17 PM
Naveen, thanks for the exhaustive answer. I am a newbie so I might be wrong, but after some experiments I tend to believe that the current output.format.string, as it is written in the tutorial, is wrong. Currently it is: "%1$$s %2$$s %3$$s %4$$s %5$$s %6$$s %7$$s %8$$s %9$$s" I believe it should be: "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s" What makes me think so? Just for fun and experimenting, I tried inserting a new row into the intermediate_access_log table in Hive, and the original output.format.string was making the statement fail. After changing the format string, the new row was inserted nicely.
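For reference, a hedged sketch of one way the corrected property could be applied to the existing table (the table name comes from the post above; recreating the table with the fixed property at CREATE time would work just as well):

hive -e 'ALTER TABLE intermediate_access_log SET SERDEPROPERTIES ("output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s");'
# single quotes around the -e argument keep the shell from expanding the $s placeholders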
03-07-2017
08:37 AM
Thanks for the comments. I will definitely do it, starting from this post.
01-17-2017
02:58 PM
1. Only once.
2. Use the decision node property. See https://www.infoq.com/articles/oozieexample/
02-01-2017
02:49 AM
@vamsi valiveti You need to escape the parentheses with double backslashes:
grunt> a = load 'data' using PigStorage(',');
grunt> b = filter a by ($1 matches '{\\(\\)}');
2017-02-01 02:45:07,159 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt> dump b;
Output(s):
Successfully stored 1 records (17 bytes) in: "hdfs://sandbox.hortonworks.com:8020/tmp/temp-1129941617/tmp-1428622787"
2017-02-01 02:49:30,801 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-02-01 02:49:30,811 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-02-01 02:49:30,811 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(Gietz,{()})