About vamsi123

Manoj690 · ‎12-14-2020

echo "scan 'emp'" | $HBASE_HOME/bin/hbase shell | awk -F'=' '{print $2}' | awk -F ':' '{print $2}'|awk -F ',' '{print $1}'
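For anyone who wants to post-process the scan output outside the shell, here is a minimal Python sketch of the same three awk splits. The sample text is only an assumption about what the HBase shell prints for `scan 'emp'` (row key, then `column=family:qualifier, timestamp=..., value=...`), and `column_qualifiers` is a hypothetical helper name, not part of HBase. Unlike the awk pipeline, the sketch skips lines without `column=` and removes duplicates.

```python
def column_qualifiers(scan_output: str) -> list[str]:
    """Extract column qualifiers, mirroring the awk splits on '=', ':' and ','."""
    seen = []
    for line in scan_output.splitlines():
        if "column=" not in line:
            continue  # skip the header, blank lines, and the row-count footer
        after_equals = line.split("=")[1]         # e.g. "personal:name, timestamp"
        after_colon = after_equals.split(":")[1]  # e.g. "name, timestamp"
        qualifier = after_colon.split(",")[0].strip()
        if qualifier not in seen:
            seen.append(qualifier)
    return seen


# Assumed sample of shell output; real output depends on your table.
SAMPLE = """ROW  COLUMN+CELL
 1   column=personal:name, timestamp=1417521848375, value=raju
 1   column=personal:city, timestamp=1417521848375, value=hyderabad
 2   column=personal:name, timestamp=1417521848375, value=ravi
2 row(s) in 0.0200 seconds
"""

print(column_qualifiers(SAMPLE))  # ['name', 'city']
```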

Krishnadevan · ‎08-13-2020

@torafca5 Could you please try downloading the jar from the link below: http://www.congiu.net/hive-json-serde/1.3.8/hdp23/json-serde-1.3.8-jar-with-dependencies.jar Once the jar is downloaded, move it to /usr/hdp/3.0.1.0-187/hive/lib. Please place the jar on all the nodes hosting Hive services. Also, please make sure you are not using LLAP (HiveServer2 Interactive) to connect to Hive, because the add jar command does not work with LLAP. Implementing the above recommendation should help overcome this issue.

jagadeesan · ‎11-26-2018

@vamsi valiveti Shuffling is the process of transferring data from the mappers to the reducers, so it is necessary for the reducers: without it they would have no input (or no input from every mapper). Shuffling can start even before the map phase has finished, to save some time. That's why you can see a reduce status greater than 0% (but less than 33%) when the map status is not yet 100%. Sorting saves time for the reducer by making it easy to tell when a new reduce() call should start: a new call starts whenever the next key in the sorted input data differs from the previous one. Each reduce task receives a list of key-value pairs, but it has to call the reduce() method, which takes a key and a list of values, so it has to group the values by key. That is easy to do if the input data is pre-sorted (locally) in the map phase and simply merge-sorted in the reduce phase (since the reducers get data from many mappers). A great source of information for these steps is the Yahoo tutorial. Note that shuffling and sorting are not performed at all if you specify zero reducers (setNumReduceTasks(0)). In that case the MapReduce job stops at the map phase, and the map phase does not include any kind of sorting (so even the map phase is faster). Please accept the answer you found most useful.
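To make the sort-and-group step concrete, here is a minimal single-process Python sketch. It is not Hadoop code: the function name `shuffle_and_reduce` and the sample data are made up for illustration, and each mapper's output is assumed to be already sorted by key.

```python
from heapq import merge
from itertools import groupby
from operator import itemgetter


def shuffle_and_reduce(mapper_outputs, reduce_fn):
    """Merge pre-sorted mapper outputs, group by key, call reduce_fn per key."""
    # Reduce-side merge sort: every input list is already sorted by key.
    merged = merge(*mapper_outputs, key=itemgetter(0))
    results = []
    # A new group (and so a new reduce_fn call) starts whenever the key changes.
    for key, pairs in groupby(merged, key=itemgetter(0)):
        results.append(reduce_fn(key, [value for _, value in pairs]))
    return results


# Hypothetical word-count style outputs from two mappers, sorted locally.
mapper_a = [("apple", 1), ("cat", 1), ("cat", 1)]
mapper_b = [("apple", 1), ("bat", 1)]

counts = shuffle_and_reduce([mapper_a, mapper_b], lambda key, values: (key, sum(values)))
print(counts)  # [('apple', 2), ('bat', 1), ('cat', 2)]
```

Because the merged stream is sorted, grouping needs only a comparison with the previous key, which is the time saving the post describes.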

yuvapraveen_k · ‎11-15-2018

@vamsi valiveti I have found Learning Spark to be very helpful for beginners. https://www.oreilly.com/library/view/learning-spark/9781449359034/

nramanaiah · ‎10-13-2018

In a count(*) query, the final aggregation vertex should always be a single task, which fetches the counts from all the mappers and sums them up. If my response helped with your query, please accept the answer. It might help others in the community.
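As a rough illustration of why one final task is enough, here is a small Python sketch. It is not Hive or Tez code: the function names and the sample splits are hypothetical, and each list simply stands in for the rows seen by one mapper.

```python
def mapper_count(rows):
    """Each mapper emits one partial count for its own split."""
    return sum(1 for _ in rows)


def final_aggregation(partial_counts):
    """The single final task only has to add up the partial counts."""
    return sum(partial_counts)


# Hypothetical input splits, one per mapper.
splits = [["r1", "r2", "r3"], ["r4"], ["r5", "r6"]]

partials = [mapper_count(split) for split in splits]
total = final_aggregation(partials)
print(partials, total)  # [3, 1, 2] 6
```

The final task receives one number per mapper rather than the rows themselves, so a single task handles it regardless of the requested reducer count.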

asirna · ‎10-11-2018

@vamsi valiveti, You need at least the class name to get all the methods available in a class. I can think of a solution without Google.
Method 1: Run this command (replace {jar-path} with the real jar path):
jar -tf {jar-path} | grep -i class | sed -e 's/\//./g' | sed -e 's/\.class//g' | xargs javap -classpath {jar-path}
Method 2: Open the jar file, check the list of classes, and then list the methods in a class.
1) Check the class names using vim (not vi): vim Piggybank.jar
2) Take the name of the class whose methods you want to list (copy the path including the package name): javap -classpath {path-to-jar-file} {full-class-name-including-package-name}
Example: javap -classpath example.jar org.apache.hadoop.xyz.Abc (Abc is the class name).
If this helps, please take a moment to log in and accept the answer.
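If the jar and sed tools are not at hand, the class-name part of Method 1 can be approximated in Python, since a jar is a zip archive. This is only a sketch: `class_names` is a hypothetical helper, and the in-memory jar with `org.example` entries is made up so the snippet runs without a real file. The resulting names would still have to be passed to javap to list the methods.

```python
import io
import zipfile


def class_names(jar_file) -> list[str]:
    """Return fully qualified class names found in a jar (path or file object)."""
    with zipfile.ZipFile(jar_file) as jar:
        return sorted(
            name[: -len(".class")].replace("/", ".")  # org/x/Abc.class -> org.x.Abc
            for name in jar.namelist()
            if name.endswith(".class")
        )


# Build a tiny in-memory "jar" with made-up entries for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    jar.writestr("org/example/Abc.class", b"")
    jar.writestr("org/example/util/Helper.class", b"")
buffer.seek(0)

print(class_names(buffer))  # ['org.example.Abc', 'org.example.util.Helper']
```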

pkaluski · ‎09-22-2018

Naveen, thanks for the exhaustive answer. I am a newbie, so I might be wrong, but after some experiments I tend to believe that the current output.format.string, as written in the tutorial, is wrong. Currently it is: "%1$$s %2$$s %3$$s %4$$s %5$$s %6$$s %7$$s %8$$s %9$$s" I believe it should be: "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s" What makes me think so? Just for fun and experimenting, I tried inserting a new row into the intermediate_access_log table in Hive. The original output.format.string made the statement fail. After I changed the format string, the new row was inserted without problems.
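To show what the positional placeholders are meant to do, here is a small Python sketch that expands only the `%N$s` form. It is an emulation for illustration, not Hive's SerDe or Java's formatter: `format_positional` and the sample field values are hypothetical. In this sketch the doubled `$$` form is simply not recognised as a placeholder; the real system may instead raise an error, as reported above.

```python
import re

# Matches placeholders such as %1$s or %12$s (argument index, then "s").
PLACEHOLDER = re.compile(r"%(\d+)\$s")


def format_positional(template: str, args: list[str]) -> str:
    """Replace each %N$s with the N-th argument (1-based)."""
    return PLACEHOLDER.sub(lambda m: args[int(m.group(1)) - 1], template)


# Made-up column values standing in for one table row.
fields = ["127.0.0.1", "-", "frank"]

print(format_positional("%1$s %2$s %3$s", fields))     # 127.0.0.1 - frank
print(format_positional("%1$$s %2$$s %3$$s", fields))  # %1$$s %2$$s %3$$s
```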

vamsi123 · ‎03-07-2017

Thanks for the comments. I will definitely do so, starting from this post.

aervits · ‎01-17-2017

1. Only once.
2. Use the decision property: https://www.infoq.com/articles/oozieexample/

aervits · ‎02-01-2017

@vamsi valiveti you need to escape the parentheses with double backslashes:
grunt> a = load 'data' using PigStorage(',');
grunt> b = filter a by ($1 matches '{\\(\\)}');
2017-02-01 02:45:07,159 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt> dump b;
Output:
Output(s): Successfully stored 1 records (17 bytes) in: "hdfs://sandbox.hortonworks.com:8020/tmp/temp-1129941617/tmp-1428622787"
2017-02-01 02:49:30,801 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-02-01 02:49:30,811 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-02-01 02:49:30,811 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(Gietz,{()})
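The same escaping idea can be tried outside Pig with Python's `re` module. This is only a sketch under assumptions: `filter_empty_bags` is a hypothetical helper, the records other than the Gietz line are made up, and the braces are escaped here as well, which Python does not strictly require but which keeps the pattern unambiguous. `fullmatch` is used so the whole field has to match the pattern.

```python
import re

# Literal "{()}": parentheses (and braces) are escaped with backslashes.
EMPTY_BAG = re.compile(r"\{\(\)\}")


def filter_empty_bags(lines):
    """Keep comma-separated records whose second field is exactly {()}."""
    kept = []
    for line in lines:
        fields = line.split(",", 1)
        if len(fields) == 2 and EMPTY_BAG.fullmatch(fields[1]):
            kept.append(tuple(fields))
    return kept


# Only the first record comes from the post; the others are invented.
data = ["Gietz,{()}", "Smith,{(a)}", "Jones,"]

print(filter_empty_bags(data))  # [('Gietz', '{()}')]
```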

Member Since	‎01-12-2016 07:23 AM
Last Visited	‎01-29-2018 02:47 AM
Posts	123
Kudos received	12

Cloudera Community

Re: Pig converting tuple to bag

Re: Hbase complete list of columns in column famil...

Re: Hive and XML Parsing

Re: Map reduce Flow clarification

Re: Spark Learning Clarification

Re: No of Reducers are not working on Hive

Re: List of Functions in a Jar File

Re: Hive SERDEPROPERTIES clarification

Re: Pig Incompatable schema

Re: oozie workflow in Production

Re: Filtering {()} records from a relation