About prklearning

prklearning · ‎09-23-2016

Thank you. However, is there any way to calculate the appropriate number of reducers for a particular operation? I observed that increasing the number of reducers can also bring down performance in some cases.

prklearning · ‎09-20-2016

I have a Pig job that has 6 joins (5 small tables and 1 large table) in it. The number of map jobs spawned for the job is 49, and the number of reducers is 13. The job has been running for more than 12 hours. Is there any formula for setting the properties below?

set default_parallel
set mapred.max.split.size
set mapred.min.split.size
set mapred.task.timeout
set mapred.task.ping.timeout
set mapred.map.child.java.opts -Xmx4096m;
set mapred.reduce.child.java.opts -Xmx4096m;
set pig.exec.reducers.bytes.per.reducer

I got the above leads for making it faster. However, I am not able to calculate the exact figures to use.
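For reference, here is a minimal Python sketch of how Pig's default input-size-based estimate picks a reducer count when neither a PARALLEL clause nor default_parallel is set. It assumes the documented defaults of 1,000,000,000 bytes for pig.exec.reducers.bytes.per.reducer and 999 for pig.exec.reducers.max. The function name and the 50 GB input size are hypothetical, chosen only for illustration. It is a rough starting point, not a tuning formula.

```python
def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=1_000_000_000,  # assumed default of pig.exec.reducers.bytes.per.reducer
                      max_reducers=999):                # assumed default of pig.exec.reducers.max
    """Approximate Pig's input-size-based reducer estimate (hypothetical helper)."""
    if bytes_per_reducer <= 0:
        raise ValueError("bytes_per_reducer must be positive")
    # Ceiling division: one reducer per started chunk of bytes_per_reducer.
    reducers = -(-total_input_bytes // bytes_per_reducer)
    # Clamp to at least 1 and at most max_reducers.
    return max(1, min(max_reducers, reducers))


if __name__ == "__main__":
    # Hypothetical 50 GB of input with the assumed defaults.
    print(estimate_reducers(50 * 1_000_000_000))
```

Lowering pig.exec.reducers.bytes.per.reducer raises the estimate, and setting default_parallel or PARALLEL overrides it entirely. Since more reducers can also slow a job down, any figure derived this way is best checked against actual run times.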

prklearning · ‎08-03-2016

Thanks Michael Young. I am not able to overwrite into a Hive table using HCatStorer from Pig. However, I learned that HCatalog can't overwrite into Hive's existing partition. It would be nice to have Pig write directly into Hive's existing partition. Is there any patch ,,, or i

prklearning · ‎08-02-2016

Hi, I have a requirement where I need to overwrite (or append) the data to an existing partition in Hive from Pig. However, while storing using HCatStorer, the job failed with this error:

Job commit failed: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:264)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:262)
    ... 5 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in hdfs://sandbox.hortonworks.com:8020/input/externalHiveData/part=1990, duplicate publish not possible.
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:609)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:565)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:928)
    at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:249)
    ... 10 more

Hive tables:

CREATE TABLE testing.emp_tab_int(
    empid string,
    name string,
    year int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS textfile;

load data local inpath '/somepath' overwrite into table testing.emp_tab_int;

CREATE TABLE testing.emp_tab_part_int(
    empid string,
    name string,
    year int)
PARTITIONED BY (part int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS textfile;

INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION(part)
SELECT empid, name, year, year from testing.emp_tab_int;

Pig script:

A = load '/input/incr_dat' USING PigStorage(',') as (empid: chararray, name: chararray, year: int);
B = foreach A generate $0.., $2 as part;
Store B into 'testing.emp_tab_part_int' using org.apache.hive.hcatalog.pig.HCatStorer(); -- error is thrown here

hadoop fs -cat /input/incr_dat
em204,ajay,2005
em205,sikha,1990
em206,satya,1991
em207,krishna,1991
em2000,hello am new data,1990
em2001,hello am too new data ,1990
em20080,hello this is new data,2050

prklearning · ‎04-19-2016

Hi All, I am getting this error:

cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.vertica.hadoop.VerticaOutputFormat not found
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.vertica.hadoop.VerticaOutputFormat not found

However, I have added the third-party jar to /usr/hdp/2.3.4.0-***/hadoop/lib and have also used the -libjar option while running the command. Can anyone let me know where to set the Hadoop classpath correctly? Also, if we are copying the third-party jars into the Hadoop source folder, there are many folders within it, such as hadoop-hdfs, hadoop-mapreduce, hadoop-httpfs and hadoop-yarn. Do we copy the third-party jars there too?

Last Visited	‎09-08-2017 12:11 PM

Member Since	‎03-16-2016 10:15 AM
Posts	8
Kudos received	6

Cloudera Community

Re: Fine tune the Pig Job

Fine tune the Pig Job

Re: How to append or overwrite the existin partiti...

How to append or overwrite the existin partition i...

how to add third party jars for a mapreduce progra...