Member since 03-16-2016 | Posts: 8 | Kudos received: 6 | Solutions: 0
09-23-2016 06:08 AM
Thank you. However, is there any way to calculate the appropriate number of reducers for a particular operation? I observed that increasing the number of reducers can also degrade performance in some cases.
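As a rough starting point: when neither default_parallel nor a PARALLEL clause is set, Pig's documented default is to estimate the reducer count from input size. It uses about one reducer per pig.exec.reducers.bytes.per.reducer bytes of input (default 1,000,000,000), capped at pig.exec.reducers.max (default 999). The Python sketch below mirrors that heuristic, assuming those defaults (verify them for your Pig version); `estimate_reducers` is a hypothetical helper, not a Pig API. Because the heuristic only looks at input bytes, more reducers is not automatically faster: each extra reducer adds task start-up cost and another output file, and skewed join keys can leave one reducer doing most of the work whatever the count.

```python
# Defaults as documented for Pig's input-size reducer estimation;
# check them against your Pig version.
DEFAULT_BYTES_PER_REDUCER = 1_000_000_000  # pig.exec.reducers.bytes.per.reducer
DEFAULT_MAX_REDUCERS = 999                 # pig.exec.reducers.max


def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int = DEFAULT_BYTES_PER_REDUCER,
                      max_reducers: int = DEFAULT_MAX_REDUCERS) -> int:
    """One reducer per bytes_per_reducer of input (rounded up),
    at least 1 and at most max_reducers. Hypothetical helper."""
    if total_input_bytes < 0 or bytes_per_reducer <= 0 or max_reducers <= 0:
        raise ValueError("sizes must be non-negative and limits positive")
    wanted = -(-total_input_bytes // bytes_per_reducer)  # integer ceiling
    return max(1, min(max_reducers, wanted))


if __name__ == "__main__":
    # Hypothetical 12.5 GB of input with the default 1 GB per reducer.
    print(estimate_reducers(12_500_000_000))  # 13
```

Halving pig.exec.reducers.bytes.per.reducer roughly doubles the estimate, while setting default_parallel bypasses the estimate entirely.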
09-20-2016 10:57 AM (3 kudos)
I have a Pig job with 6 joins (5 small tables and 1 large table). The job spawns 49 map tasks and 13 reducers, and it has been running for more than 12 hours. Is there any formula for choosing values for the following settings?
set default_parallel
set mapred.max.split.size
set mapred.min.split.size
set mapred.task.timeout
set mapred.task.ping.timeout
set mapred.map.child.java.opts -Xmx4096m;
set mapred.reduce.child.java.opts -Xmx4096m;
set pig.exec.reducers.bytes.per.reducer
I found these as leads for making the job faster, but I am not able to work out the exact values to use.
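On the split-size settings: Hadoop's FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)). Here mapred.min.split.size and mapred.max.split.size are the older names of mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. Lowering the max split size below the block size therefore yields more, smaller map tasks, and raising the min split size above it yields fewer, larger ones. The Python sketch below reproduces that arithmetic, including the 10% "slop" FileInputFormat allows on the last split. The helper names are hypothetical, and it models a single splittable, non-empty file only. Pig may also combine small splits (pig.splitCombination, pig.maxCombinedSplitSize), so the real map count can be lower.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # FileInputFormat lets the last split be up to 10% oversized


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """max(minSize, min(maxSize, blockSize)), as in FileInputFormat."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_length: int, split_size: int) -> int:
    """Splits produced for one splittable, non-empty file."""
    remaining = file_length
    splits = 0
    # Keep cutting full-size splits while more than 1.1 splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # final (possibly oversized) split
    return splits


if __name__ == "__main__":
    block = 128 * MB      # hypothetical HDFS block size
    no_limit = 2**63 - 1  # default max split size is Long.MAX_VALUE
    for min_size, max_size in [(1, no_limit), (1, 64 * MB), (256 * MB, no_limit)]:
        size = compute_split_size(block, min_size, max_size)
        print(size // MB, "MB splits ->", count_splits(1024 * MB, size),
              "map tasks for a 1 GB file")
```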
Labels: Apache Hadoop, Apache Pig
08-03-2016 07:13 AM
Thanks, Michael Young. I am not able to overwrite a Hive table using HCatStorer from Pig, and I have learned that HCatalog can't overwrite an existing Hive partition. It would be nice to have Pig write directly into an existing Hive partition. Is there any patch, or i ...
08-02-2016 05:56 PM
Hi, I have a requirement to overwrite (or append) data in an existing Hive partition from Pig. However, when storing with HCatStorer, the job failed with the following error:
Job commit failed: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:264)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:262)
... 5 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in hdfs://sandbox.hortonworks.com:8020/input/externalHiveData/part=1990, duplicate publish not possible.
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:609)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:565)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:928)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:249)
... 10 more

CREATE TABLE testing.emp_tab_int(
empid string,
name string,
year int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS textfile;

load data local inpath '/somepath' overwrite into table testing.emp_tab_int;

CREATE TABLE testing.emp_tab_part_int(
empid string,
name string,
year int)
PARTITIONED BY (part int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS textfile;

INSERT OVERWRITE TABLE testing.emp_tab_part_int PARTITION(part) SELECT empid,name,year,year from testing.emp_tab_int;
A = load '/input/incr_dat' USING PigStorage(',') as (empid: chararray, name: chararray, year: int);
B = foreach A generate $0.., $2 as part;
Store B into 'testing.emp_tab_part_int' using org.apache.hive.hcatalog.pig.HCatStorer(); -- error is thrown here

hadoop fs -cat /input/incr_dat
em204,ajay,2005
em205,sikha,1990
em206,satya,1991
em207,krishna,1991
em2000,hello am new data,1990
em2001,hello am too new data ,1990
em20080,hello this is new data,2050
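HCatStorer will not publish into a partition that already exists (HCatException 2002 in the trace above), so a common workaround for overwrite semantics is to drop the affected partitions in Hive before running the Pig STORE. The Python sketch below assumes the comma-delimited layout of /input/incr_dat, where the third field (year) becomes `part`. It collects the distinct partition values and emits `ALTER TABLE ... DROP IF EXISTS PARTITION` statements; both helper names are hypothetical. It also assumes a managed table, where dropping a partition deletes its directory. For an external table the files stay in place and must be removed separately, or HCatStorer will still report that data already exists. Dropping replaces the old rows, so this does not give append semantics.

```python
from typing import Iterable, List


def partitions_touched(rows_text: str, part_field_index: int = 2) -> List[int]:
    """Distinct partition values in comma-delimited text.

    Splits on ',' like PigStorage(',') (no quoting). In the post's script the
    third field (year) is copied into the `part` column. Hypothetical helper.
    """
    values = set()
    for line in rows_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(",")
        values.add(int(fields[part_field_index].strip()))
    return sorted(values)


def drop_partition_statements(table: str, part_col: str,
                              values: Iterable[int]) -> List[str]:
    """HiveQL to drop each partition before re-storing it with HCatStorer."""
    return [
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({part_col}={v});"
        for v in values
    ]


if __name__ == "__main__":
    incr_dat = """em204,ajay,2005
em205,sikha,1990
em206,satya,1991
em207,krishna,1991
em2000,hello am new data,1990
em2001,hello am too new data ,1990
em20080,hello this is new data,2050"""
    for stmt in drop_partition_statements(
            "testing.emp_tab_part_int", "part", partitions_touched(incr_dat)):
        print(stmt)
```

The generated statements would be run in Hive (for example through the Hive CLI or Beeline) before the Pig script.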
04-19-2016 08:14 AM (1 kudo)
Hi all, I am getting this error:
cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.vertica.hadoop.VerticaOutputFormat not found
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.vertica.hadoop.VerticaOutputFormat not found
However, I have added the third-party JAR to /usr/hdp/2.3.4.0-***/hadoop/lib and have also used the -libjar option when running the command. Can anyone tell me where to set the Hadoop classpath correctly? Also, if we copy third-party JARs into the Hadoop folder, there are many subfolders such as hadoop-hdfs, hadoop-mapreduce, hadoop-httpfs and hadoop-yarn; do we need to copy the JARs into those as well?
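On the classpath question: copying a JAR into a local lib directory on one machine does not, by itself, guarantee that the JAR reaches the task containers YARN launches across the cluster. The usual pattern is HADOOP_CLASSPATH for the client JVM plus the generic option -libjars (note the trailing "s") to ship the JARs with the job. -libjars is only honoured when the driver parses generic options (for example via ToolRunner), and it must come after the main class and before the application's own arguments. The Python sketch below only assembles the environment and argument list. `build_hadoop_jar_command` is a hypothetical helper, and the JAR paths and driver class in the usage are hypothetical placeholders.

```python
from typing import Dict, List, Sequence, Tuple


def build_hadoop_jar_command(
    app_jar: str,
    main_class: str,
    extra_jars: Sequence[str],
    app_args: Sequence[str],
    base_env: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (env, argv) for `hadoop jar` with third-party JARs on both the
    client classpath and the job classpath. Hypothetical helper."""
    env = dict(base_env)  # copy so the caller's environment is not mutated
    parts = list(extra_jars)
    existing = env.get("HADOOP_CLASSPATH")
    if existing:
        parts.append(existing)  # keep anything already configured
    if parts:
        # Client-side JVM classpath: colon-separated on Linux.
        env["HADOOP_CLASSPATH"] = ":".join(parts)

    argv = ["hadoop", "jar", app_jar, main_class]
    if extra_jars:
        # Generic option parsed by GenericOptionsParser/ToolRunner:
        # comma-separated, placed before the application's own arguments.
        argv += ["-libjars", ",".join(extra_jars)]
    argv += list(app_args)
    return env, argv


if __name__ == "__main__":
    import os
    import shlex

    env, argv = build_hadoop_jar_command(
        "my-job.jar",                      # hypothetical application JAR
        "com.example.VerticaExportJob",    # hypothetical driver class
        ["/opt/jars/vertica-hadoop.jar"],  # hypothetical connector path
        ["/input/data", "output_table"],
        dict(os.environ),
    )
    print("HADOOP_CLASSPATH=" + env["HADOOP_CLASSPATH"])
    print(shlex.join(argv))
```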
Labels: Apache Hadoop