Created 04-27-2016 06:49 PM
Hi:
For ETL, which is the best non-streaming technology for writing a script? I am currently using Pig, but I don't know whether it is a good fit for batch processing.
Any suggestion is appreciated.
In the future I plan to do it with Spark.
Thanks
Created 04-27-2016 08:47 PM
@Roberto Sancho Pig is a good tool to use for ETL and data warehouse type of processing on your data. It provides an abstraction layer for the underlying processing engine (MR or Tez). You can use Tez as the execution engine to speed up processing. This Pig Tutorial has additional information.
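To make that concrete, a minimal Pig ETL skeleton on Tez might look like the following sketch. The input path, field names, and output path are illustrative assumptions, not taken from the original script:

```pig
-- Run on the Tez execution engine instead of MapReduce
SET exectype tez;

-- Extract: load raw CSV events (hypothetical path and schema)
raw = LOAD '/data/raw/events' USING PigStorage(',')
      AS (user_id:chararray, event:chararray, amount:double);

-- Transform: keep only purchase events and aggregate per user
purchases = FILTER raw BY event == 'purchase';
by_user   = GROUP purchases BY user_id;
totals    = FOREACH by_user GENERATE
                group AS user_id,
                SUM(purchases.amount) AS total_amount;

-- Load: write the result back to HDFS
STORE totals INTO '/data/etl/user_totals' USING PigStorage(',');
```

Alternatively, instead of setting exectype inside the script, the same script can be launched on Tez from the command line with pig -x tez script.pig.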
Created 04-27-2016 08:49 PM
Hi @Roberto Sancho,
You can use Hive or Pig for ETL. In HDP, Hive and Pig run on Tez rather than on MapReduce, which gives you much better performance.
You can use Spark too, as you stated.
Created 04-28-2016 08:34 AM
Hi:
I executed Pig with Hive and I received this error. Could anyone please help me?
Container exited with a non-zero exit code 255 ]]],
Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:2 killedTasks:62, Vertex vertex_1461739406783_0064_1_00 [scope-67] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=scope-94, vertexId=vertex_1461739406783_0064_1_04, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461739406783_0064_1_04 [scope-94] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-92, vertexId=vertex_1461739406783_0064_1_03, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461739406783_0064_1_03 [scope-92] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-82, vertexId=vertex_1461739406783_0064_1_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1461739406783_0064_1_02 [scope-82] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=scope-73, vertexId=vertex_1461739406783_0064_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:45, Vertex vertex_1461739406783_0064_1_01 [scope-73] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:4
    at org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:193)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:198)
    at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:195)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Created 04-28-2016 08:54 AM
Can you share more info on the submitted query, and possibly the application logs? What version of Hive are you using?
Created 04-28-2016 09:16 AM
Hi:
I am running a Pig script with this parameter:
SET exectype tez;
Created 04-28-2016 09:26 AM
Hi:
The error in the log:
2016-04-28 11:18:22,146 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - The url to track the Tez Session: http://lnxbig05.cajarural.gcr:8088/proxy/application_1461739406783_0071/
2016-04-28 11:18:26,770 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitting DAG PigLatin:mario.pig-0_scope-0
2016-04-28 11:18:26,770 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitting dag to TezSession, sessionName=PigLatin:mario.pig, applicationId=application_1461739406783_0071, dagName=PigLatin:mario.pig-0_scope-0, callerContext={ context=PIG, callerType=PIG_SCRIPT_ID, callerId=PIG-mario.pig-b0f06568-7cba-4e19-aab2-128bc7afb536 }
2016-04-28 11:18:26,778 [PigTezLauncher-0] INFO org.apache.tez.dag.api.DAG - Inferring parallelism for vertex: scope-92 to be 5 from 1-1 connection with vertex scope-73
2016-04-28 11:18:27,618 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitted dag to TezSession, sessionName=PigLatin:mario.pig, applicationId=application_1461739406783_0071, dagName=PigLatin:mario.pig-0_scope-0
2016-04-28 11:18:27,772 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://lnxbig06.cajarural.gcr:8188/ws/v1/timeline/
2016-04-28 11:18:27,772 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at lnxbig05.cajarural.gcr/10.1.246.19:8050
2016-04-28 11:18:27,782 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitted DAG PigLatin:mario.pig-0_scope-0. Application id: application_1461739406783_0071
2016-04-28 11:18:28,007 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - HadoopJobId: job_1461739406783_0071
2016-04-28 11:18:28,783 [Timer-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=RUNNING, progress=TotalTasks: 102 Succeeded: 0 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=null

Log Type: syslog
Log Upload Time: Thu Apr 28 11:24:42 +0200 2016
Log Length: 1552
2016-04-28 11:24:41,246 [ERROR] [main] |app.DAGAppMaster|: Error starting DAGAppMaster
java.lang.NoSuchMethodError: com.google.common.collect.MapMaker.keyEquivalence(Lcom/google/common/base/Equivalence;)Lcom/google/common/collect/MapMaker;
    at com.google.common.collect.Interners$WeakInterner.<init>(Interners.java:68)
    at com.google.common.collect.Interners$WeakInterner.<init>(Interners.java:66)
    at com.google.common.collect.Interners.newWeakInterner(Interners.java:63)
    at org.apache.hadoop.util.StringInterner.<clinit>(StringInterner.java:49)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2600)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:1232)
    at org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider.getRecordFactory(RecordFactoryProvider.java:49)
    at org.apache.hadoop.yarn.util.Records.<clinit>(Records.java:32)
    at org.apache.hadoop.yarn.api.records.ApplicationId.newInstance(ApplicationId.java:49)
    at org.apache.hadoop.yarn.api.records.ContainerId.toApplicationAttemptId(ContainerId.java:249)
    at org.apache.hadoop.yarn.api.records.ContainerId.toApplicationAttemptId(ContainerId.java:244)
    at org.apache.hadoop.yarn.api.records.ContainerId.fromString(ContainerId.java:223)
    at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:179)
    at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:2040)
Created 04-28-2016 10:07 AM
Hi:
I have resolved the problem by removing one jar from my script. google-collections-1.0.jar is a predecessor of Guava, and its older com.google.common.collect.MapMaker class (which lacks the keyEquivalence method in the stack trace) was shadowing the Guava classes Hadoop needs, causing the NoSuchMethodError. Commenting out that register line fixed it:
--register /usr/lib/piggybank/google-collections-1.0.jar;
register /usr/lib/piggybank/piggybank.jar;
register /usr/lib/piggybank/elephant-bird-core-4.1.jar;
register /usr/lib/piggybank/elephant-bird-pig-4.1.jar;
register /usr/lib/piggybank/elephant-bird-hadoop-compat-4.1.jar;
register /usr/lib/piggybank/json-simple.jar;
register /usr/lib/piggybank/libfb303.jar;
SET mapreduce.input.fileinputformat.split.minsize 107520;
SET mapreduce.input.fileinputformat.split.maxsize 276480;
SET default_parallel 5;
SET exectype tez;
Created 04-29-2016 05:27 AM
Hi:
Finally, it works with this:
register /usr/lib/piggybank/piggybank.jar;
register /usr/lib/piggybank/elephant-bird-core-4.1.jar;
register /usr/lib/piggybank/elephant-bird-pig-4.1.jar;
register /usr/lib/piggybank/elephant-bird-hadoop-compat-4.1.jar;
register /usr/lib/piggybank/json-simple.jar;
register /usr/lib/piggybank/libfb303.jar;
SET mapreduce.input.fileinputformat.split.minsize 107520;
SET mapreduce.input.fileinputformat.split.maxsize 276480;
SET exectype tez;