Member since 08-05-2016 · 25 Posts · 5 Kudos Received · 0 Solutions
04-20-2017 05:57 PM
Hello all, I think I found an error in the Oozie documentation. This line https://github.com/apache/oozie/blob/7c404ad0ea4c61e90e8c86015de25ef196168c29/docs/src/site/twiki/AG_Install.twiki#L893 says: "4. A Loadbalancer, Virtual IP, or Round-Robin DNS. This is used to provide a single entry-point for users and for callbacks from the JobTracker/ResourceManager." Isn't that an error? Actually, it is the ApplicationMaster that makes the callback, right? I hope you can help me. Thank you very much.
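(For context, my understanding is that this single entry point ends up in oozie.base.url in oozie-site.xml, which is also the URL the callbacks use; a sketch with a placeholder hostname:)
<property>
  <name>oozie.base.url</name>
  <!-- placeholder: replace with your load balancer / VIP address -->
  <value>http://oozie-vip.example.com:11000/oozie</value>
</property>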
04-20-2017 09:39 AM
So here is the question. I have 2 master servers, each with all the services installed (ResourceManager, YARN, HDFS, Oozie, etc.), all in HA. My question is: is this a correct architecture? Mostly because the ResourceManager contacts the Oozie server through the virtual IP of our Oozie servers, so it will very likely redirect the communication to itself. Is something wrong with this architecture? Do we need dedicated servers just for the Oozie servers? Is it a mistake to mix all the services on the same server?
02-10-2017 10:12 AM
@Kuldeep Kulkarni thanks for answering me.
But isn't it weird that my Oozie server number 1 uses the load balancer to run a job, and the load balancer then gives back the IP of Oozie server number 1 (assuming Oozie server number 2 is down or saturated)? Isn't the load balancer or VIP meant to give the client a single point of access so it doesn't need to check which server is alive? I would like to know the disadvantages of this configuration; for example, do I risk having two scheduled jobs running at the same time?
02-10-2017 09:59 AM
@Venkata Sudheer Kumar M it seems so, yes.
02-10-2017 09:46 AM
1 Kudo
Hello all, I have two kinds of jobs: jobs that run periodically every night, and jobs that come on demand from the client. I implemented HA for Oozie, so I have my VIP, and I have two machines each running one Oozie server. My question is: does it make sense to configure OOZIE_URL for the scheduled jobs on each master as localhost:11000/oozie, and use the VIP load-balancer:11000/oozie only for the on-demand jobs coming from the client?
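To make it concrete, this is the setup I have in mind (hostnames are placeholders; OOZIE_URL is just the default endpoint the oozie CLI picks up):
# on each master, for the nightly/scheduled submissions
export OOZIE_URL=http://localhost:11000/oozie
# on client machines, for on-demand submissions
export OOZIE_URL=http://oozie-vip.example.com:11000/oozie
oozie jobs -len 10   # quick check that the endpoint answers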
02-09-2017 02:15 PM
@Venkata Sudheer Kumar M Hi, actually no, this is the log of my ResourceManager. I ran a job via Oozie.
02-09-2017 01:55 PM
Hi, when I tried to execute a job via Oozie I got this error:
Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
After taking a look at the YARN log I found this permission problem:
2017-02-09 13:14:40,746 WARN resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(285)) - USER=scf IP=x.x.x.x OPERATION=getServiceState TARGET=AdminService RESULT=FAILURE DESCRIPTION=Unauthorized user PERMISSIONS=
2017-02-09 13:14:40,747 INFO ipc.Server (Server.java:run(2158)) - IPC Server handler 0 on 8033, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from x.x.x.x:37178 Call#0 Retry#0
org.apache.hadoop.security.AccessControlException: User pns doesn't have permission to call 'getServiceState'
at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:191)
at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:157)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAccess(AdminService.java:229)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.getServiceStatus(AdminService.java:350)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.getServiceStatus(HAServiceProtocolServerSideTranslatorPB.java:131)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4464)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2133)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2131)
Can anyone be so kind as to tell me how to fix the permissions for the user scf?
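For context, my understanding is that this check is governed by the yarn.admin.acl property in yarn-site.xml; here is a sketch of the change I am considering (assuming the ACL really is the problem and that scf is the user that needs access), followed by a ResourceManager restart:
<property>
  <name>yarn.admin.acl</name>
  <!-- comma-separated users (groups after a space); sketch only, not a verified fix -->
  <value>yarn,scf</value>
</property>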
02-07-2017 01:55 PM
Thanks @Predrag Minovic, one last question: is the variable OOZIE_BASE_URL, set in oozie-site or oozie-env, global for all scheduled jobs, or must each job define its own OOZIE_BASE_URL?
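(To make the question concrete, this is the kind of setting I mean, from oozie-env; the URL is a placeholder:)
export OOZIE_BASE_URL=http://oozie-vip.example.com:11000/oozie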
02-07-2017 09:18 AM
Hi @Predrag Minovic, thank you for the answer. Can you please tell me if there is anything wrong with not changing oozie_base_url for the scheduled jobs, for example leaving it as localhost:11000/oozie? I think that for scheduled jobs, in terms of HA, it makes no difference whether or not I use the load balancer as oozie_base_url: since the two Oozie servers share the same information, one of them will execute the job either way. The load balancer only becomes relevant for on-demand jobs. Am I on the right track? Thanks @Kuldeep Kulkarni for the very relevant info; I am planning to use Kerberos.
02-06-2017 01:25 PM
Hi, I have some jobs that must run every night; these jobs are scheduled in Oozie. The moment I enable Oozie HA, the Oozie servers will share this scheduling. My question is: will these scheduled jobs be executed twice? (I suppose the answer is no, but why?) And should I change the value of the variable oozie_base_url for these jobs to localhost or to my load balancer address?
02-06-2017 01:05 PM
@Laurent Edel what about the jobs that are planned to run? If every Oozie server knows there are jobs that must be executed, how do they decide which one executes the job? And if that is done by locking through ZooKeeper, does the job submission pass through the load balancer / round-robin DNS / VIP?
01-31-2017 03:46 PM
Thank you, actually I was missing the package oozie-2-3-2-0-2950-server. I paste the procedure:
apt-get install oozie-2-3-2-0-2950-server
and then
hadoop fs -put /usr/hdp/current/oozie-server/libtools/oozie-tools-4.2.0.2.3.2.0-2950.jar /user/oozie/share/lib/lib_xxxxxxx/oozie/
01-31-2017 10:15 AM
Hello guys, I am following the documentation to set up Oozie HA: http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Ambari_Users_Guide/content/_adding_an_oozie_server_component.html After installing and changing the configuration, when I restart Oozie I get this error:
Error: Could not find or load main class org.apache.oozie.tools.OozieDBCLI
Did I forget something?
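In case it helps anyone else, this is the quick check I did to see whether the jar containing that class was present (paths assume HDP defaults on my system):
ls /usr/hdp/current/oozie-server/libtools/oozie-tools-*.jar   # OozieDBCLI ships in the oozie-tools jar
/usr/hdp/current/oozie-server/bin/ooziedb.sh version          # the script that loads it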
10-11-2016 08:10 AM
Yes, it was me who created the ticket.
09-29-2016 09:21 AM
@zyang @Constantin Stanca I created the ticket https://issues.apache.org/jira/browse/TEZ-3451 Thank you
09-23-2016 02:58 PM
2 Kudos
Hello, I have a table in Cassandra, and I use the hive-cassandra driver to run selects over it. This is the table:
CREATE TABLE table1 (
campaign_id text,
sid text,
name text,
ts timestamp,
PRIMARY KEY (campaign_id, sid)
) WITH CLUSTERING ORDER BY (sid ASC);
And I have only 3 partitions. When I query my table using Hive like this:
hive -e "select count(*) from table1;"
I get this error:
Status: Failed
Vertex failed, vertexName=Map 1,
vertexId=vertex_1474275943985_0179_1_00, diagnostics=[Task failed,
taskId=task_1474275943985_0179_1_00_000001, diagnostics=[TaskAttempt 0
failed, info=[Error: Failure while running
task:java.lang.RuntimeException:
org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416
actual length: 9223372036854775711
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 12416 actual length: 9223372036854775711
at org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
at org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
... 14 more
So far I understand that in readFields we are getting more data than we expect. But considering the size of the table, I don't think the data is the problem. @Constantin Stanca has helped me trying to find the problem; I am relaunching the subject 🙂 Another thing to add: if I do select * it works perfectly fine with Tez 🙂. With the MR engine, both select count(*) and select * work fine as well. We are using the Hortonworks 2.3.2 release.
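For completeness, the workaround we use in the meantime is forcing the MR engine per query, e.g.:
hive -e "set hive.execution.engine=mr; select count(*) from table1;"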
09-07-2016 01:33 PM
@Constantin Stanca I tried this query using Tez:
hive -e "select max(length(usrlvl)) from pns_fr_bench.core;"
And I got the same error, but this time with a negative number:
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 7996349 actual length: -128
The column usrlvl has type int in Cassandra. For the other columns I get the same error but with a positive size, even for integers. Quite interesting. Again, if I use engine = mr I get the result without errors. Running the same query with MR:
hive -e "select max(length(usrlvl)) from pns_fr_bench.core;" -hiveconf hive.execution.engine=mr
I get this:
Total MapReduce CPU Time Spent: 0 days 6 hours 27 minutes 19 seconds 700 msec
OK
2
So this field is not that big. I tried another field whose data type is text in Cassandra using MR and got a result of 12. But with Tez we keep hitting the same error.
09-06-2016 09:36 AM
@Constantin Stanca "Were you able to run any query against that table, e.g. SELECT anything from TableName LIMIT 1?" Yes, it works:
hive> select id from pns_fr_bench.core limit 1;
OK
ID-SPP-100-6qN1vlZ4cMaobNIkrscKaB2lBiDkCYWmSqewVNe7PZA
fetched: 1 row(s)
hive>
"I expected to see TABLEPROPERTIES and SERDEPROPERTIES in the table definition." Sorry 🙂
serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.cassandra.serde.cql.CqlSerDe, parameters:{cassandra.cf.name=core, cassandra.columns.mapping=ise,birthd,civility,creatd,dispname,email,fname,haddr,hcity,hcountry,hfax,hphone,hzip,ia,iainst,interest,lname,marketpref,modd,mphone,mrgdate,mrgstat,mrgusrt,msisdn,mtac,ndip,ndrtc,oaddr,ocity,ocountry,ofax,oidval,om,ophone,ozip,poid,preflang,rollbend,usrlvl, serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{cassandra.ks.password=JV<w7.JzNF3i, cassandra.cf.name=core, EXTERNAL=TRUE, transient_lastDdlTime=1448372689, cassandra.ks.username=run_all_read_bench, storage_handler=org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler, cassandra.ks.name=pns_fr_bench}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
# Detailed Table Information
Database: pns_fr_bench
Owner: pns
CreateTime: Tue Nov 24 14:44:49 CET 2015
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://master01.net:8020/app/data/pns_fr_bench.db/core
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
cassandra.cf.name core
cassandra.ks.name pns_fr_bench
cassandra.ks.username read_user
storage_handler org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler
transient_lastDdlTime 1448372689
09-05-2016 01:04 PM
Hello @Constantin Stanca. So we use a Cassandra-Hive driver; this is the Cassandra table:
CREATE TABLE pns_fr_bench.core (
ise text PRIMARY KEY,
birthd text,
civility text,
creatd timestamp
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor'}
AND dclocal_read_repair_chance = 0.05
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.01
AND speculative_retry = '99.0PERCENTILE';
"Is it partitioned, bucketed, ORC or text SerDe?" It is a Cassandra table, with a replication factor of 1.
"What was your expectation of count?" 38 million. We counted it using MR, not Tez.
"Could you recreate a copy of the table and store it as ORC, then execute count?" I will try and come back to you.
The hive.fetch.task.conversion.threshold is set to 1073741824. If this is a bug, is there a JIRA ticket anywhere that describes the error?
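For reference, this is roughly what I plan to try for the ORC copy (the target table name is my own choice):
CREATE TABLE pns_fr_bench.core_orc STORED AS ORC AS SELECT * FROM pns_fr_bench.core;
SELECT count(*) FROM pns_fr_bench.core_orc;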
09-01-2016 02:55 PM
When we talk about input length, we are talking about bytes then, no?
09-01-2016 02:53 PM
Hi @Constantin Stanca, thanks for the response. I had some time off and now I am back on this issue. The hive.fetch.task.conversion.threshold is set to 1073741824. Actually we are reading from Cassandra using Hive, so what we have is a Cassandra table. We are using HDP 2.3.2. What I understand from your comments is that we are handing Tez a huge amount of data (the actual length) that it cannot handle? Could it be a configuration error? For the moment we are using MR, which runs without problems, but we want to use Tez. The result of this count using MR is about 38 million.
09-01-2016 01:47 PM
1 Kudo
Hello, I am trying to understand these two attributes and how they work. Please tell me if I am wrong: we should set mapred.min.split.size to suit our needs if we are reading HDFS files, but if we are reading from Cassandra through Hive we should set cassandra.input.split.size instead? To give a little context, we have a Cassandra cluster and we run our queries against it from Hive. We are experiencing some OOM problems with the Java heap, and we think we must modify one or both of these attributes. Thank you.
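To make it concrete, something like this is what I have in mind (the values are illustrative, not recommendations):
set mapred.min.split.size=134217728;   -- HDFS-backed input: minimum split size in bytes (~128 MB here)
set cassandra.input.split.size=65536;  -- Cassandra input: my understanding is this counts rows per split, not bytes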
08-10-2016 09:37 AM
1 Kudo
Hello, I got this error when I ran a select count(*) in Hive:
Vertex failed, vertexName=Map 1, vertexId=vertex_1468250226607_1351_1_00, diagnostics=[Task failed, taskId=task_1468250226607_1351_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Expected length: 8417166 actual length: 9223372036854775675
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Expected length: 8417166 actual length: 9223372036854775675
at org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:128)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:177)
at org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:136)
at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:643)
at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:621)
at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:145)
at org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:109)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:390)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:128)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:147)
... 14 more
I don't understand at all what Tez means by "Expected length" and "actual length". We use Hive 1.2.1, YARN 2.7.1, Hadoop 2.7.1.2.3.2.0-2950.
08-05-2016 09:25 AM
I want to know, given that YARN is dispatching all the jobs to different machines, whether there is a way to know which machines are acting as ApplicationMaster at any given moment.
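For reference, the kind of check I mean (the application ID below is a placeholder):
yarn application -list                                         # running applications and their tracking URLs
yarn applicationattempt -list application_1468250226607_0001   # shows the AM container for an app, from which I can find the host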
Tags: Hadoop Core, YARN
08-05-2016 09:21 AM
I got this error on my NodeManager:
FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[timeline,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.metrics2.sink.timeline.cache.TimelineMetricsCache$TimelineMetricHolder.put(TimelineMetricsCache.java:118)
at org.apache.hadoop.metrics2.sink.timeline.cache.TimelineMetricsCache.putTimelineMetric(TimelineMetricsCache.java:154)
at org.apache.hadoop.metrics2.sink.timeline.cache.TimelineMetricsCache.putTimelineMetric(TimelineMetricsCache.java:177)
at org.apache.hadoop.metrics2.sink.timeline.HadoopTimelineMetricsSink.putMetrics(HadoopTimelineMetricsSink.java:193)
at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.consume(MetricsSinkAdapter.java:186)
at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.consume(MetricsSinkAdapter.java:43)
at org.apache.hadoop.metrics2.impl.SinkQueue.consumeAll(SinkQueue.java:87)
at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter.publishMetricsFromQueue(MetricsSinkAdapter.java:134)
at org.apache.hadoop.metrics2.impl.MetricsSinkAdapter$1.run(MetricsSinkAdapter.java:88)
Is it related to this ticket? https://issues.apache.org/jira/browse/AMBARI-15100
What is worse, my whole cluster went down because this NodeManager went down. Is that normal? It seems all the jobs went through this node.
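For the record, the interim mitigation I am considering is raising the NodeManager heap in yarn-env (a sketch; the value is a guess, not a recommendation):
export YARN_NODEMANAGER_HEAPSIZE=2048   # MB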