Member since 04-29-2016
Posts: 10
Kudos Received: 1
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5093 | 07-18-2016 03:47 PM
12-13-2018 04:11 PM
I think I found a way to figure out whether there is any lag between job submission and resource allocation. I am not sure it gives the exact figure, but it puts me closer to what I am trying to find. It is a 2-step process:

1. Get the application attempt ID from the application timeline server:

```
curl -s -X GET --negotiate <timelineserver:port>/ws/v1/applicationhistory/apps/<app_id>
```

e.g.

```
curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0005
{"appId":"application_1544447997568_0005","currentAppAttemptId":"appattempt_1544447997568_0005_000001","user":"root","name":"Spark Pi","queue":"default","type":"SPARK","host":"172.18.0.2","rpcPort":0,"appState":"FINISHED","runningContainers":0,"progress":100.0,"diagnosticsInfo":"","originalTrackingUrl":"sandbox-hdp.hortonworks.com:18081/history/application_1544447997568_0005/1","trackingUrl":"http://sandbox-hdp.hortonworks.com:8088/proxy/application_1544447997568_0005/","finalAppStatus":"SUCCEEDED","submittedTime":1544540416958,"startedTime":1544540416958,"finishedTime":1544541742388,"elapsedTime":1325430,"priority":0,"allocatedCpuVcores":0,"allocatedMemoryMB":0,"unmanagedApplication":false,"appNodeLabelExpression":"<Not set>","amNodeLabelExpression":"<DEFAULT_PARTITION>"}
```

You get 2 main things from here:
- AppAttemptId = appattempt_1544447997568_0005_000001
- Job started time = 1544540416958

2. Make a second API call to get the timelines of the containers launched under that app attempt ID:

```
curl -s -X GET --negotiate <timelineserver:port>/ws/v1/applicationhistory/apps/<app_id>/appattempts/<appattemptid>/containers
```

Taking the above example:

```
curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0005/appattempts/appattempt_1544447997568_0005_000001/containers
{"container":[{"containerId":"container_1544447997568_0005_01_000001","allocatedMB":1000,"allocatedVCores":1,"assignedNodeId":"sandbox-hdp.hortonworks.com:45454","priority":0,"startedTime":1544541158902,"finishedTime":1544541742922,"elapsedTime":584020,"diagnosticsInfo":"","logUrl":"http://sandbox-hdp.hortonworks.com:8188/applicationhistory/logs/sandbox-hdp.hortonworks.com:45454/container_1544447997568_0005_01_000001/container_1544447997568_0005_01_000001/root","containerExitStatus":0,"containerState":"COMPLETE","nodeHttpAddress":"http://sandbox-hdp.hortonworks.com:8042","nodeId":"sandbox-hdp.hortonworks.com:45454"}]}
```

This call shows the container startedTime, which is 1544541158902. I can use it to deduce the wait time between when the job started (1544540416958) and when its first container launched (1544541158902): the difference between these two timestamps is 741944 ms, i.e. about 742 seconds, or roughly 12.4 minutes. Checking the YARN application logs of the same application ID, the timeline roughly matches:
```
18/12/11 15:00:19 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)
18/12/11 15:00:20 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)
18/12/11 15:00:21 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)
18/12/11 15:00:22 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)
18/12/11 15:00:23 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)
....................................
18/12/11 15:13:12 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox-hdp.hortonworks.com, PROXY_URI_BASES -> http://sandbox-hdp.hortonworks.com:8088/proxy/application_1544447997568_0005), /proxy/application_1544447997568_0005
18/12/11 15:13:12 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/12/11 15:13:13 INFO Client: Application report for application_1544447997568_0005 (state: ACCEPTED)
18/12/11 15:13:14 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
18/12/11 15:13:14 INFO Client: Application report for application_1544447997568_0005 (state: RUNNING)
```
Hopefully I am on the right path, and this might help someone who is looking for the same thing.
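To sanity-check the numbers, the wait time can be computed directly from the two startedTime values quoted above, and converting the epoch-millisecond timestamps to UTC lines up roughly with the 15:00 / 15:13 entries in the log excerpt. A small offline sketch, using only the values quoted in this post:

```python
from datetime import datetime, timezone

# Timestamps copied from the two Timeline Server responses quoted above.
app_started_ms = 1544540416958        # app "startedTime"
container_started_ms = 1544541158902  # first container "startedTime"

wait_ms = container_started_ms - app_started_ms
print(wait_ms, "ms ~=", round(wait_ms / 60000, 1), "min")  # 741944 ms ~= 12.4 min

def utc(ms):
    """Render an epoch-millisecond timestamp as a UTC wall-clock time."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%H:%M:%S")

# These line up roughly with the ACCEPTED (15:00:xx) and
# RUNNING (15:13:xx) lines in the application log excerpt.
print(utc(app_started_ms), "->", utc(container_started_ms))
```

This is only arithmetic on the API responses, so it works the same against any application ID once the two curl calls above have been made.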
12-13-2018 02:40 PM
I even tried the YARN application timeline server to get something from there, but I am not sure why it is not showing the correct details:

```
curl -s -X GET --negotiate http://sandbox-hdp.hortonworks.com:8188/ws/v1/applicationhistory/apps/application_1544447997568_0003
{
  "appId": "application_1544447997568_0003",
  "currentAppAttemptId": "appattempt_1544447997568_0003_-000001",
  "user": "root",
  "name": "Spark Pi",
  "queue": "default",
  "type": "SPARK",
  "host": "N/A",
  "rpcPort": -1,
  "appState": "RUNNING",
  "runningContainers": 0,
  "progress": 0.0,
  "diagnosticsInfo": "",
  "originalTrackingUrl": "N/A",
  "trackingUrl": "N/A",
  "finalAppStatus": "UNDEFINED",
  "submittedTime": 1544539343612,
  "startedTime": 1544539343612,
  "finishedTime": 0,
  "elapsedTime": 388755,
  "priority": 0,
  "allocatedCpuVcores": 0,
  "allocatedMemoryMB": 0,
  "unmanagedApplication": false,
  "appNodeLabelExpression": "<Not set>",
  "amNodeLabelExpression": "<DEFAULT_PARTITION>"
.......
```
When I grep the YARN logs for the same app, I can see the job waited for a while before it went into the RUNNING state: there is a lag of at least 10 seconds before the state changed from ACCEPTED to RUNNING.
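One heuristic for spotting that the Timeline Server entry has not been filled in yet is to look for the placeholder values visible in the response above (host "N/A", rpcPort -1, finishedTime 0 while appState is RUNNING). A minimal sketch, using the field values copied from the JSON in this post; the helper name is my own, not a YARN API:

```python
# Field values copied from the Timeline Server response quoted above.
app = {
    "appState": "RUNNING",
    "host": "N/A",
    "rpcPort": -1,
    "progress": 0.0,
    "finishedTime": 0,
}

def am_registered(app):
    """Heuristic: the ApplicationMaster has reported back once the
    host and RPC port hold real values instead of placeholders."""
    return app["host"] != "N/A" and app["rpcPort"] >= 0

print(am_registered(app))  # the entry above still shows placeholders
```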
12-13-2018 02:13 PM
I tried the above link and it leads me to the MapReduce history server. Based on the documentation, I don't think I would be able to track YARN applications through it: it expects job IDs, not application IDs. It looks more like a legacy MapReduce endpoint that would work only for MapReduce applications. This is the link I am referring to here. Am I missing something?
12-13-2018 12:16 PM
Hi, I am trying to find a job's wait time, i.e. the time between when it got accepted and when it received any resources for execution. I know that we can get this info from the YARN application logs, but I was wondering if the same can be answered through any YARN metrics available via the YARN REST APIs? Any pointers would be helpful.
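One place to start is the ResourceManager REST API, which exposes per-application timing fields (though not container launch times). A sketch, assuming the sandbox RM host and the usual default port 8088; the stubbed response below mirrors the shape of the RM's JSON and is not real cluster output:

```python
import json
from urllib.request import urlopen

# Assumed ResourceManager address; adjust for your cluster and security setup.
RM = "http://sandbox-hdp.hortonworks.com:8088"

def fetch_app(app_id):
    """GET /ws/v1/cluster/apps/<app_id> from the ResourceManager."""
    with urlopen(f"{RM}/ws/v1/cluster/apps/{app_id}") as r:
        return json.load(r)

def app_times(payload):
    """Pull the timing fields out of a cluster-apps response."""
    app = payload["app"]
    return app["startedTime"], app["elapsedTime"]

# Offline usage with a stubbed response shaped like the RM's JSON:
stub = {"app": {"id": "application_1544447997568_0003",
                "state": "RUNNING",
                "startedTime": 1544539343612,
                "elapsedTime": 388755}}
print(app_times(stub))
```

Note this endpoint only tells you when the app started and how long it has run; to see when the first container actually launched (the other half of the wait-time question), the Timeline Server's applicationhistory API is needed.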
Labels:
- Apache YARN
07-19-2016 11:16 AM
Can you confirm that your MySQL query is not returning duplicates, i.e. "select * from emp_add where city='sec-bad'"?
07-18-2016 03:47 PM
Solved the mystery. The issue was that "producer.close()" was missing: main was exiting before Kafka finished writing. Strange that this is not required in the older API. Anyway, for reference, from the javadocs: "The producer consists of a pool of buffer space that holds records that haven't yet been transmitted to the server, as well as a background I/O thread that is responsible for turning these records into requests and transmitting them to the cluster. Failure to close the producer after use will leak these resources."
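The same effect shows up with any buffered writer. The sketch below is only an analogy (a plain buffered file in Python, not the Kafka client): writes sit in a userspace buffer until close() flushes them, so exiting without closing can silently drop data, just as the unclosed producer dropped records.

```python
import os
import tempfile

# Analogy for the missing producer.close(): a block-buffered file writer.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

f = open(path, "w")            # default: block-buffered
f.write("x" * 10)              # small write, stays in the userspace buffer
before = os.path.getsize(path) # nothing has reached the file yet
f.close()                      # close() flushes the buffer to disk
after = os.path.getsize(path)

print(before, after)           # the data only lands on close
```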
07-18-2016 12:45 PM
Hi Ankit, I tried the settings you gave; it is not working. The only way I can get consistency is by using the old API or by making my calls synchronous, i.e. producer.send(record).get(). Not sure what is happening or why it works for one and not the other. I am using Kafka 0.9.0.1, which I hope should be stable.
07-14-2016 12:19 PM
Thanks Ankit for the reply, but as I mentioned in my original post, the behavior is the same when using a callback as well. Also, this is random: if I run my code 10 times in a row, I see this happening once or twice, randomly, with the new Producer API. If I do the same thing with the old API, it runs fine every time. Not sure what the issue is; I am not finding anything weird in the logs either.
07-11-2016 10:16 AM
1 Kudo
Hi, I am trying to use the new Kafka producer (org.apache.kafka.clients.producer.KafkaProducer) on Kafka 0.9.0.2.4 on the HDP 2.4.0.0-169 sandbox for a very basic producer/consumer scenario: I send some data to a Kafka topic with 1 partition and replication factor 1, and read it with kafka-console-consumer. However, when I read the data back in the consumer, I see some of it getting lost. It is random: it does not happen every time, and the percentage of data lost is also random. I have done the same thing with the old API (kafka.javaapi.producer.Producer), which works fine on the same sandbox every time. I have tested the same behavior on Docker with similar results: sometimes the consumer gets all the records and sometimes it doesn't. It is not consistent. I am not sure if anyone else has faced a similar situation; it would be nice to know if I am missing something. PS: I have tried using a callback as well, with the same behavior. The consumer I am running is:

```
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test_input --from-beginning
```

Here is the code I am testing:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object AsyncKafkaProducerWithoutCallBack {
  def main(args: Array[String]) = {
    val properties = new Properties()
    properties.put("bootstrap.servers", "sandbox:6667")
    properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    properties.put("acks", "all")
    // how many times to retry when a produce request fails?
    properties.put("retries", "3")
    properties.put("linger.ms", "5")

    val producer = new KafkaProducer[String, String](properties)
    val startTime = System.currentTimeMillis()
    for (i <- 1 to 2000) {
      val kafkaRecord = new ProducerRecord[String, String]("test_input", s"$i")
      val start = System.currentTimeMillis()
      println(producer.send(kafkaRecord))
      println(s"Time taken for - $i -- " + (System.currentTimeMillis() - start) + "ms")
    }
    println("Total Time taken -- " + (System.currentTimeMillis() - startTime))
  }
}
```
Labels:
- Apache Kafka
04-29-2016 02:10 PM
To add to Joe's response, one more thing to note here is that the foreach() action doesn't return anything. So if you want to run transformations over your RDDs on the executors without returning anything to the driver, you can invoke them via the foreach() action, which is not normally possible with other actions apart from the saveAsXXX() actions. -Prav