I have migrated all the resources I need from HDFS, but now I would also like to migrate the YARN application logs (the ones we can access in the ResourceManager UI or with the yarn logs command).
I tried using distcp to copy the logs from the old cluster to the new one, into the directory configured in yarn-site.xml. However, the migrated logs appear neither in the ResourceManager UI nor in the output of yarn logs -applicationId <app-id>. Is there any way to make the logs from the old cluster available in the new cluster and accessible via the ResourceManager or the yarn logs command?
In this reply, I will focus on how the YARN RM stores data about historical applications, which can be accessed via the RM Web UI.
The RM keeps data about the applications in its state store.
It can be LeveldbRMStateStore, FileSystemRMStateStore or ZKRMStateStore.
We recommend using ZKRMStateStore (this is also what we use for YARN HA), because it is the more robust implementation. For example, in an RM HA setup you can migrate the standby RM while the active RM is still running, and the state store stays intact.
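To see which state store an RM is actually configured with (for example, to compare the old and the new cluster), you can inspect yarn-site.xml. Below is a minimal Python sketch, assuming you can read the file locally. It only looks at yarn.resourcemanager.recovery.enabled, yarn.resourcemanager.store.class and yarn.resourcemanager.ha.enabled. The helper names are hypothetical, and the FileSystemRMStateStore fallback used when no store class is set is an assumption based on the Apache Hadoop yarn-default.xml, so check the defaults of your distribution.

```python
import xml.etree.ElementTree as ET

# Package of the RM state-store implementations shipped with Apache Hadoop.
STORE_PREFIX = "org.apache.hadoop.yarn.server.resourcemanager.recovery."
KNOWN_STORES = {
    STORE_PREFIX + "ZKRMStateStore": "ZooKeeper",
    STORE_PREFIX + "FileSystemRMStateStore": "FileSystem",
    STORE_PREFIX + "LeveldbRMStateStore": "LevelDB",
}


def read_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def describe_rm_state_store(xml_text):
    """Summarize the RM recovery / state-store settings found in yarn-site.xml."""
    props = read_properties(xml_text)
    # Assumption: fall back to FileSystemRMStateStore when the class is unset,
    # mirroring the Apache Hadoop yarn-default.xml; distributions may differ.
    store_class = props.get(
        "yarn.resourcemanager.store.class", STORE_PREFIX + "FileSystemRMStateStore"
    )
    return {
        # The configured store is only used when RM recovery is enabled.
        "recovery_enabled": props.get(
            "yarn.resourcemanager.recovery.enabled", "false").lower() == "true",
        "ha_enabled": props.get(
            "yarn.resourcemanager.ha.enabled", "false").lower() == "true",
        "store_class": store_class,
        "store_kind": KNOWN_STORES.get(store_class, "unknown"),
    }
```

Usage, with a hypothetical config path: describe_rm_state_store(open("/etc/hadoop/conf/yarn-site.xml").read()). Parsing the XML directly keeps the sketch dependency-free; it does not see values injected by a cluster manager at runtime, so treat the result as a first check only.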
Because the RM Web UI reads the application data from the state store, it is independent of whether the YARN application logs are present or not.
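One way to see this is the RM's Cluster Applications REST API (/ws/v1/cluster/apps on the RM web port, 8088 by default), which lists the applications the RM itself knows about. Copying aggregated log files with distcp does not add entries there. The Python sketch below is an illustration, assuming the RM web endpoint is reachable without authentication (a Kerberos-secured RM would need SPNEGO, which this sketch does not handle); the function names and the RM host are hypothetical.

```python
import json
import urllib.request


def list_app_ids(payload):
    """Extract application IDs from a /ws/v1/cluster/apps JSON response."""
    # An empty result may come back as {"apps": null}, so treat a missing or
    # null "apps" object, and a missing "app" list, as "no applications".
    apps = payload.get("apps") or {}
    return [app["id"] for app in apps.get("app") or []]


def fetch_apps(rm_url, timeout=10.0):
    """GET the Cluster Applications API of a ResourceManager (no auth handling)."""
    url = rm_url.rstrip("/") + "/ws/v1/cluster/apps"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Usage, with a hypothetical host: list_app_ids(fetch_apps("http://rm.example.com:8088")). Running it against the new cluster before and after the distcp copy should return the same list, because the copied log directories are not part of the RM's application data.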
What are your exact migration steps? Do I understand correctly that you are upgrading your cluster to CDP, or do you need to move services to a new cluster?