Created on 08-24-2025 11:15 PM - edited 08-25-2025 12:41 AM
We are trying to optimize cluster provisioning by preloading parcels into our VM images. The goal is to have parcels already available on each host before they are added to the cluster, so we can avoid re-distribution during host onboarding.
Here is what we tested and observed:
We baked parcels into the VM image under /opt/cloudera/parcels/.
When we add a host created from this image into the cluster, Cloudera Manager still triggers parcel distribution.
During this process, CM deletes the preloaded parcels on the new host and redistributes them again.
We have tested the following without success:
Setting Auto Distribution = false
Verifying parcel existence manually
Creating/updating the .distributed_parcels file with the correct versions
Our conclusion so far is that Cloudera Manager does not trust preloaded parcels, but instead enforces consistency against its internal parcel state database, which causes redistribution and deletion even if parcels are already present.
Our question:
Is there any supported way to have parcels preloaded on all hosts (via a baked image or other method) so that Cloudera Manager recognizes them as already distributed?
If not, what is the recommended approach instead?
Created 08-25-2025 12:29 AM
Hello @ishashrestha
Thank you for reaching out to the Cloudera community.
Are you adding the hosts through the Cloudera Wizard? Can you try adding the host to Cloudera Manager manually, that is, just install the packages and configure config.ini?
Usually, the parcels will be downloaded based on the configuration in /var/lib/cloudera-scm-agent/active_parcels.json
Created 08-25-2025 11:51 PM
@upadhyayk04 I am adding a host through the API (using Ansible), but even when I add it manually, it starts distributing.
My current /var/lib/cloudera-scm-agent/active_parcels.json contains the required parcel:
{"CDH": "7.1.9-1.cdh7.1.9.p0.44702451"}
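A quick way to confirm that a baked image actually contains what this file lists is to compare the two. The following is only a minimal sketch: it assumes that each extracted parcel lives in a directory named <PRODUCT>-<version> under the parcel root, and the function name missing_parcel_dirs is hypothetical, not part of any Cloudera tool.

```python
import json
from pathlib import Path


def missing_parcel_dirs(active_parcels_file, parcels_root):
    """Return names of expected parcel directories that are absent.

    Assumption: an extracted parcel lives at
    <parcels_root>/<PRODUCT>-<version>.
    """
    active = json.loads(Path(active_parcels_file).read_text())
    missing = []
    for product, version in active.items():
        expected = Path(parcels_root) / f"{product}-{version}"
        if not expected.is_dir():
            missing.append(expected.name)
    return missing
```

On a host built from the image, this could be called with "/var/lib/cloudera-scm-agent/active_parcels.json" and "/opt/cloudera/parcels"; an empty list means every listed parcel directory is present. It says nothing about whether Cloudera Manager will accept those directories as distributed.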
Created 08-26-2025 12:18 AM
Thanks for the update, @ishashrestha. What happens if you remove the parcel entry from the file, that is, keep it empty? Ideally, it would pick it up automatically and try to distribute the parcels. The way to skip that would be to add the host to Cloudera Manager but not to the cluster, since adding it to the cluster will try to enable the activated parcels.
Created 08-26-2025 01:14 AM
Thank you for the response, @upadhyayk04. To clarify, I have a baked image that already includes Cloudera Manager, the agent, and the parcel. The hosts are already added in Cloudera Manager, and after that, they are added to my cluster. However, my requirement is for them to be added as parcel-ready. Having Cloudera Manager redistribute the parcel again defeats the purpose. I was wondering if there’s a solution to this.
Created 08-29-2025 10:01 AM
@ishashrestha Cloudera Manager tracks parcel state centrally in its DB (AVAILABLE_REMOTELY, DOWNLOADED, DISTRIBUTED, ACTIVATED, etc.). So even if the parcel bits are already present on the host, the agent will redistribute them. Managing parcels is also the agent's responsibility; it performs many background tasks during the parcel lifecycle.
In the distribution phase, it checks the .sha file (from the repo) against the .parcel file in the cache. This ensures there is no corruption or mismatch, and it then extracts the parcel (a .parcel file is a tarball).
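To illustrate the kind of check described here, the sketch below hashes a .parcel file and compares the result with the first token of the accompanying .sha file. It is an approximation, not the agent's actual code: the use of SHA-1 as the default and the function name parcel_matches_sha are assumptions.

```python
import hashlib
from pathlib import Path


def parcel_matches_sha(parcel_path, sha_path, algorithm="sha1", chunk_size=1 << 20):
    """Return True if the parcel's digest equals the hash stored in sha_path.

    Assumption: the .sha file holds a hex digest as its first
    whitespace-separated token, and SHA-1 is the algorithm in use.
    """
    digest = hashlib.new(algorithm)
    with open(parcel_path, "rb") as fh:
        # Read in chunks so multi-gigabyte parcels do not fill memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    fields = Path(sha_path).read_text().split()
    if not fields:
        return False
    return digest.hexdigest() == fields[0].lower()
```

Running this against the parcel baked into the image and the .sha file from the repository would at least show whether the preloaded file is byte-identical to what Cloudera Manager expects to distribute.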
In the activation phase, it creates or updates symlinks in the parcel directory, in /etc/alternatives, and in /var/lib/alternatives. It also creates the service users needed for service installation.
If any of these don’t line up, CM will re-trigger distribution/activation even if the bits are there.
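One of the things that has to line up is the symlink in the parcel directory. As a rough sketch, the check below assumes that activation leaves a symlink named after the product (for example CDH) pointing at the <PRODUCT>-<version> directory; that layout and the function name active_symlink_ok are assumptions, and the /etc/alternatives entries and service users are not covered.

```python
import os


def active_symlink_ok(parcels_root, product, version):
    """Return True if <parcels_root>/<product> is a symlink to <product>-<version>.

    Assumption: activation creates a product-named symlink in the
    parcel directory that points at the versioned parcel directory.
    """
    link = os.path.join(parcels_root, product)
    if not os.path.islink(link):
        return False
    # The link may be relative or absolute; compare only the final component.
    target = os.path.basename(os.path.normpath(os.readlink(link)))
    return target == f"{product}-{version}"
```

For the parcel discussed in this thread, the call would be active_symlink_ok("/opt/cloudera/parcels", "CDH", "7.1.9-1.cdh7.1.9.p0.44702451"). A True result only shows that this one symlink matches; it does not change the state Cloudera Manager records in its database.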