Occasionally an error occurs while a load balanced subscription is being created. In such a case, the overall CreateUpdateDistSubCmd (not the exact name) fails partway through, leaving the DistSub record in the database with at least some of the collectors attached to it. Later, the daily MaintainDistSubCmd runs. This command compares the database with what it finds on each assigned collector, checks groups and their members, and repairs anything that does not match.
All DistSub commands execute on the master collector for the domain. This is the first collector in the domain unless the role was moved; please don't confuse it with the manager. The master collector submits commands to the other collectors and waits for them to complete. If any of those subcommands fails or does not complete within a reasonable time, the master collector fails the DistSub command.
MaintainDistSubCmd runs at least once a day. It compares the DistSub records with the assigned subscriptions and with the groups in AD. As a result, something that the original CreateUpdateDistSubCmd did not finish can be completed when MaintainDistSubCmd runs at 1 AM. For example, the superset group of all forwarders for the DistSub may have just been created in AD but not yet replicated to the DC used by the master collector because of a reboot or another issue. Alternatively, the group may already be there while the member additions to it have not yet replicated. In the latter case, the DistSub is created but no forwarders are assigned to each subscription's group. Later, when MaintainDistSubCmd runs after replication has completed, the members are present in AD and get distributed.
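The repair behavior can be modeled as a reconciliation between the superset group and the per-subscription groups. The Python sketch below is hypothetical: `maintain_dist_sub` is an illustrative name, it models only the AD group side (not the comparison with WEC subscriptions on each collector), and the even-split assignment of forwarders is an assumption rather than Supercharger's documented distribution algorithm. It shows why a run before replication changes nothing, while a run after replication distributes the members.

```python
from typing import Dict, Set


def maintain_dist_sub(
    superset_members: Set[str], assignments: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Hypothetical reconciliation pass for one DistSub.

    superset_members: forwarders currently visible in the superset group on the DC.
    assignments: collector name -> forwarders in that collector's subscription group.
    Returns the repaired assignments; the input is not modified.
    """
    # Drop forwarders that are no longer members of the superset group.
    result = {c: set(m) & superset_members for c, m in assignments.items()}
    if not result:
        return result  # no collectors assigned, so nothing to distribute to
    assigned = set().union(*result.values())
    # Give each unassigned forwarder to the collector with the fewest members
    # (assumed even-split policy; ties go to the alphabetically first collector).
    for forwarder in sorted(superset_members - assigned):
        target = min(sorted(result), key=lambda c: len(result[c]))
        result[target].add(forwarder)
    return result


if __name__ == "__main__":
    before = {"a": set(), "b": set()}
    # 1 AM run before the group or its members have replicated: nothing to do.
    print(maintain_dist_sub(set(), before))
    # Next run after replication: the members are found and distributed.
    print(maintain_dist_sub({"f1", "f2", "f3"}, before))
```

With two collectors `a` and `b`, the second call yields `{"a": {"f1", "f3"}, "b": {"f2"}}`. Because the pass only compares current state with desired state, running it again after any partial failure is safe.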
If you are having issues with a Load Balanced Subscription because the DistSub command failed, here are some common causes to look for:
- One of the collectors is down
- AD problems: OU permissions, the OU itself, or replication
Here are some steps to help:
- From the master collector's logs, determine whether the CreateUpdateDistSubCmd (not the exact name):
  - Completed successfully
  - Hung (unlikely); in this case the log shows neither success nor failure
  - Failed, and if so, for what reason:
    - An exception within the logic of CreateUpdateDistSubCmd itself
    - A failure of a command it submitted to another collector
    - A timeout while waiting for another collector to execute the commands it was sent
- If the failure was caused by a command submitted to another collector, or by a timeout waiting on one, get the logs from the other collectors associated with the Distributed Subscription. One of them should show a failed CreateUpdateSubCmd; note the error it reports.
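The choice of which logs to examine next can be summarized as a small decision function. This is a hypothetical Python helper for illustration only: the `FailureReason` labels and `logs_to_check` are invented names, and nothing here parses real Supercharger logs.

```python
from enum import Enum


class FailureReason(Enum):
    """Hypothetical labels for the three failure causes listed above."""
    INTERNAL_EXCEPTION = 1  # exception inside CreateUpdateDistSubCmd itself
    SUBCOMMAND_FAILED = 2   # a command submitted to another collector failed
    SUBCOMMAND_TIMEOUT = 3  # timed out waiting on another collector


def logs_to_check(reason: FailureReason) -> str:
    """Return where to look next for a failed CreateUpdateDistSubCmd."""
    if reason is FailureReason.INTERNAL_EXCEPTION:
        # The cause is recorded on the master collector itself.
        return "master collector log"
    # Both remote cases: the underlying error is recorded on another collector.
    return "other collectors' logs (look for the failed CreateUpdateSubCmd)"


if __name__ == "__main__":
    for r in FailureReason:
        print(r.name, "->", logs_to_check(r))
```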
When CreateUpdateDistSubCmd fails, it does not roll back what it has accomplished so far (for good reason). MaintainDistSubCmd will fix whatever did not complete, with one exception: the addition or removal of a collector. Examples of what this means:
- You issue a CreateUpdateDistSub to create a NEW subscription with three collectors assigned: a, b, and c. Everything goes well except that collector c is down, or its local CreateUpdateSubscription command fails for some other reason. What do you end up with? The command fails, but you have a DistributedSubscription with two collectors assigned and forwarders distributed between collectors a and b. To solve this, edit the DistSub and add collector c.
- You issue a CreateUpdateDistSub to add or remove one collector, but that collector is down or the command fails for some other reason. No change is made to the DistSub; simply resubmit the command.
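The difference between the two examples can be sketched as follows. In this hypothetical Python model (`DistSub`, `create_dist_sub`, and `change_collectors` are illustrative names, not Supercharger APIs), creating a new DistSub keeps whichever collectors succeeded, while a later add or remove is applied only if the affected collector succeeds. MaintainDistSubCmd is not modeled here because, as described above, it does not change the collector list.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, Iterable, List, Tuple


@dataclass
class DistSub:
    """Hypothetical in-memory stand-in for a DistSub record in the DB."""
    collectors: List[str] = field(default_factory=list)


def create_dist_sub(requested: Iterable[str], down: AbstractSet[str]) -> Tuple[DistSub, List[str]]:
    """Create a NEW DistSub; collectors that fail are left out and nothing is rolled back."""
    sub = DistSub()
    failed: List[str] = []
    for collector in requested:
        if collector in down:
            failed.append(collector)  # local CreateUpdateSubscription failed
        else:
            sub.collectors.append(collector)
    # The overall command reports failure if any collector failed.
    return sub, failed


def change_collectors(
    sub: DistSub,
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
    down: AbstractSet[str] = frozenset(),
) -> bool:
    """Add or remove collectors; apply nothing if an affected collector fails."""
    to_add, to_remove = list(add), set(remove)
    if (set(to_add) | to_remove) & down:
        return False  # no change to the DistSub: resubmit the command later
    kept = [c for c in sub.collectors if c not in to_remove]
    sub.collectors = kept + [c for c in to_add if c not in kept]
    return True


if __name__ == "__main__":
    sub, failed = create_dist_sub(["a", "b", "c"], down={"c"})
    print(sub.collectors, failed)  # DistSub keeps a and b; c is reported as failed
    change_collectors(sub, add=["c"])  # the fix: edit the DistSub and add c
    print(sub.collectors)
```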
Because both AD and load balanced subscriptions are distributed, failures like these can occur. However, Supercharger is designed so that nothing becomes corrupted or inconsistent: MaintainDistSubCmd always ensures that the actual WEC subscriptions match the database. If adding or removing a collector failed earlier, you can simply make that change again.