Hi everyone,
I'm here due to the recent incident from 08/07 Dec. 2021 which caused problems for automation rules.
In my case specifically, it caused one of my rules to not trigger on time. The rule was scheduled to create a task on 08.12.2021 12:00AM (midnight). The task was not created, and when I checked when the next rule execution was to be expected - it was for next week - 15.12.2021 (it's a weekly Wednesday rule). In addition, there was no audit log showing whether the rule ran or failed.
I checked all of my automation rules because of this, panicking that this might in the future cause us to miss tasks. While I was doing that, the problematic rule triggered at 11AM and created the task...
So here's my question: was this trigger run manually by JIRA as a correction for the problems caused by the outage, or is there a cron service that existed before this outage and makes sure all rules are run after an outage?
Basically, I want to know whether there is a contingency in place to run these failed rules in case another outage like this happens, so I can sleep at night :D
Thanks in advance!
-Nev
Hi Nev,
My experience is that when Automation comes back up, it does indeed catch up on all of the automations that should have fired. So, in effect, they are just delayed but not lost.
Hi John,
Thanks, that sounds great! Can we be certain it will work every time, though?
The reason I'm asking is that a single missed task creation would be business critical for us, as we rely heavily on automated quarterly/annual/biennial task creation to keep things like business continuity, backups etc. going.
Are there any cases where this catching up of Automation has failed before? What's the logic behind it, and is there anything we can do to decrease the chances of failure (for example, setting the trigger time to something within business hours rather than midnight)?
Appreciate the swift reply!
Same here - they are critical for us. There have been multiple outages over the months and they have always ended up running when things got cleared up.
I can’t speak to that as a guarantee but that’s my experience.
As far as I know, Automation puts every triggered execution into a queue even if it cannot start it right away. For example, if it can execute at most N automations in parallel and there is a spike, then every automation beyond N is inserted into the queue and waits.
When a "worker" becomes available to execute a rule, it picks an item from the queue and executes it. And so on.
This is a standard scalability and resiliency pattern.
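To illustrate the queue-and-worker pattern in general terms, here is a minimal Python sketch. It is not Atlassian's implementation, and I have no visibility into how Automation is actually built; the function name `run_with_worker_pool` and the rule names are made up for the example. It only shows how queued work waits for a free worker instead of being dropped.

```python
import queue
import threading


def run_with_worker_pool(rule_names, max_workers):
    """Queue every triggered rule, then let at most max_workers run at once.

    Hypothetical illustration only: 'running' a rule here just records its name.
    """
    pending = queue.Queue()
    for name in rule_names:
        # Every trigger is queued, even if no worker is free yet.
        pending.put(name)

    executed = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # queue is drained, so this worker exits
            with lock:
                executed.append(name)  # stand-in for executing the rule

    threads = [threading.Thread(target=worker) for _ in range(max_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed


if __name__ == "__main__":
    rules = ["weekly-task", "quarterly-backup", "annual-review", "bcp-check", "cleanup"]
    done = run_with_worker_pool(rules, max_workers=2)
    print(sorted(done) == sorted(rules))
```

With only two workers and five queued rules, the extra three simply wait their turn. Note that this sketch keeps the queue in memory, so it says nothing about what survives an outage; that depends on whether the real queue is persisted.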
Greetings all!
An FYI on what I am reading in this thread:
The Atlassian support team told our company (for a prior outage ticket) that "catching up" on scheduled and triggered rules is subject to the severity and specifics of the outage. The expectation is that rules may eventually run (as Aron notes for queued events) or miss a schedule/trigger, and there is no expectation of when those triggers will happen after the outage.
Kind regards,
Bill