SLA in SCSM 2012. Part 2. How it works

In Part 1 of this series I described the object model required for the SLA management system in SCSM 2012. Now it’s time to figure out how all these things work.

As already mentioned in Part 1, workflows are the heart of the SLA calculation process. These workflows are created as part of each Service Level Objective. To avoid switching between Part 1 and this article, let’s recall them (note: I will use the term “rule” because all workflows are defined as Rules inside the management pack):

    1. Two rules whose source is the event of a relationship being added or removed (one rule for each event) between the SLO group (see Part 1) and its member. The display names are generated from the masks:
      %SLO_NAME%-AddRelationship
      and
      %SLO_NAME%-DeleteRelationship.
    2. Two rules (or workflows, in terms of the SCSM console) whose source is the event of a change to an object of the SLO’s target type, with this filter for the first rule:
      “%DatePropertyStart% was null and is now not null”,
      and this filter for the second rule:
      “%DatePropertyEnd% was null and is now not null”,
      where %DatePropertyStart% and %DatePropertyEnd% are the start and end date-time properties from the selected metric. The display names are generated from the masks:
      %SLO_NAME%-StartEvent
      and
      %SLO_NAME%-EndEvent.
    3. Two rules (or workflows, in terms of the SCSM console) whose source is the event of a change to an object of the SLO’s target type, without any filters and disabled by default (Enabled="false"). The display names are generated from the masks:
      %SLO_NAME%-PauseEvent
      and
      %SLO_NAME%-ResumeEvent.
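
To make the naming masks concrete, here is a minimal sketch that expands them for one SLO. The function name Get-SloRuleDisplayName is hypothetical and exists only for this illustration; it is not part of SCSM or SMLets, and it only builds strings.

```powershell
# Hypothetical helper, not part of SCSM or SMLets: expands the display-name
# masks listed above for a single SLO.
function Get-SloRuleDisplayName {
    param(
        [Parameter(Mandatory = $true)]
        [string]$SloName
    )

    # One suffix per generated rule, in the order used in this article.
    $suffixes = @(
        'AddRelationship', 'DeleteRelationship',
        'StartEvent', 'EndEvent',
        'PauseEvent', 'ResumeEvent'
    )

    foreach ($suffix in $suffixes) {
        '{0}-{1}' -f $SloName, $suffix
    }
}

Get-SloRuleDisplayName -SloName 'Test SLO'
```

For the demo SLO used later in this article the list starts with “Test SLO-AddRelationship” and ends with “Test SLO-ResumeEvent”.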

SLO demo configuration

I will use the following demo SLO configuration as an example:

  1. Queue: “All high urgency incidents”. Target class – Incident, criteria “Urgency = High”:
    image
  2. Calendar: “Standard Calendar” with settings below:
    image
  3. Metric “Incidents – time of first assignment”, with calculation from Created Date to First assigned date:
    image
  4. Service Level Objective “Test SLO”, which uses all of the objects above, with a target time of 48 hours and a warning threshold of 2 hours:
    image

SLA calculation

The main part of the SLA calculation is two rules: “%SLO_NAME%-AddRelationship” and “%SLO_NAME%-DeleteRelationship”. First of all, let’s see when they run. From their criteria you can see that they run when an object is added to or removed from the SLO group. But when exactly is an object added to or removed from an SLO group? The SLO group (display name mask “SLA Group: %SLO_NAME%”) is populated dynamically, and its membership rule looks like “all objects of the target class created later than the service level objective object AND contained in the specified queue”. So here is the sequence of events:

  1. An object of the target class is created (or updated) and matches the queue criteria.
  2. After the queue is recalculated (the default interval is 60 seconds), the object is added to the queue and an AddRelationship event is fired.
  3. The next step is the recalculation of the “SLA Group: %SLO_NAME%” group. Because the object is already contained in the given queue and the SLO was created earlier than our object, the object is added to the “SLA Group: %SLO_NAME%” group and an AddRelationship event is fired. The source module of the “%SLO_NAME%-AddRelationship” workflow catches this event and runs the workflow.

The process of removal from the queue and the group is the same, except that the event is DeleteRelationship.
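
The membership rule of the SLO group can be modeled as a simple predicate. This is only a sketch of the rule as described above, not the actual group discovery; the function name Test-SloGroupMembership and its parameters are hypothetical.

```powershell
# Hypothetical model of the "SLA Group: %SLO_NAME%" membership rule:
# the object must be created later than the SLO AND be in the SLO's queue.
function Test-SloGroupMembership {
    param(
        [Parameter(Mandatory = $true)][datetime]$ObjectCreatedDate,
        [Parameter(Mandatory = $true)][datetime]$SloCreatedDate,
        [Parameter(Mandatory = $true)][bool]$IsInQueue
    )

    return ($ObjectCreatedDate -gt $SloCreatedDate) -and $IsInQueue
}

$sloCreated = [datetime]'2013-02-01T09:00:00'

# Created after the SLO and in the queue: becomes a group member.
Test-SloGroupMembership -ObjectCreatedDate ([datetime]'2013-02-17T02:30:00') -SloCreatedDate $sloCreated -IsInQueue $true

# Created before the SLO: never becomes a member, even if it is in the queue.
Test-SloGroupMembership -ObjectCreatedDate ([datetime]'2013-01-15T12:00:00') -SloCreatedDate $sloCreated -IsInQueue $true
```

The second call shows why work items that existed before the SLO was created are not picked up by it.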

In the case of my demo configuration it looks like this:

  1. An incident is created with Urgency set to “High” (or updated from a non-High urgency to High).
  2. This incident is added to the “All high urgency incidents” queue.
  3. This incident is added to the “SLA Group: Test SLO” group.
  4. The “Test SLO-AddRelationship” workflow runs.

All of this can be seen on the History tab:
image

“%SLO_NAME%-AddRelationship” workflow

Let’s see what exactly the “%SLO_NAME%-AddRelationship” workflow does. If you look into the settings of this workflow, you will see that it starts the “ApplySLAOnGroupInstance” WF (Windows Workflow Foundation) workflow from the Microsoft.EnterpriseManagement.ServiceManager.SLA.Workflows assembly and passes it the ID of the object that was added to the group and the ID of the SLO object (of the System.SLA.Configuration class):
image

This WF workflow does the following:

Note: The SCSM engine can handle several types of SLA management, but SCSM 2012 (and SP1) supports only one: date-time property based. So the first step of every workflow described below should be “Check the type of SLA management”, but I will skip it because in all cases there is only one type.

  1. Get the System.SLA.Configuration object by ID (the “SLAConfigObjectId” property), then get the settings of this SLO (target and warning threshold) and the related calendar and metric.
  2. Calculate the target time based on the metric and calendar.
  3. Create a new object of the System.SLA.Instance.TimeInformation type (Service Level Instance Time Information) and fill in its properties (see the table below).
  4. Relate the object created in step 3 to the object that was added to the group (i.e. to the work item).

Here is the list of properties of the System.SLA.Instance.TimeInformation class:

Property           Description
ID                 Object ID. Generated as “SLAInstanceTimeInformation_” + a unique GUID
DisplayName        Always equal to the display name of the SLO object
Status             Enumeration SLAInstance.Status
IsCancelled        True if the SLA was canceled. False by default.
TargetEndDate      Target End Date
TargetWarningDate  Target Warning Date
StartDate          The last time the SLA was calculated
EndDate            The last time the SLA calculation was stopped
PausedDate         The last time the SLA calculation was paused
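
As an illustration of the property list, the sketch below builds an in-memory object of the same shape. It does not create anything in SCSM. The function name New-SlaInstanceModel is hypothetical, the status names are plain strings standing in for the SLAInstance.Status enumeration, and the initial status choice (NotReady when there is no start date, otherwise Active) follows the status descriptions later in this article.

```powershell
# Hypothetical in-memory model of a "Service Level Instance Time Information"
# object. It only mirrors the property list above; nothing is written to SCSM.
function New-SlaInstanceModel {
    param(
        [Parameter(Mandatory = $true)][string]$SloDisplayName,
        [Nullable[datetime]]$StartDate = $null,
        [Nullable[datetime]]$TargetEndDate = $null,
        [Nullable[datetime]]$TargetWarningDate = $null
    )

    # Without a start date there is nothing to track yet.
    $status = 'Active'
    if ($null -eq $StartDate) { $status = 'NotReady' }

    [pscustomobject]@{
        Id                = 'SLAInstanceTimeInformation_' + [guid]::NewGuid().ToString()
        DisplayName       = $SloDisplayName
        Status            = $status
        IsCancelled       = $false
        TargetEndDate     = $TargetEndDate
        TargetWarningDate = $TargetWarningDate
        StartDate         = $StartDate
        EndDate           = $null
        PausedDate        = $null
    }
}

New-SlaInstanceModel -SloDisplayName 'Test SLO' | Format-List
```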

You can find objects of this type on the “Service Level” tab in the SCSM console. Going back to my demo configuration, we can see that after the workflow has run, a new object of the “Service Level Instance Time Information” class is added to the incident, and its display name is equal to the display name of the SLO:
image

and we can find this object on the “Service Level” tab:
image

And here are all the properties of this object (and an example of how to get this object with PowerShell):
image

Code snippet:

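Import-Module SMLets   # the Get-SCSM* cmdlets below come from the SMLets module
# Find the incident, then look up the relationship that links a work item to its SLA instances
$IR = Get-SCSMObject -Class (Get-SCSMClass -Name "System.WorkItem.Incident$") -Filter "Name = 'IR8579'"
$rel = Get-SCSMRelationshipClass -Name "System.WorkItemHasSLAInstanceInformation$"
# Show all properties of the related "Service Level Instance Time Information" objects
Get-SCSMRelatedObject -SMObject $IR -Relationship $rel | Format-List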

Note: The status of the “Service Level Instance Time Information” object can be something other than Active. See the “SLA tracking” section below.

“%SLO_NAME%-DeleteRelationship” workflow

Now you know what happens when a new object is added to the group, so it’s time to learn what happens when an object is removed from the group. In my example I will decrease the urgency to Medium. In this case the incident is removed from the queue and, after that, from the SLO group too:
image

and for the “Service Level Instance Time Information” object the “IsCanceled” property is changed to “true”. SLA maintenance for this object stops. Changing the “IsCanceled” property does not change the status of the object itself. Here is a canceled object with “Met” status:
image
and here is one with “Not ready” status:
image

Delete, remove, delete, remove…

A reasonable question is “What happens if we remove an object from the queue and then add it back?”. In my demo configuration this means that we change the urgency to Medium, save the incident, wait some time and change the urgency back to High. The answer is simple. The %SLO_NAME%-AddRelationship workflow starts each time an object is added to the group. This workflow knows that if a “Service Level Instance Time Information” object for the given SLO already exists, it must change the IsCanceled property value to false and recalculate the target time and status.

SLA tracking

Calculating and setting the target time is good, of course, but so far we haven’t seen how SCSM tracks this time. As we know, the status of the “Service Level Instance Time Information” object must change when the target time or the warning threshold time has passed. To handle time tracking, SCSM uses another workflow with the internal name “ServiceManager.SLAManagement.Library.SLAInstance.TimeInformation.PeriodicRule”. This workflow runs every 3 minutes and gets all “Service Level Instance Time Information” objects (System.SLA.Instance.TimeInformation) that match the filter:

(IsCanceled != true)
AND
((TargetEndDate < [Now]) OR (TargetWarningDate<[Now]) AND Status != Warning)
AND
(Status!=Met AND Status!=Paused AND Status!=NotReady AND Status!=Breached)

and recalculates the state of each object found.
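
The filter above can be read as a predicate over a single object. The sketch below is my reading of it, not the rule's real implementation: the middle clause is taken as “end date passed, OR (warning date passed AND status is not Warning)”. The parentheses are written out explicitly because in PowerShell -and and -or have equal precedence and are evaluated left to right. The function name Test-SlaInstanceNeedsRecalculation is hypothetical, and status values are plain strings.

```powershell
# Hypothetical predicate mirroring the periodic rule's filter for one object.
# $Instance is any object with IsCanceled, Status, TargetEndDate and
# TargetWarningDate properties.
function Test-SlaInstanceNeedsRecalculation {
    param(
        [Parameter(Mandatory = $true)]$Instance,
        [datetime]$Now = (Get-Date)
    )

    $notCanceled   = $Instance.IsCanceled -ne $true
    $endPassed     = $Instance.TargetEndDate -lt $Now
    $warningPassed = ($Instance.TargetWarningDate -lt $Now) -and ($Instance.Status -ne 'Warning')
    $stillTracked  = @('Met', 'Paused', 'NotReady', 'Breached') -notcontains $Instance.Status

    return $notCanceled -and ($endPassed -or $warningPassed) -and $stillTracked
}

$instance = [pscustomobject]@{
    IsCanceled        = $false
    Status            = 'Active'
    TargetWarningDate = [datetime]'2013-02-17T10:40:00'
    TargetEndDate     = [datetime]'2013-02-17T11:00:00'
}

# The warning date has passed but the status is still Active, so the object is selected.
Test-SlaInstanceNeedsRecalculation -Instance $instance -Now ([datetime]'2013-02-17T10:45:00')
```

An object that is already in the Warning status is selected again only after its TargetEndDate has passed, which is what allows it to move on to the next status.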

Now it’s time to talk about the states of the “Service Level Instance Time Information” object.

Status “Not ready”

The SLA tracking system does not have enough data for the calculation. Generally this means that the property set as the Start Date in the metric has a null (empty) value.

Let’s assume that we want to use SLA to track the operator’s reaction time for an incident. We want to check the time between the first assignment and the first response. I changed my demo configuration to:
image

Both of these properties are empty by default for incidents. But the queue criteria are still the same, so if the urgency of an incident is High, the incident is added to the queue => it is added to the SLO group => the workflow starts => a new “Service Level Instance Time Information” object is created. But the First assigned date is null, so the new object is in the “Not ready” state:
image

Only when the First assigned date gets a value does SLA tracking start. This situation is handled by the %SLO_NAME%-StartEvent workflow (see the list of workflows in Part 1 and at the beginning of this article).

Status “Active”

This status means that the SLA tracking system is live for this object: the object isn’t canceled, the Start Date is not null and the End Date is null.
image

Status “Warning”

This means that the “ServiceManager.SLAManagement.Library.SLAInstance.TimeInformation.PeriodicRule” workflow found that the End Date property is still null and that (the TargetEndDate value of the given object minus the “Warning threshold” value of the related metric) is less than the current time. To show this I changed the Target Time for the metric to 1 hour and the Warning threshold to 20 minutes.
image

The incident was assigned at 02:33:38:
image

But remember that my calendar has a schedule from 10:00 to 19:00, so the calculated target date is “17.02.2013 11:00”:
image
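
The arithmetic behind this target date can be sketched as follows. This is a simplified model, not SCSM's calendar engine: it assumes the same 10:00 to 19:00 window on every day and ignores days off and holidays. The function name Add-WorkingTime and its parameters are hypothetical. The warning date is computed as a plain subtraction, following the formula in the text above.

```powershell
# Hypothetical sketch: adds a duration to a start time, counting only the
# time inside a daily working window (assumed to be the same every day).
function Add-WorkingTime {
    param(
        [Parameter(Mandatory = $true)][datetime]$Start,
        [Parameter(Mandatory = $true)][timespan]$Duration,
        [timespan]$DayStart = (New-TimeSpan -Hours 10),
        [timespan]$DayEnd   = (New-TimeSpan -Hours 19)
    )

    $current   = $Start
    $remaining = $Duration

    while ($remaining -gt [timespan]::Zero) {
        $open  = $current.Date + $DayStart
        $close = $current.Date + $DayEnd

        # Before opening time: wait until the window opens.
        if ($current -lt $open) { $current = $open }

        # After closing time: move to the next day's opening.
        if ($current -ge $close) {
            $current = $current.Date.AddDays(1) + $DayStart
            continue
        }

        $available = $close - $current
        if ($remaining -le $available) {
            return $current + $remaining
        }

        # Use up the rest of today and carry the remainder to tomorrow.
        $remaining = $remaining - $available
        $current   = $current.Date.AddDays(1) + $DayStart
    }

    return $current
}

$assigned = [datetime]'2013-02-17T02:33:38'
$target   = Add-WorkingTime -Start $assigned -Duration (New-TimeSpan -Hours 1)
$warning  = $target - (New-TimeSpan -Minutes 20)

$target    # 17.02.2013 11:00:00, as in the screenshot description above
$warning   # 20 minutes earlier: 17.02.2013 10:40:00
```

The incident was assigned outside working hours, so the clock starts at 10:00 and the one-hour target ends at 11:00 on the same day.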

Status “Breached”

This means that the “ServiceManager.SLAManagement.Library.SLAInstance.TimeInformation.PeriodicRule” workflow found that the End Date property is still null and the TargetEndDate value of the given object is less than the current time.

Status “Met”

This means that the “ServiceManager.SLAManagement.Library.SLAInstance.TimeInformation.PeriodicRule” workflow found that the End Date property is not null and the TargetEndDate value of the given object is more than the current time:
image

Status “Paused”

This means that SLA tracking is paused. I wrote an article about pausing SLA tracking a long time ago. In Part 3 of this series I will show how to configure this.

Note: this status is not supported by Microsoft and can’t be set during normal SLA tracking.
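
The status descriptions above can be condensed into one decision function. This is a sketch based only on the rules described in this article, not on the product's code. The function name Get-SlaInstanceStatus is hypothetical, statuses are plain strings, and “Paused” is left out because it is not set during normal tracking. One branch is my own assumption and is marked as such: an object whose End Date is set but whose target has already passed is reported as Breached.

```powershell
# Hypothetical decision function for the statuses described above.
function Get-SlaInstanceStatus {
    param(
        [Nullable[datetime]]$StartDate,
        [Nullable[datetime]]$EndDate,
        [Parameter(Mandatory = $true)][datetime]$TargetEndDate,
        [Parameter(Mandatory = $true)][datetime]$TargetWarningDate,
        [datetime]$Now = (Get-Date)
    )

    # No start date yet: not enough data to calculate anything.
    if ($null -eq $StartDate) { return 'NotReady' }

    if ($null -ne $EndDate) {
        if ($TargetEndDate -gt $Now) { return 'Met' }
        return 'Breached'   # assumption: this case is not described in the article
    }

    # End date is still null from here on.
    if ($TargetEndDate -lt $Now)     { return 'Breached' }
    if ($TargetWarningDate -lt $Now) { return 'Warning' }
    return 'Active'
}

$start   = [datetime]'2013-02-17T02:33:38'
$target  = [datetime]'2013-02-17T11:00:00'
$warning = [datetime]'2013-02-17T10:40:00'

Get-SlaInstanceStatus -StartDate $start -EndDate $null -TargetEndDate $target -TargetWarningDate $warning -Now ([datetime]'2013-02-17T10:45:00')
```

With the demo values the call returns Warning, because the check happens after 10:40 but before 11:00 and the End Date is still empty.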

Some tips

But what if the End Date property gets a value before the Start Date property? In my example this is possible because the “First assignment date” is set by a workflow, but the “First Response Date” can be set from the console. So if the operator opens the incident, assigns it to himself and sets the first response, then the “First Response Date” will have a value but the “First assignment date” will be null until the workflow fires. The good news is that the SLA tracking workflows can handle this situation and will set the status to “Met” as soon as the Start Date property gets a value.

The First Response Date is set:
image
and the First Assigned Date is set after that:
image

and the status of the “Service Level Instance Time Information” object is set to Met.

image

You can see the same behavior if the Start Date and End Date properties are set at the same time.

Service workflow

Besides the standard workflows that handle normal SLA tracking, there is one more service workflow in SCSM. Its internal name is “ServiceManager.SLAManagement.Library.SLATimeMetric.Rule.Update” and it handles metric change events. When started, this workflow updates the criteria in all “%SLO_NAME%-StartEvent” and “%SLO_NAME%-EndEvent” workflows of every SLO that uses the changed metric. This means that you can change a metric’s properties and the changes will be reflected in all objects handled by SLOs that use this metric.

But there are no other service workflows in SCSM 2012. As a result, any changes made to an SLO’s configuration affect only newly created objects.

Summary

Please don’t expect a “real time” reaction from SLA tracking. As you can see, applying an SLO configuration can take up to 3 minutes (one for the queue calculation, one for the SLO group calculation, one for the workflow). The same is true for SLA tracking: SCSM checks objects only once every 3 minutes.

And a few useful links:

SCSM 2012: Service Level Management (overview)
Notifying before SLA breaches
How to Send SLA Notification Information to the Assigned-To User
More transparency in SLO Management by reusing the “Target Resolution Time” field

That is all. I really hope that you can now picture how the SLA tracking system works in SCSM 2012 (SP1). In the next part of this series I will show some implicit and hidden features of SLA tracking. And yes, I will show how to configure SLA pausing. Be patient ))

Appendix: “Why don’t changes made to the SLO configuration affect existing objects?”

I have seen this question many times. As you remember from Part 1, a newly created SLO affects only work items that were created after the SLO (because of the “SLA Group: %SLO_NAME%” group and its discovery). From this article you’ve learned that changing the SLO configuration also doesn’t affect existing work items. But why?

To answer this question we must step away from the technical aspects and “read the papers”. A Service Level Agreement is a document. This document is very important because in most cases it is used as a “book of rules” between the IT department and the business. In the case of Service Desk systems, the Service Level Agreement tells us how fast we must react to requests from users. In some cases the SLA can affect money. But in all cases the SLA is a law in every sense. These are the rules of the game between IT and the business. It is also a legal document, and no legal document can be changed “post factum”.

Assume that you take out a loan from a bank for 2 years at a 0.1% rate to buy a Ferrari. After one year the bank increases your rate to 80% and asks you to pay the additional 79.9% for the past year. That would be insane. The same goes for SLA: any changes must affect only new objects.
