In the last two articles of the SLA series I wrote about how SLA objects are stored and described how they work. Today it’s time to talk about some “hidden” features of the SLA management system in SCSM 2012. Most of these features are available out of the box and supported by Microsoft, but some of them are not.
Queues
The main goal of an SLA management system is to track compliance with the agreement, and the fundamental question for SLA tracking is which elements must be tracked. In SCSM 2012 this is handled by queues. The other parts, such as the calendar, metric and SLO, are simple and have no particularly interesting parameters, but queues are a powerful instrument for SLA management. The queue determines which objects a given SLO is applied to. A small book could (and even should) be written about queues and how to use them in SCSM, but in this article I will describe only the small part related to the SLA management system.
Queue is a group
… and you must keep that in mind each time you use it. This means that:
- Members can be added to a queue dynamically (by a membership rule) or statically (as “explicit members”)
- You can use exclusions
- The number of queues and the complexity of their membership rules affect the performance of the SCSM server and database.
Unfortunately, the SCSM console lets you create only dynamic membership rules. But you can always export the management pack containing the queue and change it as you want.
Membership rules
As I wrote above, for dynamic membership you must define a membership rule. A membership rule is just a set of criteria used by the SCSM engine to populate the queue with objects. Unfortunately, in most cases SCSM administrators use only the simplest criteria, like “Urgency = High and Priority = High” or similar. But a membership rule allows much more complex criteria. We can use not just the properties of the target class, but also any relationship of the selected target class. Do you have a “VIP User” property in your User class and want a queue with all incidents where the affected user is a VIP user? No problem: just select the “Incident (typical)” combination class, then use the Affected User relationship and any properties of the User class in your criteria. Do you want a queue with all service requests that affect a given Business Service? No problem, but it is a little more complex: just create your own combination class with the “Affected Configuration Items” relationship restricted to the Business Service class.
Combination classes can use any nesting level. For example, you can create a combination class for a service request with the affected business service and add the “Service owner” relationship as a child. In this case you can build a queue of all service requests where the owner of the affected service is located in a given organizational unit or works in a given company, i.e. you can use any property of the service owner.
Also note that you can use relationships not only from source to target but vice versa too. For example, you can create a combination class for an incident and add “Has parent work item”, but set the direction from target to source (in this example the Target is the child incident and the Source is the parent) with SeedRole=”Target”. As a result you can use this combination class to create a queue with all incidents that have a parent incident with Urgency=”High” (or you can use any other property of the parent incident).
A small summary about combination classes: if the combination classes provided out of the box don’t contain all the necessary relationships, you can always create your own. A high-level overview of the process:
- Create a combination class (or “type projection” in terms of the object model) with the necessary relationships and restrictions. You can do this with any text editor or with VSAE.
- Import the management pack with the created combination class into SCSM
- Create a new queue based on the imported combination class
It’s time for a demo of membership rules. I am a little bit lazy, so I will not show how to create a combination class and will use an existing one. Let’s assume that we have an OU in our domain that contains the accounts of our CIO, CEO and other executives. We want to create a queue with all incidents where the affected user is located in this OU. To do that we can:
- Create a new queue and select the “Incident (typical)” combination class as the target:
- On the “Criteria” tab select “Affected User” and add the “Distinguished Name” property to the criteria. This property contains the full LDAP path of the object in “CN=….,OU=…..,DC=…,DC=…” format, so we can use the “Contains” or “Ends with” operators with the full LDAP path of the OU as the filter. In the case of the screenshot below, the queue will contain all incidents where the affected user is located in the OU “SystemCenter Inc\Chiefs” or in OUs below it:
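To make the “Ends with” idea concrete, here is a minimal Python sketch of the comparison (it is an illustration only, not SCSM code). The domain components `DC=example,DC=local` and the user names are hypothetical; only the OU names come from the example above.

```python
def in_ou(distinguished_name: str, ou_path: str) -> bool:
    """Return True if the object sits in ou_path or in any OU below it."""
    # An object's DN ends with the DN of every container above it,
    # so a case-insensitive suffix check covers nested OUs too.
    return distinguished_name.lower().endswith("," + ou_path.lower())


# Hypothetical LDAP path of the OU "SystemCenter Inc\Chiefs".
CHIEFS_OU = "OU=Chiefs,OU=SystemCenter Inc,DC=example,DC=local"

# Hypothetical affected users.
users = [
    "CN=Ceo User,OU=Chiefs,OU=SystemCenter Inc,DC=example,DC=local",
    "CN=Cio User,OU=Board,OU=Chiefs,OU=SystemCenter Inc,DC=example,DC=local",
    "CN=Regular User,OU=Staff,OU=SystemCenter Inc,DC=example,DC=local",
]

for dn in users:
    print(in_ou(dn, CHIEFS_OU), dn)
```

The first two users match (the second one through a nested OU) and the third does not. A “Contains” filter on a short fragment such as `OU=Chiefs` would also match an OU with the same name elsewhere in the directory, which is why the full path is used as the filter.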
Here is the AD layout:
To get the queue members you can use the script I published earlier. But note that a queue is a group, and groups are recalculated every 60 seconds (by default).
That is all you can do from the UI. But if you export the management pack with the queue, you can:
- Set exclusions for the queue. This can be helpful if you want to exclude some objects from SLA management.
- Use complex operators like Contains/NotContains and Contained/NotContained. These allow you to check whether an object contains (or is contained in) other groups\queues
- Use more than one membership rule for one queue
In other words, you can use all the features of the group membership engine in OpsMgr\SCSM.
Queue and performance
A queue has the same limitations as a group, because a queue is a group. First of all (yes, this sounds a little strange, but…), try to use the fewest possible number of queues:
- Try to use the same queue for everything: for SLA, for roles, for notifications
- Remove queues as soon as they are no longer used
The second rule: use combination classes with the fewest possible number of relationships. And you must never use “* (typical)” combination classes in your queues.
The third rule: if you have a lot of queues and groups, you can try increasing the calculation interval. But keep in mind that this affects the SLA workflows because, as you know from Part 2, the SLA workflow runs only after the queue is recalculated and the object is added to the queue.
Pausing SLA
ATTENTION! All information below is provided as is, without any warranties. This approach is totally unsupported by Microsoft and\or by the author. The author is not liable for loss of information due to the use of these recommendations.
A long time ago I published an article about the Pause status of the SLA in SCSM 2012. That article is very popular, but I couldn’t publish any information about how to do it until Microsoft approved it. Now I have a “green light” and it’s time to open Pandora’s box.
If you read Parts 1 and 2 carefully, you have already noticed some properties of the metric object and some workflows related to pausing the SLA. These properties and workflows are not supported by Microsoft, but they work in simple scenarios.
Criteria for pausing and resuming
To use SLA pausing in SCSM you must define two criteria: when the SLA should be paused and when it should be resumed. These criteria are exactly the same as in other workflows (notifications) and allow you to check the values before and after the change.
But fewer words and more examples! One of the most popular requirements around SLA pausing is to stop the SLA calculation when an incident changes its status to “Pending”. Based on the information above, we must define two criteria:
- Pause SLA: Status_BEFORE != “Pending” AND Status_AFTER = “Pending”
- Resume SLA: Status_BEFORE = “Pending” AND Status_AFTER != “Pending”
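The two criteria are mirror images of each other, which a small Python sketch makes easy to see. This only models the logic of the criteria; the function names are made up for the illustration and the status strings other than “Pending” are just sample values.

```python
PENDING = "Pending"


def should_pause(status_before: str, status_after: str) -> bool:
    """Pause only on the transition INTO Pending."""
    return status_before != PENDING and status_after == PENDING


def should_resume(status_before: str, status_after: str) -> bool:
    """Resume only on the transition OUT OF Pending."""
    return status_before == PENDING and status_after != PENDING


# Sample status changes (before, after).
for before, after in [("Active", "Pending"), ("Pending", "Pending"),
                      ("Pending", "Active"), ("Active", "Resolved")]:
    print(before, "->", after,
          "pause" if should_pause(before, after) else "",
          "resume" if should_resume(before, after) else "")
```

Checking both the old and the new value is what keeps an update that leaves the status at “Pending” (for example, a changed description) from firing the pause workflow a second time.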
Note: for everything below I will assume that you already have a working SLO and want to implement pausing only.
The simplest way to create the criteria for our SLA pausing workflows is to create a new notification subscription with the necessary criteria in the SCSM console. This subscription must be created in the same management pack as the SLO object and must be disabled right after creation. After that you can export this management pack and copy the criteria from the subscription rule to the SLA workflow.
If you have done all of this and look into your management pack, the subscription rule should look like this:
<Rule ID="NotificationSubscription_da22181e_fb15_444c_b21b_ef45ca49b11c" Enabled="true" Target="SystemCenter!Microsoft.SystemCenter.SubscriptionWorkflowTarget" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>System</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="SystemCenter1!Microsoft.SystemCenter.CmdbInstanceSubscription.DataSourceModule">
      <Subscription>
        <InstanceSubscription Type="a604b942-4c7b-2fb2-28dc-61dc6f465c68">
          <UpdateInstance>
            <Criteria>
              <Expression>
                <And>
                  <Expression>
                    <SimpleExpression>
                      <ValueExpression>
                        <Property State="Pre">$Context/Property[Type='CustomSystem_WorkItem_Incident_Library!System.WorkItem.Incident']/Status$</Property>
                      </ValueExpression>
                      <Operator>NotEqual</Operator>
                      <ValueExpression>
                        <Value>{b6679968-e84e-96fa-1fec-8cd4ab39c3de}</Value>
                      </ValueExpression>
                    </SimpleExpression>
                  </Expression>
                  <Expression>
                    <SimpleExpression>
                      <ValueExpression>
                        <Property State="Post">$Context/Property[Type='CustomSystem_WorkItem_Incident_Library!System.WorkItem.Incident']/Status$</Property>
                      </ValueExpression>
                      <Operator>Equal</Operator>
                      <ValueExpression>
                        <Value>{b6679968-e84e-96fa-1fec-8cd4ab39c3de}</Value>
                      </ValueExpression>
                    </SimpleExpression>
                  </Expression>
                </And>
              </Expression>
            </Criteria>
          </UpdateInstance>
        </InstanceSubscription>
        <PollingIntervalInSeconds>60</PollingIntervalInSeconds>
        <BatchSize>100</BatchSize>
      </Subscription>
    </DataSource>
  </DataSources>
  <WriteActions>
    <!-- cut here -->
Enabling pausing\resume SLA
Now you must copy and paste these criteria into the corresponding SLA workflows. To do that, copy the entire <UpdateInstance> element from the subscription rule. First, find the %SLO_NAME%-PauseEvent rule in your management pack (please see Part 1 of this series for terms and abbreviations) and replace its <UpdateInstance /> element with the one copied from the subscription rule. After that, enable the SLO workflow by changing Enabled=”false” to Enabled=”true”. As a result you should get something like this:
<Rule ID="WorkflowSubscription_1ce1859c_73c0_43ad_bd76_87616e00ad96" Enabled="true" Target="SLAWorkflowTarget_bcae34c368294fdcb14f80d4111ee005" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>System</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="SystemCenter1!Microsoft.SystemCenter.CmdbInstanceSubscription.DataSourceModule">
      <Subscription>
        <InstanceSubscription Type="a604b942-4c7b-2fb2-28dc-61dc6f465c68">
          <UpdateInstance>
            <Criteria>
              <Expression>
                <And>
                  <Expression>
                    <SimpleExpression>
                      <ValueExpression>
                        <Property State="Pre">$Context/Property[Type='CustomSystem_WorkItem_Incident_Library!System.WorkItem.Incident']/Status$</Property>
                      </ValueExpression>
                      <Operator>NotEqual</Operator>
                      <ValueExpression>
                        <Value>{b6679968-e84e-96fa-1fec-8cd4ab39c3de}</Value>
                      </ValueExpression>
                    </SimpleExpression>
                  </Expression>
                  <Expression>
                    <SimpleExpression>
                      <ValueExpression>
                        <Property State="Post">$Context/Property[Type='CustomSystem_WorkItem_Incident_Library!System.WorkItem.Incident']/Status$</Property>
                      </ValueExpression>
                      <Operator>Equal</Operator>
                      <ValueExpression>
                        <Value>{b6679968-e84e-96fa-1fec-8cd4ab39c3de}</Value>
                      </ValueExpression>
                    </SimpleExpression>
                  </Expression>
                </And>
              </Expression>
            </Criteria>
          </UpdateInstance>
        </InstanceSubscription>
        <PollingIntervalInSeconds>60</PollingIntervalInSeconds>
        <BatchSize>100</BatchSize>
      </Subscription>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="WA" TypeID="SystemCenter1!Microsoft.EnterpriseManagement.SystemCenter.Subscription.WindowsWorkflowTaskWriteAction">
      <Subscription>
        <VisibleWorkflowStatusUi>false</VisibleWorkflowStatusUi>
        <EnableBatchProcessing>true</EnableBatchProcessing>
        <WindowsWorkflowConfiguration>
          <AssemblyName>Microsoft.EnterpriseManagement.ServiceManager.SLA.Workflows</AssemblyName>
          <WorkflowTypeName>Microsoft.EnterpriseManagement.ServiceManager.SLA.Workflows.ModifySLAOnInstanceUpdate</WorkflowTypeName>
          <WorkflowParameters>
            <WorkflowParameter Name="WorkflowMode" Type="string">PauseEvent</WorkflowParameter>
            <WorkflowArrayParameter Name="InstanceIds" Type="guid">
              <Item>$Data/BaseManagedEntityId$</Item>
            </WorkflowArrayParameter>
            <WorkflowParameter Name="SLAConfigObjectId" Type="guid">7b1505b3-fd95-06f1-f10e-d69d00dfbe1c</WorkflowParameter>
          </WorkflowParameters>
          <RetryExceptions />
          <RetryDelaySeconds>60</RetryDelaySeconds>
          <MaximumRunningTimeSeconds>7200</MaximumRunningTimeSeconds>
        </WindowsWorkflowConfiguration>
      </Subscription>
    </WriteAction>
  </WriteActions>
</Rule>
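If you prefer not to edit the exported XML by hand, the same replacement can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under assumptions: the rule below is a heavily simplified, hypothetical fragment (its ID is invented), and the function name `enable_sla_rule` is mine, not part of any SCSM tooling. Note also that ElementTree drops XML comments by default, so check the result before importing a real management pack.

```python
import xml.etree.ElementTree as ET


def enable_sla_rule(rule_xml: str, update_instance_xml: str) -> str:
    """Put real criteria in place of <UpdateInstance /> and enable the rule."""
    rule = ET.fromstring(rule_xml)
    new_update = ET.fromstring(update_instance_xml)

    subscription = rule.find(".//InstanceSubscription")
    old_update = subscription.find("UpdateInstance")
    position = list(subscription).index(old_update)
    subscription.remove(old_update)
    subscription.insert(position, new_update)

    # Same effect as changing Enabled="false" to Enabled="true" by hand.
    rule.set("Enabled", "true")
    return ET.tostring(rule, encoding="unicode")


# Simplified, hypothetical PauseEvent rule: only the touched parts are kept.
PAUSE_RULE = """
<Rule ID="WorkflowSubscription_example" Enabled="false">
  <DataSources>
    <DataSource ID="DS">
      <Subscription>
        <InstanceSubscription Type="a604b942-4c7b-2fb2-28dc-61dc6f465c68">
          <UpdateInstance />
        </InstanceSubscription>
      </Subscription>
    </DataSource>
  </DataSources>
</Rule>
""".strip()

# Shortened stand-in for the <UpdateInstance> copied from the subscription.
CRITERIA = """
<UpdateInstance>
  <Criteria>
    <Expression>
      <SimpleExpression>
        <Operator>Equal</Operator>
      </SimpleExpression>
    </Expression>
  </Criteria>
</UpdateInstance>
""".strip()

print(enable_sla_rule(PAUSE_RULE, CRITERIA))
```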
You must also copy and replace the <UpdateInstance> element for the %SLO_NAME%-ResumeEvent rule, but swap the <Operator> values (Equal to NotEqual and vice versa).
Let’s recall the configuration of my test SLO:
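The operator swap is mechanical, so it can be sketched in a few lines of Python. This is a hedged illustration using the standard `xml.etree.ElementTree` module; the function name `to_resume_criteria` is hypothetical, and the sample is a shortened copy of the pause criteria shown above.

```python
import xml.etree.ElementTree as ET

SWAP = {"Equal": "NotEqual", "NotEqual": "Equal"}


def to_resume_criteria(pause_update_instance_xml: str) -> str:
    """Derive the resume criteria by swapping Equal and NotEqual operators."""
    root = ET.fromstring(pause_update_instance_xml)
    for operator in root.iter("Operator"):
        name = (operator.text or "").strip()
        if name in SWAP:
            operator.text = SWAP[name]
    return ET.tostring(root, encoding="unicode")


PROP = ("$Context/Property[Type='CustomSystem_WorkItem_Incident_Library"
        "!System.WorkItem.Incident']/Status$")
PENDING = "{b6679968-e84e-96fa-1fec-8cd4ab39c3de}"

# Shortened pause criteria: Pre != Pending AND Post = Pending.
PAUSE_CRITERIA = f"""
<UpdateInstance><Criteria><Expression><And>
  <Expression><SimpleExpression>
    <ValueExpression><Property State="Pre">{PROP}</Property></ValueExpression>
    <Operator>NotEqual</Operator>
    <ValueExpression><Value>{PENDING}</Value></ValueExpression>
  </SimpleExpression></Expression>
  <Expression><SimpleExpression>
    <ValueExpression><Property State="Post">{PROP}</Property></ValueExpression>
    <Operator>Equal</Operator>
    <ValueExpression><Value>{PENDING}</Value></ValueExpression>
  </SimpleExpression></Expression>
</And></Expression></Criteria></UpdateInstance>
""".strip()

RESUME_CRITERIA = to_resume_criteria(PAUSE_CRITERIA)
print(RESUME_CRITERIA)
```

After the swap the Pre state is compared with Equal and the Post state with NotEqual, which is exactly the “Resume SLA” criterion: the status was “Pending” before the change and is something else after it.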
- Name: Test SLO
- Metric: From First Assigned Date to First Answer Date
- Calendar: Monday-Friday, from 10:00 to 19:00
- Target time: 1 hour
- Warning threshold time: 50 minutes
My incident was assigned at 15:54, so the Target End Date is 16:51:
The Status of the incident was changed at 15:56:
Right after that the PauseEvent workflow fires and the SLO is paused:
Note: the Target End Date will NOT change when the object goes to the Paused state.
The Status of the incident was changed to Active at 16:20:
As a result the SLO is recalculated and its Status changes to Active:
As you can see, the time the SLO spent in the Paused status is not included in the total time. In other words, if 30 minutes remained before the breach at the moment the SLO was paused, the same 30 minutes will remain when the SLO is resumed. You can see this behavior in the example above:
SLO paused at 15:58
SLO resumed at 16:22
16:22 – 15:58 = 24 minutes
Target End Date before paused: 16:51
So the expected new Target End Date is
16:51 + 24 minutes = 17:15
and this is exactly what you see: Target End Date after resumed: 17:15
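The same arithmetic as a minimal Python sketch. It assumes the whole pause falls inside calendar working hours, as in this example (10:00 to 19:00); the real resume workflow also has to account for the calendar, which this sketch ignores. The date is arbitrary and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def resumed_target_end(target_end: datetime,
                       paused_at: datetime,
                       resumed_at: datetime) -> datetime:
    """Shift the Target End Date by the time spent in the Paused status."""
    paused_for: timedelta = resumed_at - paused_at
    return target_end + paused_for


day = (2013, 1, 14)  # arbitrary date, only the times matter here
new_target = resumed_target_end(
    target_end=datetime(*day, 16, 51),
    paused_at=datetime(*day, 15, 58),
    resumed_at=datetime(*day, 16, 22),
)
print(new_target.strftime("%H:%M"))  # 17:15
```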
The fast way to find pausing and resuming SLA workflows (rules)
To find the necessary workflows you must build the display name, find this display name in the management pack and get the ID of the workflow. This is not easy and a little bit complex. To speed up this process I’ve created a PowerShell script:
param([string]$SLADisplayName)

if(!$SLADisplayName)
{
    write-host ""
    write-host "Usage:"
    write-host "Get-SCSMSLAWorkflows.ps1 ""SLO Display Name"""
    write-host ""
    return
}

import-module SMLets

$SLAConfigObject = Get-SCSMObject -Class (Get-SCSMClass -Name "System.SLA.Configuration") -Filter "DisplayName = '$SLADisplayName'"

if($SLAConfigObject)
{
    [guid]$SLAConfigObjectId = $SLAConfigObject.Get_Id()
    $pauseRule = Get-SCSMRule | ? {$_.DisplayName -eq ($SLADisplayName + "-PauseEvent")}
    $resumeRule = Get-SCSMRule | ? {$_.DisplayName -eq ($SLADisplayName + "-ResumeEvent")}

    write-host ""
    write-host "Pause and resume workflows for SLO '$SLADisplayName':"
    write-host ("`tPause workflow: `t" + $pauseRule.Name)
    write-host ("`tResume workflow: `t" + $resumeRule.Name)
    write-host ("`tManagement Pack: `t" + $resumeRule.ManagementPack.DisplayName + " [Name: " + $resumeRule.ManagementPack.Name + "]")
    write-host ""
}
else
{
    write-error "SLO '$SLADisplayName' not found!"
}
Save this script as Get-SCSMSLAWorkflows.ps1 and run it with the display name of your SLO as a parameter:
All that remains is to export the management pack and find the workflow by its ID.
Pausing SLA. Epilog.
A high-level overview of the SLA pausing process:
- Get the ID (internal name) of the Pause and Resume workflows
- Export the management pack with the SLO and find those workflows in it
- Add criteria to the Pause and Resume workflows
- Enable the workflows
- Import the management pack back into SCSM
You can use any criteria for pausing\resuming an SLA, which helps you build almost any SLA. But keep in mind that SLA pausing is not supported by Microsoft, so you must test each criterion before using it in production.
Note: There is one more way to implement pausing. You can set the PauseEventCriteria and ResumeEventCriteria properties of the Metric object with the SDK. In that case these criteria will be used for each SLO created from this Metric.
Summary
The SLA system in SCSM 2012 (and SP1) is very powerful. An entire book could be written about SLA, and this series of articles covers just a small part of the SLA system. After you implement your SLA you should create reports and\or OLAP cubes to analyze your SLA metrics. But you can use the SLA system for more than SLA itself. For example, you can use it to set a maximum period for an approval process (“this review activity must be voted on within 3 days or….[do something useful]”). You can run any other workflow when an SLO object changes its status to Warning or Breached. In the example above, you could run a PowerShell script that auto-approves review activities that are still not approved.
So don’t be afraid to test any SLA scenarios.
21 Comments
Congratulations Anton,
Great sleuthing work to get to the bottom of Pause / Resume :-)
Chris
Great article, Anton!
I have a question for you: We’re using 5 priorities (5 queues and SLOs), but I’m thinking that instead of adding this feature for the 5 of them, I could create a sixth queue and SLO called “All open incidents” to include all 5 priorities and add the pause/resume functionality there.
Have you tried something like that? Any recommendations?
Thanks!
German
Disregard my previous comment, I’ve just figured out that these workflows are tied to each SLO object id… I’ll build 5 of them.
Thanks again!
German
Why pausing manually with SMLets (setting status to Pause and writing in PausedDate) does not Resume afterwards via resume workflow?
I need it because if someone puts ticket “On Hold” before SLO objects are generated for that ticket, they do not go “On Hold”, as there is no event firing the workflow, so I do it with Orchestrator instead.
If you can describe what exactly you do with SCOrch then maybe I can help you. But keep in mind that the Pause\Resume workflows do a lot of work around the SLA. They don’t just set the status and paused date. They also recalculate the dates and do other things.
Awesome work!
I think its funny that 1 year ago you were telling me on the technet forums that pausing the SLA clock makes no sense and here you are, having created just that functionality! :)
And I still think the same )) I have no “real life” project with SLA pausing.
I have run through this, and I get the paused state, but my target dates do not seem to change. Any ideas what I could have missed?
Audrey, I am facing the same issue, did you manage to solve it at your environment?
thank you
Best Regards
Can i delete the subscription that I made after copying the criteria from it? Or should I just leave it disabled?
Yes, of course you can delete subscription.
I am running that ps script but its throwing the error as below..
Get-SCSMRule : The member “ManagementPack” is already present from the extended
type data file.
At C:\SCSM 2012 Source Files\Get-SCSMSLAWorkflows.ps1:19 char:30
+ $pauseRule = Get-SCSMRule <<<< | ? {$_.DisplayName -eq ($SLADisplayName
+ "-PauseEvent")}
+ CategoryInfo : NotSpecified: (:) [Get-SCSMRule], ExtendedTypeSy
stemException
+ FullyQualifiedErrorId : AlreadyPresentInTypesXml,SMLets.GetSMRuleCommand
Get-SCSMRule : The member "ManagementPack" is already present from the extended
type data file.
At C:\SCSM 2012 Source Files\Get-SCSMSLAWorkflows.ps1:20 char:31
+ $resumeRule = Get-SCSMRule <<<< | ? {$_.DisplayName -eq ($SLADisplayName
+ "-ResumeEvent")}
+ CategoryInfo : NotSpecified: (:) [Get-SCSMRule], ExtendedTypeSy
stemException
+ FullyQualifiedErrorId : AlreadyPresentInTypesXml,SMLets.GetSMRuleCommand
After that it prints blank names for all as below:
Pause and resume workflows for SLO 'MMXX_Incident Resolution Time SLO – P1':
Pause workflow:
Resume workflow:
Management Pack: [Name: ]
Something is wrong with your SMLets. Try to reinstall it, or check that the native SCSM modules aren’t imported in the same PS session.
The pause-resume workflow works for some time but later it stops working for some reason. When I checked the MP, I realized that the 2 enabled lines in the XML had no criteria in it. Both of them turned into their default state. I tried this in 3 project but experienced this issue in each of them.
Looks like you changed your SLO using the Console. In this case Pause and Resume workflows will be disabled.
I update the Management pack file, however, when I tries to import back, I am getting error i.e. Cannot resolve identifier CustomSystem_WorkItem_Incident_Library!System.WorkItem.Incident. Import fails with this error
Are you planning to post some MP in Technet Galery with this stuff? Thank you.
No. And what exactly do you want to see at TechNet Gallery?
Maybe a MP that enables the SLA pause feature, it would be awesome.
Anyone have this randomly put tickets in a pause state unexpectedly?
My IRs are paused as they should. I can see the timer stopping while its paused, but as soon as I activate it again, it includes all the time from when it was in a paused state. Thoughts?