ISE Design - Going Above The Configuration

In this blog post, I'm going to get into designing, scaling and deploying ISE. Like any piece of infrastructure, all the best configurations in the world won't help you if it's not designed properly. In this post, I'm going to really focus on what I do to make an ISE implementation successful.

Defining the Security Policy

One important thing to remember with ISE is that it's a control for your company's security policy but it's not supposed to write your security policy for you and it shouldn't dictate what your corporate security policy is. You should never start planning your ISE deployment without having a company security policy in mind and stating your goals. Different companies, industries, regulations, auditors, etc. might guide each company to have a different security policy, so you should deploy your ISE implementation to complement that security policy.

Some of the questions I would pose include:

What are we trying to protect?
Do we need to restrict access based on roles, endpoint type, etc?
Is complete or partial network segmentation required?
Are we going to be preventing east-west traffic as well as north-south?
Will we allow BYOD? If so, will we allow those endpoints to talk to internal assets? What level of control over those endpoints do we require?
Is guest access a requirement? If so, is there a requirement to track who is coming onto the network as a guest?
Will we be tracking and controlling corporate assets?
Will we be restricting access dynamically based on changes to those corporate assets?
Is there a corporate security policy that governs the use of technology assets and restrictions on them? May I see it if there is? Can we create one if there is not?
Who are the stakeholders in management that will be supporting this project? <- This one is important. As with any security control you put into place that is new, access will change for the user and it's bound to make people complain if they don't have the same level of freedom that they had before. If you don't have top-down support for this going in, there's no easy way to succeed with layer 8 issues.

Gathering more information about the environment

After getting a feel for what the goals are of this ISE implementation, I like to dig in using something like the Cisco ISE High Level Design which you can download from the ISE Communities here. It's a planning document to feel out what the company is hoping to achieve and to gather some technical information.

In most cases, no one really knows how many endpoints are out on their network at any given time but it's important to work up an approximate number for licensing purposes and for planning how to size the deployment. When it comes to licensing, remember this: The licensing is done on concurrently connected endpoints on the network. That means if you have a company that engages in shift work, you need to approximate the highest number of endpoints connected to the network at one time - not the total number of endpoints that might be on your network in a 24-hour period.
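To make the shift-work point concrete, here is a minimal Python sketch of the arithmetic. The shift names, hours, and endpoint counts are made-up assumptions for illustration, not figures from a real deployment; the only idea taken from the text above is that licensing follows the busiest moment rather than the 24-hour total.

```python
from dataclasses import dataclass


@dataclass
class Shift:
    name: str
    start_hour: int  # first hour on the network (0-23)
    end_hour: int    # first hour off the network; may wrap past midnight
    endpoints: int   # endpoints this shift brings onto the network


def peak_concurrent(shifts, always_on=0):
    """Return the highest hourly endpoint count across a 24-hour day."""
    per_hour = [always_on] * 24
    for shift in shifts:
        hour = shift.start_hour % 24
        end = shift.end_hour % 24
        # Walk hour by hour so shifts that cross midnight are handled.
        # A shift whose start equals its end contributes nothing.
        while hour != end:
            per_hour[hour] += shift.endpoints
            hour = (hour + 1) % 24
    return max(per_hour)


# Hypothetical three-shift site with one-hour handover overlaps.
shifts = [
    Shift("day", 6, 15, 900),
    Shift("evening", 14, 23, 600),
    Shift("night", 22, 7, 300),
]
always_on = 400  # assumed phones, access points, printers that never leave

total_in_24h = sum(s.endpoints for s in shifts) + always_on
print("Seen in 24 hours:", total_in_24h)                       # 2200
print("Peak concurrent:", peak_concurrent(shifts, always_on))  # 1900
```

In this made-up case the busiest hour is the day/evening handover, so the licensing estimate would be based on 1,900 endpoints rather than the 2,200 seen over the whole day.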

There are four license types:

Base - These are perpetual licenses. You need one of these for every endpoint that is connected to your network regardless of how it's accessing your network. An endpoint uses only a Base license if it is connecting using EasyConnect, 802.1x, BYOD without ISE CA, or Guest Access.

Plus - These are subscription licenses and the use case for these is as follows: Access based on profiling, BYOD with ISE CA, and pxGrid context sharing. Very important: If you are deploying ISE for wired access, you will also need some Plus licenses. The reason is that you will always have devices like access points, phones, printers, etc. that you would want to profile. Without Plus licenses, you would be making static MAC lists and this does not scale. Any money you save in Plus licenses will end up costing you more in administration and static MAC lists are not very secure. Note: If you have a single Plus license in your deployment, you will get details from the device profiling feed service and as long as you have profiling turned on in ISE and data being sent to ISE to collect, it should profile the endpoint automatically. Do not worry that this is causing you to use a Plus license. The only time a Plus license is used is when you create an Authorization Rule in your policy set to enforce based on that profile and that endpoint hits that rule. Beyond that, don't be afraid to still collect those details about the endpoints even if you are not enforcing on them.

Apex - These are subscription licenses. Not to be confused with AnyConnect Apex licenses. This license is consumed when you use posturing, MDM integration, or mitigate with Threat-Centric NAC. Note: If you are using AnyConnect and not the Temporal agent for posturing, you would need to make sure you have separate AnyConnect Apex licenses.

Device Admin - This is a perpetual license. Think of this as your TACACS+ features from ACS on ISE. One thing to note is this: It's a single license per ISE cube, not a license per managed network device and it's not tied to any other licenses. Unlike ACS where you would buy based on managed devices, ISE does not require this. The only exception where you have to buy an additional license is if you are purchasing ISE for nothing but device administration. In that case you do need to purchase at least 100 Base licenses, but that's only because ISE needs at least 100 Base licenses to run - it's not tied to the managed network device count.
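The consumption rules described above can be sketched as a small lookup. This is only my reading of the rules in this post, written in Python; the feature labels are hypothetical names invented for the sketch, not ISE identifiers, and Device Admin is left out because it is licensed per cube rather than per endpoint. Check the current ordering guide before relying on it.

```python
# Feature labels below are hypothetical names for this sketch only.
PLUS_FEATURES = {"profiling_authz_rule", "byod_with_ise_ca", "pxgrid"}
APEX_FEATURES = {"posture", "mdm", "threat_centric_nac"}


def licenses_consumed(features):
    """Map the features an endpoint session actually uses to license types."""
    used = set(features)
    consumed = {"Base"}  # every connected endpoint needs a Base license
    if used & PLUS_FEATURES:
        consumed.add("Plus")
    if used & APEX_FEATURES:
        consumed.add("Apex")
    return consumed


# Profiling data that is only collected, not enforced on, stays Base-only.
print(sorted(licenses_consumed({"dot1x", "profiling_collected"})))
# An endpoint that hits an authorization rule matching on its profile.
print(sorted(licenses_consumed({"mab", "profiling_authz_rule"})))
# A postured corporate laptop.
print(sorted(licenses_consumed({"dot1x", "posture"})))
```

The first call is the case called out in the Plus note: collecting profiling details does not by itself consume a Plus license.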

A nice graphic of all the different use cases for ISE licenses is displayed below if you want to understand the licensing model a little more:

It's also important to know the business requirements first before learning more about the infrastructure because if you find out the business requirement is to posture all corporate assets but no BYOD assets, it's good to know that and keep that in mind as you're designing this since you won't have to worry about someone's phone or personal laptop.

As I stated above, it's hard to have an exact count of how many endpoints are on the network but this is sort of what I base my guesstimates on:

Corporate users - I would say the majority of them will have 3 endpoints per person if you take into account a desktop, tablet or personal laptop, and phone. Depending on how many users are at work at a given time based on the largest shift, this will give me a generally good idea.
Phones - IP Phones don't generally get disconnected from the network because someone is off shift, and if the enterprise is using CUCM or some other phone management server, you can get a pretty accurate count. Depending on the phone manufacturer, the phone might also have the ability to use 802.1x using the manufacturer certificate that's built in or another certificate that you load on there. This would be the difference between using a Base license vs a Plus license for each phone while still ensuring security (NO STATIC MAC LISTS!)
Access Points - Most enterprises have a general count of these by looking at their wireless controller and it'll help you to be able to determine how many of these are in the environment. They should always be connected as well. It's usually pretty easy to profile these but depending on the manufacturer, the access points may also do 802.1x as well.
Printers, IoT Devices, etc - This can be a little harder to figure out but there should be a general number that you can find out.
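As a rough illustration of how I add these categories up, here is a small Python sketch. Every input number is a hypothetical placeholder; the three-devices-per-user default is just the rule of thumb from the list above and should be adjusted for your environment.

```python
def estimate_peak_endpoints(users_on_largest_shift, ip_phones, access_points,
                            printers_and_iot, devices_per_user=3):
    """Return a per-category breakdown and total for a rough endpoint count."""
    breakdown = {
        "user_devices": users_on_largest_shift * devices_per_user,
        "ip_phones": ip_phones,          # e.g. from the phone management server
        "access_points": access_points,  # e.g. from the wireless controller
        "printers_and_iot": printers_and_iot,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical site: 1,200 users on the largest shift.
estimate = estimate_peak_endpoints(
    users_on_largest_shift=1200,
    ip_phones=1500,
    access_points=250,
    printers_and_iot=400,
)
for category, count in estimate.items():
    print(f"{category:18} {count}")
```

For these made-up inputs the total comes to 5,750 endpoints, which is only a guesstimate to start the licensing and sizing conversation.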

Depending on the size of the environment, you could also cheat a little by spinning up an ISE virtual machine and profiling via SNMP and NMAP to get a general idea of how many endpoints are out there. It depends on the environment but it could be useful for collecting information even if it's on just one sample site.

Another key thing to consider about the environment is whether or not it's an environment you want to deploy ISE as a virtual machine or appliance. Lots of people usually say VM really quickly but before you jump to that, I want you to think about this because it's very very very very very very very important:

Do you trust whoever manages your virtual machines to never change the ISE VM or move resources away from it?

Once that ISE VM is deployed, if you change any of the resources on it, it will become unstable. Even if the RAM and CPU usage looks fine and you decide to only take 1GB of RAM away from it, DO NOT. It can cause very big problems that are often hard to diagnose. Even if you use the ISO to install ISE on top of a blank virtual machine, do not change it after it's installed. If you think someone might change them when you aren't looking, go with physical appliances.

Note: You can have a mixed environment as well with physical and virtual appliances. You aren't just limited to one form factor.

Designing your ISE cube

When designing your ISE cube, think about the following questions and keep in mind the information you gathered from previous steps:

How many concurrently connected endpoints are there at peak time? Make sure you take into account 20-30% growth in the future when it comes to the actual design (not necessarily licensing if you don't want to - more licenses can be added later).
How geographically distributed is the deployment?
Are you going to be migrating ACS over to ISE or using ISE for device administration? If so, how large is your deployment?
What kind of high availability and redundancy would your company prefer?

When it comes to factoring in the concurrently connected endpoints, you should understand the ISE personas a little better since that will help you with how to scale ISE. There are three main ISE personas you will have in every ISE deployment. They can live on the same appliance or be distributed across several. How you deploy them will determine how you can scale ISE.

PAN - This is where you perform all the administration for the deployment. It's highly recommended to have at least two of these in a deployment whether you are doing two ISE appliances with all the personas on it working as an Active/Standby pair or you have a fully distributed deployment. If one were to fail and you don't have a backup, you have no way to administer your ISE cube.
MnT - This is the log collector of the ISE deployment and stores all the log messages from your individual PAN and PSNs. The advanced monitoring and troubleshooting tools are built into it. It is also recommended to have at least two of these in your deployment operating as active/standby.
PSN - These are the real workhorses of ISE. They will carry the configuration that is pushed from the PANs and answer all RADIUS requests sent from your network access devices. They also perform PassiveID, SXP, Device Admin (TACACS+) services, profiling, etc if those services are turned on. You can have up to 50 of these in an ISE deployment as of my writing this.

Another optional persona type is pxGrid. This is not a mandatory persona and if you're not integrating ISE with third party systems, you might never need to use it. If you do choose to add a pxGrid node, you can add up to 2 pxGrid nodes in an ISE cube and they run as active/standby.

At this point in your planning, you should have a general idea of how many endpoints are going to be in the deployment so let's look at some of the scalability numbers as of ISE 2.2/2.3:

Distributed Deployment - This is if you choose to separate out your PAN, MnT and PSNs as a fully distributed deployment:
- You can have a maximum of 250,000 concurrently connected endpoints and up to 40 PSNs deployed if you're using the 34xx series physical appliances (or similarly sized virtual appliances)
- You can have a maximum of 500,000 concurrently connected endpoints and up to 50 PSNs deployed if you're using the 35xx series physical appliances (or similarly sized virtual appliances).
Medium Deployment - In this deployment, you have combined the PAN and MnT personas on the same ISE appliance but deploy separate PSNs
- You can have a maximum of 5,000-10,000 concurrently connected endpoints and up to 5 PSNs deployed if you're using the 34xx series physical appliances (or similarly sized virtual appliances)
- You can have a maximum of 7,500-20,000 concurrently connected endpoints and up to 5 PSNs deployed if you're using the 35xx series physical appliances (or similarly sized virtual appliances).
Small Deployment - This is a deployment where you have the PAN, MnT, and PSN personas on the same appliance. You can still have two appliances in this setup where they are active as PAN/MnT active/standby and active/active for the PSN services to add some high availability
- You can have a maximum of 5,000-10,000 concurrently connected endpoints if you're using the 34xx series physical appliances (or similarly sized virtual appliances)
- You can have a maximum of 7,500-20,000 concurrently connected endpoints if you're using the 35xx series physical appliances (or similarly sized virtual appliances).

Full ISE Performance & Scale Numbers

If you think you are likely to scale past 20,000 endpoints that are concurrently connected in the near future, you really should go with a distributed deployment.

You also have the ability to add nodes and change the personas at a later time. If you are starting small and decide you want to distribute your deployment more, you can spin up a new ISE VM or get a new ISE appliance and migrate a persona over and add that scalability to your existing ISE deployment.

Another thing to consider is the maximum concurrent sessions per appliance type. Depending on how many endpoints you anticipate in your deployment, this can make your decision on what type of appliance to go with. Whether you are deploying ISE from an OVA or as a physical appliance, you're really sizing it the same as a physical appliance would be sized. Based on the physical appliance models, you have a finite amount of resources and need to be aware of each model's limitations:

SNS 3415 (VM or Hardware) - 5,000 concurrently connected endpoints
SNS 3495 (VM or Hardware) - 10,000 concurrently connected endpoints
SNS 3515 (VM or Hardware) - 7,500 concurrently connected endpoints
SNS 3595 (VM or Hardware) - 20,000 concurrently connected endpoints

Knowing these model limitations is important because if you point too many switches towards the same PSN, it won't matter how distributed your ISE deployment is if you overload that single appliance. If you have a large deployment, you might want to consider these ideas to keep a single PSN from being overloaded by too many requests:

Having a local ISE appliance or VM onsite at critical sites to service local clients and then having backup ISE appliances in the data center. If the PSN ever gets cut off from the PAN because the WAN goes down, it would still continue to service the endpoints as it was before. 802.1x and AD logins would still work - you just wouldn't be able to create new guest accounts, profile new endpoints, or make configuration changes to the PSN. When you're configuring this on the network access device, you can easily have the local PSN be the first RADIUS server in your RADIUS group in the configuration and have the backups listed after.
Centralizing the PSNs and placing them behind a load balancer. Not only will this ensure seamless failover if one of the PSNs were to go down but it would also ensure that if there is a failure, one of the remaining PSNs will not become overutilized. Placing load balancers in front of ISE PSNs is a supported design and you can read the details at the ISE Communities by clicking here.
Centralizing PSNs by geographic region and ensuring the switch configuration is different per region to point to the correct PSNs. If this is a large scale ISE deployment, I prefer this option less because I like to standardize all my switch configurations where possible and you have to hope that every Jr Network Admin configures the regional switches correctly for that region.

The important thing to consider in a design isn't that you scale with just the right number of PSNs but that you ensure a failure of one or two PSNs does not put you in a situation where one PSN is performing poorly due to being overutilized.
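One way to reason about that failure headroom is a quick calculation like the Python sketch below. The per-model capacities are the figures listed earlier in this post; the growth percentage, the number of tolerated failures, and the endpoint counts are assumptions you would pick for your own design. It also assumes load is spread evenly across PSNs (for example behind a load balancer) and does not check the overall deployment maximums.

```python
# Concurrent endpoint limits per PSN model, as listed earlier in this post.
PSN_CAPACITY = {
    "SNS-3415": 5000,
    "SNS-3495": 10000,
    "SNS-3515": 7500,
    "SNS-3595": 20000,
}


def ceil_div(numerator, denominator):
    """Integer ceiling division without floating point rounding surprises."""
    return -(-numerator // denominator)


def psns_required(peak_endpoints, model, growth_percent=30, tolerated_failures=1):
    """PSNs needed to carry peak load plus growth with some PSNs down."""
    target = ceil_div(peak_endpoints * (100 + growth_percent), 100)
    carrying_load = max(ceil_div(target, PSN_CAPACITY[model]), 1)
    return carrying_load + tolerated_failures


# Hypothetical: 30,000 peak endpoints on 3595-class PSNs, survive one failure.
print(psns_required(30000, "SNS-3595"))  # 3
# Hypothetical: 18,000 peak endpoints on 3515-class PSNs, survive two failures.
print(psns_required(18000, "SNS-3515", tolerated_failures=2))  # 6
```

The point of adding the tolerated failures on top is that the remaining PSNs can still carry the full target load when the worst case happens.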

As far as geographic distance and latency, it used to be a bigger issue in earlier versions of ISE but has improved in later versions of ISE. When ISE 1.1 came out, you could only have 100ms latency between the PAN and any other ISE appliance which made large geographic deployments a lot more tricky. Thankfully from ISE 2.1 and up, you can have up to 300ms of latency between any separate ISE appliance. In most cases, this usually works out fine but there might be issues with certain Asian countries to the US so it's important to gauge that out ahead of time. In my experience, it will not break anything in ISE if you go above 300ms in most cases but user experience will start to suffer at this point and you don't want that.
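If you collect latency measurements between your planned node locations ahead of time, a few lines of Python are enough to flag the links worth worrying about. The node names and measurements below are hypothetical, and I'm assuming the values are round-trip times in milliseconds; only the 300ms figure comes from the paragraph above.

```python
MAX_LATENCY_MS = 300  # limit quoted above for ISE 2.1 and up


def links_over_budget(latency_ms, limit_ms=MAX_LATENCY_MS):
    """Return (node pair, latency) entries that exceed the latency limit."""
    return sorted(
        (pair, latency) for pair, latency in latency_ms.items()
        if latency > limit_ms
    )


# Hypothetical measurements between a PAN and regional PSNs.
measured = {
    ("pan-us", "psn-us"): 12,
    ("pan-us", "psn-emea"): 140,
    ("pan-us", "psn-apac"): 325,
}

for (src, dst), latency in links_over_budget(measured):
    print(f"{src} -> {dst}: {latency} ms exceeds {MAX_LATENCY_MS} ms")
```

With this sample data only the APAC link is flagged, which matches the kind of long-haul path I would want to test before committing to the design.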

There are certain bandwidth needs between different ISE nodes that need to be taken into account. To plan for your bandwidth needs, Cisco provides an ISE bandwidth and latency calculator here. If you have limited bandwidth, I would highly recommend implementing some QoS between the different ISE nodes.

When it comes to using ISE for TACACS+ and you have a small-to-smallish-medium deployment, it should be fine to just run it all on one ISE cube along with the rest of your services. If you would consider yourself a medium to large deployment, I would guide you towards a separate ISE cube for those device admin services. The performance per platform for TACACS+ really depends but here are the general numbers from the ISE 2.2+ Deployment Scale and Limits guide found on the ISE Communities:

That may look like it'll scale fine if you only have a couple network engineers but let's say you have some processes that are scripted or network management systems that run processes daily or every couple hours? That will start to add up if you have hundreds or thousands of switches. Also important to note that the PSNs that are performing TACACS+ services also use the same MnT and if you're doing tons of command accounting for everything the scripts or management systems are doing, it may strain the MnT nodes if it's combined with a large ISE deployment for network access. If you worry that your deployment is too large for both to be in the same ISE cube, I would say to err on the side of caution and just stand up a separate ISE cube that's dedicated for TACACS+.
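To get a feel for how quickly scripted activity adds up, here is a back-of-the-envelope Python sketch. The device, run, and command counts are hypothetical, and it assumes one authentication per scripted session plus one authorization and one accounting record per command, which is only true if command authorization and command accounting are both enabled. Compare the result against the platform numbers in the scale guide rather than against anything in this sketch.

```python
SECONDS_PER_DAY = 86400


def tacacs_daily_load(devices, runs_per_day, commands_per_run):
    """Estimate daily TACACS+ transactions generated by scripted jobs."""
    sessions = devices * runs_per_day
    commands = sessions * commands_per_run
    total = sessions + 2 * commands  # authn + (authz + accounting) per command
    return {
        "authentications": sessions,
        "authorizations": commands,
        "accounting_records": commands,
        "total_transactions": total,
        "avg_transactions_per_second": total / SECONDS_PER_DAY,
    }


# Hypothetical: 2,000 switches, a job every two hours, 40 commands per job.
load = tacacs_daily_load(devices=2000, runs_per_day=12, commands_per_run=40)
for name, value in load.items():
    print(f"{name:28} {value}")
```

Keep in mind this is a daily average. Scripts tend to fire in bursts, so the peak rate hitting the PSNs and the MnT will be well above the average figure.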

High availability and redundancy is the next thing I want to talk about when it comes to ISE. What is considered an acceptable risk is highly subjective depending on the business. For example, military or defense may have requirements that NOTHING can ever get on the network unless it's identified and authenticated first and in the event of a failure, denial of services is the preferred option while the health industry would prefer a "fail open" strategy because availability of services is more important than locking it down.

Here are some of the common options I usually go over with my clients:

Full Redundancy - If security is key for you and everything must be authenticated to the network even in the event of a WAN failure, this would be something to consider. This is where you would have a PSN deployed at every site or critical sites - possibly along with a domain controller. If the PSN were to ever lose connectivity to the PAN and MnT, it would still be able to service its existing sessions, 802.1x, existing guests, existing profiled endpoints, etc. It won't be able to profile new devices or create new guest accounts but in this situation, security is more the priority than the immediate need to onboard a new guest to the environment.
ISE nodes in centralized locations but still redundant - This accepts some risk if the WAN was to completely go down but is still an acceptable design - especially if you have redundant WAN circuits at the sites which lowers your risk.
RADIUS Dead Scenario: Fall back to VLAN or fail open - If you've centralized your PSNs and the network access device is not able to reach it, they will eventually hit their RADIUS server deadtime. Depending on how you configured the switch, you can have it essentially "fail open" or just fail to a specific VLAN if the RADIUS server is declared dead. When the RADIUS server is declared alive again, the endpoints will go through their regular authentication and change access based on what they should have access to.
RADIUS Dead Scenario: Critical ACL - This was introduced in the IBNS 2.0 deployment guide and you can read more about it here. If the deployment guide is a bit dense for you (it was for me), then I would recommend reading Chapter 11 of Aaron Woland's BYOD book 2nd edition where he broke it down in under 4 pages beautifully. This option gives you the ability to apply a "critical ACL" in the event that the RADIUS server is unreachable. You can also configure this as a "permit ip any any" and essentially have it fail open.

Depending on the risk that the business wants to accept, this could drive how many ISE appliances you deploy more than the endpoint count does, or you may deploy fewer if the company is alright with one of the dead scenarios I specified above.

Preparing your environment for the ISE deployment

This is probably the part where I've seen the most issues in any initial ISE deployment. If you've gotten to the point where you've completed your high-level design, have management buy-in, determined how many endpoints there are on your network, and you've set up the ISE appliances, you're ready to go, right? Wait just a moment! There are still a few things to consider:

#1 - Supplicant Configuration and dependencies on Active Directory - If you are using 802.1x, you will want to make sure your endpoints are configured correctly. If you just turn on 802.1x on the wired without doing anything to the endpoints, you're going to have a very bad day, my friend. When I do a deployment, I like to make this as transparent to the user as possible so I would get your server administrators involved to create a group policy to push out the supplicant configuration to the PCs. You can also push the SSID settings so they immediately jump on the right SSID seamlessly. Discuss with your admins what kind of 802.1x settings you will be deploying (EAP-TLS, PEAP-MSCHAP, etc) and make sure the policy is deployed first before you start testing any enforcement.

For instructions on how to configure the certificate template, please read the following blog post: Server 2012 Configuration - Certificate Templates

For instructions on how to configure the Group Policy Object and make all this configuration completely seamless to the end user, please read the following blog post: Server 2012 Configuration - Group Policy Creation

And last but not least, I would hope that your enterprise has some sort of system to ensure that the corporate owned PCs are getting some sort of regular patching. 802.1x is hardly anything new but you might want to consider checking your drivers on the ancient PCs in your environment to ensure they are up-to-date and support 802.1x. If not, then make sure you update the drivers. I don't see this being a big show stopper since most organizations have some sort of lifecycle management for their corporate PCs but every once in a while I come across some ancient corporate-owned PC that was somehow passed up for patching and it's the one endpoint with issues. Usually an update of the drivers fixes the issue.

If you are going the route of PassiveID instead, I would make sure that you are communicating with your server team, have the agent deployed on the domain controllers or the applicable WMI settings pushed. Test to make sure you are seeing logon events from the ISE Live Logs from PassiveID before proceeding.

#2 - Network Devices - While I would love to say that the 15 year old code on that ancient CatOS switch is going to work perfectly with ISE or any other NAC, it's probably not and let's not try it. I know I sound like I'm being silly here but these are things I've seen attempted in production before and I urge anyone who is about to deploy ISE to live and love the ISE Compatibility Matrix which gift wraps the supported platforms for your version of ISE, minimum code release and validated OS. My recommendation would be to standardize on validated OS or as close to it as possible so you are close to a version of code that the ISE business unit has fully tested and validated.

My system for determining what kind of code to use in an ISE deployment usually involves pulling up the ISE Compatibility Matrix and then comparing on the Cisco.com Downloads page for that platform. If there is a general TAC recommended version of code for that platform and it's close to the validated version in the compatibility matrix, that's what I will deploy in production. In the below picture, the Catalyst 3850 states that the minimum version of IOS to run ISE with is IOS-XE 3.3.5.E but the validated OS version is 3.6.5E. Over on the Downloads page, IOS-XE 3.6.6E(MD) is the TAC recommendation and it's extremely close to the validated version. In this scenario, I would standardize all my IOS-XE 3850s on 3.6.6E(MD). This kept me from hitting any weird switch bugs in the past.
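If you have an inventory export with running versions, the comparison I do by eye can be roughed out in Python. This is a simplified sketch: it only understands three-part numeric versions like the IOS-XE 3.x strings in the example above (classic IOS strings such as 15.x releases with parentheses will raise an error), the helper names are my own, and comparing numbers across different trains does not tell you whether a release was validated.

```python
import re


def parse_version(text):
    """Extract the first three-part numeric version, e.g. '3.6.6E(MD)' -> (3, 6, 6)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no three-part version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def classify(running, minimum, validated):
    """Place a running version relative to the matrix minimum and validated versions."""
    run, low, val = (parse_version(v) for v in (running, minimum, validated))
    if run < low:
        return "below minimum"
    if run < val:
        return "meets minimum, below validated"
    return "at or above validated"


# Versions from the Catalyst 3850 example in this post.
MINIMUM, VALIDATED = "3.3.5.E", "3.6.5E"

for running in ["3.2.0SE", "3.3.5E", "3.6.6E(MD)"]:
    print(f"{running:12} {classify(running, MINIMUM, VALIDATED)}")
```

Anything that lands in the first two buckets is a candidate for an upgrade before ISE enforcement is turned on.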

Another reason code is important is that, depending on the release or train of code, you might have certain features in the recommended code vs the non-recommended code. For example, some switches' minimum version of code to support ISE is on the 12.2 train but device sensor isn't introduced until 15.x. Having to use SNMP on an older train of code adds to CPU cycles.

I would like to say that in any large enterprise, you usually have some standardization for code versions but I've been disappointed in the past. If you aren't standardized, I would recommend doing so before ISE is deployed to reduce any likelihood of issues with different network access devices. There are also tools out there such as Prime Infrastructure or Solarwinds which can give you your switch code versions and push upgrades for you. Another option is a tool called the ISE Deployment Assistant (IDA) which you can trial for 5 days and which will reach out to your infrastructure to check the model numbers, code versions, etc to see if your network access devices are ready for ISE. If you would like to trial the ISE Deployment Assistant, you can download it here.

#3 - Standardize Your Network Access Device Configuration - It could be because there are vastly different trains of code in the environment because recommendation #2 was not followed or because someone started deleting configs to troubleshoot an unrelated issue but create a template for how you will configure all your switches and stick to it. I would even use my network management tools to make sure that they are staying compliant.

Note: It may not be possible to standardize every switch. If you have a mixed Cisco and non-Cisco environment for example, you might have to utilize SNMP on the non-Cisco switch and device sensor on the Cisco switches for profiling. I would still create standard templates for like groups of network access devices.

#4 - Know the hardware limitations of your network access devices - Depending on your business requirements and how granular you need to get, you might find yourself having certain hardware limitations if you go with a certain deployment strategy. For example, if you have a strict policy to ONLY allow access to certain domain controllers or servers on 50+ different ports and your policy is to define the 20 or so server IPs in that access list, you might find yourself running out of TCAM space with DACLs. If your business requirements are for greater east-west segmentation where simple VLAN segmentation won't cut it and you have TCAM limitations, you might have to consider a different implementation strategy utilizing Security Group Tags (SGTs). Knowing these limitations before you start actually deploying ISE is important because a mass redesign in the middle of a deployment is never pretty.
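The DACL example above is easy to quantify. The Python sketch below multiplies it out as a worst case, assuming one access control entry per server and port combination with no port ranges or object grouping to collapse entries. The TCAM budget used for comparison is a hypothetical number; the real figure depends on your platform, its template, and how it shares entries across sessions.

```python
def dacl_ace_count(server_ips, ports_per_server):
    """Worst-case ACE count: one entry per (server, port) combination."""
    return server_ips * ports_per_server


def fits_budget(ace_count, ace_budget):
    """True if the DACL fits within the assumed per-platform ACE budget."""
    return ace_count <= ace_budget


# Figures from the example above: about 20 servers on about 50 ports each.
aces = dacl_ace_count(server_ips=20, ports_per_server=50)
print("ACEs in the DACL:", aces)  # 1000

HYPOTHETICAL_ACE_BUDGET = 600  # placeholder; look up your platform's real limit
print("Fits budget:", fits_budget(aces, HYPOTHETICAL_ACE_BUDGET))  # False
```

A thousand entries for a single policy is the kind of result that should push the conversation toward SGTs before the deployment starts rather than halfway through it.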

#5 - Start Profiling ASAP - Before any sort of enforcement, I recommend configuring the network access devices to send all the profiling information back to ISE. Configure and turn on RADIUS, DHCP, Active Directory, DNS, etc probes and get as much rich detail about the endpoints on your network as possible so you can start to group like endpoints together to craft your policy. Half of being able to create an effective policy is knowing what's on your network first.

#6 - Know your supplicants - No, I don't mean profiling. For example, if you have Macs or Linux machines in your enterprise, you should account for them. Unlike Windows PCs which can easily get their supplicants provisioned via Group Policy, certain corporate endpoints might need to be onboarded through something like the BYOD feature or simply have their configuration pushed through something like Casper to ensure that you are providing scalable and consistent configurations across all your endpoints.

#7 - Know the limitations of your endpoints - A popular segmentation method is VLANs but one thing I seldom see taken into account is DHCP changes. With the AnyConnect posture module, you can have it trigger a DHCP release and renew on PC endpoints but for your non-PC devices that don't know that you just changed the VLAN on them, how will they know that they should issue a DHCP request? Phones shouldn't be an issue since they only get an IP on the voice VLAN and you can allow voice domain permissions but your other endpoints might have a problem. You can configure ISE to do a port bounce after successful authentication which should trigger DHCP on the endpoint but it might cause your PoE devices to have to reboot on initial connection. Food for thought and something to consider when designing your implementation.

#8 - Always Be Testing - ISE has the wonderful ability to place policies in monitor mode and the ability to migrate endpoints into enforcement mode on a switch-by-switch or even port-by-port basis. With monitor mode, you have the ability to test your policies as if they were enforcing without disrupting a single endpoint. This gives you the ability to really fine tune your policies before you move to enforcement. After testing, I would start with a pilot group of users for enforcement and based on the results, start slowly rolling it out to the rest of the environment.

#9 - Communicate with your end users - Whenever you restrict access to resources, it's bound to cause someone to notice. If ISE is deployed and you suddenly start blocking users' personal devices, they will notice rather quickly. Communication is key. Let them know you are going to be making changes and what the new policy they will be adhering to is. This helps keep expectations consistent and cuts down on calls to your helpdesk.

During and After the deployment

Now you've started to roll out ISE in enforcement mode. What do you do now? First I would recommend having a day 2 plan before you get there. Who's going to be supporting any issues if they arise? If you haven't properly planned this out, it will be you getting every call to the helpdesk for a mistyped password.

I would recommend creating a troubleshooting guide for the help desk to use to vet simple issues such as a mistyped password or misconfigured supplicant. Force them to use it before calling you so you are not stuck supporting a fat fingered password that is passed off as an ISE issue. Here is a good template I have used several times in the last few years:

Helpdesk Troubleshooting Guide

Another recommendation I would make is to thoroughly document your ISE deployment and diagram how the deployment is set up. Use Visio to create a diagram of how the individual ISE appliances are set up in the network for others to be able to troubleshoot when you are not around. Cisco is kind enough to also provide Visio stencils which can be downloaded here. I recommend doing both a physical diagram of how any equipment is stacked and a logical diagram of the design.

Another document I would recommend creating is an AS-build of your ISE deployment and policies. This document should detail the policies, the setup, architecture, DACLs, profiles, certificates, etc to let someone know how it was built and why it was built. This should be a deliverable as part of a successfully completed project and part of a hand-off to operations. Here is one template I've used in the last few years:

ISE AS-Build