SD-Access without DNAC and ISE

In 2019, we purchased a set of hardware and licenses to deploy Cisco SD‑Access on several thousand ports to replace our legacy infrastructure while delivering the many technology benefits that SD‑Access provides.

SD-Access in a nutshell

For those who have not delved deeply into how SD-Access works, here are its main components:

  • L3 underlay with dynamic IS-IS routing within the network and BGP routing outward from the L3 overlay networks (VRFs);

  • Support for L2 and L3 overlay networks with BUM multicast replication;

  • The control plane and data plane are based on LISP;

  • Support for 802.1x and identity-based micro-segmentation built on TrustSec technology;

  • WiFi network management (but we did not use this function, and I will not write about it further).

The combination of all these properties makes it possible to build reliable, secure and relatively easy-to-use networks.

So, for example, an entire office network can be built on one large IP network for several thousand hosts. During 802.1x authorization, each domain user is placed into one of the network security groups based on that user's AD group memberships. Access policies are then assigned to these groups: for traffic to the outside, within a group and/or between groups. Moreover, with the right supplicant, a specific PC can at a given moment be placed, for example, in the group “office PC to which no one is currently logged in”; such a PC will only be able to reach the update server and will not be able to access the Internet. When, for example, an accountant logs into this PC (directly or via RDP), the PC is immediately switched to another group, “accountants”, which can connect to 1C but cannot communicate with its neighbors in the group.

Devices that do not support 802.1x are placed, using MAB and profiling, into unprivileged groups such as “phones”, which can only communicate with the PBX, or “printers”, which can only be used for printing, etc. If a device has not been profiled in any way, it can simply be quarantined.

Moreover, as already mentioned, all these groups, including “quarantine”, can have addresses from our single IP network. Thanks to micro-segmentation, however, a quarantined machine cannot communicate with anyone, including other quarantined machines, but can still interact with, for example, an antivirus server.

If you use Cisco AnyConnect NAM as the supplicant, the group also changes on remote login, for example via RDP. This is very convenient for remote work.

To deploy, maintain and operate the SD-Access network, according to the manufacturer, a cluster of DNAC servers and a cluster of ISE servers are required, but more on that later.

LISP in a nutshell

LISP, as a technology for building overlay networks, deserves a special mention. Unlike its most common competitor, VXLAN with BGP EVPN, LISP was originally developed to address the limited scalability and address space of IPv4. Besides built-in host mobility, its main advantage is that the routers participating in the LISP infrastructure do not need a complete view of all end devices connected to the network. As needed, a router can request routing information about a host from the dedicated MS/MR (Map-Server/Map-Resolver) services and cache it, or register its locally connected hosts with the same services, much the way DDNS works. This makes it possible to build networks of almost unlimited scale from routers with fairly modest resources.
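The register, resolve and cache behaviour described above can be sketched in a few lines of Python. This is a toy model rather than LISP itself: the class names `MapServer` and `EdgeRouter`, the addresses, and the absence of TTLs, negative caching and mobility updates are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MapServer:
    """Toy stand-in for the MS/MR: maps a host address (EID) to a router locator (RLOC)."""
    mappings: dict = field(default_factory=dict)

    def register(self, eid: str, rloc: str) -> None:
        # An edge router registers a locally attached host.
        self.mappings[eid] = rloc

    def resolve(self, eid: str):
        # Returns None when the host is unknown.
        return self.mappings.get(eid)


@dataclass
class EdgeRouter:
    rloc: str
    map_server: MapServer
    map_cache: dict = field(default_factory=dict)
    lookups: int = 0  # number of queries sent to the map server

    def attach_host(self, eid: str) -> None:
        self.map_server.register(eid, self.rloc)

    def next_hop(self, eid: str):
        # Consult the local cache first; query the map server only on a miss.
        if eid not in self.map_cache:
            self.lookups += 1
            rloc = self.map_server.resolve(eid)
            if rloc is None:
                return None
            self.map_cache[eid] = rloc
        return self.map_cache[eid]
```

The point of the sketch is that each router only stores mappings for destinations it has actually talked to, which is why modest hardware is enough.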

TrustSec in a nutshell

The main idea of TrustSec is to mark all packets at the entrance to the network with an integer label, the SGT (Security Group Tag). In fact, the tag is assigned to the source of the packets and is therefore sometimes called the Source Group Tag. In this way, we can significantly reduce and simplify the security policies of our network: instead of a large set of all the IP addresses in the network, we build policies on a limited, reasonable set of groups with dynamic membership.

The label can be assigned to the source based on the port or VLAN through which the packet entered our network, based on the IP address of its source, or, most interestingly, based on the result of user authorization via 802.1x.

The SGT, as already mentioned, is applied at the entrance to the network and travels along with the packet inside the Ethernet or LISP header. For cases where the encapsulation does not allow the label to be carried inside the packet, the dedicated SXP protocol, which runs on top of TCP, is used. SXP essentially transmits and maintains IP-SGT mapping tables on intermediate devices that need this data, for example, a firewall.
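As a rough illustration of what a device does with an IP-SGT table received over SXP, here is a minimal Python sketch of a longest-prefix lookup. The function name `lookup_sgt`, the sample bindings and the use of 0 as the "unknown" tag are assumptions made for the example, not something taken from a real device.

```python
import ipaddress


def lookup_sgt(bindings, ip, default=0):
    """Return the SGT of the most specific binding that covers `ip`.

    `bindings` maps a prefix such as "10.1.0.0/16" or "10.1.5.7/32" to an SGT.
    `default` is returned when no binding matches (assumed "unknown" tag).
    """
    addr = ipaddress.ip_address(ip)
    best = None  # (prefix length, sgt) of the best match so far
    for prefix, sgt in bindings.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, sgt)
    return default if best is None else best[1]
```

With such a table in hand, a firewall can write its rules against a handful of tags instead of individual addresses.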

Result of SD-Access implementation

After a long and painstaking implementation of and migration to SD-Access, we actually got our “dream network” :) with an L3 underlay! The time and cost of connecting new users dropped significantly. Connections became more “transparent”: we now know with much greater confidence who is connected, where and when. The security of the office network improved: malware has less chance of spreading horizontally, and it is much harder to gain another user's privileges by simply copying his MAC address and plugging into his port in his absence. And there are many other advantages…

But happiness, as usual, does not last long :)

Support and licenses have expired…

What to do without support and licenses?

Let's first list what components the SD-Access network is initially built from and what licenses are needed for this.

  • Cisco Catalyst switches that support SD-Access. The required LISP and TrustSec functionality is licensed on them on an RTU (right-to-use) basis and is available in the CLI by default. A DNA license is installed on the switches but does not actually affect their functionality. The DNAC server checks for this license and, if it is present, takes control of the device and lets you manage the full range of functions through DNAC itself. If the license is absent, nothing prevents us from configuring all the same functions manually via the CLI. DNA licenses are issued for a fixed term, or a permanent PLR license can be obtained through Smart Account reservation.

  • A cluster of three DNAC servers. The main task of these servers is to configure the switches (including ZTP), collect statistics and monitor the state of the network. In fact, they don’t do anything fancy, but without a cluster of these servers the manufacturer does not support SD-Access functions on the switches. (“Hmm, but now we don’t have this support anyway…”) To tell the truth, even when we had support for these servers we didn’t really like them. A couple of times they broke down, and Cisco engineers literally spent weeks restoring them; all that time we could not manage the network. In practice, this meant we could not add new overlay networks.

  • A deployment (cluster) of ISE servers. ISE performs several tasks at once within the SD-Access infrastructure:

    1) AAA server as part of the 802.1x implementation;

    2) profiling of devices connecting to the network;

    3) Posture – checking hosts for compliance with policies, for example, for the presence of an antivirus and its updates;

    4) SXP hub, distributing IP-SGT mappings to all interested devices;

    5) Management of TrustSec policies and their distribution.

    Just like with the DNAC server, we were not delighted with the reliability of the ISE servers; they periodically broke down, causing us a lot of problems.

    ISE servers require licenses for the number of connections they support, separately for the number of devices they profile, and separately for Posture functions.

As a result, without support and licenses we would effectively lose our SD-Access network :( But let's see what options we have to preserve all the functions we need.

Of course, many people know that all the necessary licenses were cracked long ago, both the PAK-based FlexLM ones and PLR versions one and two. (Probably only PLRv3 has not been cracked yet, but it is not used anywhere yet.) In any case, cracked licenses are not our way. These licensed products are so unreliable that you don’t want to deal with them even with support, let alone without it.

So, what do we need in order to keep our SD-Access network while abandoning the complex, licensed and unreliable DNAC and ISE products?

  1. The first and most important thing we can’t do without is the Cisco Catalyst switches themselves. They are excellent in their own right, and we will leave them as they are. All the necessary functions are present on them, and, if desired, you can find all the necessary documentation, bug information, updates and spare parts.

  2. A function for quite complex configuration of LISP on devices (performed by DNAC).

  3. The authentication, authorization and accounting (AAA) function for 802.1x and MAB, plus profiling (performed by ISE).

  4. The function of distributing the current IP-to-SGT mappings via the SXP protocol (performed by ISE).

  5. Function of centralized management and distribution of TrustSec policies (performed by ISE and/or DNAC).

Replacing the configuration function

Everything here is quite simple and prosaic. We implemented centralized generation of reference configurations for the switches using Jinja2 templates and Ansible, plus our own tool that pre-calculates incremental updates from the generated reference configurations and rolls them out on a schedule. This lets us review the increments before rollout and apply updates at a time convenient for us. However, since rolling a configuration out to the switches is an idempotent operation, the calculation of increments can be skipped.
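To give an idea of what "pre-calculating an increment" can mean, here is a deliberately naive Python sketch. It is not our tool: it assumes a flat configuration with one global command per line (real IOS XE configurations are hierarchical), and the function name `config_increment` and the sample commands are made up for the example.

```python
def config_increment(running: str, reference: str):
    """Compare two flat configs and return (to_add, to_remove).

    to_add    - reference lines missing from the running config, in reference order
    to_remove - running lines absent from the reference, prefixed with "no "
    """
    def lines(text):
        # Drop blank lines and trailing whitespace.
        return [ln.rstrip() for ln in text.splitlines() if ln.strip()]

    run, ref = lines(running), lines(reference)
    run_set, ref_set = set(run), set(ref)
    to_add = [ln for ln in ref if ln not in run_set]
    to_remove = ["no " + ln for ln in run if ln not in ref_set]
    return to_add, to_remove
```

For example, comparing a running config that contains `vlan 1030` with a reference that contains `vlan 1040` yields `vlan 1040` to add and `no vlan 1030` to remove, which is exactly the kind of diff one would want to review before a scheduled rollout.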

Replacing the AAA function for 802.1x

This task was successfully handed over to FreeRADIUS. All that is required of it is to authenticate the user in the domain using MS-CHAPv2. If authentication succeeds, it retrieves the user's AD group membership from the LDAP server and, based on this data, prepares the correct set of RADIUS attributes for the new connection.

FreeRADIUS has a fairly convenient and flexible policy language, unlang, which lets you implement attribute-assignment logic similar to that described by a policy set in ISE.

Here is an example of a piece of unlang configuration that assigns the VLAN, VRF (VN) and SGT for 802.1x connections:

# Defaults for every 802.1x session: VLAN 1040 and virtual network MAIN_VRF
update reply {
	Tunnel-Medium-Type := IEEE-802
	Tunnel-Type := VLAN
	Tunnel-Private-Group-ID := "1040"
	cisco-avpair += "cts:vn=MAIN_VRF"
}
# Choose the SGT from AD group membership; the first matching group wins
if ( LDAP-Group == "grp_developers" ) {
	update reply {
		cisco-avpair += "cts:security-group-tag=0101-00"
	}
}
elsif ( LDAP-Group == "grp_hr" ) {
	update reply {
		cisco-avpair += "cts:security-group-tag=0102-00"
	}
}

Here, everyone is first assigned VLAN 1040 in the standard Tunnel-Private-Group-ID attribute, and the VRF name MAIN_VRF is specified in the vendor attribute cisco-avpair/cts:vn.

Next, based on the user’s membership in a particular AD group, an SGT is assigned in the cisco-avpair/cts:security-group-tag attribute.

Accordingly, everyone who is a member of the developers group receives the label 0101 and all the network access rights of that group, while everyone who is not a developer but is in the HR group receives the label 0102.

The “-00” suffix after the label is the serial number of this label's configuration; it can be specified as “-00” everywhere.

As you can see, the policies are quite simple and clear.
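If you want to check such a policy offline before putting it on the RADIUS server, the same "first matching group wins" logic is easy to model. The Python sketch below is only an illustration: the names `GROUP_POLICY`, `sgt_avpair` and `reply_attributes` are hypothetical, and it assumes the four characters of the tag are hexadecimal digits, as the regular expression later in this article suggests.

```python
# Ordered policy: the first AD group found in the user's memberships wins,
# mirroring the if / elsif chain in the unlang example.
GROUP_POLICY = [
    ("grp_developers", 0x0101),
    ("grp_hr", 0x0102),
]


def sgt_avpair(sgt: int, generation: int = 0) -> str:
    # Assumption: four hex digits for the tag, two for the "-00" suffix.
    return f"cts:security-group-tag={sgt:04x}-{generation:02x}"


def reply_attributes(ad_groups):
    """Return (standard attributes, list of cisco-avpair values) for a user."""
    attrs = {
        "Tunnel-Medium-Type": "IEEE-802",
        "Tunnel-Type": "VLAN",
        "Tunnel-Private-Group-ID": "1040",
    }
    avpairs = ["cts:vn=MAIN_VRF"]
    for group, sgt in GROUP_POLICY:
        if group in ad_groups:
            avpairs.append(sgt_avpair(sgt))
            break
    return attrs, avpairs
```

A user who is in both groups gets the developers tag, and a user in neither gets the VLAN and VRF but no tag, just as in the unlang snippet.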

Replacing the profiling function

Profiling can also be entrusted to FreeRADIUS, but it will need the switch as an assistant.

The “Device Sensor” functionality built into IOS XE profiles connected devices quite well and sends the recognized device type to the FreeRADIUS server in RADIUS accounting packets.

The algorithm for connecting devices that do not support 802.1x can be as follows:

  1. When a new device that does not support 802.1x connects, the switch creates an authentication session, assigns it a unique ID and sends the first RADIUS request, an Access-Request. The request contains everything the switch knows at that moment: the unique session ID, the switch name and port number, and the device's MAC address in the User-Name field. If the IP address is already known by then, it is included as well;

  2. Having received such a request, FreeRADIUS can return a positive Access-Accept response but place a quarantine label in the attributes. If the MAC address or switch port number is enough to issue the target label right away, for example a “printer” label, this can be done at this step; in most cases, though, we do not yet have enough data, and the “quarantine” label is a good choice;

  3. The switch, having received the response from the server, quarantines the new device and allows it to begin “communicating” with the network in a controlled manner. At this stage Device Sensor starts working and recognizes the device type from its OUI and from DHCP, LLDP and CDP data;

  4. When Device Sensor determines the type of the connected device, the switch sends the new data to the FreeRADIUS server in an Accounting-Request packet;

  5. Based on the new device data, FreeRADIUS can assign the target label and send the switch a CoA (Change of Authorization) request to change the attributes of the existing session;

  6. On receiving the CoA, the switch moves the device from quarantine to the target group.
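The six steps above can be modelled as a tiny state machine, which is handy for reasoning about the flow before writing any RADIUS policy. The Python sketch below is an assumption-laden illustration: the quarantine tag `00ff`, the printer tag `0050`, the sample MAC address and all function names are hypothetical; only the phone tag `0076` appears in the real configuration shown next.

```python
from dataclasses import dataclass

QUARANTINE_SGT = "00ff"                      # hypothetical quarantine tag
KNOWN_MACS = {"00:11:22:33:44:55": "0050"}   # hypothetical static "printer" assignment
PROFILE_SGT = {"ip-phone": "0076"}           # profile keyword -> target tag


@dataclass
class Session:
    session_id: str
    mac: str
    sgt: str


def on_access_request(session_id: str, mac: str) -> Session:
    """Steps 1-3: accept the device, tagged by MAC if known, else quarantined."""
    return Session(session_id, mac, KNOWN_MACS.get(mac.lower(), QUARANTINE_SGT))


def on_accounting(session: Session, status: str, profile_name: str) -> bool:
    """Steps 4-6: return True when a CoA carrying a new tag should be sent."""
    if status == "Stop" or not profile_name:
        return False
    for keyword, sgt in PROFILE_SGT.items():
        if keyword in profile_name.lower() and session.sgt != sgt:
            # In reality the switch applies the new tag when it receives the CoA.
            session.sgt = sgt
            return True
    return False
```

Note that a second accounting packet with the same profile does not trigger another CoA, because the session already carries the target tag.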

Here is an example of processing an Accounting-Request and sending a CoA:

if ( &Acct-Status-Type == "Stop" ) {
    return
}

update request {
     &Tmp-String-0 := "NONE"
     &Tmp-String-1 := "NONE"
}

foreach Cisco-AVPair {
    if ( "%{Foreach-Variable-0}" =~ /^cts:security-group-tag=([0-9a-f]{4})/i ) {
        update request {
             &Tmp-String-0 := "%{1}"
        }
    }
    if ( "%{Foreach-Variable-0}" =~ /^dc-profile-name=(.+)$/ ) {
        update request {
            &Tmp-String-1 := "%{1}"
        }
    }
}

if ( "%{Tmp-String-1}" =~ /ip-phone/i && &Tmp-String-0 != "0076" ) {
    #
    #	IP Phone
    #
    update coa {
        Calling-Station-ID = "%{Calling-Station-ID}"
        NAS-IP-Address = "%{NAS-IP-Address}"
        Packet-Dst-IP-Address = "%{NAS-IP-Address}"
        Packet-Dst-Port = 1300
        cisco-avpair += "cts:vn=VOICE_VRF"
        cisco-avpair += "cts:security-group-tag=0076-00"
    }
}

First we check the status of the received accounting-request. If it is equal to “Stop”, then this means that the session is completed, the device has disconnected from the network, and we simply stop further analysis of this request.

Next, we initialize two temporary variables, Tmp-String-0 and Tmp-String-1. These are predefined names already declared in FreeRADIUS; if you want variables with other names, first look at how these are declared.

Into these variables we put the current tag of the session from the cts:security-group-tag attribute (Tmp-String-0) and the profile name from the dc-profile-name attribute, which should contain the Device Sensor result (Tmp-String-1).

Next, we check whether the profile recognized by the switch contains “ip-phone” while the current label does not correspond to this profile (for example, it is still the quarantine label); if so, we send a CoA with new attributes for this session.

Replacing the SXP hub function

In the vendor version, this function is performed by ISE. It “knows” which label it gave to whom. All static mappings are also defined on it. Accordingly, it acts as the golden source of this information: all firewalls can connect to it and retrieve what they need. FreeRADIUS, unfortunately, cannot do this, so here we have to resort to an undocumented trick.

In the SXP protocol a device can have two roles, listener and speaker, which can either coexist on one device or live separately. In the vendor version, access switches do not take part in the SXP exchange at all, but if you configure an SXP speaker on an access switch, it will advertise the IP-SGT mappings for the devices directly connected to it.

The problem is that the SXP table and sessions are tied to the VRF in which the corresponding devices live. That is, if all our end-device connections live in MAIN_VRF, the SXP sessions must be built within this VRF; but SD-Access is designed so that only the default VRF lives on the uplinks, so such a session cannot be established.

NHRP comes to our aid: it lets us build tunnels for the VRFs we need through the default VRF and, inside these tunnels, bring up SXP sessions from all access switches to, for example, two aggregation switches. These now also take on the SXP listener and speaker roles, i.e. they essentially become SXP hubs.

The only inconvenience is that if we have several VRFs, then NHRP tunnels and SXP sessions will have to be configured for each VRF separately.
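Conceptually, what each aggregation switch now does is merge the per-VRF tables it hears from the access switches and offer the result to whoever asks. A minimal Python model of that bookkeeping is shown below; the class `SxpHub`, its method names and the sample switch names and addresses are all invented for illustration and say nothing about how IOS XE implements it internally.

```python
from collections import defaultdict


class SxpHub:
    """Toy model of an aggregation switch acting as SXP listener and speaker."""

    def __init__(self):
        # vrf -> ip -> (sgt, name of the access switch that reported it)
        self.tables = defaultdict(dict)

    def learn(self, vrf, speaker, ip, sgt):
        # Listener role: store a binding received from an access switch.
        self.tables[vrf][ip] = (sgt, speaker)

    def withdraw_speaker(self, vrf, speaker):
        # When the session to an access switch drops, forget what it told us.
        table = self.tables[vrf]
        for ip in [ip for ip, (_, src) in table.items() if src == speaker]:
            del table[ip]

    def export(self, vrf):
        # Speaker role: the merged IP-SGT table offered to, e.g., a firewall.
        return {ip: sgt for ip, (sgt, _) in self.tables[vrf].items()}
```

Keeping one table per VRF in the model reflects the inconvenience mentioned above: every additional VRF needs its own sessions and its own table.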

Replacing the TrustSec centralized policy management function

In the vendor version, this function is also performed by ISE. Policies can be changed through the DNAC GUI, but they are still stored on ISE and distributed from it, usually in RADIUS attributes, though they can also be retrieved through REST requests.

TrustSec policies consist of three interrelated parts.

  1. Environment data. Here we store a list of SGT tags used in our network with their mnemonic names and parameters of all policy servers that can distribute policies in our network.

  2. A list of access lists, which in this case are called SGACLs.

  3. Access Matrix. A square table, in the rows and columns of which the Source SGT and Destination SGT are located, and in the cells there is a list of the corresponding SGACLs.
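The three parts listed above fit naturally into three small data structures. The Python sketch below shows one possible in-memory representation and the lookup a policy server would perform; the group names, SGACL names, the sample rules and the deny-by-default fallback are illustrative assumptions, not our production policy.

```python
# 1. Environment data: the tags in use and their mnemonic names.
SGT_NAMES = {0x0101: "developers", 0x0102: "hr", 0x0076: "phones"}

# 2. Named SGACLs, each a list of access-control entries.
SGACLS = {
    "permit_all": ["permit ip"],
    "deny_all": ["deny ip"],
    "web_only": ["permit tcp dst eq 443", "deny ip"],
}

# 3. Access matrix: (source SGT, destination SGT) -> list of SGACL names.
MATRIX = {
    (0x0101, 0x0101): ["permit_all"],
    (0x0102, 0x0101): ["web_only"],
}

DEFAULT_SGACLS = ["deny_all"]  # assumed policy for cells that are not filled in


def sgacl_entries(src_sgt: int, dst_sgt: int):
    """Return the flattened list of entries that apply from src_sgt to dst_sgt."""
    names = MATRIX.get((src_sgt, dst_sgt), DEFAULT_SGACLS)
    return [entry for name in names for entry in SGACLS[name]]
```

Storing only the filled-in cells keeps the matrix sparse, which matters because most pairs of groups usually share the default policy.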

The REST requests are quite simple:

curl --user-agent "cisco-IOS" -u "User:Password" -k \
     -H "Content-Type: application/json" \
     -H "Connection: Keep-Alive" \
     -H 'Accept:' \
     -d '{"CTSEnvData":{"deviceName":"Switch","sgTablesFilter":[],"capability":["ENV_DATA_BASIC","SG_TABLES","SGTAGS","SERVERS"]}}' \
     -X POST https://1.1.1.1:9063/ers/config/ctsapi/ctsenvdata | json_xs

curl --user-agent "cisco-IOS" -u "User:Password" -k \
     -H "Content-Type: application/json" \
     -H "Connection: Keep-Alive" \
     -H 'Accept:' \
     -d '{"CtsMatrix":{"dstSgtArr":["ffff"],"srcSgtArr":[],"offset":"0","limit":"500"}}' \
     -X POST https://1.1.1.1:9063/ers/config/ctsapi/ctsmatrix | json_xs

curl --user-agent "cisco-IOS" -u "User:Password" -k \
     -H "Content-Type: application/json" \
     -H "Connection: Keep-Alive" \
     -H 'Accept:' \
     -d '{"SGACLs":{"names":[]}}' \
     -X POST https://1.1.1.1:9063/ers/config/ctsapi/sgacls | json_xs

We can use the existing ISE server to study the request format, and then repeat this functionality on our own simple server.

Thus, we reproduced all the functionality we needed and completely got rid of the DNAC and ISE servers in our network. At the same time, we now have key network management functions in our hands, and are no longer dependent on vendor support engineers and secret knowledge of the internal structure of ISE and DNAC.
