Wednesday 17 January 2018

Spanning Tree interoperability: PVST+ and IEEE RSTP/MSTP

First post for a while — Happy 2018!

We're planning on reconfiguring the way we do Spanning Tree on our network at the moment: we don't use it on a large scale, having banished VLANs from our core, in favour of a fully-routed network with VLANs constrained to a single building/department (although there's a smidgeon of EoMPLS for the odd layer 2 point-to-point virtual circuit).

We do, however, want to use Spanning Tree on our PoP (Point-of-Presence) switches: these are our CPE (Customer Premises Equipment) layer 2 switches that we put on the border of institutional (department, college, etc.) networks.  They act as the demarcation point between our backbone and the institutional network and try to protect the next layer of the network (the distribution routers) from layer 2 issues within institutions (such as loops creating ARP or DHCP storms).

When we replaced the PoP switches back in 2008, I did some interoperability tests between the Cisco PoP equipment we were installing (Catalyst 3560G-24PS) and various other vendors that institutions may use (since they choose their own kit and pick a variety of different vendors and models).  Aside from Spanning Tree not really being geared up to work across administrative boundaries, there were questions over how (say) HP's implementation of IEEE-standard RSTP (Rapid Spanning Tree Protocol) would interoperate with Cisco's Rapid-PVST+ (that we use, and used to rely on, since we still had cross-site VLANs back then).

I did various tests and found that the different implementations didn't play well together and misbehaved in difficult/unpredictable ways, so I ended up blocking Spanning Tree (via "spanning-tree bpdufilter", with "spanning-tree portfast trunk" to avoid long delays on ports going into the forwarding state) on the ports feeding institutions and having to say it was unsupported, leaving LACP or FlexLink ("switchport backup ...") as the only mechanisms for redundant links.  However, we do get the occasional loop in institutional networks (sometimes between two ports on the PoP) and I want to try and do a better job of detecting these and stopping them impacting the network further up.

Given all that, I got a new Cisco PoP switch (we selected various models from the Catalyst 3850 range) and tried hooking it up to various other bits of kit (Cisco and non-Cisco) in odd ways to see how it behaves together.

Cisco Rapid PVST+ to Rapid PVST+


OK - obvious one first: Cisco Rapid PVST+ to Cisco Rapid PVST+.  This works fine, as you'd expect: the spanning trees on each of the VLANs find each other and interconnect, ports come up quickly when bridges are linked together, redundancy/loops are detected and it all works well.

[Also, note here, when I say "Rapid PVST+", this probably all applies to (non-Rapid) PVST+, but I haven't done any testing with that and I don't think it's worth spending time on nowadays.  That said, although no-one should be using it, it is still the default in Cisco IOS!]

The PVST+ BPDUs differ from IEEE (STP/RSTP/MSTP) BPDUs in that they are of frame type SNAP (Subnetwork Access Protocol) rather than a dedicated STP BPDU type (as well as using a different destination multicast MAC address).  The SNAP header then has an organisation of "Cisco" and a vendor private protocol ID of "PVST+".  Once you get into the actual frame, it looks much the same, except that it has an "Originating VLAN (PVID)" field, identifying the VLAN ID on which it's operating:

[Figure: Wireshark packet capture of a Cisco Rapid PVST+ BPDU - VLAN 1 trunk native]
[Figure: Wireshark packet capture of an IEEE RSTP BPDU]

If the VLAN is tagged on the port (i.e. trunk, non-native) then the PVST+ BPDU frame will also have an 802.1Q VLAN header for the appropriate VLAN:

Cisco Rapid PVST+ BPDU - VLAN 100 tagged
(There is nothing special about this being VLAN 100 here: if the VLAN is not the native/untagged VLAN, the 802.1Q header will still be included, even if VLAN 1.)
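
To make the encapsulation difference concrete, here is a minimal Python sketch that tells the two BPDU flavours apart from the raw frame bytes, including the optional 802.1Q header.  The function name classify_bpdu is my own invention for illustration; the constants are the well-known values (IEEE bridge group address 01:80:C2:00:00:00 with LLC 0x42/0x42, Cisco's PVST+ address 01:00:0C:CC:CC:CD with SNAP OUI 00:00:0C and protocol ID 0x010B) rather than anything taken from the captures above.  It doesn't decode the BPDU body or the PVID field.

```python
import struct

IEEE_STP_MAC = bytes.fromhex("0180c2000000")  # IEEE bridge group address
PVST_MAC = bytes.fromhex("01000ccccccd")      # Cisco PVST+ multicast address


def classify_bpdu(frame: bytes):
    """Return (kind, vlan_tag): kind is 'ieee', 'pvst+' or 'other';
    vlan_tag is the 802.1Q VLAN ID, or None if the frame is untagged."""
    dst = frame[0:6]
    offset = 12                      # skip destination and source MACs
    vlan_tag = None
    (type_or_len,) = struct.unpack_from("!H", frame, offset)
    if type_or_len == 0x8100:        # 802.1Q header present
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        vlan_tag = tci & 0x0FFF      # low 12 bits of the TCI are the VLAN ID
        offset += 4
    offset += 2                      # skip the 802.3 length field
    llc = frame[offset:offset + 3]   # DSAP, SSAP, control
    if dst == IEEE_STP_MAC and llc == b"\x42\x42\x03":
        return ("ieee", vlan_tag)
    if dst == PVST_MAC and llc == b"\xaa\xaa\x03":
        oui = frame[offset + 3:offset + 6]
        (pid,) = struct.unpack_from("!H", frame, offset + 6)
        if oui == b"\x00\x00\x0c" and pid == 0x010B:
            return ("pvst+", vlan_tag)
    return ("other", vlan_tag)
```

Fed an untagged frame to the IEEE address it reports ("ieee", None); fed a PVST+ frame carrying an 802.1Q header for VLAN 100 it reports ("pvst+", 100).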

I suppose, if you do weird things like transport one VLAN over another, you might get the PVST+ PVID differing from the one in the 802.1Q header, but I don't have that situation and I can't be bothered to try and force it.

There are a couple of gotchas, though...

PVID Inconsistent with Rapid PVST+


The PVID field in the PVST+ BPDU is what allows two switches with mismatched native VLANs configured on a port to detect that they've been incorrectly linked together: the VLAN is flagged on that port as "PVID_Inc"[onsistent] and blocked in the "BKN" (broken) state:

cat3850#show spanning-tree vlan 100

VLAN0100
  Spanning tree enabled protocol rstp
  Root ID    Priority    16484
             Address     c4b9.cd48.1980
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    16484  (priority 16384 sys-id-ext 100)
             Address     c4b9.cd48.1980
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1             Desg BKN*4         128.1    P2p *PVID_Inc
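
Since part of the aim is to spot this sort of thing automatically, here is a rough Python sketch of picking broken ports out of captured "show spanning-tree" output.  It assumes the column layout shown above (including the "*" that runs the status into the cost column); the names ROW, parse_ports and broken_ports are hypothetical, and other IOS versions may lay the table out differently.

```python
import re

# One interface row, e.g. "Gi1/0/1   Desg BKN*4   128.1   P2p *PVID_Inc".
# Header and separator lines do not match, so they are skipped.
ROW = re.compile(
    r"^(?P<port>\S+)\s+(?P<role>[A-Za-z]{4})\s+(?P<status>[A-Z]{3})(?P<flag>\*?)\s*"
    r"(?P<cost>\d+)\s+(?P<prio>\d+\.\d+)\s+(?P<type>.*)$"
)


def parse_ports(output: str) -> list:
    """Return one dict per interface row found in the output."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def broken_ports(output: str) -> list:
    """Names of ports whose status column reads BKN (broken)."""
    return [row["port"] for row in parse_ports(output) if row["status"] == "BKN"]
```

The "type" field of each row keeps the trailing reason (such as "*PVID_Inc"), so a caller can tell a PVID inconsistency apart from any other reason for a BKN state.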

This presents us with a problem, since some institutions have never configured their main data VLAN with the "correct" 802.1Q tag (i.e. the one we use on the backbone) and just use VLAN 1 (this goes back to when only we, on the backbone, needed VLANs — institutions didn't typically have more than one, so just used the default VLAN; now they have voice, wireless, Building Management, etc.).

I don't think there's a way to override this checking, on Cisco switches.  HP ProCurve switches running Rapid PVST+ have a "spanning-tree ignore-pvid-inconsistency" command but that only fixes the HP end.

If we want to interconnect these two VLANs, I think the only solution is to filter out the BPDUs on the link between the switches with bpdufilter:

interface GigabitEthernet1/0/1
 spanning-tree bpdufilter enable

... this only needs doing at one end (since it stops BPDUs being both sent and received), although it also prevents Spanning Tree from doing anything useful on the port, so we can't detect loops, etc.: the port will always forward (but will still go through a listening/learning phase, taking 30s to bring up the VLAN on the port, even though it's effectively deaf to other switches).

Other vendors' Rapid PVST+


Some switch vendors support Cisco Rapid PVST+.  I've done some brief testing with HP ProCurve switches in Rapid PVST+ mode ("spanning-tree mode rapid-pvst") and they've worked fine, negotiating separate spanning trees for each of the VLANs and working exactly as you'd expect a Cisco to.  The HP ProCurves also handle VLAN 1 in the same way as a Cisco, including sending the additional IEEE RSTP BPDUs, as well as the Rapid PVST+ BPDUs (see below).

When I last looked at Extreme XOS-based switches (e.g. Summit X450/X460), a few years ago, these also worked perfectly fine.  However, there was a note in the documentation about VLAN 1 not being handled correctly — I'm unsure exactly what that was, but it's probably related to the VLAN 1 dual-BPDU situation that Cisco support (see below).  We don't use them enough now (and don't use VLAN 1, either), so I haven't researched this further.

Cisco Rapid PVST+ to RSTP/MSTP


Ignoring VLAN 1 (see below!), mixing Cisco Rapid PVST+ and IEEE standard RSTP/MSTP results in each switch sending BPDUs that the other ignores: there are essentially two completely separate protocols operating independently.  However, the Rapid PVST+ BPDUs appear to just flow through the non-Rapid PVST+-aware bridge like any normal frame (since they're just sent as tagged frames on their VLANs) and will end up returning to the originating switch (or another Rapid PVST+-aware bridge) for processing (a bit like having an old-fashioned repeater!).

Obviously, the BPDUs from the IEEE bridge won't get through the Rapid PVST+ bridge network, unless the ports interconnecting them back to the IEEE bridge(s) have the VLAN presented untagged/native.

If there are multiple connections from the same Rapid PVST+ switch into a segment without a Rapid PVST+-aware bridge, the redundant connections will be treated as "Backup" ports in the Spanning Tree protocol, and blocked (similar to Alternate ports).
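
The Backup/Alternate distinction comes from RSTP itself: a blocked port is a Backup if the better BPDU it hears was sent by this same bridge, and an Alternate if it came from a different one.  A minimal sketch of that rule follows; the BridgeId and Bpdu types and the blocked_role function are hypothetical names for illustration, and real implementations compare the full priority vector first.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    priority: int
    mac: str


class Bpdu(NamedTuple):
    root: BridgeId
    designated_bridge: BridgeId  # the bridge that transmitted this BPDU


def blocked_role(own: BridgeId, superior: Bpdu) -> str:
    """Role of a non-root port that has received a superior BPDU.

    Hearing our own BPDU means another of our ports reaches the same
    segment, so this port is a Backup; otherwise it is an Alternate.
    """
    return "Backup" if superior.designated_bridge == own else "Alternate"
```

When the partner bridge just floods the PVST+ BPDUs, the designated bridge seen on the second port is the Cisco itself, which is why the role comes out as Backup rather than Alternate.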

For example, I took a Cisco Catalyst 3850 running Rapid PVST+ and an HP ProCurve 2920 running IEEE MSTP, each with three ports configured: port 1 has VLANs 100 and 200 tagged/trunk, port 2 has 100 tagged/trunk and port 3 has 200 tagged/trunk.  I then linked the ports to their equivalent number: 1 on the Cisco to 1 on the HP, 2-2 and 3-3.  I also set the spanning tree priority on the Cisco to 16384 for both VLANs 100 and 200.  I now get this on the Cisco:

cat3850#show spanning-tree vlan 100

VLAN0100
  Spanning tree enabled protocol rstp
  Root ID    Priority    16484
             Address     c4b9.cd48.1980
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    16484  (priority 16384 sys-id-ext 100)
             Address     c4b9.cd48.1980
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1             Desg FWD 4         128.1    P2p 
Gi1/0/2             Back BLK 4         128.2    P2p 

cat3850#show spanning-tree vlan 200

VLAN0200
  Spanning tree enabled protocol rstp
  Root ID    Priority    16584
             Address     c4b9.cd48.1980
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    16584  (priority 16384 sys-id-ext 200)
             Address     c4b9.cd48.1980
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1             Desg FWD 4         128.1    P2p 
Gi1/0/3             Back BLK 4         128.3    P2p 
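
The priorities in this output (16484 and 16584) aren't what I configured (16384): the switch adds the VLAN ID as the "sys-id-ext" (extended system ID), which occupies the low 12 bits of the 16-bit priority field.  That is also why the configurable part has to be a multiple of 4096.  A small sketch of the arithmetic, with function names of my own choosing:

```python
def bridge_priority(configured: int, vlan_id: int) -> int:
    """Priority as advertised: configured priority plus the VLAN ID."""
    if configured % 4096 != 0 or not 0 <= configured <= 61440:
        raise ValueError("configured priority must be a multiple of 4096 up to 61440")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    return configured + vlan_id


def split_bridge_priority(advertised: int) -> tuple:
    """Split an advertised priority into (configured priority, sys-id-ext)."""
    return advertised & 0xF000, advertised & 0x0FFF
```

For example, bridge_priority(16384, 100) gives 16484 and split_bridge_priority(16584) gives (16384, 200), matching the two VLANs above.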

This situation should work fine.  However, if the VLAN doesn't go through a single switch on the IEEE side but, instead, through a number of other switches, and you then get a failure between the two halves connecting back to the Rapid PVST+ bridge, it will likely take several seconds to reconverge, since the failure will only be detected when the BPDUs fail to arrive on the backup port.

I did see some problems with this, however, with HP Comware switches (5500-EI, running version 3 of Comware in MSTP mode): sometimes the Rapid PVST+ BPDUs wouldn't get through and both ports would stay forwarding.  This makes me think that we shouldn't rely on this as a way to build redundant topologies but more as a way to try and detect loops.

Rapid PVST+ VLAN 1 special handling


VLAN 1 gets some special treatment with Rapid PVST+: if it is present on a port, not only do you get PVST+ BPDUs, with the PVID field and sent with an 802.1Q header, if presented tagged/non-native, you also get a regular IEEE RSTP BPDU.  The two packet captures shown first, above, were actually sent consecutively by the same switch, on the same port:

IEEE RSTP BPDU and Cisco Rapid PVST+ BPDU - VLAN 1, sent consecutively

The IEEE RSTP BPDU is always sent untagged/trunk-native (without an 802.1Q header), regardless of whether the VLAN itself is untagged/trunk-native or tagged/trunk on that port.

This BPDU allows interoperation with IEEE-standard, non-PVST+ switches, such as an HP ProCurve running MSTP (Multiple Spanning Tree) or regular RSTP (non-PVST+ IEEE Rapid Spanning Tree) and for a redundant topology to be built.

However, this only works for VLAN 1 — the IEEE BPDUs will never be sent if VLAN 1 is not present, regardless of whether one is untagged/native.  In addition, if other VLANs are also present, the outcome of negotiation with a partner IEEE-standard bridge will only affect the forwarding status and spanning tree of VLAN 1.  For example, the following port on a Cisco, connected to a similarly configured HP ProCurve (in terms of VLANs), with the HP running MSTP and a priority of 8192, only reports a root bridge for VLAN 1:

cat3850#show running-config interface g1/0/1
interface GigabitEthernet1/0/1
 ! (default: switchport trunk native vlan 1)
 switchport trunk allowed vlan 1,100
 switchport mode trunk

cat3850#show spanning-tree interface g1/0/1

Vlan                Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
VLAN0001            Root FWD 4         128.1    P2p 
VLAN0100            Desg FWD 4         128.1    P2p 

... VLAN 100 will also take 30s to go through listening/learning and drop into forwarding, as if there is no partner bridge.
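
To summarise which BPDUs end up on the wire, here is a small Python model of the behaviour described in this section.  It is only a sketch of what I observed, not Cisco's documented algorithm, and the function name bpdus_sent and the tuple layout are made up for illustration.

```python
def bpdus_sent(allowed_vlans: set, native_vlan: int) -> list:
    """Model the BPDUs a Rapid PVST+ trunk port sends each hello interval.

    Returns a list of (kind, pvid, tagged) tuples:
      - one PVST+ BPDU per VLAN, tagged unless that VLAN is the native one;
      - one extra IEEE BPDU, always untagged, but only if VLAN 1 is present.
    """
    frames = []
    for vlan in sorted(allowed_vlans):
        frames.append(("pvst+", vlan, vlan != native_vlan))
    if 1 in allowed_vlans:
        frames.append(("ieee", None, False))  # IEEE BPDUs carry no PVID
    return frames
```

With the port configuration above (VLANs 1 and 100, native VLAN 1) this gives an untagged PVST+ BPDU for VLAN 1, a tagged one for VLAN 100 and a single untagged IEEE BPDU.  A port carrying only VLAN 100 gets no IEEE BPDU at all, which is why an IEEE-only partner never sees it.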

In our case, we don't actually use VLAN 1 anywhere on the network: we treat it as a kind of dumping ground where things end up if we haven't configured them (such as an access port with a voice VLAN but no data VLAN — there's no useful service on it, it's just used because often there has to be a VLAN of some sort specified).

[Extreme switches are nice in this area as they're able to have ports which have no VLANs present on them, and not have any untagged/trunk-native VLAN.  We had a case recently where we wanted a VLAN to be tagged/non-native on an HPE Aruba wireless access point, but could only do that by making an arbitrary other, unused VLAN the untagged/native one.]

Second connection between Rapid PVST+ and IEEE RSTP/MSTP bridges with VLAN 1


If we plug in a second connection between the same two switches, using two ports configured in the same way as described above (VLAN 1 untagged/native, VLAN 100 tagged/trunk, Spanning Tree priority on IEEE bridge of 8192), VLAN 1 forms a spanning tree, but VLAN 100 just flows the BPDUs through the HP ProCurve and the Cisco sees it as a link into the same multiaccess segment and blocks it as a Backup role port:

cat3850#show spanning-tree vlan 1  

VLAN0001
  Spanning tree enabled protocol rstp
  Root ID    Priority    8192
             Address     d4c9.efb6.a680
             Cost        4
             Port        1 (GigabitEthernet1/0/1)
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32769  (priority 32768 sys-id-ext 1)
             Address     c4b9.cd48.1980
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1             Root FWD 4         128.1    P2p 
Gi1/0/2             Altn BLK 4         128.2    P2p 

cat3850#show spanning-tree vlan 100

VLAN0100
  Spanning tree enabled protocol rstp
  Root ID    Priority    32868
             Address     c4b9.cd48.1980
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    24676  (priority 24576 sys-id-ext 100)
             Address     c4b9.cd48.1980
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1             Desg FWD 4         128.1    P2p 
Gi1/0/2             Back BLK 4         128.2    P2p

The HP forwards on all ports (since it's the root):

hp2920# show spanning-tree 

 Multiple Spanning Tree (MST) Information

  STP Enabled   : Yes
  Force Version : MSTP-operation
  IST Mapped VLANs : 1-4094
  Switch MAC Address : d4c9ef-b6a680
  Switch Priority    : 8192 
  Max Age  : 20
  Max Hops : 20   
  Forward Delay : 15

  Topology Change Count  : 6           
  Time Since Last Change : 11 secs     

  CST Root MAC Address : d4c9ef-b6a680
  CST Root Priority    : 8192        
  CST Root Path Cost   : 0           
  CST Root Port        : This switch is root

  IST Regional Root MAC Address : d4c9ef-b6a680
  IST Regional Root Priority    : 8192        
  IST Regional Root Path Cost   : 0           
  IST Remaining Hops            : 20          

  Root Guard Ports     : 
  Loop Guard Ports     : 
  TCN Guard Ports      : 
  BPDU Protected Ports :                                         
  BPDU Filtered Ports  :                                         
  PVST Protected Ports :                                         
  PVST Filtered Ports  :                                         

  Root Inconsistent Ports  :             
  Loop Inconsistent Ports  :             

                  |           Prio              | Designated    Hello         
  Port  Type      | Cost      rity State        | Bridge        Time PtP Edge
  ----- --------- + --------- ---- ------------ + ------------- ---- --- ----
  1     100/1000T | 20000     128  Forwarding   | d4c9ef-b6a680 2    Yes No  
  2     100/1000T | 20000     128  Forwarding   | d4c9ef-b6a680 2    Yes No  

This situation should work fine, I think, albeit a little confusing.

Now, if we set the VLAN 1 priority on the Cisco Rapid PVST+ switch to 4096 (lower/better than the HP ProCurve running IEEE MSTP), the root bridge moves over to the Cisco:

cat3850#show spanning-tree vlan 1 

VLAN0001
  Spanning tree enabled protocol rstp
  Root ID    Priority    4097
             Address     c4b9.cd48.1980
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    4097   (priority 4096 sys-id-ext 1)
             Address     c4b9.cd48.1980
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1             Desg FWD 4         128.1    P2p 
Gi1/0/2             Desg FWD 4         128.2    P2p 

cat3850#show spanning-tree vlan 100

VLAN0100
  Spanning tree enabled protocol rstp
  Root ID    Priority    32868
             Address     c4b9.cd48.1980
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32868  (priority 32768 sys-id-ext 100)
             Address     c4b9.cd48.1980
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi1/0/1             Desg FWD 4         128.1    P2p 
Gi1/0/2             Desg FWD 4         128.2    P2p 
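
The root moves because the election simply picks the numerically lowest bridge ID, comparing priority first and MAC address as the tie-breaker.  A quick sketch using the values from the outputs in this section (the helper names mac_to_int and elect_root are hypothetical):

```python
def mac_to_int(mac: str) -> int:
    """Normalise Cisco (xxxx.xxxx.xxxx) or HP (xxxxxx-xxxxxx) notation to an int."""
    return int(mac.replace(".", "").replace("-", "").replace(":", ""), 16)


def elect_root(bridges: dict) -> str:
    """Return the name of the bridge with the lowest (priority, MAC) pair.

    bridges maps a name to a (priority, mac) tuple.
    """
    return min(bridges, key=lambda name: (bridges[name][0], mac_to_int(bridges[name][1])))
```

With the Cisco at 32769 and the HP at 8192, elect_root picks the HP; drop the Cisco to 4097 and it picks the Cisco.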

However, the HP treats both ports without regard to VLANs and so blocks one of them:

hp2920# show spanning-tree

 Multiple Spanning Tree (MST) Information

  STP Enabled   : Yes
  Force Version : MSTP-operation
  IST Mapped VLANs : 1-4094
  Switch MAC Address : d4c9ef-b6a680
  Switch Priority    : 8192 
  Max Age  : 20
  Max Hops : 20   
  Forward Delay : 15

  Topology Change Count  : 4           
  Time Since Last Change : 10 mins     

  CST Root MAC Address : c4b9cd-481980
  CST Root Priority    : 4097        
  CST Root Path Cost   : 20000       
  CST Root Port        : 1                  

  ...

                  |           Prio              | Designated    Hello         
  Port  Type      | Cost      rity State        | Bridge        Time PtP Edge
  ----- --------- + --------- ---- ------------ + ------------- ---- --- ----
  1     100/1000T | 20000     128  Forwarding   | c4b9cd-481980 2    Yes No  
  2     100/1000T | 20000     128  Blocking     | c4b9cd-481980 2    Yes No  

This situation could create some confusion, if there were different VLANs configured on each of the ports, but with VLAN 1 present: only one port would forward and the others would block, potentially breaking some of the other VLANs.
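
A tiny Python sketch of that failure mode: because the single MSTP instance blocks a port for every VLAN on it, any VLAN that is only carried on blocked ports loses its path.  The function name and the port/VLAN layout in the usage note are hypothetical, chosen just to illustrate the point.

```python
def vlans_cut_off(port_vlans: dict, forwarding_ports: set) -> set:
    """VLANs configured on at least one port but on no forwarding port.

    port_vlans maps a port number to the set of VLAN IDs it carries.
    """
    configured = set().union(*port_vlans.values())
    reachable = set().union(
        *(vlans for port, vlans in port_vlans.items() if port in forwarding_ports)
    )
    return configured - reachable
```

For instance, with port 1 carrying VLANs 1 and 100 and port 2 carrying VLANs 1 and 200, blocking port 2 (so only port 1 forwards) returns {200}: VLAN 200 is broken even though the loop was only in VLAN 1.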

Joining two PVST+ VLANs in the IEEE STP bridge


This one allows you to get in a bit of a muddle!  Here's an example: a Rapid PVST+ switch with port 1 having VLAN 100 untagged/native and port 2 having VLAN 200 untagged/native is connected to an IEEE Spanning Tree bridge with both ports in the same VLAN untagged/native (same VLAN ID as one of the ones on the PVST+ bridge, or different - it doesn't matter), e.g.:

Cisco Rapid PVST+ bridge (assuming the VLANs are already created):

interface Gi1/0/1
 switchport mode trunk
 switchport trunk allowed vlan 100
 switchport trunk native vlan 100
!
interface Gi1/0/2
 switchport mode trunk
 switchport trunk allowed vlan 200
 switchport trunk native vlan 200

HP ProCurve (isn't this side easy!):

vlan 50 untag 1,2

The corresponding ports are then connected together (1-1, 2-2).

Once connected, the Rapid PVST+ BPDUs for the two different VLANs are interconnected through VLAN 50 in the IEEE STP bridge and find their way back to the Cisco switch, where you get a "PVID_Inc" (VLAN ID inconsistent) and both ports go into the "BKN" (broken) state and block, as described above, breaking both VLANs!
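
The check behind this can be modelled in a few lines of Python: each port sends untagged PVST+ BPDUs carrying its own native VLAN as the PVID, and any port that receives one with a different PVID goes inconsistent.  This is only a sketch of the behaviour described here; the function name and arguments are hypothetical.

```python
def pvid_inconsistent_ports(native_vlan: dict, bridged_together: set) -> set:
    """Ports on the Rapid PVST+ switch that would go PVID-inconsistent.

    native_vlan maps a port to its native VLAN; bridged_together is the set
    of those ports that the partner bridge joins into one untagged VLAN, so
    each hears the untagged PVST+ BPDUs sent by the others.
    """
    broken = set()
    for sender in bridged_together:
        for receiver in bridged_together:
            if sender != receiver and native_vlan[sender] != native_vlan[receiver]:
                broken.add(receiver)  # received PVID differs from own native VLAN
    return broken
```

With the configuration above, pvid_inconsistent_ports({1: 100, 2: 200}, {1, 2}) returns both ports, whereas two ports with the same native VLAN return an empty set (and you would just get an ordinary Backup port instead).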

It makes no difference whether the IEEE STP bridge is running Spanning Tree or not, since the IEEE STP BPDUs pass by the Rapid PVST+ BPDUs and don't interact.  The only change in behaviour is that the broken state won't be detected until the ports on the IEEE bridge move into the Forwarding state (which will take 30s normally, or 4s if HP's "auto-edge" mode is enabled).  This means the problem will initially go unnoticed (unless there are other effects) and things will probably be OK for 30s (on the port that was connected first, at least), after which the Rapid PVST+ BPDUs get bridged between the ports/VLANs on the Cisco and both ports move into the Broken state and block.

This situation could be confusing because both VLANs will break when it occurs (perhaps by someone accidentally connecting a data VLAN to a voice VLAN).  Without spanning tree, both VLANs may continue functioning, to a certain extent.  However, this is probably a good thing as the problem is detected immediately, rather than weird things going on until the root cause is determined.

(One thing we do is have different HSRP group numbers for each SVI, meaning that the MAC addresses and other messages don't clash, when two VLANs are connected together by accident.  This also helps us spot what's happened, because the virtual MAC address of [say] the voice VLAN will appear on the data VLAN.)

Cisco Rapid PVST+ to No Spanning Tree


Last but not least, what about if there is no Spanning Tree running on the partner bridge?

There is obviously a distinction between a bridge gobbling up the BPDUs (perhaps by running bpdufilter) and one which just lets the BPDUs flow through without processing them, treating them like any other traffic.

HP ProCurve switches have spanning tree turned off by default and require it to be turned on with the "spanning-tree" command (that's it: no arguments!).  When turned on, it defaults to IEEE MSTP (at least on a ProCurve 2920; other switches may vary, especially the older ones, but HP have been pushing MSTP for a long while, as they're fairly keen on IEEE standards, even when they're not particularly great!).

Without Spanning Tree enabled, the situation, at least with HP ProCurve, is that the BPDUs flow through like normal traffic, creating a situation the same as the "Rapid PVST+ and RSTP/MSTP" interaction, above.

For Cisco Catalyst switches: if spanning tree is explicitly disabled on a VLAN with "no spanning-tree vlan ...", the switch will not gobble PVST+ BPDUs but allow them to flow through and the bridges on ports into that VLAN will see each other, in terms of Spanning Tree.  However, this doesn't appear to be true for IEEE STP BPDUs: they do seem to be filtered, even if a VLAN is presented untagged/native.