Routing, Addressing, and Subnetting - Oh My!

iamkarlp · October 21, 2016, 3:54am

Hey Guys! OneSeventeen asked a seemingly simple question about how to route a subnet through a new Layer3 core switch he got. While he got some great advice on his actual technical question over here: Original Thread, it occurred to me upon reflection that this is a question that I have answered many times before.

Specifically, I came to realize that not many people understand the long term effects of subnet layout in the event you need to move to a multi-routed internal network. For some this will be old information, for others this may be extraneous information, but I hope this post will give at least a few people enough information to help them either better prepare for their organization’s future, or at least understand when they need to call in additional help.

I’ll admit right up front that there will be errors in this document. I have spent more than a few hours putting together my thoughts, but certainly not enough for this to be error free, much less considered an authoritative post on this subject.

Additionally, I know some of my terminology here might not be entirely standard. Even I don’t agree with everything I have written here, but I am trying to make this both digestible and approachable.

Finally, I am a people - and an occasionally messy one. I make mistakes. I am certain I have done so within. If you see something that needs correction - please say so!

###With that said… Let’s go!

##Prerequisites

Before we get started - We need to make sure we can communicate effectively. To that end, here is some terminology you will see often in my post:

###Layers:
There are 7 layers in the OSI model. We are going to be discussing Layers 2 and 3 for the purposes of this conversation.

A simplified chart explaining these 7 layers can be found here: Simplified OSI Model
A digestible video on the topic, should you decide you want more, can be found here: The OSI Model Demystified

TL;DR Version:

Layer 1 is the physical layer - this is where your wires are, your cable plant.
Layer 2 is the data link layer - this is the layer that switching happens within. Two devices which sit on the same Layer 2 data link could theoretically get information between one another.
Layer 3 is the network layer - this is the layer that routing and networking as is commonly considered happens within. Two devices which sit on the same Layer 3 network can communicate with one another. Routers connect multiple L3 networks together in such a way as to be able to move information between them.
Each successive layer builds on and depends upon the layer under it.
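
To make the Layer 3 point a little more concrete, here is a minimal Python sketch using the standard-library `ipaddress` module. The addresses and the helper name `needs_router` are made up for illustration; the idea is simply that two hosts inside the same subnet can talk directly, while anything else has to be handed to a router.

```python
import ipaddress

def needs_router(src: str, dst: str, subnet: str) -> bool:
    """Return True if traffic from src to dst has to leave the local subnet."""
    net = ipaddress.ip_network(subnet)
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if src_ip not in net:
        raise ValueError(f"{src} is not inside {subnet}")
    # Same L3 network: deliver directly. Different network: send to the gateway.
    return dst_ip not in net

print(needs_router("192.168.10.25", "192.168.10.200", "192.168.10.0/24"))
print(needs_router("192.168.10.25", "192.168.20.7", "192.168.10.0/24"))
```

The first call stays on the local subnet; the second one has to cross a router.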

###Terms
Network - Context specific term. If talking about Layer 1, it means the structured cabling system. If Layer 2 it means the switches, circuits and vlans. If layer 3 it means subnets and broadcast domains.
Subnet - A layer 3 network of specific size and scope. Important to note that subnets can roll up or condense, that is to say one large subnet may include multiple smaller subnets: more on this later.
vLan - An isolated layer 2 network, potentially running over the same Layer 1 network as other layer 2 isolated vLans/networks
Netmasks & CIDRs & /xx - All ways to express the size and scope of a subnet.
L1 & L2 & L3 - Short hand for Layer 1/2/3
ACL - Access Control List, For the purposes of this post, I will use ACL to refer to controlling access in a broad-spectrum sense. IE: I will use an ACL to block this subnet from ever talking to that subnet, no ifs ands or buts. ACLs are computationally easy and have relatively minor overhead.
GAC - Granular Access Control, Think of this like fine grained ACL. I don’t want everything on this subnet to be accessible to everything on that subnet, but I do want services X, Y, and Z, to be accessible. GACs are computationally intensive, and have relatively high overhead.
TCF - Traffic Control Functions, Abstract. General bucket that includes everything from Intrusion Prevention Services to bandwidth shaping and limiting.
NAT - Network Address Translation, A function provided by certain gateways which allow them to rewrite one IP address space into another. While used in a wide variety of roles, in this post we will be restricting our usage to its most common use as that of a service that allows many interfaces on a private network to use relatively few interfaces on a public network for the purposes of providing access to that network. (eg: using a single public IP address to serve the internet connectivity needs of 100 private network hosts.)
RA - Route Advertisements, A function provided by most routers that allows them to communicate to other routers what L3 subnets they have access to in a standardized fashion.
Route - A specific path between two or more subnets.
Routing - The act of exchanging information between 2 or more subnets
Router - A device or service which uses individual Routes to accomplish Routing. While a router typically does not look at the content of the information it is relaying, it is possible to use ACLs in specific cases.
Firewall - A device or service which may or may not be configured to act as a router, but does provide Granular Access Control, as well as additional network TCF.
Gateway - Whether a router or firewall, this is the relay point to get information in or out of a specific subnet
AIO-FR - All-In-One Firewall Router, a commonly used device providing a full range of ACL, GAC, TCF, NAT, RA, & Routing Services. You may know it as a SonicWall, an ASA, or even a TP-Link.
Termination - The ending point of a network. This might be “nowhere” if we don’t want a network to be able to communicate with other networks, but it will typically be the gateway of that subnet.
Interface - Context specific term. In layer 2 context it refers to the physical port under discussion. (eg: the 3rd port of the second switch in that stack) In layer 3 context it refers to the IP address(es) of a specific system as they are exposed to a specific L3 network.
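
As a quick illustration of the "Netmasks & CIDRs & /xx" entry, here is a small Python sketch (standard-library `ipaddress`, with a made-up example subnet) showing that a dotted netmask and a /xx prefix are just two spellings of the same thing.

```python
import ipaddress

# Two spellings of the same (hypothetical) subnet.
by_prefix = ipaddress.ip_network("10.20.30.0/24")
by_netmask = ipaddress.ip_network("10.20.30.0/255.255.255.0")

print(by_prefix == by_netmask)    # both describe the same network
print(by_prefix.with_netmask)     # the /24 written out as a dotted netmask
print(by_prefix.num_addresses)    # total addresses, including network and broadcast
```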

###With that out of the way… Let’s begin!

#Introduction

Contrary to popular belief, most people who “do” networking didn’t go to school for it, rather they fulfilled a need to accomplish a goal. Thus, rather than starting with a long introduction to the finer points of network design theory, they started by picking up what they had available and making it work.

Typically this looks like a collection of switches on top of which rests a single L3 network. These switches eventually connect back to an AIO-FR device which acts as the gateway to a larger world (typically the internet).

###So you are knocking all of us who didn’t go to networking school?
Not at all! This is how I myself started - but unless further training and knowledge is learned, as your environment scales out you will quickly find yourself with a few opportunities that you are ill-equipped to address.

Some of these potential opportunities include running out of available IP addresses, needing to reduce your broadcast domain for performance reasons, multicast scope limiting, or evolving access control and isolation requirements.

That is quite a list.

Indeed it is, but somewhat amusingly, regardless of what your opportunity is, the answer typically involves creating more L3 subnets, usually (but not always) on top of L2 networks which are dedicated to holding that L3 subnet.

But creating more L3 subnets instantly puts you in a position where you will need to manage the interactions between those networks…

Right, so how do we do this?

#Terminating L3 subnets
As a consequence of the common way that most people learned networking, the most common way to see a L3 subnet terminated would be adding a corresponding L3 interface onto their existing AIO-FR. Doing this has a variety of consequences though. Both positive and negative.

####Positive

It is easy - Typically there will be no changes to any existing configurations
It is inexpensive - Whether you are using a virtualized L2 network or a new physical L2 network, by keeping your AIO-FR the same and preserving existing configurations, the cost to implement will be relatively minimal.
Centralization - Whether we want to talk about Policies, Layout Conceptualization, Troubleshooting or Management - With everything happening in one place, it’s easier to know what is going on.

####Negative

Scaling Constraints - No matter how large the AIO-FR, there will always be a point at which it will no longer be sized to handle the aggregated traffic flow of an entire network.
Bandwidth Limitation - By functioning as the sole router in your organization, using an AIO-FR as the termination point effectively throttles the entire network to the aggregate performance of your AIO-FR, whether that is the hardware capacity itself, or the capacity of the physical interfaces it uses to connect to the network.

###Ok, that doesn’t sound great, so what other ways are there?

Well, I’m glad you asked!

Conceptually, there are lots of ways to connect L3 subnets together. But for the sake of this post, let’s distill it down to the most common ways, which conveniently sit within two separate classes.

Out-Of-Plane Router - This is routing which happens outside of a switch. You may commonly know this as using a sonicwall, a fortinet, or an ASA that is attached to physical network ports somewhere to enable connectivity to your various networks. This type of router is typically focused on features, providing routing functions within or alongside a device which these days typically also provides firewall and NAT functionality, thus providing a complete ACL/GAC/NAT/RA/TCF package. Essentially the AIO-FR devices I have mentioned above.
In-Plane Router - This is routing which happens inside of a switch itself. You may commonly know this as a “Layer 3” Switch, and it is relatively common in advanced offerings from the likes of Cisco, HPE(Aruba), Brocade, Netgear, and many more. This type of router is typically focused on speed, often times eschewing traditional AIO-FR features such as NAT, GAC and TCF.

So, to summarize: using an Out-Of-Plane router is almost always needed due to demands for GAC/NAT/TCF features. However, In-Plane Routing, by virtue of being built into the switch fabric itself, can bring higher performance to the table, both in terms of bandwidth and reduced latency.

###So you are saying there isn’t a single answer?
Correct.

###So which do I choose?
When you are relatively small, there is nothing wrong with using a well-specified AIO-FR as the sole gateway for your various networks. Feel free to add some more L3 subnets and L2 vLans - all is well.

But as you begin to grow to the point where the physical connections prove to be a network bottleneck, or you are going to needlessly oversize an AIO-FR just so you can shuttle information around your internal network - STOP - it’s time for a new approach.

###How do I know it is time for a new approach?
Honestly, it is different for everyone. I have seen organizations as small as 400~500 devices need to use both In-Plane and Out-Of-Plane routing. Conversely I have seen organizations of 1500+ devices happily sitting on a single AIO-FR, with room to grow. It all depends on your unique circumstances.

The important thing is to know your network, treat reports of poor performance with respect and investigate. Employ at least a rudimentary level of reporting so you can visualize at some level what is happening on your network. Think through the impacts of new applications and services. Never be scared to ask for a 2nd (or 3rd) opinion.

###Well that makes sense, Thankfully we aren’t yet at a place where I need to worry about…
Hold up. We aren’t done yet. Yes - while it is true you may not be in a position where you need to worry about leveraging both In-Plane and Out-Of-Plane routing side by side, decisions that you make today will directly affect the ease with which you will be able to transition to this hybrid routing model down the road should the need arise.

#Subnetting

Remember way up at the top of the page under terms where I said that one subnet can include within itself multiple smaller subnets?

Well, that might be the single most important thing to learn in this whole post. If you don’t currently understand how a subnet can include multiple smaller subnets, go stare at this chart for a while and come back and ask some questions. For the purposes of this post, this concept is crucial. Subnetting Chart

Using this chart you can see that a single Class B /16 network could either have 65534 hosts on it (Please Don’t (: ) or it could have 256 separate /24 networks, with each of those networks containing 254 hosts.
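
If you would rather check those numbers than trust a chart, here is a minimal Python sketch with the standard-library `ipaddress` module. The 10.1.0.0/16 block is just a made-up example.

```python
import ipaddress

block = ipaddress.ip_network("10.1.0.0/16")  # hypothetical example block

# Usable hosts = total addresses minus the network and broadcast addresses.
hosts_in_16 = block.num_addresses - 2

# The same block carved into /24s instead.
slash_24s = list(block.subnets(new_prefix=24))
hosts_per_24 = slash_24s[0].num_addresses - 2

print(hosts_in_16)                   # hosts in one flat /16
print(len(slash_24s), hosts_per_24)  # number of /24s, hosts in each
```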

More importantly, you could have a single /16 network which itself contained 8 separate /19 networks. You could then take 7 of those 8 /19 networks and further break those into both /22 and /24 networks. Then you could take the remaining /19, break it up into /24s and then further break those /24s into /29s and /30s. Even after being entirely broken up, we could still refer to this whole grouping as a single /16.
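
Here is a rough sketch of that nesting in Python (standard-library `ipaddress`, made-up 10.1.0.0/16 block). It only follows one branch of the breakdown, but it shows that the smallest /30 at the bottom is still part of the original /16.

```python
import ipaddress

whole = ipaddress.ip_network("10.1.0.0/16")  # hypothetical example block

slash_19s = list(whole.subnets(new_prefix=19))         # the /16 as /19s
last_19 = slash_19s[-1]                                # the "remaining" /19
slash_24s = list(last_19.subnets(new_prefix=24))       # that /19 as /24s
slash_30s = list(slash_24s[0].subnets(new_prefix=30))  # one /24 as /30s

smallest = slash_30s[0]
print(len(slash_19s), len(slash_24s), len(slash_30s))
# The tiny /30 still sits inside both its /19 and the overall /16.
print(smallest, smallest.subnet_of(last_19), smallest.subnet_of(whole))
```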

To visualize this in another way, take a look at this chart which shows you how you can break a single /24 down into anything from 2 separate /25 subnets with 126 hosts each to 64 separate /30 subnets with 2 hosts each. Or any combination of /25~/30 subnets you like, so long as no two subnet ranges overlap. /24 subnet chart
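
The same /24 breakdown can be sketched in a few lines of Python. The 192.168.1.0/24 parent and the particular mix of sizes at the end are made up for illustration; the only rule being checked is that no two ranges overlap.

```python
import ipaddress

parent = ipaddress.ip_network("192.168.1.0/24")  # hypothetical example

breakdown = {}
for prefix in range(25, 31):
    count = 2 ** (prefix - parent.prefixlen)  # how many subnets of this size fit
    usable = 2 ** (32 - prefix) - 2           # minus network and broadcast
    breakdown[prefix] = (count, usable)
    print(f"/{prefix}: {count:>2} subnets, {usable:>3} usable hosts each")

# Mixed sizes are fine as long as nothing overlaps.
mix = [ipaddress.ip_network(n) for n in
       ("192.168.1.0/25", "192.168.1.128/26", "192.168.1.192/27", "192.168.1.224/27")]
no_overlap = not any(a.overlaps(b) for i, a in enumerate(mix) for b in mix[i + 1:])
print(no_overlap, sum(n.num_addresses for n in mix))
```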

###Ok enough already, my head hurts, I get it, subnets can be broken into smaller subnets while still existing themselves - why is this important?

It’s important because of routes, ACLs, and GACs

In order to easily leverage multiple routers in a network, you need to be able to use relatively condensed subnets that in turn refer to relatively large cross sections of your network.

You can’t have (or rather, you really don’t want to have) a /22 untrusted guest network slap in the middle of a /19 worth of other trusted networks. In order to route this appropriately you will now have to create dozens (or more) individual Routes, ACLs and GACs where you could have just had two or three. And that is just for a simple oversight of putting an untrusted /22 inside of an otherwise trusted /19.

If you have multiple of such scenarios, and begin multiplying that sort of issue out, you can quickly come to a situation where you could have an absolutely unmanageable amount of routes and ACL/GAC policies where you otherwise may have only needed a handful.
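
As a small, hedged illustration of why a misplaced network hurts, the Python sketch below (standard-library `ipaddress`, made-up addresses) cuts an untrusted /22 out of the middle of a trusted /19 and counts how many prefixes it then takes to describe "the trusted space". This is the best case, where everything left over summarizes cleanly; in practice rules are often written per subnet and per service, so the real count grows much faster than this.

```python
import ipaddress

trusted_block = ipaddress.ip_network("10.1.0.0/19")  # hypothetical trusted range
guest_hole = ipaddress.ip_network("10.1.8.0/22")     # hypothetical misplaced guest network

# Clean layout: one prefix describes every trusted network.
clean_rules = [trusted_block]

# Messy layout: the trusted space is whatever is left after cutting the hole out.
messy_rules = sorted(trusted_block.address_exclude(guest_hole))

print(len(clean_rules), len(messy_rules))
for net in messy_rules:
    print(net)
```

Every route, ACL, or GAC that used to reference the single /19 now has to be repeated for each leftover prefix.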

###So you need to plan your subnets?
Yes! Right away. Ideally from the first network you create, but if not from the first then certainly by the second.

There are a variety of ways to do this, but for churches I like something that looks like this:

Typically I will consider where to terminate what type of network according to the types of routes/ACLs/GACs they might need and then group accordingly.

Example: networks which are “trusted” might all terminate in a core switch as their L3 gateway to give them quick access onto other trusted networks. Any isolation I may need between these trusted subnets can be done using the (relatively) simple ACL functionality present in the core switch. In contrast however, “untrusted” vlans, or vlans with narrow GAC/traffic monitoring needs will be terminated directly into an Out-Of-Plane Router / Firewall interface so I can have much finer access control back into the rest of the network.

In practice this might mean something like this:

I allocate each campus a /16 to be further broken down into trusted networks.
I take another /16 and break it into 8 separate /19 networks, allocating one of those /19 to each campus to break down into their untrusted networks.
I then take another /16 and break it into 16 separate /20 networks, one of which will be given to each campus for its management networks
Finally I set aside another /16, broken into 32 separate /21 networks, each campus getting a /21 to further break down into routing and backhaul interface subnets.
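
The allocation steps above can be sketched in Python with the standard-library `ipaddress` module. Every address here is a placeholder I picked for illustration (as is the two-campus count); the point is only the shape of the plan: one /16 per campus for trusted networks, plus shared /16s sliced into per-campus /19, /20, and /21 blocks.

```python
import ipaddress

CAMPUSES = 2  # hypothetical campus count

# Hypothetical parent blocks; pick whatever suits your own address space.
untrusted_parent = ipaddress.ip_network("10.200.0.0/16")
management_parent = ipaddress.ip_network("10.250.0.0/16")
routing_parent = ipaddress.ip_network("10.251.0.0/16")

untrusted = list(untrusted_parent.subnets(new_prefix=19))    # 8 x /19
management = list(management_parent.subnets(new_prefix=20))  # 16 x /20
routing = list(routing_parent.subnets(new_prefix=21))        # 32 x /21

plan = {}
for i in range(CAMPUSES):
    plan[f"campus{i + 1}"] = {
        "trusted": ipaddress.ip_network(f"10.{i + 1}.0.0/16"),
        "untrusted": untrusted[i],
        "management": management[i],
        "routing": routing[i],
    }

for name, blocks in plan.items():
    print(name, {role: str(net) for role, net in blocks.items()})
```

Note that the untrusted /16 only holds 8 campuses worth of /19s, so in a layout like this it is the first block you would outgrow.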

By constructing my subnet map like this, I can do some really cool things.

I can build a simple rule to block the trusted /16s from my untrusted network /19s and know that I have actually “gotten everything”.
I can route my management traffic over alternate routes or circuits than my production traffic, or build failover policies that correctly prioritize such access.
With a handful of routes (either manual, or via RAs) I can route between campuses’ trusted networks (presuming I have connectivity). I can even (simply) ACL the routes to stop trusted networks from crossing those routes needlessly.
I can take a look at traceroute reports and know that things are being routed correctly just by looking at the overriding subnet blocks that the interfaces are contained within.
Put simply, I can more easily, clearly, and securely deliver and support the high performance networks that facilitate the success of the organizations I serve.
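
To show how little it takes once the plan is role-based, here is a toy Python sketch. The role blocks, the helper names `role_of` and `allowed`, and the one-line policy are all assumptions made up for this example; a real firewall or switch ACL would express the same idea in its own syntax.

```python
import ipaddress

# Hypothetical role supernets, in the spirit of the plan described above.
ROLE_BLOCKS = {
    "trusted": [ipaddress.ip_network("10.1.0.0/16"), ipaddress.ip_network("10.2.0.0/16")],
    "untrusted": [ipaddress.ip_network("10.200.0.0/16")],
    "management": [ipaddress.ip_network("10.250.0.0/16")],
    "routing": [ipaddress.ip_network("10.251.0.0/16")],
}

def role_of(address: str) -> str:
    """Classify an address purely by which overriding block contains it."""
    ip = ipaddress.ip_address(address)
    for role, blocks in ROLE_BLOCKS.items():
        if any(ip in block for block in blocks):
            return role
    return "unknown"

def allowed(src: str, dst: str) -> bool:
    """Toy policy: untrusted hosts may never reach trusted hosts."""
    return not (role_of(src) == "untrusted" and role_of(dst) == "trusted")

# Reading a traceroute hop or a log line becomes a lookup, not a guessing game.
print(role_of("10.251.8.2"), role_of("10.200.33.17"))
print(allowed("10.200.33.17", "10.1.4.20"), allowed("10.2.9.9", "10.1.4.20"))
```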

###Really, my brain is melting here.

I know. I really appreciate you hanging on this long. I also know that you are going to be having some thoughts right now. While I can’t pretend to know everything in your head, there is a good chance that most of them distill down into a single phrase: I am never going to be big enough to have to worry about this.

Not only is that completely understandable, it may even be true. However - and this is a big ask - are you in a place where you can say with certainty that you will never be in a position that you can in some way benefit from being built on a subnetting foundation that looks something like this? While it is certainly possible to come back later and overhaul the subnetting of an organization, it is something you really want to avoid. (I should know, I have had to do it)

I want to be clear - it is not as if I think every corner coffee shop needs to start off with a routing template and subnetting plan that would allow it to seamlessly grow to become Starbucks. My heart in writing this is to give you enough understanding of an area I find many people lacking insight in so that you may know when to take action, or identify when you need further help.

In wrapping up I just wanted to say thanks for sticking through this and I hope it helps someone learn in a condensed time frame concepts which often elude people for years. As I said up top, I am sure I have missed, overlooked or plain messed up some things along the way. Feel free to ask questions or give (loving) correction.

Thanks for reading to the end! (: -Karl P

tylerturner · November 1, 2016, 1:49pm

Thanks for the write-up. So your IP subnet example might look something like this?

Campus 1: 10.1.0.0/16
Campus 2: 10.2.0.0/16

Untrusted: 10.200.0.0/16
10.200.0.0/19, 10.200.32.0/19, etc

Management: 10.250.0.0/16
10.250.0.0/20, 10.250.16.0/20, etc

Routing/Backhaul: 10.251.0.0/16
10.251.0.0/21, 10.251.8.0/21, etc

Please correct me if I am wrong.

My current subnet setup is on the /24 level with only 3 subnets per campus (office, public and VIP). I like your idea of further breakdown and might plan on adjusting.

iamkarlp · November 1, 2016, 2:53pm

Your request reminded me that I really needed to do a sample drawing with real subnets and such.

It’s going to take a few days before I can set aside the time, but I have put it on my schedule and should get to it shortly.

I think this is one of those cases where a picture will be worth a thousand (or more…) words.

-K

iamkarlp · November 6, 2016, 5:06pm

So I sat down a few days ago and began to flesh out a mockup documentation package. A few hours into that project I realized that while my effort may have ended up being informative in its own right, it wasn’t likely to actually convey information that would have directly answered your question.

So I set that all aside and instead crafted something else.

I really appreciated your response @tylerturner. It displayed an obvious understanding of my original content. Thank you for that. Unfortunately it also highlighted a gap in my communication.

Namely, I failed to talk about needing intentionality behind subnet layout.

#Subnet Packing / Layout considerations.

When creating subnet layouts we need to balance a few things:

Capacity For Individual L3 Network Growth
Avoiding Overall Wasted Address Space
Encouraging Human Recognition

Now, the exact scale of the networks in question will go some length towards calibrating the realistic range within which each of these variables can live.

For example, If we were running the public internet as a whole - the scale of that conversation would force us to prioritize avoiding wasted address space above all else.

Similarly, If we were running a small SMB network then the scale is probably small enough that wasting address space simply isn’t a concern.

The networks most likely under discussion on this forum however live somewhere between these two extremes.

#Back to the Beginning.

Taking what I talked about above and applying it to your example, It seems likely that we are out of balance. Now, I can guess why you may have spread your networks across the whole /8 - Doing so really encourages Human Recognition. But by subnetting like that we have taken capacity for L3 network growth to a fanciful extreme, and thus failed pretty hard on avoiding wasted address space.

Let’s take a look at two other ways we could do this instead.

A somewhat more realistic scenario where human recognition is prized and address waste isn’t a concern would have us fitting 32 campuses inside of a /10. (Reference Attachment)

An actually realistic scenario that is more balanced would see us putting those same 32 campuses into an overall /12. (Reference Attachment)

While both of these examples are more balanced than the original /8 example, they still waste a lot of address space in the name of human recognition and L3 network expansion capabilities, most of which is likely to never be used.

In the overall scale-context of this conversation, that is fine. We can afford some waste.

But what if we couldn’t? What if we actually needed to pack this in for some reason? What would that look like?

#Taking it Farther.

While I didn’t go through the work of making a reference sheet for it, a few minutes on the back of a napkin shows that we could fit those same 32 campuses fairly easily into a single /13, or even a /14 with a little bit more intentionality and avoidance of needlessly issued /24s on access networks.
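
For a feel of the scale involved, the Python sketch below works out how much space each campus would get if a block were simply split evenly 32 ways. That even split is my simplifying assumption for the arithmetic, not the layout used in the reference attachments, and 10.0.0.0 is just a convenient starting address.

```python
import ipaddress

CAMPUSES = 32
extra_bits = (CAMPUSES - 1).bit_length()  # 5 bits are enough to number 32 campuses

per_campus = {}
for parent_prefix in (8, 10, 12, 13, 14):
    parent = ipaddress.ip_network(f"10.0.0.0/{parent_prefix}")
    share = next(parent.subnets(prefixlen_diff=extra_bits))  # first campus's slice
    per_campus[parent_prefix] = (share.prefixlen, share.num_addresses)
    print(f"/{parent_prefix} overall -> /{share.prefixlen} per campus "
          f"({share.num_addresses} addresses)")
```

Even the tightest case here still leaves each campus a /19, which is a lot of room to carve into access, management, and routing subnets.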

The opportunity present with packing into a /13 or /14 in this example however is that we begin to expend effort with little potential for payback on that effort.

We aren’t likely to ever need to scale to 1000+ campuses, so why continue to sacrifice human recognition and potential L3 network expandability just for the sake of unneeded scale?

I write this section purely to illustrate that balance goes both ways. While I don’t think we should just assume the use of an entire /8 just to support a few campuses, I also don’t think that we need to hyper-optimize something in the name of scaling potential when that scaling will never be needed.

#Wrapping it up

At the end of the day it is my passion to equip and facilitate people making their own informed decisions. I hope that this dive into subnet packing and layout has given you a deeper understanding into some of the considerations present in subnet layout balance.

As always, feel free to ask questions and/or suggest opportunities for improvement.

Thanks,

Karl P

Note: Thanks to @tylerturner for pointing out that on the 32 campus /12 breakdown sheet I accidentally typed 10.1.64.0/18 for Campus 2 under Trusted Subnet Breakdown. It should read 10.0.64.0/18

tylerturner · November 6, 2016, 9:14pm

Thanks. Your reference document is helpful. I was on the right track with my example, but have wasted address space as you stated. I’m thinking on how I can change my subnet plan to optimize the networking side as we add campuses. I primarily have been prioritizing the Human Recognition aspect of my subnet plan and device address assignment. I have some questions.

1. Do you typically have a DHCP server on the Dante VLAN, or let them self assign addresses? Currently I do not have a DHCP server on my Dante VLAN. I didn’t see the need since they self assign and as far as I know the Dante side of the equipment doesn’t need the internet. I do put the audio console on my normal network for use with iPads and computer access. This might be a question for another thread.
2. What might practical uses be for the management subnet?
3. What might practical use of the LAN and WAN routing subnets be? I use IPsec VPNs between my AIO-FRs (running pfSense) to connect my campuses, would these subnets be used there?
4. How do other churches provide network access across campuses?

A bit about me regarding networking. About 13 years ago in my senior year of high school I took and passed Cisco Networking Academy, which prepped me to get a CCNA, but I never took the test. What I learned helped with networking of small businesses and homes, but routing and VLANs I haven’t used very often over the years. About 3 years ago I figured out and implemented VLAN subnets to provide separated public wireless access at my church. Until that point any business I assisted with had no need of VLANs. Bottom line is I understand most of the theory, but don’t have much practical application.

Thanks for your posts. They are a wealth of knowledge and give me something to think about. I’m also enjoying the fact that our church is growing and I am getting the opportunity to implement some of the things I learned in school, and here.

iamkarlp · November 7, 2016, 2:39am

There is no trouble prioritizing human recognition, I certainly have done that on more than one occasion.

Answers to your questions

1. I typically put a DHCP server on every VLAN - Dante included. While Dante can auto assign, I have had to do some troubleshooting in the past that required interacting with the interfaces on the Dante network, which began with discovering their addresses. Since then I have put a DHCP server on that network.
2. Management subnet holds all management interfaces for your core equipment (switches, RF controllers, routers, firewalls, server iLOMs, etc etc) - Essentially all of the equipment which runs your network. By sitting in a separated subnet it makes it easier to build firewall rules or even alternate routes for this traffic.
3. The LAN and WAN routing subnets are the networks used between your firewall and your core switch(es) on the LAN side, and between your firewall and your WAN/MPLS/VPN Routers on the WAN side.
4. Churches typically provide network access by means of direct internet connections to each site and VPN between sites, or, in some cases, Internet connections for external connectivity, and MPLS connections for cross-site connectivity.

Seems like you are doing great on understanding these concepts!

You’re absolutely welcome. Feel free to ask more.

Karl P

iamkarlp · November 8, 2016, 10:17pm

I thought I should come back and provide a little more context to your question and my response about WAN/LAN Routing networks.

Most people are probably familiar with the simple “flat” network - one where the entire inside is a single /24 or /23 and the singular firewall/router/gateway has a single connection to the public internet. (See drawings for common L2 and L3 maps of such a network). This topology is simple, common, and to a certain point works great.

From here (assuming you have a capable gateway and switching) you can easily add some vlans and subnets to provide L3 separation or growth capacity as needed. (L2 and L3 maps of such a network)

The opportunity inherent in this design, where the firewall/router/gateway sits in the middle of all traffic, is that the aggregate capacity of the network quickly becomes limited to the processing and/or interface speed of that piece of equipment. Additionally (and just as importantly) that single firewall/router/gateway also becomes a very significant potential failure point.

In any growing network, sooner or later, you will hit a point where the capacity of the firewall/router/gateway becomes a significant bottleneck to your organization, or you need to integrate high-availability into your designs. It is at this point that the fun really starts.

In order to avoid sending all traffic through the firewall just to cross between trusted L3 subnets, we will often use a core switch with routing abilities. In order to alleviate single points of failure we will often have redundant or high-availability core switching, firewalls, and internet connections, all of which bring along their own technical dependencies. (see these examples of a highly available core L2 and L3 network)

As you can see in that last map, as you build more complexity into the core network, you all of a sudden start needing routing subnets for LAN and WAN equipment to help facilitate communications between internal and/or external routers and their interfaces.
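
As a rough sketch of what those routing subnets can look like, the Python below (standard-library `ipaddress`) carves small point-to-point links out of a per-campus routing block. The 10.251.0.0/21 block, the choice of /30 links, and the link names are all assumptions for illustration; your own design may size or name them differently.

```python
import ipaddress

campus_routing = ipaddress.ip_network("10.251.0.0/21")  # hypothetical per-campus block

links = campus_routing.subnets(new_prefix=30)
fw_to_core = next(links)  # LAN side: firewall <-> core switch
fw_to_wan = next(links)   # WAN side: firewall <-> WAN/VPN router

for name, link in (("fw-core", fw_to_core), ("fw-wan", fw_to_wan)):
    side_a, side_b = link.hosts()  # a /30 has exactly two usable addresses
    print(name, link, side_a, side_b)

# How many /30 links a single /21 can hold in total.
total_links = 2 ** (30 - campus_routing.prefixlen)
print(total_links)
```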

I hope this sheds a little more light onto the need for routing networks.

Also, major shoutout to Jaakko Rautanen & PACKETPushers for helping me to learn to properly draw L2 and L3 networks a few years ago, and giving me a great resource to crib from for this little exercise.

Karl P

akdale · November 23, 2016, 10:33pm

So one thing I have always tried to communicate to folks is to remember that routers route and switches, well… Anyway, one issue in large networks can also be the hardware (CPU and memory) limitations of your devices. We saw this over and over with the Cisco 6500 series of switches. (I know I just dated myself) So what I appreciate about your post is the issue of PLANNING. In the early years VoIP got a bad rap, much of it due to poor network planning. So when you’re doing your planning don’t forget to look at the load on the device. In short, just because it “can” do it does not mean it “should”.

codatory · November 24, 2016, 12:27am

Of course, no matter how well you plan you might still find yourself running into issues down the line. Like the crazy way HP’s decided to implement QoS on their switches. Or with port buffering and PPS limits with certain features enabled.