In this post, I want to talk about how I think about designing software systems, starting from the first and most important principle. This is inspired by my experience conducting interviews at Redox, mentoring engineers over the years, as well as designing complex systems myself.

The intended audience for this post is someone who already understands the basics of programming and how to make software at a “tactical” level, and wants to build more of a “strategic” perspective.

Or, to put it differently, this is a crash course on the principles and concepts I look for when interviewing experienced software engineers, but wouldn’t necessarily expect an early-career engineer to have already grokked.

The First Principle
The puzzle piece analogy
The function optimization analogy
Context changes over time
Applying the First Principle
Possible future posts
Further reading

The First Principle

Illustration of a cliff near the water with a cut-out spot for a lighthouse.

Perhaps you’ve heard that all software is created to solve problems. The user’s problems, the company’s problems, or even the programmer’s problems.

However, the “solving a problem” formulation is lacking in that it’s overly specific, which can make thoughtful people try to poke holes. What is a “problem” and what isn’t? Surely not all software is solving a “problem”; maybe it’s just making a good situation better. Does that count? Does “making a profit” count as solving a problem? And so on.

So, instead of speaking of problems, perhaps it’s better to speak of requirements: All software is created to meet some requirements. But even that is too specific. After all, what if there’s a desire to keep the costs of the project as low as possible? Is that a requirement? No, it’s something “softer” — a vague type of desire.

To avoid spinning out on what is or isn’t a problem or a requirement, I like to adopt the terminology from Notes on the Synthesis of Form, which speaks of a design being expected to fit into a context.

The “context” encompasses the “problem” plus the “requirements” plus everything else around it. The context is the whole world; or the relevant bits, at least:

Budget
Personnel constraints
Knowledge constraints
Time constraints
Users’ perceived needs
Users’ actual needs
Rules and regulations
Orders from on high (but be prepared to push back if these are problematic)
Operational needs (observability, etc.)
Maintainability
Security (really a subset of “Users’ actual needs” — the user needs their data to be protected from unauthorized access)
etc.

That’s all part of the context.

Now that we’ve formulated the idea of the abstract “context” of the project, we can say that all software is created to fit into a certain context.

This is the First Principle of designing software systems, the most fundamental thing to remember at all times:

All software is created to fit a context.

There is an essential corollary that makes this principle actionable:

Every element of a software system must be justified by the context it’s in.

This principle (and corollary) applies at every level of software design, from the system architecture as a whole, down to each individual function and line of code.

Imagine you’re at an interview where you’re presenting a system you’ve designed, and the interviewer asks, “Why did you include a caching layer between your application and the database?”

How do you respond? (Think about what answer you’d like to hear before continuing!)

Remembering the First Principle, you justify the existence of the caching layer by explaining how it fits the context:

“Since we’re building an ecommerce site with, ideally, millions of users across the globe, having each new visitor to the homepage load data from the database would be expensive for us and slow for our users. By adding a caching layer, we can avoid recomputing the queries for popular products and so on. Also, we can have multiple copies of the cached data in various regions to improve response times for our users. Having a slow-loading homepage is shown to reduce conversions. The caching layer helps us avoid that.”

In this case, the context is that we’re building a site with some fairly static queries that benefit from being cached. Also, we have enough users to make caching worthwhile. Also, since our users are located around the globe, we’re concerned with minimizing latency when loading data. Because of that context, a cache makes sense.

Notably, this answer also hints at when a caching layer would not be appropriate. What if we only had a few hundred users (e.g. for an internal tool or a B2B product)? Then the caching layer becomes much harder to justify, and should probably be left out of the design.
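To make the caching example a bit more concrete, here is a minimal sketch of a cache-aside layer in Python. Everything in it is an assumption for illustration: `CachedQuery`, `popular_products`, and the 60-second TTL are hypothetical names and numbers, not part of any real system. It only shows the shape of the idea, which is that repeated requests for the same popular data skip the database.

```python
import time
from typing import Callable, Dict, Tuple


class CachedQuery:
    """Cache-aside wrapper around an expensive query (hypothetical helper)."""

    def __init__(self, run_query: Callable[[str], list], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._run_query = run_query      # the slow path, e.g. a database call
        self._ttl = ttl_seconds          # how long a cached result stays fresh
        self._clock = clock              # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, list]] = {}

    def get(self, key: str) -> list:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                # fresh entry: skip the database
        result = self._run_query(key)    # miss or stale: recompute and store
        self._entries[key] = (now, result)
        return result


db_calls = []


def popular_products(category: str) -> list:
    """Stand-in for a slow database query; records each call it receives."""
    db_calls.append(category)
    return [f"{category}-bestseller-1", f"{category}-bestseller-2"]


cache = CachedQuery(popular_products, ttl_seconds=60.0)
first = cache.get("books")
second = cache.get("books")   # served from the cache, no second database call
```

Note how the justification lives outside the code: the wrapper is only worth its extra moving parts if the context has many readers hitting the same fairly static queries.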

Imagine you’re pair programming with a coworker and they ask “Why did you use a class to represent DataLoaders, instead of a plain object/record?”

How do you respond?

In this example, we’re thinking at a more granular level, but we still need to justify our decision using the context:

“Well, I figured since we only want to allow a DataLoader configuration to be saved if it’s been validated, we should use a class to encapsulate the raw configuration object and enforce validation for updates. Also, we may want to override the way the DataLoader connects to the filesystem for tests, which we can easily do with classes using a dependency injection pattern.”

Perhaps your pair agrees with your decision, perhaps not. The point isn’t to always be right, but to be able to explain your why by relating it to concrete elements of the context.
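For illustration, the two justifications in that answer (validated updates and a swappable filesystem) might look something like the sketch below. This is a guess at what such a class could look like, not the actual code under discussion: the required keys, the `update` method, and the `read_file` parameter are all hypothetical.

```python
from typing import Callable, Dict


def read_from_disk(path: str) -> str:
    """Default reader: loads a real file from the filesystem."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


class DataLoader:
    """Wraps a raw config dict and only accepts validated updates."""

    REQUIRED_KEYS = ("path", "format")  # hypothetical schema

    def __init__(self, config: Dict[str, str],
                 read_file: Callable[[str], str] = read_from_disk) -> None:
        self._read_file = read_file  # injected so tests can avoid the real filesystem
        self._config = self._validated(config)

    @classmethod
    def _validated(cls, config: Dict[str, str]) -> Dict[str, str]:
        missing = [key for key in cls.REQUIRED_KEYS if not config.get(key)]
        if missing:
            raise ValueError(f"invalid DataLoader config, missing: {missing}")
        return dict(config)  # copy so callers cannot mutate the stored config

    def update(self, **changes: str) -> None:
        # Validate the merged result before replacing the stored config.
        self._config = self._validated({**self._config, **changes})

    def load(self) -> str:
        return self._read_file(self._config["path"])


# In a test, swap the filesystem for an in-memory fake.
fake_files = {"/data/users.csv": "id,name\n1,Ada\n"}
loader = DataLoader({"path": "/data/users.csv", "format": "csv"},
                    read_file=fake_files.__getitem__)
contents = loader.load()
```

A plain dict could hold the same data, but nothing would stop an invalid configuration from being saved. Whether that guarantee is worth a class is exactly the kind of thing the context decides.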

Now that we’ve formulated the First Principle, it becomes clear that an essential part of designing a software system is understanding, as deeply and thoroughly as possible, the context that the system needs to fit within.

However, the world is complicated, and software systems are too. To design a system that perfectly fits the context is impossible. In fact, just give up on even understanding the full context of any (useful) software system. The best we can hope to achieve is an approximate fit, at least to start with.

Because the idea of a “context” is so (deliberately) abstract, it can be useful to think about it by analogy. I’ll present a couple of analogies that should help build intuition around the idea.

The puzzle piece analogy

A useful analogy is that designing a software system is like cutting out a piece to fit into a complex puzzle.

Illustration of your software design fitting into a puzzle space.

If you don’t understand the puzzle (or at least the parts near the empty space where your piece will go), then you can’t cut the piece properly.

But because this is the real world, we’re not talking about a 2D puzzle here. The puzzle might actually have dozens or hundreds of dimensions, each with varying complexity and unique concerns.

Worse, the boundaries between each piece of the puzzle are actually fractal-esque. The more you zoom in, the more detail is revealed.

We can rough-cut our custom piece, but as soon as we try to fit it in the puzzle, these zoomed-in details become apparent. We adjust our piece to match, then fit it again, and yet more, even tinier zoomed-in details become obvious when the piece doesn’t quite fit.

(For the sake of the analogy, imagine we have no magnifier or microscope to see these tiny, zoomed-in details with our eye. We can only “see” them by trying to fit our puzzle piece into the space.)

This iterative pattern of producing better and better fits eventually bottoms out when you decide it’s good enough. As a rule of thumb, you will never achieve a perfect fit, and that’s OK.

If you think your fit is perfect, you probably haven’t actually tried to fit the piece into the puzzle yet (or you just pressed it in hard without paying attention to any tiny imperfections, like shipping a project when the requirements are met without understanding how the user responded to it).

As an aside, the need to iterate towards better fit is, in my opinion, the underlying insight behind the Agile Manifesto as it’s commonly interpreted, and also the reason the Waterfall method is frowned on.

However, even the design process itself should be adapted to fit the context. No design process is better or worse in absolute terms, just a better or worse fit for a particular context. So naively deciding that “waterfall == bad” is artificially limiting the options you have available as a designer to fit the context you’re working with.

Having said that, there are very few contexts where a “pure” waterfall method is truly suitable — in my experience, the best projects have at least some flexibility baked in, to respond to the fractal-esque details that get uncovered as the project proceeds.

The function optimization analogy

Thinking of a puzzle piece is fine, but there is another analogy that mathy people might like, which is the idea of “context” as a set of functions that the software design is trying to optimize.

The power of this analogy is that it reveals that, in reality, many of the context’s dimensions are not binary — not pass/fail — but are instead focused on the idea of minimizing or maximizing something:

Minimize budget, complexity, etc.
Maximize user satisfaction, profit, maintainability, etc.

If you think of each dimension $i$ of an $n$-dimensional context as a function $F_i(X)$, where $X$ is our software system, then the goal of the design is to pick the $X$ such that the outputs of $F_1(X), \dots, F_n(X)$ are as optimal as possible.

Don’t let this formulation trick you into thinking you can “solve” these problems with actual math, though. This is just a mental model; the actual functions are unknown (you can only probe them by sampling individual points), they’re not continuous, and they have a fractal-like complexity in the details.

Of course, there is no truly optimal solution to begin with. But if we tweak $X$ enough, we can end up with a solution that at least falls within an acceptable range for all of the dimensions.
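Purely as a toy illustration of this mental model, here is a Python sketch in which each context dimension is a scoring function with an acceptable range, and a design is “good enough” when it lands inside every range. All of the dimensions, formulas, and thresholds below are made up for the example; as noted above, the real functions can’t be written down like this.

```python
from typing import Callable, Dict, List, Tuple

Design = Dict[str, float]  # a candidate system X
# Each dimension is (F_i, lowest acceptable value, highest acceptable value).
Dimension = Tuple[Callable[[Design], float], float, float]

# Hypothetical dimensions with invented formulas and ranges.
context: Dict[str, Dimension] = {
    "monthly_cost_usd": (lambda x: 200 + 150 * x["cache_nodes"], 0, 1000),
    "p95_latency_ms": (lambda x: 900 / (1 + x["cache_nodes"]), 0, 300),
    "moving_parts": (lambda x: 3 + x["cache_nodes"], 0, 7),
}


def misfits(design: Design) -> List[str]:
    """Names of the dimensions where the design falls outside its acceptable range."""
    return [name for name, (score, low, high) in context.items()
            if not low <= score(design) <= high]


# "Tweak X": try a handful of candidate designs and keep the ones that fit everywhere.
candidates = [{"cache_nodes": n} for n in range(0, 7)]
acceptable = [c for c in candidates if not misfits(c)]
```

With these invented numbers, zero or one cache nodes are too slow, five or more are too expensive or too complex, and two to four nodes fall inside every range. No candidate is optimal on all dimensions at once, which is the point of the analogy.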

Context changes over time

One thing these analogies gloss over is that the real world is constantly changing. Every day, the context is shifting underneath you.

Customer expectations shift, relevant regulations or standards are revised, the stakeholders’ opinions change (or the stakeholders themselves change!) — the possibilities are endless. And the context doesn’t stop shifting when the project is shipped, either — it’s changing all the time, forever.

So, not only are you struggling to understand something too complicated for a person to ever fully understand, but that “thing” is itself in flux at all times. And you have to design a solution that doesn’t just fit the current context, as you understand it, but also fits potential future contexts as much as possible.

There is a terminological consideration here as well. In a sense, the need to fit possible future contexts is part of the present context. Since those future contexts don’t actually exist yet, there is nothing to actually fit against. Instead, we fit against the current desire to avoid future changes when the context inevitably shifts.

The fact that context changes is another essential reason to bake flexibility into your projects, especially if they take longer than a week or two. (I say “another,” because even if we assume context is fixed, the fact that fractal-like details get revealed as the project proceeds means we need flexibility baked in anyway.)

If we can’t design a system to fit perfectly against a present context, then it goes without saying there is no way for us to anticipate exactly how the context may shift in the future, either.

Building a system that’s flexible in the right ways, without being too flexible, is an art form that comes with experience, and is one of the strongest signals when evaluating an experienced engineer.

Applying the First Principle

You may wonder, how should this principle be applied in practice?

This is a difficult question, actually, because as soon as we start getting into concrete details, we’re assuming a certain context, and the practical application inevitably becomes coupled with that assumed context.

In other words, there is no such thing as a one-size-fits-all, concrete application of the First Principle. That’s why the principle is expressed in such abstract terms to begin with.

But setting that problem aside, I’ll give an example of how I think about applying this principle at each stage of designing a complex software system, for example when acting as a project engineering lead:

1. Begin by building a rough understanding of the context, before thinking about designing a solution.
   - Talk to stakeholders. Ask questions.
   - Research any hard requirements, rules and regulations that might apply. GDPR is a classic example, or HIPAA in the US healthcare space.
   - Understand which requirements are hard, and which are soft. HIPAA compliance is a hard requirement. Using the latest recommended tech stack is a “soft” requirement.
   - Consider engineering needs (observability, maintainability), not just business needs.
   - Consider the scale that the system will need to meet (# of users, amount of data, etc.)
   - Write a document that describes the context as you see it, and share it with stakeholders before getting too far into “solutioning.”
2. Write a document describing a solution that fits the context as you currently understand it after step (1).
   - Send to stakeholders, discuss with engineers, gather feedback.
   - Be aware that this solution will not fit the context at the “fractal detail level” yet, because you can’t even see those details at this point. So, there is a certain level of granularity that it’s just not worthwhile including in the initial design document. Understanding exactly where this threshold is… well, that takes a lot of experience.
   - If there is no apparent solution that fits the full context, you may need to cut scope, discard soft requirements, push back on deadlines, etc., always in close communication with stakeholders.
3. While the team is implementing the solution, pay attention to pain points and mis-fit spots. These are the fractal details that you couldn’t see until the rubber hit the road.
   - Adjust the solution as needed (while actively communicating with stakeholders and other engineers, and keeping in mind timeline and budget constraints).
4. Once the solution is “finished,” give it a once-over from a user’s perspective. Almost certainly, you’ll uncover yet more fractal details where the solution doesn’t quite fit. A confusing UI, poor localization, missing an obvious feature, irritating latency, etc.
5. Finally, get feedback from actual users (and other stakeholders). Do this at a point where you still have time (say, a couple weeks) to resolve any further issues with the solution’s fit.
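The hard-versus-soft distinction from the first step can be pictured with a small Python sketch. This is only one hypothetical way to model it: the `Requirement` and `Proposal` types, the `review` function, and the scale requirement are invented for the example. Only the HIPAA and tech-stack examples come from the list above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Requirement:
    name: str
    hard: bool  # True: non-negotiable; False: can be traded away with stakeholders


@dataclass(frozen=True)
class Proposal:
    name: str
    satisfies: FrozenSet[str]  # names of the requirements this design meets


def review(proposal: Proposal, context: List[Requirement]) -> Tuple[List[str], List[str]]:
    """Return (unmet hard requirements, unmet soft requirements)."""
    unmet = [r for r in context if r.name not in proposal.satisfies]
    blockers = [r.name for r in unmet if r.hard]
    negotiable = [r.name for r in unmet if not r.hard]
    return blockers, negotiable


context = [
    Requirement("HIPAA compliance", hard=True),
    Requirement("supports expected scale", hard=True),  # hypothetical example
    Requirement("latest recommended tech stack", hard=False),
]
proposal = Proposal(
    "reuse existing stack",
    frozenset({"HIPAA compliance", "supports expected scale"}),
)
blockers, negotiable = review(proposal, context)
```

Here the proposal has no blockers and one negotiable gap, so the conversation with stakeholders is about whether the soft requirement can be dropped, not about whether the design is viable.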

Not every project will smoothly progress according to the above plan. Nor should we expect them to. Each project is unique and warrants a unique approach.

That’s why the most important thing is to think hard about how to apply the First Principle to each specific project.

Possible future posts

In future posts I hope to get into detail about the concrete tools that we as software engineers and system designers have available when building software systems for a context.

For example:

Building abstractions that fit the context.
Organizing code (modularization) in a way that reveals the context fit and minimizes coupling between context dimensions.
Writing useful design documents at the right level of detail.
The benefits and drawbacks of patterns to handle future context changes at every abstraction level, e.g. higher-order functions, hooks, event buses, microservices, etc.

The first principle of software design