The Test Ascent

Evolving Your Skills

05 June 2020

When trying to get better at testing generally, developers tend to cry out for guidelines and goals. How do I assess whether I’m improving or not? How do I make recommendations to my teammates? How do I know I’ve tested enough, or not enough? How do I balance the expense of taking time to write tests against time I could use to do other things?

And the industry has jumped to provide answers, especially in pre-digested formats that provide a sense of certainty: if you follow this cookbook precisely, you have done it correctly, and you lose points for deviating. These guidelines have value! But, as they currently exist, they usually don’t tell the whole story. This is especially true of the most buzzworded guidelines, as becoming popular will naturally trend them toward degenerate understandings not intended by their original authors - for example, many developers understand BDD (Behavior Driven Development) to be about hierarchical testing (nested ‘describes’) or mere phrasing (‘Given’, ‘When’, ‘Then’), which is not what its proponents choose to emphasize.

As such, the following is not particularly meant to be a criticism of various preferred testing super-structures, but rather an expression of the goals and principles that undergird my personal approach to testing. I suspect that most testing architectures can be bent to meet my needs anyway… which is easy for me to say, as, despite strong buzzword game, most of these architectures are fundamentally describing the same underlying themes.

To start, I believe very strongly that any testing guidelines worth their salt need to be completely accessible to developers at every skill level. That is to say, the bar for entry to writing a valuable test must be extremely low, only requiring being able to write basic functions in a language.

Why? Well, because writing a test is a primary expression of intent. I need my most junior developers to start practicing expressing how they want things to work, and when I come across their code I need these clues to understand what their code-under-test is supposed to do… after all, by definition these developers will be the worst at expressing intent in code itself!

As such, I answer the question “what kind of test should I write” with “the test that best reflects how you think of the system”. For junior developers, this will almost always become a test that targets a single, small function (after they spend enough time clarifying their thinking in order to write a coherent test). For developers of maturity, this may mean writing a test that, instead of calling a function, calls an endpoint (not always, of course). The only wrong answer here is writing a test that does NOT reflect how you think of the system. After all, if you are writing a test that does not represent your point of view, it is unlikely you will express it well.
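
To make that concrete, here is a minimal sketch of what those two shapes of test can look like. I’m assuming Jest-style TypeScript and supertest purely for illustration; slugify, app, and the /articles endpoint are hypothetical names, not something from any particular project.

    // Assumed dependencies, for illustration only: Jest globals and supertest.
    import request from "supertest";
    import { slugify } from "./slugify"; // hypothetical small function under test
    import { app } from "./app";         // hypothetical HTTP application

    // A test shaped the way a junior developer often thinks: one small function.
    test("slugify lowercases and hyphenates a title", () => {
      expect(slugify("Hello World")).toBe("hello-world");
    });

    // A test shaped the way a developer who thinks in endpoints might write it.
    test("POST /articles stores the article and returns its slug", async () => {
      const response = await request(app)
        .post("/articles")
        .send({ title: "Hello World", body: "..." });

      expect(response.status).toBe(201);
      expect(response.body.slug).toBe("hello-world");
    });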

This approach to testing works well with test-driving (the technique that requires tests to be written first). You think about the problem, you write the test based on how you’re thinking about the problem, and then you implement your solution to the test specification. I’ll note that this works well for testing at every level of a system.
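
As a minimal sketch of that ordering (hypothetical Jest-style TypeScript again; formatName does not exist yet when the test is written):

    // The test is written first, against a function that does not exist yet.
    import { formatName } from "./formatName"; // hypothetical module

    test("formatName joins given and family names", () => {
      expect(formatName({ given: "Ada", family: "Lovelace" })).toBe("Ada Lovelace");
    });

    // Only once this test fails do you write the simplest implementation that passes:
    // export function formatName(n: { given: string; family: string }): string {
    //   return `${n.given} ${n.family}`;
    // }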

Over time, mature developers will note that, say, testing some things at the endpoint boundary feels strange and suboptimal. For example, take a complex set of rules about what constitutes a valid username. Testing the many scenarios involved in these rules would both read somewhat clumsily at the endpoint boundary and occupy precious processor time. As such, a mature developer will test this at two boundaries: one that proves the endpoint boundary uses a validation system, and another that tests the validation system itself. The latter can add a multitude of test scenarios at relatively low cost.
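
A sketch of what those two boundaries might look like, assuming a hypothetical isValidUsername validator, a /users endpoint, and Jest-style TypeScript with supertest:

    import request from "supertest";
    import { isValidUsername } from "./usernameValidation"; // hypothetical validator
    import { app } from "./app";                            // hypothetical HTTP app

    // Boundary 1: a single endpoint-level test proving the endpoint defers to the validator.
    test("POST /users rejects an invalid username with a 422", async () => {
      const response = await request(app).post("/users").send({ username: "!!" });
      expect(response.status).toBe(422);
    });

    // Boundary 2: the validator itself, where many scenarios are cheap to add.
    const cases: Array<[string, boolean]> = [
      ["ada", true],          // simple lowercase name
      ["ada_lovelace", true], // underscores allowed
      ["", false],            // empty
      ["ab", false],          // too short
      ["has spaces", false],  // whitespace rejected
      ["!!", false],          // punctuation rejected
    ];

    describe("isValidUsername", () => {
      test.each(cases)("isValidUsername(%p) is %p", (username, expected) => {
        expect(isValidUsername(username)).toBe(expected);
      });
    });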

And so, improving your ability to test is directly related to improving your expression of boundaries within the system. That is to say, improving the system architecture. Testing, by its nature, creates bounded contexts, and as your thinking about the system becomes cleaner, “the test that reflects how you think of the system” will naturally map to your system’s architecture. Thinking about how things should be tested will encourage you to create more consistent and meaningful boundaries within your system as well, guiding you towards better architectural practices over time.

When assessing the quality of a test suite with a team, I have one question that gets right to brass tacks:

How would you feel if every change made to your system was immediately deployed to all of your users?

It is an emotional metric, and the question is most interesting when posed to the technical team (not to the wide web of other roles surrounding them).

If the team expresses extremely high confidence that everything would work well (and the team is sufficiently mature to understand the gravity of the proposal), then you’ve probably got a strong level of testing… and likely a team that understands its value, because getting to this point is no accident.

More likely though, this question will prompt various expressions of cringing, possible wincing, and the occasional bout of tears.

So the question at that point becomes “what additional automated testing would it take for your team to be comfortable with that level of deployment?” There are many paths to take that can lead a team to that level of trust, but that is the ultimate goal - to have a test suite that is trustworthy, maintainable, and fast enough to provide that much trust.

This goal will push a team past its self-imposed limits (perhaps based on cookbook recommendations about testing) and make them take seriously testing the parts of the system that they’ve ignored or declared unimportant. And taking those things seriously will encourage them to tighten and clarify their architecture to make the testing boundaries themselves safer and more trustworthy. A team will need to develop practices that they trust for testing their user interfaces, domain layers, and persistence… and as such, their thinking will grow and become more inclusive of these scenarios. Hitting performance bottlenecks will prompt stronger modularization, caching, and strategic test-doubling… but all within that #1 priority of keeping the test suites trustworthy.
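
As one sketch of strategic test-doubling that keeps trustworthiness as the priority: a persistence boundary can be doubled with an in-memory implementation, provided the double is held to the same contract as the real thing. The UserRepository interface and classes below are hypothetical illustrations, not a prescription.

    interface User {
      id: string;
      username: string;
    }

    // An explicit boundary: the rest of the system depends on this interface,
    // not on the concrete database-backed implementation.
    interface UserRepository {
      save(user: User): Promise<void>;
      findByUsername(username: string): Promise<User | null>;
    }

    // The double honours the same contract as the real repository, so
    // domain-level tests stay fast without quietly losing meaning.
    class InMemoryUserRepository implements UserRepository {
      private users = new Map<string, User>();

      async save(user: User): Promise<void> {
        this.users.set(user.username, user);
      }

      async findByUsername(username: string): Promise<User | null> {
        return this.users.get(username) ?? null;
      }
    }

    // A shared contract test keeps the double trustworthy: the same assertions
    // run against the in-memory double here and, in a slower suite, against the
    // real database-backed repository.
    function repositoryContract(makeRepo: () => UserRepository) {
      test("a saved user can be found by username", async () => {
        const repo = makeRepo();
        await repo.save({ id: "1", username: "ada" });
        expect(await repo.findByUsername("ada")).toEqual({ id: "1", username: "ada" });
      });
    }

    describe("InMemoryUserRepository", () => {
      repositoryContract(() => new InMemoryUserRepository());
    });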

Using test-driving as a strict requirement is a simpler, more everyday technique for creating test sufficiency. As long as your team adheres to the rule “add no feature that is not required by a test”, then definitionally you know that the features intended by the programmer are demonstrated by a test… any features that exist that have no associated test are, by definition, unintended, and therefore subject to being disabled, broken, or removed without consequence. Following this rule strictly, along with evolving these tests as the application’s boundaries strengthen and mature, will set you on solid ground for test sufficiency.

Finally, there is always a balancing act involved in writing automated tests - the cost of writing the automated test vs the cost of manual feature validation (vs the cost of no validation, a valid choice!). Like many things in life, the cost of the first test at a given systemic level will be higher, and over time, the cost of adding new tests should decline due to system maturation and capitalization on the previous infrastructural investments. So in order to make informed decisions about testing, you have to estimate each of these costs. How long will it take to set up the test scaffolds? How expensive will it be to manually test a feature (keeping in mind both the hourly cost and the cost of a slower integration and release process)? How expensive will broken features be if they make it to users?
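
A back-of-the-envelope comparison might look like the sketch below; every number in it is a hypothetical assumption, included only to show the shape of the estimate.

    // All figures are made-up assumptions for illustration, not measured data.
    const scaffoldingHours = 16;     // standing up the first endpoint-level test
    const hoursPerNewTest = 0.5;     // each additional test once the scaffold exists
    const scenarios = 12;            // scenarios worth validating for this feature
    const hoursPerManualCheck = 0.25;
    const releasesPerYear = 50;      // each release re-validates every scenario by hand

    const automatedCost = scaffoldingHours + scenarios * hoursPerNewTest;        // 22 hours, paid once
    const manualCostPerYear = scenarios * hoursPerManualCheck * releasesPerYear; // 150 hours, paid every year

    console.log({ automatedCost, manualCostPerYear });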

There are cost curves here that, depending on your situation, may suggest different choices. High testing-infrastructure setup costs (usually associated with walled-garden platforms) may be prohibitive for testing at the end-to-end level… especially if development of the application is expected to be short-lived and will never recoup those costs (and, of course, we should be realistic in our estimations of the development cycle of an app, and not merely accept the intentions of stakeholders as if they were estimates).

What is to be avoided, of course, is delaying an investment in testing that is highly likely to pay off later… the more development a system undergoes without creating an automated statement of intent, the more difficult it will be to backfill that intent after the fact. There really is no substitute for a team learning and maturing a test suite themselves over time, and transitioning a build process that includes manual commodity-level QA testing to a more automated process carries substantial transition costs with long tails. There is much to say on this subject, but at its root, a software build process is a cultural artifact, and large cultural transitions are painful, especially when they involve a job being manually performed by a human. So be sure to include these transition costs in your back-of-the-envelope cost sizing.

Alright, some rapid-fire questions and responses to wrap this whole thing up:

  • What about BDD and ATDD (Acceptance Test Driven Development)?

It is an unalloyed good to make communication between the people who best know the user’s requirements and the development team as smooth as possible. That said, it is the responsibility of the development team to translate the human-business language into something mechanically reliable. I don’t think any of my answers above preclude BDD or ATDD, but I do suggest you be careful not to make perfect the enemy of the good… do not create bottlenecks or wait for perfect requirements in order to start testing. With context and wisdom, the lessons BDD and ATDD are trying to teach fit well with what I’ve described.

  • What about the test pyramid?

It’s important to remember that the test pyramid is an extremely loose metaphor that attempts to describe test expectations for all circumstances. It can be useful as a discussion prompt when a team has a distrusted test process and is looking for options that will create trust. It is not terribly useful as a way of assessing the test-health of a system without further context. That is to say, if the team has high trust levels for their test suite, having a test “shape” that is “wrong” compared to the test pyramid is utterly irrelevant. Confusion about how to correctly apply the “test pyramid” concept has persisted as time goes on, so I personally have difficulty recommending it. Even with those reservations, it can be useful to help construct a team “ultimate goal” when it is expected to be a long journey toward building a trusted test process.

  • When should I be concerned about test performance?

You should be concerned about test performance at the same level as you are concerned about readability. Which is to say, it is a problem you will never solve. Maintaining, improving, and further automating your test suites is important. When done well, the features you build into the system for testing will also make other kinds of testing easier (automated scenario setup for manual testing, simulator backends, general application performance problems). Learn the skill of grooming and moving tests between layers without sacrificing the team’s confidence (which means communicate what you’re doing with the team). Remember, if your test performance crosses a few thresholds, then maintainability and trust plummet. But test performance is a problem to worry about a tiny amount every day - if you ever have to take a work week to just focus on fixing outstanding test performance issues, you’ve really failed at day-to-day maintenance.

  • Should I use x,y,z test framework, assertion library, mocking framework?

I tend to rate most testing libraries based on cost of entry and cost of exit - if it’ll be nightmarish to transition to a different one, the features provided had better be really hot stuff. Ultimately this boils down to what my team is capable of and interested in maintaining and investing in… having team buy-in is more important in most cases than using the right tool.

  • But what about automated exploratory testing? Perturbation models? Chaos monkeys? Test coverage metrics?

These are fine tools! But there is a difference in purpose between these tools and testing as an expression of intent. Using these tools to automatically find bugs is excellent, and, having found a bug, writing a test that permanently eliminates the problem is good. That said, the purpose of a test suite as an expression of intent is NOT to have perfect, impenetrable test coverage of the source code. Rather, the goal is to eliminate problems derived from errors that human programmers are likely to introduce, and to provide ongoing requirements for the programmer-of-the-future. Trying to capture every conceivable scenario in an automated test will substantially weaken the suite’s value to that future programmer, and may result in the suite being too fragile to be maintained, ultimately leading to it being discarded. Again - do not make the perfect the enemy of the good.