2.3 Code Coverage

In the previous section, we learned about a set of strategies for choosing test cases based on properties of the inputs to the function being tested. This approach is a form of “black box testing”, which means that it does not take into account how the function has been implemented. One of the strengths of black box testing is that we can develop test cases based just on a function’s description and its inputs, without having to worry about how it has been implemented.

However, for more complex function implementations, using just black box testing may miss some subtle or very specific cases in the code that we’ve written. In this section, we’ll introduce a test concept called code coverage, which is known as a “white box” testing principle, because the concept is fundamentally about the function’s implementation. To be clear, the concept of code coverage isn’t meant to replace or negate the strategies we learned in the previous section! Rather, we hope that this idea will become a new tool in your “testing toolbox” that you can use to help design test cases throughout this course.

What is code coverage?

Code coverage is a measure of the number of lines of code in a program that were executed at least once when a test suite is run.[1] This measure is often reported as a percentage, for example, “90% of lines were covered by this test suite.” Now, this may seem like a fairly obvious measure: shouldn’t we expect to test every line of code that we write, so that the code coverage is always 100%? Well, yes—but if we aren’t careful in choosing test cases, it is possible to miss out on some lines, especially as our function bodies get more complex.
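To make the percentage concrete, here is a minimal sketch of the arithmetic. The function name coverage_percent and the idea of representing lines as sets of line numbers are our own assumptions for illustration, not part of any library.

```python
def coverage_percent(executable_lines: set[int], executed_lines: set[int]) -> float:
    """Return the percentage of <executable_lines> that appear in <executed_lines>.

    Hypothetical helper for illustration only.
    """
    if not executable_lines:
        # Assumption: a program with nothing to execute counts as fully covered.
        return 100.0

    covered = executable_lines & executed_lines
    return 100 * len(covered) / len(executable_lines)


# A program with 10 executable lines, of which lines 1 to 9 ran during testing.
print(coverage_percent(set(range(1, 11)), set(range(1, 10))))  # prints 90.0
```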

Let’s consider the following example.

def shortest_string(strings: list[str]) -> str | None:
    """Return the shortest string in <strings>.

    If there is a tie, return the string that is considered smaller
    when comparing using <.

    If <strings> is empty, return None.
    """
    if strings == []:
        return None

    shortest = strings[0]  # Set the accumulator to be the first string
    for string in strings:
        if len(string) < len(shortest):
            shortest = string
        elif len(string) == len(shortest) and string < shortest:
            shortest = string

    return shortest

If we focused only on properties of the input strings, we might identify the following cases:

  • strings is empty

  • strings is non-empty and has no ties for the shortest string

  • strings is non-empty and has a tie for the shortest string

Here are three test cases for this function:

def test_empty() -> None:
    """Test shortest_string on an empty list."""
    actual = shortest_string([])
    expected = None

    assert actual == expected


def test_no_ties() -> None:
    """Test shortest_string on a non-empty list with no ties for shortest length."""
    actual = shortest_string(['cat', 'a', 'computer'])
    expected = 'a'

    assert actual == expected


def test_ties() -> None:
    """Test shortest_string on a non-empty list with a tie for shortest length."""
    actual = shortest_string(['cat', 'a', 'b'])
    expected = 'a'

    assert actual == expected

While these test cases certainly cover different possibilities, they are not yet complete. Take a moment to see if you can spot why.

With all three of these test cases, the second shortest = string statement, inside the elif branch, never executes! To see evidence of this, try modifying this line of code so there’s an error (e.g., change string to strng) and run the three tests. They’ll still all pass!

Improving test code coverage

Once we’ve identified a line of code that isn’t being executed, how do we fix it? We need to study the code to figure out what kind of input we can give to shortest_string to make that line run. Let’s take a closer look at that for loop:

    shortest = strings[0]  # Set the accumulator to be the first string
    for string in strings:
        if len(string) < len(shortest):
            shortest = string
        elif len(string) == len(shortest) and string < shortest:
            shortest = string

In order for the elif branch to execute, we need there to be a tie (so len(string) == len(shortest)), but also for the current string, string, to be less than shortest.[2] This is why our third test case input ['cat', 'a', 'b'], which did have a tie in shortest length, didn’t trigger this code: 'b' is not less than 'a'. Here is a new test case that will cause the elif branch to execute:

def test_ties_2() -> None:
    """Test shortest_string on a non-empty list with ties for shortest length,
    where the smaller of the tied strings comes later in the list."""
    actual = shortest_string(['cat', 'b', 'a'])
    expected = 'a'

    assert actual == expected

Now, there is another way we could have come up with this additional test case. From the previous chapter, we learned that list ordering is often a useful property to vary across test cases, and the only difference in our new test case is the relative order of the 'a' and 'b'. However, considering code coverage gave us an alternate way of discovering a gap in our test cases, and prompted us to think more deeply about the function’s code.

Running tests with code coverage

Code coverage is not just a theoretical concept: there are tools that work alongside testing libraries like pytest to track code coverage automatically when tests are run.

Here we use the coverage library, which requires the following steps:

  1. Create a Coverage object

  2. Start recording coverage

  3. Run the code (e.g. the test suite) that we want to check the coverage of

  4. Stop recording coverage

  5. Save the coverage results

  6. Report the results

For example, suppose our code is in a file called my_functions.py, and all four of our tests are in a file called test_my_functions.py. Then we can execute these tests by adding the following main block to my_functions.py:

if __name__ == '__main__':  # pragma: no cover
    import pytest
    import coverage

    # This creates a Coverage() object and starts recording information
    # about which lines have been run in my_functions.py
    cov = coverage.Coverage(include=['my_functions.py'])
    cov.start()

    # This line runs the pytest cases in test_my_functions.py
    pytest.main(['test_my_functions.py'])

    # These lines stop recording information and save it
    cov.stop()
    cov.save()

    # The line below will print the report to the Python Console.
    cov.report()

    # The line below will generate a folder called htmlcov
    # Open the index.html page to see the coverage report. You can
    # click on the "my_functions.py" module there to see
    # which lines might be missing.
    cov.html_report()

When we run this, we’ll see the standard pytest output, and a new folder called htmlcov will also be created. If we open the file index.html inside this folder, we’ll see a webpage with the following information:

Screenshot of code coverage webpage, index.html

If we then click on the my_functions.py link, we’ll be taken to a new page that shows us exactly which lines of code were missed when running our tests:

Screenshot of code coverage webpage, my_functions.py

Note: with this way of running pytest, we have excluded the main block from the code coverage analysis by using the special comment # pragma: no cover. If you remove that comment, the lines of the main block will be reported as missing. In any case, what’s important is that every line of code in the body of the function is run at least once!

As an exercise, try commenting out the additional test we added above, then run my_functions.py again to see which line was not covered previously.

The limits of code coverage

While code coverage is a useful metric for evaluating test cases, attaining “100% code coverage” should not be confused with having a high-quality test suite. In practice, it may be very cumbersome to execute every line of a program’s source code through automated testing alone, such as when working with algorithms involving randomness or programs involving complex interactions between computer systems and/or human users. Attaining 100% code coverage does not necessarily mean that the test suite covers all possible cases, or that the function’s implementation is error-free. Just because each line of code is executed at least once doesn’t mean that different possible combinations of lines of code all execute correctly.[3]

So as you proceed in CSC148, please keep code coverage in mind when designing your test cases, but don’t treat it as the one and only factor when testing. There are many strategies and considerations you’ll use to design your test cases, and code coverage is just one of them.