7.2 Nested Lists: A Recursive Data Structure

7.2 Nested Lists: A Recursive Data Structure#

In the previous section, we ended by articulating a fundamental limitation of our sum_list functions: they cannot handle heterogeneous nested lists like

[[1, [2]], [[[3]]], 4, [[5, 6], [[[7]]]]]

In this section, we’ll overcome this limitation by using a new strategy: breaking down an object or problem into smaller instances with the same structure as the original.

To make this more concrete, let’s first identify how the object we are working with, a nested list, can be broken down into smaller instances with the same structure. We will define a new structure that generalizes the idea of “list of lists of lists of … of lists of ints”. We define a nested list as one of two types of values:

  • A single integer.

  • A list of other nested lists ([lst_1, lst_2, ..., lst_n]). Each lst_i is called a sub-nested-list of the outer list.

    We allow n == 0, which represents an empty list—this counts as a “nested list”.

This is a recursive definition: it defines nested lists in terms of other nested lists.[1] It may seem a bit odd that we include “single integers” as nested lists; after all, isinstance(3, list) is False in Python! As we’ll see a few times in this chapter, it is very convenient to include this part of our recursive definition, and makes both the rest of the definition and the subsequent code we’ll write much more elegant.

The depth of a nested list is the maximum number of times a list is nested inside other lists, including the outermost set of square brackets. For example:

  • The depth of a single integer like 10 is 0, since there are no lists.

  • The depth of [1, 2, 3] is 1. The depth of the empty list [] is also 1.

  • The depth of [1, [2, [3, 4], 5], [6, 7], 8] is 3.

Summing up a nested list#

We can use this definition to guide the design of a function that computes the sum on a nested list of numbers:

def sum_nested(obj: int | list) -> int:
    """Return the sum of the numbers in a nested list <obj>.
    """
    if isinstance(obj, int):
        # obj is an integer
        return obj
    else:
        # obj is a list of nested lists: [lst_1, ..., lst_n]
        s = 0
        for sublist in obj:
            # each sublist is a nested list
            s += sum_nested(sublist)
        return s

This is our first example of a recursive function: a function that calls itself in its body. Just as we defined a recursive data structure—nested lists—we have now defined a recursive function that operates on nested lists. Notice how the structure of the data informs the structure of the code: just as the definition of nested lists separates integers and lists of nested lists into two cases, so too does the function sum_nested. And as the recursive part of the definition involves a list of nested lists, our code involves a loop over a list, binds sublist to each inner nested list one at a time, and calls sum_nested on it to compute the sum.

We call the case where obj is an integer the base case of the code: implementing the function’s behaviour on this type of input should be very straightforward, and not involve any recursion. The other case, in which obj is a list, is called the recursive case: solving the problem in this case requires decomposing the input into smaller nested lists, and calling sum_nested on these individually to solve the problem. The example above is the simplest type of recursive function, one that has just one base case and one recursive case. Later on, we’ll look at more complex recursive data structures and functions.