Chapter 5 Iterating with Loops

One of the main benefits of using computers to perform tasks is that computers never get tired or bored, and so can do the same thing over and over and over and over and over and over again. This is a process known as iteration. Iteration represents another form of control flow (similar to conditionals), and is specified in a program using a set of statements called loops. In this chapter, you will learn the basics of writing loops to perform iteration; as well as how to use loops to iterate through sequences such as lists.

5.1 For Loops

Loops are control structures (similar to if statements) that allow you to perform iteration. These statements specify that a block of code should be executed repeatedly—the block is executed statement by statement, and then the flow of control “loops” back to the top to execute the statements again. Programming languages such as Python support a number of different kinds of loops, which differ primarily in how they determine whether or not to repeat the block (though this difference is reflected in the syntax).

In Python, the most common loop used for iteration is known as a for loop, which is usually used to repeat a block once for each value in a sequence. A Python for loop has the following structure:

for LOCAL_VARIABLE in SEQUENCE:
    # lines of code to run for each element in the sequence

For example, you could have a for loop that iterates over a range of numbers:

numbers_list = [1,2,3,4,5] # a list of numbers
for number in numbers_list:
    print(number)

When the Python interpreter encounters a for loop, it begins the following process:

  1. It takes the first element of the sequence (e.g., of numbers_list‐so a 1) and assigns that to the local variable (e.g., number). Thus implicitly the interpreter is executing the code number = numbers_list[0]
  2. It then executes the block (the indented code after the colon). In this case, it prints out the number variable—which is currently 1.
  3. At the end of the block, the interpreter loops back around to the top of the loop.
  4. It will take the next element of the sequence and assign that to the local variable. So implicitly, the interpreter executes number = numbers_list[1].
  5. It then again executes the block of code, in this case printing out the number variable—which now has a value of 2!
  6. At the end of the block, the interpreter loops back to the top of the loop.
  7. … and the cycle continues until it runs out of elements in the sequence.

Thus in the end this loop prints out the values 1 to 5.

A for loop thus performs the block once for each element in the sequence, assigning that element temporarily to a local variable that you provide. So a for loop can be read as for each item (which I’m calling local_variable) that is in the sequence, do the block”.

Be careful with variable names when using loops! Name your list or sequences using plurals (e.g., numbers_list) and name your local variable using a singular (e.g., number). This will help you to keep track whether you are referring to a lot of values (the list) or just a single value from that.

The block in the for loop (also called the body of the loop) is a block of code just like the ones used in functions or if statements. That means that they can include multiple lines of code, including other control statements. As with functions, nested control statements such as if are indented an extra step (4 spaces or 1 tab):

# A list of numbers
numbers = [3.98, 8, 10.8, 3.27, 5.21]

# A for loop that includes multiple lines of code, including an if statement
for number in numbers:
    if number > 5:
        print str(number) + " is big"
    else:
        print str(number) + " is small"

5.1.1 Variables and Loops

When writing loops, you need to remember that the body of the loop will be executed repeatedly—it’s almost like those lines of code are being “copy-and-pasted” over and over again. This means that you need to be careful about assigning variables inside of loops, since subsequent iterations may cause that variable to be “re-assigned”.

For example consider the below code that wants to count the number of large numbers in a list:

# A list of numbers
numbers = [5.21, 8, 10.8, 3.27, 3.98]

# A loop that tries to "tally" how many big numbers are in the list
# THIS WILL NOT WORK!
for number in numbers:
    big_number_tally = 0
    if number > 5:
        big_number_tally = big_number_tally + 1
    print(big_number_tally)

The first statement of the loop body is to assign a 0 to the big_number_tally variable. So the first time through the loop (when the value 5.21 is assigned to number), the number > 5 condition will be true, causing the big_nubmer_tally to increase by 1 (from 0 to 1). However, the second time through the loop (when 8 is assigned to number), the big_number_tally is reset to 0—in effect, the algorithm “forgets” that it had already counted one big number and tries to start over! Moreover, because the print() statement is inside the loop, it means that the tally will be printed out for each element in the list: so it will print out the “tally” 5 times!

Instead, if you want a variable to “save” its value across loop executions, you’ll need to declare it outside of the loop (before the iteration starts).

# A list of numbers
numbers = [5.21, 8, 10.8, 3.27, 3.98]

# Initialize variables BEFORE (outside) the loop!
big_number_tally = 0

# A loop that will tally how many big numbers are in the list. This will work!
for number in numbers:
    if number > 5:
        big_number_tally = big_number_tally + 1

# Use result AFTER (outside) the loop!
print(big_number_tally)

5.2 Lists and Loops

One of the most common places to use a loop is to iterate through a list—do run some code once for each element of that list. This kind of process can be seen in the previous examples, with more specific algorithms discussed in later chapters. However, there are a few points to note about using loops with lists.

The simplest way to loop through a list is to use the structure noted previously: for local_element in a_list:. However, while this structure will let you access each value in the list, it doesn’t help you know which index those values are at. In effect, each run through the loop is “independent”—there is nothing that records which element you’re at, what the previous or next elements may be, or how many elements there are total. There is no information about where the value is positioned in the list; you only know what the value is.

If you need to know anything about the value’s position in the list, a better structure is to loop through the indices of the list (rather than through it’s values).

  • To go back to the “hotel” metaphor: you loop through the room numbers, not through the room occupants!

In order to do this, you’ll need to get a sequence of indices; the most idiomatic way to do this is to use a range:

list_indices = range(len(my_list))

The range() function will produce a sequence of numbers from 0 to the given argument—and that’s exactly what the indices to a list would be! Because the “last” index of a list is one less than the length and the ending value of range() is exclusive, using len(my_list) as the argument to range will produce a sequence with the right number of values.

Once you have this sequence of indices, you can loop through them, accessing each list value at its room number (think: going to each hotel room number and knocking on the door):

# A list of numbers
numbers = [5.21, 8, 10.8, 3.27, 3.98]

# Loop through all of the indices of the list
for current_index in range(len(numbers)):
    # prints e.g., "Value at index 0 is 5.21"
    print("Value at index", current_index, "is", numbers[current_index])

Note that you use current_index between the brackets (in numbers[current_index]) in order to access the value at that (variable) index.

Be sure to remember the range() function! The structure is for current_index in range(len(my_list)), not for current_index in len(my_list). The later won’t work; you can’t iterate through a number (which is what len(my_list) produces).

It’s common for programmers to use the variable i (or sometimes idx) as a shorthand for more descriptive current_index. This is one of the few places it’s acceptable to use a single-letter variable name. By convention, i is a variable that refers to the “current index” of a loop:

# Loop through all of the indices of the list
for i in range(len(numbers)):
    print("Value at index", i, "is", numbers[i])

You’ll need to loop through the list indices if you want to modify the contents of the list—without those “room numbers” you won’t be able to change who lives in the hotel!

# A list of numbers
numbers = [5.21, 8, 10.8, 3.27, 3.98]

# Loop through all of the indices of the list
for i in range(len(numbers)):
    rounded_value = round(numbers[i]) # get a round version of that value
    numbers[i] = rounded_value # put that rounded value back in the list

# Print the (now-rounded) list
print(numbers)  # [5, 8, 11, 3, 4]

Note that this process of applying some change to each element in a list is known as a mapping (each value “maps” or goes to some transformed version of itself). We will discuss mapping more in a later chapter.

There is another, shortcut syntax for looping through lists (to create other lists) called list comprehensions. These are discussed in more detail in below.

5.3 Nested Loops

Remember that loops can contain other control structures—including other loops! These are referred to as “nested loops”:

for letter in ['a', 'b', 'c']: # "outer" loop through list of letters
    for number in [1, 2, 3]: # "inner" loop through list of numbers
        print(letter, number) # prints e.g., "a 1"

This code prints the following:

a 1
a 2
a 3
b 1
b 2
b 3
c 1
c 2
c 3

Notice the order in which they are printed. The interpreter does the following:

  1. It begins the first loop (called the “outer” loop), assigning 'a' to letter.
  2. It then begins the second loop (called the “inner” loop), assigning 1 to number.
  3. It prints out the current letter and number (a 1)
  4. It then goes back to the top of the inner loop and assigns 2 to number, then prints out the next combo (a 2)
  5. The interpreter continues the inner loop until it is finished. Only once that block of code is done does it get to the “end” of the outer loop.
  6. Once it has gotten to the end of the outer loop, it goes back to the top of the outer loop and assigns b to letter. Then the process continues.

In short: it completes the inner loop before going to the next run of the outer loop (which involves doing the inner loop again).

Just as the most common place to use a loop is with a list, the most common place to use a nested loop is with a nested list:

# A simple table of values
table = [ ['a1','a2','a3','a4'],
          ['b1','b2','b3','b4'],
          ['c1','c2','c3','c4'] ]

for row_index in range(len(table)): # go through each row (with index)
    for col_index in range(len(table[row_index])) # go through each col of that row (with index)
        print(table[row_index][col_index]) # access element at specified row and column

Because table is a list of rows, you can iterate through that list (the outer loop, using row_index). And then for each row, because that row is itself a list, you can iterate through its entries (the inner loop, using col_index). And in the end, you can access individual elements of the table using bracket notation.

The best way to work about nested loops (or loops in general) is to walk through them step by step, in order to trace what is happening at each iteration. Drawing out a diagram or writing down sample output is a great way of helping you track what is going on!

5.4 List Comprehensions

The previous “block” syntax for writing for loops is the most basic and flexible approach, and similar syntax is used across a wide variety of programming languages. But Python also includes a shortcut syntax for writing these same loops, called List Comprehensions. A list comprehension is a special syntax for looping through lists (to create other lists). It is considered a more idiomatic and “Pythonic” approach (preferred by language developer Guido van Rossum), so are more used and abused in Python code.

A basic list comprehension has the following syntax:

new_list = [output_expression for value in sequence]

This is a shortcut for the following basic loop:

new_list = []  # a new list (to be filled)
for value in sequence:
    new_list.append(output_expression) # add expression to the new list

List comprehensions are written inside square brackets [] and use the same for ... in ... syntax used in basic loops. However, the expression that you would normally append() to the new list is written before the for. This causes the comprehension to be read as “a list consisting of output_expression for each value in sequence—it’s almost English!

For example, the following loop can create a list of rounded numbers can be written either using a basic block loop or by using a list comprehension:

## As a basic loop
numbers = [1.3, 2.7, 3.14, 4.8, 5.99]  # original list
rounded_numbers = []  # a new list (to be filled)
for n in numbers: # loop through the original list
    rounded_numbers.append(round(n)) # add expression to the new list
print(rounded_numbers)  # [1, 3, 3, 5, 6]

## As a list comprehension
rounded_numbers = [round(n) for n in numbers]
print(rounded_numbers)  # [1, 3, 3, 5, 6]

The two versions of the code do exactly the same thing! But the list comprehension is takes less space to write and doesn’t require the creation of the initial empty list. In a way it is more readable, and hence preferred in Python.

List comprehensions are used when you want to create a new list out of a previous one: as in the above example which created a new list of rounded numbers out of an initial list. (This is called a mapping operation, and will be discussed in more detail in a later chapter). It is not appropriate to use a list comprehension if you’re not creating a new list; a loop to print out every element of a list would be better written using the basic loop syntax. It is possible to “modify” a list using a comprehension by having the new list replace the old one:

numbers = [1.3, 2.7, 3.14, 4.8, 5.99]
numbers = [round(n) for n in numbers]
print(numbers) # [1, 3, 3, 5, 6] -- they are rounded!

(This isn’t actually modifying the list; rather it is creating a brand new list and then assigning that value to the original variable. It is a different list).

Use list comprehensions only when creating a new list based on the elements of an old one!

Note that the expression that is being added to the new list can be as complex as you want—but if it gets too complex you may be better using a basic loop for readability:

List comprehensions can also include if conditions to determine whether or not an expression should be included in the new list. This is done by an if condition after the sequence:

new_list = [output_expression for loop_variable in sequence if condition]

Or as a specific example (remember: we filter for elements to keep!):

numbers = [2, 7, 1, 8, 3]
evens = [n for n in numbers if n%2 == 0]
print(evens)  # [2, 8]

This can be read as “a list consisting of n for each n in numbers, but only if n%2 == 0. It is equivalent to using the for loop:

evens = []
for n in numbers
    if n%2 == 0:  # check the inclusion criteria
        evens.append(n)  # append the expression

This use of an if statement is called a filtering criteria (and will be discussed in more detail in a later chapter). Note that even though it’s called a filter, the if statement is used to determine whether the element should be included in the new list (“who passes the filter”, not who is kept out).

Finally, it is possible to include multiple, nested for and if statements in a list comprehension. Each successive for ... in ... or if expression is included inside the square brackets after the output expression. This allows you to effectively convert nested control structures into a comprehension:

entrees = ["chicken","fish","veggies"]
sides = ["potatoes", "veggies"]

# Get all "meals" if the entree and side are not the same
meals = [ entree+" & "+side for entree in entrees for side in sides if entree != side]
print(meals)  # ['chicken & potatoes', 'chicken & veggies', 'fish & potatoes',
              #  'fish & veggies', 'veggies & potatoes']
              # Note: no "veggies and veggies" !

This is equivalent to the nested loops.

meals = []
for entree in entrees:
    for side in sides:
        if entree != side:
            meals.append(entree+" & "+side)

Again, the list comprehension almost can read like English (“a list consisting of each entree & side, for each entree in entrees and for each side in sides, but only if the entree and the side are different”), but it can get harder to parse as more nested levels are added.

Overall, list comprehensions are considered a better, more Pythonic approach to creating new lists out of old ones (which happens to be one of the most common uses of loops). However, block-style for loops have analogs in multiple different languages and contexts, including other data-processing languages such as R, Julia, and JavaScript. Thus it is good to be comfortable with both approaches!

5.5 While Loops

For loops are the most common kind of loop in Python. However, for loops are used for when you have a distinct sequence of values to loop through (or a specific number of times you want to perform the loop that you can define with a range). On the other hand, sometimes you want to loop until a certain set of conditions are met—the interpreter don’t know how many times the loop will repeat until it’s already cycling.

You can perform this kind of “indefinite iteration” using a while loop. A Python while loop has the following structure:

while CONDITION:
    # lines of code to run if the condition is True

The structure looks much like an if statement, and is similar in many regards. As with an if statement, the CONDITION can be any boolean value (or any expression that evaluates to a boolean value), and it determines whether or not the loop’s block will be executed.

Take as an example the following more concrete example:

# Define a current count -- the "loop control variable"
count = 5

while count > 0: # loop if the count is positive
    print(count)
    count = count - 1 # adjust the loop control variable

print("Blastoff!")
  • (This example would actually be more properly written as a for loop, because the loop will always run exactly 5 times. Indeed, all for loops can also be written as while loops).

In this example, when the interpreter reaches the while statement, it first checks the condition (count > 0, where count is 5). Finding that condition to be True, the interpreter then executes the block, printing the count and then decrementing that variable. Once the block is executed, the interpreter loops back to the while statement and rechecks the condition. Finding that count (4) is still greater than 0, it executes the block again (causing count to decrement again). This continues until the interpreter loops to the while statement and finds that count has reached 0, and thus is no longer greater than 0. Since the condition is now False, the interpreter does not re-enter the loop, and proceeds to the following statement (“Blastoff!”).

Importantly, the condition is only checked when the block is about to execute: both at the “start” of the loop, and then at the beginning of each subsequent iteration. Having the condition become True in the middle of the block (possibly temporarily) will have no impact on the control flow. It is also possible for the interpreter to “never enter” the loop if the condition is not initially True.

5.5.1 Counting with While Loops

The above example also demonstrates how to use a “counter” to determine whether or not the loop has run a sufficient number of times: this is known as a loop control variable (LCV). The “standard” counting loop looks like:

count = 0              # 1. initialize the counter
while count < 100:     # 2. check if the counter has reached its target
    print(count)       # 3. do some work (this may be multiple statements)
    count = count + 1  # 4. update the counter

In order for a counted loop to work properly, you need to be careful about steps 2 and 4: the condition and the update.

First, recall that the condition is whether to run the loop, not whether to stop:

count = 0
while count == 100:  # bad condition!
    print(count)
    count = count + 1

In this case, the condition is not initially True, so the interpreter never enters the loop.

When writing while loop conditions, think “do we keep going” rather than “are we there yet?”. In loops (as in life), the journey is more important than the destination!

Second, consider what happens if you forget to update the loop control variable:

count = 0
while count < 100
    print(count)
    # no counter update

In this case, the interpreter checks that count (0) is less than 100, then runs the loop. Then checks that count (still 0) is less than 100, then runs the loop. Then checks that count (still 0) is less than 100, then runs the loop…

This is known as an infinite loop: the loop will run forever, never being able to “break out” and reach the next statement.

If you hit an infinite loop in Jupyter Notebook, use Kernel > Interrupt to break it and try again. If running a .py file on the command line, use Ctrl-C to cancel the script.

There are lots of ways to accidentally produce an infinite loop, including:

  1. Having a condition that is “too exact” can cause the loop control variable to “miss” a particular breaking value:

    count = 0
    while count != 100:  # if we aren't at 100
        print(count)
        count = count + 3  # this will never equal 100

    Thus it is always safer to use inequalities (e.g., < or >) when writing loop conditions.

  2. Resetting the counter in the body of the loop can cause it to never reach its goal:

    count = 0
    while count < 100:
        count = 0  # this resets the count!
        print(count)
        count = count + 1

The best way to catch these errors is to “play computer”: pretend that you are the compiler, and go through each statement one by one, keeping track of the loop control variable (writing its value down on a sheet of paper does wonders). This will help you be able to “trace” what your program is doing and catch any bugs there may be.

5.5.2 Sentinels

Consider the following example (which includes a nested if statement inside of the while loop):

# flip a coin until it shows up heads
still_flipping = True
while still_flipping:
    flip = random.randint(0,1) # pick randomly 0 or 1
    if flip == 0:
        flip = "Heads"
    else:
        flip = "Tails"
    print(flip)
    if flip == "Heads":
        still_flipping = False

In this example, the still_flipping boolean variable acts as the loop control variable, as it determines whether or not the loop is repeated. Using a boolean as a LCV is known as using a sentinel variable. A sentinel (guard) variable is used to control whether or not the program flow gets out of the loop: as long as the sentinel is True, the loop continues to run. Thus the loop can be “exited” by assigning the sentinel to be False. This is particularly useful when there may be a complex set of conditions that need to be met before the program can carry on.

(It is of course possible to design a sentinel such as done_flipping, and then have the while condition check that the sentinel is not True. While this may be useful depending on how you’ve structured the algorithm, I suggest you always use a “positive” boolean for readability).

5.5.3 Difference Between For and While Loops

The main difference between for loops and while loops is:

for loops are used for definite iteration; while loops are used for indefinite iteration.

A for loop is appropriate when the interpreter does know in advance (before the loop starts) how many times the block will be executed—a definite number of iterations. Note that the number of iterations may be a variable so not determined until runtime; however, the value of that variable will still be known when the for statement is executed. On the other hand, while loops are appropriate when the interpreter doesn’t know in advance (before the loop starts) how many times the loop block will be executed: the loop does not have a definite number of iterations.

All iteration can be written with a while loop; but when performing definite iteration, it is easier, faster, and more idiomatic to use a for loop!

5.6 Iterating over Files

For loops can be used to iterate through any sequence (technically any “iterable” type). One of the more useful sequences when working with data are external files (e.g., text files). Files can be treated as a sequence of lines (each divided by a \n newline character), and thus Python can “read” a text file using a for loop to iterate over the lines of text in the file.

In order to read or write text file data, you use the built-in open() function, passing it the path to the file you wish to access. This function will then return an object representing that particular file (e.g., it’s location on the disk), with methods that you can use to read from and write to it.

After a file has been opened, it will remain “open” until it is explicitly closed. This can cause problems when trying to access the file through other programs, or otherwise lead to memory issues. The recommended way to handle this problem and “close” the file is to only open a file using a with block. This will make sure that the file is closed either when it is done being read (at the end of the block) or if some kind of error occurs:

# Always use this structure when opening files!
with open('path/to/myfile.txt') as my_file:
    # read my_file here

Once you have opened a file, you can use a for loop to iterate through its line (as in the example above). You can almost think of the file as a “list of lines”:

# open a file and loop through its lines
with open('path/to/myfile.txt') as my_file:
    for line in my_file:
        print(line)

Remember to always use relative paths. Note that when using a Jupyter Notebook, the “current working directory” is the directory in which you ran the jupyter notebook command to start the server.

After you have finished reading through a file, that file is considered “read”–there’s nothing else to loop through! So if you wanted to go through the file a second time, you would need to re-open it anew:

# You can't lopo through a file twice in a single opening
with open('path/to/myfile.txt') as my_file:
    for line in my_file:
        print(line)

    # this loop will never activate, because the file is already read
    for line in my_file:
        print(line)

# Instead, you need to open the file multiple times to read it multiple times
with open('path/to/myfile.txt') as my_file:
    for line in my_file:
        print(line)
with open('path/to/myfile.txt') as my_file:
    for line in my_file:
        print(line)

In general, you should only read through a file once in any program you write! You also can’t go “back” through a file that has been read—to plan your file processing carefully so you get all of the data you need in one pass.

Printing the lines of the file isn’t very common; more likely you would take each line and do something with it… such as append it to a list in order to create an actual list of lines (a list of strings). However, an easier way to create a list of lines in the file is to call the built-in readlines() method on the file object:

# open a file and read all its lines into a list
with open('path/to/myfile.txt') as my_file:
    list_of_lines = my_file.readlines()

# ... do something with the list_of_lines here

Note that after you have read the file, you’ll want to do any processing outside of the with block: let the file be closed so it isn’t using up memory while you do further analysis.

Calling readlines() on a file will read the entire file into memory. This can be a concern if the file is really big (if you’re working with “big data”). It also may be the case that you don’t actually care about all of the lines in the file, just certain ones (for example: maybe only lines that are relevant to your data). In these situations, you can alternatively use the readline() (singular! No s at the end!) method to read only a single line at a time. This is useful for only reading a certain number of lines, or reading lines in “batches” (e.g, 2 at a time):

# open and read the first 10 lines of a file
with open('path/to/myfile.txt') as my_file:
    for line_number in range(10):
        line = my_file.readline()
        print(line)

# open and read a file 2 lines at a time
with open('path/to/myfile.txt') as my_file:
    for odd_line in my_file: # will read the first line in each pair
        even_line = my_file.readline() # will read the next line in each pair
        print(even_line) # print only even lines

It is also possible to write out content to a file. To do this, you need to open the file with “write” access (allowing the program to write to and modify it) by passing w as the second argument to the open() function. You can then use the write() method to “print” text to the file:

# "open" the file with "write" access
with open('myfile.txt', 'w') as my_file:
    my_file.write("Hello world!\n")
    my_file.write("It's a mighty fine morning\n")

Unlike the print() function, write() does not automatically include a line break at the end of each method call; you need to add those yourself!

For some specialized text files, such as .csv files (comma-separated lists—spreadsheets), there are other specialized modules that can simplify reading the file content. For csv files, check out the csv module.

5.6.1 Try/Except

File operations rely on a context that is external to the program itself: namely, that the file you wish to open actually exists at the location you specify! But that may not be the case—particularly if which file to open is specified by the user:

filename = input("File to open: ")  # ask user which file to open

with open(filename) as my_file:  # "open" the file
    for line in my_file:
        print(line)

If the user provides a bad file name, your program will encounter an error through no fault of your own as a programmer:

$ python script.py
File to open: neener neener
Traceback (most recent call last):
  File "script.py", line 4, in <module>
    my_file = open(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'neener neener'

Since it’s possible for the user to make a mistake, we could like the program not to simply fail with an error, but to instead be able to “recover” and keep running (e.g., by asking the user for a different file name). We can perform this kind of error handling by using a try statement with an except clause:

filename = input("File to open: ")

try: # this might break (not our fault)
    with open(filename) as my_file:
        for line in my_file:
            print(line)
except:  # catching FileNotFoundError
    print("No such file")

A try statement acts somewhat similar to an if statement; however, the try statement checks to see if any errors occur within its block. If such an error occurs, rather than the program crashing, the interpreter will immediately jump to the except class and execute that block, before continuing on with the rest of the program. This is an exception to the general rule that conditions are only checked at the start of a block; a try block effectively tells the computer to keep an eye out for any errors (“try this, but it might break”), with the except clause specifying what to do if such an error occurs.

try statements are used when a program may hit an error that is not caused by programmer’s code, but by an external input (e.g., from a user or a file). Do not use a try statement to fix broken program logic or invalid syntax: instead, you should fix those problems directly!