Python fundamentals

This is a quick summary of the materials in the introductory workshops. It's intended for programmers learning Python, or as a recap. If you've just done the introductory workshops, you don't need to do this one.

In this section we'll run through the most important Python data structures, and some fundamental programming syntax like loops, conditionals and function definitions. This content will probably be covered too quickly if you are really new to programming - you might want to try the introductory workshops instead, which introduce most of the same concepts, with more explanation.

Getting help

If you're using Jupyter, you can always use the built-in help() function or the ? operator to see documentation on a function or type. For instance, help(range) or ?range will give you documentation on the built-in range() function.

You can try this while working through the material below. If you want to get help on a Python keyword such as for or if, or on an operator, you'll need to put quotes around it, like help("for") or help("!=").

Setup

In this workshop, we'll use Python 2. However, we'll use the Python 3-style print() function and division operator, which behave more consistently and make our code easy to move to Python 3 in the future.

You don't need to worry about the details of this right now, but, before running anything else, run the following statement:

In [1]:
from __future__ import division, print_function
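As a quick illustration of what the division part of that import changes, here is a small sketch. It assumes the statement above has already been run (in Python 3 this is the default behaviour): / then always performs true division, while // performs floor division.

```python
# Assumes `from __future__ import division, print_function` has already run
# (or that you are on Python 3, where this is the default).
true_result = 7 / 2        # true division: 3.5
floor_result = 7 // 2      # floor division: 3
negative_floor = -7 // 2   # floor rounds towards minus infinity: -4
print(true_result, floor_result, negative_floor)
```

Without the import, plain Python 2 would give 3 for 7 / 2, because dividing two integers there rounds the result down.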

Data types and structures

Values can be assigned to variables in Python like so:

In [2]:
# Assign the value 5 to variable x
x = 5
In [3]:
print(x)
5

The line that starts with # is just a comment, which will be ignored by Python.

You can check the type of a variable with the built-in type() function:

In [4]:
print( type(x) )
<type 'int'>

In this case, x is an integer (a numeric type).

Python is dynamically typed, which means we don't usually explicitly tell it what type our variables are. Instead the type just depends on what kind of data was assigned.
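To make this concrete, here is a small sketch in which the same variable name holds values of different types over time; Python simply looks at whatever value is currently assigned.

```python
x = 5
print(type(x))    # an int
x = "five"        # rebinding x to a string is allowed
print(type(x))    # now a str
```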

Here we'll run through numeric types, strings, and some basic data structures like lists, tuples, sets and dictionaries.

Numbers

Above, we saw that x was an integer, because we assigned the value 5, which is a whole number. You are also likely to encounter floating-point numbers, which have a decimal part:

In [5]:
x = 7.4
In [6]:
type(x)
Out[6]:
float

In general, adding a float to a float will produce another value of type float, even if the answer has no decimal part:

In [7]:
x = 7.4
y = 2.6
z = x + y
print(z)
print( type(z) )
10.0
<type 'float'>

We can, however, convert this number into an integer with int():

In [8]:
z = int(z)
print(z)
print( type(z) )
10
<type 'int'>

Other built-in numeric types are long and complex, which handle very large integers and complex numbers respectively.
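A brief sketch of those two types. Complex numbers are written with a j suffix for the imaginary part. Very large integers need no special syntax: in Python 2 an int that grows too large is automatically promoted to long (in Python 3 the two were merged into a single int type).

```python
z = 3 + 4j             # a complex number: real part 3, imaginary part 4
print(z.real, z.imag)  # 3.0 4.0
print(abs(z))          # magnitude: 5.0

big = 2 ** 100         # far beyond the range of a 64-bit integer
print(big)             # 1267650600228229401496703205376
```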

Exercise: What happens if you use int() on a number like 7.3? What happens if you use float() on a number like 10? Try it and see.
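A related rule worth knowing: when an arithmetic operation mixes an int and a float, Python converts the int to a float first, so the result is a float. A short sketch:

```python
a = 3        # int
b = 0.5      # float
c = a + b    # the int is converted to float before adding
print(c)                     # 3.5
print(isinstance(c, float))  # True
```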

Strings, indexing, and slicing

Strings in Python are written using quote characters, like so:

In [9]:
x = "GATTACA"
In [10]:
print(x)
GATTACA
In [11]:
type(x)
Out[11]:
str

Both single and double quotes are ok ("GATTACA" or 'GATTACA'), but the quote character at the end of the string must match the one at the start. Quote characters of the other type will be treated as just part of the string:

In [12]:
"Apostrophes aren't a problem!"
Out[12]:
"Apostrophes aren't a problem!"

There is also a triple-quote syntax (""" or ''') for creating multi-line strings. We'll talk about this later, as it's particularly useful for adding documentation to our programs.
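A minimal sketch of a multi-line string: everything between the opening and closing triple quotes, including the line break, becomes part of the string.

```python
verse = """Roses are red,
violets are blue."""
print(verse)
print("\n" in verse)   # True: the line break is stored as the newline character
```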

Like int() and float(), we can try to convert something to a string with str().

In [13]:
number = 5
str(number)
Out[13]:
'5'
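One common reason to use str() is building a message out of text and numbers: + only joins a string to another string, so the number has to be converted first. A short sketch:

```python
number = 5
message = "I have " + str(number) + " apples"
print(message)   # I have 5 apples
```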

Strings are considered a sequence type because they are made of a sequence of characters. We can access individual characters by indexing the string, which means using square brackets to specify the character. Indexing in Python starts from zero: the first character is at 0, the second at 1, etc.

In [14]:
x = "elephant"
In [15]:
# First and fourth characters
print(x[0])
print(x[3])
e
p

Negative numbers count back from the end of the string:

In [16]:
# Last and fourth-last characters:
print(x[-1])
print(x[-4])
t
h

We can also get multiple letters from a string at once using slicing. For instance, x[2:4] gives a slice of the string starting at index 2 and going up to, but not including, index 4, so for "elephant" it returns 'ep'.

We can also leave one (or both) of the numbers out, to go all the way from the start, or all the way until the end of the string.

In [17]:
# From the third character (index 2) until the end of the string
x[2:]
Out[17]:
'ephant'

Notice that since x[:4] gives us the string up to just before index 4, and x[4:] gives us the string starting at index 4, together they split the string into two parts with no overlaps or missing pieces.

In [18]:
print( x[:4] )
print( x[4:] )

# Concatenate the strings
print( x[:4] + x[4:] )
elep
hant
elephant
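The same split works for any index, including negative ones. A short sketch reusing the string from above:

```python
x = "elephant"
print(x[:2], x[2:])    # el ephant
print(x[:-3], x[-3:])  # eleph ant
```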

Lists and tuples

Lists are ordered collections of values. They are similar to arrays in some other languages. They are written out like this, in square brackets [], with items separated by commas:

In [19]:
odd_numbers = [1, 3, 5, 7, 9, 11]
print(odd_numbers)
[1, 3, 5, 7, 9, 11]
In [20]:
odd_strings = ["one", "three", "five", "seven", "nine", "eleven"]
print(odd_strings)
['one', 'three', 'five', 'seven', 'nine', 'eleven']

Lists are a sequence type, just like strings. This means we can index and slice them.

In [21]:
# First, third and last
odd_strings[0], odd_strings[2], odd_strings[-1]
Out[21]:
('one', 'five', 'eleven')
In [22]:
# First three items
odd_strings[:3]
Out[22]:
['one', 'three', 'five']

There are a few functions in Python that are particularly useful for working with lists. The built-in Python function len() gives the length of a list (and of other types, including strings and tuples):

In [23]:
len(odd_strings)
Out[23]:
6

Try using the len() function on a string, and on a tuple, to check it does what you expect.

There are also special functions in Python which apply only to certain data types. These functions are called methods and are used by writing them after the variable name, with a dot, like variable.function(). For instance, lists have a method called append() which adds an item to the end of the list:

In [24]:
odd_strings.append("thirteen")
In [25]:
odd_strings
Out[25]:
['one', 'three', 'five', 'seven', 'nine', 'eleven', 'thirteen']

Tuples are a lot like lists, but they are immutable, which means you cannot alter them in-place: you cannot add or remove items from them. They are written with round instead of square brackets.

In [26]:
even_numbers = (2,4,6)
print(even_numbers)
(2, 4, 6)

Tuples with only one value have a special syntax: they need a trailing comma so that Python can tell we are writing a tuple and not just putting brackets around a single value.

In [27]:
x = (5,)
print(x)
(5,)
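To see why the comma matters, compare the two values in this small sketch: without the comma, the brackets just group the number 5.

```python
not_a_tuple = (5)    # just the int 5 in brackets
a_tuple = (5,)       # a one-item tuple
print(type(not_a_tuple), type(a_tuple))
print(len(a_tuple))  # 1
```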

Lists are mutable, meaning items can be added or removed with methods like .append(), .pop() and .remove(). Strings and tuples are immutable. What happens if you try these methods on a string or a tuple?
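For reference, here is a short sketch of .pop() and .remove() on a list (the list name is just for illustration):

```python
numbers = [1, 3, 5, 7]
last = numbers.pop()     # removes and returns the last item (7)
first = numbers.pop(0)   # removes and returns the item at index 0 (1)
numbers.remove(5)        # removes the first item equal to 5
print(numbers)           # [3]
```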

Although you can't alter strings or tuples directly, you can add a character or item by building a whole new string or tuple out of the old one.

In [28]:
even_numbers = even_numbers + (8,)
print(even_numbers)
(2, 4, 6, 8)

This works, but can be slow to run because the entire original string (or tuple) is copied each time. If you need to write an algorithm that builds a sequence piece by piece, you should use lists for efficiency.
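A common pattern, sketched below, is to collect the pieces in a list with .append() and combine them once at the end. For strings, the str.join() method does the combining; the example sequence here is arbitrary.

```python
pieces = []
for base in "GATTACA":
    pieces.append(base.lower())   # appending to a list is cheap
result = "".join(pieces)          # build the final string once
print(result)                     # gattaca
```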

Sets and dictionaries

Sets

Sets are a Python data type which, like lists, hold collections of values. The main differences between sets and lists are:

  • a set can only have one copy of each value at most: values must be unique
  • the values in a set are not stored in order: Python may return them in an arbitrary order, which need not match the order you added them in!

If you're familiar with sets as used in mathematics, Python's sets are actually quite similar, and can be used for set calculations by calling set methods like .union() and .intersection().
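A small sketch of these set methods. Because sets are unordered, the results are passed through the built-in sorted() function so they print predictably:

```python
a = set([1, 2, 3, 4])
b = set([3, 4, 5])
print(sorted(a.union(b)))         # [1, 2, 3, 4, 5]
print(sorted(a.intersection(b)))  # [3, 4]
print(sorted(a.difference(b)))    # [1, 2]: in a but not in b
```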

We can create a set by passing a list into set(). Since sets only store unique items, only one copy of each item in the list will be added to the set.

In [29]:
big_list = ['3', '7', '3', '3', '5', '3', '5', '1', '1', '7', '3', '5', '5',
       '3', '5', '5', '5', '1', '7', '7', '1', '1', '7', '7', '3', '7',
       '3', '5', '1', '5', '5', '3', '3', '5', '1', '1', '1', '1', '7',
       '3', '7', '1', '5', '1', '7', '3', '5', '5', '6', '5', '7', '1',
       '5', '1', '5', '1', '3', '7', '7', '5', '3', '1', '1', '5', '5',
       '5', '1', '5', '1', '7', '3', '7', '1', '7', '5', '7', '1', '5',
       '3', '5', '3', '5', '3', '5', '5', '1', '3', '5', '3', '7', '1',
       '5', '7', '3', '1', '7', '5', '7', '7', '1']
unique_values = set(big_list)
print(unique_values)
set(['1', '3', '5', '7', '6'])

Values can be added to a set using the .add() method, and removed using .remove().

In [30]:
unique_values.add('20')
unique_values
Out[30]:
{'1', '20', '3', '5', '6', '7'}

Adding an item that is already in the set has no effect.
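A quick sketch of that behaviour, together with .remove():

```python
values = set(['1', '3', '5'])
values.add('3')           # already present, so nothing changes
print(len(values))        # 3
values.remove('1')        # removes an existing item
print(sorted(values))     # ['3', '5']
```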

Notice that although sets are unordered, and you can't rely on the order in which items will be returned or printed, you can always sort them yourself using the built-in sorted() function, with a call such as sorted(unique_values). sorted() can also be applied to sequence data types, like lists.

Exercise: Try applying sorted() to various data types and see if it does what you expect. Look at the documentation for this function and see if you can sort in reverse order.

Dictionaries

Dictionaries are a built-in Python data type for storing collections of values. They correspond to data types like maps or hashes in some other programming languages. Dictionaries are a bit like sets, in that they store unordered collections of items, but each item in a dictionary is a key-value pair. "Keys" are used to look up items, and "values" are the data stored under them; values do not have to be unique.

Just like the items in a set, the keys in a dictionary must be unique (only one of each can be stored), and they are stored in no particular order.

Here's an example dictionary, which we define by using curly braces {}, and by putting a colon : between each key and its corresponding value. This dictionary stores some people's heights in centimetres.

In [31]:
heights = {"Sam":201, "Fiona":167, "Quentin":167}
print(heights)
{'Fiona': 167, 'Quentin': 167, 'Sam': 201}

We can retrieve a value from a dictionary using its key. We use square brackets [], just like getting an item from a list.

In [32]:
print(heights["Sam"])
201

We can assign a value to the dictionary using indexing and the assignment operator =, like so:

In [33]:
heights["Mary"] = 180
print(heights)
{'Fiona': 167, 'Quentin': 167, 'Mary': 180, 'Sam': 201}

Exercise: What happens if you try to index a dictionary with a key that is not in it? What happens if you assign a value to the dictionary but the key already exists?

We can explicitly check to see if a key is in the dictionary or not using the Python operator in, which returns True or False:

In [34]:
"Mary" in heights
Out[34]:
True
In [35]:
"Jason" in heights
Out[35]:
False
In [36]:
# This is the same as: not ("Mary" in heights)
"Mary" not in heights
Out[36]:
False
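Checking with in before using a key is the basis of a common pattern. The sketch below counts how many times each value appears in a short list, using a loop and a conditional (both covered later in this section):

```python
counts = {}
for value in ['3', '7', '3', '5', '3']:
    if value in counts:
        counts[value] = counts[value] + 1   # seen before: increase the count
    else:
        counts[value] = 1                   # first time: start counting at 1
print(counts['3'], counts['7'], counts['5'])  # 3 1 1
```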

Dictionaries have several useful methods: two of the most important are .keys() and .values(), which return lists.

In [37]:
heights.keys()
Out[37]:
['Fiona', 'Quentin', 'Mary', 'Sam']
In [38]:
heights.values()
Out[38]:
[167, 167, 180, 201]

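Remember that the order of these lists is arbitrary: it need not match the order in which the items were added. The two lists do correspond to each other, though: if you call .keys() and .values() without changing the dictionary in between, the nth key goes with the nth value.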
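If you need a predictable order, one option is to sort the keys first. A small sketch, rebuilding the heights dictionary from above:

```python
heights = {"Sam": 201, "Fiona": 167, "Quentin": 167, "Mary": 180}
names = sorted(heights.keys())   # sorted() always returns a list
for name in names:
    print(name, heights[name])
```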

Dictionaries: algorithmic considerations

In fact, dictionaries (and sets) are implemented as hash tables: Python applies a hash function to each key to decide where to store it in the computer's memory. This means that dictionaries are very fast at retrieving information. Even in a very large dictionary, looking up an item by its key takes, on average, a roughly constant amount of time that does not grow with the size of the dictionary: the lookup is approximately O(1).

This also means that the keys of a dictionary must be immutable types, because if a key were altered in-place, its expected location in memory would become wrong. So lists can't be used as dictionary keys: you will get an error if you try. But strings, numbers and tuples all can (a tuple only if everything inside it is also immutable).
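A short sketch of the difference, using a hypothetical grid dictionary keyed by (row, column) tuples. The try/except lets the list-key attempt fail without stopping the program; exceptions are covered properly in a later section.

```python
grid = {}
grid[(0, 0)] = "start"      # tuple keys are fine
grid[(2, 3)] = "treasure"

list_key_failed = False
try:
    grid[[2, 3]] = "oops"    # a list key is not allowed
except TypeError:
    list_key_failed = True   # Python raises TypeError (unhashable type: 'list')
print(grid[(2, 3)], list_key_failed)   # treasure True
```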

There is no problem, however, using a mutable type like a list as a dictionary value:

In [39]:
prime_factors = {12: [2,2,3],
                 100: [2,2,5,5],
                 9: [3,3]}
print(prime_factors)
{100: [2, 2, 5, 5], 12: [2, 2, 3], 9: [3, 3]}

Other data types

There are other standard data types in Python which we won't have time to cover here. For your reference, a few more important ones are:

  • Booleans (True and False): These we'll cover below, when we talk about conditionals.
  • File objects: These represent open files and are returned when we open a file for reading or writing, e.g. f = open("myfile.txt")
  • Class instances: These are used in object-oriented programming when we define our own classes, which can act like custom data types.
  • Exceptions: These types are used for reporting errors. We'll look at them in a later section.

Loops

Once we have some data we often want to be able to loop over it to perform the same operation repeatedly. A for loop in Python takes the general form

for item in list:
    do_something

For instance, given our list of odd numbers:

In [40]:
for num in odd_numbers:
    print(num)
    print(2*num)
1
2
3
6
5
10
7
14
9
18
11
22

There are two things to notice here, which might be surprising if you're used to other programming languages:

Whitespace is important! There are no brackets or braces around the loop, but everything inside the loop is indented. Python will treat the next non-indented line as being the end of the for loop. Whitespace is important in Python generally; we will also use it to define conditionals and functions. Also notice the colon (:) at the end of the for statement itself; this is also used in conditionals and function definitions.

Since whitespace matters in Python, you have to be careful about your indentation, but it also makes programs very readable. It's a good idea to stick to spaces only when writing Python, and avoid tabs. This is because it's difficult for a human programmer to tell the difference between a tab and a set of spaces, so mixing the two can lead to some very confusing bugs.
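A minimal sketch of how indentation decides what belongs to the loop: the indented line runs once per item, and the first line back at the outer level runs once, after the loop has finished.

```python
total = 0
for num in [1, 3, 5]:
    total = total + num   # indented: part of the loop body
print(total)              # not indented: runs once, after the loop (prints 9)
```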

We loop over a list (or some other kind of iterable object), and the loop variable is assigned each item in turn. This is in contrast to the classic for loop in languages like C or Java, which uses a counter, a condition and an increment operator.

We can loop over any kind of list - it doesn't have to be numbers:

In [41]:
for word in odd_strings:
    print(word)
one
three
five
seven
nine
eleven
thirteen

Sometimes, we do want to do C-style for loops, and loop over numbers up to some maximum value. We don't want to have to type the whole list of numbers out by hand. Quite a common pattern is to use Python's built-in range() function to generate the list of numbers to loop over:

In [42]:
range(6)
Out[42]:
[0, 1, 2, 3, 4, 5]

For instance, let's use range() to simultaneously look at each element of odd_numbers and odd_strings:

In [43]:
# How long is the list?
L = len(odd_numbers)
# Now loop over all the index values
for i in range(L):
    print(i,odd_numbers[i],odd_strings[i])
0 1 one
1 3 three
2 5 five
3 7 seven
4 9 nine
5 11 eleven

range() is able to generate more sophisticated sequences, with different starting points and step sizes. Use help(range) to see the documentation.
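For instance, range() also accepts a start value and a step size, as range(start, stop, step). The sketch below wraps each call in list() so the result looks the same in Python 2 and Python 3 (where range() no longer returns a list):

```python
print(list(range(2, 6)))       # [2, 3, 4, 5]: start at 2, stop before 6
print(list(range(2, 10, 3)))   # [2, 5, 8]: step of 3
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]: a negative step counts down
```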

Exercise: Write a loop that prints out every second number in odd_numbers.

Conditionals

The other standard thing we need to know how to do in Python is conditionals, or if/then/else statements. In Python the basic syntax is:

if condition:
    do_something

For instance:

In [44]:
num = 220

if num > 100:
    print('num is greater than 100')

if num > 300:
    print('num is greater than 300')
num is greater than 100

Since num is not greater than 300, the code in that conditional block was not executed at all.

Each condition, like num > 100, is an expression which evaluates to True or False. We can assign this boolean value to a variable, or write it out directly:

In [45]:
print(num > 100)
True

Here we've used the comparison operator >, which means "greater than". We can also use == for equality, < for less than, <= for less than or equal to, >= for greater than or equal to, and != for not equal to.

It's important to notice that the operator for comparing two things (==) is different to the operator for assigning a value to a variable (=).
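A short sketch contrasting the two:

```python
a = 5           # = assigns the value 5 to a
print(a == 5)   # == compares: True
print(a != 3)   # True: 5 is not equal to 3
print(a <= 4)   # False
```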

We can also combine conditions with the boolean operators and, or and not:

In [46]:
True and False
Out[46]:
False
In [47]:
True or False
Out[47]:
True
In [48]:
not True
Out[48]:
False
In [49]:
(num > 100) and (num < 300)
Out[49]:
True
In [50]:
not (num > 100)
Out[50]:
False

The more extended syntax for conditionals is:

if condition:
    do_something
elif another_condition:
    do_something_else
elif a_third_condition:
    do_the_other_thing
else:
    do_something_completely_different

where we can have any number of elif statements (including zero). elif stands for "else if". At most one of the blocks in an if statement like the above will be executed (exactly one, when there is an else). The elif and else conditions will only be considered if none of the earlier conditions has succeeded; the whole if statement is over as soon as one of the conditions evaluates to True. For instance:

In [51]:
num = 220
if num > 500:
    print("num is more than 500")
elif num > 200:
    print("num is more than 200")
elif num > 100:
    print("num is more than 100")
else:
    print("num is 100 or less")
num is more than 200

Functions

Functions in Python take the general form

def function_name(inputs):
    do stuff
    return output

For instance, here is a function that just multiplies a value by 2:

In [52]:
def double(x):
    return 2*x

We can call our function like this:

In [53]:
double(7.3)
Out[53]:
14.6
In [54]:
twice_ten = double(10)
print(twice_ten)
20
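Functions can take several inputs, separated by commas. The sketch below (the function name is just illustrative) also includes a docstring: a triple-quoted string on the first line of the body, which help() will display.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height

area = rectangle_area(3, 4)
print(area)   # 12
```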

Notice that since Python functions don't specify the types of their inputs, we can try to pass in values other than numbers and Python will execute the code (if it can)! Sometimes, this may lead to a surprising result:

In [55]:
double("this is a string")
Out[55]:
'this is a stringthis is a string'
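The same thing happens with lists and tuples, because multiplying a sequence by a number repeats it rather than doing arithmetic on its items. A quick sketch, redefining double() so the block stands alone:

```python
def double(x):
    return 2*x

print(double([1, 2]))   # [1, 2, 1, 2]
print(double((5,)))     # (5, 5)
```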

Challenge

If you are new to Python, or are doing this quick introduction instead of the novice lessons, a good way to try putting the above together is to go on to the next workshop and start by implementing the basic Hamming distance function for yourself, without reading the solution.

The solution to this basic problem is also written in the workshop, so feel free to ask for help if you get stuck, rather than reading on too far - asking for hints is a much better way to develop your understanding than reading the solution.