Dictionaries Here are the bare bones. I will demonstrate dictionaries in class. The next lecture will show lots of scripts and functions we can write on dictionaries. The reality of understanding dictionaries is understanding the basic operations we can perform on dictionaries. But dictionaries are more powerful and a bit more subtle than lists, but for now just try to understand their simple operations Dictionaries are like generalized lists: lists associate values with indexes ranging from 0 to len(..); dictionaries (the dict type) associate values with arbitrary keys. Often the keys are strings and often values are ints or lists/tuples. The three major operations on dictionaries are (1) setting the value associated with a key (2) examinig the value associated with a key (3) iterating through a dictionary (there are three ways: by keys, by values, and by items - which is a 2-tuple consisting of a key and its value); dictionaries are not ordered with indexes (like strings/lists/tuples) so there is no standard order that they are iteratated over, but we will learn about the sorted So, like lists, dictionaries are mutable data structures. Sometimes dictionaries are called maps, because they embody how to map a key to its associated value. Dictionaries have literals: any number of key:values pairs (0 or more, with the key and its associated value separated by a colon), and different key:value pairs separated by commas, all in braces: the empty dictionary is written {}. So a = {'a':1, 'b':2, 'c':3} is a dictionary of str:int associations b = {1:'a', 2:'b', 3:'c'} is a dictionary of int:str associations c = {'penny':1, 'nickel':5, 'dime':10, 'quarter':25} like a but more interesting d = {'bob':['ICS-31','MATH-2A','ICS-6B'], 'mary':['ICS-31','BIO-9','ICS-6D']} is a dictionary of str:list[str] e = {'even': {0:'even',1:'odd'}, 'odd': {0:'odd',1:'even'}} is a dictionary whose values are sub-dictionaries (just as we can have lists whose values are sub-lists. We describe this annotation as {str:{str:str}}. f = {'a':'str', 2:'int', (0,0):'tuple'} is a dictionary whose keys are differnt types; we can use tuples as KEYS to dictionaries but not lists. The difference is mutability: all keys must be immutable; more on this later. Dictionary operations: (1) len: we can compute the length of a dictionary (# of key:value pairs at the top-level) len(a) is 3; len(b) is 3; len(c) is 4, len(d) is 2; len(e) is 2. (2) Indexing: we can refer to each value in a dictionary by its key: we index a dictionary by a key, computing its value: a['a'] is 1; b[3] is 'c'; c['quarter'] is 25; d['bob'] is ['ICS-31','MATH-2A','ICS-6B'] and note that d['bob'][0] is 'ICS-31'; e['even'][1] is 'odd'; f[(0,0)] is 'tuple' Note the asymmetry: given a key we can easily find its value, but given a value we cannot easily find its key: in fact, although keys are unique, there may be muliple values with different keys: {'alex':24, 'jessie':24}. Note that c['peso'] raises KeyError: 'peso'; because there is no value in the dictionary c associated with 'peso'. Another way to say this is there is no key 'peso' in this dictionary. (3) No Slicing (4) Checking containment: the in/not in operators These operators work on the keys in a dictionary 'a' in a is True; 'a' in b is False, but 3 in b is True; 'peso' in c is False; 'bob' in d is True; 0 in d['even'] is True; 'int' in f is False (5) No Catenation (6) No Multiplication (7) Iterability: there are three ways to iterate through a dictionary: by its keys, values, and items (tuples of key:value pairs). Each produces len(...) values, but their order is not fixed for k in d:/for k in d.keys(): produce all top-level keys in d for v in d.values() : produce all top-level values in d for kv in d.items() : produce all top-level (key,value) pairs in d We can write for k is sorted(...) if the keys/values/items can be compared for order (they are produced in the order specified: as with the the sort method, we can optionally supply key/reverse arguments). Note that we cannot sort a dictionary, but we can iterate over the keys in a sorted order for k in c: print(k,end=';') prints: penny;dime;nickel;quarter; for k in sorted(c): print(k,end=';') prints: dime;nickel;penny;quarter; This is similar to keys = list([k for k in d]) # created a list of the keys keys.sort() # sort the list (lists are sortable) for k in keys: # iterate through sorted keys list print(k,end=';') for k in sorted(c.keys()): # a common idiom: keys and values print(k,c[k],sep=':',end=';') prints: dime:10;nickel:5;penny:1;quarter:25; for kv in sorted(c.items()): print(kv,end=';') prints: ('dime', 10);('nickel', 5);('penny', 1);('quarter', 25); for kv in sorted(c.items()): print(kv[0],kv[1],sep=':',end=';') # note how each tuple in kv can have its indexes 0 and 1 accessed prints: 'dime':10;'nickel:'5;'penny':1;'quarter':25; for k,v in sorted(c.items()): # a common idiom: keys and values print(k,v,sep=':',end=';') # note how each tuple produced is unpacked into the names k and v prints: 'dime':10;'nickel:'5;'penny':1;'quarter':25; for v in sorted(c.values()): print(v,end=';') prints: 1;5;10;25; Note that iter(adict) for use in while loops, produces all the keys of adict Dictionary (mutation) operations (a) Assignment Suppose x = {'a':1, 'b':2, 'c':3} x['a'] is 1 x['a'] = 'z' now d is {'a':'z', 'b':2, 'c':3}; the value in key 'a' is changed to 'z' x['b'] = ['b1','b2'] now x is {'a':'z', 'b':['b1','b2'], 'c':3}; the value in key 'b' is changed to ['b1','b2'] and if we assign to a key not already in the dictionary, it just adds it (very different from lists, which require a call to append). So looking up a non-existant key raises KeyError, but assigning a non-existant key is fine (just like names in general: to look up their value the name must exist; but to set a value to a name the name can exist (change its value) or not exsit (create name and set its value) x['d'] = 'new' now x is {'a':'z', 'b':['b1','b2'], 'c':3, 'd':'new'}; key 'd' is added to the dictionary, with associated value 'new'. len(x) goes from 3 to 4. (b) del form: del adict[key] del x['b'] x is now {'a':'z', 'c':3, 'd':'new'} if we try to delete a key not in adict, Python raises KeyError; we can always write if key in adict: del adict[key] to ensure this exception is not raised. (c) adict.get(key[,default]) Same as adict[key] except if key is not in adict, it returns the value of default (and if default is not specified, returns None); but unlike [] indexing, it never raises KeyError Similar to the following function definition (we'll learn the truth soon) def get(key,default=None) if key in adict: return adict[key] else: return default (d) adict.setdefault(key[,default]) Same as adict.get(key[,default]) except if key is not in adict, it first adds the key:default pair to the dict and then returns the value of adict[key], which is now guaranteed to exist. Similar to the following function definition (we'll learn the truth soon) def setdefault(key,default=None) if key not in adict: adict[key] = default return adict[key] So it is like get, but if the key is not found, it not only returns default, but first it mutates the dictionary, puting the key:value pair in it. (e) adict.pop(key) or adict.pop(key,default) First form: removes the key (and its associted value) from the dict and returns the value associated with the key: it raises a KeyError if key is not in adict. Second form: the same, but returns default if key is not in adict adict.popitem() Removes a random key from the dictionary and returns the key:value as a tuple: raises KeyError if the dictionary is empty: len(adict) == 0 (f) adict.clear() Deletes all key:value pairs from the adict (g) keys(), values(), items() are technically called views of the dictionary. Besides iterating over these views, we can check if values are in/not in them (h) d = dict(...) Recall that we can write list(...) and tuple(...) to construct a list/tuple that contains all the values specified by ... For dict, there are two forms of ... (1) anything we can iterate over, that produces a sequence of 2-list/-tuple which is treated as a key/value. For example, we write a list of 2-tuples: d = dict([('a',1),('b',2),('c',3)]) (2) a list of parameters of the form p=v, where p (as a string) becomes a key and v becomes its associated value. This only works for keys that are string, but this is very common. d = dict(a=1,b=2,c=3) (i) adict.update(...) Update adict with the information in ..., with three forms of .... (0) another dict (1) see (h) above (2) see (h) above so, if adict is {'a':100, 'b':200, 'c':300} the following adict.update([('a',1),('x',2),('c',3)]) # note the 'x' adict.update({'a':1, 'x':2, 'c':3}) # note the 'x' adict.update(a=1,x=2,c=3) # note the 'x' all are like executing adict['a'] = 1 adict['x'] = 2 adict['c'] = 3 which result in adict being {'a':1, 'b':200, 'c':3, 'x':2} ------------------------------------------------------------------------------ Comprehensions As with lists/tuples, we can build dictionaries via comprehensions as d = {dict-comprehension} The form of a dict-comprehension is as follows (bool-expression-i is a boolean expression that can refer to i). Note that the [] mean EBNF option. key-expression-i:value-expresion-i for i in iterable [if bool-expression-i] So, to create a dictionary of keys that are strings and values that are their reverses, split (by spaces) from a string, we could write d = {s : s[::-1] for s in 'Four score and seven years ago'.split(' ')} If we wanted only the words longer than 3 characters, we could include the option and write: d={s : s[::-1] for s in 'Four score and seven years ago'.split(' ') if len(s)>3} Generally, we can translate a dictonary comprehension as follows. comprehension = {} for i in iterable: if bool_expression-i: comprehension[key-expression-i] = value-expression-i ------------------------------------------------------------------------------ Some Real Dictionary Code: The Power of Dictionaries Suppose we have a list of word and we want a count of how often each work appears. Here is a function that takes such a list as a parameter and returns a dictionary of how often each word occurs. def count_words(alist): answer = {} for w in alist: if w in answer: answer[w] += 1 else answer[w] = 0 return answer We can simplify this function by using setdefault (review its meaning). It is so useful just because this update-idiom is so frequent. In fact, we will learn about defaultdict that even makes this idiom easier to express. def count_words(alist): answer = {} for w in alist: answer[w] = answer.get(w,0) + 1 return answer In both cases, calling count_words('how much wood can a woodchuck chuck if a woodchuck could chuck wood'.split(' ')) returns {'could': 1, 'much': 1, 'chuck': 2, 'if': 1, 'a': 2, 'how': 1, 'wood': 2, 'woodchuck': 2, 'can': 1} Notice that the order of the key:value pairs is indeterminate. If we wrote answer = count_words(...) for k in sorted(answer): print(k,'->',answer[k]) the printed result would be a -> 2 can -> 1 chuck -> 2 could -> 1 how -> 1 if -> 1 much -> 1 wood -> 2 woodchuck -> 2