Sets and Frozen Sets

Here are the bare bones. I will demonstrate sets in class, including some scripts and functions we can write using sets (and to a lesser extent frozensets). The reality of understanding sets/frozensets is understanding the basic operations we can perform on them.

Sets are mostly like lists (but share one property with dictionaries: see 3), but with three differences:

(1) Sets do not contain duplicates; if we add a value that is already in a set, the set remains unchanged. This means we can often add a value to a set without first checking whether it is in the set: if it isn't in the set, it is added; if it is in the set, the set remains unchanged.

(2) Sets are unordered: when we iterate through them, the order of the values produced is not fixed.

(3) All values in sets (like keys in dictionaries) must be immutable. So we can have sets of tuples, but not sets of lists.

There are also a large number of operators/methods that take sets as arguments and produce sets as results (discussed below).

Frozensets are like immutable sets: they have the properties listed above, but their methods are restricted to those that do not mutate the frozenset. So frozensets are to sets as tuples are to lists. As with tuples, we can use frozensets as keys in dictionaries (and as values in sets) because frozensets are immutable.

Sets have literals: a positive number of values (1 or more) separated by commas, all in braces. So these literals are like dict literals, but there is no colon separating a key from its value (which is how Python tells the difference between a dict and a set). But there is one problem: {} could mean either an empty dict or an empty set, and Python treats it as an empty dict. That is why the rule above says "1 or more"; we write empty dictionaries as {}, and we must write empty sets as set().
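Here is a small sketch of the rules above (duplicates, the {} versus set() distinction, and the immutability requirement); the variable names are only for illustration, and it assumes nothing beyond built-in Python 3.

```python
# Duplicates collapse: a value already in the set is not added again
s = {1, 2, 2, 3}
print(len(s))                  # 3
s.add(2)                       # no need to check "if 2 not in s" first
print(len(s))                  # 3

# {} is an empty dict; an empty set must be written set()
empty_dict = {}
empty_set = set()
print(type(empty_dict).__name__, type(empty_set).__name__)   # dict set

# Values must be hashable: tuples are allowed, lists are not
pairs = {(1, 2), (3, 4)}
try:
    bad = {[1, 2]}
    msg = ''
except TypeError as e:
    msg = str(e)               # the message mentions an unhashable type: 'list'
print(msg)

# Frozensets are immutable (hashable), so they can be dictionary keys
d = {frozenset({'a', 'b'}): 'pair'}
print(d[frozenset({'b', 'a'})])   # pair
```

The exact wording of the TypeError message varies slightly across Python versions, so the sketch only prints it rather than depending on it.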
So

    a = set()           is the empty set (no/0 values)
    b = {'a', 'b', 'c'} is a set of str
    c = {('ICS-31','MATH-2A','ICS-6B'), ('ICS-31','BIO-9','ICS-6D')} is a set of tuples

Note that if we write the set {[]} Python raises

    TypeError: unhashable type: 'list'

because lists are mutable, and we cannot have mutable values as keys in dictionaries or as values in sets. I would prefer it to say TypeError: mutable type: 'list', but it doesn't; instead it says unhashable. For the built-in types, hashable essentially means immutable, so unhashable means not immutable, which means mutable.

Set operations:

(1) len: we can compute the length of a set (the # of values at the top level): len(a) is 0; len(b) is 3; len(c) is 2.

(2) No Indexing: the values in sets are unordered, so it makes no sense to try to index them.

(3) No Slicing

(4) Checking containment: the in/not in operators. These operators work on the values in a set: 'a' in a is False; 'a' in b is True; 'ICS-31' in c is False, but ('ICS-31','MATH-2A','ICS-6B') in c is True.

(5) No Catenation

(6) No Multiplication

(7) Iterability: for i in b: produces all the top-level values in b (there are len(b) of them):

    for i in b:
        print(i, end='')

prints the three characters a, b, and c in some order (for example abc).

Note that iter(aset) returns an iterator that, via repeated calls to next (for example in a while loop), produces all the values of aset.

Note that the functions max and sum work on lists, tuples, sets, and frozensets (and on adict.keys() and adict.values()): sum so long as the values are numeric, max so long as the values can be compared with each other. We will write our own functions that take arguments that are iterable, and thus work for all these different types of data. In fact, the constructors for all these types take arguments that are iterable. We have seen how to construct a list from a tuple and a tuple from a list.
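A small sketch of operations (1) through (7), using the a, b, and c defined above (redefined here so the block runs on its own). Because set iteration order is not fixed, sorted is used wherever a predictable printed order matters; the while loop is one possible way to drive iter/next by hand.

```python
a = set()
b = {'a', 'b', 'c'}
c = {('ICS-31', 'MATH-2A', 'ICS-6B'), ('ICS-31', 'BIO-9', 'ICS-6D')}

print(len(a), len(b), len(c))                  # 0 3 2
print('a' in b, 'ICS-31' in c)                 # True False
print(('ICS-31', 'MATH-2A', 'ICS-6B') in c)    # True

# No indexing: sets are not subscriptable
try:
    b[0]
    can_index = True
except TypeError:
    can_index = False
print(can_index)                               # False

# Iteration order is not fixed, so sort when a predictable order matters
print(''.join(sorted(b)))                      # abc

# iter/next in a while loop: next raises StopIteration when the values run out
it = iter(b)
seen = []
while True:
    try:
        seen.append(next(it))
    except StopIteration:
        break
print(len(seen) == len(b))                     # True

# sum and max accept any iterable of suitable values
nums = {3, 1, 4}
print(sum(nums), max(nums))                    # 8 4
print(sum([3, 1, 4]), max((3, 1, 4)))          # 8 4
print(max(frozenset({'x', 'y'})))              # y
```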
We can also construct lists from sets and sets from lists, by writing

    aset = set(alist)    # len(aset) <= len(alist): aset has no duplicated values
    alist = list(aset)   # len(alist) == len(aset): alist has no duplicated values
                         #   because aset has no duplicated values

Note that set('abc') constructs the set {'a','b','c'} because strings are iterable.

(8) There are a variety of set operations (from mathematics) that appear in Python in both a method and an operator form.

aset1 == aset2 / aset1 != aset2 : set equality and inequality
  Two sets are equal if they have exactly the same values; otherwise they are not equal: {1,2,3} == {3,2,1} is True.
  Note that sets are never equal to lists. Generally, for two objects to be == they must be the same data-type (two lists, two tuples, two dicts, two sets) and store the same values; one exception is that a set and a frozenset storing the same values are ==.

aset1.isdisjoint(aset2) : do these sets have no common values?

aset1.issubset(aset2) / aset1 <= aset2 : every value in aset1 is also in aset2

aset1 < aset2 : aset1 <= aset2 and aset1 != aset2

aset1.issuperset(aset2) / aset1 >= aset2 : every value in aset2 is also in aset1

Sometimes if aset1 <= aset2 we say that aset1 is contained in aset2, and if aset1 >= aset2 we say that aset2 is contained in aset1.

aset1.union(aset2, ..., asetn) / aset1 | aset2 | ... | asetn
  produces a new set with the union of all the sets: the new set has one of every value that is in any of the sets: {1,2} | {2,3,4} | {1,3,6} is {1,2,3,4,6}; so a union is at least as long as each of its operands.

aset1.intersection(aset2, ..., asetn) / aset1 & aset2 & ... & asetn
  produces a new set with the intersection of all the sets: the new set has only the values that are in every one of the sets: {1,2,3} & {2,3,4} & {1,3,6} is {3}; so an intersection is at most as long as each of its operands.

aset1.difference(aset2, ..., asetn) / aset1 - aset2 - ... - asetn
  produces a new set with the difference between aset1 and all the other sets: the new set has all the values that are in aset1 but not in any of the other sets: {1,2,3,4,5,6} - {2,4} - {4,5} is {1,3,6}; so a difference is at most as long as the first set.

aset1.symmetric_difference(aset2) / aset1 ^ aset2
  produces a new set with the values that are in one set but not the other: {0,2,4,5,6} ^ {1,3,5,6} is {0,1,2,3,4}; its length can be bigger than either operand's ({1} ^ {2} is {1,2}), but it is at most the sum of their lengths.

There is one big difference between methods and operators: the operators require sets (or frozensets) for both operands, but the methods allow any iterables for their arguments. So we CANNOT write {1,2,3} | [2,3,4] (it raises a TypeError), but we CAN write {1,2,3}.union([2,3,4]), which results in {1,2,3,4}.

Set (mutation) operations

(a) aset.add(value)    : add value to aset: does nothing if value is already in aset
      Suppose aset = {1,2,3}. After aset.add(2), the set is unchanged; after aset.add('x'), the set is {1,2,3,'x'} (iterated in any order).
    aset.remove(value) : remove value from aset: if not in aset, raise KeyError
    aset.discard(value): remove value from aset: if not in aset, do nothing
    aset.pop()         : remove and return an arbitrary value from aset: if empty, raise KeyError
    aset.clear()       : remove all values from aset: make it empty

(b) aset1.update(aset2, ..., asetn) / aset1 |= aset2 | ... | asetn
      mutates aset1 to include all the values found in aset1 or in any of the other sets
    aset1.intersection_update(aset2, ..., asetn) / aset1 &= aset2 & ... & asetn
      mutates aset1 to include only the values found in aset1 and in every one of the other sets
    aset1.difference_update(aset2, ..., asetn) / aset1 -= aset2 | ... | asetn
      mutates aset1 to include only the values found in aset1 and in none of the other sets
    aset1.symmetric_difference_update(aset2) / aset1 ^= aset2
      mutates aset1 to include only the values found in aset1 or aset2, but not both
    (Python does not allow chained augmented assignments such as aset1 |= aset2 |= aset3, so the right-hand side combines the other sets with an ordinary operator first.)

Frozensets are very similar to sets, but we cannot use any of the mutation methods. (Writing fs |= aset for a frozenset fs builds a new frozenset and rebinds the name fs to it; the original frozenset is not mutated.)
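The sketch below exercises the comparisons, the four set-algebra operations, and the mutators just described, using small example sets like those in the text. It assumes only built-in Python 3; sorted is applied just so the printed order is predictable.

```python
# Equality and comparison
print({1, 2, 3} == {3, 2, 1})                 # True
print({1, 2} == [1, 2])                       # False: a set is never == a list
print({1, 2} == frozenset({1, 2}))            # True: set and frozenset compare by value
print({1, 2} <= {1, 2, 3}, {1, 2} < {1, 2})   # True False
print({1, 2}.isdisjoint({3, 4}))              # True

# The four non-mutating operations
union = {1, 2} | {2, 3, 4} | {1, 3, 6}
inter = {1, 2, 3} & {2, 3, 4} & {1, 3, 6}
diff = {1, 2, 3, 4, 5, 6} - {2, 4} - {4, 5}
sym = {0, 2, 4, 5, 6} ^ {1, 3, 5, 6}
print(sorted(union), sorted(inter), sorted(diff), sorted(sym))
# [1, 2, 3, 4, 6] [3] [1, 3, 6] [0, 1, 2, 3, 4]

# Methods accept any iterable; operators need a set/frozenset on both sides
via_method = {1, 2, 3}.union([2, 3, 4])
print(sorted(via_method))                     # [1, 2, 3, 4]
try:
    {1, 2, 3} | [2, 3, 4]
except TypeError:
    print('| requires sets on both sides')

# Mutating methods
m = {1, 2, 3}
m.add(2)                                      # already present: unchanged
m.add('x')                                    # m now also contains 'x'
m.discard('missing')                          # absent: silently does nothing
try:
    m.remove('missing')                       # absent: raises KeyError
except KeyError:
    print('remove raised KeyError')

# Augmented assignment: combine the other sets first, then mutate once
aset = {1, 2, 3, 4, 5, 6}
aset -= {2, 4} | {4, 5}                       # like aset.difference_update({2, 4}, {4, 5})
print(sorted(aset))                           # [1, 3, 6]

# Frozensets have no mutating methods; |= builds a new frozenset and rebinds
fs = frozenset({1, 2})
original = fs
fs |= {3}
print(hasattr(original, 'add'), len(original), len(fs))   # False 2 3
```

Note the last lines: after fs |= {3}, the name original still refers to the two-value frozenset, which shows that nothing was mutated.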
The constructor is named frozenset: frozenset() constructs an empty frozenset, and frozenset(aset) constructs a frozenset with all the values in aset (in fact, frozenset accepts any iterable).

------------------------------------------------------------------------------
Comprehensions

As with lists/tuples, we can build sets/frozensets via comprehensions as

    s  = {comprehension}
    fs = frozenset({comprehension})   # constructs a frozenset from a set as above

So, to create a set of words (no duplicates), split (by spaces) from a string, we could write

    words = {s for s in 'to be or not to be that is the question'.split(' ')}

here words is now {'to', 'be', 'or', 'not', 'that', 'is', 'the', 'question'} (in some order). If we wanted only the words of 3 or fewer characters, we could include the optional if clause and write

    words = {s for s in 'to be or not to be that is the question'.split(' ') if len(s) <= 3}

here words is now {'to', 'be', 'or', 'not', 'is', 'the'}.

Generally, we can translate the set comprehension {i for i in iterable if bool_expression-i} as follows:

    comprehension = set()
    for i in iterable:
        if bool_expression-i:
            comprehension.add(i)

Notice that we don't need to write

    if bool_expression-i and i not in comprehension:
        comprehension.add(i)

because the add method automatically does the right thing. We shouldn't write such redundant checks: add performs that check itself, so if we write such a check, Python does it twice.

------------------------------------------------------------------------------
A Quick use of Sets

Recall that we discussed the following reverse function in the previous lecture.

    def reverse(adict):
        answer = {}
        for k, k_vals in adict.items():
            for v in k_vals:
                answer.setdefault(v, []).append(k)
        return answer

But one problem with it was that the lists associated with the keys in the answer dictionary could contain duplicate values. We solved the problem by writing code that does not append the value to the list if it is already there.
    def reverse_distinct(adict):
        answer = {}
        for k, k_vals in adict.items():
            for v in k_vals:
                where = answer.setdefault(v, [])
                if k not in where:
                    where.append(k)
        return answer

But really, we should have chosen sets as the values in the answer dictionary. When using sets, there is a much easier solution:

    def reverse(adict):
        answer = {}
        for k, k_vals in adict.items():
            for v in k_vals:
                answer.setdefault(v, set()).add(k)
        return answer

Notice that the only change was to the line

    answer.setdefault(v, set()).add(k)

Here we set the default (if v is not in answer) to be the empty set (which, recall, we must write as set(), not {}, which is an empty dictionary). Also we must substitute add (the method for adding a value to a set) for append (the method for appending a value to a list). When printed (with print_dict), the answer looks as follows (the names inside each set may print in a different order):

    AZ -> {'alex'}
    CA -> {'rich', 'alex', 'ellen', 'mark'}
    IL -> {'rich'}
    IN -> {'mark'}
    NY -> {'alex', 'david'}
    OR -> {'ellen', 'patty'}
    PA -> {'david', 'alex', 'rich', 'ellen', 'mark', 'patty'}
    RI -> {'david'}
    WA -> {'david', 'alex', 'rich', 'ellen', 'mark', 'patty'}

------------------------------------------------------------------------------
Default Dictionaries: very simple to use

There is a special kind of dictionary, called a defaultdict, that makes the code above even simpler. It also makes the code for count simpler. Let's take a quick look at defaultdict and how to simplify the code for these two functions.

First, we must import it from the collections module, typically by

    from collections import defaultdict

Finally (that was short!), when we define a defaultdict we specify an argument that is the name of the type to construct an object from if we look up a key that is not in the defaultdict: that is, we say what default value to supply when we define the defaultdict. (More precisely, the argument can be anything callable with no arguments; defaultdict calls it to produce the default value and stores that value under the missing key.)

With this new kind of dictionary (and I use it a lot), the above code simplifies to

    def reverse(adict):
        answer = defaultdict(set)       # key not in answer? use/put a set() in
        for k, k_vals in adict.items():
            for v in k_vals:
                answer[v].add(k)        # add it to current set, or a new one
        return answer

Likewise, we can simplify the count function to

    def count(alist):
        answer = defaultdict(int)       # key not in answer? use/put an int()/0
        for v in alist:
            answer[v] += 1              # increment current value, or 0
        return answer

Note that int() returns a reference to the 0 int object (how convenient).
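To see the defaultdict versions run end to end, here is a minimal sketch. The lived dictionary is hypothetical sample data (not the data behind the print_dict output above), and sorted is used only to make the printed order predictable. The last lines show one behavior worth knowing: looking up a missing key with [] stores the default value, whereas an in test does not.

```python
from collections import defaultdict

def reverse(adict):
    answer = defaultdict(set)            # missing key -> a new empty set
    for k, k_vals in adict.items():
        for v in k_vals:
            answer[v].add(k)             # duplicates are ignored by the set
    return answer

def count(alist):
    answer = defaultdict(int)            # missing key -> int(), i.e. 0
    for v in alist:
        answer[v] += 1
    return answer

# Hypothetical sample data: states each person lived in (a duplicate on purpose)
lived = {'alex': ['CA', 'NY', 'CA'], 'david': ['NY', 'RI'], 'patty': ['OR']}
by_state = reverse(lived)
for state in sorted(by_state):
    print(state, '->', sorted(by_state[state]))
# CA -> ['alex']
# NY -> ['alex', 'david']
# OR -> ['patty']
# RI -> ['david']

counts = count(['a', 'b', 'a', 'c', 'a'])
print(dict(counts))                      # {'a': 3, 'b': 1, 'c': 1}

# Caution: looking up a missing key with [] stores the default value
d = defaultdict(int)
print(d['missing'], len(d))              # 0 1
print('other' in d, len(d))              # False 1 (in does not add the key)
```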