Watch out for list(dict.keys()) in Python 3

As everyone is probably aware by now, in Python 3 dict.keys(), dict.values() and dict.items() will all return iterable views instead of lists. The commonly suggested way to get the original behavior back, when a list was actually intended, is to simply use list(dict.keys()). This is usually fine, but not in all cases.
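To make the difference concrete, here is a small sketch for Python 3 (the variable names are made up for illustration). It shows that a view keeps tracking the dictionary, while list(d.keys()) takes a snapshot at the time of the call:

```python
d = {"a": 1, "b": 2}

view = d.keys()            # dynamic view backed by the dictionary
snapshot = list(d.keys())  # independent list, copied right now

d["c"] = 3                 # modify the dictionary afterwards

view_sees_new_key = "c" in view          # the view reflects the change
snapshot_sees_new_key = "c" in snapshot  # the snapshot does not
```

The snapshot is what Python 2 code calling dict.keys() was getting implicitly.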

One reason someone might actually opt for the more expensive copying operation is that, with the pre-3.0 semantics, the keys() method is atomic, in the sense that the whole operation of converting all dictionary keys to a list is done while the global interpreter lock is held. Thus, it’s thread-safe to run dict.keys() with Python 2.X.

The suggested replacement in Python 3, list(dict.keys()), is not. The interpreter may let another thread run before or during the iteration of the view, and this will cause an exception if the dictionary is modified at the same time. To fix the problem, either a lock must protect the iteration, or a more expensive operation such as dict.copy().keys() must be used.
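The two workarounds might look roughly like the sketch below. All names here (shared, shared_lock, keys_with_lock, keys_from_copy, set_item, mutation_breaks_iteration) are hypothetical and only for illustration. The lock variant only helps if every writer takes the same lock, and the copy variant assumes, as this post does, that dict.copy() itself runs without interruption. The last function shows the failure mode in a single thread, where it is easy to reproduce: changing the size of a dictionary while iterating over its view raises RuntimeError.

```python
import threading

# Hypothetical shared state; every writer must use the same lock.
shared = {"a": 1, "b": 2}
shared_lock = threading.Lock()


def keys_with_lock(d, lock):
    """Build the key list while holding the lock."""
    with lock:
        return list(d.keys())


def keys_from_copy(d):
    """Build the key list from a private shallow copy of the dict."""
    return list(d.copy().keys())


def set_item(d, lock, key, value):
    """Writers take the same lock, or the locked reader gains nothing."""
    with lock:
        d[key] = value


def mutation_breaks_iteration():
    """Return True if growing a dict while iterating its view fails."""
    d = {"a": 1}
    try:
        for key in d.keys():
            d[key + "_new"] = 0  # changes the dict size mid-iteration
    except RuntimeError:
        return True
    return False
```

The lock approach avoids the extra copy of the dictionary but requires discipline at every write site. The copy approach needs no cooperation from writers, at the cost of duplicating the whole dictionary.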

The 2to3 tool won’t help you there, unfortunately. So, keep an eye on it!

10 thoughts on “Watch out for list(dict.keys()) in Python 3”

  1. Jesse

    Relying on the atomicity of certain actions when doing threaded coding is generally a bad idea; the GIL is not there to allow for shortcuts like this one. You’d be better off synchronizing the access to the shared data structure – it’s safer.

  2. Gustavo Niemeyer Post author

    Jesse, yes I agree that relying on the interpreter lock is a bad idea, because hopefully it will go away at some point.

    That said, the purpose of the GIL is precisely to make operations atomic and thus thread-safe, and people do rely on this behavior, no matter how good or bad it is. The subject of this post isn’t about how to do threaded coding in a safe way.

    Also, relying on the atomicity of certain actions is essential to programming with threads in the first place. Hopefully you didn’t perceive the generality of your sentence.

  3. Tom Ritchford

    “As everyone is probably aware by now, in Python 3 dict.keys(), dict.values() and dict.items() will all return iterable views instead of lists.”

    Citation needed! Not that I disbelieve you in the slightest but I didn’t know this and searching hasn’t found a great article, this one mentioning it with little corroborative detail: http://www.python.org/dev/peps/pep-3100/

  4. Gustavo Niemeyer Post author

    You’re right Tom, I should have added more information, and have modified the post to add a link to PEP 3106. Also, that’s the way that the 2to3 migration script will modify the code automatically. I’ve already heard about this in conferences and in other blog posts as well.

  5. Jim Baker

    I don’t think you can generally rely on the seeming atomicity of operations with the GIL, but dict.items(), etc., in pre-3.0 CPython would seem to be on the safe side. But in general, I suggest locking.

    There is another approach. Jython 2.5 (nice to use that version number!) uses a java.util.concurrent.ConcurrentHashMap (http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html) to back dicts. One nice aspect of this is that CHMs support weakly consistent iterators that “guarantee to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction”. Short of transactional memory semantics, these are pretty good.

    So in Jython, if you do dict.items(), this is effectively the same as list(dict.iteritems()); no locking is done, nor is it needed unless you need a stronger degree of isolation between threads. (The caveat, good to look at code one has worked on, is that we actually test if the size of the dictionary has changed during iteritems, iterkeys, itervalues. I suspect we can get rid of this testing and the corresponding unit test because such mutation can’t corrupt the dict.)

  6. Philipp von Weitershausen

    @Lennart: Code like this will break in Python 3:

    >>> keys = dict.keys()
    >>> keys.sort()
    >>> print "Sorted keys: ", keys

    Yes, you can simply use sorted(dict.keys()) in both Python 2 and 3, but this is a) not known by everybody and b) it still doesn’t change the fact that legacy code like the above breaks.

  7. Jesse

    @Gustavo You’re right, I should have pointed out my comment was more directed at what you were talking about, namely concurrent access to “atomic” operations such as dict.keys(). I’ve found that yes – while certain actions are atomic within Python, it’s much safer to build a system where you do not rely on the atomicity of those actions when sharing amongst threads. The point of the GIL is *not* to be an end-user language feature; the point is to make it easy to maintain C extensions and the interpreter itself. Relying on the GIL makes your code brittle, and it won’t run on any other interpreter that lacks that GIL safety net. It’s just a bad idea.
