Here’s a small function that walks over any* Python object and yields the objects contained within (if any) along with the path to reach them. I wrote it and am using it to validate a deserialized datastructure, but you can probably use it for many things. In fact, I’m rather surprised I didn’t find something like this on the web already, and perhaps it should go in itertools.
Edit: Since the original post I added infinite recursion protection following Eli and Greg’s good advice, added Python 3 compatibility and did some refactoring (which means I had to add proper unit test). You will always be able to get the latest version here, on ActiveState’s Python Cookbook (at least until it makes its way into stdlib, fingers crossed…).
[sourcecode language=”python”]
from collections import Mapping, Set, Sequence
# dual python 2/3 compatability, inspired by the "six" library
string_types = (str, unicode) if str is bytes else (str, bytes)
iteritems = lambda mapping: getattr(mapping, ‘iteritems’, mapping.items)()
def objwalk(obj, path=(), memo=None):
if memo is None:
memo = set()
iterator = None
if isinstance(obj, Mapping):
iterator = iteritems
elif isinstance(obj, (Sequence, Set)) and not isinstance(obj, string_types):
iterator = enumerate
if iterator:
if id(obj) not in memo:
memo.add(id(obj))
for path_component, value in iterator(obj):
for result in objwalk(value, path + (path_component,), memo):
yield result
memo.remove(id(obj))
else:
yield path, obj
[/sourcecode]
And here’s a little bit of sample usage:
[sourcecode language=”python” light=”1″]
>>> tuple(objwalk(True))
(((), True),)
>>> tuple(objwalk({}))
()
>>> tuple(objwalk([1,2,3]))
(((0,), 1), ((1,), 2), ((2,), 3))
>>> tuple(objwalk({"http": {"port": 80, "interface": "0.0.0.0"}}))
(((‘http’, ‘interface’), ‘0.0.0.0’), ((‘http’, ‘port’), 80))
>>>
[/sourcecode]
Comments
6 responses to “Walking Python objects recursively”
How about adding an “infinite recursion” protection mechanism for cyclic data-structures? Shouldn’t be too hard, by keeping a dict of object IDs.
Obviously a good idea, I’ll add it.
recursion protection might be a bit over-zealous. Shouldn’t you remove from the memo after recursing, else this non-recursive structure isn’t fully described:
a={1:2}
b={0:a, 1:a}
tuple=objwalk(b) ?
Hmmm, well, I definitely didn’t think about it, although you could say that perhaps it’s not desired to visit a twice in such a case. Regardless, I think I’d have been least-astonished with the behaviour you’re describing, so I’ll change the code. Thanks!
It’s best to have interface checks (eg: hasattr(obj, ‘__iter__’) ) instead of isinstance checks (for portability).
Yes, I’ve considered it, but opted to go with PEP 3119’s rationale. All in all, I think it’s the best approach.