Russ Allbery > Eagle's Path > November 2019 |
I'm going to preach the wonders of Python dataclasses, but for reasons of interest to those who have already gone down the delightful rabbit-hole of typed Python. So let me start with a quick plug for mypy if you haven't heard about it.
(Warning: this is going to be a bit long.)
mypy is a static type-checker for Python. In its simplest form, instead of writing:
    def hello(name):
        return f"Hello, {name}"
you write:
    def hello(name: str) -> str:
        return f"Hello, {name}"
The type annotations are ignored at runtime, but the mypy command makes
use of them to do static type checking. So, for instance:
    $ cat > t.py
    def hello(name: str) -> str:
        return f"Hello {name}"
    hello(1)
    $ mypy t.py
    t.py:3: error: Argument 1 to "hello" has incompatible type "int"; expected "str"
If you're not already using this with your Python code, I cannot recommend
it highly enough. It's somewhat tedious to add type annotations to
existing code, particularly at first when you have to look up how mypy
represents some of the more complicated constructs like decorators, but
once you do, mypy starts finding bugs in your code like magic. And it's
designed to work incrementally and tolerate untyped code, so you can start
slow, and the more annotations you add, the more bugs it finds. mypy is
much faster than a comprehensive test suite, so even if you would have
found the bug in testing, you can iterate faster on changes. It can even
be told which variables may be None and then warn you if you use them
without checking for None in a context where None isn't allowed.
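As a small illustration of the None checking, here is a sketch using Optional. The function names and the data are made up for the example, and the mypy complaint is described in a comment rather than quoted, since the exact wording depends on the mypy version.

```python
from typing import Optional


def find_owner(service: str) -> Optional[str]:
    """Return the owner of a service, or None if it is unknown."""
    owners = {"wiki": "alice"}  # made-up data for the example
    return owners.get(service)


def shout(name: str) -> str:
    """Accepts only str, so None is not allowed here."""
    return name.upper()


owner = find_owner("wiki")

# Calling shout(owner) at this point would be rejected by mypy, because
# owner may be None and shout() only accepts str.

# After an explicit check, mypy narrows Optional[str] to str.
if owner is not None:
    print(shout(owner))
```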
But mypy can only help with code that's typed, so once you get the religion, the next goal is to push typing ever deeper into complicated corners of your code.
Python code often defaults to throwing any random collection of data into a dict. For a simple example, suppose you have a paginated list of strings (a list, an offset from the start of the list, a limit of the number of strings you want to see, and a total number of strings in the underlying data). In a lot of Python code, you'll see something like:
    strings = {
        "data": ["foo", "bar", "baz"],
        "offset": 5,
        "limit": 3,
        "total": 10,
    }
mypy is good, but it's not magical. It has no way to keep track of the
fact that strings["data"] is a list of strings, but strings["offset"] is
an int. Instead, it decides the type of each value is the common
superclass of the types it sees in the initializer (in this case, object,
which provides almost no type checking).
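To make the cost concrete, here is a minimal sketch of the kind of mistake that slips through. The misspelled key is invented for the example. Since every value is just an object to mypy and the keys are ordinary strings, nothing is reported until the code actually runs.

```python
strings = {
    "data": ["foo", "bar", "baz"],
    "offset": 5,
    "limit": 3,
    "total": 10,
}

# The key below is misspelled on purpose. A type checker has no record of
# which keys this dict is supposed to have, so the mistake only shows up
# as a KeyError at runtime.
try:
    offset = strings["offest"]
except KeyError:
    offset = None
```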
There are two traditional solutions: an object, and a NamedTuple (the typing-enhanced version of collections.namedtuple). An object is tedious:
    from typing import List

    class Strings:
        def __init__(
            self, data: List[str], offset: int, limit: int, total: int
        ) -> None:
            self.data = data
            self.offset = offset
            self.limit = limit
            self.total = total
This provides perfect typing, but who wants to write all that? A
NamedTuple is a little better:
    from typing import List, NamedTuple

    Strings = NamedTuple(
        "Strings",
        [("data", List[str]), ("offset", int), ("limit", int), ("total", int)],
    )
but still kind of tedious and has other quirks, such as the fact that your object can now be used as a tuple, which can introduce some surprising bugs.
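A short sketch of the tuple quirk, using the same Strings definition. The swapped variable names are a deliberate mistake for the example.

```python
from typing import List, NamedTuple

Strings = NamedTuple(
    "Strings",
    [("data", List[str]), ("offset", int), ("limit", int), ("total", int)],
)

page = Strings(data=["foo", "bar", "baz"], offset=5, limit=3, total=10)

# It is still a tuple, so it compares equal to a plain tuple with the
# same contents.
same_as_tuple = page == (["foo", "bar", "baz"], 5, 3, 10)

# Positional unpacking silently depends on field order. Here limit and
# offset are swapped by mistake, and since both are ints nothing objects.
data, limit, offset, total = page
```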
Enter dataclasses, which are new in Python 3.7 (although inspired by attrs, which has been around for some time). The equivalent is:
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Strings:
        data: List[str]
        offset: int
        limit: int
        total: int
So much nicer, and the same correct typing. And unlike the functional
NamedTuple syntax above, dataclasses support default values and expansion
via inheritance, and are full classes so you can attach short methods and
do other neat tricks. You can also optionally mark them as frozen, which
provides the NamedTuple behavior of making them immutable after creation.
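Here is a minimal sketch of those extras together: default values, a short method, and frozen. The default values and the has_more() method are assumptions for the example, not part of the earlier definition.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import List


@dataclass(frozen=True)
class Strings:
    data: List[str]
    total: int
    offset: int = 0  # defaults are assumptions for this example
    limit: int = 25

    def has_more(self) -> bool:
        """Return whether more strings exist past this page."""
        return self.offset + len(self.data) < self.total


page = Strings(data=["foo", "bar", "baz"], total=10, offset=5)

# Assigning to a field of a frozen dataclass raises at runtime (and is
# also reported by mypy).
try:
    page.offset = 0
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```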
Using dataclasses for those random accumulations of data is already great, but today I found a way to use them for a trickier typing problem.
I work on a medium-sized (about 75 routes) Tornado web UI using Jinja2 and WTForms for templating. Returning a page to the user's browser involves lots of code that looks something like this:
    self.render("template.html", service=service, owner=owner)
Under the hood, this loads that template, builds a dictionary of template
variables, and tells Jinja2 to render the template with those variables.
The problem is the typing: the render method has no idea what sort of data
you want to pass to a given template, so it uses the dreaded
**kwargs: Any, so you can pass anything you want. And mypy can't look
inside Jinja2 template code.
Forget to pass in owner? Exception or silent failure during template rendering depending on your Jinja2 options. Pass in the name of a service when the template was expecting a rich object? Exception or silent failure. Typo in the name of the parameter? Exception or silent failure. Better hope your test suite is thorough.
What I did today was wrap each template in a dataclass:
    from dataclasses import InitVar, dataclass

    @dataclass
    class Template(BaseTemplate):
        service: str
        owner: str
        template: InitVar[str] = "template.html"
Now, the code to render it looks like:
    template = Template(service, owner)
    self.finish(template.render(self))
and now I have type-checking of all of the template arguments and only need to ensure the dataclass definition matches the needs of the template implementation.
The magic happens in the base template class:
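    from dataclasses import dataclass, fields

    from tornado.web import RequestHandler

    @dataclass
    class BaseTemplate:
        def render(self, handler: RequestHandler) -> str:
            # The template name is a class-level default on each child class.
            template_name = getattr(self, "template")
            # The template environment lives on the handler.
            template = handler.environment.get_template(template_name)
            # Start from the handler's default template variables.
            namespace = handler.get_template_namespace()
            # Add each dataclass field without copying its value.
            for field in fields(self):
                namespace[field.name] = getattr(self, field.name)
            return template.render(namespace)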
(My actual code is a bit different and more complicated since I move some
other template setup to the render() method.)
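For anyone who wants to try the pattern without Tornado or Jinja2 installed, here is a self-contained sketch. Everything named Fake* is a stand-in invented for this example: string.Template replaces Jinja2, FakeHandler replaces the Tornado handler, and the template text and the "site" default variable are made up. Only BaseTemplate.render() follows the code above.

```python
from dataclasses import InitVar, dataclass, fields
from string import Template as StringTemplate
from typing import Any, Dict


class FakeTemplate:
    """Stand-in for a Jinja2 template, backed by string.Template."""

    def __init__(self, source: str) -> None:
        self._source = source

    def render(self, namespace: Dict[str, Any]) -> str:
        return StringTemplate(self._source).substitute(namespace)


class FakeEnvironment:
    """Stand-in for a Jinja2 environment holding templates by name."""

    def __init__(self, sources: Dict[str, str]) -> None:
        self._sources = sources

    def get_template(self, name: str) -> FakeTemplate:
        return FakeTemplate(self._sources[name])


class FakeHandler:
    """Stand-in for the Tornado request handler."""

    def __init__(self) -> None:
        self.environment = FakeEnvironment(
            {"service.html": "$site: $service is owned by $owner"}
        )

    def get_template_namespace(self) -> Dict[str, Any]:
        # Default variables available to every template.
        return {"site": "Example"}


@dataclass
class BaseTemplate:
    def render(self, handler: FakeHandler) -> str:
        template_name = getattr(self, "template")
        template = handler.environment.get_template(template_name)
        namespace = handler.get_template_namespace()
        for field in fields(self):
            namespace[field.name] = getattr(self, field.name)
        return template.render(namespace)


@dataclass
class ServiceTemplate(BaseTemplate):
    service: str
    owner: str
    template: InitVar[str] = "service.html"


output = ServiceTemplate("wiki", "alice").render(FakeHandler())
```

Leaving out owner in the ServiceTemplate(...) call is now something mypy can report, since it is an ordinary constructor call with a known signature.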
There's some magic here to work around dataclass limitations that warrants some explanation.
I pass the Tornado handler class into the template render() method so
that I have access to the template environment and the (overridden)
Tornado get_template_namespace() call to get default variable settings.
Passing them into the dataclass constructor would make the code less clean
and would be harder to implement, mostly due to limitations on attributes
with default values, mentioned below.
The name of the template file should be a property of the template
definition rather than something the caller needs to know, but that means
it has to be given last since dataclasses require that all attributes
without default values come before ones with default values. That in turn
also means that the template attribute cannot be defined in BaseTemplate,
even without a default value, because if a child class sets a default
value, @dataclass then objects. Hence the getattr to hide from mypy the
fact that I'm breaking type rules and assuming all child classes are
well-behaved.
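The ordering problem can be reproduced in a few lines. This is a sketch of the rejected variant, with hypothetical class names: the attribute is declared in the base class, the child gives it a default, and the overridden field keeps its original first position, ahead of the child's fields that have no defaults.

```python
from dataclasses import InitVar, dataclass


@dataclass
class BaseWithTemplate:
    # Hypothetical variant: declare the attribute in the base class.
    template: InitVar[str]


try:

    @dataclass
    class Broken(BaseWithTemplate):
        service: str
        owner: str
        # Overriding keeps the base class's position (first), so the
        # fields without defaults now follow one with a default.
        template: InitVar[str] = "template.html"

except TypeError as exc:
    # @dataclass refuses to generate __init__ for this field order.
    error = str(exc)
else:
    error = None
```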
template in the child classes is marked as InitVar so that it won't be
included in the fields of the dataclass and thus won't be passed down to
Jinja2.
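A quick check of that behavior, as a sketch with made-up argument values: the InitVar does not show up in fields(), but because it has a default, the value is still readable as a class attribute, which is what the getattr relies on.

```python
from dataclasses import InitVar, dataclass, fields


@dataclass
class Template:
    service: str
    owner: str
    template: InitVar[str] = "template.html"


page = Template("wiki", "alice")

# Only the real fields are listed, so only they reach the namespace.
field_names = [f.name for f in fields(page)]

# The default remains available as a class attribute.
template_name = getattr(page, "template")
```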
Finally, it would be nice to be able to use dataclasses.asdict() to turn
the object into a dictionary for passing into Jinja2, but unfortunately
asdict tries to do a deep copy of all template attributes, which causes
all sorts of problems. I want to pass functions and WTForms form objects
into Jinja2, which resulted in asdict throwing all sorts of obscure
exceptions. Hence the two lines that walk through the fields and add a
shallow copy of each field to the template namespace.
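The difference is easy to see with any object that cannot be deep-copied. In this sketch a threading.Lock is used purely as a stand-in for such an object (it is not what the templates actually receive), and the Page class is invented for the example.

```python
import threading
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, List


@dataclass
class Page:
    items: List[str]
    resource: Any  # stand-in for an object that cannot be deep-copied


page = Page(["wiki", "mail"], threading.Lock())

# asdict() recurses into every value and deep-copies what it finds,
# which fails for the lock.
try:
    asdict(page)
except TypeError:
    asdict_failed = True
else:
    asdict_failed = False

# Walking the fields copies only references, so any object works.
namespace: Dict[str, Any] = {}
for field in fields(page):
    namespace[field.name] = getattr(page, field.name)
```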
I've only converted four templates so far (this code base is littered with half-finished transitions to better ways of doing things that I try to make forward progress on when I can), but I'm already so much happier. All sorts of obscure template problems will now be caught by mypy even before needing to run the test suite.
Posted: 2019-11-09 15:02