Posts for December 2019

2019-12-09: More fun with Jinja2 templates

When last I left this discussion, I was advocating using Python 3 dataclasses to wrap Jinja2 templates. I had another idea and a chance to experiment with it, and I was reasonably happy with the results.

Can the dataclass corresponding to the Jinja2 template be used by the test suite to check that all required parameters for a template are present in the dataclass?

The answer is mostly yes, although unfortunately there are some substantial caveats because Jinja2 doesn't provide all of the tools that one would like to analyze parsed templates.

The basic idea is to ask Jinja2 for all the variables used in a template. It provides a function for doing this:

from jinja2.meta import find_undeclared_variables

def get_template_variables(environment: Environment, name: str) -> Set[str]:
    source = environment.loader.get_source(environment, name)[0]
    ast = environment.parse(source)
    return find_undeclared_variables(ast)

Then, use the dataclass introspection features to see what variables it defines, and compare that to the variables used in the template:

from dataclasses import fields

template_fields = fields(template_class)
expected: Set[str] = set()
for template_field in template_fields:
    if template_field.name == "template":
        template = template_field.default
    else:
        expected.add(template_field.name)
assert template

wanted = get_template_variables(environment, template)
assert expected == wanted, f"fields for {template}"

This uses the convention (defined in the previous post) that the name of the template is stored in an InitVar field of the dataclass named template as the default value.

Then, this can be turned into a test by putting it in a for loop:

for template_class in BaseTemplate.__subclasses__():

This works and diagnoses template variables that aren't defined by the dataclass, catching problems that otherwise would have to rely on test coverage and allowing more confidence in enabling the Jinja2 StrictUndefined option for all your templates if (like this project) you made the mistake of not starting that way.

In my actual code, I pass in a set of variables into every template in addition to the ones in the dataclass. If you also use that (common) pattern, the easiest way to handle it is to have the test include a set of names of variables that are always set and remove those from the result of get_template_variables before comparing it against the expected list.

There are just a few problems, all of which I think are bugs in Jinja2 (although I've not yet written them up properly):

  1. find_undeclared_variables does not understand macro inclusion (such as from 'macros.html' import dropdown). All the macros are returned as undeclared variables. I tried various approaches to fix this, such as concatenating the files that define macros together with the files that use them before doing the parse, none of which worked. I had to list all the macros and exclude them from the returned template variables to work around this.

  2. The approach above only parses a file in isolation, so it doesn't expand include directives and doesn't support extends. The latter isn't much of a problem for this code base because there is only one level of extends, and the base template only uses default variables. I worked around the lack of support for include by inlining the previously-included templates, which is a bit irritating from a code reuse standpoint.

These are unfortunate problems and make this technique a bit less clean than it otherwise would be, but they're relatively minor. I think it's worth it for being able to test the consistency between dataclass wrappers and underlying templates. Hopefully Jinja2 will provide some better utility functions for doing this sort of testing in the future.

2019-12-11: Astronomy!

Starting in January of 2020, I will be joining the Data Management team (specifically the Science Quality and Reliability Engineering team) for the Large Synoptic Survey Telescope.

There's a much longer description at the above Wikipedia link and also at LSST's own site, but the short version is that the mission of LSST is to survey the entire southern night sky about twice a week for ten years. This in turn will provide vast amounts of data that will be used to do wide-ranging research in astronomy. All of that data requires indexing and processing so that scientists can use it. The team I'm joining is applying current software engineering techniques (containers, Jupyter notebooks, continuous integration, and so on) to that problem.

For me, this is an opportunity to return to the academic, non-profit world that's always been my first love. It's also an opportunity to learn a bunch of new things (astronomy, for the most obvious, and scientific research computing more generally, but also some areas of technology that I've never had enough time to explore). Even better, everything my new team does is free software and is developed on GitHub, which means I'll be returning to a job where free software is at the center of the work instead of an optional side project for which there's rarely time.

Dropbox has been a great employer and I wasn't looking to leave, but this was too good of an opportunity to pass up.

I'm going to be staying in the Bay Area and working remotely, which also means that I'm going to get about six hours a week back from the commute, and have an opportunity to wander around the area and find interesting places from which to work, something that I'm looking forward to.

When I went to Dropbox, I thought that would mean a bit more time for Debian, and was sadly completely incorrect. No promises this time, but I have some reasons to hope that I'll at least be able to get back to the levels of involvement I had at Stanford. Even if the new job, since it's scientific computing, is using CentOS....

2019-12-25: podlators 4.13

podlators provides the utilities to convert Perl's POD documentation syntax to text and man pages.

In this release, I finally dropped support for Perl 5.6. Even CPAN testers have stopped testing this old of a version, and it isn't available on Travis-CI, which means that support may well regress without me knowing about it. It felt like time.

I considered bumping the required version high enough that I could use a few new features (use parent and the version argument to package), but I decided to be conservative since CPAN testers are still actively testing Perl 5.8, so I only bumped the required version to 5.8.

I also finally tracked down the weird inconsistencies in S<> handling with Pod::Text, which turned out to be due to Unicode strings changing the meaning of \s in regexes. This should now be consistent regardless of the input character set.

Finally, this release changes some Pod::Text::Termcap behavior. Zenin pointed out that it doesn't make sense to assume ECMA-048 escape sequences if Term::Cap doesn't provide escape sequences for one of the types of formatting that we want to do, so this release gets rid of the fallbacks if Term::Cap doesn't have relevant information. It also removes a workaround for problems on ancient Solaris systems that led the module to set the TERMPATH environment variable globally, which is poor behavior for a module.

You can get the latest release from CPAN or from the podlators distribution page.

Last spun 2020-10-05 from thread modified 2019-12-26