close

Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy, and will likely be dropped five years after the last post in Jan 2023.

Showing posts with label jinja. Show all posts
Showing posts with label jinja. Show all posts

Tuesday, June 21, 2022

My Shifting Understanding and A Terrible Design Mistake

I've been fascinated by Literate Programming forever. 

I have two utterly divergent takes on this.

See https://github.com/slott56/PyLit-3 for one.

See https://github.com/slott56/py-web-tool for another.

And yet, I've still done a really bad design job. Before we get to the design, a little bit of back story.

Back Story

Why two separate literate programming projects? Because it's not clear what's best. It's a field without too many boundaries and a lot of questions about the value produced.

PyLit I found, forked, and upgraded to Python 3. I didn't design it. It's far more clever than something I'd design.

Py-Web-Tool is something I wrote based on using a whole bunch of tools that follow along behind the original WEB tools. Nothing to do with web servers or web.py.

The Problem Domain

The design problem is, in retrospect, pretty obvious. I set it out here as a cautionary tale.

I'm looking at the markup languages for doing literate programming. The idea is to have named blocks of code in your document, presented in an order that makes sense to your reader. A tool will "weave" a document from your source. It will also "tangle" source code by rearranging the code snippets from presentation order into compiler-friendly order.

This means you can present your core algorithm first, even though it's buried in the middle of some module in the middle of your package. 

The presentation order is *not* tied to the order needed by your language's toolchain.

For languages like C this is huge freedom. For Python, it's not such a gigantic win.

The source material is a "web" of code and information about the code. A web file may look like this:

Important insight.

@d core feature you need to know about first @{
    def somecode() -> None:
        pass
@}

And see how this fits into a larger context?

@d something more expansive @{
def this() -> None:
    pass
    
def that() -> None:
    pass
    
@<core feature you need to know about first@>
@}

See how that works?

This is easy to write and (relatively) easy to read. The @<core feature you need to know about first@> becomes a hyperlink in the published documentation. So you can flip between the sections. It's physically expanded inline to tangle the code, but you don't often need to look at the tangled code.

The Design Question

The essential Literate Programming tool is a compiler with two outputs:

  • The "woven" document with markup and such
  • The "tangled" code files which are code, largely untouched, but reordered.

We've got four related problems.

  1. Parsing the input
  2. An AST we can process
  3. Emitting tangled output from the AST
  4. Emitting woven output form the AST

Or, we can look at it as three classic problems: deserialization, AST representation, and serialization. Additionally, we have two distinct serialization alternatives.

What did I do?

I tackled serialization first. Came up with a cool bunch of classes and methods to serialize the two kinds of documents.

Then I wrote the deserialization (or parsing) of the source WEB file. This is pretty easy, since the markup is designed to be as trivial as possible. 

The representation is little more than glue between the two.

What a mistake.

A Wrong Answer

Focusing on serialization was an epic mistake.

I want to try using Jinja2 for the markup templates instead of string.Template

However. 

My AST was such a bad hack job it was essentially impossible to use it. It was a quagmire of inconsistent ad-hoc methods to solve a specific serialization issue.

As I start down the Jinja road, I found a need to be able to build an AST without the overhead of parsing.

Which caused me to realize that the AST was -- while structurally sensible -- far from the simple ideal.

What's the ideal?

The Right Answer

This ideal AST is something that lets me build test fixtures like this:

example = Web(
   chunks=[
       TextChunk("\n"),
       NamedCodeChunk(name="core feature you need to know about first", lines=["def someconme() -> None: ...", "pass"])),
       TextChunk("\nAnd see how this fits into a larger context?\n"),
       NamedCodeChunk(name="something more expansive", lines=[etc. etc.])
   ]
)

Here's my test for usability: I can build the AST "manually" without a parser. 

The parser can build one, also, but I can build it as a sensible, readable, first-class Python object.

This has pointed me to a better design for the overall constructs of the WEB source document. Bonus. It's helping me define Jinja templates that can render this as a sensible woven document.

Tangling does not need Jinja. It's simpler. And -- by convention -- the tangled code does not have anything injected into it. The woven code is in a markup language (Markdown, RST, HTML, LaTeX, ASCII DOC, whatever) and some markup is required to create hyperlinks and code sections. Jinja is super helpful here. 

TL;DR

The essence of the problem is rarely serialization or deserialization.  It's the internal representation.


Tuesday, January 29, 2019

Things that start badly

Today's Example of Starting Badly: Building HTML.

The code has a super-simple email message with f"<html><body><p>stuff {data}</p></body></html>". It was jammed into an email object along with the text version. All very nice.

For a moment, I considered suggesting that f-string substitution wasn't a good long-term solution, since it doesn't cover anything more than the most trivial case.

Two things stopped me from complaining:
  • The case really was trivial.
  • It's administrative code: it sends naggy reminder emails periodically. Why over-engineer it?
What an idiot I was.

Today, the {data} has been replaced with a complex table instead of a summary. (Why? The user story evolved. And we needed to replace the summary with details.)

The engineer was pretty sure they could use htmlify(data) or data.htmlify() to transform the data into an HTML structure without seriously breaking the f-string nature of the app.

I should have commented "Don't build HTML that way, it's a bad way to start" on the previous release.

The f-string solution turns rapidly into complexities layered on complexities dusted over the top with sprinkles of NOPE.

This is a job for Jinja2 or Mako or something similar. 

There's a step function change in the app's perceived "complexity". Instead of a simple f-string, we now have to populate a template. It goes from one line of code to more than one (three seems typical.) And. The file-system loader for templates seems more appropriate rather than hard-coding the template in the body of the code. So there are now more files in the app with the HTML templates in them.

However. The Jinja {{variable|round(2)}} was an immediate victory. The use of {%for%} to build the HTML table was the goal, and that simplification was worth the price of entry. Now we're arguing over CSS nuances to get the columns to look "right."

Lessons learned.

Don't let the currently superficial trivial case slide past without at least a warning. Make the suggestion that functions like "get template" and "populate template" will be necessary even for trivial f-string or string.Template processing.

HTML isn't a first-class part of anything. It's external serialization.  Yes, it's for people, but it's only serialization. Serialization has to be separated from the other aspects of the data gathering, map-reduce summarization, and email distribution. There's a pipeline of steps there and the final app should reflect the complete separation of these concerns. Even if it is admin overhead.

Thursday, October 16, 2014

Using Bottle as a miniature demo server

Let's talk small.

When writing API's, it sometimes helps to have a small demo web site to show the API in a context that's easy to visualize. API's are sometimes abstract, and without an application to provide some context, it can be unclear why the path looks like that or why the JSON document has those fields.

I want to emphasize the "small" part of the small demo. A small page or two with some HTML forms and a submit button. Really small.

The actual customer-facing apps (mobile, mobile web, and full web site) are being built by lots of other people. Not us. They're big. We build the API's (there are a lot) that support the data structures that support the processing that supports the user experience.

Building fake mobile apps is right out. We're not going to lard on Android SDK or Xcode development environments to our already overburdened laptops. We build backend API's.

Building a fake mobile web or full web site is appealing. What makes it complex is the UX folks are building everything in Angular.js. If we want to properly implement a form, we would have to master Angular just to do a demo for the product owner.

No thanks. Still too far afield for API developers. We're focused on mongo and JSON and performance and scalability. Not Angular.js and the UX.

What we want to do is build a small web server which runs just a few pages plucked out of the UX demo code so that we can show how interactions with a web page put stuff in a database. And vice-versa: stuff in the database will show up on a web page.

"Really?" we get asked. Some folks look askance at us for wanting to put a small demo site together.

"Yes," we answer. "Our product owner has a big vision and we're breaking that into a bunch of little API's. It's not perfectly clear how we're building up to that vision."

It's not perfectly clear how some of this work. Folks outside the scrum team have distracting questions. We want to have a page or two where we can fill in a form and click submit and stuff happens. This is far easier to explain than showing them Postman or SoapUI and claiming that this will support some user stories.

And as we grow toward the epic, the workflow aspects of this will grow. The stuff that admin "A" does after user "U" has made an initial request. Or the stuff that internal user "I" does after external user "X" has done something. But really, it's just a few small web pages. Small.

Imagine the demo. On laptop #1, we'll show user "X". On laptop #2, we're running a Mongo shell to query what's in the db. On laptop #3 we're showing user "I". The focus is really the API's. And how the API's add up to an epic collection of stories.

Serving some HTML pages

Just to make it painful, we can't simply grab the demo web pages out of the UX team's SVN repository. Why not? First, it's an Angular app. We can't just grab some HTML and go. The demo pages are served via node.js with Bower, so it's not even clear (to us) what the complete deployment looks like.

So. We cheated. We took a screen shot. We trimmed the edges of the page as .PNG files. We wrote our own form and cobbled in enough CSS to make it look close. We're not here to fake out the UX. We just want to enter some data and have it tickle our API. (Indeed, we have a "Not The Real Experience" on some pages.)

Initially, some of the team members tried serving these small pages with WebLogic. Then Jetty. It's not bad. But it's Java. It takes forever to build and deploy something after a trivial change. There are a lot of moving parts even with Jetty, and not all of them are obvious.

Since we're building "enterprise" API's, we're deeply enmeshed with every feature of the Spring Framework. Our STS/Eclipse environments are fat with add-ons and features.

While the Spring Framework ideal is to allow a developer to focus on relevant details and have the irrelevant details handled automagically, the magic almost gets in the way. These are small applications that are little more than a few static pages with forms and a submit button. Spring can do it, of course. But we're often testing our the actual API's in a Jetty server (or two). If the demo site requires yet another instance of Jetty with yet another configuration, our ability to cope diminishes.

How can we get back to small?

Python and Bottle

Python has several web servers built-in. We can use http.server. We can use wsgiref. Both of these are almost OK for what we want to do.

We can do better with two small downloads: Bottle and Jinja2. With these, we can build simple HTML pages that show some data. We can build simple servers that collect form data, use http.client to make API requests, and write copious logging details. We can write little bottle apps that handle just GETs and POSTs simply.

This is suitably small.

We can share the module with the Bottle object and the HTML mock-up pages. We can fire up the app in an instant on anyone's laptop, no matter what else they're running. We can tweak the server to adjust the logging or the API request or the form.

We actually run the server from within Idle. Make a change and hit F5 to redeploy after a change. It's small. It's fast. And it doesn't involve the huge complexities associated with Java.

Bottle doesn't do much. But what little it does do is a pretty tidy fit with tiny little demonstrations of super-simple HTML interactions.

Wednesday, May 20, 2009

This sounds complicated, because it is

For a while, I generated documentation with Cheetah. I wrote bodies as a fragment of HTML and used Cheetah to wrap those bodies in standard templates with navigation and branding.

To write my books, I learned DocBook markup and used DocBook XSL tools to create HTML and PDF versions of the book's text. Even though XML is hard to work with, I managed to muddle through. It's painful -- at times -- but doable.  

[Eventually, I found XMLMind's XML Editor.  It rocks.  But that's off-topic.]

Then, I fount RST and RST2HTML.  For a while, I wrote my documentation in RST and used a simple script to create the HTML version of the documentation from RST source.

Why ReStructuredText?

From their site: "reStructuredText is an easy-to-read, what-you-see-is-what-you-get plaintext markup syntax".  
  • Easy-to-Read.  The markup is very, very simple.  Mostly spacing and simple quoting.  Yet, for edge cases, there is enough richness to approach DocBook XML.
  • WYSIWYG.  The markup doesn't get in the way; you write the text with a few conventions for spacing and quoting.
  • Plain Text.  A few spacing and quoting rules are used to distinguish structure from content.  Presentation is a limited part RST (like HTML where some presentation is present in the structural markup, but can be avoided.)
RST lead me, eventually to Sphinx

The Secret of Sphinx

Sphinx is RST-based markup.  You write in plaintext (plus some quoting and spacing) and you get an elegant HTML web site with inter-document references all resolved correctly, contents, indexes, auto-generated API documentation for your Python software, syntax coloring, everything.  Wow.

I can't stop myself from doing everything in Sphinx.  You create a development structure for your source files.  You use a series of toctree directives to build the resulting documentation structure that people will see and use.

I've decided to convert some ancient Cheetah-based stuff to Sphinx.  

Unmarking Up

Revising HTML-based document bodies to RST is annoying.  It can be done with Beautiful Soup.  The HTML is pretty regular (and pretty simple) so it wouldn't be too bad.  Except for a bunch of edge cases that have significant complexity.

The original Cheetah-based site wasn't purely documentation.  It doesn't fit the Sphinx use cases perfectly.  A fairly significant percentage of the Cheetah-based pages are HTML pages with complex, embedded applets to do calculations.

These pages are not -- strictly speaking -- documentation.  They're an application.  They contain markup (<embed> mostly) that RST can't generate.  Further, they have to be unit tested prior to running Sphinx to build the documentation, since the HTML is actually part of the application.

Raw HTML?

The applet pages are -- more or less -- raw HTML pages that need to be folded in with the Sphinx-generated documentation.  Sphinx has an HTML_STATIC_PATH configuration parameter that can copy these applications from project folders into destination directories.

But this leaves me with dozens of Cheetah-generated pages as part of this application.  The presence of Cheetah in the midst this Sphinx operation makes things complicated.

Or, perhaps it doesn't.

It turns out that Sphinx is built on Jinja.  There's a template engine under the hood!  That's handy.  That lets me build the application HTML with a slightly different template engine; one that's compatible with the rest of the Sphinx-generated site.

I think I've got a clean, RST-based replacement for my lovingly hand-crafted HTML.  It's a lot of rework, but the simplification is of immense value.