
Moved

Moved. See https://slott56.github.io. All new content goes to the new site. This is a legacy site, and will likely be dropped five years after the last post, in Jan 2023.

Showing posts with label configuration management. Show all posts

Tuesday, March 24, 2015

Configuration Files, Environment Variables, and Command-Line Options

We have three major tiers of configuration for applications. Within each tier, we have sub-tiers, larding on yet more complexity. The organization of the layers is a bit fungible, too. Making good choices can be rather complex because there are so many variations on the theme of "configuration". The desktop GUI app with a preferences file has very different requirements from larger, more complex applications.

The most dynamic configuration options are the command-line arguments. Within this tier of configuration, we have two sub-tiers of default values and user-provided overrides to those defaults. Where do the defaults come from? They might be wired in, but more often they come from environment variables or parameter files or both.
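This layering of defaults can be sketched with argparse, where the option's default is drawn from an environment variable, which in turn falls back to a wired-in value. The variable name OURAPP_HOST and the values here are hypothetical:

```python
import argparse
import os

# A minimal sketch: the default comes from the environment if present,
# otherwise from a wired-in value; a command-line argument overrides both.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--host",
    default=os.environ.get("OURAPP_HOST", "localhost"),
)
options = parser.parse_args(["--host", "db01"])  # user-provided override wins
```

The user's explicit `--host db01` takes precedence; with no argument, the environment variable (or the wired-in "localhost") supplies the value.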

There's some difference of opinion on which tier is next in the tiers of dynamism. The two choices are configuration files and environment variables. We can consider environment variables easier to edit than configuration files. In some cases, though, configuration files are easier to change than environment variables: environment variables are typically bound to the process just once (like command-line arguments), where configuration files can be read and re-read as needed.

The environment variables have three sub-tiers. System-level environment variables tend to be fixed. The variables set by a .profile or .bashrc tend to be specific to a logged-in user, and are somewhat more flexible than system variables. The current set of environment variables associated with the logged-in session can be modified on the command line, and are as flexible as command-line arguments.

Note that we can do this in Linux:

PYTHONPATH=/path/to/project python3 -m some_app -opts

This will set an environment variable as part of running a command.

The configuration files may also have tiers. We might have a global configuration file in /etc/our-app. We might look for a ~/.our-app-rc as a user's generic configuration. We can also look for our-app.config in the current working directory as the final set of overrides to be used for the current invocation.
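This cascade falls out naturally from configparser: ConfigParser.read() accepts a list of paths, silently skips missing files, and lets later files override earlier ones. The file names below follow the hypothetical our-app example:

```python
import configparser
import os

# A sketch of the three-tier cascade: system-wide, per-user, then
# per-directory. read() returns the list of files actually found;
# values in later files override values in earlier ones.
config = configparser.ConfigParser()
found = config.read([
    "/etc/our-app/our-app.config",        # global defaults
    os.path.expanduser("~/.our-app-rc"),  # per-user settings
    "our-app.config",                     # current-directory overrides
])
```

Because missing files are skipped without error, the same code works whether an installation has all three tiers or only one.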

Some applications can be restarted, leading to re-reading the configuration files. We can change the configuration more easily than we can bind in new command-line arguments or environment variables.

Representation Issues

When we think about configuration files, we also have to consider the syntax we want to use to represent configurable parameters. We have five common choices.

Some folks are hopelessly in love with Windows-style .ini files. The configparser module will parse these. I call it hopelessly in love because the syntax is rather limited. Look at the logging.config module to see how complex the .ini file format is for non-trivial cases.

Some folks like Java-style properties files. These have the benefit of being really easy to parse in Python. Indeed, scanning a properties file is a great exercise in functional-style Python programming.
I'm not completely sold on these, either, because they don't really handle the non-trivial cases well.
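As a sketch of that functional-style exercise, here is a deliberately minimal properties parser built from generator expressions. It handles only the `key = value` form with `#` and `!` comments, ignoring the `:` separator, continuations, and escapes of the full .properties format:

```python
def parse_properties(lines):
    """Parse Java-style key=value lines, skipping blanks and comments.

    A minimal, functional-style sketch, not a full .properties reader.
    """
    stripped = (line.strip() for line in lines)
    content = (ln for ln in stripped if ln and not ln.startswith(("#", "!")))
    pairs = (ln.partition("=") for ln in content)
    return {key.strip(): value.strip() for key, _, value in pairs}

text = """
# database settings
mongo.uri = mongodb://localhost:27017
some.param = xyz
""".splitlines()
settings = parse_properties(text)
```

Each stage is a lazy pipeline step: strip, filter, split, then build the final dict.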

Using JSON or YAML for properties has some real advantages. There's a lot of sophistication available in these two notations. While JSON has first-class support in the standard library, YAML requires an add-on module.
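For example, the same settings as a JSON document need only the standard-library json module; the keys here are hypothetical:

```python
import json

# Minimal sketch: configuration as a JSON document. Unlike .ini or
# properties files, nested structures and typed values come for free.
document = """
{
    "mongo_uri": "mongodb://localhost:27017",
    "some_param": "xyz",
    "pool": {"min": 2, "max": 10}
}
"""
settings = json.loads(document)
```

A YAML version would look much the same, with `yaml.safe_load()` from the third-party PyYAML package in place of `json.loads()`.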

We can also use Python as the language for configuration. For good examples of this, look at the Django project settings file. Using Python has numerous advantages. The only possible disadvantage is the time wasted arguing with folks who call it a "security vulnerability."

Using Python as the configuration language is only considered a vulnerability by people who fail to realize that the Python source itself can be hacked. Why waste time injecting a bug into a configuration file? Why not just hack the source?

My Current Fave 

My current favorite way to handle configuration is by defining some kind of configuration class and using the class object throughout the application. Because of Python's import processing, a single instance of the class definition is easy to guarantee.

We might have a module that defines a hierarchy of configuration classes, each of which layers in additional details.

class Defaults:
    mongo_uri = "mongodb://localhost:27017" 
    some_param = "xyz" 

class Dev(Defaults):
    mongo_uri = "mongodb://sandbox:27017"

class QA(Defaults):
    mongo_uri = "mongodb://username:password@qa02:27017/?authMechanism=PLAIN&authSource=$external"

Yes. The password is visible. If we want to mess around with higher levels of secrecy in the configuration files, we can use PyCrypto and a key generator to use an encrypted password that's injected into the URI. That's a subject for another post. The folks who can edit the configuration files often know the passwords. Who are we trying to hide things from?

How do we choose the active configuration to use from among the available choices in this file? We have several ways.
  • Add a line to the configuration module. For example, Config=QA will name the selected environment. We have to change the configuration file as our code marches through environments from development to production. We can use from configuration import Config to get the proper configuration in all other modules of the application.
  • Rely on an environment variable to specify which configuration to use. In enterprise contexts, an environment variable is often available. We can import os, and use Config=globals()[os.environ['OURAPP_ENVIRONMENT']] to pick a configuration based on an environment variable. 
  • In some places, we can rely on the host name itself to pick a configuration. We can use os.uname()[1] to get the name of the server. We can add a mapping from server name to configuration, and use this: Config=host_map.get(os.uname()[1], Defaults).
  • Use a command-line option like "--env=QA". This can be a little more complex than the above techniques, but it seems to work out nicely in the long run.
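The host-name technique can be sketched with a plain dict from server name to configuration class, with .get() supplying the fallback. The host names here are made up:

```python
import os

# Hypothetical sketch of host-based configuration selection.
class Defaults:
    mongo_uri = "mongodb://localhost:27017"

class Dev(Defaults):
    mongo_uri = "mongodb://sandbox:27017"

class QA(Defaults):
    mongo_uri = "mongodb://qa02:27017"

# Map known server names to configuration classes.
host_map = {"devbox01": Dev, "qa02": QA}

hostname = os.uname()[1]  # the current server's name
Config = host_map.get(hostname, Defaults)  # unknown hosts get Defaults
```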
Command-line args to select a specific configuration

To select a configuration using command-line arguments, we must decompose configuration into two parts. The configuration alternatives shown above are placed in a config_params.py module. The config.py module that's used directly by the application will import the config_params.py module, parse the command-line options, and finally pick a configuration. This module can create the required module global, Config. Since it will only execute once, we can import it freely.

The config module will use argparse to create an object named options with the command-line options. We can then do this little dance:

import argparse
import config_params

parser = argparse.ArgumentParser()
parser.add_argument("--env", default="DEV")
options = parser.parse_args()

Config = getattr(config_params, options.env)
Config.options = options

This seems to work out reasonably well. We can tweak the config_params.py flexibly. We can pick the configuration with a simple command-line option.

If we want to elegantly dump the configuration, we have a bit of a struggle. Each class in the hierarchy introduces names: it's a bit of work to walk down the __class__.__mro__ lattice to discover all of the available names and values that are inherited and overridden from the parents.

We could do something like this to flatten out the resulting values:

Base = getattr(config_params, options.env)

class Config(Base):
    def __repr__(self):
        names = {}
        for cls in reversed(self.__class__.__mro__):
            cls_names = {
                nm: (cls.__name__, val)
                for nm, val in cls.__dict__.items()
                if not nm.startswith("_")
            }
            names.update(cls_names)
        return ", ".join(
            "{0}.{1}={2}".format(cls_name, nm, val)
            for nm, (cls_name, val) in names.items()
        )

It's not clear this is required. But it's kind of cool for debugging.

Tuesday, September 21, 2010

A Really Bad Idea -- Adding Clutter to A Language

A DBA suggested that I read up on "Practical API Design: Confessions of a Java Framework Architect".

Apparently the DBA had read the phrase "direct compiler support of versioning of APIs" in a review of the book and -- immediately -- become terribly confused.

I can see why a DBA would be confused. From a DBA's point of view all data, all processing and all management -- all of it -- is intimately tied to a single tool. The idea behind Big Relational is to conflate configuration management, quality assurance, programming and the persistent data so that the product is inescapable.

[The idea is so pervasive that not using the RDBMS has to be called a "movement", as in "NoSQL Movement". It's not a new idea -- it's old wine in new bottles -- but Big Relational has become so pervasive that avoiding the database makes some folks feel like renegades.]

Adding to the confusion is the reality that DBA's live in a world where version management is difficult. What is an API version number when applied to the database? Can a table have a version? Can a schema have a version?

[IMO, the answer is yes, database designs -- metadata -- can easily be versioned. There's no support in the database product. But it's easy to do with simple naming conventions.]

For a DBA -- whose mind-set is often twisted into "one product hegemony" and "versioning is hard" -- the phrase "direct compiler support of versioning of APIs" maps to "direct tool/database/everything support of versioning." Nirvana.

All Things in Moderation

A relevant quote from the book is much more sensible than this fragment of a review. "Some parts of the solution should be in the compiler, or at least reflected in the sources, and processed by some annotation processor later."

API versioning is not a good feature to add to a programming language. At all. It's entirely a management discipline. There's no sensible avenue for "language" support of versioning. It can make sense to carry version information in the source, via annotations or comments. But augmenting a language to support a management discipline can't work out well in the long run.

Why not?

Rule 1. Programming Languages are Turing Complete. And Nothing More. Syntactic sugar is okay, if it can be proven to be built on the Turing complete core language. Extra "features" like version control are well outside the minimal set of features required to be Turing complete. So far outside that they make a completeness proof hard because there's this extra stuff that doesn't matter to the proof.

Therefore: Don't Add Features. The language is the language. Add features via a toolset or an annotation processor or somewhere else. Your API revision junk will only make the proof of completeness that much more complex; and the proof won't touch the "features".

Rule 2. Today's Management Practice is Only A Fad. Version numbering for API's with a string of Major.Minor.Release.Patch is simply a trendy fad. No one seems to have a consistent understanding of what those numbers mean. Further, some tools (like subversion) simply use monotonically increasing numbers -- no dots.

Someday, someone will come up with an XML Feature Summary (XFS) for describing features and aspects of the API, and numbers will be dropped as uselessly vague and replaced with a complex namespace of features and feature enumeration and a URI referencing an RDF that identifies the feature set. Numbers will be replaced with URI's.

Therefore: Don't Canonize Today's Management Practice in the Language. When the current practice has faded from memory, we don't want to have to retool our programming languages.

What To Do?

What we do for API version control is -- well -- hard work. Annotations are good. A tool that scrapes out the annotations to create a "profile" of the API might be handy.

In Python (and other dynamic languages) it's a much simpler problem than it is in static languages like Java and C++. Indeed, API version management may be one of the reasons for the slow shift from static to dynamic languages.

If we try to fold in complex language features for API version support, we introduce bugs and problems. Then -- when management practice drifts to a new way of handling API's -- we're stuck with bad language features. We can't simply deprecate them, we have to find a new language that has similar syntax, but lacks the old-bad API management features.

Distutils

Python distutils has a nice "Requires", "Provides" and "Obsoletes" specification that's part of the installation script. This is a handy level of automation: the unit of configuration management (the module) is identified at a high level using simple numbers. More than this is probably ill-advised.

And -- of course -- this isn't part of the Python language. It's just a tool.
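As a sketch, the metadata lives in the setup script rather than the language. This hypothetical setup.py uses the distutils keywords named above; note that distutils is deprecated (and removed in Python 3.12), with the same idea carried forward by setuptools and modern packaging metadata:

```python
# Hypothetical setup.py sketch -- the names and version strings are
# made up. Versioning is tool-level metadata, not a language feature.
from distutils.core import setup

setup(
    name="our-app",
    version="1.2",
    requires=["pymongo (>=2.0)"],
    provides=["our_app (1.2)"],
    obsoletes=["our_app_legacy"],
)
```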

Thursday, February 4, 2010

ALM Tools

There's a Special Report in the January 15 SDTimes with a headline that bothers me -- a lot. In the print edition, it's called "Can ALM tame the agile beast?". Online it's ALM Tools Evolve in the Face of Agile Processes.

The online title makes a lot more sense than the print title. The print title is very disturbing. "Agile Beast?" Is Agile a bad thing? Is it somehow out of control? It needs to be "tamed"?

The article makes the case -- correctly -- that ALM tools are biased toward waterfall projects with (a) long lead times, (b) a giant drop of deliverables, and (c) a sudden ending. Agile projects often lack these attributes.

The best part of the special report is the acknowledgement that "barriers between developers and QA are disappearing". TDD works to blur the distinction between test and development, which is a very good thing. Without unit tests, how do you know you're finished coding?

How Many Tools Do We Need?

The point in the article was that the ALM vendors have created a collection of tools, each of which seems like a good idea. However, it's too much of the wrong thing for practical, Agile, project management.

The article claims that there were three tools for requirements, tests and defects. I've seen organizations with wishlists that are much bigger than these three. The Wikipedia ALM Article has an insane list of 16 tools (with some overlaps).

Of these, we can summarize them into the following eight categories, based on the kind of information kept. Since the boundaries are blurry, it isn't sensible to break these up by who uses them.
  • Requirements - in user terms; the "what"
  • Modeling and Design - in technical terms; an overview of "how"
  • Project Management (backlog, etc.) - requirements and dates
  • Configuration Management - technology components
  • Build Management - technology components
  • Testing - components, tests (and possibly requirements)
  • Release and Deployment - more components
  • Bug, Issue and Defect Tracking - user terms, requirements, etc.
Agile methods can remove the need for most (but not all) of these categories of tools. If the team is small, and really collaborating with the users, then there isn't the need to capture a mountain of details as well as complex management overviews for slice-and-dice reporting.

YAGNI

Here's a list of tools that shouldn't be necessary -- if you're collaborating.
  • Requirements have an overview in the backlog, on the scrumboard. Details can be captured in text documents written using simple markup like RST or Markdown. You don't need much because this is an ongoing conversation.
  • Modeling and Design is a mixture of UML pictures and narrative text. Again, simpler tools are better for this. Tool integration can be accomplished with a simple web site of entirely static content showing the current state of the architecture and any detail designs needed to clarify the architecture. Write in RST, build it with Sphinx.
  • Project Management should be simply the backlog. This is digested into periodic presentations to various folks outside the scrum team. There isn't much that can be automated.
For UML pictures, ARGO UML is very nice. Here's a more complete list of Open Source UML Tools from Wikipedia.

Configuration Management

This is, perhaps, the single most important tool. However, there are two parts to this discipline.
For Revision Control, Subversion works very nicely.

Continuous Integration

The more interesting tools fall under the cover of "Continuous Integration". Mostly, however, this is just automation of some common tasks.
  • Build Management might be interesting for complex, statically compiled applications. Use of a dynamic language (e.g., Python) can prevent this. Build management should be little more than Ant, Maven or SCons.

    Additional tools include the Build Automation list of tools.

  • Testing is part of the daily build as well as each developer's responsibility. It should be part of the nightly build, and is simply a task in the build script.

    Overall integration or acceptance testing, however, might require some additional tools to exercise the application and confirm that some suite of requirements are met. It may be necessary to have a formal match-up between user stories and acceptance tests.

    There's a Wikipedia article with Testing Tools and Automated Testing. Much of this is architecture-specific, and it's difficult to locate a generic recommendation.

  • Release and Deployment can be complex for some architectures. The article on Software Deployment doesn't list any tools. Indeed, it says "Because every software system is unique, the precise processes or procedures within each activity can hardly be defined."

    Something that's important is a naming and packaging standard, similar to that used by RPM's or Python .egg files. It can be applied to Java .EAR/.WAR/.JAR files. Ideally, the installed software sits in a standard directory (under /opt) and a configuration file determines which version is used for production.

    Perhaps most important is the asset tracking, configuration management aspect of this. We need to plan and validate what components are in use in what locations. For this BCFG2 seems to embody a sensible approach.

For most build, test and release automation, SCons is sufficient. It's easily extended and customized to include testing.

More elaborate tools are listed in the Continuous Integration article.

Customer Relationship Management

The final interesting category isn't really technical. It includes tools for Bug, Issue and Defect Tracking. This is about being responsive to customer requests for bug fixes and enhancements.

The Comparison of Issue Tracking Systems article lists a number of products. Bugzilla is typical, and probably does everything one would actually require.

Old and Busted

I've seen organizations actively reject requirements management tools and use unstructured documents because the tool (Requisite Pro) imposed too many constraints on how requirements could be formalized and analyzed.

This was not a problem with the tool at all. Rather, the use of a requirements management tool exposes serious requirements analysis and backlog management issues. The tool had to be dropped. The excuse was that it was "cumbersome" and didn't add value.

[This same customer couldn't use Microsoft Project, either, because it "didn't level the resources properly." They consistently overbooked resources and didn't like the fact that this made the schedule slip.]

When asked about requirements tools, I suggest people look at blog entries like this one on Create a Collaborative Workspace or these pictures of a well-used scrumboard.

Too much software can become an impediment. The point of Agile is to collaborate, not use different tools. Software tools can (and do) enforce a style of work that may not be very collaborative.

Bottom Line

Starting from the ALM overview, there are potentially a lot of tools.

Apply Agile methods and prune away some of the tools. You'll still want some design tools to help visualize really complex architectures. Use Argo UML and plain text.

Developers need source code revision control. Use Subversion.

Most everything else will devolve to "Continuous Integration", which is really about Build and Test, possibly Release. SCons covers a lot of bases.

You have some asset management issues (what is running where?) There's a planning side of this as well as an inventory side of confirming the configuration. Use BCFG2.

And you have customer relationship management issues (what would you like to see changed?) Use Bugzilla.

Monday, October 26, 2009

Process Not Working -- Must Have More Process

After all, programmers are all lazy and stupid.

Got this complaint recently.

"Developers on a fairly routine basis check in code into the wrong branch."

Followed by a common form of the lazy and stupid complaint. "Someone should think about which branch is used for what and when." Clearly "someone" means the programmers and "should think about" means are stupid.

This was followed by the "more process will fix this process problem" litany of candidate solutions.

"Does CVS / Subversion have a knob which provides the functionality to prevent developers from checking code into a branch?"

"Is there a canonical way to organize branches?" Really, this means something like what are the lazy, stupid programmers doing wrong?

Plus there were rhetorical non-questions to emphasize the lazy, stupid root cause. "Why is code merging so hard?" (Stupid.) "If code is properly done and not coupled, merging should be easy?" (Lazy; a better design would prevent this.) "Perhaps the developers don't understand the code and screw up the merge?" (Stupid.) "If the code is not coupled, understanding should be easy?" (Both Lazy and Stupid.)

Root Cause Analysis

The complaint is about process failure. Tools do not cause (or even contribute to) process failure. There are two possible contributions to process failure: the process and the people.

The process could be flawed. There could be no earthly way the programmers can locate the correct branch because (a) it doesn't exist when they need it or (b) no one told them which branch to use.

The people could be flawed. For whatever reason, they refuse to execute the process. Perhaps they know a better way, perhaps they're just being jerks.

Technical means will not solve either root cause problem. It will -- generally -- exacerbate it. If the process is broken, then attempting to create CVS / Subversion "controls" will end in expensive, elaborate failure. Either they can't be made to work, or (eventually) someone will realize that they don't actually solve the problem. On the other hand, if the people are broken, they'll just subvert the controls in more interesting, silly and convoluted ways.

My response -- at the time -- was not "analyze the root causes". When I first got this, I could only stare at it dumbfounded. My answer was "You're right, your developers are lazy and stupid. Good call. Add more process to overcome their laziness and stupidity."

After all, the questioner clearly knows -- for a fact -- that more process helps fix a broken organization. The questioner must be convinced that actually talking to people will never help.
The question was not "what can I do?" The question was "can I control these people through changes to CVS?" There's a clear presumption of "process not working -- must have more process."

The better response from me should have been. "Ask them what the problem is." I'll bet dollars against bent pins that no one tells them which branch to use in time to start work. I'll bet they're left guessing. Also, there's a small chance that these are off-shore developers and communication delays make it difficult to use the correct branch. There may be no work-orders, just informal email "communication" between on-shore and off-shore points-of-contact (and, perhaps, the points-of-contact aren't decision-makers.)

Bottom Line. If people can't make CVS work, someone needs to talk to them to find out why. Someone does not need to invent more process to control them.