Monday, July 24, 2006

Agile SCM Principles - From OOD to TBD+CBV+POB

I finally finished a set of articles I'd been working on, on and off, for almost 10 years, on the subject of "translating" principles of OOD into principles of SCM. See the following:
The principles of OOD translated into principles of Task-Based Development (TBD), Container-based Versioning (CBV), and Project-Oriented Branching (POB).

Here are the principles that I translated. Most of them are from Robert Martin's book Agile Software Development: Principles, Patterns, and Practices, but a couple of them are from The Pragmatic Programmers:


Here is what I ended up translating them into. Note that some of the principles translated into more than one version-control principle because they applied to more than one of changes/workspaces, baselines, and codelines. I'm not really thrilled about the names & acronyms for several of them and am open to alternative names & acronyms:

    General Principles of Container-Based Versioning
    The Content Encapsulation Principle (CEP) All version-control knowledge should have a single authoritative, unambiguous representation within the system that is its "container." In all other contexts, the container should be referenced instead of duplicating or referencing its content.
    The Container-Based Dependency Principle (CBDP) Depend upon named containers, not upon their specific contents or context. More specifically, the contents of changes and workspaces should depend upon named configurations/codelines.
    The Identification Insulation Principle (IDIP) A unique name should not identify any parts of its context, nor any of its related containers (parent, child, or sibling), that are subject to evolutionary change.
    The Acyclic Dependencies Principle (ADP) The dependency graph of changes, configurations, and codelines should have no cycles.
    Principles of Task-Based Development
    The Single-Threaded Workspace Principle (STWP) A private workspace should be used for one and only one development change at a time.
    The Change Identification Principle (CHIP) A change should clearly correspond to one, and only one, development task.
    The Change Auditability Principle (CHAP) A change should be made auditably visible within its resulting configuration.
    The Change/Task Transaction Principle (CHTP) The granule of work is the transaction of change.
    Principles of Baseline Management
    The Baseline Integrity Principle (BLIP) A baseline's historical integrity must be preserved - it must always accurately correspond to what its content was at the time it was baselined.
    The Promotion Leveling Principle (PLP) Define fine-grained promotion-levels that are consumer/role-specific.
    The Integration/Promotion Principle (IPP) The scope of promotion is the unit of integration & baselining.
    Principles of Codeline Management
    The Serial Commit Principle (SCP) A codeline, or workspace, should receive changes (commits/updates) to a component from only one source at a time.
    The Codeline Flow Principle (CLFP) A codeline's flow of value must be maintained - it should be open for evolution, but closed against disruption of the progress/collaboration of its users.
    The Codeline Integrity Principle (CLIP) Newly committed versions of a codeline should consistently be no less correct or complete than the previous version of the codeline.
    The Collaboration/Flow Integration Principle (CFLIP) The throughput of collaboration is the cumulative flow of integrated changes.
    The Incremental Integration Principle (IIP) Define frequent integration milestones that are client-valued.
    Principles of Branching & Merging
    The Codeline Nesting Principle (CLNP) Child codelines should merge and converge back to (and be shorter-lived than) their base/parent codeline.
    The Progressive-Synchronization Principle (PSP) Synchronizing change should flow in the direction of historical progress (from past to present, or from present to future): more conservative codelines should not sync-up with more progressive codelines; more progressive codelines should sync-up with more conservative codelines.
    The Codeline Branching Principle (CLBP) Create child branches for value-streams that cannot "go with the flow" of the parent.
    The Stable Promotion Principle (SPP) Changes and configurations should be promoted in the direction of increasing stability.
    The Stable History Principle (SHIP) A codeline should be as stable as it is "historical": The less evolved it is (and hence more mature/conservative), the more stable it must be.
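
The Acyclic Dependencies Principle (ADP) is one of the few principles above that lends itself to a mechanical check. The following is a minimal Python sketch, not part of any particular tool, assuming the dependencies among changes, configurations, and codelines have already been exported into a dictionary that maps each container name to the names it depends on. The container names used are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of container names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    color = {node: WHITE for node in deps}
    for targets in deps.values():
        for t in targets:
            color.setdefault(t, WHITE)
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in deps.get(node, []):
            if color[dep] == GREY:
                # dep is already on the current path, so we have found a cycle.
                return path[path.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# A change depends on a codeline, which depends on the baseline it started from.
deps = {"task-abc": ["main"], "main": ["rel1-baseline"], "rel1-baseline": []}
print(find_cycle(deps))  # None

# A baseline that depends on a later change would violate ADP.
bad = {**deps, "rel1-baseline": ["task-abc"]}
print(find_cycle(bad))   # ['task-abc', 'main', 'rel1-baseline', 'task-abc']
```

A depth-first search with three node colors is a standard way to detect cycles in a directed graph; reporting the offending path (rather than just "cycle found") makes it easier to see which dependency to break.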

You can read the 2nd article to see which version-control principles were derived from which OOD principles. As I mentioned before, I'm not really thrilled about the names & acronyms for several of them and am open to alternative names & acronyms. So please share your feedback on that (or on any of the principles, and how they were "derived").

3 comments:

Anonymous said...

I'm finding it very difficult to relate the principles to my current version control environment (CVS supporting an Oracle Data Warehouse development).

There are things that we are trying to achieve, and most of our habits seem to be helpful.

I'm sure we could do better, and I'm trying to use the principles to apply to our situation. Here's where the approach looks promising...

* Content encapsulation - we represent each component of our ETL solution as a single exported text file in CVS

* Acyclic dependencies - component usage is hierarchical (ETL items generated from the tool can depend on hand-coded packages but not vice versa); the CVS repository structure mirrors the code deployment structure

...and here's where I'm struggling:

* Container-based dependency - I find this hard to understand. Can you provide an example?

* Identification insulation - each of our component names is chosen to reflect the context to some extent, so that its purpose can be inferred more easily in the deployed environment; and the exported, 'encapsulated', components actually have their location within the code hierarchy embedded in the file (a bit like 'package com.acme.blah' in a Java file?)

regards
Bob

Brad Appleton said...

Hi Bob!
I agree that this draft of version control principles isn't really easy to comprehend. The words and phrasing are still steeped in the object-oriented domain rather than targeted at a version-control audience. I need to work on rewording that, and probably renaming several of the principles too (so your feedback is greatly appreciated).

Content Encapsulation is really about identifying those things from the "version control" domain that correspond to classes/objects (and hence units of encapsulation and abstraction). I think those things are:

Changes: a "change" encapsulates a set of revisions to a set of files, and they are all checked-in/merged together.

Versions: a version (or a "named configuration") is represented by a tag or label. If the version is "blessed", we baseline it and call it a "baseline". We refer to the contents of the version by using the name of the label/tag instead of trying to enumerate specific file revisions that belong to that "version".

Codelines: a codeline encapsulates a "current/latest" version in an evolving progression of versions. Instead of trying to keep up with the name of the most recent tag/label, I simply use the name of the codeline, and I either rely on the "tip" of the codeline to be the "latest and greatest stuff", or else I rely on some kind of "floating" or "sticky" tag to always reflect the "last good build" of the codeline.

This is where Container-Based Dependency comes into play. Any time I want to obtain the contents of a particular "view" or "configuration" of files, I should try to refer to the name of the container, instead of trying to point to specific contents.

This happens most often when I am first populating my workspace (sandbox) with file versions, and also when I'm merging a set of files into my workspace. The application of CBDP would be ...

When populating a workspace, use a codeline or baseline to reference the initial set of versions to see/use. Don't try to cherry-pick specific file revisions, or specific changes, if a codeline name or version name will get the job done.

When merging changes into a workspace, again you typically want to merge from a codeline (or a baseline) rather than from a specific change or a specific set of file-revisions.
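
To make Container-Based Dependency a bit more concrete, here is a minimal Python sketch. The `Repo` class, its label and codeline dictionaries, and the file names are all hypothetical stand-ins (not CVS commands or any tool's API). The workspace is populated by naming a container; because the codeline is referenced by name, re-populating later picks up the codeline's new tip without anyone enumerating file revisions, while a baseline label keeps pointing at the same contents.

```python
from typing import Dict

# Hypothetical in-memory model: a configuration maps file paths to revision numbers.
Config = Dict[str, int]


class Repo:
    def __init__(self) -> None:
        self.labels: Dict[str, Config] = {}     # named, fixed configurations (tags/baselines)
        self.codelines: Dict[str, Config] = {}  # codeline name -> configuration at its tip

    def resolve(self, container: str) -> Config:
        """Return a copy of the contents of a named container (label or codeline)."""
        if container in self.labels:
            return dict(self.labels[container])
        if container in self.codelines:
            return dict(self.codelines[container])
        raise KeyError(f"unknown container: {container}")


def populate_workspace(repo: Repo, container: str) -> Config:
    # CBDP: depend on the container's name; never hand-pick individual file revisions.
    return repo.resolve(container)


repo = Repo()
repo.labels["REL1_BASELINE"] = {"etl/load_sales.sql": 3, "pkg/util.pkb": 5}
repo.codelines["main"] = {"etl/load_sales.sql": 4, "pkg/util.pkb": 5}

ws = populate_workspace(repo, "main")
print(ws["etl/load_sales.sql"])  # 4

# Someone commits to the codeline, so its tip advances.
repo.codelines["main"]["etl/load_sales.sql"] = 5

# Re-populating by the same name picks up the new tip; the baseline is unchanged.
print(populate_workspace(repo, "main")["etl/load_sales.sql"])           # 5
print(populate_workspace(repo, "REL1_BASELINE")["etl/load_sales.sql"])  # 3
```

The point of the sketch is the level of indirection: callers hold only a name, and the repository decides what that name currently means.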

For identification insulation, the idea is that the "unique identifier" should be stuff that won't change. If a branch is dedicated to a particular release or iteration, then naming it after that release or iteration is fine.

If I then create a task-branch for a feature named "ABC" and I name the task-branch "rel1_iter3_abc", then what happens if that feature is deferred to iteration-4 or even release-2?

Do I leave the branch-name "as is" and live with the inconsistency? Or do I rename the branch and hope no other person, tool, query or system was using that information or (worse yet) had copied it into some other tool or system expecting it to be a 'foreign key' for my change?

If I don't put the release or iteration in the name of the task-branch, I don't have to face that dilemma! Those are pieces of context information about the "intended target". Making them part of the branch-name violates the version-control equivalent of the Law of Demeter.

It also violates the "Lean" principle of "deferring commitment" (I'm "committing" the association between that content and that context in the database, and if it changes, I have a dilemma no matter what).

Granted, for usability, it might be really nice to see that information in the branch name itself. But at what cost/impact if the context changes?

How terrible is it if I add a level of indirection and perhaps make the release+iteration instead be separate attributes/properties that are associated with the branch (either in the version-control tool, or the change-tracking tool) instead of being part of its name?
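
A minimal Python sketch of that level of indirection, assuming a hypothetical record kept in the version-control or change-tracking tool: the branch name is the stable identifier, and the release/iteration target lives in separate, mutable attributes. Names such as `TaskBranch`, `task_abc`, `branches_targeting`, and `external_refs` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskBranch:
    # Stable, unique identifier: names the task itself, not its current target.
    name: str
    # Evolving context (target release/iteration) kept as attributes,
    # so it can change without renaming the branch.
    attributes: Dict[str, str] = field(default_factory=dict)


def branches_targeting(branches: List[TaskBranch], release: str, iteration: str) -> List[str]:
    """Query branches by their current target instead of parsing branch names."""
    return [
        b.name
        for b in branches
        if b.attributes.get("release") == release
        and b.attributes.get("iteration") == iteration
    ]


abc = TaskBranch("task_abc", {"release": "rel1", "iteration": "iter3"})
# Some other (hypothetical) tool has stored the branch name as a foreign key.
external_refs = {"task_abc"}

# The feature is deferred to iteration 4: only the attribute changes, not the name.
abc.attributes["iteration"] = "iter4"

print(branches_targeting([abc], "rel1", "iter4"))  # ['task_abc']
print(branches_targeting([abc], "rel1", "iter3"))  # []
print(abc.name in external_refs)                   # True
```

The usability cost is that the target is no longer visible in the name itself; the benefit is that deferring the feature never invalidates a reference held elsewhere.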

That example used a branch-name. There are similar cases for things like document names/identifiers, and names of "subparts" that assume they know which "containing-part" they belong to (how do we know it won't be refactored to another location/component someday in the near future, or perhaps become part of multiple components or products instead of just one?)

Hope that helps! (and thanks for the feedback!)

Anonymous said...

Thanks, Brad

I'm looking for ways to improve the structure of our code, and the organisation of our version control repository.

After a year, it's become apparent where some of the dependencies are: the aspects of our current setup that are somewhat awkward to handle. The (apparent) inertia of the version control repository and the development environment means that we don't actually reorganise much. We're too busy :-)

I believe that we could organise our version control much more as we might "wish we had done", and as we do so I think the structure of the code also needs to change somewhat.

At the moment, I'm trying to map some of these concepts onto our environment, as we learn more about what we've built (!).

Thanks again for the conversation.
Regards,
Bob