Versioning is Just Too Complex


I’m currently trying to unlearn 10 years of CVCS and Subversion certainties to learn DVCS and Git. And although a lot of people see those as a huge leap forward, I can’t help thinking it’s still way too hard. Versioning is such a basic need, it shouldn’t require that much knowledge. Right now, whether it’s Subversion or Git (or Mercurial, or whatever), the problem is always the same for me: it’s just too low-level. Those systems are designed as if I needed to be a mechanic to drive my car. I don’t care about injectors, gearbox and all that stuff. All I want is to go faster or slower, turn left or right, and that’s about it. Now sure if I’m a mechanic, I can make my car perform much better, but if I just need to get from point A to point B, do you really think I will learn mechanics? Most people won’t. They will just walk there, or ride a bike. And that’s exactly what’s happening in too many companies: “manual versioning”, version numbers in file names, shared drives, even $harepoints, are just poor solutions to a real problem. We need versioning, but existing solutions have an unacceptable learning curve. They’re not smart enough, and they require us to be smart, even though that’s not our main purpose.

Even as a developer, sure I can understand such concepts, but I don’t want to learn what merge, pull, push, commit, branch or update mean. I want to start new projects, participate in existing projects, fix bugs, implement features, refactor my code. Whatever underlying system my company or team is using, whatever technologies I’m working with, my own workflow should not be influenced so much by the tools I use to keep track of my work and collaborate with my team. It should be transparent.

My point is: Git is a leap forward, but it’s a small one, and we need a giant one. We need a versioning system that implements workflows at a higher level, that abstracts away all those silly commands with similar names but different meanings. We need an abstraction layer that can work on top of existing lower-level tools like git, subversion and others, and that can be integrated seamlessly in development environments like IntelliJ IDEA or Eclipse (sigh). Sure, we will probably lose some power in the process, but we can leave it as an option.

In fact, we need to do for versioning what Maven has done for the build lifecycle: convention over configuration. What are the most common tasks we do on a daily basis?

  • start a new project
  • enter an existing project
  • start fixing a bug
  • start implementing a new feature
  • start refactoring some code
  • switch to another bugfix/feature/refactoring
  • complete a bugfix/feature/refactoring
  • share a bugfix/feature/refactoring with my team or with the world
  • what else?

And then let’s try to map those high-level tasks to lower-level command sequences. And let’s do it in such a way that I can easily change the underlying implementation, or reconfigure certain aspects of it. And why not add the possibility to define your own workflows for things other than traditional dev? In fact, to make things clearer, let’s just call git, subversion and other existing tools “versioning tools”, and let’s design a “collaboration framework”. Exactly like Maven is not a “build tool”, it’s a “build lifecycle framework”.
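
To make the mapping idea a bit more concrete, here is a minimal sketch in Python. It rests on assumptions: the `plan` function and the task names (`start_project`, `start_change`, `switch_change`, `share`) are hypothetical and not part of any existing tool, and the git and hg command sequences are just one plausible mapping among many. The function only returns the commands; it does not run them.

```python
from typing import Dict, List

# Hypothetical task names; each maps to one plausible command sequence per tool.
# "{name}" is a placeholder filled in by plan().
WORKFLOWS: Dict[str, Dict[str, List[List[str]]]] = {
    "git": {
        "start_project": [
            ["git", "init"],
            ["git", "add", "."],
            ["git", "commit", "-m", "Initial commit"],
        ],
        "start_change": [["git", "checkout", "-b", "{name}"]],
        "switch_change": [["git", "checkout", "{name}"]],
        "share": [["git", "push", "origin", "{name}"]],
    },
    "hg": {
        "start_project": [
            ["hg", "init"],
            ["hg", "add"],
            ["hg", "commit", "-m", "Initial commit"],
        ],
        "start_change": [["hg", "branch", "{name}"]],
        "switch_change": [["hg", "update", "{name}"]],
        "share": [["hg", "push", "--new-branch"]],
    },
}


def plan(tool: str, task: str, **params: str) -> List[List[str]]:
    """Return the low-level commands for a high-level task, without running them."""
    try:
        steps = WORKFLOWS[tool][task]
    except KeyError:
        raise ValueError(f"no mapping for task {task!r} with tool {tool!r}") from None
    # Substitute placeholders such as {name} in every argument.
    return [[arg.format(**params) for arg in step] for step in steps]
```

Swapping the underlying tool then means passing `"hg"` instead of `"git"` to `plan`; the task names the user sees stay the same.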

I’m just ranting out loud here, but what do you think?


19 responses to “Versioning is Just Too Complex”

  1. It looks to me that your frustration is mainly tied to the command set of git, which is, I agree, not self-explanatory (and thus difficult to remember, reuse, etc.). You probably should have started with Mercurial: its commands are similar to SVN’s, plus the distributed additions.

    In Mercurial, you do this to create change sets in your own repository:
    hg update
    hg commit
    And when you need to share with a remote repository, you do this:
    hg push
    hg pull

    git and Mercurial are similar. git is probably lower level: it has the staging area to prepare a commit before committing. Its syntax is a bit odd (the same command names are used for different functions). Its tracking feature is very powerful, and makes it easier to edit history. Mercurial, on the other hand, has a cleaner syntax and is generally easier to understand and use; it mostly hides complexity when it is not needed. But some patterns used in git are not very practical in Mercurial (named branches vs. anonymous ones).

    For high-level tasks that you enumerate, I believe they do actually correspond to single commands of either git or Mercurial.

    I think developers should be flexible, and be able to use any of those tools (or learn them when needed). If you want to do a bugfix on a branch, you have to know the branching concepts. This should be part of the knowledge of team members to make collaboration possible.

    • No, because not all teams are made up only of developers. And when you need something done fast, you don’t have time to learn yet another versioning paradigm, exactly like you would rather not spend time learning a new project structure (which is where Maven helps a lot, for example).

      As for the low-level stuff, let’s take a simple example. When I want to create a new project in git, I have at least 3 commands to run: init, add and commit. Start implementing a new feature? branch and checkout (with any combination of pull and merge before that, to be sure I’m basing my new feature on the latest version of the code). I’ve had a look at Mercurial, and it doesn’t solve the problem either. Even Subversion falls into the same low-level trap. I’ve gotten used to it, and I know Subversion better than git or Mercurial, but all three expose the same low-level mechanics. Branches and tags are the guts of collaboration: I don’t want to see them, exactly as I don’t want to see javac or jar tasks in my build, but only compile and package.

  2. As I have little experience with Git, I have likely made some mistakes or false assumptions in my comment below.
    My suggestions require validation, but I hope this will be a contribution.

    – VCS are too low level

    I agree with you that common versioning systems are too low-level for everyday recurring tasks, and that we could probably simplify them.
    It is so obvious that we never paid much attention to it, so it is illuminating to have it pointed out.

    – Different usages

    These systems serve as general-purpose file versioning (preferably for text-based files), mostly for software source code, but not necessarily only for that. They can also be very useful for other purposes, for example to manage a project’s text-based documentation, from plain text READMEs to LaTeX/Lout documents, or any other usage we haven’t imagined yet.

    – What concepts to deal with?

    The most common usage of a VCS, and the one that interests us here, is collaborative software development.
    From a developer’s perspective, the 3 listed tasks (fixing a bug, creating a new feature or refactoring something) could be viewed as the same kind of task: implementing a single and coherent unit of *change* in the program source.
    So my suggestion is to keep just these 3 elements: the “repository”, the “project” and the “change”. We could discuss whether such an approach is too abstract; my point is that it is not always obvious how to classify an atomic change as a bugfix, a new feature or a refactoring, and something that started as a bugfix can become a refactoring.

    – Tasks

    Here I copied your list and modified it with my own suggestions, to retain just the 3 core concepts of “repository”, “project” and “change”.
    * connect a repository
    * start a new project
    * enter an existing project
    * update my project with all changes (from my team or from the world)
    * start a change
    * switch to another change
    * complete a change
    * send all my changes (to my team or to the world)

    – Mapping

    While I did not take the time to do the mapping between the high-level framework (or “abstraction layer”) and the low-level VCS, I have some idea about the best-suited VCS for the low-level part: Git.
    I propose to create a branch for each change. While it might sound scary to do this with CVS or SVN, it seems just fine with Git, because branch switching looks very efficient thanks to the sandbox (to be confirmed in practice when switching between branches with substantial changes). With Git you can switch locally from one branch to another without losing anything and without having to push anything remotely.

    When the user asks to start a change, the abstraction layer asks them for an identifier (which is in fact the branch name in Git).
    When the user switches to another change, the abstraction layer adds and commits pending changes, then switches between branches.
    When the user completes a change, the abstraction layer adds and commits pending changes, then merges the change into the local master (head or trunk).
    When the user sends their changes, the abstraction layer adds and commits pending changes, then pushes.

    The “project” is there only because it is a unit of work, and we don’t necessarily want to check out all the projects of a repository.
    A single change could apply across multiple projects in the same repository.
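
    The four rules above could be sketched as follows in Python. This is only a sketch under assumptions: the `ChangeLayer` class, its method names and the commit messages are hypothetical, the main branch is assumed to be called `master`, and the remote is assumed to be `origin`. It only plans the git commands; it does not run them.

```python
from typing import List

Command = List[str]


def save_pending(message: str) -> List[Command]:
    """The 'add/commit pending changes' step: stage everything, then commit."""
    return [["git", "add", "--all"], ["git", "commit", "-m", message]]


class ChangeLayer:
    """Plans git commands for the change workflow; it runs nothing itself."""

    def __init__(self, main_branch: str = "master") -> None:
        self.main_branch = main_branch
        self.current = main_branch  # the change (branch) the user is working on

    def start(self, change_id: str) -> List[Command]:
        # The change identifier doubles as the branch name.
        self.current = change_id
        return [["git", "checkout", "-b", change_id]]

    def switch(self, change_id: str) -> List[Command]:
        # Save work on the current change, then move to the other branch.
        commands = save_pending(f"WIP on {self.current}")
        commands.append(["git", "checkout", change_id])
        self.current = change_id
        return commands

    def complete(self) -> List[Command]:
        # Save work, then merge the change into the local main branch.
        done = self.current
        commands = save_pending(f"Complete {done}")
        commands += [["git", "checkout", self.main_branch], ["git", "merge", done]]
        self.current = self.main_branch
        return commands

    def send(self) -> List[Command]:
        # Save work, then push the current branch to the assumed remote.
        return save_pending(f"WIP on {self.current}") + [
            ["git", "push", "origin", self.current]
        ]
```

    One thing the sketch glosses over: `git commit` exits with an error when there is nothing to commit, so a real layer would have to check for pending changes first.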

    – Limits

    There will still be merge conflicts. It is impossible to avoid them since we want to collaborate in teams, so the developer will always have to go into each file manually to resolve such merges. It seems difficult to smooth that over.
    Other limits?

    – More discussion

    More generally, this is a question we could discuss with our peers.
    I see an opportunity to ask for some help at Devoxx: on 18 November I will attend this promising short talk:
    http://www.devoxx.com/display/Devoxx2K10/Using+Git+to+just+replace+SVN+is+like+using+a+Ferrari+to+haul+dirt

    • I’ll be there at Devoxx too. Let’s try to meet there and discuss this in more detail.
      I like your way of seeing things, even though there might be differences in terms of workflow between a bug fix, a new feature and a refactoring. But I was the first one to say that the tool should not influence the workflow, so you’re right, we should try to keep all those under the change concept.
      For merge conflicts, I agree too: there will always be some. But, for example, I never understood why SVN first messes with your files when you update, merges like crazy, and then asks you to go into each file and delete the irrelevant parts. There’s got to be a more user-friendly way to do this. For example, why not a syntax-aware merging engine? A merger that could recognize classes, functions and other constructs, and could give me something easier to deal with than added and removed lines.
      About using git as an underlying implementation, it’s indeed a good starting point, all the more so as there is a JGit implementation that makes it easy to build on top of. But ideally, I envisioned something more interface-based, something that could be plugged into many versioning tools thanks to plugins, exactly like Maven can associate the compile phase with javac or groovyc goals. So even though we can start with a more permissive versioning tool like git, it should be loosely coupled in order to be able to work on top of any existing infrastructure.
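
      On the syntax-aware idea, here is a small sketch of what a first step could look like, using Python's standard `ast` module on Python source. It is not a merge engine: it only reports which top-level functions and classes were added, removed or changed between two versions of a file, and the `structural_diff` and `top_level_defs` names are hypothetical.

```python
import ast
from typing import Dict, List


def top_level_defs(source: str) -> Dict[str, str]:
    """Map each top-level function or class name to a normalized dump of its syntax tree."""
    defs: Dict[str, str] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.dump leaves out comments, whitespace and line numbers by default,
            # so pure reformatting or moving a definition is not reported as a change.
            defs[node.name] = ast.dump(node)
    return defs


def structural_diff(old: str, new: str) -> Dict[str, List[str]]:
    """Compare two versions of a file by construct instead of by line."""
    before, after = top_level_defs(old), top_level_defs(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            name for name in before.keys() & after.keys() if before[name] != after[name]
        ),
    }
```

      A real merger would still have to decide what to do with two conflicting edits to the same function, but at least the conflict would be expressed in terms of functions and classes rather than added and removed lines.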

  3. I don’t expect a single tool to solve everything, especially given the variety of people’s needs and skills, and the variety of resource kinds to be shared.

    Further simplification of the branching system seems a bit dangerous to me: it could lead to people not knowing what they are doing, and thus doing it wrong. You could make the tools easier with a proper GUI, for instance with pre-defined workflows that people should follow in a company, but that comes with limitations that may or may not be acceptable for a particular task.

  4. People are always afraid to lose power when you make things simpler, and yet great tools like Hibernate and Maven made a lot of projects much more productive. Even though they have their own issues, because abstractions are always leaky, they still make our lives easier by allowing us to focus more on the business at hand.

    I think it’s the same for collaboration. Sometimes, not knowing everything that happens under the hood is a blessing. It gives you more time to think about your real objective. That’s what progress is all about. Otherwise we would still be coding everything in assembly language. A single tool will certainly not solve everything, but if 20% of factorization work can simplify 80% of workflows, that’s already huge progress, don’t you think?

  5. I agree, simplicity is an improvement. But “simple” and “simplistic” shouldn’t be mixed up. Essence should be kept (very Zen-like wording…).

    Translating a phrase like “I’d like to implement a fix based on last week’s code and send that fix” into just a few commands is quite an achievement already. Now you could put a GUI on top of it, which would make it even more accessible to people not familiar with the command line. I don’t see much room left to make it even simpler while keeping the interesting capabilities. If you do, though, I won’t be the one to stop you; I’d be curious to see what you come up with.

  6. I think this is a great and important post.
    I’m happy to see that I’m not the only one thinking that versioning is more complicated than it has to be.
    As a developer, I sometimes have to spend more valuable time than I’d like handling versioning. Time that could be used to focus on the actual project.

    With all the great tools and frameworks coming out today, allowing us to develop more efficiently, versioning is one of the few things still lacking user-friendliness.

  7. You’re so right about this.

    I hated SVN because all it is used for is a central file system on some server that sometimes can help you if you did something wrong, but oftentimes will make you shoot yourself in the foot.

    I wanted Git, because everyone talks about how cool it is, but I really don’t want to go back to the command line and learn a new (let’s call it) “change management language”. And yes, I said “back”.

    Now I use Mercurial, because it’s the more usable Git (at least if you’re on Windows).

    Btw. the first company I worked at used folders named as version numbers for versioning because they didn’t understand any versioning systems.

  8. Hi Sebastian,

    First of all, hallelujah!! No, really, I love your thinking, and I’ve always thought Bazaar was the best candidate for getting there. I wanted to point out two projects in that arena that may provide food for thought.

    First, you have Bazaar Explorer, which provides an easy-to-use GUI over the Bazaar command-line interface *and* attempts to get close to the type of workflows you describe. In many instances, though, it is still fairly close to the atomic bzr commands.

    http://doc.bazaar.canonical.com/explorer/en/

    Second, you have Ground Control, which is a way of integrating workflows like “fix a bug” into Nautilus, the GNOME file manager. It makes it really easy for people to work with both bzr and Launchpad to make the Ubuntu distribution better.

    http://ground-control.org/

    Please check them out if you haven’t already, I’m sure you will like some of the things you see in each.

  9. Thanks for a very interesting post.

    While reading, I was thinking about UNIX, a system developed by developers for developers. It has recently been made very popular with completely different user groups through the introduction of Mac OS X.

    Do we have a developer-oriented VCS implementation today? Is it feature-complete and stable enough to build upon?
    In that case we might just need a UI designer with the ambition to revolutionize the user interaction.

    • I see a lot of people suggesting that a UI might be the solution to the low-level problem, but I don’t share that enthusiasm. I mean, such UIs already exist for developers: Eclipse has Mylyn (at least I think that’s roughly what it does), and IntelliJ has excellent higher-level VCS integration. But I think we should be looking for something more generic, cross-platform and IDE-independent. Like a simple engine with a command-line interface and some programmable APIs to ease integration into any environment, whether it is an IDE, a filesystem, a webserver or even MS Word. I think that part of Subversion is awesome: Java implementations, WebDAV integration. This makes it possible for everyone to use it, provided that you understand its low-level commands. That’s why, once again, I think a Maven-like tool would be much better than a UI. Now that doesn’t prevent us from building nice UIs on top of that abstraction, but I would really like to have such an abstraction first. What do you guys think?
