Monthly Archives: July 2009

The problem with automated provisioning (II of III)

This is the second of my articles on the macro-level issues with (automated) provisioning, focusing again on the theme of complexity as a result of “No. of Instances” x “Freq. of Change”, described in the previous article “The problem with automated provisioning (I of III)“, but this time comparing an Enterprise Data Centre build-out with a typical “Web Scale” Data Centre build-out.

Having built out examples of both, I find the comparison below useful when describing some of the issues around automated provisioning, and why there are occasionally misconceptions about it from those who typically deliver systems from one of these ‘camps’ and not the other.



Basically, the number of systems within a typical Enterprise Data Centre (and within that Enterprise itself) is larger than that in a Web Scale (or HPC) Data Centre (or supported by such organisations), and the number of differing components that support those systems is higher too. For instance, at the last Data Centre build-out I led there were around eight different Operating Systems being implemented. This base level of complexity, which is then exacerbated by the frequency of having to patch and update it all (as demonstrated by the “Automated Provisioning Complexity = No. of Instances x Freq. of Change” equation), significantly impacts any adoption of automated provisioning (and makes defining operational procedures more complex too).

Web Scale


Frankly, a Web Scale build-out is much more likely to use a greater level of standardisation, in order to drive the level of scale and scaling required to service user requests and to maintain the system as a whole (here’s a quote from Jeff Dean, Google Fellow: “If you’re running 10,000 machines, something is going to die every day.”). This is not to say that there is not a high level of complexity inherent in these types of system; it is just that, in order to cope with the engineering effort required to ensure that the system can scale to service many hundreds of millions of requests, it may well require a level of component standardisation well beyond what you would typically see in an Enterprise deployment (where functionality and maintenance of business process are paramount). Any complexity is more likely to be in the architecture needed to cope with said scaling, for instance distributed computational proximity algorithms (i.e. which server is nearest to me physically, so as to reduce latency, versus which servers are under most load, so as to process the request as optimally as possible), or in the distributed configuration needed to maintain the system as components become available and are de-commissioned (for whatever reason).
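The proximity-versus-load trade-off mentioned above can be sketched as a simple scoring function. This is purely illustrative, not any vendor’s actual algorithm; the field names, weights, and figures are all invented for the example:

```python
# Illustrative sketch only: pick a server by trading off physical proximity
# (latency) against current load. Weights and data are invented for this example.
def pick_server(servers):
    # Lower score is better; the 0.6/0.4 weights are arbitrary illustration values.
    def score(s):
        return 0.6 * s["latency_ms"] + 0.4 * (s["load"] * 100)
    return min(servers, key=score)

servers = [
    {"name": "dc-near", "latency_ms": 5,  "load": 0.95},  # close but heavily loaded
    {"name": "dc-far",  "latency_ms": 40, "load": 0.10},  # distant but nearly idle
]

print(pick_server(servers)["name"])  # here the distant, idle server wins
```

In a real system the scoring would of course be far richer (network topology, capacity, health-checking), but the shape of the decision is the same.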

Automated Provisioning Complexity = No. of Instances x Freq. of Change

At the most basic level, provisioning a thousand machines which all have the same OS, stack, and code base, with updated configuration, is easier to set up than provisioning a thousand machines which use a mixture of four or five Operating Systems, all with differing patch schedules and patch methods, a diverse infrastructure and application software stack, and multiple code bases. I suspect that upon reading this you may think it an overly obvious statement to make, but it is these fundamentals that I keep seeing people trip up on over and over again, which infuriates me no end; so, yes, expect another upcoming article on the “top” architectural issues that I encounter too.
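To make the contrast concrete, here is a toy calculation using the “No. of Instances x Freq. of Change” equation; all the figures are invented, purely to illustrate the gap between the two build-outs:

```python
# Toy model of "Automated Provisioning Complexity = No. of Instances x Freq. of Change".
# Each entry is (number of distinct instance types, changes per year) - invented figures.
def provisioning_complexity(components):
    return sum(types * freq for types, freq in components)

# Web Scale: one OS, one software stack, one code base, updated often.
web_scale = [(1, 12), (1, 12), (1, 52)]

# Enterprise: many OSes, ISV/COTS products, and code bases, each on its own schedule.
enterprise = [(8, 4), (20, 6), (40, 12)]

print(provisioning_complexity(web_scale))   # 76
print(provisioning_complexity(enterprise))  # 632
```

Even with the Web Scale estate changing more frequently per component, the sheer variety of the Enterprise estate dominates the result.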

HPC, or High Performance Computing, build-outs, the third major branch of computing, usually follow the model above for “web scale” ones. I have an upcoming article comparing the three major branches of computing usage, Enterprise, Web Scale, and HPC, in much greater detail; for the time being, however, the comparison above is adequate to demonstrate the point I am drawing to your attention: that complexity of environment exacerbates the implementation of an automated provisioning system. I hope you enjoyed this article; it will soon be followed by a reappraisal and revised look at Enterprise and Web Scale provisioning.

The problem with automated provisioning (I of III)

Referring back to my previous article “The problem with automated provisioning – an introduction”: once you get past those all-too-human issues and into the ‘technical’ problem of provisioning, I’d have been much nearer the mark in my initial assessment, because it is indeed an issue of complexity. The risks, costs, and likely success of setting up and maintaining an automated provisioning capability are integrally linked to the complexity of the environment to be provisioned.

There are a number of contributing factors, including the number of devices, virtual instances, etc., and their location and distribution from the command and control point, but the two main ones in my mind are “Number of Instances” and “Frequency of Change”.

And so ‘Complexity’, in terms of automated provisioning, at a macro level, can be calculated as “Number of Instances” multiplied by “Frequency of Change”.

No. of Instances x Freq. of Change

By “Number of Instances” I mean the number of differing operating systems in use, the number of differing infrastructure applications, the number of differing application runtime environments and application frameworks, the number of differing code bases, the number of content versions being hosted, etc.

By “Frequency of Change” I am drawing attention to patches, code fixes, version iterations, code releases, etc., and how often they are delivered.

The following diagram demonstrates what I frequently call ‘The Problem with Provisioning’. As you can see, I’ve delineated three major architectural “levels”: from the lowest, nearest to the hardware, the OS layer, which also contains ‘infrastructure software’; the Application layer, containing the application platform and runtime environment; and the “CCC” layer, containing Code, Configuration and Content.


In a major data-centre build-out it is not atypical to see three, four, or even more different operating systems being deployed, each of which is likely to require three- or six-monthly patches, as well as interim high-value patches (bug fixes that affect the functionality of the system, and security patches). Furthermore, it is likely that the number of ISV applications, COTS products, and application runtime environments will be much higher than the number of OS instances, and that the number of “CCC” instances will be higher still.
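The three layers can be modelled as a small table; the instance counts and change frequencies below are hypothetical, chosen only to show how both tend to grow as you move up the stack:

```python
# Hypothetical per-layer figures for the three-layer model described above;
# both the instance counts and the change frequencies grow towards the "CCC" layer.
layers = {
    "OS / infrastructure": {"instances": 4,  "changes_per_year": 4},
    "Application":         {"instances": 12, "changes_per_year": 12},
    "CCC":                 {"instances": 30, "changes_per_year": 52},
}

for name, layer in layers.items():
    complexity = layer["instances"] * layer["changes_per_year"]
    print(f"{name}: complexity {complexity}")
```

Whatever the actual numbers in a given estate, the per-layer product makes it obvious where the bulk of the provisioning effort will land.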

I find it important to separate the system being provisioned into these three groupings because they typically require differing approaches (and technologies) for provisioning, something I mentioned in the previous article, where organisations mistakenly believe that the provisioning technology they have procured will scale the entire stack, from just above ‘bare metal’ to “CCC” changes (I’ve seen this issue more than once, even from a Sun team who should have known better, albeit around three years ago).

This model brings to the fore the increasing level of complexity, both of components at each layer and of the frequency of changes that then occur. Although the model above is a trifle simplistic, it is useful when describing the issues one can encounter when implementing automated provisioning systems, especially to those with little knowledge or awareness of the topic.

links for 2009-07-28

Gartner Highlights Five Attributes of Cloud Computing – The five attributes of cloud computing according to Gartner are (1) Service-Based, (2) Scalable and Elastic, (3) Shared, (4) Metered by Use, and (5) Uses Internet Technologies. …..

The problem with automated provisioning – an introduction

I was going to start this short series of articles with the statement that the problem with provisioning is one of complexity, but I’d have been wrong: the predominant issues with provisioning, and specifically automated provisioning, are awareness and expectation.

Awareness and Expectations

The level of awareness of what can actually be done with automated provisioning, and often, more importantly, what cannot be done, or even what automated provisioning actually “is”, is a significant barrier. This is followed by the expectations set, both by end users hoping for IT “silver bullets”, who may well have been oversold to, and by Systems Integrators, product vendors, and ISVs, who sadly promise a little too much to be true, or are a trifle unaware of the full extent of their own abilities (positivity and confidence aside).

For instance, I was once asked to take over and ‘rescue’ the build-out of a data centre on behalf of a customer and their outsourcer (£30M+ to build out, estimated £180M total to build and run for the first five years).

Personally I would say that this data-centre build-out was of medium complexity, being made up of more than five hundred Wintel servers, circa three hundred UNIX devices, and around two hundred ancillary pieces of hardware, including network components (firewalls, switches, bridges, intelligent KVMs and their ilk), storage components (such as SAN fabric, high-end disk systems such as the Hitachi 9900 range, and high-end tape storage), and other components.

One of the biggest problems in this instance was that the contract between client and vendor stipulated using automated provisioning technologies. Not a problem in itself; however, an assumption had been made, by both parties, that the entire build-out would be done via the provisioning system, without a great deal of thought following this through to its logical conclusion.

Best to say here that they weren’t using Sun’s provisioning technology, but the then ‘market leader’; however, the issues were not to do with the technology, nor with the functionality and capabilities of the provisioning product. Similar problems would likely have been encountered whichever product had been used.

This particular vendor had never implemented automated provisioning technologies on a brand-new “green-field” site before; they had always implemented them on existing “brown-field” sites, where, of course, there was an existing and working implementation to encapsulate in the provisioning technology.

As some of the systems were being re-hosted from other data-centres (in part, savings were to be made as part of a wider data-centre consolidation), another assumption had been made that this was not a fresh “green-field” implementation but a legacy “brown-field” one. However, this was a completely new data-centre, moving to upgraded hardware and infrastructure, never mind later revisions of application runtime environments, new code releases, and in-part enhanced, along with wholly-new, functionality too. In other words, this was not what we typically call a “lift and shift”, where a system is ‘simply’ relocated from one location to another (and even then ‘simply’ is contextual). Another major misconception, and example of incorrectly set expectations, was that the provisioning technology in question would scale the entire stack, from just above ‘bare metal’ to ‘Code, Configuration and Content’ (CCC) changes, something that was, and still is, extremely unlikely.

Sadly, because of these misconceptions and this lack of fore-thought, predominantly on the part of the outsourcer, no one had allowed for the effort either to build out the data-centre in its entirety and then encapsulate it within the provisioning technology (a model they had experience of, and which was finally adopted), or to build the entire data-centre as ‘system images’ within the provisioning technology and then use it to implement the entire data-centre. The latter would have taken a great deal longer, not least because testing a system held only as system images would have been impossible: the images would have to be loaded onto the hardware to do any testing, whether testing of the provisioning system itself, or real-world UAT, system, non-functional, and performance testing.

Unsurprisingly, one of the first things I had to do when I arrived was raise awareness that this was an issue, as it had not been fully identified, before getting agreement from all parties on a way forward. Effort, cost, resources, and people were all required to develop the provisioning and automated provisioning system into a workable solution. As you can guess, no budget had been put aside for any of this, so the outsourcer ended up absorbing the costs directly, leading to increased resentment of the contract they had entered into and straining the relationship with the client. However, this had been their own fault, through lack of experience and naivety when it came to building out new data-centres (this had been their first, so they did a lot of on-the-job learning and gained a tremendous amount of experience, even if much of it was in how not to build out a data centre).

This is why I stand by the statement that the major issues facing those adopting automated provisioning are awareness of the technology and what it can do, and expectations of the level of transformation and business enablement it will facilitate, as well as of how easy it is to do. The other articles in this series will focus a little more on the technical aspects of the “problem with provisioning”.

Disqus and Twitter integration to get even more improvements

You’d imagine Twitter integration in Disqus couldn’t get any better, however speaking with Daniel Ha at Disqus I hear even more improvements are on the way.

If you haven’t seen how Disqus integrates with Twitter (and, incidentally, what Disqus integration looks like with this, or any other, Apache Roller Weblogger based blog system), have a look at this blog entry, which has generated a few traditional comments but quite a few tweets, and see how Disqus displays them all as part of that conversation too:

You’ll need to scroll down to the bottom of the page, but as you can see Disqus has captured a lot of the tweets and retweets about the article, which I think is pretty cool. Disqus also does the same for Facebook and a host of other social networking platforms as well.

Talking to Daniel, he said that tweet and social networking metrics and “counts” are on the way, as well as other advancements, so I am firmly looking forward to those when they arrive.

The great thing about Disqus is that it is firmly becoming a conversation catcher and conversation engine, which is really what I want: to capture disparate conversations about what I write, in aggregate.

If you are interested in integrating Disqus with your Sun blog or any other Apache Roller Weblogger based blog, I have a tutorial and overview over here, along with the code and code examples you need to use:

Bill Vass’ top reasons to use Open Source software

You might not have seen Bill Vass’ blog article series on the topic of the top reasons to use and adopt Open Source software; and as it’s such an insightful series of articles I thought I’d bring it to your attention here.

Each one is highly data-driven and contains insight that you probably haven’t seen before but that is useful to be aware of when positioning Open Source to a CTO, a CIO, or an IT Director, because of Bill’s viewpoint (having come from a CIO and CTO background). Often when you see this sort of thing written it can be rather subjective, almost ‘faith-based’, so I’m always on the lookout for good factual information that is contextually relevant.

Bill Vass’ top reasons to use and adopt open source:

    1. Improved security

    2. Reduced procurement times

    3. Avoid vendor lock-in

    4. Reduced costs

    5. Better quality

    6. Enhanced functionality

And before you mention it, I know Bill already summarised these articles in his lead-in piece “The Open Source Light at the End of the Proprietary Tunnel…“, but it was such a great set of articles that it seemed a shame not to highlight them to you!

Jakob Nielsen: “Mobile User Experience is Miserable”

The latest research into mobile web user experience says that, overall, the experience is “miserable”. It cites the major issues with mobile web usage, and looks at overall “success” rates which, although improved from the results of research in the mid-1990s, are much lower than typical PC and workstation results.

It is well worth a read for those looking at optimising for mobile readership and audience and the full report is available here:

This new report names two major factors in improving the aforementioned success rates: sites designed specifically with mobile use in mind, and improvements and innovations in phone design (smart phones and touch screens perform best).

Jakob Nielsen, ex-Sun staff member and Distinguished Engineer, is famous for his work in the field of “User Experience”, and his site is a key resource for advice and best practice in web, and other types of, user experience design.

The Reasons Projects and Programmes Fail

In this post I’ll describe the five categories I’ve identified of reasons that Projects and Programmes fail. This categorisation has been built up from doing a large number of system, project, and programme reviews and audits over the years, and this article follows on from the project review and programme audit framework which I wrote about recently.

Whatever problems are found in a project or programme in my experience they can be broken down into these five categories:

  1. Strategic / Alignment
  2. Contractual / Financial
  3. People / Politics
  4. Process / Procedural
  5. Technical / Architectural

For a number of years my categorisation of reasons why projects and programmes fail did not include “Strategic / Alignment”, and was a model made up of just the other four categories, but then I kept coming across a couple of definitive reasons why it should be added; more on this below.

So let’s look at these five categories individually, in more detail:

    1. Strategic / Alignment

      There is a fundamental lack of strategic alignment to the Business.

      Basically the project should never have been commissioned in the first place. It is either not required whatsoever (and yes, shockingly, I have come across this happening), or is no longer required (either because of a change of business circumstance, or functionality overlap with another system, i.e. something else does this just fine thank you very much).

      A lack of an Executive Sponsor is a good indication that this could be an issue. Even if the project or programme is some form of ‘Skunk Works’, you would expect the overall ‘Skunk Works’ innovation concept and framework to be supported by an Executive Sponsor, such as the Head of R&D, and for a watching brief to be kept over costs versus potential revenues and benefits.

      Projects and programmes which are purely or highly non-functional, and provide limited, or unperceived business benefit, may also be an indication of this issue.

    2. People / Politics

      Getting people to work together can be complex and difficult, especially when their goals are not co-ordinated. Long-term political enemies, and people competing for resources, promotions, and remuneration, are all potential issues.

      This magnifies at a macro level into business units being in competition for talent, resources, and even access to customers and partners. Programmes where multiple business units have to work together and integrate systems and functionality are almost always problematic, even when there are serious penalties if it is not done.

      In general, Governance compliance issues and management failings also fall into this category, as do business conduct issues, morale, etc.

    3. Process / Procedural

      The process is ‘broken’. Procedures are not in place or are not being complied with. Either the wrong process was used in the first place, it is not being adhered to correctly, or it is not being used at all.

      Alternatively, a process is in place but is over-subscribed and cannot ‘scale’, or does not have enough people to service it, perhaps because of downsizing or the like.

      The lack of a capable Project Management Office (PMO), or of a stable, authoritative Document Repository, is also an indicator that there is a problem in this area, as is a lack of due diligence when managing and implementing change control.

      Governance, in terms of an appropriate operating model and related procedural items, sits here too.

    4. Contractual / Financial

      For some reason the financial arrangements of the project are having a negative impact on the ability to deliver it. Perhaps the contract is counter-intuitive, or is weighted in such a way that the ends are not easily achieved, or does a poor job of defining the requirements.

      If you hear something like the “spirit of the contract” versus the “word of the contract” then this is a good indicator that there is an issue with the contract and that it doesn’t cover what is wanted or expected.

      Be aware that this is likely to be a problem shared by the client and their vendors, as mutual understanding grows of what can be delivered versus what is wanted and needed by the business. This is an iterative learning process: the business learns more about what can be delivered by technology and the system being defined, whilst those involved in delivery learn the semantics, language, and nature of the business, and experience more of the challenges the business has.

    5. Technical / Architectural

    This is last for a very good reason: it is often the least contributing factor in terms of projects failing to deliver.

    When there are issues in this area, in my experience it is often a matter of not having the appropriate people and skills at the right time, or not even accurately identifying the key individuals you require, rather than hard technology issues.

    Other issues are architectural and compositional problems (more on architectural issues in an upcoming article), access to resources at the right time, and the typical technology compatibility issues (i.e. “what works with what”) and access to vendor technology and knowledge bases.

    As a reviewer of projects and programmes which could be failing, it is likely that you come from a technology implementation background and that this area is well within your ‘comfort zone’. But I assure you that in the majority of cases technology is only a minor contributing factor in the failure of an overall project, nor is it the hardest problem area to improve upon (with good recommendations, of course); it may, however, be an area that you over-focus upon, losing sight of more significant issues at hand.

Again, I hope you enjoyed the article; I will try to look at some other pieces, such as the top architectural mistakes made, how to identify possibly failing projects, and suggestions for rescuing them.

Project Review and Programme Audit Framework – a simple example of its use

This is a simple example review utilising the project review and programme audit framework that I wrote about in the preceding article. …..