Saturday Night Music Interlude — The Arcs performing “Put a Flower in Your Pocket,” “Outta My Mind,” and “Stay In My Corner.”

The Arcs comprise Dan Auerbach (of The Black Keys), Leon Michels, Richard Swift, Homer Steinweiss, and Nick Movshon, and are a side project for Auerbach while Black Keys drummer Patrick Carney recovers from an injury.  On the 5th of this month they released an album of bluesy soul, rock, rhythm & blues, and psychedelic rock entitled “Yours, Dreamily.”  Also on the album are guitarist Kenny Vaughan and Mariachi Flor de Toloache, an all-female mariachi band.


Do You Believe in Magic? — Big Data, Buzz Phrases, and Keeping Feet Planted Firmly on the Ground

My alternative title for this post was “Money for Nothing,” which is along the same lines.  I have been engaged in discussions regarding Big Data, which has become a bit of a buzz phrase of late in both business and government.  Under the current drive to maximize the value of existing data, every data source, stream, lake, and repository (and the list goes on) has been subsumed by this concept.  So, at the risk of being a killjoy, let me point out that not all large collections of data are “Big Data.”  Furthermore, the more a category of data gets tagged as Big Data, the further one seems to depart from the world of reality in determining how to approach and use it.  So for those of you who find yourselves in this situation, let’s take a collective deep breath and engage our critical thinking skills.

So what exactly is Big Data?  Quite simply, as noted in this Forbes article by Gil Press, the term is a relative one; he cites a McKinsey study that defines it as “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”  This subjective definition is a purposeful one, since Moore’s Law tends to change what is viewed as simply digital data as opposed to big data.  I would add some characteristics to assist in defining the term based on present challenges.  Big data at first approach tends to be unstructured, variable in format, and does not adhere to a schema.  Thus, not only is size a criterion for the definition, but so is the chaotic nature of the data, which makes it hard to approach.  For once we find a standard means of normalizing, rationalizing, or converting digital data, it is no longer beyond the ability of standard database tools to use effectively.  Furthermore, the very process of taming it renders it non-big data, or, if it is an exceedingly large dataset, perhaps “small big data.”
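As a minimal illustration of that chaotic quality (the records, field names, and formats below are entirely hypothetical), consider one cost fact arriving in three different shapes.  Nothing in the raw data tells a standard database tool how to line them up; someone, or something, has to decide:

```python
# Hypothetical example: one cost fact arriving in three shapes with no shared schema.
raw_inputs = [
    "ACME-123,2015-08,Labor,152000.00",                      # CSV row from one source
    {"proj": "ACME-123", "fy_period": "2015M08",
     "acct": "LAB", "amt": 152000},                           # JSON from another source
    "Project ACME-123 booked $152,000 of labor in Aug 2015",  # free text from a status report
]

def normalize(record):
    """Best-effort mapping to a common shape; each format needs its own handling."""
    if isinstance(record, dict):
        return {"project": record["proj"], "period": record["fy_period"],
                "account": record["acct"], "amount": float(record["amt"])}
    if isinstance(record, str) and record.count(",") == 3:
        project, period, account, amount = record.split(",")
        return {"project": project, "period": period,
                "account": account, "amount": float(amount)}
    return None  # free text: no cheap, reliable rule; this is where the real cost lives

for r in raw_inputs:
    print(normalize(r))
```

Note that even the two “easy” formats disagree on how to express the reporting period; that kind of mismatch is what keeps such collections beyond the reach of off-the-shelf tools.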

Thus, having defined our terms and the attributes of the challenge we are engaging, we now can eliminate many of the suppositions that are floating around in organizations.  For example, there is a meme that I have come upon that asserts that disparate application file data can simply be broken down into its elements and placed into database tables for easy access by analytical solutions to derive useful metrics.  This is true in some ways but both wrong and dangerous in its apparent simplicity.  For there are many steps missing in this process.

Let’s take the least complex case: the use of structured data submitted as proprietary files.  On its surface this is an easy challenge to solve.  Once someone begins breaking the data into its constituent parts, however, greater complexity is found, since the indexing inherent to the data’s interrelationships and structures is necessary for its effective use.  Furthermore, there will be corruption and non-standard use of user-defined and custom fields, especially in data that has not undergone domain scrutiny.  The originating third-party software is pre-wired to extract this data properly.  But even setting aside the burden of using and learning multiple proprietary applications, with their concomitant idiosyncrasies, sustainability issues, and overhead, such a multi-application approach defeats the goal of establishing a data repository in the first place by keeping the data in silos and preventing integration.  The indexing across, say, financial systems and planning systems differs.  So how do we solve this issue?
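Before turning to that question, here is a minimal sketch, using hypothetical schedule records, of what is lost when a structured proprietary file is simply broken into flat rows: the parent-child indexing that gives the data its meaning lives outside the individual records and must be rebuilt before even a simple roll-up is possible.

```python
# Hypothetical schedule records flattened into a single table.
# Each row is easy to store; the structure that makes it useful is the
# parent/child indexing, which must be reconstructed deliberately.
rows = [
    {"task_id": "1",     "parent_id": None,  "name": "Program",     "finish": "2017-09-30"},
    {"task_id": "1.1",   "parent_id": "1",   "name": "Design",      "finish": "2016-03-31"},
    {"task_id": "1.1.1", "parent_id": "1.1", "name": "Preliminary", "finish": "2015-12-15"},
    {"task_id": "1.2",   "parent_id": "1",   "name": "Integration", "finish": "2017-06-30"},
]

# Rebuild the hierarchy index that the originating application carried implicitly.
children = {}
for row in rows:
    children.setdefault(row["parent_id"], []).append(row["task_id"])

by_id = {r["task_id"]: r for r in rows}

def rollup(task_id):
    """Latest finish among a task and its descendants, a simple structural query."""
    latest = by_id[task_id]["finish"]
    for child in children.get(task_id, []):
        latest = max(latest, rollup(child))
    return latest

print(rollup("1"))   # only answerable once the indexing is restored
```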

In approaching big data, or small big data, or datasets from disparate sources, the core concept in realizing return on investment and finding new insights is known as Knowledge Discovery in Databases, or KDD.  This was all the rage about 20 years ago, but its tenets are solid and proven and have evolved with advances in technology.  Back then, the primary means of performing KDD against existing databases was data mining.

The necessary first step in the data mining approach is pre-processing of data.  That is, once you get the data into tables it is all flat.  Every piece of data is the same–it is all noise.  We must add significance and structure to that data.  Keep in mind that we live in this universe, so there is a cost to every effort known as entropy.  Computing is as close as you’ll get to defeating entropy, but only because it has shifted the burden somewhere else.  For large datasets it is pushed to pre-processing, either manual or automated.  In the brute force world of data mining, we hire data scientists to pre-process the data, find commonalities, and index it.  So let’s review this “automated” process.  We take a lot of data and then add a labor-intensive manual effort to it in order to derive KDD.  Hmmm..  There may be ROI there, or there may not be.
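To make the pre-processing step concrete, here is a deliberately crude sketch (the records and keyword rules are hypothetical): the human contribution is the set of rules that assigns significance, and the machine merely applies them to build the index.

```python
from collections import defaultdict

# Hypothetical flat records after a bulk load: every row is just "noise"
# until pre-processing assigns it significance (a category) and an index.
records = [
    (1, "invoice 4471 labor hours week 32"),
    (2, "baseline change request BCR-12 approved"),
    (3, "labor rate adjustment effective fy16"),
    (4, "schedule slip on integration test start"),
]

rules = {            # a human-defined starting point; this is the manual cost
    "cost":     ["invoice", "labor", "rate"],
    "schedule": ["schedule", "slip", "baseline"],
}

index = defaultdict(list)
for rec_id, text in records:
    for category, keywords in rules.items():
        if any(k in text for k in keywords):
            index[category].append(rec_id)

print(dict(index))   # {'cost': [1, 3], 'schedule': [2, 4]}
```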

But twenty years is a long time and we do have alternatives, especially in using Fourth Generation software that is focused on data usage without the limitations of hard-coded “tools.”  These alternatives apply when using data on existing databases, even disparate databases, or file data structured under a schema with well-defined data exchange instructions that allow for a consistent manner of posting that data to database tables. The approach in this case is to use APIs.  The API, like OLE DB or the older ODBC, can be used to read and leverage the relative indexing of the data.  It will still require some code to point it in the right place and “tell” the solution how to use and structure the data, and its interrelationship to everything else.  But at least we have a means for reducing the cost associated with pre-processing.  Note that we are, in effect, still pre-processing data.  We just let the CPU do the grunt work for us, oftentimes very quickly, while giving us control over the decision of relative significance.
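As a sketch of what this looks like in practice, assuming a hypothetical ODBC data source with made-up table and column names (pyodbc is just one common Python binding), the CPU does the grunt work of a query against the source’s own indexing, and the remaining code simply tells the solution how to structure the result:

```python
import pyodbc

# Hypothetical connection string and schema; the point is that the source
# database's own indexing and relationships do the heavy lifting.
conn = pyodbc.connect("DSN=LegacyFinance;UID=reader;PWD=example")
cursor = conn.cursor()

cursor.execute("""
    SELECT p.project_id, p.project_name, c.period, c.actual_cost
    FROM projects AS p
    JOIN costs AS c ON c.project_id = p.project_id
    WHERE c.period >= ?
""", "2015-01")

# Minimal post-processing: we decide relative significance rather than
# re-deriving the structure by hand.
by_project = {}
for project_id, name, period, actual in cursor.fetchall():
    by_project.setdefault((project_id, name), []).append((period, float(actual)))

conn.close()
print(by_project)
```

The design choice worth noticing is that the indexing and joins belong to the source system; our code only expresses how we want the answer organized.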

So now let’s take the meme that I described above and add greater complexity to it.  You have all kinds of data coming into the stream in all kinds of formats, including specialized XML, open data, black-boxed data, and closed proprietary files.  This data is unstructured.  It is then processed and “dumped” into a non-relational (NoSQL) database.  How do we approach this data?  The answer has been to return to a hybrid of pre-processing, data mining, and the use of APIs.  But note that there is no silver bullet here.  These efforts are long-term and extremely labor intensive at this point.  There is no magic.  I have heard time and again from decision makers the question: “why can’t we just dump the data into a database to solve all our problems?”  No, you can’t, unless you’re ready for a significant programmatic investment in data scientists, database engineers, and other IT personnel.  In the end, what they deploy, when it finally gets deployed, may very well be obsolete and will have wasted a good deal of money.
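A minimal sketch of that hybrid, assuming a MongoDB document store with hypothetical collection and field names, shows why this is still pre-processing rather than magic: someone has to build and maintain the crosswalk that decides what maps to what.

```python
from pymongo import MongoClient

# Hypothetical store and field names; the mapping rules are the hidden labor.
client = MongoClient("mongodb://localhost:27017/")
dumps = client["data_lake"]["raw_dumps"]

FIELD_MAP = {            # hand-built crosswalk from source-specific names
    "proj": "project_id", "project": "project_id", "ProjectID": "project_id",
    "amt": "amount", "actual_cost": "amount",
}

structured_rows = []
for doc in dumps.find({"source": {"$in": ["toolA", "toolB"]}}):
    row = {}
    for source_field, standard_field in FIELD_MAP.items():
        if source_field in doc:
            row[standard_field] = doc[source_field]
    if "project_id" in row and "amount" in row:
        structured_rows.append(row)      # only now is it ready for relational tables
    # everything that falls through still needs a person to look at it

print(len(structured_rows))
```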

So, once again, what are the proper alternatives?  In my experience we need to get back to first principles.  Each business and industry has commonalities that transcend proprietary software limitations by virtue of the professions and disciplines that comprise them.  Thus, it is domain expertise specific to the business that drives the solution.  For example, in program and project management (you knew I was going to come back there) a schedule is a schedule, EVM is EVM, financial management is financial management.

Software manufacturers will, apart from issues regarding relative ease of use, scalability, flexibility, and functionality, attempt to defend their space by establishing proprietary lexicons and data structures.  Not being open, while not serving the needs of customers, helps incumbents avoid disruption from new entries.  But there often comes a time when it is apparent that these proprietary definitions are only euphemisms for a well-understood concept in a discipline or profession.  Cat = Feline.  Dog = Canine.

For a cohesive and well-defined industry the solution is to make all data within particular domains open.  This is accomplished through the acceptance and establishment of a standard schema.  For less cohesive industries, where the data, or the incumbents’ application of common principles, has essentially created a de facto schema, APIs are the way to extract this data for use in analytics.  This approach has been applied on a broader basis for the incorporation of machine data and signatures in social networks.  For closed or black-boxed data, the business or industry will need to perform a gap analysis to decide whether database access to such legacy data is truly essential to its business, or whether specifying a more open standard from “time-now” forward will eventually work the suboptimization out of the data.
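Where an open, standard schema does exist, enforcing it at the point of intake is mechanically straightforward.  The sketch below uses the jsonschema package against a deliberately tiny, hypothetical slice of such a schema to show the idea:

```python
from jsonschema import validate, ValidationError

# A deliberately tiny, hypothetical slice of a domain schema.
cost_record_schema = {
    "type": "object",
    "required": ["project_id", "period", "amount"],
    "properties": {
        "project_id": {"type": "string"},
        "period":     {"type": "string", "pattern": r"^\d{4}-\d{2}$"},
        "amount":     {"type": "number"},
    },
}

incoming = [
    {"project_id": "ACME-123", "period": "2015-08", "amount": 152000.0},
    {"proj": "ACME-123", "amt": "152000"},   # legacy shape: rejected, flagged for gap analysis
]

for record in incoming:
    try:
        validate(instance=record, schema=cost_record_schema)
        print("accepted:", record)
    except ValidationError as err:
        print("rejected:", err.message)
```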

Most important of all, in the end our results must provide metrics and visualizations that can be understood, and that are valid, important, material, and correct.

Over at — Plato’s Cave: A Framework for Better Project Indicators

My latest post at discusses the difference between the relative effectiveness of measures.

In Book VII of Plato’s dialogue, The Republic, the Greek philosopher introduces his powerful imagery of learning and means of perception in Allegory of the Cave. In his allegory, there is a group of prisoners who have lived in a cave since childhood, legs and necks chained so that they cannot move. Above and behind them is a fire blazing at a distance so that they can only see shadows of what is between the fire and the wall of the cave in front of them. A low wall has been built to hide men who carry statues and figures of animals so that the shadows of these figures appear on the wall…

To read more, please click here.

Rise of the Machines — Drivers of Change in Business and Project Management

Last week I found myself in business development mode, as I often am, explaining to a prospective client our future plans for software development.  The point I was making was that our goal is not simply to reproduce the functionality that every other software solution provider offers, but to improve how the industry does business: to make the drive for change through the application of appropriate technology so compelling, through efficiencies, elimination of redundancy, and improved productivity, that not making the change would be deemed foolish.  In sum, we are out to take a process and improve on it through the application of disruptive technology.  I highlighted my point by stating:  “It is not our goal to simply reproduce functionality so we can party like it’s 1998; it’s been eight software generations since that time, and technology has provided us smarter and better ways of doing things.”

I received the usual laughter and acknowledgement from some of the individuals to whom I was making this point, but one individual rejoined: “well, I don’t mind doing things like we did it in 1998,” or words to that effect.  I acknowledged the comment, but then reiterated that our goal was somewhat more proactive.  We ended the conversation on a friendly note, and I was invited to come back and show our new solution upon its release to market.

Still, the rejoinder of being satisfied with things the way they are has stuck with me.  No doubt being a nerd (and years as a U.S. Navy officer) have inculcated in me a drive for constant process improvement.  My default position going into a discussion is that the individuals I am addressing share that attitude.  But that is not always the case.

The kneejerk position of other geeks, when confronted by resistance to change, is often derision.  But not every critic or skeptic is a Luddite, and it is important to understand the basis for both criticism and skepticism.  For many of our colleagues in the project management world, software technology is just a software application, something that “looks into the rear glass window.”  This meme is pervasive out there, but it is wrong.  Understanding why it is wrong is important to addressing the concerns behind it in an appropriate manner.

This view is wrong because the first generations of software that served this market simply replicated the line and staff, specialization, and business process and analysis regime that existed prior to digitization.  Integration of data that could provide greater insight was not possible at the level of detail needed to establish confidence.  The datasets from which we derived our information were not flexible, nor did they allow for widespread distribution of more advanced corporate and institutional knowledge.  In fact, the first software generation in project management often supported and sustained the subject matter expert (SME) framework, in which only a few individuals possessed advanced knowledge of methods and analytics, and upon whom the organization had to rely.

We still see this structure in place in much of industry and government–and it is self-sustaining, since it involves not only the individuals within the organization who possess this knowledge, but also a plethora of support contractors and consultants who have built their businesses to support it.

Additional resistance comes from individuals who have dealt with new entrants in the past that turned out to offer only incremental or marginal improvements over what was already in place, not to mention the few bad actors that come along.  Established firms in the market take this approach in order to defend market share and, like the SME structure, it is self-sustaining, attempting to establish a barrier to new entrants into the market.  At the same time they establish an environment of stability and security that buyers are hesitant to leave; thus the prospective customer is content to “party like it’s 1998.”

Value proposition alone will not change the minds of those who are content.  You sell what a prospective customer needs, not usually solely what they want.  For those introducing disruptive innovation, the key is to be at the forefront in shifting what defines market need.

For example, in business and project systems, the focus has always been on “tools.”  Given the engineering domain that is dominant in many project management organizations, such terminology provides a comfortable and familiar way of addressing technology.  Getting the “right set of tools” and “using the right tool for the work” are the implicit assumptions in using such simplistic metaphors.  This has caused many companies and organizations to issue laundry lists of features and functionality in order to compare solutions when doing market surveys.  Such lists are self-limiting, supporting the self-reinforcing systems mentioned above.  Businesses that rely on this approach to the technology market are not open to leveraging the latest capabilities in improving their systems.  The metaphor of the “tool” is an out-of-date one.

The shift, which is accelerating in the commercial world, is an emphasis on software technology focused on the capabilities inherent in the effective use of data.  In today’s world data is king, and the core issue is who owns the data.  I referred to some of the new metaphors for data in my last post and, no doubt, new ones will arise.  What is important to know about the shift to an emphasis on data and its use is that it is driving organizational change that not only breaks down the “tool”-based approach to the market, but also undermines the software market’s emphasis on tool functionality and the organizational structure and support market built on the SME.

There is always fear surrounding such rapid change, and I will not argue against the fact that some of it needs to be addressed.  For example, the rapid displacement through digitization of human-centered manual work that once required expertise and paid well will soon become one of the most important challenges of our time.  I am optimistic that the role of the SME simply needs to follow the shift, but I have no doubt that the shift will require fewer SMEs.  This highlights, however, that the underlying economics of the shift will make it both compelling and necessary.

Very soon, it will be impossible to “party like it’s 1998” and still be in business.

The Water is Wide — Data Streams and Data Reservoirs

I’ll have an article that elaborates on some of the ramifications of data streams and data reservoirs on, so stay tuned there.  In the meantime, I’ve had a lot of opportunities lately, in a practical way, to focus on data quality and approaches to data.  There is some criticism in our industry about using metaphors to describe concepts in computing.

As with any form of literature, however, there are good and bad metaphors.  Opposing them in general, I think, is contrarian posing.  Metaphors, after all, often allow us to discover insights into an otherwise opaque process, clarifying in our mind’s eye what is being observed by deriving similarities to something more familiar.  Strong metaphors allow us to identify analogues among the phenomena being observed, providing a ready path to establishing a hypothesis.  Having served this purpose, we can test that hypothesis to see if the metaphor truly contributes to understanding.

I think we have a strong set of metaphors in the case of data streams and data reservoirs.  So let’s define our terms.

Traditionally, a data stream in communications theory is a set of data packets that are submitted in sequence.  For the purpose of systems theory, a data stream is data that is submitted between two entities either sequentially in real time or on a regular periodic basis.  A data reservoir is just what it sounds like.  Streams can be diverted to feed a reservoir, which collects data for a specific purpose.  Thus, the reservoir is a repository of all data from the selected streams, and from any alternative streams, including legacy data.  The usefulness of the metaphors is found in the way in which we treat these data.

So, for example, data streams in practical terms in project and business management are the artifacts that represent the work that is being performed.  This can be data relating to planning, production, financial management and execution, earned value, scheduling, technical performance, and risk for each period of measurement.  This data, then, requires real time analysis, inference, and distribution to decision makers.  Over time, this data provides trending and other important information that measures the inertia of the efforts in providing leading and predictive indicators.
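As a rough sketch, using hypothetical earned value figures, a period-by-period stream lends itself to exactly this kind of rolling treatment: each submission feeds the immediate indicators and extends the trend at the same time.

```python
# Hypothetical earned value figures arriving one reporting period at a time.
stream = [
    {"period": "2015-05", "bcws": 100.0, "bcwp":  95.0, "acwp": 105.0},
    {"period": "2015-06", "bcws": 210.0, "bcwp": 198.0, "acwp": 220.0},
    {"period": "2015-07", "bcws": 330.0, "bcwp": 305.0, "acwp": 348.0},
]

trend = []
for submission in stream:                            # "real time": handle each as it arrives
    cpi = submission["bcwp"] / submission["acwp"]    # cost performance index
    spi = submission["bcwp"] / submission["bcws"]    # schedule performance index
    trend.append((submission["period"], round(cpi, 3), round(spi, 3)))
    print(f"{submission['period']}: CPI={cpi:.2f} SPI={spi:.2f}")

# The accumulated trend is what feeds leading and predictive indicators.
print(trend)
```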

Efficiencies can be realized by identifying duplication in data streams, especially if the data being provided into the streams are derived from a common dataset.  Streams can be modified to expand the data that is submitted, so as to eliminate alternative streams of data that add little value on their own, that is, that are stovepiped and suboptimized contrary to the maximum efficiency of the system.

In the case of data reservoirs, what these contain is somewhat different from the large repositories of metadata that must be mined.  On the contrary, a data reservoir contains a finite set of data, since what is contained in the reservoir is derived from the streams.  As such, these reservoirs contain much of the essential historical information needed to derive parametrics, and sufficient data from which to build organizational knowledge and lessons learned.  Rather than processing data in real time, the handling of data reservoirs is done to append the historical record of existing efforts, providing a fuller picture of performance and trending, and of closed-out efforts that can inform systems approaches to similar future efforts.  While not quite fitting into the category of Big Data, such reservoirs can probably best be classified as Small Big Data.

Efficiencies from the streams into the reservoir can be realized if the data can be further definitized through the application of structured schemas, combined with flexible Data Exchange Instructions (DEIs) that standardize the lexicon, allowing for both data normalization and rationalization.  Still, there may be data that is not incorporated into such schemas, especially if the legacy metadata predates the schema specified for the applicable data streams.  In this case, data rationalization must be undertaken, combined with standard APIs, to provide consistency and structure to the data.  Even in this case, however, because the set is finite and the data is specific to a system that uses a fairly standard lexicon, such rationalization will yield valid results.
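A minimal sketch of that rationalization step, assuming a purely hypothetical lexicon and field names, is just a crosswalk applied consistently; the hard part is not the code but agreeing on the standard terms once for the whole domain.

```python
# Hypothetical lexicon mapping legacy field names to the standard schema's terms.
LEXICON = {
    "tgt_compl_dt": "target_completion_date",
    "targ_finish":  "target_completion_date",
    "ev_pct":       "percent_complete",
    "phys_pct_cmp": "percent_complete",
}

def rationalize(legacy_record):
    """Rename known legacy fields; pass unknowns through for later review."""
    rationalized, unmapped = {}, {}
    for field, value in legacy_record.items():
        if field in LEXICON:
            rationalized[LEXICON[field]] = value
        else:
            unmapped[field] = value
    return rationalized, unmapped

record = {"tgt_compl_dt": "2016-03-31", "ev_pct": 42, "vendor_flag_7": "Y"}
print(rationalize(record))
# ({'target_completion_date': '2016-03-31', 'percent_complete': 42}, {'vendor_flag_7': 'Y'})
```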

Needless to say, applications that are agnostic to data and that provide on-the-fly flexibility in UI configuration by calling standard operating environment objects–also known as fourth generation software–have the greatest applicability to this new data paradigm.  This is because they most effectively leverage both flexibility in the evolution of the data streams to reach maximum efficiency, and in leveraging the lessons learned that are derived from the integration of data that was previously walled off from complementary data that will identify and clarify systems interdependencies.


In Between Posts Interlude — The Little Willies performing “Jolene”

As promised, posts are in draft and will be rolling in on a host of project management issues.  For now, however, here is the band The Little Willies, composed of Lee Alexander (bass), Jim Campilongo (guitar), Richard Julian (guitar/vocals), Dan Rieser (drums), and the incomparable Norah Jones.  (And, yes, despite the assertion on the new FX series Sex&Drugs&Rock&Roll many of us really do note the names of the rhythm section).  These nouveau New Yorkers started up the band after a gig that was booked at the Living Room on New York’s Lower East Side where they enjoy playing classic country and Americana.  I can’t find a website, but you can view their biographies and discography here and a Facebook page here.  Here they are performing Dolly Parton’s classic song, “Jolene.”

Points of View — Source Lines of Code as a Measure of Performance

Glen Alleman at Herding Cats has a post on the measure of source lines of code (SLOC) in project management.  He expresses the opinion that SLOC is an important measure in determining cost and schedule–a critical success factor–in what is narrowly defined as Software Intensive Systems.  Such systems are described as those that are development intensive and represent embedded code.  The Wikipedia definition of an embedded system is as follows:  “An embedded system is a computer system with a dedicated function within a larger mechanical or electrical system, often with real-time computing constraints. It is embedded as part of a complete device often including hardware and mechanical parts. Embedded systems control many devices in common use today.”  In critiquing what can only be described as a strawman argument, he asserts that, when critics question the effectiveness of SLOC, “It’s one of those irrationally held truths that has been passed down from on high by those NOT working in the domains where SLOC is a critical measure of project and system performance.”

Hmmm.  I…don’t…think…so.  I must respectfully disagree with my colleague in his generalization.

What are we measuring when we measure SLOC?  And which SLOC measure are we using?

Oftentimes we are measuring an estimate of what we think, given the language in which we are developing, will be the number of effective and executable code lines needed to achieve the desired functionality (an estimate usually based, in real life, on the systematic use of the Wild-Assed Guess, or “WAG”).  No doubt there are parametric sets, usually based on a static code environment, that will tell us the range of SLOC that should yield a release, given certain assumptions.  But this is systems estimation, not project management and execution.  Estimates are useful in the systems engineering process for sizing and anticipating effort, using models such as COCOMO, SEER-SEM, and other estimating methods, in a very specific subset of projects where the technology is usually well defined and the code set mature and static–with very specific limitations.
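For a sense of how such parametric estimates behave, here is the Basic COCOMO relationship (effort in person-months as a power function of thousands of SLOC) using Boehm’s published coefficients for the three project classes; COCOMO II and SEER-SEM add many more cost drivers, so treat this only as an illustration of the shape of the curve:

```python
# Basic COCOMO (Boehm, 1981): effort in person-months = a * (KLOC ** b)
COEFFICIENTS = {
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def basic_cocomo_effort(kloc, mode="embedded"):
    a, b = COEFFICIENTS[mode]
    return a * (kloc ** b)

for kloc in (10, 50, 100):
    print(kloc, "KLOC ->", round(basic_cocomo_effort(kloc), 1), "person-months")
```

The exponent greater than one is the whole story: effort grows faster than size, so the quality of the SLOC guess going in dominates the estimate coming out.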

SLOC will not, by itself, provide an indication of a working product.  It is, instead, a part of the data stream in the production process of code development.  What this means is that the data must be further refined to determine effectiveness before it can become a true critical success factor.  Robert Park of the Software Engineering Institute (SEI) at Carnegie Mellon effectively summarizes the history and difficulties in defining and applying SLOC.  Even among supporters of the metric, a number of papers, such as the one by Nguyen, Deeds-Rubin, Tan, and Boehm of the Center for Systems and Software Engineering at the University of Southern California, articulate the difficulty of specifying a counting standard.

The Software Technology Support Center at Hill Air Force Base’s GSAM 3.0 has this to say about SLOC:

Source lines-of-code are easy to count and most existing software estimating models use SLOCs as the key input.  However, it is virtually impossible to estimate SLOC from initial requirements statements.  Their use in estimation requires a level of detail that is hard to achieve (i.e., the planner must often estimate the SLOC to be produced before sufficient detail is available to accurately do so.)
Because SLOCs are language-specific, the definition of how SLOCs are counted has been troublesome to standardize.  This makes comparisons of size estimates between applications written in different programming languages difficult even though conversion factors are available.
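The counting problem is easy to demonstrate.  Even a toy counter has to decide whether blank lines, comments, and multi-statement lines count, and two reasonable conventions will disagree on the same code; the sketch below is Python-centric and only one of many plausible approaches:

```python
SAMPLE = """\
# configuration loader
import json

def load(path):
    with open(path) as f:
        data = json.load(f); return data   # two statements, one physical line
"""

def physical_sloc(source):
    """Non-blank, non-comment physical lines."""
    lines = [ln.strip() for ln in source.splitlines()]
    return sum(1 for ln in lines if ln and not ln.startswith("#"))

def logical_sloc(source):
    """Crude logical count: statements separated by ';' count individually."""
    count = 0
    for ln in source.splitlines():
        ln = ln.split("#")[0].strip()       # drop trailing comments
        if ln:
            count += ln.count(";") + 1
    return count

print(physical_sloc(SAMPLE), logical_sloc(SAMPLE))   # different answers for the same code
```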

What I have learned (through actual experience coming from the software domain, first as a programmer and then as a program manager) is that there was a lot of variation in the elegance of produced code.  When we use the term “elegance” we are not using woo-woo terms to obscure meaning.  It is a useful term that connotes both simplicity and effectiveness.  For example, in C programming language environments (and its successors), differences in SLOC between a good developer and a run-of-the-mill hack who uses cut-and-paste in recycling code can be as much as 20% or more.  We find evidence of this variation in the details underlying the high rate of software project failure noted in my previous posts and in my article on Black Swans at  A 20% difference in executable code translates not only into cost and schedule performance, but the manner in which the code is written also translates into qualitative differences in the final product, such as its ability to scale and to be sustained.
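A contrived but representative illustration: the two functions below produce identical results, and the difference in line count says nothing about the value delivered, which is precisely the problem with rewarding SLOC.

```python
# Verbose, copy-and-paste style: same behavior, many more lines.
def total_labor_cost_verbose(hours_by_person, rate_by_person):
    total = 0.0
    names = []
    for name in hours_by_person:
        names.append(name)
    for name in names:
        hours = hours_by_person[name]
        rate = rate_by_person[name]
        cost = hours * rate
        total = total + cost
    return total

# Compact version: identical result, a fraction of the SLOC.
def total_labor_cost(hours_by_person, rate_by_person):
    return sum(h * rate_by_person[n] for n, h in hours_by_person.items())

hours = {"ann": 120, "bo": 80}
rates = {"ann": 95.0, "bo": 110.0}
assert total_labor_cost_verbose(hours, rates) == total_labor_cost(hours, rates)
print(total_labor_cost(hours, rates))
```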

But more to the point, our systems engineering practices seem to contribute to suboptimization.  An example of this was articulated by Steve Ballmer in the movie Triumph of the Nerds where he voiced the very practical financial impact of the SLOC measure:

In IBM there’s a religion in software that says you have to count K-LOCs, and a K-LOC is a thousand lines of code.  How big a project is it?  Oh, it’s sort of a 10K-LOC project.  This is a 20K-LOCer.  And this is 50K-LOCs. And IBM wanted to sort of make it the religion about how we got paid.  How much money we made off OS/2 how much they did.  How many K-LOCs did you do?  And we kept trying to convince them – hey, if we have – a developer’s got a good idea and he can get something done in 4K-LOCs instead of 20K-LOCs, should we make less money?  Because he’s made something smaller and faster, less K-LOC. K-LOCs, K-LOCs, that’s the methodology.  Ugh!  Anyway, that always makes my back just crinkle up at the thought of the whole thing.

Thus, it is not that SLOC is not a metric worth collecting; it is just that, given developments in software technology, and especially the introduction of Fourth Generation programming languages, SLOC has a place, and that place is becoming less and less significant.  Furthermore, institutionalization of SLOC may represent a significant barrier to technological innovation, preventing organizations from leveraging the advantages provided by Moore’s Law.  In technology, such bureaucratization is the last thing that is needed.