Sunday, January 1, 2012

A Look at Two Competing Programming Ideologies
by Steven Schmidt

In today’s world of science and technology, a couple of competing ideologies have arisen in the realm of computer programming. These two ideologies are held, more or less, by two different camps of people.

On the one hand we have the Computer Scientist.  These individuals came out of the School of Thought that has embraced the computer revolution of the past 25 years, and which teaches that this revolution has led to the pinnacle of programming perfection: the art of Large-Scale Software Development and Object-Oriented Programming, an assumption that computer memory is cheap and plentiful, and a belief that nearly all problems have already been solved by logicians and mathematicians in the Standard Design Pattern collection.  This camp views any deviation from these principles as old and obsolete.  That view is largely an assumption, since the large majority of this camp has never spent any significant time programming in any way other than the Software Development ideology they were taught.

On the other hand, we have the Engineer or Physicist.  These individuals spend less time in the world of philosophy and abstract logic, and live in a more practical world.  They have real-world problems that must be solved, and ultimately what matters is that the answer is absolutely right, and that it was arrived at in good time.  Or that the machine absolutely works, and that it won’t break.  These types of programmers often work in circumstances where memory is not plentiful (either because the data sets are huge or because the computer hardware is small, hardened, and/or proprietary), and where the efficiency of the algorithm can be the difference between success and failure.

The nature of the problems that these two groups face has resulted in two very different programming philosophies.  Each has its pros and cons, and each is effective for the type of problems it is intended to solve.

The Computer Scientist’s philosophy works well for large-scale projects that are built and maintained by hundreds of software engineers over several years, producing software used by thousands of users.  The computer hardware is generally standard consumer-level electronics, and the data-set sizes are usually small and easy to work with.  Small bugs or inaccuracies in the software’s operation are inconvenient, but most of the time they do not jeopardize the whole project.

The Engineer or Physicist’s philosophy works well for small yet highly complex proprietary projects that will most likely be maintained by only a small group of engineers.  The problem the software is designed to solve is extremely specialized, and the software absolutely must work properly to be of any use at all.  The circumstances of the problem are often unusual and specific, so no assumptions can be made until the problem is understood thoroughly; by that point the engineers have already worked out an efficient approach, without respect to any pre-defined Standard Pattern.  Thus, any similarity to actual Design Patterns, living or dead, is purely coincidental.

These two philosophies ought to be able to live and work together in harmony.  The problem, however, has been the insistence of many individuals within each of these two camps on stubbornly sticking to their own preferred ideology, assuming that their point of view is “Right in All Cases and is Inherently Superior”.  This attitude seems to have affected those in the Computer Scientist camp more than the Engineer/Physicist camp, largely, I think, because of the way each camp has been taught its field.

Computer Scientists are taught a purely Object-Oriented programming style, with a historical context that presents it as a “new” pattern that replaces and improves upon the non-Object-Oriented patterns of the past.  They then immerse themselves in this Object-Oriented paradigm, and gain little or no experience using the older patterns.  Most Engineers and Physicists, on the other hand, are given no Object-Oriented education at all, but are immersed instead in the non-Object-Oriented pattern, with emphasis not on programming style but on results.  The Engineers and Physicists suffer because they miss out on the Object-Oriented understanding, but they become very familiar with the many advantages of non-Object-Oriented programming.  They comprehend that Object-Oriented programming has advantages in many areas (their practical experience suggests that all programming styles have their pros and cons), and so they respect its use, but they lack the experience to utilize it effectively.

The Computer Scientists, however, interact with Physicists and Engineers and quickly recognize their lack of Object-Oriented understanding.  Their knee-jerk reaction, therefore, is that the Physicists and Engineers are ignorant and old-fashioned when it comes to programming, because they use what the Computer Scientists have been taught is an inferior programming style.  As a result, many Computer Scientists have little or no interest in learning or appreciating the many strengths that non-Object-Oriented programming has in many circumstances.  Thus, the divide perpetuates.

And so, in an attempt to break this divide, let us examine these two philosophies in some detail, highlighting some of the difficulties that arise from enforcing a one-sided point of view.

***

The Computer Scientist is all about sacrificing for the sake of the software’s “public interface”.  To them, the public interface is the pinnacle of all programming holiness, and must always be maintained.  This means that no matter what the object or function or framework actually does under the hood, it must maintain the “face” of being a standard Joe Blow element.  It’s all about properly maintaining the facade, even if this means bending over backwards and jumping through strange, obscure hoops, overriding initialize and deallocate functions, and so on, all for the sake of maintaining an interface that the user of the object or framework finds normal.

This, of course, is because it is assumed that a huge group of people use this interface, and that their own software depends on its consistency.  This makes sense for large-scale software projects that are in fact used by many different people and entities.

The problem with this, however, is that nobody outside of that framework ever really knows what’s going on under the hood.  The Computer Scientist would argue that you “don’t care” what’s under the hood, because, as I said, if the interface is maintained, then it doesn’t matter what happens, even if what actually happens is completely different from what you’d expect by the normal ‘meaning’ of the interface syntax.  As long as the result is correct, you shouldn’t care, they say.
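
To make this concrete, here is a minimal sketch (in Python, with a class and names invented for the example) of the kind of facade-keeping the Computer Scientist prizes: the internals change completely, but the public interface never does.

    class TemperatureSensor:
        # Public interface: read() returns degrees Celsius.
        # Version 1 returned a raw hardware sample.  This version
        # returns a smoothed average of recent samples instead, but
        # the "face" shown to callers is kept exactly the same, so
        # no calling code ever has to change.

        def __init__(self):
            self._samples = []  # new internal state; callers never see it

        def read(self):
            raw = self._sample_hardware()
            self._samples = (self._samples + [raw])[-10:]
            return sum(self._samples) / len(self._samples)

        def _sample_hardware(self):
            return 21.5  # stand-in for a real device read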

The Engineer or Physicist, on the other hand, is all about consistency and transparency.  What happens under the hood matters -- obscuring what is going on just works against you.  An Engineer or Physicist is sensitive to whether the syntax of the code honestly reflects what the code does.  If the code is written a certain way, then it implies a particular functionality underneath.  If the code is written differently underneath, then it should be written differently on the outside to reflect that difference in operation.  If a sacrifice must be made, then conformity should be sacrificed for the sake of being specific and accurate.

The Computer Scientist is also the king of re-using functionality.  Why do it over again if what’s already there has already been tested and proven?  And so, a Computer Scientist will sacrifice, and program, and bend over backwards twice as much, all for the sake of utilizing already-created functionality.  Don’t reinvent the wheel, they say.  You might introduce new bugs, they say.  Who cares if our logic seems upside-down and backwards because we’re trying to adapt a functionality to something far different from what it was intended for?  As long as we maintain the interface, who cares? they say.  These are good arguments, it is true, but only to a certain extent.

The Engineers and Physicists, by contrast, are all about practicality.  If it can be done simply, then do it simply.  If there’s a direct way to get the job done right there in that function, then do it directly and get the job done right there.  Enforcing code reuse or conformance to the standard pattern is frivolous when the job could have been done in a quarter of the time, and with a quarter of the code, by reworking the problem from scratch in twenty lines.  Besides, doing it this way makes it very clear what is being done, because you see it right in front of you.  There is no need to trace a bug through five different frameworks and fifty different class methods before you find the block of code that actually does the work!  In addition, if there’s a slight variation in how the algorithm should work in that particular instance, then it’s easy to adjust it without unexpectedly affecting how a different, unrelated part of the code operates.

The Computer Scientist considers it extremely important to conform to Standard Design Patterns, even if the software ends up being structured differently than the reality the software was designed to serve.  This inconsistency is not seen as a problem, because the majority of programmers are so deep in their corner of the software that they don’t know, or care to know, the big-picture purpose.  The fact that the chosen design pattern is overkill for the context of the problem (since the pattern was designed to solve a much wider set of problems than the particular one at hand) is either not considered or is ignored for the purpose of conforming to a pattern that all programmers (it is assumed) already know.  (This assumption is rationalized by a circular proof: defining “real programmers” as those who know and embrace these design patterns, and dismissing all who don’t know them as irrelevant, since by that definition they are not “real programmers”.)

The Engineer or Physicist, on the other hand, is very careful to remember the form and structure of the reality behind the model.  The software must clearly parallel the reality it is designed to work for, for both conceptual clarity and algorithmic accuracy.  Ultimately it is the end goal of the software that counts, and the underlying structure is only created to serve that end.  It is recognized that there may be a standard design pattern for doing this particular thing, but if that pattern obscures the reality behind the purpose of the software, or if it makes the software more complicated than it needs to be, then it shouldn’t be used.  If a programmer who is new to the project thoroughly understands the problem (the reality) the software was designed to solve, then the parallels between the problem and the software are recognized quickly and easily, allowing the new programmer to quickly become a contributing member of the team.

As this is done, the simplest and most to-the-point solution is preferred.  The big-picture concept behind the code does matter, and the more abstraction and unneeded complication you layer on, the harder it is for the scientist to conceptually maintain the big-picture model.  The programmer’s insistence on seeing the big picture also helps keep the project to a small scale.  And because of its small conceptual ‘size’, the software can be written fast and can go from development to outcome quickly.  The few programmers who work on it understand the project from top to bottom, inside and out, and can therefore modify the code quickly, find bugs quickly, and anticipate and avoid potential inefficiencies.

The Computer Scientist will rarely attempt to comprehend the big picture of the software.  Whether this is because the software is too big and complicated to comprehend even if they tried, or because the software became that way because they never tried in the first place, is debatable and probably depends on the specific project.  Nonetheless, as a result the Computer Scientist relies on different reference points than the Physicist or Engineer to maintain code reliability and operation.  Hence their insistence on using Design Patterns.  The fact is, they don’t understand the big picture of the program, but they do understand the Design Pattern and the Public Interface to the incomprehensible blob that is the rest of the program.  So if they maintain the pattern and match the interface, then the software should keep on doing what it’s doing -- whatever that is.

Also as a result of not comprehending the big picture, the Computer Scientist is hyper-paranoid about unknown or edge use-cases.  They are constantly building into the code every last safeguard against every possible way that a framework or class or method could be used by an unknown caller.  And since they don’t know what the rest of the program does, every case is within the realm of possibility.  Certainly, when working on a large-scale project, this is very often the only way to avoid potential bugs.

The Engineer or Physicist doesn't have to be so paranoid.  They have no need to consider every edge case, because they already know who will call the code and how the code will be used.  In response to the Computer Scientist's complaint "But if you call this method this way then the code will crash!" the engineer or physicist will say "But I will never call this code that way.  It's only going to be used in a specific such-and-such scenario, where that isn't an issue."
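
As a hypothetical illustration (the function and its checks are invented for the example), here is the same small routine written both ways: first armored against every unknown caller, then written directly for the one known caller.

    import math

    def mean_defensive(values):
        # The Computer Scientist's version: guard against every
        # conceivable misuse by an unknown caller.
        if values is None:
            raise ValueError("values must not be None")
        if not isinstance(values, (list, tuple)):
            raise TypeError("values must be a list or tuple")
        if len(values) == 0:
            raise ValueError("values must not be empty")
        for v in values:
            if not isinstance(v, (int, float)) or math.isnan(v):
                raise ValueError("values must be finite numbers")
        return sum(values) / len(values)

    def mean_direct(values):
        # The Engineer's version: the only caller is a known loop
        # that always passes a non-empty list of valid floats, so
        # none of the guards above are needed.
        return sum(values) / len(values)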

Both the Computer Scientist and the Physicist or Engineer from time to time find a need to write data to disk.  Computer Scientists have many formats that they find themselves partial to, but XML seems to be one that is particularly common in recent years.  If something needs to be written to disk, XML is very often the format of choice.  This is because the format is extremely easy to write a parser for, and is very “object friendly” -- it allows for clear text representations of objects and structures of objects.  It also helps that the classes defined in many standard libraries have built-in XML support.  As a result, many Computer Scientists will default to using XML whenever anything needs to be written to disk.  This is understandable, since a common and unambiguously defined file format makes it easy for many different frameworks and programs to read and interpret the file without much adaptation or headache.
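
A minimal sketch of that object-friendliness, using Python’s built-in xml.etree.ElementTree module (the Reading class and its fields are invented for the example):

    import xml.etree.ElementTree as ET

    class Reading:
        def __init__(self, sensor, value):
            self.sensor = sensor
            self.value = value

    def to_xml(reading):
        # Each field maps to a clearly labeled attribute or element.
        elem = ET.Element("reading", sensor=reading.sensor)
        elem.text = str(reading.value)
        return ET.tostring(elem, encoding="unicode")

    def from_xml(text):
        elem = ET.fromstring(text)
        return Reading(elem.get("sensor"), float(elem.text))

    xml = to_xml(Reading("thermocouple-3", 21.5))
    # '<reading sensor="thermocouple-3">21.5</reading>'
    restored = from_xml(xml)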

However, this partiality to the XML format has become so predominant that many Computer Scientists have either forgotten or are completely blind to its negatives, some of which are big problems for the types of applications that Physicists and Engineers encounter.  Take, for example, the fact that an XML file must generally be parsed completely from beginning to end before the data contained in the file can be used.  The usual way of reading the format does not support “streaming”, the ability to read and work on small pieces of the file at a time without reading to the end of the file.  (This is because an object being read from an XML-formatted file can’t be fully created in memory until its closing </tag> is reached.  Usually there is a root tag that encloses the whole file, with the opening <tag> at the beginning of the file and the closing </tag> at the very end; nothing is certain until that closing tag is reached.  Event-driven parsers do exist, but the common practice of building the whole document tree in memory gives up that advantage.)  Think about what this means for both speed and memory.  If the whole file must be read before it can be used, then the entire contents of the file must be held in memory at the same time, and the computer must wait for the file to be completely read before it can do anything.  What if that file is 200 GB in size?  What if the computer running this code has only 4 MB of RAM, because it’s a proprietary embedded computer system on a satellite?  Another negative is the fraction of the file that the identifiers take up.  If a particular item in the file is only a few characters long, then adding the identifier tags can double the size of the file on disk.  That 200 GB file just became 400 GB.
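
A quick back-of-the-envelope sketch of that size overhead (the data here is made up, and only Python’s standard library is used):

    import xml.etree.ElementTree as ET

    values = [3.14, 2.72, 1.62] * 1000  # stand-in for real measurements

    # Plain-text version: one number per line.
    plain = "\n".join(str(v) for v in values)

    # XML version: each value wrapped in its own element.
    root = ET.Element("samples")
    for v in values:
        ET.SubElement(root, "sample").text = str(v)
    xml = ET.tostring(root, encoding="unicode")

    print(len(plain), len(xml))
    # The XML copy comes out several times larger: for these
    # 4-character values, the <sample></sample> tags alone add
    # 17 characters per value.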

The Engineer or Physicist (as the above paragraph makes quite clear) often has clear reasons to write their files not in an XML-type format but instead as simple lists of numbers or matrices.  These files can be read one line at a time, with the program needing to hold only one line, or even one number, in memory at a time.  The file contains only the information it needs, so it will not be bloated on disk, saving both disk space and the time it takes to read the file.  The sacrifice, of course, is clarity.  If the program reading the file doesn’t already know what format to expect, then that information is lost.
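
A sketch of what reading such a file looks like (the file name and the computation are invented for the example), holding only one line in memory at any moment:

    def sum_of_values(path):
        # Iterating over the file object yields one line at a time,
        # so even a 200 GB file can be processed in constant memory.
        total = 0.0
        with open(path) as f:
            for line in f:
                total += float(line)
        return total

    # total = sum_of_values("telemetry.dat")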

***

There are probably more areas that could be highlighted.  The point is, as computers and technology exert a bigger and bigger influence on our world, the need to understand the engineering philosophies behind all kinds of technological fields will become more and more important.  No one philosophy trumps all, and no one paradigm is inherently superior to another, as much as each of our Prides wants to tell us otherwise.  As people, and as coworkers, let us keep our minds open to new ideas and even old ideas -- always remembering that perhaps, in this case, there was something I missed.  Maybe, just this once, doing it that way is better after all.
