Software Architecture in Practice
The book is very complex. The topic is not easy, and the authors’ writing style makes it worse.
Some ideas are good, but the explanations are overly long. It could be much shorter, maybe a medium-sized blog post. I had a feeling that the authors tried to cram in everything they find useful about architecture: links to aerospace standards (who needs them in a book like this?), a clumsy discussion of product lines (an entire chapter is useless), and much more. The authors use a “Source of stimulus - Stimulus - Environment - Artifact - Response - Response measure” framework to discuss quality attributes such as modifiability and performance. It’s a pretty interesting attempt, but it’s worthless, in my humble opinion.
Some chapters are very useful. One explains how to deal with architecture documentation; it even has advice on how to structure an architecture presentation. Another chapter explains why architecture and implementation sometimes part ways. Using this book, you can find new ways to improve a particular quality attribute.
I loved the section with questions at the end of each chapter. Most of them are open-ended and very deep.
The software architecture of a system is the set of structures needed to reason about the system, which comprise software elements, relations among them, and properties of both.
This definition stands in contrast to other definitions that talk about the system’s “early” or “major” design decisions. While it is true that many architectural decisions are made early, not all are—especially in Agile or spiral development projects. It’s also true that very many decisions are made early that are not architectural. Also, it’s hard to look at a decision and tell whether or not it’s “major.”
Sometimes only time will tell. And since writing down an architecture is one of the architect’s most important obligations, we need to know now which decisions an architecture comprises.
There are three categories of architectural structures, which will play an important role in the design, documentation, and analysis of architectures:
- First, some structures partition systems into implementation units, which in this book we call modules. Modules are assigned specific computational responsibilities, and are the basis of work assignments for programming teams (Team A works on the database, Team B works on the business rules, Team C works on the user interface, etc.). In large projects, these elements (modules) are subdivided for assignment to subteams. For example, the database for a large enterprise resource planning (ERP) implementation might be so complex that its implementation is split into many parts. The structure that captures that decomposition is a kind of module structure, the module decomposition structure in fact. Another kind of module structure emerges as an output of object-oriented analysis and design—class diagrams. If you aggregate your modules into layers, you’ve created another (and very useful) module structure. Module structures are static structures, in that they focus on the way the system’s functionality is divided up and assigned to implementation teams.
- Other structures are dynamic, meaning that they focus on the way the elements interact with each other at runtime to carry out the system’s functions. Suppose the system is to be built as a set of services. The services, the infrastructure they interact with, and the synchronization and interaction relations among them form another kind of structure often used to describe a system. These services are made up of (compiled from) the programs in the various implementation units that we just described. In this book we will call runtime structures component-and-connector (C&C) structures. The term component is overloaded in software engineering. In our use, a component is always a runtime entity.
- A third kind of structure describes the mapping from software structures to the system’s organizational, developmental, installation, and execution environments. For example, modules are assigned to teams to develop, and assigned to places in a file structure for implementation, integration, and testing. Components are deployed onto hardware in order to execute. These mappings are called allocation structures.
Two disciplines related to software architecture are system architecture and enterprise architecture. Both of these disciplines have broader concerns than software and affect software architecture through the establishment of constraints within which a software system must live. In both cases, the software architect for a system should be on the team that provides input into the decisions made about the system or the enterprise.
A system’s architecture is a representation of a system in which there is a mapping of functionality onto hardware and software components, a mapping of the software architecture onto the hardware architecture, and a concern for the human interaction with these components. That is, system architecture is concerned with a total system, including hardware, software, and humans.
A system architecture will determine, for example, the functionality that is assigned to different processors and the type of network that connects those processors. The software architecture on each of those processors will determine how this functionality is implemented and how the various processors interact through the exchange of messages on the network. A description of the software architecture, as it is mapped to hardware and networking components, allows reasoning about qualities such as performance and reliability. A description of the system architecture will allow reasoning about additional qualities such as power consumption, weight, and physical footprint.
When a particular system is designed, there is frequently negotiation between the system architect and the software architect as to the distribution of functionality and, consequently, the constraints placed on the software architecture.
Enterprise architecture is a description of the structure and behavior of an organization’s processes, information flow, personnel, and organizational subunits, aligned with the organization’s core goals and strategic direction. An enterprise architecture need not include information systems—clearly organizations had architectures that fit the preceding definition prior to the advent of computers—but these days, enterprise architectures for all but the smallest businesses are unthinkable without information system support.
Thus, a modern enterprise architecture is concerned with how an enterprise’s software systems support the business processes and goals of the enterprise. Typically included in this set of concerns is a process for deciding which systems with which functionality should be supported by an enterprise.
An enterprise architecture will specify the data model that various systems use to interact, for example. It will specify rules for how the enterprise’s systems interact with external systems. Software is only one concern of enterprise architecture. Two other common concerns addressed by enterprise architecture are how the software is used by humans to perform business processes, and the standards that determine the computational environment.
Sometimes the software infrastructure that supports communication among systems and with the external world is considered a portion of the enterprise architecture; other times, this infrastructure is considered one of the systems within an enterprise. (In either case, the architecture of that infrastructure is a software architecture!) These two views will result in different management structures and spheres of influence for the individuals concerned with the infrastructure.
The system and the enterprise provide environments for, and constraints on, the software architecture. The software architecture must live within the system and enterprise, and increasingly it is the focus for achieving the organization’s business goals. But all three forms of architecture share important commonalities: They are concerned with major elements taken as abstractions, the relationships among the elements, and how the elements together meet the behavioral and quality goals of the thing being built.
A view is a representation of a coherent set of architectural elements, as written by and read by system stakeholders. It consists of a representation of a set of elements and the relations among them.
A structure is the set of elements itself, as they exist in software or hardware.
In short, a view is a representation of a structure. For example, a module structure is the set of the system’s modules and their organization. A module view is the representation of that structure, documented according to a template in a chosen notation, and used by some system stakeholders.
So: Architects design structures. They document views of those structures.
Okay, it’s not completely true to say that they had no architecture documentation. They did produce a single page diagram, with a few boxes and lines. Some of those boxes were, however, clouds. Yes, they actually used a cloud as one of their icons. When I pressed them on the meaning of this icon—Was it a process? A class? A thread?—they waffled. This was not, in fact, architecture documentation. It was, at best, “marketecture.”
Software architecture is the set of design decisions which, if made incorrectly, may cause your project to be cancelled.
… why architecture matters from a technical perspective. We will examine a baker’s dozen of the most important reasons.
- An architecture will inhibit or enable a system’s driving quality attributes.
- The decisions made in an architecture allow you to reason about and manage change as the system evolves.
- The analysis of an architecture enables early prediction of a system’s qualities.
- A documented architecture enhances communication among stakeholders.
- The architecture is a carrier of the earliest and hence most fundamental, hardest to change design decisions.
- An architecture defines a set of constraints on subsequent implementation.
- The architecture dictates the structure of an organization, or vice versa.
- An architecture can provide the basis for evolutionary prototyping.
- An architecture is the key artifact that allows the architect and project manager to reason about cost and schedule.
- An architecture can be created as a transferable, reusable model that forms the heart of a product line.
- Architecture-based development focuses attention on the assembly of components, rather than simply on their creation.
- By restricting design alternatives, architecture channels the creativity of developers, reducing design and system complexity.
- An architecture can be the foundation for training a new team member.
It is possible to make quality predictions about a system based solely on an evaluation of its architecture. If we know that certain kinds of architectural decisions lead to certain quality attributes in a system, then we can make those decisions and rightly expect to be rewarded with the associated quality attributes.
Each stakeholder of a software system—customer, user, project manager, coder, tester, and so on—is concerned with different characteristics of the system that are affected by its architecture. For example:
- The user is concerned that the system is fast, reliable, and available when needed.
- The customer is concerned that the architecture can be implemented on schedule and according to budget.
- The manager is worried (in addition to concerns about cost and schedule) that the architecture will allow teams to work largely independently, interacting in disciplined and controlled ways.
- The architect is worried about strategies to achieve all of those goals.
“Well, I was just wondering,” said the users’ delegate. “Because I see from your chart that the display console is sending signal traffic to the target location module.” “What should happen?” asked another member of the audience, addressing the first questioner. “Do you really want the user to get mode data during its reconfiguring?” And for the next 45 minutes, the architect watched as the audience consumed his time slot by debating what the correct behavior of the system was supposed to be in various esoteric states.
The debate was not architectural, but the architecture (and the graphical rendition of it) had sparked debate. It is natural to think of architecture as the basis for communication among some of the stakeholders besides the architects and developers: Managers, for example, use the architecture to create teams and allocate resources among them. But users? The architecture is invisible to users, after all; why should they latch on to it as a tool for understanding the system?
Architectures exist in four different contexts.
- Technical. The technical context includes the achievement of quality attribute requirements. We spend Part II discussing how to do this. The technical context also includes the current technology. The cloud (discussed in Chapter 26) and mobile computing (discussed in Chapter 27) are important current technologies.
- Project life cycle. Regardless of the software development methodology you use, you must make a business case for the system, understand the architecturally significant requirements, create or select the architecture, document and communicate the architecture, analyze or evaluate the architecture, implement and test the system based on the architecture, and ensure that the implementation conforms to the architecture.
- Business. The system created from the architecture must satisfy the business goals of a wide variety of stakeholders, each of whom has different expectations for the system. The architecture is also influenced by and influences the structure of the development organization.
- Professional. You must have certain skills and knowledge to be an architect, and there are certain duties that you must perform as an architect. These are influenced not only by coursework and reading but also by your experiences.
An architecture has some influences that lead to its creation, and its existence has an impact on the architect, the organization, and, potentially, the industry. We call this cycle the Architecture Influence Cycle.
No matter the source, all requirements encompass the following categories:
- Functional requirements. These requirements state what the system must do, and how it must behave or react to runtime stimuli.
- Quality attribute requirements. These requirements are qualifications of the functional requirements or of the overall product. A qualification of a functional requirement is an item such as how fast the function must be performed, or how resilient it must be to erroneous input. A qualification of the overall product is an item such as the time to deploy the product or a limitation on operational costs.
- Constraints. A constraint is a design decision with zero degrees of freedom. That is, it’s a design decision that’s already been made. Examples include the requirement to use a certain programming language or to reuse a certain existing module, or a management fiat to make your system service oriented. These choices are arguably in the purview of the architect, but external factors (such as not being able to train the staff in a new language, or having a business agreement with a software supplier, or pushing business goals of service interoperability) have led those in power to dictate these design outcomes.
What is the “response” of architecture to each of these kinds of requirements?
- Functional requirements are satisfied by assigning an appropriate sequence of responsibilities throughout the design. As we will see later in this chapter, assigning responsibilities to architectural elements is a fundamental architectural design decision.
- Quality attribute requirements are satisfied by the various structures designed into the architecture, and the behaviors and interactions of the elements that populate those structures.
- Constraints are satisfied by accepting the design decision and reconciling it with other affected design decisions.
The seven categories of design decisions are
- Allocation of responsibilities
- Coordination model
- Data model
- Management of resources
- Mapping among architectural elements
- Binding time decisions
- Choice of technology.
Allocation of responsibilities
Decisions involving allocation of responsibilities include the following:
- Identifying the important responsibilities, including basic system functions, architectural infrastructure, and satisfaction of quality attributes.
- Determining how these responsibilities are allocated to non-runtime and runtime elements (namely, modules, components, and connectors).
Strategies for making these decisions include functional decomposition, modeling real-world objects, grouping based on the major modes of system operation, or grouping based on similar quality requirements: processing frame rate, security level, or expected changes. In Chapters 5–11, where we apply these design decision categories to a number of important quality attributes, the checklist we provide for the allocation of responsibilities category is derived systematically from understanding the stimuli and responses listed in the general scenario for that QA.
Coordination model
Software works by having elements interact with each other through designed mechanisms. These mechanisms are collectively referred to as a coordination model. Decisions about the coordination model include these:
- Identifying the elements of the system that must coordinate, or are prohibited from coordinating.
- Determining the properties of the coordination, such as timeliness, currency, completeness, correctness, and consistency.
- Choosing the communication mechanisms (between systems, between our system and external entities, between elements of our system) that realize those properties. Important properties of the communication mechanisms include stateful versus stateless, synchronous versus asynchronous, guaranteed versus non-guaranteed delivery, and performance-related properties such as throughput and latency.
Data model
Every system must represent artifacts of system-wide interest (data) in some internal fashion. The collection of those representations and how to interpret them is referred to as the data model. Decisions about the data model include the following:
- Choosing the major data abstractions, their operations, and their properties. This includes determining how the data items are created, initialized, accessed, persisted, manipulated, translated, and destroyed.
- Compiling metadata needed for consistent interpretation of the data.
- Organizing the data. This includes determining whether the data is going to be kept in a relational database, a collection of objects, or both. If both, then the mapping between the two different locations of the data must be determined.
Management of resources
An architect may need to arbitrate the use of shared resources in the architecture. These include hard resources (e.g., CPU, memory, battery, hardware buffers, system clock, I/O ports) and soft resources (e.g., system locks, software buffers, thread pools, and non-threadsafe code). Decisions for management of resources include the following:
- Identifying the resources that must be managed and determining the limits for each.
- Determining which system element(s) manage each resource.
- Determining how resources are shared and the arbitration strategies employed when there is contention.
- Determining the impact of saturation on different resources. For example, as a CPU becomes more heavily loaded, performance usually just degrades fairly steadily. On the other hand, when you start to run out of memory, at some point you start paging/swapping intensively and your performance suddenly crashes to a halt.
Mapping among architectural elements
An architecture must provide two types of mappings. First, there is mapping between elements in different types of architecture structures—for example, mapping from units of development (modules) to units of execution (threads or processes). Next, there is mapping between software elements and environment elements—for example, mapping from processes to the specific CPUs where these processes will execute.
Useful mappings include these:
- The mapping of modules and runtime elements to each other—that is, the runtime elements that are created from each module; the modules that contain the code for each runtime element.
- The assignment of runtime elements to processors.
- The assignment of items in the data model to data stores.
- The mapping of modules and runtime elements to units of delivery.
Binding time decisions
Binding time decisions introduce allowable ranges of variation. This variation can be bound at different times in the software life cycle by different entities— from design time by a developer to runtime by an end user. A binding time decision establishes the scope, the point in the life cycle, and the mechanism for achieving the variation.
The decisions in the other six categories have an associated binding time decision. Examples of such binding time decisions include the following:
- For allocation of responsibilities, you can have buildtime selection of modules via a parameterized makefile.
- For choice of coordination model, you can design runtime negotiation of protocols.
- For resource management, you can design a system to accept new peripheral devices plugged in at runtime, after which the system recognizes them and downloads and installs the right drivers automatically.
- For choice of technology, you can build an app store for a smartphone that automatically downloads the version of the app appropriate for the phone of the customer buying the app.
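To make the idea of late binding concrete, here is a minimal sketch of the runtime protocol negotiation example from the list above. The protocol names and the `negotiate` function are made up for illustration; the book does not prescribe any particular mechanism.

```python
"""Runtime protocol negotiation: a coordination decision bound at runtime."""

# Hypothetical protocol names, listed in our order of preference.
SUPPORTED = ["binary-v2", "binary-v1", "json"]


def negotiate(ours, theirs):
    """Return the first protocol in our preference order that the peer also supports."""
    peer = set(theirs)
    for name in ours:
        if name in peer:
            return name
    raise ValueError("no protocol in common")


if __name__ == "__main__":
    # The peer announces what it supports when the connection is opened.
    print(negotiate(SUPPORTED, ["json", "binary-v1"]))  # prints "binary-v1"
```

The design cost is the extra negotiation step and the need to keep several protocol implementations alive; the benefit is that either side can be upgraded without rebuilding the other.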
When making binding time decisions, you should consider the costs to implement the decision and the costs to make a modification after you have implemented the decision. For example, if you are considering changing platforms at some time after code time, you can insulate yourself from the effects caused by porting your system to another platform at some cost. Making this decision depends on the costs incurred by having to modify an early binding compared to the costs incurred by implementing the mechanisms involved in the late binding.
Choice of technology
Every architecture decision must eventually be realized using a specific technology. Sometimes the technology selection is made by others, before the intentional architecture design process begins. In this case, the chosen technology becomes a constraint on decisions in each of our seven categories. In other cases, the architect must choose a suitable technology to realize a decision in every one of the categories.
Choice of technology decisions involve the following:
- Deciding which technologies are available to realize the decisions made in the other categories.
- Determining whether the available tools to support this technology choice (IDEs, simulators, testing tools, etc.) are adequate for development to proceed.
- Determining the extent of internal familiarity as well as the degree of external support available for the technology (such as courses, tutorials, examples, and availability of contractors who can provide expertise in a crunch) and deciding whether this is adequate to proceed.
- Determining the side effects of choosing a technology, such as a required coordination model or constrained resource management opportunities.
- Determining whether a new technology is compatible with the existing technology stack. For example, can the new technology run on top of or alongside the existing technology stack? Can it communicate with the existing technology stack? Can the new technology be monitored and managed?
Requirements for a system come in three categories:
- Functional. These requirements are satisfied by including an appropriate set of responsibilities within the design.
- Quality attribute. These requirements are satisfied by the structures and behaviors of the architecture.
- Constraints. A constraint is a design decision that’s already been made.
To express a quality attribute requirement, we use a quality attribute scenario. The parts of the scenario are these:
- Source of stimulus
- Stimulus
- Environment
- Artifact
- Response
- Response measure.
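One way to keep such scenarios uniform is to treat the six parts as fields of a record. The sketch below is only an illustration: the class name and the example values (a modifiability scenario) are my own, not taken from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAttributeScenario:
    """The six parts of a quality attribute scenario, in the order listed above."""
    source_of_stimulus: str
    stimulus: str
    environment: str
    artifact: str
    response: str
    response_measure: str


# Illustrative values only: a possible modifiability scenario.
EXAMPLE = QualityAttributeScenario(
    source_of_stimulus="a developer",
    stimulus="a request to change the user interface",
    environment="design time",
    artifact="the UI code",
    response="the change is made and unit-tested without side effects",
    response_measure="within three hours",
)

if __name__ == "__main__":
    for part, value in vars(EXAMPLE).items():
        print(f"{part}: {value}")
```

Forcing every requirement through the same six fields makes it obvious when a part is missing, most often the response measure.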
Hazard analysis is a technique that attempts to catalog the hazards that can occur during the operation of a system. It categorizes each hazard according to its severity. For example, the DO-178B standard used in the aeronautics industry defines these failure condition levels in terms of their effects on the aircraft, crew, and passengers:
- Catastrophic. This kind of failure may cause a crash. This failure represents the loss of critical function required to safely fly and land aircraft.
- Hazardous. This kind of failure has a large negative impact on safety or performance, or reduces the ability of the crew to operate the aircraft due to physical distress or a higher workload, or causes serious or fatal injuries among the passengers.
- Major. This kind of failure is significant, but has a lesser impact than a Hazardous failure (for example, leads to passenger discomfort rather than injuries) or significantly increases crew workload to the point where safety is affected.
- Minor. This kind of failure is noticeable, but has a lesser impact than a Major failure (for example, causing passenger inconvenience or a routine flight plan change).
- No effect. This kind of failure has no impact on safety, aircraft operation, or crew workload.
Other domains have their own categories and definitions. Hazard analysis also assesses the probability of each hazard occurring. Hazards for which the product of cost and probability exceeds some threshold are then made the subject of mitigation activities.
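The threshold rule is simple arithmetic, so a small sketch may help. The hazard names, costs, probabilities, and the threshold below are invented for illustration; a real analysis would take them from the domain's own severity categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str
    cost: float         # severity expressed as a cost, in arbitrary units
    probability: float  # estimated probability of occurrence, 0.0 to 1.0

    @property
    def risk(self) -> float:
        """Product of cost and probability."""
        return self.cost * self.probability


def needs_mitigation(hazards, threshold):
    """Return the hazards whose risk exceeds the threshold."""
    return [h for h in hazards if h.risk > threshold]


# Hypothetical hazards, for illustration only.
HAZARDS = [
    Hazard("sensor dropout", cost=1_000, probability=0.01),
    Hazard("display flicker", cost=10, probability=0.2),
    Hazard("loss of control channel", cost=100_000, probability=0.001),
]

if __name__ == "__main__":
    for hazard in needs_mitigation(HAZARDS, threshold=5.0):
        print(hazard.name)
```

With these numbers, "display flicker" stays below the threshold even though it is the most likely hazard, because its cost is small.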
Fault tree analysis
Fault tree analysis is an analytical technique that specifies a state of the system that negatively impacts safety or reliability, and then analyzes the system’s context and operation to find all the ways that the undesired state could occur. The technique uses a graphic construct (the fault tree) that helps identify all sequential and parallel sequences of contributing faults that will result in the occurrence of the undesired state, which is listed at the top of the tree (the “top event”). The contributing faults might be hardware failures, human errors, software errors, or any other pertinent events that can lead to the undesired state.
A fault tree lends itself to static analysis in various ways. For example, a “minimal cut set” is the smallest combination of events along the bottom of the tree that together can cause the top event. The set of minimal cut sets shows all the ways the bottom events can combine to cause the overarching failure. Any singleton minimal cut set reveals a single point of failure, which should be carefully scrutinized. Also, the probabilities of various contributing failures can be combined to come up with a probability of the top event occurring. Dynamic analysis occurs when the order of contributing failures matters. In this case, techniques such as Markov analysis can be used to calculate probability of failure over different failure sequences. Fault trees aid in system design, but they can also be used to diagnose failures at runtime. If the top event has occurred, then (assuming the fault tree model is complete) one or more of the contributing failures has occurred, and the fault tree can be used to track it down and initiate repairs.
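The static analysis described above can be sketched in a few lines. This is a toy representation of my own (nested tuples for gates, strings for basic events), and the probability calculation assumes that basic events are independent and that each appears only once in the tree; the event names and numbers are hypothetical.

```python
from itertools import product

# A node is either a basic event name (str) or a gate: ("AND" | "OR", [children]).


def cut_sets(node):
    """Return all cut sets of the subtree as a set of frozensets of basic events."""
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any single child's cut set is enough to trigger the gate.
        return set().union(*child_sets)
    if gate == "AND":
        # One cut set from every child must occur together.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")


def minimal_cut_sets(node):
    """Drop every cut set that is a strict superset of another cut set."""
    sets = cut_sets(node)
    return {s for s in sets if not any(other < s for other in sets)}


def probability(node, p):
    """Top-event probability under the independence assumption stated above."""
    if isinstance(node, str):
        return p[node]
    gate, children = node
    probs = [probability(child, p) for child in children]
    result = 1.0
    if gate == "AND":
        for q in probs:
            result *= q
        return result
    for q in probs:  # OR gate: 1 - P(no child occurs)
        result *= 1.0 - q
    return 1.0 - result


# Hypothetical tree: cooling is lost if power fails, or if both pumps fail.
TREE = ("OR", ["power_loss", ("AND", ["primary_pump_fails", "backup_pump_fails"])])
P = {"power_loss": 0.01, "primary_pump_fails": 0.1, "backup_pump_fails": 0.1}

if __name__ == "__main__":
    for cut in minimal_cut_sets(TREE):
        label = "single point of failure" if len(cut) == 1 else "combination"
        print(sorted(cut), label)
    print(round(probability(TREE, P), 4))
```

In this tree the singleton cut set `{"power_loss"}` is the single point of failure, and the top-event probability comes out at about 0.0199.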
Escalating restart is a reintroduction tactic that allows the system to recover from faults by varying the granularity of the component(s) restarted and minimizing the level of service affected. For example, consider a system that supports four levels of restart, as follows. The lowest level of restart (call it Level 0), and hence having the least impact on services, employs passive redundancy (warm spare), where all child threads of the faulty component are killed and recreated. In this way, only data associated with the child threads is freed and reinitialized. The next level of restart (Level 1) frees and reinitializes all unprotected memory (protected memory would remain untouched). The next level of restart (Level 2) frees and reinitializes all memory, both protected and unprotected, forcing all applications to reload and reinitialize. And the final level of restart (Level 3) would involve completely reloading and reinitializing the executable image and associated data segments. Support for the escalating restart tactic is particularly useful for the concept of graceful degradation, where a system is able to degrade the services it provides while maintaining support for mission-critical or safety-critical applications.
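The control flow of the tactic, try the cheapest restart first and escalate only if the fault persists, can be sketched as follows. The enum names, the `escalating_restart` function, and the `FakeComponent` stand-in are all hypothetical; real restart levels would call into the platform rather than flip a flag.

```python
from enum import IntEnum


class RestartLevel(IntEnum):
    CHILD_THREADS = 0       # kill and recreate child threads of the faulty component
    UNPROTECTED_MEMORY = 1  # free and reinitialize unprotected memory
    ALL_MEMORY = 2          # free and reinitialize protected and unprotected memory
    FULL_RELOAD = 3         # reload the executable image and data segments


def escalating_restart(restart, is_healthy):
    """Try each level from least to most disruptive; return the level that worked, or None."""
    for level in RestartLevel:
        restart(level)
        if is_healthy():
            return level
    return None


class FakeComponent:
    """Hypothetical stand-in whose fault only clears at ALL_MEMORY or above."""

    def __init__(self):
        self.healthy = False
        self.attempts = []

    def restart(self, level):
        self.attempts.append(level)
        self.healthy = level >= RestartLevel.ALL_MEMORY


if __name__ == "__main__":
    component = FakeComponent()
    level = escalating_restart(component.restart, lambda: component.healthy)
    print(level.name, [a.name for a in component.attempts])
```

Services that depend only on lower levels keep running for as long as possible, which is where the link to graceful degradation comes from.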
Nonstop forwarding (NSF) is a concept that originated in router design. In this design functionality is split into two parts: supervisory, or control plane (which manages connectivity and routing information), and data plane (which does the actual work of routing packets from sender to receiver). If a router experiences the failure of an active supervisor, it can continue forwarding packets along known routes—with neighboring routers—while the routing protocol information is recovered and validated. When the control plane is restarted, it implements what is sometimes called “graceful restart,” incrementally rebuilding its routing protocol database even as the data plane continues to operate.
Systems (or components within systems) often have or embody expectations about the behaviors of their “information exchange” partners. The assumption of everything interacting with the errant component in the preceding example was that its accuracy did not degrade over time. The result was a system of parts that did not work together correctly to solve the problem they were supposed to.
The second concept we need to stress is what we mean by “interface.” Once again, we mean something beyond the simple case—a syntactic description of a component’s programs and the type and number of their parameters, most commonly realized as an API. That’s necessary for interoperability—heck, it’s necessary if you want your software to compile successfully—but it’s not sufficient. To illustrate this concept, we’ll use another “conversation” analogy. Has your partner or spouse ever come home, slammed the door, and when you ask what’s wrong, replied “Nothing!”? If so, then you should be able to appreciate the keen difference between syntax and semantics and the role of expectations in understanding how an entity behaves.
Here are some of the challenges that organizations face related to standards and interoperability:
- Ideally, every implementation of a standard should be identical and thus completely interoperable with any other implementation. However, this is far from reality. Standards, when incorporated into products, tools, and services, undergo customizations and extensions because every vendor wants to create a unique selling point as a competitive advantage.
- Standards are often deliberately open-ended and provide extension points. The actual implementation of these extension points is left to the discretion of implementers, leading to proprietary implementations.
- Standards, like any technology, have a life cycle of their own and evolve over time in compatible and non-compatible ways. Deciding when to adopt a new or revised standard is a critical decision for organizations. Committing to a new standard that is not ready or eventually not adopted by the community is a big risk for organizations. On the other hand, waiting too long may also become a problem, which can lead to unsupported products, incompatibilities, and workarounds, because everyone else is using the standard.
- Within the software community, there are as many bad standards as there are engineers with opinions. Bad standards include underspecified, overspecified, inconsistently specified, unstable, or irrelevant standards.
- It is quite common for standards to be championed by competing organizations, resulting in conflicting standards due to overlap or mutual exclusion.
- For new and rapidly emerging domains, the argument often made is that standardization will be destructive because it will hinder flexibility: premature standardization will force the use of an inadequate approach and lead to abandoning other presumably better approaches.
So what do organizations do in the meantime? What these challenges illustrate is that because of the way in which standards are usually created and evolved, we cannot let standards drive our architectures. We need to architect systems first and then decide which standards can support desired system requirements and qualities. This approach allows standards to change and evolve without affecting the overall architecture of the system.
I once heard someone in a keynote address say that “The nice thing about standards is that there are so many to choose from.”
Modules have responsibilities. When a change causes a module to be modified, its responsibilities are changed in some way. Generally, a change that affects one module is easier and less expensive than if it changes more than one module. However, if two modules’ responsibilities overlap in some way, then a single change may well affect them both. We can measure this overlap by measuring the probability that a modification to one module will propagate to the other. This is called coupling, and high coupling is an enemy of modifiability.
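The passage defines coupling as a probability but does not say how to estimate it. One rough proxy, offered here as an assumption rather than the book's method, is to count how often two modules were modified together in past changes. The `change_sets` input and the module names below are hypothetical.

```python
from collections import Counter
from itertools import permutations


def propagation_probability(change_sets):
    """Estimate P(b is modified | a is modified) from past change sets.

    change_sets: iterable of sets of module names that were modified
    together (hypothetical input, for example one set per commit).
    Returns a dict mapping (a, b) to the fraction of changes touching a
    that also touched b.
    """
    touched = Counter()   # how many changes touched each module
    together = Counter()  # how many changes touched each ordered pair
    for modules in change_sets:
        modules = set(modules)
        touched.update(modules)
        together.update(permutations(modules, 2))
    return {(a, b): n / touched[a] for (a, b), n in together.items()}


# Hypothetical change history.
history = [
    {"billing", "invoice"},
    {"billing", "invoice"},
    {"billing"},
    {"ui"},
]
coupling = propagation_probability(history)
print(coupling[("billing", "invoice")])  # 2 of 3 billing changes also touched invoice
print(coupling[("invoice", "billing")])  # every invoice change also touched billing
```

A high value for a pair suggests that the two modules share responsibilities, which is exactly the overlap the text warns about.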
Modifiability deals with change and the cost in time or money of making a change, including the extent to which this modification affects other functions or quality attributes.
Changes can be made by developers, installers, or end users, and these changes need to be prepared for. There is a cost of preparing for change as well as a cost of making a change. The modifiability tactics are designed to prepare for subsequent changes.
Tactics to reduce the cost of making a change include making modules smaller, increasing cohesion, and reducing coupling. Deferring binding will also reduce the cost of making a change.
Reducing coupling is a standard category of tactics that includes encapsulating, using an intermediary, restricting dependencies, colocating related responsibilities, refactoring, and abstracting common services.
Increasing cohesion is another standard tactic that involves separating responsibilities that do not serve the same purpose. Defer binding is a category of tactics that affect build time, load time, initialization time, or runtime.
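As a minimal sketch of two of these tactics together (using an intermediary and deferring binding to initialization time), the example below selects an implementation from configuration through a registry. All class names, the registry, and the configuration key are hypothetical.

```python
import json


# Hypothetical implementations; the names are illustrative only.
class EmailNotifier:
    def send(self, message):
        return f"email: {message}"


class SmsNotifier:
    def send(self, message):
        return f"sms: {message}"


# The registry is the intermediary: callers depend on a key, not on a class.
NOTIFIERS = {"email": EmailNotifier, "sms": SmsNotifier}


def make_notifier(config_text):
    """Bind the implementation at initialization time from configuration."""
    config = json.loads(config_text)
    return NOTIFIERS[config["notifier"]]()


notifier = make_notifier('{"notifier": "sms"}')
print(notifier.send("build finished"))
```

Switching from SMS to email is then a configuration change, not a code change in the callers.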
- Source of stimulus. The source of the attack may be either a human or another system. It may have been previously identified (either correctly or incorrectly) or may be currently unknown. A human attacker may be from outside the organization or from inside the organization.
- Stimulus. The stimulus is an attack. We characterize this as an unauthorized attempt to display data, change or delete data, access system services, change the system’s behavior, or reduce availability.
- Artifact. The target of the attack can be either the services of the system, the data within it, or the data produced or consumed by the system. Some attacks are made on particular components of the system known to be vulnerable.
- Environment. The attack can come when the system is either online or offline, either connected to or disconnected from a network, either behind a firewall or open to a network, fully operational, partially operational, or not operational.
- Response. The system should ensure that transactions are carried out in a fashion such that data or services are protected from unauthorized access; data or services are not being manipulated without authorization; parties to a transaction are identified with assurance; the parties to the transaction cannot repudiate their involvements; and the data, resources, and system services will be available for legitimate use. The system should also track activities within it by recording access or modification; attempts to access data, resources, or services; and notifying appropriate entities (people or systems) when an apparent attack is occurring.
- Response measure. Measures of a system’s response include how much of a system is compromised when a particular component or data value is compromised, how much time passed before an attack was detected, how many attacks were resisted, how long it took to recover from a successful attack, and how much data was vulnerable to a particular attack.
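The six parts listed above map naturally onto a small record type. The sketch below is one possible way to write a scenario down; the field values are made-up illustrations, not examples taken from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAttributeScenario:
    """The six parts of a quality attribute scenario."""
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str

    def summary(self):
        return (
            f"{self.source} | {self.stimulus} | {self.artifact} | "
            f"{self.environment} | {self.response} | {self.response_measure}"
        )


# Illustrative values only (assumptions made for this example).
scenario = QualityAttributeScenario(
    source="unknown external system",
    stimulus="unauthorized attempt to change data",
    artifact="customer records",
    environment="online, behind a firewall",
    response="request rejected and attempt recorded",
    response_measure="attack detected within one minute",
)
print(scenario.summary())
```

Writing scenarios as records makes it easy to check that none of the six parts has been left out.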
One structural metric that has been shown empirically to correlate to testability is called the response of a class. The response of class C is a count of the number of methods of C plus the number of methods of other classes that are invoked by the methods of C. Keeping this metric low can increase testability.
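The definition above can be computed directly once the call relationships are known. The sketch below assumes a hand-written call map (the `Order`, `TaxTable`, and `Carrier` names are hypothetical) and counts each external method once, which is one reasonable reading of the definition.

```python
def response_of_class(class_name, calls):
    """Compute the response of a class.

    calls maps each method of class_name to the set of
    (class, method) pairs that the method invokes.
    """
    own_methods = set(calls)
    external_methods = {
        callee
        for callees in calls.values()
        for callee in callees
        if callee[0] != class_name  # only methods of other classes
    }
    return len(own_methods) + len(external_methods)


# Hypothetical call map for a class named Order.
order_calls = {
    "total": {("Order", "items"), ("TaxTable", "rate_for")},
    "items": set(),
    "ship": {("Carrier", "book"), ("TaxTable", "rate_for")},
}
print(response_of_class("Order", order_calls))  # 3 own methods + 2 external methods = 5
```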
Over the years, a focus on usability has shown itself to be one of the cheapest and easiest ways to improve a system’s quality (or more precisely, the user’s perception of quality).
To gain an appreciation for the importance of software safety, we suggest reading some of the disaster stories that arise when software fails. A venerable source is the ACM Risks Forum newsgroup, known as comp.risks in the USENET community, available at www.risks.org. This list has been moderated by Peter Neumann since 1985 and is still going strong.
Nancy Leveson is an undisputed thought leader in the area of software and safety. If you’re working in safety-critical systems, you should become familiar with her work. You can start small with a paper like [Leveson 04], which discusses a number of software-related factors that have contributed to spacecraft accidents. Or you can start at the top with [Leveson 11], a book that treats safety in the context of today’s complex, sociotechnical, software-intensive systems.
The Federal Aviation Administration is the U.S. government agency charged with oversight of the U.S. airspace system, and the agency is extremely concerned about safety. Their 2000 System Safety Handbook is a good practical overview of the topic [FAA 00].
IEEE STD-1228-1994 (“Software Safety Plans”) defines best practices for conducting software safety hazard analyses, to help ensure that requirements and attributes are specified for safety-critical software [IEEE 94]. The aeronautical standard DO-178B (due to be replaced by DO-178C as this book goes to publication) covers software safety requirements for aerospace applications. A discussion of safety tactics can be found in the work of Wu and Kelly [Wu 06].
In particular, interlocks are an important tactic for safety. They enforce some safe sequence of events, or ensure that a safe condition exists before an action is taken. Your microwave oven shuts off when you open the door because of a hardware interlock. Interlocks can be implemented in software also. For an interesting case study of this, see [Wozniak].
Some Finer Points of Layers
A layered architecture is one of the few places where connections among components can be shown by adjacency, and where “above” and “below” matter. If you turn Figure 13.1 upside-down so that C is on top, this would represent a completely different design. Diagrams that use arrows among the boxes to denote relations retain their semantic meaning no matter the orientation.
The layered pattern is one of the most commonly used patterns in all of software engineering, but I’m often surprised by how many people still get it wrong.
First, it is impossible to look at a stack of boxes and tell whether layer bridging is allowed or not. That is, can a layer use any lower layer, or just the next lower one? It is the easiest thing in the world to resolve this; all the architect has to do is include the answer in the key to the diagram’s notation (something we recommend for all diagrams). For example, consider the layered pattern presented in Figure 13.2. But I’m still surprised at how few architects actually bother to do this. And if they don’t, their layer diagrams are ambiguous.
Second, any old set of boxes stacked on top of each other does not constitute a layered architecture. For instance, look at the design shown in Figure 13.3, which uses arrows instead of adjacency to indicate the relationships among the boxes. Here, everything is allowed to use everything. This is decidedly not a layered architecture. The reason is that if Layer A is replaced by a different version, Layer C (which uses it in this figure) might well have to change. We don’t want our virtual machine layer to change every time our application layer changes. But I’m still surprised at how many people call a stack of boxes lined up with each other “layers” (or think that layers are the same as tiers in a multi-tier architecture).
Third, many architectures that purport to be layered look something like Figure 13.4. This diagram probably means that modules in A, B, or C can use modules in D, but without a key to tell us for sure, it could mean anything. “Sidecars” like this often contain common utilities (sometimes imported), such as error handlers, communication protocols, or database access mechanisms. This kind of diagram makes sense only in the case where no layer bridging is allowed in the main stack. Otherwise, D could simply be made the bottommost layer in the main stack, and the “sidecar” geometry would be unnecessary. But I’m still surprised at how often I see this layout go unexplained.
Sometimes layers are divided into segments denoting a finer-grained decomposition of the modules. Sometimes this occurs when a preexisting set of units, such as imported modules, share the same allowed-to-use relation. When this happens, you have to specify what usage rules are in effect among the segments. Many usage rules are possible, but they must be made explicit. In Figure 13.5, the top and the bottom layers are segmented. Segments of the top layer are not allowed to use each other, but segments of the bottom layer are. If you draw the same diagram without the arrows, it will be harder to differentiate the different usage rules within segmented layers. Layered diagrams are often a source of hidden ambiguity because the diagram does not make explicit the allowed-to-use relations.
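One way to remove that ambiguity is to state the allowed-to-use rule as data and check the actual dependencies against it. The sketch below is an illustration under assumptions: the layer names, the `uses` pairs, and the `allow_bridging` flag are all hypothetical, and segments within a layer are not modeled.

```python
def layering_violations(layers, uses, allow_bridging):
    """Check uses relations against a layering rule.

    layers: layer names ordered from top to bottom.
    uses: iterable of (user_layer, used_layer) pairs.
    allow_bridging: whether a layer may use any lower layer
    or only the next lower one.
    """
    depth = {name: index for index, name in enumerate(layers)}
    violations = []
    for user, used in uses:
        gap = depth[used] - depth[user]  # positive: used layer is lower
        if gap < 0:
            violations.append((user, used, "uses a higher layer"))
        elif gap > 1 and not allow_bridging:
            violations.append((user, used, "layer bridging not allowed"))
    return violations


layers = ["A", "B", "C"]                      # A on top, C at the bottom
uses = [("A", "B"), ("A", "C"), ("C", "A")]   # hypothetical dependencies
print(layering_violations(layers, uses, allow_bridging=False))
print(layering_violations(layers, uses, allow_bridging=True))
```

The same set of boxes yields different verdicts depending on the flag, which is why the diagram key has to state the rule.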
Finally, the most important point about layering is that a layer isn’t allowed to use any layer above it. A module “uses” another module when it depends on the answer it gets back. But a layer is allowed to make upward calls, as long as it isn’t expecting an answer from them. This is how the common error handling scheme of callbacks works. A program in layer A calls a program in a lower layer B, and the parameters include a pointer to an error handling program in A that the lower layer should call in case of error. The software in B makes the call to the program in A, but cares not in the least what it does. By not depending in any way on the contents of A, B is insulated from changes in A.
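The callback scheme described above can be sketched in a few lines. The function and class names are hypothetical; the point is only that the lower layer calls upward without depending on the result.

```python
# Lower layer B: knows nothing about the contents of layer A.
def read_record(record_id, on_error):
    """Layer B service that reports errors through a supplied callback."""
    if record_id < 0:
        # Upward call: B invokes the handler but ignores whatever it returns.
        on_error(f"invalid record id {record_id}")
        return None
    return {"id": record_id}


# Higher layer A: uses B and depends on the answer it gets back.
class Application:
    def __init__(self):
        self.errors = []

    def handle_error(self, message):
        self.errors.append(message)

    def load(self, record_id):
        return read_record(record_id, on_error=self.handle_error)


app = Application()
app.load(7)
app.load(-1)
print(app.errors)
```

Because `read_record` never inspects what the handler does, `Application` can change its error handling freely without affecting layer B.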
Typical examples of systems that employ the publish-subscribe pattern are the following:
- Graphical user interfaces, in which a user’s low-level input actions are treated as events that are routed to appropriate input handlers
- MVC-based applications, in which view components are notified when the state of a model object changes
- Enterprise resource planning (ERP) systems, which integrate many components, each of which is only interested in a subset of system events
- Extensible programming environments, in which tools are coordinated through events
- Mailing lists, where a set of subscribers can register interest in specific topics.
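What these examples share is that publishers and subscribers know only about topics, never about each other. A minimal in-process sketch, with a hypothetical `EventBus` class and topic names, might look like this:

```python
from collections import defaultdict


class EventBus:
    """A tiny synchronous publish-subscribe mechanism."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # Iterate over a copy so handlers may subscribe during delivery.
        for handler in list(self._subscribers.get(topic, [])):
            handler(payload)


bus = EventBus()
seen = []
bus.subscribe("model.changed", lambda data: seen.append(("view", data)))
bus.subscribe("model.changed", lambda data: seen.append(("logger", data)))
bus.publish("model.changed", {"total": 42})
bus.publish("model.deleted", {})  # no subscribers, so nothing happens
print(seen)
```

Real systems usually add asynchronous delivery and error isolation between handlers; this sketch leaves both out.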
Tactics are the “building blocks” of design from which architectural patterns are created. Tactics are atoms and patterns are molecules. Most patterns consist of (are constructed from) several different tactics, and although these tactics might all serve a common purpose, such as promoting modifiability, they are often chosen to promote different quality attributes. For example, a tactic might be chosen that makes an availability pattern more secure, or that mitigates the performance impact of a modifiability pattern.
An architectural pattern
- is a package of design decisions that is found repeatedly in practice,
- has known properties that permit reuse, and
- describes a class of architectures.
Because patterns are (by definition) found repeatedly in practice, one does not invent them; one discovers them.
Tactics are simpler than patterns. Tactics typically use just a single structure or computational mechanism, and they are meant to address a single architectural force. For this reason they give more precise control to an architect when making design decisions than patterns, which typically combine multiple design decisions into a package.
An architectural pattern establishes a relationship between:
- A context. A recurring, common situation in the world that gives rise to a problem.
- A problem. The problem, appropriately generalized, that arises in the given context.
- A solution. A successful architectural resolution to the problem, appropriately abstracted.
Complex systems exhibit multiple patterns at once.
More sophisticated models of availability exist, based on probability. In these models, we can express a probability of failure during a period of time. Given a particular MTBF and a time duration T, the probability of failure is calculated using a formula.
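The passage does not reproduce the formula. Under the common assumption of a constant failure rate (an exponential failure model, which is an assumption of this sketch and does not hold for every system), the probability of at least one failure during a period T is 1 - e^(-T/MTBF). The function name and the sample numbers are illustrative.

```python
import math


def failure_probability(mtbf_hours, duration_hours):
    """P(at least one failure within the duration).

    Assumes a constant failure rate of 1 / MTBF (exponential model).
    """
    return 1.0 - math.exp(-duration_hours / mtbf_hours)


# With T equal to the MTBF, the result is 1 - 1/e.
print(round(failure_probability(1000, 1000), 3))  # 0.632
```

Note the slightly counterintuitive consequence: running for exactly one MTBF gives roughly a 63 percent chance of having seen a failure, not 50 percent.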
In Search of a Grand Unified Theory for Quality Attributes
How do we create analytic models for those quality attribute aspects for which none currently exist? I do not know the answer to this question, but if we had a basis set for quality attributes, we would be in a better position to create and validate quality attribute models. By basis set I mean a set of orthogonal concepts that allow one to define the existing set of quality attributes. Currently there is much overlap among quality attributes; a basis set would enable discussion of tradeoffs in terms of a common set of fundamental and possibly quantifiable concepts. Once we have a basis set, we could develop analytic models for each of the elements of the set, and then an analytic model for a particular quality attribute becomes a composition of the models of the portions of the basis set that make up that quality attribute.
What are some of the elements of this basis set? Here are some of my candidates:
- Time. Time is the basis for performance, some aspects of availability, and some aspects of usability. Time will surely be one of the fundamental concepts for defining quality attributes.
- Dependencies among structural elements. Modifiability, security, availability, and performance depend in some form or another on the strength of connections among various structural elements. Coupling is a form of dependency. Attacks depend on being able to move from one compromised element to a currently uncompromised element through some dependency. Fault propagation depends on dependencies. And one of the key elements of performance analysis is the dependency of one computation on another. Enumeration of the fundamental forms of dependency and their properties will enable better understanding of many quality attributes and their interaction.
- Access. How does a system promote or deny access through various mechanisms? Usability is concerned with allowing smooth access for humans; security is concerned with allowing smooth access for some set of requests but denying access to another set of requests. Interoperability is concerned with establishing connections and accessing information. Race conditions, which undermine availability, come about through unmediated access to critical computations.
These are some of my candidates. I am sure there are others. The general problem is to define a set of candidates for the basis set and then show how current definitions of various quality attributes can be recast in terms of the elements of the basis set. I am convinced that this is a problem that needs to be solved prior to making substantial progress in the quest for a rich enough set of analytic models to enable prediction of system behavior across the quality attributes important for a system.
For each possible problem with respect to a quality attribute requirement, the follow-on questions are things like these:
- Are there mechanisms to detect that problem?
- Are there mechanisms to prevent or avoid that problem?
- Are there mechanisms to repair or recover from that problem if it occurs?
- Is this a problem we are willing to live with?
The problems hypothesized are scrutinized in terms of a cost/benefit analysis. That is, what is the cost of preventing this problem compared to the benefits that accrue if the problem does not occur? As you might have gathered, if the architects are being thorough and if the problems are significant (that is, they present a large risk for the system), then these discussions can continue for a long time. The discussions are a normal portion of design and analysis and will naturally occur, even if only in the mind of a single designer. On the other hand, the time spent performing a particular thought experiment should be bounded. This sounds obvious, but every grey-haired architect can tell you war stories about being stuck in endless meetings, trapped in the purgatory of “analysis paralysis.”
Analysis paralysis can be avoided with several techniques:
- “Time boxing”: setting a deadline on the length of a discussion.
- Estimating the cost if the problem occurs and not spending more than that cost in the analysis. In other words, do not spend an inordinate amount of time in discussing minor or unlikely potential problems.
Prioritizing the requirements will help both with the cost estimation and with the time estimation.
There have been many papers and books published describing how to build and analyze architectural models for quality attributes. Here are just a few examples.
Many availability models have been proposed that operate at the architecture level of analysis. Just a few of these are [Gokhale 05] and [Yacoub 02].
A discussion and comparison of different black-box and white-box models for determining software reliability can be found in [Chandran 10].
A book relating availability to disaster recovery and business recovery is [Schmidt 10].
An overview of interoperability activities can be found in [Brownsword 04].
Modifiability is typically measured through complexity metrics. The classic work on this topic is [Chidamber 94]. More recently, analyses based on design structure matrices have begun to appear [MacCormack 06].
Two of the classic works on software performance evaluation are [Smith 01] and [Klein 93].
A broad survey of architecture-centric performance evaluation approaches can be found in [Koziolek 10].
Checklists for security have been generated by a variety of groups for different domains. See for example:
- Credit cards, generated by the Payment Card Industry: www.pcisecurity-standards.org/security_standards/
- Information security, generated by the National Institute of Standards and Technology (NIST): [NIST 09].
- Electric grid, generated by Advanced Security Acceleration Project for the Smart Grid: www.smartgridipedia.org/index.php/ASAP-SG
- Common Criteria. An international standard (ISO/IEC 15408) for computer security certification: www.commoncriteriaportal.org
Work in measuring testability from an architectural perspective includes measuring testability as the measured complexity of a class dependency graph derived from UML class diagrams, and identifying class diagrams that can lead to code that is difficult to test [Baudry 05]; and measuring controllability and observability as a function of data flow [Le Traon 97].
A checklist for usability can be found at www.stcsig.org/usability/topics/articles/he-checklist.html
A checklist for safety is called the Safety Integrity Level: en.wikipedia.org/wiki/Safety_Integrity_Level
Applications of Modeling and Analysis
For a detailed discussion of a case where quality attribute modeling and analysis played a large role in determining the architecture as it evolved through a number of releases, see [Graham 07].
The authors of the Manifesto go on to describe the twelve principles that underlie their reasoning:
- Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
- Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
- Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
- Business people and developers must work together daily throughout the project.
- Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
- The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
- Working software is the primary measure of progress.
- Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
- Continuous attention to technical excellence and good design enhances agility.
- Simplicity—the art of maximizing the amount of work not done—is essential.
- The best architectures, requirements, and designs emerge from self-organizing teams.
- At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.
Principle 11 says that, for best results, teams should be self-organizing. But self-organization is a social process that is much more cumbersome if those teams are not physically colocated. In this case we believe that the creators of the twelve Agile principles got it wrong. The best teams may be self-organizing, but the best architectures still require much more than this—technical skill, deep experience, and deep knowledge.
There is one line representing each of these three projects, starting near the Y axis and descending, at different rates, to the X axis at the 50 mark. This shows that adding time for upfront work reduces later rework. No surprise: that is exactly the point of doing more upfront work. However, when you sum each of those downward-trending lines (for the 10, 100, and 1,000 KSLOC projects) with the upward sloping line for the upfront (initial architecture and risk resolution) work, you get the second set of three lines, which start at the Y axis and meet the upward sloping line at the 50 mark on the X axis.
These lines show that there is a sweet spot for each project. For the 10 KSLOC project, the sweet spot is at the far left. This says that devoting much, if any, time to upfront work is a waste for a small project (assuming that the inherent domain complexity is the same for all three sets of lines). For the 100 KSLOC project, the sweet spot is at around 20 percent of the project schedule. And for the 1,000 KSLOC project, the sweet spot is at around 40 percent of the project schedule. These results are fairly intuitive. A project with a million lines of code is enormously complex, and it is difficult to imagine how Agile principles alone can cope with this complexity if there is no architecture to guide and organize the effort.
Early Design Decisions and Requirements That Can Affect Them
Architectures are driven by architecturally significant requirements: requirements that will have profound effects on the architecture. Architecturally significant requirements may be captured from requirements documents, by interviewing stakeholders, or by conducting a Quality Attribute Workshop.
In gathering these requirements, we should be mindful of the business goals of the organization. Business goals can be expressed in a common, structured form and represented as scenarios. Business goals may be elicited and documented using a structured facilitation method called PALM.
PALM can also be used to discover and carry along additional information about existing requirements. For example, a business goal might be to produce a product that outcompetes a rival’s market entry. This might precipitate a performance requirement for, say, half-second turnaround when the rival features one-second turnaround. But if the competitor releases a new product with half-second turnaround, then what does our requirement become? A conventional requirements document will continue to carry the half-second requirement, but the goal-savvy architect will know that the real requirement is to beat the competitor, which may mean even faster performance is needed.
A number of authors have compared five different industrial architecture design methods. You can find this comparison at [Hofmeister 07] “A General Model of Software Architecture Design Derived from Five Industrial Approaches,” Journal of Systems and Software, Vol. 80, No. 1 (January 2007), pp. 106-126.
- Informal notations. Views are depicted (often graphically) using general-purpose diagramming and editing tools and visual conventions chosen for the system at hand. The semantics of the description are characterized in natural language, and they cannot be formally analyzed. In our experience, the most common tool for informal notations is PowerPoint.
- Semiformal notations. Views are expressed in a standardized notation that prescribes graphical elements and rules of construction, but it does not provide a complete semantic treatment of the meaning of those elements. Rudimentary analysis can be applied to determine if a description satisfies syntactic properties. UML is a semiformal notation in this sense.
- Formal notations. Views are described in a notation that has a precise (usually mathematically based) semantics. Formal analysis of both syntax and semantics is possible. There are a variety of formal notations for software architecture available. Generally referred to as architecture description languages (ADLs), they typically provide both a graphical vocabulary and an underlying semantics for architecture representation. In some cases these notations are specialized to particular architectural views. In others they allow many views, or even provide the ability to formally define new views. The usefulness of ADLs lies in their ability to support automation through associated tools: automation to provide useful analysis of the architecture or assist in code generation. In practice, the use of such notations is rare.
Documenting an architecture is a matter of documenting the relevant views and then adding documentation that applies to more than one view.
Properties of modules that help to guide implementation or are input to analysis should be recorded as part of the supporting documentation for a module view. The list of properties may vary but is likely to include the following:
- Name. A module’s name is, of course, the primary means to refer to it. A module’s name often suggests something about its role in the system. In addition, a module’s name may reflect its position in a decomposition hierarchy; the name A.B.C, for example, refers to a module C that is a submodule of a module B, itself a submodule of A.
- Responsibilities. The responsibility property for a module is a way to identify its role in the overall system and establishes an identity for it beyond the name. Whereas a module’s name may suggest its role, a statement of responsibility establishes it with much more certainty. Responsibilities should be described in sufficient detail to make clear to the reader what each module does.
- Visibility of interface(s). When a module has submodules, some interfaces of the submodules are public and some may be private; that is, the interfaces are used only by the submodules within the enclosing parent module. These private interfaces are not visible outside that context.
- Implementation information. Modules are units of implementation. It is therefore useful to record information related to their implementation from the point of view of managing their development and building the system that contains them. This might include the following:
  - Mapping to source code units. This identifies the files that constitute the implementation of a module. For example, a module Account, if implemented in Java, might have several files that constitute its implementation: IAccount.java (an interface), AccountImpl.java (the implementation of Account functionality), AccountBean.java (a class to hold the state of an account in memory), AccountOrmMapping.xml (a file that defines the object-relational mapping between AccountBean and a database table), and perhaps even a unit test AccountTest.java.
  - Test information. The module’s test plan, test cases, test scaffolding, and test data are important to document. This information may simply be a pointer to the location of these artifacts.
  - Management information. A manager may need information about the module’s predicted schedule and budget. This information may simply be a pointer to the location of these artifacts.
  - Implementation constraints. In many cases, the architect will have an implementation strategy in mind for a module or may know of constraints that the implementation must follow.
- Revision history. Knowing the history of a module including authors and particular changes may help when you perform maintenance activities.
Software elements and environmental elements have properties in allocation views. The usual goal of an allocation view is to compare the properties required by the software element with the properties provided by the environmental elements to determine whether the allocation will be successful or not. For example, to ensure a component’s required response time, it has to execute on (be allocated to) a processor that provides sufficiently fast processing power. For another example, a computing platform might not allow a task to use more than 10 kilobytes of virtual memory. An execution model of the software element in question can be used to determine the required virtual memory usage. Similarly, if you are migrating a module from one team to another, you might want to ensure that the new team has the appropriate skills and background knowledge.
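The comparison of required and provided properties is mechanical enough to sketch. The example below reuses the 10-kilobyte virtual memory limit from the text; the 12-kilobyte requirement, the property name, and the function name are assumptions made for illustration.

```python
def unmet_requirements(required, provided):
    """Return properties for which the environment provides less than required.

    required: property name -> amount the software element needs.
    provided: property name -> amount the environmental element offers.
    The result maps each failing property to (needed, available).
    """
    return {
        name: (needed, provided.get(name, 0))
        for name, needed in required.items()
        if provided.get(name, 0) < needed
    }


task = {"virtual_memory_kb": 12}      # hypothetical result of an execution model
platform = {"virtual_memory_kb": 10}  # the platform limit mentioned in the text
print(unmet_requirements(task, platform))  # {'virtual_memory_kb': (12, 10)}
```

An empty result means the allocation satisfies every listed property. It says nothing about properties that were never written down.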
Another kind of view, which we call a quality view, can be tailored for specific stakeholders or to address specific concerns. These quality views are formed by extracting the relevant pieces of structural views and packaging them together. Here are five examples:
- A security view can show all of the architectural measures taken to provide security. It would show the components that have some security role or responsibility, how those components communicate, any data repositories for security information, and repositories that are of security interest. The view’s context information would show other security measures (such as physical security) in the system’s environment. The behavior part of a security view would show the operation of security protocols and where and how humans interact with the security elements. It would also capture how the system would respond to specific threats and vulnerabilities.
- A communications view might be especially helpful for systems that are globally dispersed and heterogeneous. This view would show all of the component-to-component channels, the various network channels, quality-of-service parameter values, and areas of concurrency. This view can be used to analyze certain kinds of performance and reliability (such as deadlock or race condition detection). The behavior part of this view could show (for example) how network bandwidth is dynamically allocated.
- An exception or error-handling view could help illuminate and draw attention to error reporting and resolution mechanisms. Such a view would show how components detect, report, and resolve faults or errors. It would help identify the sources of errors and appropriate corrective actions for each. Root-cause analysis in those cases could be facilitated by such a view.
- A reliability view would be one in which reliability mechanisms such as replication and switchover are modeled. It would also depict timing issues and transaction integrity.
- A performance view would include those aspects of the architecture useful for inferring the system’s performance. Such a view might show network traffic models, maximum latencies for operations, and so forth.
You can determine which views are required, when to create them, and how much detail to include if you know the following:
- What people, and with what skills, are available
- Which standards you have to comply with
- What budget is on hand
- What the schedule is
- What the information needs of the important stakeholders are
- What the driving quality attribute requirements are
- What the size of the system is
At a minimum, expect to have at least one module view, at least one C&C view, and for larger systems, at least one allocation view in your architecture document. Beyond that basic rule of thumb, however, there is a three-step method for choosing the views:
- Step 1: Build a stakeholder/view table. Enumerate the stakeholders for your project’s software architecture documentation down the rows. Be as comprehensive as you can. For the columns, enumerate the views that apply to your system. (Use the structures discussed in Chapter 1, the views discussed in this chapter, and the views that your design work in ADD has suggested as a starting list of candidates.) Some views (such as decomposition, uses, and work assignment) apply to every system, while others (various C&C views, the layered view) only apply to some systems. For the columns, make sure to include the views or view sketches you already have as a result of your design work so far. Once you have the rows and columns defined, fill in each cell to describe how much information the stakeholder requires from the view: none, overview only, moderate detail, or high detail. The candidate view list going into step 2 now consists of those views for which some stakeholder has a vested interest.
- Step 2: Combine views. The candidate view list from step 1 is likely to yield an impractically large number of views. This step will winnow the list to manageable size. Look for marginal views in the table: those that require only an overview, or that serve very few stakeholders. Combine each marginal view with another view that has a stronger constituency.
- Step 3: Prioritize and stage. After step 2 you should have the minimum set of views needed to serve your stakeholder community. At this point you need to decide what to do first. What you do first depends on your project, but here are some things to consider:
The decomposition view (one of the module views) is a particularly helpful view to release early. High-level (that is, broad and shallow) decompositions are often easy to design, and with this information the project manager can start to staff development teams, put training in place, determine which parts to outsource, and start producing budgets and schedules.
Be aware that you don’t have to satisfy all the information needs of all the stakeholders to the fullest extent. Providing 80 percent of the information goes a long way, and this might be good enough so that the stakeholders can do their job. Check with the stakeholder to see if a subset of information would be sufficient. They typically prefer a product that is delivered on time and within budget over getting the perfect documentation.
You don’t have to complete one view before starting another. People can make progress with overview-level information, so a breadth-first approach is often the best.
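Steps 1 and 2 are mechanical enough to sketch in code. The snippet below is a rough illustration, not the book's tooling: the stakeholder names, the sample table, and the threshold for "very few stakeholders" (fewer than two) are all assumptions.

```python
# Detail levels a stakeholder may need from a view (step 1).
LEVELS = {"none": 0, "overview": 1, "moderate": 2, "high": 3}


def candidate_views(table):
    """Step 1: views for which at least one stakeholder needs some information."""
    views = []
    for needs in table.values():
        for view, level in needs.items():
            if LEVELS[level] > 0 and view not in views:
                views.append(view)
    return views


def marginal_views(table, min_stakeholders=2):
    """Step 2: views that need only an overview or serve very few stakeholders.

    min_stakeholders is an assumed cutoff for "very few".
    """
    marginal = []
    for view in candidate_views(table):
        levels = [LEVELS[needs.get(view, "none")] for needs in table.values()]
        interested = [lvl for lvl in levels if lvl > 0]
        overview_only = max(interested) <= LEVELS["overview"]
        if overview_only or len(interested) < min_stakeholders:
            marginal.append(view)
    return marginal


# Rows are stakeholders, columns are views; missing cells mean "none".
table = {
    "project manager": {"decomposition": "moderate", "work assignment": "high",
                        "deployment": "overview"},
    "developer": {"decomposition": "high", "uses": "high",
                  "layered": "overview", "work assignment": "overview"},
    "tester": {"decomposition": "moderate", "uses": "moderate",
               "deployment": "overview"},
}

if __name__ == "__main__":
    print("candidates:", candidate_views(table))
    print("marginal:", marginal_views(table))
```

The marginal views are only flagged here. Deciding which stronger view each one should be merged into remains a judgment call for the architect.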
No matter what the view, the documentation for a view can be placed into a standard organization consisting of these parts:
Section 1: The Primary Presentation
The primary presentation shows the elements and relations of the view. The primary presentation should contain the information you wish to convey about the system, in the vocabulary of that view. It should certainly include the primary elements and relations but under some circumstances might not include all of them. For example, you may wish to show the elements and relations that come into play during normal operation but relegate error handling or exception processing to the supporting documentation. The primary presentation is most often graphical. It might be a diagram you’ve drawn in an informal notation using a simple drawing tool, or it might be a diagram in a semiformal or formal notation imported from a design or modeling tool that you’re using. If your primary presentation is graphical, make sure to include a key that explains the notation. Lack of a key is the most common mistake that we see in documentation in practice. Occasionally the primary presentation will be textual, such as a table or a list. If that text is presented according to certain stylistic rules, these rules should be stated or incorporated by reference, as the analog to the graphical notation key. Regardless of whether the primary presentation is textual or graphical, its role is to present a terse summary of the most important information in the view.
Section 2: The Element Catalog
The element catalog details at least those elements depicted in the primary presentation. For instance, if a diagram shows elements A, B, and C, then the element catalog needs to explain what A, B, and C are. In addition, if elements or relations relevant to this view were omitted from the primary presentation, they should be introduced and explained in the catalog. Specific parts of the catalog include the following:
- Elements and their properties. This section names each element in the view and lists the properties of that element. Each view introduced in Chapter 1 listed a set of suggested properties associated with that view. For example, elements in a decomposition view might have the property of “responsibility”—an explanation of each module’s role in the system—and elements in a communicating-processes view might have timing parameters, among other things, as properties. Whether the properties are generic to the view chosen or the architect has introduced new ones, this is where they are documented and given values.
- Relations and their properties. Each view has specific relation types that it depicts among the elements in that view. Mostly, these relations are shown in the primary presentation. However, if the primary presentation does not show all the relations or if there are exceptions to what is depicted in the primary presentation, this is the place to record that information.
- Element interfaces. This section documents element interfaces.
- Element behavior. This section documents element behavior that is not obvious from the primary presentation.
Section 3: Context Diagram
A context diagram shows how the system or portion of the system depicted in this view relates to its environment. The purpose of a context diagram is to depict the scope of a view. Here “context” means an environment with which the part of the system interacts. Entities in the environment may be humans, other computer systems, or physical objects, such as sensors or controlled devices.
Section 4: Variability Guide
A variability guide shows how to exercise any variation points that are a part of the architecture shown in this view.
Section 5: Rationale
Rationale explains why the design reflected in the view came to be. The goal of this section is to explain why the design is as it is and to provide a convincing argument that it is sound. The choice of a pattern in this view should be justified here by describing the architectural problem that the chosen pattern solves and the rationale for choosing it over another.
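The five-part template maps naturally onto a small data structure. The sketch below is one possible encoding, with hypothetical class and field names. It checks only the two rules that the text states outright: a graphical primary presentation needs a key, and every element shown must appear in the element catalog.

```python
from dataclasses import dataclass, field


@dataclass
class ViewDocument:
    """Hypothetical record mirroring the five sections of the view template."""
    name: str
    shown_elements: set                 # Section 1: elements in the primary presentation
    graphical: bool = True
    notation_key: str = ""              # needed when the presentation is graphical
    element_catalog: dict = field(default_factory=dict)  # Section 2: element -> description
    context_diagram: str = ""           # Section 3
    variability_guide: str = ""         # Section 4
    rationale: str = ""                 # Section 5


def review(doc):
    """Return a list of findings; an empty list means both checks passed."""
    findings = []
    if doc.graphical and not doc.notation_key:
        findings.append("primary presentation has no notation key")
    for element in sorted(doc.shown_elements - set(doc.element_catalog)):
        findings.append(f"element {element!r} is missing from the element catalog")
    return findings


if __name__ == "__main__":
    doc = ViewDocument(
        name="decomposition",
        shown_elements={"A", "B", "C"},
        element_catalog={"A": "parses input", "B": "stores results"},
    )
    for finding in review(doc):
        print(finding)
```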
If architecture is largely about the achievement of quality attributes and if one of the main uses of architecture documentation is to serve as a basis for analysis (to make sure the architecture will achieve its required quality attributes), where do quality attributes show up in the documentation? Short of a full-fledged quality view (see page 340), there are five major ways:
- Any major design approach (such as an architecture pattern) will have quality attribute properties associated with it. Client-server is good for scalability, layering is good for portability, an information-hiding-based decomposition is good for modifiability, services are good for interoperability, and so forth. Explaining the choice of approach is likely to include a discussion about the satisfaction of quality attribute requirements and tradeoffs incurred. Look for the place in the documentation where such an explanation occurs. In our approach, we call that rationale.
- Individual architectural elements that provide a service often have quality attribute bounds assigned to them. Consumers of the services need to know how fast, secure, or reliable those services are. These quality attribute bounds are defined in the interface documentation for the elements, sometimes in the form of a service level agreement. Or they may simply be recorded as properties that the elements exhibit.
- Quality attributes often impart a “language” of things that you would look for. Security involves security levels, authenticated users, audit trails, firewalls, and the like. Performance brings to mind buffer capacities, deadlines, periods, event rates and distributions, clocks and timers, and so on. Availability conjures up mean time between failure, failover mechanisms, primary and secondary functionality, critical and noncritical processes, and redundant elements. Someone fluent in the “language” of a quality attribute can search for the kinds of architectural elements (and properties of those elements) that were put in place precisely to satisfy that quality attribute requirement.
- Architecture documentation often contains a mapping to requirements that shows how requirements (including quality attribute requirements) are satisfied. If your requirements document establishes a requirement for availability, for instance, then you should be able to look it up by name or reference in your architecture document to see the places where that requirement is satisfied.
- Every quality attribute requirement will have a constituency of stakeholders who want to know that it is going to be satisfied. For these stakeholders, the architect should provide a special place in the documentation’s introduction that either provides what the stakeholder is looking for, or tells the stakeholder where in the document to find it. It would say something like this: “If you are a performance analyst, you should pay attention to the processes and threads and their properties (defined [here]), and their deployment on the underlying hardware platform (defined [here]).” In our documentation approach, we put this information in a section called the documentation roadmap.
Here’s what you can do if you’re an architect in a highly dynamic environment:
- Document what is true about all versions of your system. Your web browser doesn’t go out and grab just any piece of software when it needs a new plugin; a plugin must have specific properties and a specific interface. And it doesn’t just plug in anywhere, but in a predetermined location in the architecture. Record those invariants as you would for any architecture. This may make your documented architecture more a description of constraints or guidelines that any compliant version of the system must follow. That’s fine.
- Document the ways the architecture is allowed to change. In the previous examples, this will usually mean adding new components and replacing components with new implementations. In the Views and Beyond approach, the place to do this is called the variability guide (captured in Section 4 of our view template).
one of the valuable properties of architecture: you could build many different systems from one. And that’s what an abstraction is: a one-to-many mapping.
One of the most vexing realities about architecture-based software development is the gulf between architectural and implementation ontologies, the set of concepts and terms inherent in an area. Ask an architect what concepts they work with all day, and you’re likely to hear things like modules, components, connectors, stakeholders, evaluation, analysis, documentation, views, modeling, quality attributes, business goals, and technology roadmaps.
Ask an implementer the same question, and you likely won’t hear any of those words. Instead you’ll hear about objects, methods, algorithms, data structures, variables, debugging, statements, code comments, compilers, generics, operator overloading, pointers, and build scripts.
This is a gap in language that reflects a gap in concepts. This gap is, in turn, reflected in the languages of the tools that each community uses. UML started out as a way to model object-oriented designs that could be quickly converted to code; that is, UML is conceptually “close” to code. Today it is a de facto architecture description language, and likely the most popular one. But it has no built-in concept for the most ubiquitous of architectural concepts, the layer. If you want to represent layers in UML, you have to adopt some convention to do it. Packages stereotyped as <<layer>>, associated with stereotyped <<allowed to use>> dependencies do the trick. But it is a trick, a workaround for a language deficiency. UML has “connectors,” two of them in fact. But they are a far cry from what architects think of as connectors. Architectural connectors can and do have rich functionality. For instance, an enterprise service bus (ESB) in a service-oriented architecture handles routing, data and format transformation, technology adaptation, and a host of other work. It is most natural to depict the ESB as a connector tying together services that interact with each other through it. But UML connectors are impoverished things, little more than bookkeeping mechanisms that have no functionality whatsoever. The delegation connector in UML exists merely to associate the ports of a parent component with ports of its nested children, to send inputs from the outside into a child’s input port, and outputs from a child to the output port of the parent. And the assembly connector simply ties together one component’s “requires” interface with another’s “provides” interface. These are no more than bits of string to tie two components together. To represent a true architectural connector in UML, you have to adopt a convention (another workaround), such as using simple associations tagged with explanatory annotations, or abandon the architectural concept completely and capture the functionality in another component.
In addition to designing for testability, the architect can also do these other things to help the test effort:
- Ensure that testers have access to the source code, design documents, and the change records.
- Give testers the ability to control and reset the entire dataset that a program stores in a persistent database. Reverting the database to a known state is essential for reproducing bugs or running regression tests. Similarly, loading a test bed into the database is helpful. Even products that don’t use databases can benefit from routines to automatically preload a set of test data. One way to achieve this is to design a “persistence layer” so that the whole program is database independent. In this way, the entire database can be swapped out for testing, even using an in-memory database if desired.
- Give testers the ability to install multiple versions of a software product on a single machine. This helps testers compare versions, isolating when a bug was introduced. In distributed applications, this aids testing deployment configurations and product scalability. This capability could require configurable communication ports and provisions for avoiding collisions over resources such as the registry.
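The persistence-layer idea is easy to show with Python's standard sqlite3 module. This is a minimal sketch under assumptions: the NoteStore class, its table, and its methods are invented for the example. The point is that callers never touch the database directly, so a test can swap in an in-memory database and reset it to a known state.

```python
import sqlite3


class NoteStore:
    """Hypothetical persistence layer: the rest of the program sees only this class."""

    def __init__(self, database=":memory:"):
        # ":memory:" gives a throwaway SQLite database; passing a file path
        # instead gives a persistent one without changing any calling code.
        self._conn = sqlite3.connect(database)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS notes ("
            "id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def add(self, body):
        """Store one note and return its row id."""
        with self._conn:  # commits on success, rolls back on error
            cursor = self._conn.execute(
                "INSERT INTO notes (body) VALUES (?)", (body,)
            )
        return cursor.lastrowid

    def all(self):
        """Return all note bodies in insertion order."""
        rows = self._conn.execute("SELECT body FROM notes ORDER BY id")
        return [row[0] for row in rows]

    def reset(self, test_bed=()):
        """Revert to a known state, then preload an optional test bed."""
        with self._conn:
            self._conn.execute("DELETE FROM notes")
            self._conn.executemany(
                "INSERT INTO notes (body) VALUES (?)",
                [(body,) for body in test_bed],
            )


if __name__ == "__main__":
    store = NoteStore()                 # in-memory database for testing
    store.add("left over from an earlier test")
    store.reset(["first", "second"])    # known state for a regression test
    print(store.all())
```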
Architecture Presentation (Approximately 20 slides; 60 Minutes)
Driving architectural requirements, the measurable quantities you associate with these requirements, and any existing standards/models/approaches for meeting these (2–3 slides)
Important architectural information (4–8 slides):
- Context diagram—the system within the context in which it will exist. Humans or other systems with which the system will interact.
- Module or layer view—the modules (which may be subsystems or layers) that describe the system’s decomposition of functionality, along with the objects, procedures, functions that populate these, and the relations among them (e.g., procedure call, method invocation, callback, containment).
- Component-and-connector view—processes, threads along with the synchronization, data flow, and events that connect them.
- Deployment view—CPUs, storage, external devices/sensors along with the networks and communication devices that connect them. Also shown are the processes that execute on the various processors.
Architectural approaches, patterns, or tactics employed, including what quality attributes they address and a description of how the approaches address those attributes (3–6 slides):
- Use of commercial off-the-shelf (COTS) products and how they are chosen/integrated (1–2 slides).
- Trace of 1 to 3 of the most important use case scenarios. If possible, include the runtime resources consumed for each scenario (1–3 slides).
- Trace of 1 to 3 of the most important change scenarios. If possible, describe the change impact (estimated size/difficulty of the change) in terms of the changed modules or interfaces (1–3 slides).
- Architectural issues/risks with respect to meeting the driving architectural requirements.
A Typical Agenda for Lightweight Architecture Evaluation.
Division of Responsibilities between Project Manager and Architect. (See table in the book)
The plan for a project is initially developed as a top-down schedule with an acknowledgement that it is only an estimate. Once the decomposition of the system has been done, a bottom-up schedule can be developed. The two must be reconciled, and this becomes the basis for the software development plan.
Teams are created based on the software development plan. The software architect and the project manager must coordinate to oversee the implementation. Global development creates a need for an explicit coordination strategy that is based on more formal methods than needed for co-located development.
The implementation itself causes tradeoffs between schedule, function, and cost. Releases are done in an incremental fashion and progress is tracked by both formal metrics and informal communication.
Larger systems require formal governance mechanisms. The issue of who has control over a particular portion of the system may prevent some business goals from being realized.
To build the utility-response curve, we first determine the quality attribute levels for the best-case and worst-case situations. The best-case quality attribute level is that above which the stakeholders foresee no further utility. For example, a system response to the user of 0.1 second is perceived as instantaneous, so improving it further so that it responds in 0.03 second has no additional utility. Similarly, the worst-case quality attribute level is a minimum threshold above which a system must perform; otherwise it is of no use to the stakeholders. These levels, best-case and worst-case, are assigned utility values of 100 and 0, respectively.
We then determine the current and desired utility levels for the scenario. The respective utility values (between 0 and 100) for various alternative strategies are elicited from the stakeholders, using the best-case and worst-case values as reference points. For example, our current design provides utility about half as good as we would like, but an alternative strategy being considered would give us 90 percent of the maximum utility. Hence, the current utility level is set to 50 and the desired utility level is set to 90. In this manner the utility curves are generated for all of the scenarios.
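A utility-response curve can be approximated from the four elicited points. The sketch below assumes straight-line interpolation between them, which is a simplification chosen for the example. The response times for the worst, current, and desired cases are invented; only the 0.1 second best case and the utility values 0, 50, 90, and 100 come from the text.

```python
def utility(level, points):
    """Piecewise-linear utility for a quality attribute level.

    points: (level, utility) pairs with distinct levels. Outside the
    elicited range the curve is flat, so improving beyond the best case
    adds no utility.
    """
    pts = sorted(points)
    if level <= pts[0][0]:
        return pts[0][1]
    if level >= pts[-1][0]:
        return pts[-1][1]
    for (x0, u0), (x1, u1) in zip(pts, pts[1:]):
        if level <= x1:
            # Linear interpolation inside the segment [x0, x1].
            return u0 + (u1 - u0) * (level - x0) / (x1 - x0)


# Response time in seconds -> utility. Lower response time is better.
response_time_curve = [
    (0.1, 100),  # best case: perceived as instantaneous
    (0.3, 90),   # desired level (assumed response time)
    (1.0, 50),   # current design (assumed response time)
    (2.0, 0),    # worst case (assumed threshold)
]

if __name__ == "__main__":
    for seconds in (0.03, 0.1, 0.65, 1.0, 3.0):
        print(seconds, utility(seconds, response_time_curve))
```

A response of 0.03 second scores the same as 0.1 second, which matches the "no additional utility" example above.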
One method of weighting the scenarios is to prioritize them and use their priority ranking as the weight. So for N scenarios, the highest priority one is given a weight of 1, the next highest is given a weight of (N–1)/N, and so on. This turns the problem of weighting the scenarios into one of assigning priorities. The stakeholders can determine the priorities through a variety of voting schemes. One simple method is to have each stakeholder prioritize the scenarios (from 1 to N) and the total priority of the scenario is the sum of the priorities it receives from all of the stakeholders.
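The voting and weighting scheme fits in two small functions. Two assumptions go beyond the text: rank 1 means highest priority, so the scenario with the lowest vote total comes first, and ties are broken alphabetically. The scenario names and ballots are made up.

```python
from fractions import Fraction


def order_by_votes(ballots):
    """Sum each scenario's ranks across stakeholders; lowest total first.

    Assumes rank 1 is the highest priority and breaks ties by name.
    """
    totals = {}
    for ballot in ballots:
        for scenario, rank in ballot.items():
            totals[scenario] = totals.get(scenario, 0) + rank
    return sorted(totals, key=lambda s: (totals[s], s))


def rank_weights(ordered_scenarios):
    """Weights 1, (N-1)/N, ..., 1/N for scenarios ordered by priority."""
    n = len(ordered_scenarios)
    return {s: Fraction(n - i, n) for i, s in enumerate(ordered_scenarios)}


# Each stakeholder ranks the scenarios from 1 to N.
ballots = [
    {"latency": 1, "failover": 2, "audit": 3},
    {"latency": 2, "failover": 1, "audit": 3},
    {"latency": 1, "failover": 3, "audit": 2},
]

if __name__ == "__main__":
    order = order_by_votes(ballots)
    for scenario, weight in rank_weights(order).items():
        print(scenario, weight)
```

Fractions keep the weights exact; convert them with float() if they feed into a numeric benefit calculation.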
If you want to improve your individual architectural competence, you should do the following:
- Gain experience carrying out the duties. Apprenticeship is a productive path to achieving experience. Education alone is not enough, because education without on-the-job application merely enhances knowledge.
- Improve your nontechnical skills. This dimension of improvement involves taking professional development courses, for example, in leadership or time management. Some people will never become truly great leaders or communicators, but we can all improve on these skills.
- Master the body of knowledge. One of the most important things a competent architect must do is master the body of knowledge and remain up to date on it. To emphasize the importance of remaining up to date, consider the advances in knowledge required for architects that have emerged in just the last few years. For example, the cloud and edge computing that we discuss in Chapters 26 and 27 were not important topics several years ago. Taking courses, becoming certified, reading books and journals, visiting websites and portals, reading blogs, attending architecture-oriented conferences, joining professional societies, and meeting with other architects are all useful ways to improve knowledge.
The Technical Duties of a Software Architect.
The Nontechnical Duties of a Software Architect.
The Nontechnical Skills of a Software Architect.
The Knowledge Areas of a Software Architect.
(See tables in the book)
Duty: creating an architecture
- How do you create an architecture?
- How do you ensure that the architecture is aligned with the business goals?
- What is the input into the architecture creation process? What inputs are provided to the architect?
- How does the architect validate the information provided? What does the architect do in case the input is insufficient or inadequate?
Duty: architecture evaluation and analysis
- How do you evaluate and analyze an architecture?
- Are evaluations part of the normal software development life cycle or are they done when problems are encountered?
- Is the evaluation incremental or “big bang”? How is the timing determined?
- Does the evaluation include an explicit activity relating architecture to business goals?
- What are the inputs to the evaluation? How are they validated?
- What are the outputs from an evaluation? How are the outputs of the evaluation utilized? Are the outputs differentiated according to impact or importance? How are the outputs validated? Who is communicated what outputs?
Knowledge: architecture concepts
- How does your organization ensure that its architects have adequate architectural knowledge?
- How are architects trained in general knowledge of architecture?
- How do architects learn about architectural frameworks, patterns, tactics, standards, documentation notations, and architecture description languages?
- How do architects learn about new or emerging architectural technologies (e.g., multicore processors)?
- How do architects learn about analysis and evaluation techniques and methods?
- How do architects learn quality attribute-specific knowledge, such as techniques for analyzing and managing availability, performance, modifiability, and security?
- How are architects tested to ensure that their level of knowledge is adequate, and remains adequate, for the tasks that they face?
Questions based on the Organizational Coordination Model.
Questions based on the Organizational Coordination model focus on how the organization establishes its teams and what support it provides for those teams to coordinate effectively. Here are some example questions:
- How is the architecture designed with distribution of work to teams in mind?
- How available or broadly shared is the architecture to various teams?
- How do you manage the evolution of architecture during development?
- Is the work assigned to the teams before or after the architecture is defined, and with due consideration of the architectural structure?
- Are the aspects of the architecture that will require a lot of inter-team coordination supported by the organization’s coordination/communication infrastructure?
- Do you colocate teams with high coordination? Or at least put them in the same time zone?
- Must all coordination among teams go through the architecture team?
Questions based on the Human Performance Technology Model.
The Human Performance Technology questions deal with the value and cost of the organization’s architectural activities. Here are examples of questions based on the Human Performance Technology model:
- Do you track how much the architecture effort costs, and how it impacts overall project cost and schedule?
- How do you track the end of architecture activities?
- How do you track the impact of architecture activities?
- Do you track the value or benefits of the architecture?
- How do you measure stakeholder satisfaction?
- How do you measure quality?
Questions based on the Organizational Learning Model.
Finally, a set of example questions, based on the Organizational Learning model, which deal with how the organization systematically internalizes knowledge to its advantage:
- How do you capture and share experiences, lessons learned, technological decisions, techniques and methods, and knowledge about available tooling?
- Do you use any knowledge management tools?
- Is capture and use of architectural knowledge embedded in your processes?
- Where is the information about “who knows what” captured and how is this information maintained?
- How complete and up to date is your architecture documentation? How widely disseminated is it?
The potential for reuse is broad and far-ranging, including the following:
- Requirements. Most of the requirements are common with those of earlier systems and so can be reused. In fact, many organizations simply maintain a single set of requirements that apply across the entire family as a core asset; the requirements for a particular system are then written as “delta” documents off the full set. In any case, most of the effort consumed by requirements analysis is saved from system to system.
- Architectural design. An architecture for a software system represents a large investment of time from the organization’s most talented engineers. As we have seen, the quality goals for a system—performance, reliability, modifiability, and so forth—are largely promoted or inhibited once the architecture is in place. If the architecture is wrong, the system cannot be saved. For a new product, however, this most important design step is already done and need not be repeated.
- Software elements. Software elements are applicable across individual products. Element reuse includes the (often difficult) initial design work. Design successes are captured and reused; design dead ends are avoided, not repeated. This includes design of each element’s interface, its documentation, its test plans and procedures, and any models (such as performance models) used to predict or measure its behavior. One reusable set of elements is the system’s user interface, which represents an enormous and vital set of design decisions. And as a result of this interface reuse, products in a product line usually enjoy the same look and feel as each other, an advantage in the marketplace.
- Modeling and analysis. Performance models, schedulability analysis, distributed system issues (such as proving the absence of deadlock), allocation of processes to processors, fault tolerance schemes, and network load policies all carry over from product to product. Companies that build real-time distributed systems report that one of the major headaches associated with production has all but vanished. When they field a new product in their product line, they have high confidence that the timing problems have been worked out and that the bugs associated with distributed computing (synchronization, network loading, and absence of deadlock) have been eliminated.
- Testing. Test plans, test processes, test cases, test data, test harnesses, and the communication paths required to report and fix problems are already in place.
- Project planning artifacts. Budgeting and scheduling are more predictable because experience is a high-fidelity indicator of future performance. Work breakdown structures need not be invented each time. Teams, team size, and team composition are all easily determined.
Software product lines rely on reuse, but reuse has a long but less than stellar history in software engineering, with the promise almost always exceeding the payoff. One reason for this failure is that until now reuse has been predicated on the idea of “If you build it, they will come.” A reuse library is stocked with snippets from previous projects, and developers are expected to check it first before coding new elements. Almost everything conspires against this model. If the library is too sparse, the developer will not find anything of use and will stop looking. If the library is too rich, it will be hard to understand and search. If the elements are too small, it is easier to rewrite them than to find them and carry out whatever modifications they might need. If the elements are too large, it is difficult to determine exactly what they do in detail, which in any case is not likely to be exactly right for the new application. In most reuse libraries, pedigree is hazy at best. The developer cannot be sure exactly what the element does, how reliable it is, or under what conditions it was tested. And there is almost never a match between the quality attributes needed for the new application and those provided by the elements in the library.
In any case, it is common that the elements were written for a different architectural model than the one the developer of the new system is using. Even if you find something that does the right thing with the right quality attributes, it is doubtful that it will be the right kind of architectural element (if you need an object, you might find a process), that it will have the right interaction protocol, that it will comply with the new application’s error-handling or failover policies, and so on.
This has led to so many reuse failures that many project managers have given up on the idea. “Bah!” they exclaim. “We tried reuse before, and it doesn’t work!”
Software product lines make reuse work by establishing a strict context for it. The architecture is defined; the functionality is set; the quality attributes are known. Nothing is placed in the reuse library—or “core asset base” in product line terms—that was not built to be reused in that product line. Product lines work by relying on strategic or planned, not opportunistic, reuse.
Getting Architecture Reviews into an Organization through the Back Door
If you search the web for “code review computer science,” you’ll turn up millions of hits that describe code reviews and the steps that are taken to perform them. If you search for “design review computer science,” you’ll turn up little that is useful.
Other disciplines routinely practice and teach design critiques. Search for “design critique” and you will find many hits together with instructions. A design is a set of decisions of whatever type that attempts to solve a particular problem, whether an art problem, a user interface design problem, or a software problem. Solutions to important design problems should be subject to peer review, just as code should be subject to peer review.
A wealth of data shows that the earlier in the life cycle a problem is discovered and fixed, the lower the cost of finding and fixing it. Design precedes code, so holding appropriate design reviews seems both intuitively and empirically justified. In addition, the documents around the review, both the original design document and the critiques, are valuable learning tools for new developers. In many organizations developers switch systems frequently, and so they are constantly learning.
This view is not universally shared. A software engineer working in a major software house tells me that even though the organization aspires to writing and reviewing design documents, it rarely happens. Senior developers tend to limit their review to a cursory glance. Code reviews, on the other hand, are taken quite seriously by the senior developers. My software engineer friend offers two possible explanations for this state of affairs:
- The code review is the last opportunity to affect what is built: “review this or live with it.” This explanation assumes that senior developers do not believe that the output of design reviews is actionable and thus wait to engage until later in the process.
- The code is more concrete than the design, and is therefore easier to assess. This explanation assumes that senior developers are incapable of understanding designs.
I do not find either of these explanations compelling, but I am unable to come up with a better one.
What to do?
What this software engineer did was look for a surrogate process in which a design review could be surreptitiously performed. This individual noticed that when the organization did code reviews, questions such as “Why did you do that?” were frequently asked. The result of such questions was a discussion of rationale. So the individual would code up a solution to a problem, submit it to a code review, and wait for the question that would lead to the rationale discussion.
A design review is a review where design decisions are presented together with their rationale. Frequently, design alternatives are explored. Whether this is done under the name of code review or design review is not nearly as important as getting it done.
Of course, my friend’s surreptitious approach has drawbacks. It is inefficient to code a solution that may have to be thrown away. Also, embedding design reviews into code reviews means that the designs and reviews end up being embedded in the code review tool, making it difficult to search this tool for design and design rationale. But these inefficiencies are dwarfed by the inefficiency of pursuing an incorrect solution to a particular problem.