Re: Object-relational impedance

From: H. S. Lahman <hsl_at_pathfindermda.com>
Date: Tue, 04 Mar 2008 19:14:48 GMT
Message-ID: <Ichzj.14289$e_.9713_at_trnddc03>


Responding to JOG...

>> All attempts by applications to access a DB's tables and columns
>> directly violate design principles that guard against close-coupling.
>> This is a basic design tenet for OO.  Violating it when jumping from OO
>> to RDB is, I think, the source of the problems that are collectively and
>> popularly referred to as the object-relational impedance mismatch.

>
> I wondered if we might be able to come up with some agreement on what
> object-relational impedance mismatch actually means. I always thought
> the mismatch was centred on the issue that a single object != single
> tuple, but it appears there may be more to it than that.

First, I think it is important to clarify that the 'relational' in the mismatch isn't referring to the fact that the OO paradigm uses something other than set theory's relational model. The nature of the impedance mismatch lies in the way the OO and RDB paradigms implement the same relational model.

I think the lack of 1:1 tuple mapping is just a symptom of the mismatch. There are several contributors to the mismatch...

Applications (not just OO) are designed to solve specific problems so they are highly tailored to the particular problem in hand. In contrast, databases are designed to provide ad hoc, generic access to data that is independent of particular problem contexts. If one had to choose a single characterization of the mismatch, this would be it; everything else stems from it.

Object properties include behavior. Behaviors interact in much more complex ways than data. Managing behaviors is the primary cause of failing to map 1:1 between OO Class Diagrams and Data Models of the same subject matter. That's because managing behavior places additional constraints on the way the software is constructed.

OO relationships are instantiated at the object (tuple) level rather than the class (table) level. This allows much better tailoring of optimization to the problem in hand. It also focuses on capturing business rules and policies in the way relationships are instantiated. That, in turn, emphasizes preselecting sets of entities before they are actually accessed. Thus query-like searches for object collaborations are relatively rare in well-formed OO applications.

Corollary: the OO paradigm navigates relationship paths consisting of individual binary associations and sequentially processes object sets resulting from such navigation. Thus there is no direct equivalent of an RDB join in OOPL or AAL syntax. (One can argue that the query/join approach is less tedious, but the OO paradigm has additional goals to satisfy, such as limiting access to knowledge.)
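
To make the contrast concrete, here is a minimal Python sketch. The Customer/Order classes, the row dictionaries, and both function names are hypothetical illustrations, not taken from any real application. The first function navigates an association that was instantiated at the object level; the second emulates a class-level join on key values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    order_id: int
    total: float


@dataclass
class Customer:
    name: str
    # The association is instantiated per object: this customer holds
    # references to exactly the orders it collaborates with.
    orders: List[Order] = field(default_factory=list)


def total_by_navigation(customer: Customer) -> float:
    """OO style: walk one binary association, process the set sequentially."""
    return sum(order.total for order in customer.orders)


# Table-style data for comparison: rows are related only through key values.
customer_rows = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
order_rows = [
    {"order_id": 10, "cust_id": 1, "total": 25.0},
    {"order_id": 11, "cust_id": 1, "total": 15.0},
    {"order_id": 12, "cust_id": 2, "total": 40.0},
]


def total_by_join(name: str) -> float:
    """RDB style: a join of the two tables on cust_id, then a selection."""
    return sum(
        o["total"]
        for c in customer_rows
        for o in order_rows
        if c["cust_id"] == o["cust_id"] and c["name"] == name
    )


ann = Customer("Ann", [Order(10, 25.0), Order(11, 15.0)])
print(total_by_navigation(ann))  # 40.0
print(total_by_join("Ann"))      # 40.0
```

In the navigation version the set of orders was preselected when the relationship was instantiated, so nothing is searched at access time.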

Object identity is usually not explicitly embedded as an attribute of the object; OO applications are designed around address-based identity in computer memory. This profoundly changes the way one manages referential integrity. Thus OO developers will avoid class-level identity searches whenever possible.
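
A small Python sketch of the difference, assuming a hypothetical Account class and a dictionary standing in for a keyed table. The object carries no identifier attribute; its identity is simply which object a reference points to. The table-style version needs an explicit key and a lookup.

```python
class Account:
    def __init__(self, balance: float) -> None:
        # Note: no explicit identifier attribute on the object.
        self.balance = balance


a = Account(100.0)
b = Account(100.0)  # same attribute values, but a different object
alias = a           # a second reference to the very same object

print(a is alias)   # True
print(a is b)       # False

# Table-style identity: an explicit key value embedded in the data,
# resolved by a class-level search (here, a dictionary lookup).
accounts_table = {1: {"balance": 100.0}, 2: {"balance": 100.0}}


def find_account(key: int) -> dict:
    return accounts_table[key]


print(find_account(1) == find_account(2))  # True: equal values, distinct keys
```

Holding the reference is what keeps the relationship valid on the OO side, which is why the lookup in `find_account` is the kind of search one tries to avoid there.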

The relations in OO generalizations cannot be instantiated separately; a single tuple resolves the entire generalization. This is the one situation where a Class Model and a Data Model can never map 1:1. The reason lies in the OO paradigm's support of polymorphism.
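
As a rough Python illustration (the Shape hierarchy, the row dictionaries, and `load_shape` are all hypothetical, and the table-per-class layout is only one assumed relational representation): on the OO side one object resolves the whole generalization, while the relational side stores a supertype row and a subtype row that share a key.

```python
import math


class Shape:
    def __init__(self, label: str) -> None:
        self.label = label

    def area(self) -> float:
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, label: str, radius: float) -> None:
        super().__init__(label)
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, label: str, side: float) -> None:
        super().__init__(label)
        self.side = side

    def area(self) -> float:
        return self.side ** 2


# Assumed relational layout: one table per class, rows sharing a key.
shape_rows = {1: {"label": "c1", "kind": "circle"},
              2: {"label": "s1", "kind": "square"}}
circle_rows = {1: {"radius": 2.0}}
square_rows = {2: {"side": 3.0}}


def load_shape(key: int) -> Shape:
    """Combine the supertype row and the subtype row into ONE object."""
    base = shape_rows[key]
    if base["kind"] == "circle":
        return Circle(base["label"], circle_rows[key]["radius"])
    if base["kind"] == "square":
        return Square(base["label"], square_rows[key]["side"])
    raise ValueError(f"unknown kind: {base['kind']}")


# Polymorphic dispatch works because each object is a single instance
# of the entire generalization, not two separately instantiated tuples.
for key in (1, 2):
    shape = load_shape(key)
    print(shape.label, type(shape).__name__, shape.area())
```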

> I was hoping perhaps people might be able to offer perspectives on the
> issues that they have encountered. One thing I would like to avoid
> (outside of almost flames of course), is the notion that database
> technology is merely a persistence layer (do people still actually
> think that?) - I wonder if the 'mismatch' stems from such a
> perspective.

The short answer is that any OO application developer sees the DBMS as an implementation of a persistence layer.

I think it is important to distinguish between pure persistence in the form of an RDB and a bundle of specialized server-side applications that are layered on top of an RDB and form a DBMS. Some CRUD/USER processing can be quite complex, such as data mining, but from the end customer's perspective all the server-side applications are providing is data access and formatting.

Similarly, it is important to distinguish between CRUD/USER processing and other problems. In CRUD/USER processing the only problems being solved for the customer are data entry, data selection, and conversion to a convenient display representation. The RAD IDEs and layered model infrastructures already handle that sort of processing quite well (e.g., it is no accident that they employ form-based UIs that conveniently map into RDB tables) and applying OO development there would be largely redundant.

Thus OO developers always believe that a database is a persistence mechanism because they deal with problems outside CRUD/USER processing. IOW, the OO application's solution *starts* with accessing data from a persistent store and *ends* with shipping results off for display rendering. That problem solution doesn't care what kind of data access services the DBMS may provide; it just wants to access and store particular piles of data. Similarly, it doesn't care whether user communications are via GUI, web browser, or heliograph.

To put it more bluntly, from the OO application's solution perspective, the developer couldn't care less that the data was mined from multiple sources using exotic algorithms or whether it is stored in an RDB, an OODB, flat files, or on clay tablets. At the level of abstraction of the OO problem solution, only two services are required: "Save this pile of data I call 'X'" and "Give me the pile of data I call 'X'". Thus the entire interface for accessing persistence from an OO application's problem solution is typically just three messages of the form {message ID, [data packet]} that might look something like:

{SAVE_DATA, data ID, dataset} // to persistence

{GET_DATA, data ID} // to persistence

{HERE_IS_DATA, data ID, dataset} // response from persistence

The application solution will provide its own unique encode/decode of the message data packets into its objects and their attributes that is completely independent of the persistence schemas, etc. Bottom line: the DBMS may provide all sorts of elegant CRUD/USER access services but the OO application doesn't care about that; that belongs to a different trade union.
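
A minimal Python sketch of that interface, under stated assumptions: the message IDs come from the messages above, but the `Persistence` class, the in-memory dictionary behind it, the `Invoice` class, and the `encode`/`decode` helpers are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

SAVE_DATA = "SAVE_DATA"
GET_DATA = "GET_DATA"
HERE_IS_DATA = "HERE_IS_DATA"

Message = Tuple[Any, ...]  # {message ID, [data packet]}


class Persistence:
    """Stand-in for the persistence side; a dict replaces the real store."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}

    def handle(self, message: Message) -> Optional[Message]:
        message_id = message[0]
        if message_id == SAVE_DATA:
            _, data_id, dataset = message
            self._store[data_id] = dict(dataset)
            return None
        if message_id == GET_DATA:
            _, data_id = message
            return (HERE_IS_DATA, data_id, dict(self._store[data_id]))
        raise ValueError(f"unknown message ID: {message_id}")


@dataclass
class Invoice:
    """A problem-space object; it knows nothing about schemas."""
    number: str
    amount: float


def encode(invoice: Invoice) -> Dict[str, Any]:
    # The application's own mapping from object attributes to a data packet.
    return {"number": invoice.number, "amount": invoice.amount}


def decode(dataset: Dict[str, Any]) -> Invoice:
    return Invoice(number=dataset["number"], amount=dataset["amount"])


persistence = Persistence()
persistence.handle((SAVE_DATA, "X", encode(Invoice("INV-1", 99.5))))
reply = persistence.handle((GET_DATA, "X"))
print(reply[0], reply[1])  # HERE_IS_DATA X
print(decode(reply[2]))    # Invoice(number='INV-1', amount=99.5)
```

Swapping the dictionary for an RDB, an OODB, or flat files would change only `Persistence`; the application side keeps the same three messages.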

<aside>
As a practical matter, the client-side does care because somehow those messages need to be mapped into the server-side DBMS services (e.g., creating SQL queries, performance caching, and optimizing joins for the DBMS). But to do that one only needs to provide the mapping once in a subsystem that is reusable by any application that accesses that DBMS. Typically that subsystem would be designed and implemented by someone who has specialized DBA skills to utilize the DBMS services in an appropriately clever fashion. IOW, the subsystem represents a fundamental separation of concerns from the specific problem solution by isolating and encapsulating specific mechanisms and optimizations related to persistence access.
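
A sketch of what such a subsystem might look like, using Python's standard sqlite3 module. Everything specific here is an assumption made for illustration: the `PersistenceSubsystem` class, the `customer` table and its columns, and the "customer/<id>" data ID convention. A real subsystem would add the caching and query optimization mentioned above.

```python
import sqlite3
from typing import Any, Optional, Tuple

Message = Tuple[Any, ...]


class PersistenceSubsystem:
    """Maps {message ID, data ID, dataset} messages onto SQL (hypothetical)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer "
            "(id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
        )

    def handle(self, message: Message) -> Optional[Message]:
        message_id, data_id = message[0], message[1]
        # Assumed convention: data IDs look like "customer/42".
        kind, _, key = data_id.partition("/")
        if kind != "customer":
            raise ValueError(f"no mapping for data ID: {data_id}")
        if message_id == "SAVE_DATA":
            dataset = message[2]
            self._conn.execute(
                "INSERT OR REPLACE INTO customer (id, name, city) "
                "VALUES (?, ?, ?)",
                (int(key), dataset["name"], dataset["city"]),
            )
            self._conn.commit()
            return None
        if message_id == "GET_DATA":
            row = self._conn.execute(
                "SELECT name, city FROM customer WHERE id = ?", (int(key),)
            ).fetchone()
            if row is None:
                raise KeyError(data_id)
            return ("HERE_IS_DATA", data_id, {"name": row[0], "city": row[1]})
        raise ValueError(f"unknown message ID: {message_id}")


# All SQL lives inside the subsystem; the caller only sends messages.
db = PersistenceSubsystem(sqlite3.connect(":memory:"))
db.handle(("SAVE_DATA", "customer/42", {"name": "Ann", "city": "Boston"}))
print(db.handle(("GET_DATA", "customer/42")))
```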

Note that when developing large OO applications, one does this sort of subsystem encapsulation for *all* subsystems within the application; UI and DB subsystems just happen to be ubiquitous concerns. One does OO development because one wants maintainable applications. Hence separation of concerns and encapsulation at the subsystem level is critically important for decoupling implementations in different parts of the application.
</aside>

-- 
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl_at_pathfindermda.com
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development".  Email
info_at_pathfindermda.com for your copy.
Pathfinder is hiring: 
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
Received on Tue Mar 04 2008 - 20:14:48 CET
