Re: Object-relational impedence
Date: Sat, 15 Mar 2008 16:36:05 +0100
Message-ID: <9zegwgxgrt1n.94zlo4cv87q2.dlg_at_40tude.net>
On Sat, 15 Mar 2008 05:58:23 -0700 (PDT), David BL wrote:
> On Mar 15, 6:12 pm, "Dmitry A. Kazakov" <mail..._at_dmitry-kazakov.de>
> wrote:
>> On Fri, 14 Mar 2008 18:59:49 -0700 (PDT), David BL wrote:
>>> I expect you like the idea of distributed OO,
>>
>> Well, distributed OO is a different thing to me. It is when an object is
>> distributed over a set of nodes, a kind of wave, rather than a particle...
>
> That sounds like fragmented objects.
>
> http://en.wikipedia.org/wiki/Fragmented_object
Yep, that thing.
>>> However the literature is hardly compelling. There is the problem of
>>>
>>> - finding a consistent cut (ie that respects the happened-before relation)
>>>
>>> - the contradiction between transactions and orthogonal persistence
>>>
>>> - the contradiction between rolling back a transaction and orthogonal
>>>   persistence
>>
>> Complementarity, rather than mere contradiction.
>
> The following argument appears in "Concurrency, the fly in the
> ointment" by Blackburn and Zigman:
>
> The transactional model implicitly requires the dichotomy of two
> worlds - an internal one for the persistent data, and an external non
> persistent world that issues transactions over the first. This
> follows from the impossibility of an ACID transaction being invoked
> from within an (atomic) transaction - ie a transaction cannot be the
> basis for its own nested invocation.
>
> By definition of atomicity of a parent transaction, the durability of
> any nested (ie child) transaction is subject to the atomicity of the
> parent transaction. This is in conflict with an independent
> durability required by a child ACID transaction.
Clearly, durability of any effect is conditional. So what?
Anyway, that is about composition of transactions; I don't see how it can collide with persistence. The latter is merely a matter of the scope in which an object exists. Each object has a scope, and each object is persistent within it. Whether that scope is contained by the scope of the OS, or of its file system, or of a cluster of hosts, does not matter to the issue (as long as the scopes are nested).
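To illustrate that durability is conditional, a minimal sketch only, using SQLite savepoints as one possible nesting mechanism (the table is invented for the example): the child "commits", yet its work survives only if the enclosing transaction commits too.

import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE log (entry TEXT)")

conn.execute("BEGIN")                               # parent transaction
conn.execute("SAVEPOINT child")                     # nested (child) transaction
conn.execute("INSERT INTO log VALUES ('child work')")
conn.execute("RELEASE SAVEPOINT child")             # child "commits"
conn.execute("ROLLBACK")                            # parent aborts...

# ...and the child's "committed" work is gone: its durability was conditional.
print(conn.execute("SELECT count(*) FROM log").fetchone()[0])   # prints 0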
>>> - the impossibility of reliable distributed transactions
>>
>> There is no such thing as unconditionally reliable computing anyway.
>
> Sure, but it is customary to assume infallibility within a process and
> fallibility between processes.
Inter-process communication is as [un]reliable as any other. In each case you should specify what is taken for granted (the premises) and what is the subject of QoS enforcement.
>>> - the fact that synchronous messages over the wire can easily
>>>   be a million times slower than in-process calls
>>
>> Huh, don't buy multi-core processors, don't use any memory except registers
>> etc. This is not an argument, so long no concrete time constraint put down.
>> Below you mentioned HTTP as an example. It is milliard times slower, who
>> cares? "Bright" minds use XML as a transport level, and for that matter,
>> interpreted SQL...
>
> You missed the point. Fine grained interchanges of messages are
> useful within a process but are something to be avoided between
> processes. The penalty is so high that distributed computing systems
> must account for it in the high level design.
That's OK. The same can be said about distribution of objects. It is inefficient to distribute tightly coupled small objects, so don't do that. For that matter, don't issue a separate SELECT for each row of a result set if you can avoid it.
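For illustration, a minimal sketch of that anti-pattern and the obvious alternative, using Python's sqlite3; the tables and data are invented for the example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'A'), (2, 'B');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

# N+1 round trips: one SELECT, then another SELECT per returned row.
for (customer_id,) in conn.execute("SELECT customer_id FROM orders"):
    name = conn.execute("SELECT name FROM customers WHERE id = ?",
                        (customer_id,)).fetchone()[0]

# One round trip: let the DBMS do the join.
rows = conn.execute("""
    SELECT o.id, c.name
    FROM orders o JOIN customers c ON c.id = o.customer_id
""").fetchall()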
> Between processes it is better to stream data asynchronously, such as
> the way OpenGL drawing commands are piped between client/server
> without any round trip delay for each drawing command.
No, asynchronous messaging only makes things worse. If objects are coupled they need to know and synchronize their states *frequently*. Asynchronous messaging is less suited to this job and causes extra overhead. So if the system is to rely on asynchronous messaging, it has to be designed for that (which is difficult), and whether that is possible is determined by the domain.
Consider your OpenGL example in a safety-critical application. A typical requirement there is to verify that the graphical output was indeed produced and that the operator sees it within the maximal allowable delay. That would mean reading the rendered image back. Would asynchronous messaging help you here?
Now my example. There exists a lot of synchronous communication hardware, again in mission-critical areas. It would be a plain waste of time and resources to deploy asynchronous messaging over a time-triggered protocol.
>>> - the fallibility of distributed synchronous messages which
>>>   contradicts location transparency
>>
>> That depends on what is transparent to what. I don't see why
>> synchronization should play any role here. I assume you meant something
>> like routing to moving targets, then that would apply to both.
>
> By definition, the call of an asynchronous messages (a "post") can
> return without knowing whether the message was received, whereas a
> synchronous message must block.
Yes, but you have to resolve the target first, and that can fail or block before the message is even sent. There is always a synchronous part in any asynchronous exchange. For example, you have to wait for the driver to allocate the necessary resources, marshal your message and do all the other sorts of stuff before you can continue. In a real-time system you can have this time bounded, but the same is true for synchronous messaging: you can start a synchronous exchange, continue, and synchronize later.
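A small sketch of that synchronous part (the names are illustrative): posting into a bounded mailbox returns immediately only while there is buffer space; once the transport is saturated, the "asynchronous" sender blocks anyway.

import queue
import threading
import time

mailbox = queue.Queue(maxsize=2)    # stands in for driver/transport buffers

def consumer():
    while True:
        msg = mailbox.get()
        if msg is None:
            break
        time.sleep(0.1)             # slow receiver

threading.Thread(target=consumer, daemon=True).start()

for i in range(10):
    # put() returns immediately while buffer space is available, but blocks
    # as soon as the mailbox is full -- the caller synchronizes whether it
    # wants to or not. put_nowait() would instead fail with queue.Full.
    mailbox.put(f"message {i}")

mailbox.put(None)                   # let the consumer finish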
> That raises the question of what to
> do when the network fails. This impacts design by contract (in
> conflict with location transparency).
Certainly, the contract will have network failure as a legal state; that is a premise of reliable design. The same holds for asynchronous messages: you won't be able to certify the thing if you cannot detect non-delivery. Non-delivery is not a bug, it is a state.
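For example, a sketch of such a contract (the transport stub is hypothetical): the send operation reports non-delivery and unknown delivery as ordinary outcomes rather than treating them as bugs.

from enum import Enum, auto

class Delivery(Enum):
    DELIVERED = auto()      # acknowledged by the peer
    NOT_DELIVERED = auto()  # known failure, e.g. link down
    UNKNOWN = auto()        # timed out: may or may not have arrived

def transport_send(message: bytes, timeout_s: float) -> bool:
    # Hypothetical stand-in for a real transport; always acknowledges here.
    return True

def send(message: bytes, timeout_s: float) -> Delivery:
    # The contract enumerates the outcomes; network failure is a legal state.
    try:
        ack = transport_send(message, timeout_s)
    except ConnectionError:
        return Delivery.NOT_DELIVERED
    except TimeoutError:
        return Delivery.UNKNOWN
    return Delivery.DELIVERED if ack else Delivery.NOT_DELIVERED

print(send(b"ping", timeout_s=1.0))   # Delivery.DELIVERED with this stub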
>>> - the enormously difficult problem of distributed locking
>>>   * how to avoid concurrency bottlenecks
>>>   * when to release locks in the presence of network or machine failures
>>>   * distributed deadlock detection.
>>>   * rolling back a distributed transaction.
>>
>> This is a sort of mixing lower and higher level synchronization
>> abstractions. If you use transactions then locking is an implementation
>> detail. Anyway all these problems are ones of concurrent computing in
>> general, they are not specific to distributed computing and even less than
>> that to OO. You can always consider concurrent remote tasks running local.
>
> The point is that concurrency interacts very badly with orthogonal
> persistence and location transparency - to the extent that it places
> serious doubts on whether orthogonal persistence and location
> transparency are useful concepts in the first place.
I don't see it this way. The very moment the scope of an object crosses the scope of a task, you have a problem. It is a general concurrency issue. You cannot have only task-local objects.
>>> - how to schema evolve a distributed OO system assuming
>>>   orthogonal persistence and location transparency.
>>>
>>> - how to manage security when a process exposes many of its
>>>   objects for direct communication with objects in another process.
>>
>> On per object basis.
>
> In reality security can only be controlled at the boundary between
> processes and that conflicts with location transparency. Allowing
> direct communication between objects opens up security holes
> everywhere. By contrast, the data centric approach allows the inter-
> process message protocol to be simple and implemented entirely within
> the DBMS layers.
You seem to imply that the trusted model of method invocation is the only possible one. It is not.
> I'm saying persistent data should be nothing more that persistent
> encoded values
I don't care about data, even less about persistent data. Data persistence is a solution to some problem, I guess. Which one? Ah, maybe synchronization of applications, those damned state machines...
> instead of snapshots (ie consistent cuts) of
> multithreaded or distributed state machines. The former is much
> simpler than the latter.
(If that were so you could use the content of main memory as data.)
The problem is to synchronize states of independent components.
>>> The easiest way for distributed applications to communicate is
>>> indirectly via shared data rather than by direct communication. This
>>> is implicit with a data-centric approach.
>>
>> Ooch, shared data is the worst possible way. Note how hardware
>> architectures have been moving away from shared memory. Sooner or later it
>> should hit software design.
>
> Really? Are you suggesting there is a trend away from SMP?
Certainly. How many levels of cache do they have? Shared memory is a bottleneck. In the future we will have massively parallel distributed systems with much local memory and very little shared memory. The cost of CPU and local memory will drop dramatically relative to the cost of shared memory.
> An alternative is for the applications to avoid shared data and
> special message protocols are developed to allow them to talk to each
> other. Do you agree that's not a very good solution?
It is not a matter of our preference; we have to program the architectures we have. Shared memory could survive only if the hardware changed radically. I don't see whether, say, molecular or quantum computing models would make shared memory viable. I don't exclude it, but it looks unlikely.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de