Re: database systems and organizational intelligence
Date: Thu, 03 Jun 2004 18:29:25 GMT
Message-ID: <96Kvc.5024$n65.578_at_newssvr33.news.prodigy.com>
"Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message
news:c9m1m5$i93$1_at_news.netins.net...
> "Eric Kaun" <ekaun_at_yahoo.com> wrote in message
> news:p41vc.5086$Zq.1808_at_newssvr32.news.prodigy.com...
> > Well, then I envy you. If you've only ever had the need to view data as
a
> > single graph, then your requirements have been much kinder to you than
> mine
> > have to me. Typically I see the need for many different graphs in
> different
> > contexts, which is difficult to support through graph-like transforms,
but
> > easy when based on relations.
>
> Hmmm. I'm thinking of the namespace as the root, so I guess I work with
> just one graph.
A graph is fine - well, not fine, but sufficiently general even if overly complex. But XML is a bit more specific - it's a directed acyclic graph. And then you see IDs and REFs superimposed to give it additional power, which to me just adds complexity and confusion, since they're all untyped (and therefore 2 elements of the same name could have REFs to other elements of different types - probably not what you wanted).
I agree that if all you need is a tree, and you can identify a root that serves your needs, then XML is fine. In my experience, the number of applications that fit that bill is vanishingly small.
> > Date's point is
> > simply that the DBMS sees a single value at any given tuple/attribute
> > "point". That single value can be "complex", but that's a type you've
> > defined, and the DBMS doesn't care about that - you have to do the work.
> No
> > special treatment for types.
>
> Which means you have to do the work when retrieving the data too.
Well, the DBMS can provide some. It's just that the types you write aren't any more or less "special" than a system-provided type. You can even see relational algebra as a set of operations over a relation type. The only thing special is that the RDBMS holds only a set of relvars (relation-valued variables) - that's it.
And you only need to reassemble data that you've dis-assembled, and hopefully you did that for a good reason (e.g. to derive new facts from it in concert with other disassembled data). Any form of analysis and design involves selecting and discarding something, and decide what your basic "atoms" (types) are is one of those choices.
Also keep in mind that XML is a domain, and that "boolean validate(XMLDoc, XMLSchema)" might be a valid operation defined over that domain. In that sense XML is also a type generator, though... and there's still no reason a DBMS has to know about it, other than as a convenience built-in type.
> There is
> more than one database hosted by a SQL-DBMS where the users opted to put a
> delimted list in a field, but then they have work to cope with that.
True, but all of these have pros and cons:
a. comma-delimited string list
b. list elements broken into another relation with an order of some sort (if
it's relevant to the business in a non-reporting context)
c. a "native" list type
d. a list type specific to the scalars being stored
> > Well, if you can use constraints specified in XML format, you're a
better
> > human than me.
>
> Since XML is hardly a dictatorial format, it is pretty easy to specify a
> constraint with it -- just stick a SQL statement between tag delimiters
;-)
> [ignore that]
Heh. But as a serious aside, we're doing just that here at work - embedding Jelly (an XML-based Java tag language, sort of) script fragments into another custom XML schema. It's a nightmare, but I can't dissuade anyone.
And yes, XML is dictatorial. Everything is subservient to the root element, either directly or indirectly. Messy when you start spitting out multiple XML docs based on different cross-sections of some business domain, and they have overlapping elements not at the same level of the tree... then you essentially end up with.
> > But certainly there are some things you actually want the users not to
> > override, right? In any event, you argue for a domain-specific language
> > interpreted in your app - a good thing. But you have to keep in mind
that
> > just because users can specify some things doesn't mean they'll be good
at
> > it, and your app has to have enough structure to not fall apart when
they
> > screw up.
>
> Yes, it is a risk-assessment thing. SQL-RDBMS implementors often treat
all
> constraints as if they have an equal risk related to every constraint.
Agreed, it's a messy line - any app that uses, for example, a rules engine has to know at what point the app ends and the rules begin. And then there's always the tendency to drift, but at a certain point what you're doing in rules is programming, and you haven't saved yourself anything because the users will still become frustrated. So you have to make the call, no doubt about it.
> > > We don't want users knocking down supporting walls without
> > > involving IT professionals, but we have got to let them hang pictures
> > > and do other makeovers of their applications without requiring an
extreme
> > makeover (using doctors).
> >
> > Yes, true. This can be a slippery slope, but I think this sort of
> necessity
> > argues for better typing, not less typing.
>
> I'm still trying to figure out what I think about typing -- what, when,
> where, how -- there is something in current SQL-DBMS approach that really
> bugs me and I haven't pinpointed it but it is related to the brittle
nature
> of many database implementations.
Couldn't agree more - shoddy support for user-defined types is one of the major failings of SQL and SQL DBMSs (which support even the SQL standard only in varying degrees). Date offers a good analysis of SQL's failings in the relational telephone book.
> > > Relational theory says it is about how to view the data, not how it is
> > > stored. In that case, the users know how they want to view it and the
> > > constraints should originate from there.
> >
> > Well, if by users you mean all of them, then yes. VPs, CEOs,
accountants,
> > auditors, etc. are all users, to some extent. You have to take the body
of
> > requirements into account, not just Joe at the data-entry desk.
>
> Yup, I had even forgotten about Joe, but it was a collective statement,
not
> singling out one user's requirements.
Good, I was hoping you weren't. Then in my experience, those views vary GREATLY across the aggregate user base. For example, in an automotive paint formula database I worked on, the usual customer service call started with a color code, then drilled into a "variant", then into available paint systems, finally into a formula. However, another fairly common call began with a system and drilled up to color codes, and then "sideways" (with respect to that previous model) to car models and years. Still another dealt entirely with codes and models. Of course, the labs wanted a different view, starting with paint systems and accumulating various aggregate counts of color codes, variants, etc. Oh, and...
... but you get the picture. If I'd had to use XML to model any of that, I'd have gone stark raving mad. Using XML as query output, however, wouldn't have been so bad - but the real data relationships (sic) were far more complex than anything I've seen done in XML.
> > > I want to design constraints from front to back and not the other way
> > around.
> >
> > Why? They come from both directions. It depends on which interested
> parties
> > ("users" is too specific) you mean. The government has requirements,
> too...
>
> When I typed that, I knew you would have legitimate disagreement, but I
> still pretty-much agree with my statement because that government
> requirement comes into play when a user identifies it as required for the
> organization. The government doesn't interface directly with DBA's (or
> shouldn't) in non-government organizations. So I don't see another
> direction, other than end-users, from which these requirements should
come,
> but if conflicting requirements come in, that must be negotiated and we
> wouldn't want to implement something that would reduce the system qualtiy
> (reliability, security). But semantic constraints should all arise from
the
> users of the system.
Yes, I completely agree with the above.
> > Overconstraint is a problem too, as you indicate. No need to specify
> numbers
> > only for an ID that the users determine, typically. And even if it is
> > system-generated (and thus probably numeric), that's no reason to store
it
> > only as digits.
>
> Agreed.
A simple rule of thumb in any language: if ya ain't doin' math or array indexin' with it, it ain't a number.
> > Absolutely. This is where declarative constraints become absolutely
> > essential - the engine should be able to do all of the above, with
little
> > work from the developer other than stating the required constraint, and
> > perhaps a little about how the result is to be reported.
>
> That works for me. I don't care if the constraints are declarative as
long
> as the application can use a service to get what it needs. That service
> (future-style DBMS, perhaps) can have declarative constraints, XML docs,
OO
> objects or whatever as input to the constraints engine.
Good enough. Another interesting question (to me) is to what extent the application "uses" the service, as opposed to the service being used to generate or contrain the application. Code generation again... surely with a relational data store, and a slight bit of path definition, I can generate a good-enough UI to maintain the data along that path? And if the UI mappings are flexible, I can move and color and do all those other UI niceties...
Obviously this takes some work, but it's amazing how many reports and UIs and such are essentially the same - fixed-length paths of data traversal, superimposed on a relational design. Why not XML or such, since we're talking paths? Because you can lay any number of traversal paths on top of a relational store, whereas a predefined path gets you 1 UI (or at least a much more limited number).
> > Any SQL DBMS worth its salt can handle variable-length strings, but I
> still
> > disagree about the XML. I have yet to see a case where XML didn't
> completely
> > annoy both end user and developers.
>
> I can sympathize with that, but having coded many JCL (job control
language)
> "card decks" in a previous life, as well as passing lots of common-quote
> files around, I'm fine with replacing parm declarations and data passing
> with XML documents. It lacks elegance on the one hand, but provides a
> viable standard that is AT LEAST NOT FLAT RELATIONS!! (ducking)
Data passing is one use for XML - I think it works effectively only when the ratio of data (text) to markup is small, but that's a side issue. When the interfaces are thin (fewer discrete styles of requests), it works. When the client can fire off any number of queries, you end up with many overlapping XML schema, all essentially views derived from something that isn't hierarchical at all.
> I really, really would like to see some of the expense of data handling
> removed across the board in our industry and it appears to me that it is
in
> the interface with the SQL-DBMS that there is a huge cost that could be
> lessened. But that's just a hunch at this point ... still workin' on it.
> Cheers! --dawn
Could be. So there are two directions to go in: make languages more relational so the interface is intuitive, or make the database more hierarchical. I prefer the former, obviously... but I think that mapping layers are always going to be problematic, because of this mismatch. I'm speaking on this primarily from Java and Oracle experience, with some Castor (XML mapping) and TopLink (SQL mapping) as intermediaries.
- erk