Re: In an RDBMS, what does "Data" mean?
Date: Thu, 10 Jun 2004 19:48:04 GMT
Message-ID: <UV2yc.2595$tp6.675_at_newssvr15.news.prodigy.com>
"Dawn M. Wolthuis" <dwolt_at_tincat-group.com> wrote in message
news:ca874t$g9t$1_at_news.netins.net...
> "Eric Kaun" <ekaun_at_yahoo.com> wrote in message
> news:5k2xc.6205$4b2.1710_at_newssvr32.news.prodigy.com...
> > I am all for lowering this cost - decreasing the "impedance mismatch", so to
> > speak. However, I think my ideas move in the opposite direction - making
> > application languages more relational, rather than DBMSs more procedural (or
> > OO, if you like).
>
> And the likelihood of that is ... NIL (choosing not to use that NULL set
> designation). Why? Because people tend to choose solutions that work. If
> there were overwhelmingly good evidence that you get a better bang for the
> buck by using relational theory, that would be a different story. I'd
> strongly suggest we nudge relational databases toward pragmatism ;-)
Several things here:
1. I doubt "overwhelmingly good evidence" motivated people to pick up Pick
(or any other technology)
2. People tend to use that with which they're familiar
3. The market doesn't necessarily guarantee anything (and no, I'm not
anti-capitalism) about what it produces
4. The gadgeteering which makes software development fun also produces a
tendency to wallow in what you know, and in what seems "cool", orthogonal to
any actual value
5. Finally, the criteria businesses have for their solutions also factor in
learning ability, prevalence of those taught in technology X, etc.
> > Agreed - however, while my experience comes from a large company, it's work
> > done for a relatively small business unit. I was the only developer on
> > several of the projects, and my user base was fairly small. I was DBA,
> > developer, customer support, etc. And I still found the relational metaphor
> > (even though I had to use SQL) much easier than XML.
>
> Didn't some of that have to do with having to perform conversions to and
> from XML which might not have been necessary if the data were stored in the
> way it was sent? OR was it the loosey-gooseyness of it where there are not
> as many texts with rules for "how to"?
Good questions.
- The XML was mapped into Java objects (actually object graphs), which turns out to be trivial.
- However, that requires lots and lots of redundant and overlapping methods to query that object graph (e.g. I want to find a certification with a destination URL of "http:blah", so I write a method; later I need to find one by ID, and so on). I can get around it (now) using expression-parsing libraries (JXPath, Jelly, others). Still, those expressions aren't type-checked, which gives them some agility but gives me less comfort (I've seen how easy it is for them to go wrong type-wise).
- Doing "agile XML", at least here, resulted in multiple files with overlapping attributes. True, some of that was just bad design, but from what I've seen here, those apps using Oracle are more disciplined. Maybe because they have to be? Are just encouraged to be? Not sure...
- XML's type system is impoverished - some validation is easy, but it offers no constraints.
- We use libraries for XML, including Castor to generate Java objects, so the how-to isn't lacking in this case.
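
To make the finder-method point concrete, here is a minimal sketch. It is in Python rather than Java for brevity, and the Certification class, its fields, and all function names are hypothetical, not the actual code discussed above. It contrasts one hand-written finder per question with a single predicate-based query, and shows how a string-named lookup (standing in for an expression library) only fails at run time.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Certification:
    # Hypothetical fields; the real XML-derived objects are not shown here.
    cert_id: str
    destination_url: str


# Style 1: one finder per question. Every new question means a new method.
def find_by_destination_url(certs: Iterable[Certification], url: str) -> List[Certification]:
    return [c for c in certs if c.destination_url == url]


def find_by_id(certs: Iterable[Certification], cert_id: str) -> List[Certification]:
    return [c for c in certs if c.cert_id == cert_id]


# Style 2: one generic query. The predicate is ordinary code, so a static
# type checker can still see which attributes it touches.
def select(certs: Iterable[Certification],
           predicate: Callable[[Certification], bool]) -> List[Certification]:
    return [c for c in certs if predicate(c)]


# Style 3: attribute chosen by name at run time. Flexible, but a misspelled
# name is only discovered when the query actually runs.
def find_by_attribute(certs: Iterable[Certification], attribute: str, value: str) -> List[Certification]:
    return [c for c in certs if getattr(c, attribute) == value]
```

With the second style, `select(certs, lambda c: c.cert_id == "a1")` replaces `find_by_id`, while the third style accepts `find_by_attribute(certs, "destination_ur", ...)` without complaint until it raises AttributeError.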
> > I've never used Pick -
> > sounds like their environment gives them a lot of power, and while that's
> > nice, I'd still never think of thinking of an invoice as a single
> > proposition or "object". It's not.
>
> Perhaps you've never seen one? ;-)
> > It's a fairly complex series of them.
>
> That too, but through how many portals would you want to have to go to
> collect all such? This has to do with how the "user" (application developer
> or dba, for example) should view the data.
> > Just like an "order", an invoice is a fairly complex confluence of
> > phenomena, and not even a static one (modifications / confirmations to
> > various invoice "pieces" was common in my world, as an invoice was often
> > correlated with multiple shipments and warehouses).
> >
> > > I can't speak for Anthony and Dawn, but I place more value not on the
> > > original inputs but the original concept. An invoice _is_ something that
> > > usually has multiple items ordered.
>
> Yes and I'm trying to narrow that down a bit while trying to tap into just
> how I do database design given that I don't start with 1NF. It has to do
> with people, places and things and entities that are not functionally
> dependent on any other entities in the system. What is that top level of
> nodes after ENTITY in a system, such as PEOPLE PLACES THINGS?
Ah, I see. Yes, I agree that those drive UIs, reports, etc. - at least for a while. I focus on those technologies that will make that part easy, AND give me some assurance of their consistency and that I can drive more complex requirements easily. And those complex ones always arise quickly, I've found... if I've oversimplified early (and I've done the entity/object style of design before), I usually regret it. Sometimes oversimplifying is warranted, if time-to-market is the critical success factor.
> > And I disagree. An invoice is many somethings. If your questions deal only
> > with the set (e.g. presenting an invoice on a screen), then great - treat it
> > as one. But when you're attempting to analyze the distribution of parts
> > across warehouses and across time, "viewing" the invoice as a number of
> > components is far, far more useful.
>
> I see where you are coming from. No, an invoice is just one of these
> things, but the data from the invoice is also available through other data
> portals (for lack of a better word -- don't make me use the word "view"!)
> such as warehouses and parts. I can see that one difference is that the
> same data from my perspective is available as an invoice and as
> parts-invoiced. These are different entities with the same or similar data
> accessed. Each portal can see everything you can "get to" from there (via
> declared links as one might have in a join statement).
> > So it depends on your needs, but I'd far
> > rather place my bet on something that allows me to scale my queries and
> > reports to more detailed questions than one that restricts me. And I still
> > think having to correlate multiple line-item attributes across multiple MV
> > attributes in a single File is nonsensical and error-prone.
>
> I'll grant that there are pros and cons and not everyone designs an invoice
> identically no matter what the database, but when you add in the virtual
> fields (derived data or data found elsewhere), the INVOICE vocabulary for
> everyone has what it needs to show an invoice.
And I think I'm seeing more and more value in a path-like / hierarchical expression as a user tool. I see it as best layered atop relational, since I anticipate more views (if my data is useful, and I'm trying to help the business's departments interoperate), but I think we agree philosophically on the notion of packaging for the user.
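
As a rough illustration of packaging layered atop relations, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The schema, table names, and sample rows are all hypothetical. The same base relations feed both a per-invoice summary view (the "whole" a screen would show) and a parts-across-warehouses question.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a tiny, made-up invoice schema in memory."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE invoice (
            invoice_no INTEGER PRIMARY KEY,
            customer   TEXT NOT NULL
        );
        CREATE TABLE line_item (
            invoice_no INTEGER NOT NULL REFERENCES invoice(invoice_no),
            part       TEXT NOT NULL,
            warehouse  TEXT NOT NULL,
            qty        INTEGER NOT NULL CHECK (qty > 0),
            PRIMARY KEY (invoice_no, part, warehouse)
        );
        -- The "packaged" invoice that a user or screen would ask for.
        CREATE VIEW invoice_summary AS
            SELECT i.invoice_no, i.customer,
                   COUNT(*) AS line_count, SUM(l.qty) AS total_qty
            FROM invoice i JOIN line_item l ON l.invoice_no = i.invoice_no
            GROUP BY i.invoice_no, i.customer;
        INSERT INTO invoice VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO line_item VALUES
            (1, 'bolt', 'east', 10),
            (1, 'nut',  'west', 5),
            (2, 'bolt', 'west', 7);
    """)
    return db


def invoice_summary(db: sqlite3.Connection, invoice_no: int):
    """Read the invoice as one packaged row, via the view."""
    return db.execute(
        "SELECT invoice_no, customer, line_count, total_qty "
        "FROM invoice_summary WHERE invoice_no = ?", (invoice_no,)
    ).fetchone()


def qty_by_warehouse(db: sqlite3.Connection, part: str) -> dict:
    """Analyze the same data by part and warehouse, ignoring invoice boundaries."""
    rows = db.execute(
        "SELECT warehouse, SUM(qty) FROM line_item "
        "WHERE part = ? GROUP BY warehouse ORDER BY warehouse", (part,)
    )
    return dict(rows.fetchall())
```

Neither question had to be anticipated when the base tables were designed; the view is just one more way in.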
> > > It is an object in and of itself that
> > > needs no "chopping up", so to speak.
> >
> > Yes, it does. "Analysis" means chopping up. We gain power in chopping up.
>
> and putting back together
>
> > Our problems are solvable when they're chopped; our solutions are scalable
> > and provable when they're chopped.
>
> again, I think you are confusing something here -- perhaps physical and
> logical (although I think I've ascertained that would not be like you) but
> perhaps it is your notion that data can only be accessed through one place -
> its base relation. Remove that obstacle -- free yourself. Yes, we still
> divide it all up, but into wholes, not pieces.
I agree, and didn't mean to give the impression that data should only be accessed through base relations. Far from it. Relations are a necessary (to me) but not sufficient condition for good application design.
> > Domains are intellectually tractable when
> > they're separated. Holism may be fine in medicine (???) where human
> > psychology is involved, but any translation of a "real world" domain to an
> > automated system involves "chopping up." You can either acknowledge it and
> > chop in a rational way, or pay the price later on.
>
> yes, there is some chopping up and the functional dependency thing takes you
> quite far for that, even if you allow for both scalar values and compound
> ones (such as lists).
For users, yes, lists are useful (I'd argue that sets are useful more often, and that relations are even better, but I'll lighten up on that). The other linchpin of relational, of course, is types. I distrust technologies with weak typing, but that's a different discussion; suffice it to say that having a LINE_ITEMS attribute in a file would make me far less queasy if the elements of that list were real objects, with real operations defined over them.
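
A minimal sketch of what "real objects, with real operations" might mean for a LINE_ITEMS attribute, again in Python. The LineItem and Invoice classes and their fields are hypothetical, chosen only for illustration. Each element carries its own validation and operations, so there is nothing to correlate by position across separate multivalued attributes.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple


@dataclass(frozen=True)
class LineItem:
    # Hypothetical fields for illustration.
    part: str
    quantity: int
    unit_price: Decimal

    def __post_init__(self) -> None:
        # The type enforces its own rules at construction time.
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.unit_price < 0:
            raise ValueError("unit_price must not be negative")

    def extended_price(self) -> Decimal:
        """An operation defined over the element itself."""
        return self.quantity * self.unit_price


@dataclass(frozen=True)
class Invoice:
    number: int
    line_items: Tuple[LineItem, ...]

    def total(self) -> Decimal:
        # Start from Decimal zero so an empty invoice still yields a Decimal.
        return sum((item.extended_price() for item in self.line_items), Decimal("0"))
```

For example, `Invoice(1, (LineItem("bolt", 10, Decimal("1.50")),)).total()` is computed from the items themselves, and a LineItem with a quantity of zero cannot be constructed at all.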
> > > This is where simpler means don't destroy the properties of the invoice in
> > > order to make the data fit into an arbitrary data model with tautological
> > > axioms and theorems.
> >
> > Tautological? Arbitrary? Any logical model is arbitrary; an invoice has no
> > shape, or at least none beyond that of a piece of paper, and as I've said,
> > if all they want to do is store the invoice, let's scan the thing into a JPG
> > and be done with it.
>
> No, the data needs to be available to other entities as well, as you pointed
> out.
Sure, I was being facetious - so there are 2 questions:
1. What is the nature of the "other entities" that will need to use the data?
2. In what form does the data need to be to provide those entities with easy access, and even to make those entities easy to develop?
> > "Making the data fit" is also nonsense; whatever physical and logical model
> > you choose, you're pushing the data into something. You can either push it
> > into something with maximum power or a lesser degree of power. Perhaps you
> > gain short-term efficiency; in my experience with XML, you gain squat.
> >
> > > Keep the business objects as close to what they are.
> >
> > So forgetting an invoice for a moment, what "is" a paint color? A paint
> > formula? A carmaker code? A digital certificate store? What's their "natural
> > form"?
>
> It is relational folks who become democratic about this and start thinking
> about understanding the nature of any particular noun outside of its use in
> "this" context. Define it based on its use and if a new use comes up,
> redefine it if necessary, otherwise add qualifiers to it.
> > That can do what - model arbitrary data in its "natural form", whatever that
> > means? I agree. If you show that to me, I'll use it.
>
> as entities. Still working on how to show it.
I'm getting the idea.
> > I hope so - that would be nice. I think XPath and XQuery, while convoluted,
> > are reasonable enough operators over an XML type / type generator. I just
> > see far more benefit from the structures and declarative constraints of
> > relational.
>
> Have you found that when you map from xml to relational, you don't need to
> add anything to the information in your source, but when you go the other
> direction, you need to add data (such as ordering)?
Most of the XML I deal with requires no ordering, so that's a wash. I think XML is a relatively poor notation for anything requiring explicit ordering, but that's just my gut feel. Usually I find hints that the XML designer really wanted relational; they've got IDs and IDREFs, and then in the code they're manually coding searches through the hierarchy - which is where an in-memory RDBMS would be nice. It's not so horrid now, but in this industry (the print industry), there's a standard called JDF that is currently manifesting itself as a 1.36 MB set of XML Schema specs. Needless to say, there are LOTS of cross-links, and regardless of the storage technology, relations would have helped break this down considerably... even with the ordering requirements (which exist, but are far less numerous than the cross-linked references to node IDs).
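
To sketch the ID/IDREF point: the snippet below, in Python, loads a small cross-linked XML document into an in-memory SQLite database and answers a cross-link question with a join instead of a hand-coded walk through the hierarchy. The element and attribute names are invented for illustration and are not taken from JDF; the table layout is likewise an assumption.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up document: steps point at resources by ID, IDREF-style.
DOC = """
<job>
  <resources>
    <paper id="r1" grade="coated"/>
    <paper id="r2" grade="uncoated"/>
  </resources>
  <steps>
    <step name="print-cover" uses="r1"/>
    <step name="print-body" uses="r2"/>
    <step name="reprint-cover" uses="r1"/>
  </steps>
</job>
"""


def load(doc: str) -> sqlite3.Connection:
    """Parse the XML and copy its cross-linked parts into two relations."""
    root = ET.fromstring(doc)
    db = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("CREATE TABLE resource (id TEXT PRIMARY KEY, grade TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE step (name TEXT PRIMARY KEY, "
        "uses TEXT NOT NULL REFERENCES resource(id))"
    )
    # Resources go in first so that every step's reference can be checked.
    db.executemany(
        "INSERT INTO resource VALUES (?, ?)",
        [(p.get("id"), p.get("grade")) for p in root.iter("paper")],
    )
    db.executemany(
        "INSERT INTO step VALUES (?, ?)",
        [(s.get("name"), s.get("uses")) for s in root.iter("step")],
    )
    return db


def steps_using_grade(db: sqlite3.Connection, grade: str) -> list:
    """Follow the cross-link declaratively: which steps use paper of this grade?"""
    rows = db.execute(
        "SELECT step.name FROM step JOIN resource ON step.uses = resource.id "
        "WHERE resource.grade = ? ORDER BY step.name",
        (grade,),
    )
    return [name for (name,) in rows]
```

With the foreign key enforced, a step that references a nonexistent resource ID is rejected at load time rather than surfacing later as a failed lookup.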
- erk