
Re: Why doesn't Oracle care about Linux as IBM does?

From: leebert <leebertarian_at_yahoo.com>
Date: Fri, 17 Aug 2001 03:22:47 -0500
Message-ID: <3B7CD457.ED96D293@yahoo.com>


Mark Townsend wrote:

> * Data to be partitioned - requiring the addition of a partition key -
> resulting in extensive DDL changes to many tables and indexes. In some
> cases, the ability to use secondary indexes is also lost, and constraint
> checking across a table can also become cumbersome.

Case specific. So far we've suffered only minor changes to DDL & schema, if any; the occasional gotcha is matching the partitioning key to other constraints. And yes, we've checked our key partitioning and most everything is staying colocated.
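Roughly what the DDL change looks like (a hedged sketch in DB2 EEE v7-era DDL; table, column & tablespace names are made up). The gotcha is that the partitioning key has to be part of any unique or primary key, so mostly you end up widening a key rather than restructuring the table:

    -- hypothetical ORDERS table: CUSTOMER_ID is the partitioning key, so it
    -- also has to appear in the primary key
    CREATE TABLE ORDERS (
      ORDER_ID     INTEGER NOT NULL,
      CUSTOMER_ID  INTEGER NOT NULL,
      ORDER_DATE   DATE    NOT NULL,
      PRIMARY KEY (ORDER_ID, CUSTOMER_ID)
    )
    IN TS_ORDERS
    PARTITIONING KEY (CUSTOMER_ID) USING HASHING;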

> * Co-location of commonly joined tables on the same node to avoid
> cross-node joins and data shipping - imposing a design limitation on the
> physical schema and data placement. For instance, sales data may need to
> be partitioned on product id simply to manage data placement, rather than
> on sales date, which is a more useful partitioning scheme for data
> management (archiving etc) and query optimization.

The basic hash algorithm works well w/in a logical group of tables, and if the schema uses natural keys (instead of artificially/sequentially generated ones), so much the better. There's also the ability to set a special partitioning map (a large, 4096-entry table of hash-bucket-to-node assignments) bound to a nodegroup, to get the exact clustering you want.
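A sketch of the colocation part (same caveats: names are made up, DB2 EEE v7-era syntax assumed): tables hashed on compatible keys into tablespaces of the same nodegroup keep their joins local, and a custom target map can be fed to REDISTRIBUTE if you need exact placement:

    -- CUSTOMER hashes on the same key as the hypothetical ORDERS table above;
    -- TS_CUST and TS_ORDERS are assumed to sit in the same nodegroup, so
    -- joins on CUSTOMER_ID never leave the node
    CREATE TABLE CUSTOMER (
      CUSTOMER_ID  INTEGER NOT NULL PRIMARY KEY,
      NAME         VARCHAR(40)
    )
    IN TS_CUST
    PARTITIONING KEY (CUSTOMER_ID) USING HASHING;

    -- the 'special partition map': a 4096-entry target map file (name made up)
    -- handed to REDISTRIBUTE to pin hash buckets to particular nodes
    REDISTRIBUTE NODEGROUP NG_OLTP USING TARGETMAP ng_oltp.map;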

> * Physical replication of tables that cannot be partitioned (i.e. a
> node-specific table periodically refreshed from the data in a 'master'
> table) - typically this approach is limited to read-only tables, or tables
> that can bear some staleness in their data, due to the cost of a 2-phase
> commit across two or more tables on two or more nodes (and this cost
> increases exponentially as more and more nodes are added to the cluster)

If I understand you correctly, the answer to that is that automated summary tables can be used to replicate a table across nodes. Yes, it isn't transparent, but I can't see how it creates an exponential cost.
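What I have in mind is a replicated summary table, roughly like this (a sketch; PRODUCT and the other names are made up, and I'm assuming the v7-era REPLICATED clause): a small, read-mostly table gets a full copy on every node of the nodegroup, so joins to it never cross nodes, and you refresh it on whatever schedule the staleness allows:

    -- full copy of the lookup table on each node; no 2-phase commit involved,
    -- just a periodic refresh
    CREATE SUMMARY TABLE PRODUCT_COPY AS
      (SELECT PRODUCT_ID, PRODUCT_NAME FROM PRODUCT)
      DATA INITIALLY DEFERRED REFRESH DEFERRED
      IN TS_ORDERS
      REPLICATED;

    REFRESH TABLE PRODUCT_COPY;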

> * To optimise transaction shipping (called local bypass), the application
> often needs to be changed to look up the partition map in order to route
> the right transaction to the right node.

DB2 will do this internally. Maybe not the most optimal, but it beats custom-coding the application if you can avoid it. Yes, it adds overhead to DB2, but if it's a busy-but-not-massive OLTP application, switched Gig Ethernet can absorb a good deal of that cost.

The real answer is, it depends.
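If you do want to see where rows land without touching the application, the partitioning functions will tell you (a sketch, reusing the hypothetical ORDERS/CUSTOMER tables from above):

    -- node and partitioning-map index each order row hashes to
    SELECT ORDER_ID,
           NODENUMBER(ORDER_ID) AS NODE_NUM,
           PARTITION(ORDER_ID)  AS PMAP_INDEX
    FROM   ORDERS
    FETCH FIRST 10 ROWS ONLY;

    -- quick colocation check: joined rows whose halves sit on different
    -- nodes (should come back zero if the keys colocate)
    SELECT COUNT(*)
    FROM   ORDERS O, CUSTOMER C
    WHERE  O.CUSTOMER_ID = C.CUSTOMER_ID
      AND  NODENUMBER(O.ORDER_ID) <> NODENUMBER(C.CUSTOMER_ID);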

> The same approach is required for data load etc, and backup/restore
> procedures will also need to be adapted to become node specific

Loading data has limitations; the autoloader tool helps greatly.

Backup/restores already tend to be tablespace-specific; in most cases they would only be slightly different in a node-specific context.
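Roughly what a node-specific backup looks like from the CLP (a sketch; the database name, node numbers and paths are made up; DB2NODE picks which node the session attaches to):

    -- repeat per node, catalog node (0) first; only DB2NODE and the target
    -- directory change
    export DB2NODE=0
    db2 TERMINATE
    db2 "BACKUP DATABASE SALES TABLESPACE (TS_ORDERS) TO /backup/node0"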

> Using EEE for OLTP
> requires some significant hoops to be jumped through.

Ask me in 3 months how it turns out. So far we are running 80 complex txn's/sec on a crippled Intel test cluster w/ a baby SAN device.

There have been some minor hoops so far.

> The speed-of-light issues of using a cluster are there - TANSTAAFL - the
> 'cost' of managing transactions across more than one machine simply cannot
> be avoided.

No doubt the transparency of OPS & DB2 MVS is nicer, but the management overheads of OPS and mainframes are notorious. Pick your trade-offs.

EEE inter-node overhead notwithstanding, if the work can be spread out sufficiently by chain-ganging an optimal number of machines, then txn-forwarding & cross-node txn overheads can be kept tolerable, assuming decent colocation and well-chosen partitioning keys. OLTP tends not to span logical groups much: a small hit here, a small hit there, and the data flow tends to follow functional groups. Yes, there will be cross-node 2-phase commits where the partitioning keys cross functional groups, but it's not as if there will be cartesian products all day long.

Also, part of the answer is switched Gig Ethernet and keeping the number of physical nodes optimal relative to data volume & traffic patterns, so that the overhead of inter-node work isn't disproportionate to the volume of data. Back to decent colocation, and building a pmap if you have to.

Understood that EEE will scale further on a given hardware set for read-only OLAP (with increasing pressure for the customizations you cited?). With OLAP, though, the inter-node 2-phase txns aren't so much the source of overhead as the aggregates coming back to whatever controlling/collector node is running the query (2-phase sorts or GROUP BYs).

But for *most* OLTP apps, if colocation is generally maintained, the inter-node traffic to data volume ratio shouldn't get out of hand (part of the answer being to avoid mixing heavier OLAP with OLTP, which brings in replication, a completely separate can o' worms...). That said, there are very large SAP installations on EEE, and SAP, being an ERP system, has a very heavy mix of OLAP & OLTP.

Yes, an EEE cluster *is* more complex to manage from one standpoint, but OTOH, Oracle OPS is renowned for being more complex to tune otherwise. Pick your poisons.

To be completely honest, the greatest problem I see is the increasing chance of logical inconsistency across nodes during recoveries as the cluster grows in a busy OLTP environment. Since the risk is hard to assess, I can only imagine it's higher than I'd wish.

/leebert

Received on Fri Aug 17 2001 - 03:22:47 CDT
