
Re: PLS/Excalibur vs. Oracle Text Cartridge

From: Al Landeck <landeck_at_math.ohio-state.edu>
Date: 1998/05/22
Message-ID: <6k4f6h$r28$1@mathserv.mps.ohio-state.edu>

Sorry it has taken me so long to respond; I don't read this group(s) more than twice a week... :-)

In article <355c2370.2584906_at_newshost.us.oracle.com>, Joel R. Kallman <jkallman_at_us.oracle.com> wrote:
>Note that this is *not* a flame. I am simply correcting the
>inaccurate statements contained in this response
>
>On 15 May 1998 00:39:46 GMT, landeck_at_math.ohio-state.edu (Al Landeck)
>wrote:
>
>>Before going any further it is important to note that I am an Informix
>>employee in the DataBlade engineering group.
>>
>>From what I have seen of the ConText middleware (because it is a cartridge
>>in marketing terms only) it seems buggy and slow. Typical of many of these
>
>From what I have personally deployed and personally seen at many
>customer sites, it is very fast and very scalable.

I have my doubts about how scalable this will turn out to be for very large deployments. Part of the reason is the way the Oracle query optimiser has to handle ConText calls. If you perform a join and use a contains() clause in your SQL (contains is a common feature of text/database engines), the optimiser separates out the contains() predicate and sends it across the shared memory interface to the ConText application. That architecture has definite potential for performance problems as the system scales up.
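The argument above is about where the contains() predicate gets evaluated. What follows is a rough illustration only. It is a simplified model of the argument and does not describe how the Oracle optimiser or ConText actually work internally. The Python sketch evaluates the text predicate in a separate "engine" that returns its entire hit list, and only then applies the relational filter. The names `text_engine_contains` and `split_plan_query` and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: int
    body: str


@dataclass(frozen=True)
class Submission:
    doc_id: int
    office: str


def text_engine_contains(docs, term):
    """Hypothetical external text engine: IDs of every document containing term."""
    term = term.lower()
    return {d.doc_id for d in docs if term in d.body.lower().split()}


def split_plan_query(docs, submissions, term, office):
    """Evaluate contains() on its own, then join its hit list with a relational filter.

    Returns (matching IDs, number of IDs shipped back across the engine interface).
    """
    hits = text_engine_contains(docs, term)  # the whole hit list crosses the interface
    shipped = len(hits)
    result = sorted(
        s.doc_id for s in submissions if s.office == office and s.doc_id in hits
    )
    return result, shipped


if __name__ == "__main__":
    docs = [Document(1, "foo bar"), Document(2, "foo baz"),
            Document(3, "qux"), Document(4, "Foo again")]
    subs = [Submission(1, "Columbus"), Submission(2, "Denver"),
            Submission(3, "Columbus"), Submission(4, "Denver")]
    print(split_plan_query(docs, subs, "foo", "Columbus"))  # ([1], 3)
```

In the demo, the answer has one row but three IDs had to be shipped back. In this model, that gap grows with the number of text hits rather than with the size of the final result, which is the shape of the scaling concern raised above.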
>
>>Oracle Data Cartridge solutions is that the data is stored outside the
>>database.
>
>ConText stores *its* data in an Oracle database. The application
>developer has the design option of storing the text information they
>want to index either inside the database (in a number of differing
>column datatypes), or the text could be referenced by a filesystem
>specification or URL.

While you have the option of where the data is stored, any large system is going to need to store its information via filesystem/URL. Certain quantities of data simply exceed the ability of these database systems to store them internally. The trick then becomes bringing data that is stored outside the database under full transactional control. Suppose you have 500,000 text documents of mixed types (MS Word, PDF, HTML, ASCII text), with sizes ranging from 2000 bytes (for smallish text files) to 4 MB (for large Word documents) and a mean somewhere around 500 KB. That gives you a document repository of 1/4 of a terabyte without even trying.
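The repository estimate above works out as follows. The calculation uses decimal units (1 GB = 10^6 KB) and the stated minimum, mean, and maximum document sizes:

```latex
\begin{aligned}
\text{minimum: } & 500{,}000 \times 2000\ \text{bytes} = 10^{9}\ \text{bytes} = 1\ \text{GB} \\
\text{mean: }    & 500{,}000 \times 500\ \text{KB} = 2.5 \times 10^{8}\ \text{KB} = 250\ \text{GB} \approx \tfrac{1}{4}\ \text{TB} \\
\text{maximum: } & 500{,}000 \times 4\ \text{MB} = 2 \times 10^{6}\ \text{MB} = 2\ \text{TB}
\end{aligned}
```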

In this scenario, ConText keeps its indices inside the database and the documents outside it, and it reaches the external files through a shared memory interface. Oracle has a number of low-level triggers that are handled in SPL by the ConText application, which runs side by side with the Oracle database server. This means that if the server crashes in the middle of a transaction doing ConText update or write work, those changes are unlogged and you have lost the transactional integrity of your database.
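The hazard described above is a general one. A write to a file outside the database is not covered by the database's log, so crash recovery rolls back the in-database part of a change but not the file. The toy Python model below shows the resulting mismatch. `ToyDatabase` and `update_document` are hypothetical, and the model does not reproduce ConText's actual recovery behaviour, which this post does not describe in detail.

```python
import os
import tempfile


class ToyDatabase:
    """Hypothetical table with an undo log; external files are NOT covered by it."""

    def __init__(self):
        self.index = {}   # document name -> indexed text (stands in for a text index)
        self._undo = []   # (name, previous value) records for the open transaction

    def begin(self):
        self._undo = []

    def put_index(self, name, text):
        self._undo.append((name, self.index.get(name)))  # logged change
        self.index[name] = text

    def commit(self):
        self._undo = []

    def crash_and_recover(self):
        # Recovery undoes every logged change from the uncommitted transaction.
        for name, old in reversed(self._undo):
            if old is None:
                self.index.pop(name, None)
            else:
                self.index[name] = old
        self._undo = []


def update_document(db, directory, name, text, crash_before_commit):
    """Write a document to an external file, then update the in-database index."""
    db.begin()
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as f:  # external write: not in the undo log
        f.write(text)
    db.put_index(name, text)                      # logged, so recovery can undo it
    if crash_before_commit:
        db.crash_and_recover()
    else:
        db.commit()
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = ToyDatabase()
        p = update_document(db, d, "report.txt", "v1", crash_before_commit=False)
        update_document(db, d, "report.txt", "v2", crash_before_commit=True)
        with open(p, encoding="utf-8") as f:
            print(f.read(), db.index["report.txt"])  # v2 v1
```

After a committed write of `v1` and a crashed write of `v2`, the file on disk holds `v2` while the recovered index still holds `v1`. The index no longer describes the document it points to.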

>
>>As a result, queries don't seem to scale very well and are
>>held up by the CORBA architecture underlying the systems. The other
>
>There is not a CORBA architecture underlying the integration between
>the ConText cartridge and the Oracle server. I have no idea where
>this comment emanated from.

You are correct; I was mistaken. CORBA is the direction cartridges in general are going, and I had assumed that ConText followed that model and was already there.
>
>>problem with ConText is that if you should want to hook up any of the
>>other cartridges with it, you need to purchase the Oracle Universal server
>
>The ConText Cartridge is available as an option to both the Oracle8
>Server and the Enterprise Edition. The Video Cartridge is available
>as an option to both the Oracle8 Server and the Enterprise Edition.
>Check out http://www.oracle.com/st/products/uds/oracle8/.
>
>>that was modified to handle that sort of data. So in reality Oracle has
>>about 6 different Universal Servers. In other words, 2 cartridges, 2 servers
>>and all of the problems that go along with distributed joins and CORBA.
>>
>>The Verity DataBlade seems to be much better in the most recent release.
>>The query performance is roughly 10 times what it was 6 months ago and
>>the early complaints of bugs are now addressed. The PLS datablade only
>>exists for the Illustra server. The other DataBlade to look at is the
>>Excalibur Text DataBlade. You can in fact use both simultaneously on the
>>same data and create both Verity and Excalibur text indices on the columns
>>in question. Each DataBlade exposes a slightly different set of search
>>capabilities. We currently have some newspapers with 100,000+ documents
>>each with query times in the 10 second range.
>
>See:
>
>http://www.oracle.com/st/cartridges/context/html/context_customers.html
>
>to see ConText in action at many publicly available sites (CNN, PR
>Newswire, etc.)
>
>>
>>It is important to remember that if all you are going to be storing is text
>>documents then a straight PLS or Verity or Fulcrum system will probably
>>outperform both ConText and Informix, since that is all it is designed to do.
>>The real power of these systems comes when you are relating different kinds
>>of data in the database together and need to do queries. An example of
>>this could be "Find documents that contain foo and were submitted within
>>100 miles of our sales office". Or another one might be "Find all Video clips
>>that contain Jerry Seinfeld and also show the contracts specifying the rights
>>to that particular clip."
>>
>>I hope this helps
>>Al Landeck
>>landeck_at_informix.com
>>Principal Engineer, Video DataBlade
>>
>>In article <6jegcb$7hg$1_at_apple.news.easynet.net>,
>>Wai dat Chan <waidat_at_flirble.org> wrote:
>>>
>>>Hi all,
>>>
>>>I was wondering if anyone out there has had experience of both
>>>using the Oracle Text Cartridge and either PLS or Verity text
>>>datablade.
>>>
>>>If anyone could provide me with a brief comparison of the two,
>>>or pros and cons/limitations I'd be very grateful.
>>>
>>>We currently use PLS indexing on about 5000 records, each with
>>>about 1k -> 7k worth of text. Speed is a priority.
>>>
>>>
>>>Ta lots,
>>>
>>>
>>>Wd.
>>
>>
>
>Thanks!
>
>Joel
>
>Joel R. Kallman
>Oracle Government, Education, & Health
>Columbus, OH http://govt.us.oracle.com
>jkallman@us.oracle.com http://www.oracle.com
>
>----
>The statements and opinions expressed here are my own
>and do not necessarily represent those of Oracle Corporation.
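The first example query in the quoted post ("Find documents that contain foo and were submitted within 100 miles of our sales office") combines a text predicate with a spatial one. Below is a minimal brute-force Python sketch. It assumes that each document carries the latitude and longitude where it was submitted. That is an assumption, since the post does not say how location is stored. The names `Doc`, `haversine_miles`, and `contains_near` are hypothetical and are not part of any Informix DataBlade or Oracle cartridge API. A real system would use a text index and a spatial index rather than scanning every row.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass(frozen=True)
class Doc:
    doc_id: int
    text: str
    lat: float  # latitude where the document was submitted (assumed attribute)
    lon: float  # longitude where the document was submitted (assumed attribute)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def contains_near(docs, term, office_lat, office_lon, radius_miles=100.0):
    """IDs of documents containing term and submitted within radius_miles of the office."""
    term = term.lower()
    return sorted(
        d.doc_id
        for d in docs
        if term in d.text.lower().split()
        and haversine_miles(d.lat, d.lon, office_lat, office_lon) <= radius_miles
    )
```

For example, suppose the office is in Columbus (39.96, -83.00). Then a "foo" document submitted from Dayton (about 65 miles away) qualifies, while one submitted from Cleveland (about 125 miles away) does not.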
Received on Fri May 22 1998 - 00:00:00 CDT
