Advice on how to query data

From: Monty <mmontreaux_at_hotmail.com>
Date: 26 Mar 2005 04:07:14 -0800
Message-ID: <1111838834.904356.245630@l41g2000cwc.googlegroups.com>

Hi, I'm posting to the newsgroup in an attempt to formulate ideas on how to tackle the poor performance of an SQL query I have written. I already have an SQL solution but what I am interested in is one that is far better. I will elaborate more below but here is the project background first.

The organisation I am contracting for is using a 3rd party customer relationship management (CRM) system. This is how it is used. Contact can be made with a customer or prospect, for example customer 100
"Customer A Ltd" (see code below). There are a number of different
kinds on contact - first contact may be made (ContactActID=1), the customer may be phoned (ContactActID=5), the customer may phone the organisation (ContactActID=17), a saleman may call them (ContactActID=51), a contract may be issued (ContactActID=400), a non disclosure agreement may be signed (ContactActID=105), they may phone to ask to be taken off a mailing list (ContactActID=43), mail may be returned "Recipient Unknown" (ContactActID=26) etc. There are currently 46 different ContactActID's but the system has provision for a total possible number 600. The CRM system is built around Lotus notes but replication has been configured into an Oracle database. The applications that are being built on top of this information use the data in Oracle database (and not Lotus Notes).

Every time contact is made, this is recorded through Lotus Notes. In the Oracle database, the basics of what is replicated is a new row in table "contactActs". Table "contactActs" has a mandatory m:1 participation condition with table "contacts" and implemented through a NOT NULL FK constraint. Although things are far more complex than this, for the purposes of a testbed and explanation here, this is the story. Here is some typical code to to create the two necessary tables and some dummy data for testing (this has been produced on Solaris SPARC running 10g [Enterprise Edition] (EE) but the production system currently uses 9i EE/same hardware).

CREATE TABLE contacts(ID NUMBER PRIMARY KEY, orgName VARCHAR2(40)); CREATE TABLE contactActs(ID NUMBER

CONSTRAINT FK_Contacts REFERENCES contacts(ID),

ContactActID NUMBER NOT NULL);

INSERT INTO contacts
VALUES(100,'Company A Ltd');
INSERT INTO contacts
VALUES(101,'Organisation B plc');
INSERT INTO contacts
VALUES(102,'Organisation C plc');
INSERT INTO contacts
VALUES(103,'Organisation D partnership');

--create a whole lot of dummy data
--to test query efficiency
DECLARE
randActivity NUMBER;
randContact NUMBER;
BEGIN
DBMS_RANDOM.SEED(TO_CHAR(SYSDATE,'sshhmm')); FOR noActivities IN 1..100000 LOOP
randContact:=TRUNC(DBMS_RANDOM.VALUE(100, 103.5)); randActivity:=ROUND(DBMS_RANDOM.VALUE(1,600));

  IF NOT (randContact=103 AND randActivity=51) THEN    INSERT INTO contactActs
     VALUES(randContact,randActivity);
  END IF; END LOOP;
END;
/

CREATE INDEX ixcontactActs
ON contactActs(id,ContactActID);

Although I do not have the sourcecode of the CRM system that populates the Oracle database, I am (because I am one of the developers) in control of the system that queries the information replicated within the Oracle database and can therefore have all the sourcecode to optimize these SQL queries in this part of the system. Part of the enhanced system is the ability to search this contact activity information on an ad-hoc basis over a corporate intranet (ie someone is performing a search on the CRM system data in Oracle using the components we are writing to find out who has made customer contact & when, get the full story of contact activity prior to picking up the phone and so on). There are options in this component of the system to enable the person to filter down the resultset. Without going into the detail, there may be a requirement to (which demonstrates this filtering) :-

Give me the organisation name for all customers where the customer has phoned us (ContactActID=17) and also a salesman has called (ContactActID=51). In the real system we would retrieve other details such as dates, salesman notes etc & display this in a browser but this is outside the scope of this NNTP posting.

In terms of SQL, the way this could be coded is (where 17, and 51 were appropriate bind variables for some otherwise static SQL).

SELECT orgName
FROM contactActs acts1,contactActs acts2,contacts con1 WHERE con1.ID=acts1.ID

     AND con1.ID=acts2.ID
     AND acts1.contactActID=17
     AND acts2.contactActID=51
      GROUP BY orgName;

When run against the dummy data created above, this query gives the required result (there will almost certainly be other ContactActID's other than 17 and 51 and this is okay as long as the query returns hits with both). I could use the same statement for ContactActID=43 and 26 or any other two valid ContactActID's without Oracle having to reparse a different bit of SQL. The use of bind variables is critically important because of the often slow query time we are observing and heavy overhead on our database server in addition to just good design. This is also documented many times in the Oracle docs and very clearly in Kyte/Expert One to One Oracle where he writes (p436) "We have seen for example that by not using bind variables, you might spend 90 percent of your execution time parsing queries instead of actually running them....".

The problem I have is that the user of the system may not select 2 (in the above example 17 and 51) different ContactActID to query, but 3, or 4, etc. Here I would have to construct a new SQL query to accomodate 3 ContactActID, or 4, or 5, or 6 etc thereby having to reparse different SQL statements all the time.

For example, if the query were performed for customer 100 where there were ContactActID's 17, 51, 24, 54, 43, and 44 (where all these numbers were bind variables). I could use the following similar SQL. Of course it would need to be reparsed on the database server.

SELECT orgName
FROM contacts con,contactActs acts1,contactActs acts2,

        contactActs acts3,contactActs acts4,
        contactActs acts5,contactActs acts6
    WHERE con.ID=100
     AND con.ID=acts1.ID
     AND con.ID=acts2.ID
     AND con.ID=acts3.ID
     AND con.ID=acts4.ID
     AND con.ID=acts5.ID
     AND con.ID=acts6.ID
     AND acts1.contactActID=17
     AND acts2.contactActID=51
     AND acts3.contactActID=24
     AND acts4.contactActID=54
     AND acts5.contactActID=43
     AND acts6.contactActID=44
      GROUP BY orgName;

If a query was performed with a different 6 (not 17, 51, 24 etc) contactActID's, the server wouldn't have to parse the SQL statement again and I would get the performance benefit from this. Not unsurprisingly however, queries like the one above are VERY SLOW and not suitable for a production environment (often taking tens of seconds using test data typical of what I have created above). Further, if the user decides to query 30 fields and not 6 as shown above, things have clearly gotton out of hand and would start to effect the performance on other systems. Having explained this in this much detail, I write to ask is a better approach to querying information structured like this. Although I do not have control of the structures of tables
"contactActs" and "contacts", I have control over most everything else
so could structure that for maximum efficiency. I just need advice on alternate solutions.

Other solutions that has been suggested in the organisation, are:

Put a trigger on table "contactActs" to populate a third table containing the contact activity information with 600 columns, with 600 indexes, one for each contactActID, and using a huge SQL query to query each column in turn populating the values via bind variables etc. Not elegant. There is no objection to restructuring the data to make the query efficient. The information would have to be sourced from the tables "contactActs" and "contacts" (via triggers etc) but the structure of these two tables cannot change.
A suggestion was made to use the Analytic Functions. We do not currently have significant expertise with Analysic Functions but after a brief investigation concluded these would not aid us here anyway.

Further, over the coming 3 years, we wish to move away from the Lotus Notes system and do not want to currently further enhance integration with it (that is I do not want to query the Lotus Notes database directly to retrieve the customer contact history).

Ideas for other or more performant options anyone? And thank you for reading this far.
Thank you
Monty Received on Sat Mar 26 2005 - 06:07:14 CST