Home » RDBMS Server » Performance Tuning » Perfomance problem with select distinct from cartesian join with OR-ed filter (Production: (11.2.0.1, Windows Server x64); Test: (9.2.01, Windows XP))
Perfomance problem with select distinct from cartesian join with OR-ed filter [message #522536] |
Mon, 12 September 2011 04:21 |
|
malishich
Messages: 8 Registered: September 2011 Location: ALA
|
Junior Member |
|
|
Hello all.
Having production system: 11.2.0.1 on Windows Server x64
Test system: 9.2.0.1 on Windows XP
Problem preface: to get all unique CASEID which should be checked up by biometric system.
What i should check - all CASEs for different PERSONs having same PHONEs at least among one phone type (1..4).
Real table contains little bit more than 10 million records.
I made test scripts.
Below the DDL for test table creation:
------------------------------------------
-- Create CASEINFO test table
------------------------------------------
DROP TABLE CASEINFO;
CREATE TABLE CASEINFO
(
CASEID INTEGER,
PERSON INTEGER,
SEX INTEGER,
FED INTEGER, REG INTEGER,
PHONE1 VARCHAR2(24),
PHONE2 VARCHAR2(24),
PHONE3 VARCHAR2(24),
PHONE4 VARCHAR2(24)
);
ALTER TABLE CASEINFO ADD PRIMARY KEY (CASEID) USING INDEX;
CREATE INDEX I_PERSON ON CASEINFO (PERSON);
CREATE INDEX I_SEX ON CASEINFO (SEX);
CREATE INDEX I_FED ON CASEINFO (FED);
CREATE INDEX I_REG ON CASEINFO (REG);
CREATE INDEX I_FED_REG ON CASEINFO (FED,REG);
CREATE INDEX I_PHONE1 ON CASEINFO (PHONE1);
CREATE INDEX I_PHONE2 ON CASEINFO (PHONE2);
CREATE INDEX I_PHONE3 ON CASEINFO (PHONE3);
CREATE INDEX I_PHONE4 ON CASEINFO (PHONE4);
CREATE INDEX I_PHONES_12 ON CASEINFO (PHONE1,PHONE2);
CREATE INDEX I_PHONES_13 ON CASEINFO (PHONE1,PHONE3);
CREATE INDEX I_PHONES_14 ON CASEINFO (PHONE1,PHONE4);
CREATE INDEX I_PHONES_23 ON CASEINFO (PHONE2,PHONE3);
CREATE INDEX I_PHONES_24 ON CASEINFO (PHONE2,PHONE4);
CREATE INDEX I_PHONES_34 ON CASEINFO (PHONE3,PHONE4);
Below i've put SQL/DLL to make test data.
Be aware, number of records inserted 2 millions.
If you need other amount # for test, please set
PERSON_COUNT := #/8;
------------------------------------------
-- fill CASEINFO with sample data
------------------------------------------
DECLARE
I INTEGER;
R1 INTEGER;
R2 INTEGER;
N INTEGER;
M INTEGER;
OBJECTS_PER_PERSON INTEGER;
APERSON INTEGER;
PERSON_COUNT INTEGER;
CRIMINAL_PERCENT INTEGER;
RNDSEED NUMBER(20);
APHONE VARCHAR2(24);
BEGIN
OBJECTS_PER_PERSON := 8;
PERSON_COUNT := 250000;
CRIMINAL_PERCENT := 3;
N := OBJECTS_PER_PERSON * PERSON_COUNT;
DBMS_OUTPUT.PUT_LINE('Record count '||TO_CHAR(N)||' ...');
FOR I IN 0..N-1 LOOP
APERSON := MOD(I, PERSON_COUNT);
INSERT INTO CASEINFO
(
CASEID,
PERSON,
SEX,
FED,
REG,
PHONE1,
PHONE2,
PHONE3,
PHONE4
)
VALUES
(
I+1,
APERSON+1,
MOD(APERSON,2),
MOD(I,13)+1,
MOD(I,97)+1,
'+7727'||
TRIM(TO_CHAR(APERSON+1,'0000009'))||'1',
'+7727'||
TRIM(TO_CHAR(APERSON+1,'0000009'))||'2',
'+7727'||
TRIM(TO_CHAR(APERSON+1,'0000009'))||'3',
'+7727'||
TRIM(TO_CHAR(APERSON+1,'0000009'))||'4'
);
IF MOD(I,1000)=0 THEN
COMMIT;
--DBMS_OUTPUT.PUT_LINE('Done '||TO_CHAR(I)||' ...');
END IF;
END LOOP;
COMMIT;
------------------------------------------
-- Lets make "CRIMINAL" CASEs among which we're searching for
--on 11.2.0.1 next line isn't working
--SELECT hsecs INTO RNDSEED FROM gv$timer;
DBMS_RANDOM.INITIALIZE(131071);
M := TRUNC(N * CRIMINAL_PERCENT / 100, 0);
DBMS_OUTPUT.PUT_LINE('Criminal count '||TO_CHAR(M)||' ...');
FOR I IN 0..M-1 LOOP
--Phone1 => Phone2
R1 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
R2 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
APHONE := '+7-CRIMINAL-'||
TRIM(TO_CHAR(R1,'0000009'))||'-12';
UPDATE CASEINFO SET PHONE1=APHONE WHERE (CASEID=R1);
UPDATE CASEINFO SET PHONE2=APHONE WHERE (CASEID=R2);
--Phone1 => Phone3
R1 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
R2 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
APHONE := '+7-CRIMINAL-'||
TRIM(TO_CHAR(R1,'0000009'))||'-13';
UPDATE CASEINFO SET PHONE1=APHONE WHERE (CASEID=R1);
UPDATE CASEINFO SET PHONE3=APHONE WHERE (CASEID=R2);
--Phone1 => Phone4
R1 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
R2 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
APHONE := '+7-CRIMINAL-'||
TRIM(TO_CHAR(R1,'0000009'))||'-14';
UPDATE CASEINFO SET PHONE1=APHONE WHERE (CASEID=R1);
UPDATE CASEINFO SET PHONE4=APHONE WHERE (CASEID=R2);
--Phone2 => Phone3
R1 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
R2 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
APHONE := '+7-CRIMINAL-'||
TRIM(TO_CHAR(R1,'0000009'))||'-23';
UPDATE CASEINFO SET PHONE2=APHONE WHERE (CASEID=R1);
UPDATE CASEINFO SET PHONE3=APHONE WHERE (CASEID=R2);
--Phone2 => Phone4
R1 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
R2 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
APHONE := '+7-CRIMINAL-'||
TRIM(TO_CHAR(R1,'0000009'))||'-24';
UPDATE CASEINFO SET PHONE2=APHONE WHERE (CASEID=R1);
UPDATE CASEINFO SET PHONE4=APHONE WHERE (CASEID=R2);
--Phone3 => Phone4
R1 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
R2 := MOD(ABS(DBMS_RANDOM.RANDOM), N)+1;
APHONE := '+7-CRIMINAL-'||
TRIM(TO_CHAR(R1,'0000009'))||'-34';
UPDATE CASEINFO SET PHONE3=APHONE WHERE (CASEID=R1);
UPDATE CASEINFO SET PHONE4=APHONE WHERE (CASEID=R2);
--
COMMIT;
END LOOP;
COMMIT;
DBMS_RANDOM.TERMINATE;
END;
Below SQL select to check the data in created table.
------------------------------------------
-- Check test data counters
------------------------------------------
SELECT 'TOTAL',count(*) from CASEINFO
UNION ALL
SELECT 'LEGAL',count(*) from CASEINFO where
(PHONE1 = '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'1') AND
(PHONE2 = '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'2') AND
(PHONE3 = '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'3') AND
(PHONE4 = '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'4')
UNION ALL
SELECT 'CRIMINAL',count(*) from CASEINFO where
(PHONE1 <> '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'1') OR
(PHONE2 <> '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'2') OR
(PHONE3 <> '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'3') OR
(PHONE4 <> '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'4')
UNION ALL
SELECT 'PHONE1',count(*) from CASEINFO where
(PHONE1 <> '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'1')
UNION ALL
SELECT 'PHONE2',count(*) from CASEINFO where
(PHONE2 <> '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'2')
UNION ALL
SELECT 'PHONE3',count(*) from CASEINFO where
(PHONE3 <> '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'3')
UNION ALL
SELECT 'PHONE4',count(*) from CASEINFO where
(PHONE4 <> '+7727'||
TRIM(TO_CHAR(PERSON,'0000009'))||'4');
The PROBLEM is that i am experiencing HUGE perfomance problems on both test and production systems with that query:
select distinct b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and
(
(a.phone1=b.phone1) or
(a.phone1=b.phone2) or
(a.phone1=b.phone3) or
(a.phone1=b.phone4) or
(a.phone2=b.phone1) or
(a.phone2=b.phone2) or
(a.phone2=b.phone3) or
(a.phone2=b.phone4) or
(a.phone3=b.phone1) or
(a.phone3=b.phone2) or
(a.phone3=b.phone3) or
(a.phone3=b.phone4) or
(a.phone4=b.phone1) or
(a.phone4=b.phone2) or
(a.phone4=b.phone3) or
(a.phone4=b.phone4)
);
This query takes almost 90 minutes to execute.
And i do not know how to avoid this.
Full SQL file to make test attached.
Any help appreciated.
Even data restructurization.
|
|
|
|
|
|
|
Re: Perfomance problem with select distinct from cartesian join with OR-ed filter [message #522553 is a reply to message #522550] |
Mon, 12 September 2011 06:30 |
|
malishich
Messages: 8 Registered: September 2011 Location: ALA
|
Junior Member |
|
|
According my test data actually i will have in worst case (because of usage DBMS_RANDOM.RANDOM) 720 000 UNIQUE records.
On small data amount (10 000 records) it works great.
Even - i've made new same table from real and copied into it first 20 000 records - work great with DISTINCT.
But for whole table of 10 million records - it is awesome...
At now guys recommended me to split my query to
simple small queries and make UNION or results.
Code below:
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone1=b.phone1)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone1=b.phone2)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone1=b.phone3)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone1=b.phone4)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone2=b.phone1)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone2=b.phone2)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone2=b.phone3)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone2=b.phone4)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone3=b.phone1)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone3=b.phone2)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone3=b.phone3)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone3=b.phone4)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone4=b.phone1)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone4=b.phone2)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone4=b.phone3)
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone4=b.phone4)
Above query work MUCH better (execution was done just now for almost 175 seconds) and it returned 286 083 UNIQUE records.
But i need faster... sure... we didn't stop on half a way!...
|
|
|
|
Re: Perfomance problem with select distinct from cartesian join with OR-ed filter [message #522556 is a reply to message #522553] |
Mon, 12 September 2011 06:36 |
cookiemonster
Messages: 13958 Registered: September 2008 Location: Rainy Manchester
|
Senior Member |
|
|
This:
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone1=b.phone2)
Will return exactly the same data as:
UNION
select b.caseid
from CASEINFO a, CASEINFO b
where (a.person<>b.person) and (a.sex=b.sex) and (a.phone2=b.phone1)
So you've got more checks than you need.
However I'd change the data model to third normal form and have a seperate table for phone numbers. That'll simplify the sql further and most likely speed it up.
|
|
|
Re: Perfomance problem with select distinct from cartesian join with OR-ed filter [message #522557 is a reply to message #522556] |
Mon, 12 September 2011 06:47 |
|
malishich
Messages: 8 Registered: September 2011 Location: ALA
|
Junior Member |
|
|
I think (a.phone1=b.phone2) and (a.phone2=b.phone1) are NOT same, because selecting always b.caseid. But i need all CASEs.
So if i have this data (assume other checks done):
CASEID, PHONE1, PHONE2
1, 1_1, 1_2
2, FOO, 2_2
3, 3_1, 3_2
4, 4_1, FOO
5, 5_1, 5_2
I should get both CASEID: 2 and 4
|
|
|
|
|
|
Re: Perfomance problem with select distinct from cartesian join with OR-ed filter [message #522722 is a reply to message #522706] |
Tue, 13 September 2011 05:52 |
|
malishich
Messages: 8 Registered: September 2011 Location: ALA
|
Junior Member |
|
|
I had checked this your query on 2 million records (casedata 2 million, casephone 8 millions, because of 4 phones for each person). Results correct, but...
Your query takes 10 minutes to execute on 11.2.0.1
The fastest solution I found at now is,
at first, make single special data table.
In this table all fields NON UNIQUE (but indexed).
CREATE TABLE CASELIST
(
CASEID INTEGER,
PERSON INTEGER,
SEX INTEGER,
PHONE VARCHAR2(24)
);
CREATE INDEX I_CASELIST_CASEID ON CASELIST (CASEID);
CREATE INDEX I_CASELIST_PERSON ON CASELIST (PERSON);
CREATE INDEX I_CASELIST_SEX ON CASELIST (SEX);
CREATE INDEX I_CASELIST_PHONE ON CASELIST (PHONE);
And the select (which executes 76 seconds):
select c.CASEID
from CASELIST c,
(
select b.PHONE
rank() over
(partition by a.PHONE order by a.PERSON) RNK
from CASELIST a,
(
select PHONE
from CASELIST
having count(*)>1
group by PHONE
) b
where (a.PHONE = b.PHONE)
) d
where (d.RNK=2) and (d.PHONE=c.PHONE)
That the situation with cartesian joins in all SQL engines.
Actually i've checked up this on Oracle, SQL Server, PostgresSQL,MySQL,Firebird.
Only oracle can give me such speed.
Others cant (or even cant execute due to SQL lang limitations).
Searching more... Need 30 seconds... at
|
|
|
|
Goto Forum:
Current Time: Mon Nov 25 04:09:25 CST 2024
|