Home » Server Options » Text & interMedia » Oracle Text - wildcard expansion too many rows
Oracle Text - wildcard expansion too many rows [message #312660] Wed, 09 April 2008 10:00 Go to next message
SaraC
Messages: 81
Registered: August 2005
Member
Just typed a big message and lost it all so will try to keep this shorter.

Using Oracle Text 10g, implementing a CONTAINS search. Have created a stoplist using ctx.ddl however it doesnt get any terms added to it since we want to search on all terms including stopwords. so all terms are indexed for searching. ctxsys.context Index has been created, with printjoins specified so special chars are indexed too.

I want to search for all terms in any order.
Trying to run a search using
CONTAINS(term, '(%pain% AND %in% AND %ear%)
gets the message Wildcard query expansion returns too many results error - due to the use of 'in' (ive tried the other terms without in it returns results.

'In' matches a very large number of terms in our dataset >15000 ( there is no max rows defined explicitly as I thought the default was 5000)
The wierd thing is that when I use
CONTAINS(term, '(%pain% AND %the% AND %chest%)
where %the% on its own matches well over 15000 records, a result is returned with no error.
Is this a problem with %in%?
Thanks for any info
Sara
Re: Oracle Text - wildcard expansion too many rows [message #312751 is a reply to message #312660] Wed, 09 April 2008 13:03 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
In Oracle 10g, the default value for wildcard_maxterms is 5,000 and the maximum that you can set it to is 15,000. In Oracle 11g, the default is 20,000 and the maximum is 50,000. So, until you upgrade, all you can do is set it to 15,000 and find a way to narrow your search terms, perhaps settling for adding "in" as a stopword.


Re: Oracle Text - wildcard expansion too many rows [message #312991 is a reply to message #312751] Thu, 10 April 2008 05:18 Go to previous messageGo to next message
SaraC
Messages: 81
Registered: August 2005
Member
Thanks for your reply Barbara.
I just checked and the database that the Oracle Text query is running on is on 11g now and still with no wildcard_maxterms set, when creating the preferences etc.
Ive checked the tokens created in the index

IN has more than 20000 tokens
CONTAINS(term, '(%in%)', 1) > 0
returns an error - wildcard expansion too many terms

OF has more than 50000 tokens
CONTAINS(term,'(%of%)',1) > 0
returns more than 50,000 records


CONTAINS(term, '(%pain% AND %in% AND %ear%)', 1) > 0
I expect to return only 2or 3 rows but get the expansion error

CONTAINS(term, '(%pain% AND %of% AND %back%)', 1) > 0
Where the %of% has more tokens than the 50,000 max
Including the above statement in my query still manages to return 6 records.

I cant see how the %of% query can return records where part of the search text has to process more than the max records, but the %in% query cant
I presume it could be because '%in%' is part of more words in my dataset that '%of%' is.

Thx
Sara
Re: Oracle Text - wildcard expansion too many rows [message #313031 is a reply to message #312991] Thu, 10 April 2008 07:46 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
You should set the wildcard_maxterms to the maximum by creating a wordlist preference, setting the wildcard_maxterms attribute, and using the wordlist in your index parameters:

BEGIN
  CTX_DDL.CREATE_PREFERENCE ('your_wordlist', 'BASIC_WORDLIST');
  CTX_DDL.SET_ATTRIBUTE ('your_wordlist', 'WILDCARD_MAXTERMS', 50000);
END;
/
CREATE INDEX your_index ON your_table (term)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('WORDLIST your_wordlist')
/


The expansion is determined by the number of distinct tokens in your index:

 
SELECT COUNT (DISTINCT token_text) FROM dr$your_index$i WHERE INSTR (token_text, 'IN') > 0
/


What you are discovering is why stoplists exist. Why not just use the default stoplist, with the common words like "of" and "in" as stopwords? That would solve your problem. Do you really expect to get different results searching for '%pain% and %ear%' instead of '%pain% and %in% and %ear%';

Re: Oracle Text - wildcard expansion too many rows [message #313917 is a reply to message #312660] Tue, 15 April 2008 03:07 Go to previous messageGo to next message
SaraC
Messages: 81
Registered: August 2005
Member
Thanks Barbara

We have set the wildcard_maxterms to the maximum of 50000 which allows the search to return results.


I realise that the error could still be returned depending on the terms in the database and the search terms, but its less likely now. So its resolved - for now.

I know it would be easier to use the stoplist - but the requirements say I have to search on stopword terms.


Thanks
Re: Oracle Text - wildcard expansion too many rows [message #318249 is a reply to message #312660] Tue, 06 May 2008 04:15 Go to previous messageGo to next message
SaraC
Messages: 81
Registered: August 2005
Member
Another related question.

After changing the wildcard_maxterms to solve the immediate problem, I was asked these questions relating to upping the wildcard_maxterms attribute from default of 20000 to max of 50000.

Will changing this attribute add any additional stress to the db server?

Will it affect the db configuration in any way? - I thought not since it is an attribute of the index and can be changed at index build time.

Will the change result in any additional SGA usage on the db?

Can anyone point me to somewhere where I can find this sort of information please?
Ive had a look in Oracle Text Developers Guide.pdf but couldn't see anything to answer my queries.

Many thanks
Sara

[Updated on: Tue, 06 May 2008 04:19]

Report message to a moderator

Re: Oracle Text - wildcard expansion too many rows [message #323204 is a reply to message #318249] Tue, 27 May 2008 13:33 Go to previous message
quadzilla
Messages: 1
Registered: May 2008
Junior Member
if you can afford the space and resources using the prefix and suffix options can help too.
Previous Topic: about WHERE CONTAINS.
Next Topic: Phrase querying
Goto Forum:
  


Current Time: Thu Dec 26 16:17:31 CST 2024