Home » Server Options » Text & interMedia » Optimization taking more and more time each day (Linux, Oracle10g2)
Optimization taking more and more time each day [message #463188] Wed, 30 June 2010 02:30 Go to next message
leon_buijsman
Messages: 13
Registered: March 2009
Location: Rotterdam
Junior Member

We are running a batch job to fully optimize the OracleText index each night. When we released it, it took 80 minutes to run a stats report, do the optimization, run a second stats report.

This is the resulting end report:
OPTIMISATION PROCESS
---------------------------------------------------------------------
started : 29-JUN-10
elapsed time : 157 minutes and -15 seconds

INDEX GENERIC STATS
---------------------------------------------------------------------
unique tokens : 2,069,739

PRE OPTIMISATION STATS SUMMARY
---------------------------------------------------------------------
size : 692,311,627 (660.24 MB)
garbage docs : 249
index rows : 2,515,887
fragmentation : 12 %

POST OPTIMISATION STATS SUMMARY
---------------------------------------------------------------------
size : 691,916,408 (659.86 MB)
garbage docs : 5
index rows : 2,221,628
fragmentation : 0 %

Now comes the problem: each day the report takes much longer to run. Right now almost 2,5 hours per night. The index itself has grown marginally (5% in size per month), but the time for doing the optimization has doubled last month.

Fragmentation is back to 0% after optimization, but did anybody come across a similar issue? And what can we do to speed this up?
Re: Optimization taking more and more time each day [message #470136 is a reply to message #463188] Sun, 08 August 2010 16:11 Go to previous messageGo to next message
Alien
Messages: 292
Registered: June 1999
Senior Member
It's a late reply. But can can you give more information.
Are you doing a token,fast or full optimize.
What are the specifics of the indexes you're optimizing?
Did you take into account the growth of the base-tables?

Finally, you might want to consider using maxtime on your optimization. It's designed for batch optimization.

Regards,

Arian
Re: Optimization taking more and more time each day [message #470137 is a reply to message #470136] Sun, 08 August 2010 16:30 Go to previous messageGo to next message
leon_buijsman
Messages: 13
Registered: March 2009
Location: Rotterdam
Junior Member

Thanks for your reply, I was just thinking of adding more information to get this one answered!

Q: Are you doing a token,fast or full optimize.
FULL once every night.

Q: What are the specifics of the indexes you're optimizing?
An index on a table containing documents (in blobs) in total amount of 293.000.

Q: Did you take into account the growth of the base-tables?
Not sure what you mean with that. Current index is 700Mb large. Number of documents growing at 10.000 per month and number of tokens around 100.000 per month. The index grows 25Mb per month.

The total time it takes to do full optimize:
june: 30 minutes
july: 60 minutes
august: 80 minutes

Do you have any clue what is making this thing do longer and longer on optimizing. The query response time is OK, by the way.
Re: Optimization taking more and more time each day [message #470138 is a reply to message #470137] Sun, 08 August 2010 16:35 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
Do you synchronize on commit or before you optimize or what?
Re: Optimization taking more and more time each day [message #470160 is a reply to message #470138] Mon, 09 August 2010 03:00 Go to previous messageGo to next message
leon_buijsman
Messages: 13
Registered: March 2009
Location: Rotterdam
Junior Member

The index is synced every 15 minutes.
Re: Optimization taking more and more time each day [message #470367 is a reply to message #470160] Mon, 09 August 2010 12:08 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
The more dml and the more frequently that you synchronize, the more fragmentation it causes in your domain index tables and therefore the longer it takes to optimize. Can you either synchronize less often or optimize more often?
Re: Optimization taking more and more time each day [message #470385 is a reply to message #470367] Mon, 09 August 2010 13:32 Go to previous messageGo to next message
leon_buijsman
Messages: 13
Registered: March 2009
Location: Rotterdam
Junior Member

Hi Barbara,

But that hardly explains why it takes longer every day in my view. It would take long, which we would take for granted. My main concern is that the optimize time is going up with the number of tokens and the index size slightly higher.

What I do right now is running ctx_report.index_stats, do optimization, running ctx_report.index_stats. This allows us to see difference in size, fragmentation, etc. Both ctx_report.index_stats are running for an hour as well and there is marginal difference between the first and second run.

But, maybe I am missing a step. What would you do with an index that gets new/updated small Word documents in at a rate of 200 per hour which have to be available for textsearch every 15 minutes?
Re: Optimization taking more and more time each day [message #470393 is a reply to message #470385] Mon, 09 August 2010 14:25 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
Please read through the following article and see if anything helps. It is old (10g) but most of it still applies:

http://www.oracle.com/technology/products/text/htdocs/Index_Maintenance.html

[Updated on: Mon, 09 August 2010 14:34]

Report message to a moderator

Re: Optimization taking more and more time each day [message #471428 is a reply to message #470393] Sat, 14 August 2010 06:47 Go to previous messageGo to next message
leon_buijsman
Messages: 13
Registered: March 2009
Location: Rotterdam
Junior Member

Hi Barbara,

I have gone through the document. Rebuilding the $X index was a missed step, but did not make any change.

I started to compare our old textindex (which did run the ctx_report.index_starts in 20 minutes compared to 111 minutes last night on similar size index). That index had one major difference: memory settings. These were set at 500Mb and this is what we ran for the new index (i.e. the max default setting):

ctx_ddl.create_preference('WD_CAND_DOC_LOB_TEXT_STO','BASIC_STORAGE');
ctx_ddl.set_attribute('WD_CAND_DOC_LOB_TEXT_STO','I_TABLE_CLAUSE','tablespace WD_I storage (initial 50M)');
ctx_ddl.set_attribute('WD_CAND_DOC_LOB_TEXT_STO','K_TABLE_CLAUSE','tablespace WD_I storage (initial 50M)');
ctx_ddl.set_attribute('WD_CAND_DOC_LOB_TEXT_STO','R_TABLE_CLAUSE','tablespace WD_I storage (initial 50M) lob (data) store as (cache)');
ctx_ddl.set_attribute('WD_CAND_DOC_LOB_TEXT_STO','N_TABLE_CLAUSE','tablespace WD_I storage (initial 50M)');
ctx_ddl.set_attribute('WD_CAND_DOC_LOB_TEXT_STO','I_INDEX_CLAUSE','tablespace WD_I storage (initial 50M) compress 2');
ctx_ddl.set_attribute('WD_CAND_DOC_LOB_TEXT_STO','P_TABLE_CLAUSE','tablespace WD_I storage (initial 50M)');
ctx_ddl.set_attribute('WD_CAND_DOC_LOB_TEXT_STO','I_ROWID_INDEX_CLAUSE','tablespace WD_I storage (initial 50M)');
...
create index wd_cand_doc_lob_text
on wd_cand_doc_lob (doc_blob)
indextype is ctxsys.context
parameters('
format column ORACLETEXT_FMT
datastore WD_CAND_DOC_LOB_TEXT_DST
filter WD_CAND_DOC_LOB_TEXT_FIL
section group WD_CAND_DOC_LOB_TEXT_SGP
lexer WD_CAND_DOC_LOB_TEXT_LEX
wordlist WD_CAND_DOC_LOB_TEXT_WDL
stoplist WD_CAND_DOC_LOB_TEXT_SPL
storage WD_CAND_DOC_LOB_TEXT_STO
memory 100M
')


I want to increase the memory setting to 500Mb again and tried the following statement (on a mirror server):
ALTER INDEX WD_CAND_DOC_LOB_TEXT REBUILD PARAMETERS ('REPLACE MEMORY 500M');

Seems right, but when I ran the ctx_report.describe_index, it seemed unchanged. Is this the right statement?

Re: Optimization taking more and more time each day [message #471469 is a reply to message #471428] Sat, 14 August 2010 17:11 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
The syntax looks right to me and executes without error. I don't see where ctx_report.describe_index or anything else shows the original or new values anywhere, so that you can confirm the change.

I found an interesting note here:

http://download.oracle.com/docs/cd/E11882_01/text.112/e10944/cddlpkg.htm#i998200

"Optimize in rebuild mode rebuilds the $I table (the inverted list table) in its entirety. Rebuilding an index is often significantly faster than performing a full optimization, and is more likely to result in smaller indexes, especially if the index is heavily fragmented."

So, you might want to try optimizing using rebuilding instead of full and see if that is faster for you.

Re: Optimization taking more and more time each day [message #471752 is a reply to message #471469] Tue, 17 August 2010 03:04 Go to previous messageGo to next message
leon_buijsman
Messages: 13
Registered: March 2009
Location: Rotterdam
Junior Member

Hi Barbara,

We decided to rebuild the index over the weekend. We ended up with an index of the same size, same number of tokens, same number of documents. This morning we ran the first FULL optimization plus the two index_stats reports in 45 minutes total! This compares to 325 minutes three days ago.

My assumption right now: optimization helps to defragment the index, but the whole process will become slower and slower in time and redoing the index seems the only option.
Re: Optimization taking more and more time each day [message #471861 is a reply to message #471752] Tue, 17 August 2010 11:31 Go to previous message
Barbara Boehmer
Messages: 9101
Registered: November 2002
Location: California, USA
Senior Member
Did you do:

alter index index_name rebuild

or:

ctx_ddl.optimize_index ('index_name', 'rebuild') ?

The second is what I was trying to suggest instead of:

ctx_ddl.optimize_index ('index_name', 'full')

which is what I understood you had been doing.

Your assumption is correct that it needs to be rebuilt periodically one way or the other.

[Updated on: Tue, 17 August 2010 11:33]

Report message to a moderator

Previous Topic: PDF documents not indexed by OracleText using auto-filter and format_column
Next Topic: ORA-29900: operator binding does not exist
Goto Forum:
  


Current Time: Thu Dec 26 23:48:45 CST 2024