Checkpoint not complete [message #137649] |
Thu, 15 September 2005 21:48 |
David King
Messages: 8 Registered: December 2004
|
Junior Member |
|
|
Hi gang,
I am in need of some advice. First, I am running Oracle 9.2.0.1 on Solaris 9. On just about a daily basis we perform sqlldr loads that load on the order of 300000 rows. I frequently see in my alert log:
Checkpoint not complete
Current log# 5 seq# 176431
I have 5 redo logs each of 10M in size. If I check what is going on in v$log and correlate to the alert log when it throws the checkpoint error I always notice that I have one current log (which is good) and the rest are in a status of 'ACTIVE'. It seems that when this happens I get the checkpoint error.
What can I do to get rid of this checkpoint error? Should I increase the size of my redo logs? Is there a good way to go about estimating what size redo logs I should have?
Thanks in advance for your help.
David
|
|
|
|
Re: Checkpoint not complete [message #137708 is a reply to message #137700] |
Fri, 16 September 2005 03:26 |
alliejane
Messages: 59 Registered: July 2005 Location: Glasgow
|
Member |
|
|
Not really sure if increasing the size of the redo log files are the solution I would go for.
As Mahesh says, the redo log files can only be reused if Oracle still requires them for instance recovery, that is, he checks the scn# of the oldest block in the checkpoint queue (a list of dirty blocks in the buffer cache) and works out if that number is higher or lower than the "highest scn#" in the redo log file that LGWR is trying to overwrite.. if the number is lower than the "highest scn#" then CKPT prevents the logfile being overwritten, as an instance crash right now will need that information for recovery.. (recovery starts at the point of the oldest dirty block not written to disk at the point of failure.)
The reason I mention all of this is because.. there are a number of solutions.
1. Make DBWR write more aggressively - as you are on 9i the parameter I would use is FAST_START_MTTR_TARGET=(how long you want recovery to take in seconds), the lower that number, the more aggressively DBWR has to write to keep up with the target, the advantage of this is that by the time LGWR comes to overwrite the redo log file, the chances are that DBWR has already written the "high scn#" (and beyond) from the checkpoint q.
-- This disadvantage is that you will get more I/O to your disks.
2. Create more redo log file groups - this will give DBWR more time to write before LGWR tries to overwrite a particular redolog file, again the chances are that the extra (6th) or (7th) group will give CKPT enough time to completely checkpoint beyond the "highest scn#" before that group is again required..
Which one to go for.. well that's up to you and your setup, if you have an I/O bound system then 2. would be better for you, as 1. will just increase your I/O problem, however if physical space is an issue and I/O isn't then 1. might be better (with the added advantage that instance recovery will also be faster).
Sorry for the training session, but as with everything to do with Oracle, there is rarely one solution that apply to everyone...
Allie
|
|
|
|
|
|
|