Yasin Baskan
Stuff about the Oracle database.
Strange ORA-14404, or not?
I was trying to drop a tablespace which I knew had no segments in it. A simple query against dba_segments returned no rows, which means there are no segments allocated in this tablespace. But strangely I got this:
SQL> drop tablespace psapsr3old including contents and datafiles;
drop tablespace psapsr3old including contents and datafiles
*
ERROR at line 1:
ORA-14404: partitioned table contains partitions in a different tablespace
How come I cannot drop a tablespace with no segments in it?
Enter deferred segment creation. Things were simpler on 9i or 10g. The database I get this error on is an 11.2.0.2 database. Starting with 11.2.0.1 things may fool you when it comes to segments in the database. 11.2 brought us a feature called "deferred segment creation", which means no segments are created if you do not insert any data into the tables you create. Have a look at the note "11.2 Database New Feature Deferred Segment Creation [Video] (Doc ID 887962.1)" about this feature. It is there to save disk space in case you have lots of tables without data in them. In 11.2.0.1 it applied only to non-partitioned heap tables; starting with 11.2.0.2 it is also used for partitioned tables.
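If you want to see this behavior for yourself, a quick test like the following (using a hypothetical table T1 and assuming deferred_segment_creation is left at its 11.2 default of TRUE) shows that the segment only appears after the first insert:
create table t1 (n number) tablespace users;
-- no segment yet: SEGMENT_CREATED should show NO and user_segments should return nothing
select segment_created from user_tables where table_name = 'T1';
select count(*) from user_segments where segment_name = 'T1';
insert into t1 values (1);
commit;
-- after the first insert the segment is created and SEGMENT_CREATED switches to YES
select segment_created from user_tables where table_name = 'T1';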
Coming back to my problem: even though no segments are reported in dba_segments, there are tables and partitions created in this tablespace whose segments have not been created yet. If we look at the tables and partitions in that tablespace:
SQL> select segment_created,count(*) from dba_tables
2 where tablespace_name='PSAPSR3OLD'
3 group by segment_created;
SEG COUNT(*)
--- ----------
NO 13482
SQL> select segment_created,count(*) from dba_tab_partitions
2 where tablespace_name='PSAPSR3OLD'
3 group by segment_created;
SEGM COUNT(*)
---- ----------
NO 1237
There are thousands of objects in there.
What is the solution then?
Obviously it is to get rid of these objects by moving them to a different tablespace. The standard "alter table move" and "alter table move partition" commands do the job. Then the question becomes: will a move operation create the segment in the new tablespace? If you are on 11.2.0.1, yes it will, defeating the whole purpose of this feature. If you are on 11.2.0.2 it will not create the segments. This is explained in the note "Bug 8911160 - ALTER TABLE MOVE creates segment with segment creation deferred table (Doc ID 8911160.8)".
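With thousands of objects to move you can generate the commands from the dictionary. A minimal sketch, assuming a hypothetical target tablespace called NEWTS (indexes, LOB segments and subpartitions would need similar treatment):
select 'alter table '||owner||'.'||table_name||' move tablespace NEWTS;'
from   dba_tables
where  tablespace_name = 'PSAPSR3OLD';

select 'alter table '||table_owner||'.'||table_name||
       ' move partition '||partition_name||' tablespace NEWTS;'
from   dba_tab_partitions
where  tablespace_name = 'PSAPSR3OLD';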
After everything is moved you can safely drop the tablespace without this error.
UPDATE: Gokhan Atil made me aware of Randolf Geist's post about the same issue. See that post here.
Direct NFS Clonedb
Direct NFS Clonedb is a feature in 11.2 that you can use to clone your databases. Kevin Closson explains what it is in this post. In his demo videos he is using a perl script to automate the process of generating the necessary scripts. That script is not publicly available as of today but the MOS note 1210656.1 explains how to do the clone manually without the perl script.
Tim Hall also has a step by step guide on how to do the cloning in this post. He also uses the perl script mentioned above.
We have been testing backups and clones on Exadata connected to a 7410 ZFS Storage Appliance, and I wanted to share our test of Direct NFS Clonedb. This test is on a quarter rack X2-2 connected to a 7410 storage via InfiniBand. A RAC database will be cloned as a single-instance database and the clone database will be opened on one db node.
Enable Direct NFS on Exadata
For security, as of today the default Exadata installation disables some services needed by NFS. To use NFS on the Exadata db nodes we enabled those services first.
service portmap start
service nfslock start
chkconfig --level 345 portmap on
chkconfig --level 345 nfslock on
Recent Exadata installations come with Direct NFS (dNFS) enabled. You can check whether it is enabled by looking at the database alert log; when the database is started you will see this line if dNFS is enabled.
Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 3.0
If it is not enabled you can use this command after stopping the database to enable it.
make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk dnfs_on
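Once the database has actually opened a file over NFS, you can also confirm that the Direct NFS client is being used by querying the dNFS views; a simple check could be:
-- one row per NFS server the dNFS client is talking to
select svrname, dirname from v$dnfs_servers;
-- the database files currently accessed through dNFS
select filename from v$dnfs_files;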
Mount the NFS share
I am assuming the 7410 is configured and NFS sharing is up on it at this point. To mount the NFS share you can use a mount command like this on the Exadata db nodes.
mount 192.168.27.105:/export/backup1 /backup -o rw,bg,hard,intr,rsize=131072,wsize=131072,nodev,nosuid,actimeo=0
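To make the mount persistent across reboots you can also put it in /etc/fstab; a possible entry matching the options above (adjust the server, export and mount point for your environment):
192.168.27.105:/export/backup1  /backup  nfs  rw,bg,hard,intr,rsize=131072,wsize=131072,nodev,nosuid,actimeo=0  0 0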
Back up the source database
You can use OS copies or RMAN image copies to back up the database for use in the cloning process. Here are the commands we used; do not forget to create the target directory beforehand.
sql 'alter database begin backup';
backup as copy database format '/backup/clone_backup/%U';
sql 'alter database end backup';
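These are RMAN commands, so one way to execute them is from an RMAN session against the source database, wrapping them in a run block and then checking that the image copies were created; for example:
rman target /
RMAN> run {
  sql 'alter database begin backup';
  backup as copy database format '/backup/clone_backup/%U';
  sql 'alter database end backup';
}
RMAN> list copy of database;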
Prepare the clone db
To start the clone database we need an init.ora file and a create controlfile script. You can back up the source database's control file to a text file and use that. In the source database run this to get the script; it will produce a trace file under the udump directory (/u01/app/oracle/diag/rdbms/dbm/dbm1/trace on Exadata).
SQL> alter database backup controlfile to trace;
Database altered.
After editing the script this is the one we can use for the clone database.
CREATE CONTROLFILE REUSE SET DATABASE "clone" RESETLOGS NOARCHIVELOG
MAXLOGFILES 1024
MAXLOGMEMBERS 5
MAXDATAFILES 32767
MAXINSTANCES 32
MAXLOGHISTORY 33012
LOGFILE
GROUP 1 (
'/u01/app/oradata/clone/redo01.log'
) SIZE 4096M BLOCKSIZE 512,
GROUP 2 (
'/u01/app/oradata/clone/redo02.log'
) SIZE 4096M BLOCKSIZE 512,
GROUP 3 (
'/u01/app/oradata/clone/redo03.log'
) SIZE 4096M BLOCKSIZE 512
DATAFILE
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-UNDOTBS1_FNO-3_bnm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-SYSTEM_FNO-1_blm5ajro',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-SYSAUX_FNO-2_bmm5ajro',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-UNDOTBS2_FNO-4_bom5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-GNS_DATA01_FNO-7_bqm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-5_bpm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-9_bsm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-10_btm5ajrp',
'/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-8_brm5ajrp'
CHARACTER SET AL32UTF8
;
/u01/app/oradata/clone is a directory on the local disks; you can also use NFS for the redo logs if you want to. The DATAFILE section lists the image copies we have just produced using RMAN. You can get this list using the following sql; be careful about the completion time because you may have previous image copies in the same directory.
select name,completion_time from V$BACKUP_COPY_DETAILS;
Now we need an init.ora file; we can just copy the source database's file and edit it.
SQL> create pfile='/backup/clone.ora' from spfile;
File created.
Since the source database is a RAC database you need to remove parameters related to RAC (like cluster_database, etc...). You also need to change the paths to reflect the new clone database, like in the parameter control_files. Here is the control_files parameter in this test.
*.control_files='/u01/app/oradata/clone/control.ctl'
I also use a local directory, not NFS, for the control file.
There is one parameter you need to add when cloning a RAC database to a single instance database.
_no_recovery_through_resetlogs=TRUE
If you do not set this parameter you will get an error when you try to open the clone database with resetlogs. MOS note 334899.1 explains why we need to set it. This is the error you will get when opening the database without it:
RMAN> sql 'alter database open resetlogs';
sql statement: alter database open resetlogs
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of sql command on default channel at 02/22/2011 16:13:07
RMAN-11003: failure during parse/execution of SQL statement: alter database open resetlogs
ORA-38856: cannot mark instance UNNAMED_INSTANCE_2 (redo thread 2) as enabled
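Putting the pieces mentioned so far together, the edited clone.ora might end up looking roughly like this. This is only an illustrative sketch: db_name must match the name used in the CREATE CONTROLFILE script, the memory settings here are arbitrary, and any other parameters your source database needs (compatible, db_block_size and so on) should be carried over from the original pfile.
db_name='clone'
control_files='/u01/app/oradata/clone/control.ctl'
sga_target=4g
pga_aggregate_target=1g
undo_tablespace='UNDOTBS1'
_no_recovery_through_resetlogs=TRUE
# RAC parameters such as cluster_database and the instance-specific
# thread/undo settings from the source database are removed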
Now we are ready to create the clone database.
Create the clone db
After preparing the init.ora file and the create controlfile script we can create the database.
export ORACLE_SID=clone
SQL> startup nomount pfile='/backup/clone.ora';
ORACLE instance started.
Total System Global Area 3991842816 bytes
Fixed Size 2232648 bytes
Variable Size 754978488 bytes
Database Buffers 3087007744 bytes
Redo Buffers 147623936 bytes
SQL> @cr_control
Control file created.
Now we need to rename the datafiles and point them to the NFS location we want. dbms_dnfs is the package used for this.
begin
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-UNDOTBS1_FNO-3_bnm5ajrp','/backup/oradata/undotbs1.263.740679581');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-SYSTEM_FNO-1_blm5ajro','/backup/oradata/system.261.740679559');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-SYSAUX_FNO-2_bmm5ajro','/backup/oradata/sysaux.262.740679571');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-UNDOTBS2_FNO-4_bom5ajrp','/backup/oradata/undotbs2.265.740679601');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-GNS_DATA01_FNO-7_bqm5ajrp','/backup/oradata/gns_data01.264.741356977');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-5_bpm5ajrp','/backup/oradata/users.266.740679611');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-9_bsm5ajrp','/backup/oradata/users.274.741357097');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-10_btm5ajrp','/backup/oradata/users.275.741357121');
dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-8_brm5ajrp','/backup/oradata/users.273.741357075');
end;
/
The first parameter to dbms_dnfs.clonedb_renamefile is the backup image copy name we set in the controlfile script; the second parameter is the target filename, which should reside on NFS. You can create this script using the following sql on the source database.
select 'dbms_dnfs.clonedb_renamefile('''||b.name||''',''/backup/oradata/'||
substr(d.file_name,instr(d.file_name,'/',-1)+1)||''');'
from v$backup_copy_details b,dba_data_files d
where b.file#=d.file_id
and b.completion_time>sysdate-1/24;
If you have multiple image copies be careful about the completion_time predicate. In this example I am looking at the image copies of the last hour.
If the target location in the dbms_dnfs call is not on NFS here is what you will get:
SQL> exec dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-10_b1m59uem','/u01/app/oradata/clone/users.275.741357121');
BEGIN dbms_dnfs.clonedb_renamefile('/backup/clone_backup/data_D-DBM_I-1222199824_TS-USERS_FNO-10_b1m59uem','/u01/app/oradata/clone/users.275.741357121'); END;
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-17513: dNFS package call failed
ORA-06512: at "SYS.X$DBMS_DNFS", line 10
ORA-06512: at line 1
I got this error on my first try and searched MOS and Google for some time with no results. Then I realized that my target location was not on NFS but on local disk.
Now that the datafiles are renamed and pointing at the NFS location, we can open the clone database.
SQL> alter database open resetlogs;
Database altered.
At this point we have a clone database ready to use.
The target directory we used shows our files.
[oracle@dm01db01 oradata]$ ls -l
total 98523
-rw-r-----+ 1 oracle dba 10737426432 Feb 22 16:44 gns_data01.264.741356977
-rw-r-----+ 1 oracle dba 17179877376 Feb 22 16:44 sysaux.262.740679571
-rw-r-----+ 1 oracle dba 17179877376 Feb 22 16:44 system.261.740679559
-rw-r-----+ 1 oracle dba 17179877376 Feb 22 16:44 undotbs1.263.740679581
-rw-r-----+ 1 oracle dba 17179877376 Feb 22 16:44 undotbs2.265.740679601
-rw-r-----+ 1 oracle dba 10737426432 Feb 22 16:44 users.266.740679611
-rw-r-----+ 1 oracle dba 2147491840 Feb 22 16:44 users.273.741357075
-rw-r-----+ 1 oracle dba 2147491840 Feb 22 16:44 users.274.741357097
-rw-r-----+ 1 oracle dba 2147491840 Feb 22 16:44 users.275.741357121
Even though ls shows their size equal to the source database file sizes, du shows us the truth.
[oracle@dm01db01 oradata]$ du -hs *
129K gns_data01.264.741356977
74M sysaux.262.740679571
2.8M system.261.740679559
15M undotbs1.263.740679581
5.4M undotbs2.265.740679601
129K users.266.740679611
129K users.273.741357075
129K users.274.741357097
129K users.275.741357121
The datafile sizes are minimal since we have not done any write activity on the clone database yet. These sizes will grow after some activity.
Database smart flash cache wait events
When you enable the database smart flash cache and start using it, you will see new wait events related to it. These events help you find out whether a problem is related to the flash cache or not.
The ones I have faced so far are "db flash cache single block physical read", "db flash cache multiblock physical read" and "write complete waits: flash cache". These are from an 11.2.0.1 database using the F5100 flash array as the database smart flash cache.
db flash cache single block physical read
"db flash cache single block physical read" is the flash cache equivalent of "db file sequential read". Read waits from the flash cache are not accounted for in the "db file sequential read" event and have their own wait event. The following is from an AWR report of 30 mins from a database using the database smart flash cache.
[Top 5 Timed Foreground Events table from the AWR report]
There are over 11 million "db flash cache single block physical read" waits, which took about 0.44ms on average (AWR reports it as 0ms). "db file sequential read" waits number over 600,000. This means we had a high flash cache hit ratio; most of the reads were coming from the flash cache, not the disks.
This is the wait event histogram from the same AWR report.
                                                         % of Waits
Event                                       Total Waits   <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s  >1s
------------------------------------------  -----------  -----  ----  ----  ----  -----  -----  ----  ---
db flash cache single block physical read         11.2M   99.0    .9    .1    .0     .0     .0    .0
99% of all flash cache single block reads were under 1ms, and none of them were over 4ms.
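If you want to look at these histograms outside of an AWR report, v$event_histogram exposes the same kind of breakdown as cumulative counts since instance startup; a simple query could be:
select event, wait_time_milli, wait_count
from   v$event_histogram
where  event like 'db flash cache%'
order  by event, wait_time_milli;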
db flash cache multiblock physical read
"db flash cache multiblock physical read" is the flash cache equivalent of "db file scattered read". It is the event we see when we are reading multiple blocks from the flash cache. The AWR report I am using in this post does not contain many multiblock operations but here is the event from the foreground wait events section.
Event                                     Waits  %Time-outs  Total Wait Time (s)  Avg wait (ms)  Waits/txn  % DB time
----------------------------------------  -----  ----------  -------------------  -------------  ---------  ---------
db flash cache multiblock physical read   1,048           0                    1              1       0.00       0.00
We had 1,048 waits with an average of 1ms.
                                                       % of Waits
Event                                     Total Waits   <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s  >1s
----------------------------------------  -----------  -----  ----  ----  ----  -----  -----  ----  ---
db flash cache multiblock physical read          1171   83.2  12.4   3.2    .9     .3
The wait event histogram shows that again most of the waits are below 1ms with some being up to 16ms.
write complete waits: flash cache
This is the wait event we see when DBWR is trying to write a block from the buffer cache to the flash cache and a session wants to modify that block. The session waits on this event and goes on with the update after DBWR finishes its job. You can find a good explanation and a good example of this in Guy Harrison's post. Here are the event and its histogram from the same AWR report.
Event                               Waits  %Time-outs  Total Wait Time (s)  Avg wait (ms)  Waits/txn  % DB time
----------------------------------  -----  ----------  -------------------  -------------  ---------  ---------
write complete waits: flash cache     345           0                    1              4       0.00       0.00
                                                 % of Waits
Event                               Total Waits   <1ms  <2ms  <4ms  <8ms  <16ms  <32ms  <=1s  >1s
----------------------------------  -----------  -----  ----  ----  ----  -----  -----  ----  ---
write complete waits: flash cache           345   22.9  23.2  27.2  19.7    5.5     .3   1.2
I had 345 waits in 30 minutes with an average time of 4ms. But the important thing is that the contribution of this event to DB time was only 1 second in a 30-minute workload.
This event can start climbing in some situations, especially if you are having a problem with your flash cache device and writes to it start to take longer.
Here is one case where poor DBWR processes are struggling to write to the flash cache. Database control was showing high waits in the "Configuration" class and those waits were "write complete waits", "free buffer waits" and "write complete waits: flash cache". This was all because my flash cache device was a single conventional HDD, not even an SSD. After changing the db_flash_cache_file parameter to use the F5100 array this picture was not seen anymore.
Sizing the SGA or the buffer cache when enabling the database flash cache
The documentation about the database flash cache feature recommends increasing db_cache_size, sga_target or memory_target (whichever you are using) to account for the metadata of the blocks kept in the flash cache. This is because for each block in the flash cache, some metadata is kept in the buffer cache. The recommendation is to add 100 bytes per block for a single instance database and 200 bytes per block for RAC, multiplied by the number of blocks that can fit in the flash cache. So you need to calculate how many blocks the flash cache can hold (by dividing db_flash_cache_size in bytes by the block size) and increase the buffer cache accordingly.
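As a rough worked example using the numbers from this post (a 20g flash cache, an 8K block size and, as measured below, about 128 bytes of metadata per block), the extra buffer cache needed can be calculated like this:
-- blocks that fit in the flash cache and the metadata they need in the buffer cache
select 20*1024*1024*1024/8192                       as blocks_in_flash_cache,
       round(20*1024*1024*1024/8192*128/1024/1024)  as metadata_mb_at_128_bytes_per_block
from   dual;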
If you do not do this and the buffer cache size is too low, it is automatically increased to cater for this metadata. It also means you will have less space for actual data blocks if you do not increase your buffer cache size.
I happened to learn about this adjustment by accident, and it gave me the opportunity to calculate exactly how much space the flash cache metadata needs.
This is on a single instance 11.2.0.1 database on an M4000 server running Solaris 10.
I start with db_cache_size=320m (I am not using sga_target or memory_target because I want to control the size of the buffer cache explicitly) and db_flash_cache_size=20g. The instance starts up without any errors or warnings, but the alert log shows:
The value of parameter db_cache_size is below the required minimum
The new value is 4MB multiplied by the number of cpus plus the memory required for the L2 cache.
WARNING: More than half the granules are being allocated to the L2 cache when db_cache_size is set to 335544320. Either decrease the size of L2 cache, or increase the pool size to
671088640
If you look at db_cache_size at this point it shows 448m; the database automatically increased 320m to 448m. It also warns me that most of this space will be used for the flash cache metadata. This is a server with 32 CPUs (cores, actually), so multiplying by 4m gives 128m, which is the space that will be used for actual data blocks. The remaining 320m will be used for the flash cache metadata. I have 20g of flash cache and my block size is 8K, which means 2,621,440 blocks can fit in there. Let's see how much space is needed for the metadata of one block: since I have 320m for the metadata, I convert it to bytes and divide by the number of blocks, 320*1024*1024/2621440, which gives me 128 bytes.
The documentation states 100 bytes for a single instance database but it is actually a little bit higher.
Another case to verify. This time I start with db_cache_size=448m and db_flash_cache_size=60g. Similar messages are written to the alert log again.
The value of parameter db_cache_size is below the required minimum
The new value is 4MB multiplied by the number of cpus plus the memory required for the L2 cache.
WARNING: More than half the granules are being allocated to the L2 cache when db_cache_size is set to 469762048. Either decrease the size of L2 cache, or increase the pool size to
2013265920
When I look at db_cache_size now I see that it has increased to 1088m.
Of the 1088m buffer cache, again 128m will be used for data blocks and the remaining 960m is for the flash cache metadata. 60g of flash can hold 7,864,320 blocks, and doing the math again tells me that the metadata for a single block is again 128 bytes.
If you are starting with a small buffer cache, remember to check the alert log and the current size of the buffer cache. If it is already high and you do not see any adjustment, be aware that you will still use 128 bytes of metadata for each block, which means you will have less memory for data blocks. It is good practice to calculate this need beforehand and size the buffer cache accordingly.
Using an ASM diskgroup as flash cache
We have been testing the F5100 flash array in our humble lab (I borrowed that term from a colleague, he knows who he is). There are two ways to use it: one is to place your datafiles on it, the other is to use it as the database flash cache.
The database flash cache feature came with 11.2 and is a way to extend the SGA. It is not the same thing as the flash cache in Exadata; read this post by Kevin Closson to find out what the difference is. The F5100 is one of the products you can use as the flash cache, the other being the F20 card.
It is possible (and maybe the best option) to use ASM to configure the flash cache. The documentation states that you can use a filename or an ASM diskgroup name for the db_flash_cache_file parameter, which is the parameter that enables the flash cache feature.
So, how do we do this?
The first step is creating an ASM diskgroup on the flash devices. In my test, for simplicity, I use one flash device, /dev/rdsk/c1t9d0s0. This is just one flash module (24GB in size) from the F5100. The process for creating the diskgroup is no different than creating a diskgroup on conventional disks; just make sure your asm_diskstring includes the flash device path. Using asmca I created a diskgroup named FLASH with external redundancy on this single flash device.
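If you prefer SQL*Plus over asmca, the equivalent command (run against the ASM instance as SYSASM, using the device path from this test) would be something like:
create diskgroup FLASH external redundancy
  disk '/dev/rdsk/c1t9d0s0';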
Then following the documentation I set the parameters to enable the flash cache.
SQL> alter system set db_flash_cache_size=20G scope=spfile;
System altered.
SQL> alter system set db_flash_cache_file='+FLASH' scope=spfile;
System altered.
Now time to restart to make the new parameters effective.
SQL> startup force;
ORACLE instance started.
Total System Global Area 2606465024 bytes
Fixed Size 2150840 bytes
Variable Size 2113932872 bytes
Database Buffers 469762048 bytes
Redo Buffers 20619264 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 17560
Session ID: 2471 Serial number: 3
The instance fails to start. Looking at the alert log file we see this.
Errors in file /u01/app/oracle/diag/rdbms/flash/flash/trace/flash_dbw0_18053.trc:
ORA-01565: error in identifying file '+FLASH'
ORA-17503: ksfdopn:2 Failed to open file +FLASH
ORA-15045: ASM file name '+FLASH' is not in reference form
DBW0 (ospid: 18053): terminating the instance due to error 1565
So it seems that we need to specify the actual file name, not the name of the diskgroup. Let's see what we have in the diskgroup FLASH using asmcmd.
ASMCMD> pwd
+FLASH/FLASH/FLASHFILE
ASMCMD> ls
bufpool#2147472020.256.739041793
Even though we got an error when starting the instance, there is a file created under +FLASH/FLASH/FLASHFILE. The first FLASH in the path is the diskgroup name, the second is my database name.
What we need to do is set this file as the db_flash_cache_file parameter to be able to start the instance with flash cache enabled. I edit my text init.ora file and change the parameter.
db_flash_cache_file='+FLASH/FLASH/FLASHFILE/bufpool#2147472020.256.739041793'
Now we can start the instance.
SQL> create spfile from pfile='/tmp/x.ora';
File created.
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 2606465024 bytes
Fixed Size 2150840 bytes
Variable Size 2113932872 bytes
Database Buffers 469762048 bytes
Redo Buffers 20619264 bytes
Database mounted.
Database opened.
SQL> show parameter flash_cache
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_flash_cache_file string +FLASH/flash/flashfile/bufpool
#2147472020.256.739041793
db_flash_cache_size big integer 20G
The instance is up and I can see that the flash cache is enabled and using the correct ASM diskgroup.
The documentation states that we can use the diskgroup name but as we saw it needs some correction.
A problem and a lesson
The path I followed to this point is a little embarrassing and teaches a lesson about setting parameters.
At first I used db_flash_cache_file='FLASH' assuming it would use the ASM diskgroup. After restarting the instance I immediately started a workload on the database to see the effects of the flash cache. Here is what I saw in the Enterprise Manager performance page.
The waits of class "Configuration" were holding back the database. When I clicked on "Configuration" link I saw this.
The system was mostly waiting on "free buffer waits", "write complete waits" and "write complete waits: flash cache". This is because of DBWR: someone has to write the blocks from the buffer cache to the flash cache when the buffer cache is full, and that process is DBWR. Since I love using the DTrace toolkit, I used the iosnoop script from that toolkit to find out what DBWR was doing. iosnoop can show you which files a process is reading from or writing to. So I ran iosnoop for one of the DBWR processes.
bash-3.00# ./iosnoop -p 16631
UID PID D BLOCK SIZE COMM PATHNAME
101 16631 W 175311 8704 oracle
101 16631 W 246883104 8192 oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
101 16631 W 246883104 8192 oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
101 16631 W 175328 8704 oracle
101 16631 W 246883120 8192 oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
101 16631 W 246883120 8192 oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
101 16631 W 175345 8704 oracle
101 16631 W 246883136 8192 oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
101 16631 W 246883136 8192 oracle /u01/app/oracle/product/11.2.0/dbhome_1/dbs/FLASH
The file it is trying to write to is $ORACLE_HOME/dbs/FLASH, and since that directory is on a slow local disk the writes take a long time and cause the waits.
If I had looked at the db_flash_cache_file parameter after restarting the database to see if it was right, I would have seen this before starting the workload. So, once again, the lesson is to check that a parameter was set the way you intended before taking further action.
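In other words, a quick check of the parameter right after the restart would have caught the mistake before any workload was started:
select name, value from v$parameter where name like 'db_flash_cache%';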
What does ISDEFAULT mean in v$parameter?
There is a common misunderstanding among DBAs about the column ISDEFAULT in the view v$parameter. Some think that when this column is TRUE it means the current value of the parameter is the default value. This leads to wrong conclusions and sometimes wrong settings, even in production environments.
The documentation says this about this column:
"Indicates whether the parameter is set to the default value (
This explanation is not a clear one and different people may understand different things from it.
This column is dependent on the setting in the parameter file when the instance is started. It does not show if the current value is the default or not. It only shows if the parameter is set in the parameter file or not. When it is TRUE it means that you did not set this in the parameter file when starting the instance. When it is FALSE it means this parameter was set in the parameter file when starting the instance.
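If the instance uses an spfile, a more direct way to see whether a parameter is actually present in it is v$spparameter; the ISSPECIFIED column tells you whether the parameter appears in the spfile at all:
select name, value, isspecified
from   v$spparameter
where  name = 'cell_offload_processing';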
Here is a real life case about this. While changing a few parameters, an Exadata DBA accidentally sets the parameter cell_offload_processing to FALSE using an alter system command. When he tries to revert the change he looks at v$parameter.ISDEFAULT to find out whether cell_offload_processing=FALSE is the default setting. He sees that ISDEFAULT=TRUE, arrives at the wrong conclusion that FALSE is the default value, and leaves the parameter that way. This disables all Exadata storage offloading and may cause query times to go through the roof.
Let's look at this with an example on 11.2.0.2. This is from a database created with default parameters.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
2 where name='cell_offload_processing';
NAME VALUE ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing TRUE TRUE FALSE
We see that the parameter is set to TRUE, ISMODIFIED is FALSE (meaning we did not modify the parameter) and ISDEFAULT is TRUE (meaning we did not set this parameter in the parameter file).
Now let's change it.
SQL> alter system set cell_offload_processing=FALSE;
System altered.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
2 where name='cell_offload_processing';
NAME VALUE ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing FALSE TRUE SYSTEM_MOD
After we set it to FALSE we see that ISMODIFIED reflects the change, but ISDEFAULT is still TRUE. If a DBA concludes from this that FALSE is the default value and takes action based on that, the result will be wrong. Since we did not set this parameter in the parameter file when starting the instance, ISDEFAULT still shows TRUE.
Let's do a restart and look at v$parameter again.
SQL> startup force;
ORACLE instance started.
Total System Global Area 839282688 bytes
Fixed Size 2231128 bytes
Variable Size 583009448 bytes
Database Buffers 251658240 bytes
Redo Buffers 2383872 bytes
Database mounted.
Database opened.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
2 where name='cell_offload_processing';
NAME VALUE ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing FALSE FALSE FALSE
Now we see that ISDEFAULT is FALSE because the parameter was in the parameter file.
What if we set the parameter to the default value?
SQL> alter system set cell_offload_processing=TRUE;
System altered.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
2 where name='cell_offload_processing';
NAME VALUE ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing TRUE FALSE SYSTEM_MOD
As you can see, even though the parameter now has the default value, ISDEFAULT still shows FALSE.
Again let's restart and look again.
SQL> startup force;
ORACLE instance started.
Total System Global Area 839282688 bytes
Fixed Size 2231128 bytes
Variable Size 583009448 bytes
Database Buffers 251658240 bytes
Redo Buffers 2383872 bytes
Database mounted.
Database opened.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
2 where name='cell_offload_processing';
NAME VALUE ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing TRUE FALSE FALSE
The parameter has the default value and ISDEFAULT is still FALSE after a restart. This is because even though we set the parameter to its default value, it is still present in the parameter file.
What is the correct way to fix this? If we reset the parameter instead of setting it explicitly, it will revert to the default value.
SQL> alter system reset cell_offload_processing;
System altered.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
2 where name='cell_offload_processing';
NAME VALUE ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing TRUE FALSE FALSE
SQL> startup force;
ORACLE instance started.
Total System Global Area 839282688 bytes
Fixed Size 2231128 bytes
Variable Size 583009448 bytes
Database Buffers 251658240 bytes
Redo Buffers 2383872 bytes
Database mounted.
Database opened.
SQL> select name,VALUE,ISDEFAULT,ISMODIFIED from v$parameter
2 where name='cell_offload_processing';
NAME VALUE ISDEFAULT ISMODIFIED
------------------------------ ------------------------------ --------- ----------
cell_offload_processing TRUE TRUE FALSE
Now we are back to the defaults as the reset command removed the parameter from the parameter file.
So, do not count on the ISDEFAULT column when you are trying to find out whether the current value of a parameter is the default or not. The documentation is the most reliable source for the default values of the parameters.
Creating listeners on the private network interfaces of RAC nodes or use that Exadata Infiniband network
In general, everyone configures TNS listeners on the public interfaces of RAC nodes so that clients can connect through the public network. Conventionally the private network interfaces are used for the interconnect traffic, so most people do not open them to applications because the application traffic may interfere with the interconnect messaging.
But what if we have a high-speed, high-bandwidth interconnect network that some applications could also use for fast communication with the nodes? Can we create TNS listeners on the private interfaces, and if so, how? This is especially relevant for Exadata, where we have an InfiniBand private network that is used for the RAC interconnect and for the communication between the storage nodes and the database nodes. Since it is a high-speed, low-latency network, application servers or ETL servers can connect to that network and use it for fast data transfer.
To do this we need to create TNS listeners on the private interfaces so that clients can connect through sqlnet. The following steps show how I did this. I did this configuration on a half rack Database Machine, but since I do not have access to it now, the example here is based on a test VirtualBox system (running 11.2.0.2 RAC), so there is no InfiniBand, but the steps are the same for Exadata. Also, DNS is not used; all addresses are defined in the /etc/hosts files of the nodes.
Here is my /etc/hosts file for this setup.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
# Public
192.168.56.2 rac1.localdomain rac1
192.168.56.3 rac2.localdomain rac2
# Private
192.168.0.101 rac1-priv.localdomain rac1-priv
192.168.0.102 rac2-priv.localdomain rac2-priv
# Virtual
192.168.56.4 rac1-vip.localdomain rac1-vip
192.168.56.5 rac2-vip.localdomain rac2-vip
# SCAN
192.168.56.6 rac-scan.localdomain rac-scan
I have two nodes, rac1 and rac2. The private network is 192.168.0 and the public network is 192.168.56. After the default installation I have the default listeners created on the public interfaces, and there is only one SCAN address as this is a test setup.
Assume I have an ETL server that is connected to the private network which needs to connect to the database through the private interface. What I need is a listener per node that is listening on the private IP addresses.
If you start netca (from the grid home, because that is where the listeners run in 11.2) and try to create the listeners, you will see that you cannot select the private network.
It will show only the public network because this selection is based on your virtual IP definitions. Since I do not have any VIPs on the private network I do not see it.
So the first step is to create VIPs on the private interfaces. I start by adding the new VIPs to the /etc/hosts files. These lines should be added to both nodes' /etc/hosts file.
# Private VIPs
192.168.0.103 rac1-ib.localdomain rac1-ib
192.168.0.104 rac2-ib.localdomain rac2-ib
With root I run "srvctl add vip" from the grid home to create the VIPs.
[root@rac1 ~]# cd /u01/app/11.2.0/grid/bin/
[root@rac1 bin]# ./srvctl add vip -n rac1 -A 192.168.0.103/255.255.255.0/eth1 -k 2
[root@rac1 bin]# ./srvctl add vip -n rac2 -A 192.168.0.104/255.255.255.0/eth1 -k 2
I created a VIP for each node on the eth1 interface, which is the private interface, and specified "-k 2" to indicate that the network number is 2. You can use "srvctl add vip -h" to see what options you have.
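Before starting the VIPs, you can verify that they were registered the way you intended with srvctl config; for example:
[root@rac1 bin]# ./srvctl config vip -n rac1
[root@rac1 bin]# ./srvctl config vip -n rac2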
At this step if you look at the "ifconfig -a" output you will not see the new VIPs up because we have not started them up yet. Let's start them now.
[root@rac1 bin]# ./srvctl start vip -i rac1-ib
Now you will see the new IP up in the "ifconfig -a" output. This is the related line.
eth1:2 Link encap:Ethernet HWaddr 08:00:27:8B:69:FB
inet addr:192.168.0.103 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
We need to start the new VIP for the second node also. Go to the second node and repeat the same for that.
[root@rac2 bin]# ./srvctl start vip -i rac2-ib
We can now start netca and we see the new subnet there. You can create the listener by selecting the new subnet. I named mine LISTENER_IB and chose 1522 as the port number.
After netca completes here is what I see on rac1.
[oracle@rac1 ~]$ lsnrctl status listener_ib
LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 24-DEC-2010 13:36:20
Copyright (c) 1991, 2010, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_IB)))
STATUS of the LISTENER
------------------------
Alias LISTENER_IB
Version TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date 24-DEC-2010 13:34:46
Uptime 0 days 0 hr. 1 min. 35 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/rac1/listener_ib/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_IB)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.103)(PORT=1522)))
The listener supports no services
The command completed successfully
The listener is up and running but it does not support any services yet. Here are my init.ora parameters related to the listeners.
SQL> show parameter listen
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
listener_networks string
local_listener string (DESCRIPTION=(ADDRESS_LIST=(AD
DRESS=(PROTOCOL=TCP)(HOST=192.
168.56.4)(PORT=1521))))
remote_listener string rac-scan:1521
I can change the local_listener parameter and add this listener so that my database registers with it. After that, since the default installation sets the SCAN listener as the remote_listener, SCAN will be able to direct connections to this listener as well. But there is a problem with this. What happens if SCAN directs the connection from the ETL server to the public interface instead of the private one? Or vice versa, what happens if it directs connections from the public network clients to the private interface? Users will get errors because they cannot reach the private network, and the ETL server will get errors because it cannot reach the public network.
The correct way to register my database with the listeners is to use the listener_networks parameter. listener_networks is a new 11.2 parameter and serves the purpose of cross-registration when you have listeners on multiple networks. Basically, with it you can say, "register my local listeners on the public interfaces with the SCAN listener, register my local listeners on the private interface with each other".
This way clients using SCAN will connect to the public interfaces and clients using the private network will connect to the private interfaces. Let's do it now.
So as not to clutter the listener parameters with long tns descriptions, let's add the definitions to the tnsnames.ora file and use the names instead. I add these lines to the tnsnames.ora file in the database home on both nodes. Remember to change the hostnames to rac2 for the entries other than ORCL_IB when editing the file on rac2.
ORCL_IB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1-ib)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = rac2-ib)(PORT = 1522))
)
ORCL_PUBLIC_LOCAL=
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521))
)
ORCL_PRIVATE_LOCAL=
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1-ib)(PORT = 1522))
)
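The next step would be to wire these aliases into the listener_networks parameter. A sketch of what that setting could look like with the aliases above (the network names are arbitrary labels; verify the exact values against the 11.2 documentation and your own environment):
alter system set listener_networks=
  '((NAME=public_net)(LOCAL_LISTENER=ORCL_PUBLIC_LOCAL)(REMOTE_LISTENER=rac-scan:1521))',
  '((NAME=priv_net)(LOCAL_LISTENER=ORCL_PRIVATE_LOCAL)(REMOTE_LISTENER=ORCL_IB))'
  scope=both sid='*';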
But what if we have a high-speed and high-bandwidth interconnect network that some applications can also use for fast communication to the nodes? Can we create tns listeners on the private interfaces, if we can how? This high-speed interconnect network is especially true for Exadata where we have an Infiniband private network that is used for RAC interconnect and for the communication between the storage nodes and the database nodes. Since it is a high-speed low-latency network the application servers or ETL servers can connect to that network and use it for fast data transfer.
To do this we need to create tns listeners on the private interfaces so that clients can connect through sqlnet. The following steps will show how I did this. I did this configuration on a half rack Database Machine but since I do not have access to that now the example here is based on a test Virtualbox system (which runs 11.2.0.2 RAC) so Infiniband is not here but the steps are the same for Exadata. Also DNS is not used, all addresses are defined in the /etc/hosts files of the nodes.
Here is my /etc/hosts file for this setup.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
# Public
192.168.56.2 rac1.localdomain rac1
192.168.56.3 rac2.localdomain rac2
# Private
192.168.0.101 rac1-priv.localdomain rac1-priv
192.168.0.102 rac2-priv.localdomain rac2-priv
# Virtual
192.168.56.4 rac1-vip.localdomain rac1-vip
192.168.56.5 rac2-vip.localdomain rac2-vip
# SCAN
192.168.56.6 rac-scan.localdomain rac-scan
I have two nodes, rac1 and rac2. The private network is 192.168.0 and the public network is 192.168.56. After the default installation I have the default listeners created on the public interfaces; there is only one SCAN address as this is a test setup.
Assume I have an ETL server that is connected to the private network which needs to connect to the database through the private interface. What I need is a listener per node that is listening on the private IP addresses.
If you start netca (from the grid home, because that is where the listeners run in 11.2) and try to create the listeners, you will see that you cannot select the private network.
It will show only the public network because this selection is based on your virtual IP definitions. Since I do not have any VIPs on the private network I do not see it.
So the first step is to create VIPs on the private interfaces. I start by adding the new VIPs to the /etc/hosts files. These lines should be added to both nodes' /etc/hosts file.
# Private VIPs
192.168.0.103 rac1-ib.localdomain rac1-ib
192.168.0.104 rac2-ib.localdomain rac2-ib
As root, I run "srvctl add vip" from the grid home to create the VIPs.
[root@rac1 ~]# cd /u01/app/11.2.0/grid/bin/
[root@rac1 bin]# ./srvctl add vip -n rac1 -A 192.168.0.103/255.255.255.0/eth1 -k 2
[root@rac1 bin]# ./srvctl add vip -n rac2 -A 192.168.0.104/255.255.255.0/eth1 -k 2
I created a VIP for each node on eth1, which is the private interface, and specified "-k 2" to indicate that the network number is 2. You can use "srvctl add vip -h" to see what options are available.
At this step if you look at the "ifconfig -a" output you will not see the new VIPs up because we have not started them up yet. Let's start them now.
[root@rac1 bin]# ./srvctl start vip -i rac1-ib
Now you will see the new IP up in the "ifconfig -a" output. This is the related line.
eth1:2 Link encap:Ethernet HWaddr 08:00:27:8B:69:FB
inet addr:192.168.0.103 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
We need to start the new VIP for the second node also. Go to the second node and repeat the same for that.
[root@rac2 bin]# ./srvctl start vip -i rac2-ib
We can now start netca again and this time the new subnet is listed. You can create the listener by selecting it. I named mine LISTENER_IB and chose 1522 as the port number.
After netca completes here is what I see on rac1.
[oracle@rac1 ~]$ lsnrctl status listener_ib
LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 24-DEC-2010 13:36:20
Copyright (c) 1991, 2010, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_IB)))
STATUS of the LISTENER
------------------------
Alias LISTENER_IB
Version TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date 24-DEC-2010 13:34:46
Uptime 0 days 0 hr. 1 min. 35 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/rac1/listener_ib/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_IB)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.103)(PORT=1522)))
The listener supports no services
The command completed successfully
The listener is up and running but it does not support any services yet. Here are my init.ora parameters related to the listener.
SQL> show parameter listen
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
listener_networks string
local_listener string (DESCRIPTION=(ADDRESS_LIST=(AD
DRESS=(PROTOCOL=TCP)(HOST=192.
168.56.4)(PORT=1521))))
remote_listener string rac-scan:1521
I could change the local_listener parameter and add this listener so that my database registers with it. Since the default installation sets the SCAN listener as the remote_listener, SCAN would then be able to direct connections to this listener as well. But there is a problem with this approach. What happens if SCAN directs the connection from the ETL server to the public interface instead of the private one? Or vice versa, what happens if it directs connections from public network clients to the private interface? Users will get errors because they cannot reach the private network, and the ETL server will get errors because it cannot reach the public network.
The correct way to register my database to the listeners is to use the listener_networks parameter. listener_networks is a new 11.2 parameter and serves the purpose of cross-registration when you have listeners on multiple networks. Basically with it you can say, "register my local listeners on the public interfaces to the SCAN listener, register my local listeners on the private interface to each other".
This way clients using SCAN will connect to the public interfaces and clients using the private network will connect to the private interfaces. Let's do it now.
Not to clutter the listener parameters with long TNS descriptions, let's add the definitions to the tnsnames.ora file and use the names instead. I add these lines to the tnsnames.ora file residing in the database home on both nodes. Remember to change the hostnames to rac2 for the entries other than ORCL_IB when editing the file on rac2.
ORCL_IB =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1-ib)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = rac2-ib)(PORT = 1522))
)
ORCL_PUBLIC_LOCAL=
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521))
)
ORCL_PRIVATE_LOCAL=
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1-ib)(PORT = 1522))
)
ORCL_PUBLIC_LOCAL will be used as the local listener for the public network, and ORCL_PRIVATE_LOCAL as the local listener for the private network. ORCL_IB will be used as the remote listener for the private network so that the local private listeners can register with each other, and the SCAN address will be used as the remote listener for the public network.
Now it is time to change the parameters.
SQL> alter system set listener_networks='((name=public_network)(local_listener=orcl_public_local)(remote_listener=rac-scan:1521))','((name=priv_network)(local_listener=orcl_private_local)(remote_listener=orcl_ib))';
System altered.
SQL> alter system set remote_listener='';
System altered.
SQL> show parameter listen
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
listener_networks string ((name=public_network)(local_l
istener=orcl_public_local)(rem
ote_listener=rac-scan:1521)),
((name=priv_network)(local_lis
tener=orcl_private_local)(remo
te_listener=orcl_ib))
local_listener string (DESCRIPTION=(ADDRESS_LIST=(AD
DRESS=(PROTOCOL=TCP)(HOST=192.
168.56.4)(PORT=1521))))
remote_listener string
Now let's see what is registered to the new listener listener_ib.
[oracle@rac1 ~]$ lsnrctl status listener_ib
LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 24-DEC-2010 15:03:17
Copyright (c) 1991, 2010, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_IB)))
STATUS of the LISTENER
------------------------
Alias LISTENER_IB
Version TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date 24-DEC-2010 13:34:46
Uptime 0 days 1 hr. 28 min. 32 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/rac1/listener_ib/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_IB)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.103)(PORT=1522)))
Services Summary...
Service "orcl" has 2 instance(s).
Instance "orcl1", status READY, has 2 handler(s) for this service...
Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully
Both instances are registered and can be served by this listener. Also have a look at the scan listener to make sure we did not make a mess.
[oracle@rac1 ~]$ lsnrctl status listener_scan1
LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 24-DEC-2010 15:04:47
Copyright (c) 1991, 2010, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias LISTENER_SCAN1
Version TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date 24-DEC-2010 12:06:02
Uptime 0 days 2 hr. 58 min. 45 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/11.2.0/grid/log/diag/tnslsnr/rac1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.6)(PORT=1521)))
Services Summary...
Service "orcl" has 2 instance(s).
Instance "orcl1", status READY, has 1 handler(s) for this service...
Instance "orcl2", status READY, has 1 handler(s) for this service...
The command completed successfully
It is OK too.
Now we can use this tnsnames entry to connect through the private network.
ORCL_ETL =
(DESCRIPTION =
(LOAD_BALANCE=on)
(ADDRESS = (PROTOCOL = TCP)(HOST = rac1-ib)(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = rac2-ib)(PORT = 1522))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = orcl)
)
)
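To quickly verify the new alias from a machine on the private network, a simple SQL*Plus test like the one below can be used. This is just a sketch and not part of the original procedure; etl_user is a placeholder account, and the v$instance query only shows which instance and host served the connection.
SQL> connect etl_user@ORCL_ETL
SQL> select instance_name, host_name from v$instance;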
It took quite a while for me to get this in writing. If you use this procedure and run into problems, or if you think some points are not explained clearly, let me know so I can edit the post and make it more useful to others who may need it.
Applying Grid Infrastructure Patch Set 1 (GI PSU1) to non-RAC installations
If you are trying to patch an existing 11.2.0.1 installation on a non-RAC GI (also called single instance HA) system be aware that some steps will be different from the patch readme or the documentation.
Here is what I experienced.
The documentation for Grid Infrastructure 11.2 states that if you want to upgrade from 11.2.0.1 to 11.2.0.2 you need to apply some patches first.
"To upgrade existing 11.2.0.1 Oracle Grid Infrastructure installations to Oracle Grid Infrastructure 11.2.0.2, you must first do at least one of the following:
- Patch the release 11.2.0.1 Oracle Grid Infrastructure home with the 9413827 and 9706490 patches.
- Install Oracle Grid Infrastructure Patch Set 1 (GI PSU1) or Oracle Grid Infrastructure Patch Set 2 (GI PSU2)."
"2.2.5 Prepare GI home for patch
As an owner of Oracle GI home software owner, run the command:
%/bin/srvctl stop home -o -s -n "
When I ran this on a single instance system I got:
[oracle@oel5 patches]$ srvctl stop home -o /u01/app/product/11.2.0/grid -s /tmp/stat.txt
PRCH-1002 : Failed to stop resources running from crs home /u01/app/product/11.2.0/grid
PRCH-1030 : one or more resources failed to stop PRCH-1026 : Failed to stop ASM
PRCD-1027 : Failed to retrieve database orcl
PRCD-1035 : orcl is the database unique name of a single instance database, not a cluster database
This is a problem with srvctl, and this step must be skipped for non-RAC GI installations since the next step will stop GI anyway. After this, the README tells you to unlock the grid home:
"You must invoke this script as root user to open protection on GI software files for patch application
# <GI home>/crs/install/rootcrs.pl -unlock"
rootcrs.pl is the script used for RAC environments. If you have non-RAC GI, the script you need to use is roothas.pl. After replacing rootcrs.pl with roothas.pl and unlocking the grid home you can continue with applying the patch.
MOS note 1089476.1 explains the steps to patch a non-RAC GI home. Keep this note in mind when you are working on such a system.
Do not forget to set gpgkey when installing the oracle-validated rpm
In my previous post I added five lines to /etc/yum.conf.
[local]
name="media"
baseurl=file:///media/Enterprise%20Linux%20dvd%2020090127/Server
enabled=1
gpgkey=file:///media/Enterprise%20Linux%20dvd%2020090127/RPM-GPG-KEY-oracle
The lines other than gpgkey are self-explanatory. The gpgkey parameter points to a file containing the public key for the packages you install, so that yum can verify a package's authenticity when needed. The file I use is the key file that contains the public key to verify the oracle-validated rpm.
The oracle-validated rpm installs the packages needed for Oracle installations; it also sets the required kernel parameters and creates a default oracle user. Using it is an easy way to prepare your server for Oracle installations. The alternative is to check the installation prerequisites in the documentation and install the packages, update the kernel parameters and create the user yourself.
MOS Note 579101.1 explains how to install the oracle-validated rpm.
I tried to install this rpm without checking the note and I did not use the gpgkey parameter in /etc/yum.conf initially. This is what you get if you do not set it.
[root@oeltest tmp]# yum install oracle-validated-1.0.0-18.el5.i386.rpm
Loaded plugins: security
Setting up Install Process
Parsing package install arguments
Examining oracle-validated-1.0.0-18.el5.i386.rpm: oracle-validated-1.0.0-18.el5.i386
Marking oracle-validated-1.0.0-18.el5.i386.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package oracle-validated.i386 0:1.0.0-18.el5 set to be updated
--> Processing Dependency: compat-db for package: oracle-validated
--> Processing Dependency: compat-gcc-34 for package: oracle-validated
--> Processing Dependency: compat-gcc-34-c++ for package: oracle-validated
--> Processing Dependency: elfutils-libelf-devel for package: oracle-validated
--> Processing Dependency: gcc for package: oracle-validated
--> Processing Dependency: gcc-c++ for package: oracle-validated
--> Processing Dependency: gdb for package: oracle-validated
--> Processing Dependency: glibc-devel for package: oracle-validated
--> Processing Dependency: glibc-headers for package: oracle-validated
--> Processing Dependency: kernel-headers for package: oracle-validated
--> Processing Dependency: libXp for package: oracle-validated
--> Processing Dependency: libaio-devel for package: oracle-validated
--> Processing Dependency: libstdc++-devel for package: oracle-validated
--> Processing Dependency: sysstat for package: oracle-validated
--> Processing Dependency: unixODBC for package: oracle-validated
--> Processing Dependency: unixODBC-devel for package: oracle-validated
--> Running transaction check
---> Package kernel-headers.i386 0:2.6.18-128.el5 set to be updated
---> Package libaio-devel.i386 0:0.3.106-3.2 set to be updated
---> Package compat-gcc-34-c++.i386 0:3.4.6-4 set to be updated
---> Package gdb.i386 0:6.8-27.el5 set to be updated
---> Package libXp.i386 0:1.0.0-8.1.el5 set to be updated
---> Package compat-db.i386 0:4.2.52-5.1 set to be updated
---> Package unixODBC-devel.i386 0:2.2.11-7.1 set to be updated
---> Package gcc.i386 0:4.1.2-44.el5 set to be updated
--> Processing Dependency: libgomp >= 4.1.2-44.el5 for package: gcc
---> Package compat-gcc-34.i386 0:3.4.6-4 set to be updated
---> Package glibc-headers.i386 0:2.5-34 set to be updated
---> Package sysstat.i386 0:7.0.2-3.el5 set to be updated
---> Package elfutils-libelf-devel.i386 0:0.137-3.el5 set to be updated
--> Processing Dependency: elfutils-libelf-devel-static-i386 = 0.137-3.el5 for package: elfutils-libelf-devel
---> Package unixODBC.i386 0:2.2.11-7.1 set to be updated
---> Package glibc-devel.i386 0:2.5-34 set to be updated
---> Package libstdc++-devel.i386 0:4.1.2-44.el5 set to be updated
---> Package gcc-c++.i386 0:4.1.2-44.el5 set to be updated
--> Running transaction check
---> Package libgomp.i386 0:4.3.2-7.el5 set to be updated
---> Package elfutils-libelf-devel-static.i386 0:0.137-3.el5 set to be updated
--> Finished Dependency Resolution
Dependencies Resolved
=================================================================================================================================================================================
Package Arch Version Repository Size
=================================================================================================================================================================================
Installing:
oracle-validated i386 1.0.0-18.el5 oracle-validated-1.0.0-18.el5.i386.rpm 15 k
Installing for dependencies:
compat-db i386 4.2.52-5.1 local 1.7 M
compat-gcc-34 i386 3.4.6-4 local 4.1 M
compat-gcc-34-c++ i386 3.4.6-4 local 11 M
elfutils-libelf-devel i386 0.137-3.el5 local 24 k
elfutils-libelf-devel-static i386 0.137-3.el5 local 66 k
gcc i386 4.1.2-44.el5 local 5.2 M
gcc-c++ i386 4.1.2-44.el5 local 3.4 M
gdb i386 6.8-27.el5 local 3.3 M
glibc-devel i386 2.5-34 local 2.0 M
glibc-headers i386 2.5-34 local 612 k
kernel-headers i386 2.6.18-128.el5 local 926 k
libXp i386 1.0.0-8.1.el5 local 22 k
libaio-devel i386 0.3.106-3.2 local 11 k
libgomp i386 4.3.2-7.el5 local 67 k
libstdc++-devel i386 4.1.2-44.el5 local 2.9 M
sysstat i386 7.0.2-3.el5 local 170 k
unixODBC i386 2.2.11-7.1 local 830 k
unixODBC-devel i386 2.2.11-7.1 local 743 k
Transaction Summary
=================================================================================================================================================================================
Install 19 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Total download size: 37 M
Is this ok [y/N]: y
Downloading Packages:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 2.6 GB/s | 37 MB 00:00
warning: rpmts_HdrFromFdno: Header V3 DSA signature: NOKEY, key ID 1e5e0159
Public key for oracle-validated-1.0.0-18.el5.i386.rpm is not installed
The error indicates that yum cannot verify this rpm so it does not install it. When you update /etc/yum.conf with the gpgkey parameter the error goes away.
How to use the Oracle Enterprise Linux installation media as the yum repository
If you are using Oracle Enterprise Linux (OEL) 5, the installation media comes with a yum repository on it. The repository is in the directory /media/Enterprise Linux dvd 20090127/Server/repodata for OEL 5.3 (the location may change).
It is possible to use that repository when installing new packages or components locally, without accessing a remote repository and without needing to copy the rpms to a local directory. If you did a base installation the yum package is already installed; if not, you need to install it first. Once yum is in place, edit /etc/yum.conf to insert the lines related to the repository on the installation media.
[main]
cachedir=/var/cache/yum
keepcache=0
debuglevel=2
logfile=/var/log/yum.log
distroverpkg=redhat-release
tolerant=1
exactarch=1
obsoletes=1
gpgcheck=1
plugins=1
# Note: yum-RHN-plugin doesn't honor this.
metadata_expire=1h
# Default.
# installonly_limit = 3
# PUT YOUR REPOS HERE OR IN separate files named file.repo
# in /etc/yum.repos.d
[local]
name="media"
baseurl=file:///media/Enterprise%20Linux%20dvd%2020090127/Server
enabled=1
gpgkey=file:///media/Enterprise%20Linux%20dvd%2020090127/RPM-GPG-KEY-oracle
You only need to add the last five lines ([local] through gpgkey); the other lines are already there.
After this you can use yum to add packages or components when needed. An example to install the oracle-validated rpm from this repository will be in my next post.
Starting a vm in VirtualBox hangs
When trying to start a new VM in Oracle VM VirtualBox 3.2.10, it hung at 0% progress. There is no way to cancel the startup without killing the process from the host OS.
The first thing to look at is the VBox.log file, which resides under C:\Documents and Settings\<username>\.VirtualBox\Machines\OELTest\Logs on Windows XP. You can also open the log from the VirtualBox GUI menu.
I had these lines at the bottom of the log file.
00:00:04.228 AIOMgr: Endpoint for file 'E:\vm\OELTest\OELTest_boot.vdi' (flags 000c0723) created successfully
00:00:04.228 AIOMgr: I/O manager 0x3f3f898 encountered a critical error (rc=VERR_INVALID_PARAMETER) during operation. Falling back to failsafe mode. Expect reduced performance
00:00:04.228 AIOMgr: Error happened in D:\tinderbox\win-3.2\src\VBox\VMM\PDMAsyncCompletionFileNormal.cpp:(1631){pdmacFileAioMgrNormal}
00:00:04.228 AIOMgr: Please contact the product vendor
This suggests the problem is related to the virtual disk I created as the root disk. I am using the SATA controller, and in the virtual machine storage settings there is an option named "Use host I/O cache" which was unchecked in my case. Checking it and starting the VM again resolves the issue.
There are lots of Google results when you search for messages in the log file but the issue has been explained in http://www.virtualbox.org/ticket/7363.
opatch problem on Windows
There is a note in Metalink explaining that on Windows, space characters in your ORACLE_HOME variable, in the patch location or in the JDK location cause an error when running opatch. Yesterday I saw a strange problem similar to that case.
If your OPatch directory contains space characters you get a strange error. Even though none of the above conditions were present, we got an error like this:
C:\Documents and Settings\test\Desktop\OPatch>opatch lsinventory
Exception in thread "main" java.lang.NoClassDefFoundError: and
OPatch failed with error code = 1
Metalink returns no results for this error. It is caused by the space characters in "Documents and Settings". When you move the OPatch directory to a directory whose path does not contain spaces, opatch runs without this problem.
Just a note to help in case someone gets the same error.
DBFS
Yesterday I attended Kevin Closson's Exadata technical deep dive webcast series part 4. It is now available to download here. In it he talks about DBFS, which is a filesystem on top of the Oracle database that can store normal files like text files. DBFS is provided with Exadata and is used to store staging files for the ETL/ELT process. This looks very promising; he cites several tests he conducted and gives performance numbers too. Watch the webcast if you haven't yet.
Tablespace selection in interval partitioning
11g brought interval partitioning, a new partitioning method that eases the maintenance burden of adding new partitions manually. The interval partitioning clause of the CREATE TABLE statement has an option to list the tablespaces to be used for new partitions. The documentation states that the tablespaces in the list you provide are used in a round-robin manner for new partitions:
Interval partitions are created in the provided list of tablespaces in a round-robin manner.
This does not mean that every newly created partition will reside in the tablespace that is next on the list; a tablespace can be skipped if the inserted data skips one or more intervals. Here is a test case that shows how the list is used.
set lines 200
SQL> r
1 create table t(col1 date,col2 varchar2(100))
2 partition by range (col1)
3 interval(numtoyminterval(1,'MONTH')) store in (tbs1,tbs2,tbs3)
4* (PARTITION p0 VALUES LESS THAN (TO_DATE('1-1-2009', 'DD-MM-YYYY')) tablespace tbs1)
Table created.
SQL> r
1 select PARTITION_NAME,HIGH_VALUE,TABLESPACE_NAME
2* from user_Tab_partitions where table_name='T'
PARTITION_NAME HIGH_VALUE TABLESPACE_NAME
------------------------------ -------------------------------------------------------------------------------- ------------------------------
P0 TO_DATE(' 2009-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS1
The "store in" clause lists tablespaces tbs1, tbs2 and tbs3 to be used for interval partitioning. After the above create table command I now have one partition which resides in tbs1. Let's insert a row which needs to be inserted into a new partition and see which tablespace the partition will be created in.
SQL> insert into t values(to_date('15.01.2009','dd.mm.yyyy'),'jan');
1 row created.
SQL> commit;
Commit complete.
SQL> select PARTITION_NAME,HIGH_VALUE,TABLESPACE_NAME
2 from user_Tab_partitions where table_name='T';
PARTITION_NAME HIGH_VALUE TABLESPACE_NAME
------------------------------ -------------------------------------------------------------------------------- ------------------------------
P0 TO_DATE(' 2009-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS1
SYS_P61 TO_DATE(' 2009-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS2
The row I inserted maps to one interval, which is one month, it does not have a date value which is more than one month higher than the current maximum value. So the next tablespace, tbs2, is used for the new partition.
SQL> insert into t values(to_date('15.02.2009','dd.mm.yyyy'),'feb');
1 row created.
SQL> commit;
Commit complete.
SQL> select PARTITION_NAME,HIGH_VALUE,TABLESPACE_NAME
2 from user_Tab_partitions where table_name='T';
PARTITION_NAME HIGH_VALUE TABLESPACE_NAME
------------------------------ -------------------------------------------------------------------------------- ------------------------------
SYS_P62 TO_DATE(' 2009-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS3
P0 TO_DATE(' 2009-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS1
SYS_P61 TO_DATE(' 2009-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS2
Again I inserted a row for the next month and the partition is created in tbs3, which is the next one on the list.
What happens if I insert a row with a date value that is more than one month after the current maximum partitioning key?
SQL> insert into t values(to_date('15.04.2009','dd.mm.yyyy'),'apr');
1 row created.
SQL> commit;
Commit complete.
SQL> select PARTITION_NAME,HIGH_VALUE,TABLESPACE_NAME
2 from user_Tab_partitions where table_name='T';
PARTITION_NAME HIGH_VALUE TABLESPACE_NAME
------------------------------ -------------------------------------------------------------------------------- ------------------------------
SYS_P62 TO_DATE(' 2009-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS3
P0 TO_DATE(' 2009-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS1
SYS_P61 TO_DATE(' 2009-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS2
SYS_P63 TO_DATE(' 2009-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA TBS2
I skipped March and inserted a value for April. The current maximum key becomes May 1st; we do not see a partition with a high value of Apr 1st. The next tablespace on the list was tbs1, but the new partition is in tbs2, not tbs1. Tbs1 would have been used if I had not skipped an interval when inserting rows.
So, the tablespaces on the list are used in a round-robin manner, but each one is used for only one interval. If you skip intervals, the tablespaces corresponding to the skipped intervals are skipped too.
This is something to keep in mind if you want to strictly decide which tablespace will hold which partition.
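If you need a specific partition to end up in a specific tablespace you can still relocate it after it has been created. The statements below are a sketch, not part of the original test: they use the SYS_P63 partition and the tablespaces from the listing above, and the SET STORE IN clause (available for interval-partitioned tables in 11g) changes the list used for future partitions.
SQL> alter table t move partition sys_p63 tablespace tbs1;
SQL> alter table t set store in (tbs1, tbs2, tbs3);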
Different plans for a sql with the rule hint
I was trying to find out why a query with the RULE hint produces different plans. I got stuck, so I posted the problem to the OTN database forum and Randolf Geist provided a good answer and a starting point. A second pair of eyes on a problem can clear the mental block and show the way.
Remote dependencies
When changing PL/SQL code in a production system, the dependencies between objects can cause some programs to be invalidated. Making sure all programs are valid before opening the system to users is very important for application availability. Oracle automatically compiles invalid programs on their first execution, but a successful compilation may not be possible: the error may need code correction, or the compilation may cause library cache locks while applications are running.
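A quick way to check that nothing is left invalid before opening the system is a query like the one below, run as a DBA user. This is just a common check, not from the original post; invalid objects can then be recompiled manually or with utl_recomp.
SQL> select owner, object_name, object_type from dba_objects where status = 'INVALID';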
Dependencies between plsql programs residing in the same database are easy to handle. When you compile a program the programs dependent on that one are invalidated right away and we can see the invalid programs and compile or correct them before opening the system to the users.
If you have database links and your PL/SQL programs depend on programs residing in other databases, handling the invalidations is harder. There is an initialization parameter named remote_dependencies_mode that controls this dependency management. If it is set to TIMESTAMP, the timestamp of the local program is compared to that of the remote program; if the remote program's timestamp is more recent, the local program is invalidated on its first execution. If the parameter is set to SIGNATURE, the remote program's signature (number and types of parameters, subprogram names, etc.) is checked instead; if it has changed, the local program is invalidated on its first execution.
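The current setting can be checked and changed at session or system level; for code that calls across database links SIGNATURE is usually the safer choice. A minimal sketch, not from the original post:
SQL> show parameter remote_dependencies_mode
SQL> alter session set remote_dependencies_mode = signature;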
The problem here is: if you change the remote program's signature or timestamp, you cannot see that the local program will be invalidated, because the invalidation happens only on its first execution. If you open the system to the users before catching this, you may face a serious application problem leading to downtime.
Here is a simple test case to see the problem.
I start by creating a plsql package in the database TEST.
SQL> create package test_pack as
2 procedure test_proc(a number);
3 end;
4 /
Package created.
SQL> create package body test_pack as
2 procedure test_proc(a number) is
3 begin
4 null;
5 end;
6 end;
7 /
Package body created.
On another database let's create a database link pointing to the remote database TEST, create a synonym for the remote package and create a local package calling the remote one.
SQL> create database link test.world connect to yas identified by yas using 'TEST';
Database link created.
SQL> select * from dual@test.world;
D
-
X
SQL> create synonym test_pack for test_pack@test.world;
Synonym created.
SQL> create package local_pack as
2 procedure local_test(a number);
3 end;
4 /
Package created.
SQL> r
1 create or replace package body local_pack as
2 procedure local_test(a number) is
3 begin
4 test_pack.test_proc(1);
5 end;
6* end;
Package body created.
If we look at the status of this local package we see that it is valid.
SQL> col object_name format a30
SQL> r
1* select status,object_name,object_type from user_objects where object_name='LOCAL_PACK'
STATUS OBJECT_NAME OBJECT_TYPE
------- ------------------------------ ------------------
VALID LOCAL_PACK PACKAGE
VALID LOCAL_PACK PACKAGE BODY
We can execute it without any problems.
SQL> exec LOCAL_PACK.local_test(1);
PL/SQL procedure successfully completed.
Now, let's change the remote package code including the package specification.
SQL> create or replace package test_pack as
2 procedure test_proc(b number,a number default 1);
3 end;
4 /
Package created.
SQL> create or replace package body test_pack as
2 procedure test_proc(b number,a number default 1) is
3 begin
4 null;
5 end;
6 end;
7 /
Package body created.
When we look at the local package we see its status as valid.
SQL> r
1* select status,object_name,object_type from user_objects where object_name='LOCAL_PACK'
STATUS OBJECT_NAME OBJECT_TYPE
------- ------------------------------ ------------------
VALID LOCAL_PACK PACKAGE
VALID LOCAL_PACK PACKAGE BODY
But when it is executed we get an error.
SQL> exec LOCAL_PACK.local_test(1);
BEGIN LOCAL_PACK.local_test(1); END;
*
ERROR at line 1:
ORA-04068: existing state of packages has been discarded
ORA-04062: signature of package "YASBS.TEST_PACK" has been changed
ORA-06512: at "YASBS.LOCAL_PACK", line 4
ORA-06512: at line 1
SQL> select status,object_name,object_type from user_objects where object_name='LOCAL_PACK';
STATUS OBJECT_NAME OBJECT_TYPE
------- ------------------------------ ------------------
VALID LOCAL_PACK PACKAGE
INVALID LOCAL_PACK PACKAGE BODY
SQL> exec LOCAL_PACK.local_test(1);
PL/SQL procedure successfully completed.
SQL> select status,object_name,object_type from user_objects where object_name='LOCAL_PACK';
STATUS OBJECT_NAME OBJECT_TYPE
------- ------------------------------ ------------------
VALID LOCAL_PACK PACKAGE
VALID LOCAL_PACK PACKAGE BODY
The second execution compiles the package and returns the status to valid.
How can we know beforehand that the local program will be invalidated on the first execution? The only way I can think of is to check the dependencies across all databases involved. By collecting the dba_dependencies rows from all databases, we can see which programs on other databases depend on the remote program that was changed and will therefore be invalidated when they are executed. Then we can compile these programs and check that they compile without errors.
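A starting point for such a check, assuming you can query dba_dependencies on each database involved, is to list every object that depends on something across a database link:
SQL> select owner, name, type, referenced_owner, referenced_name, referenced_link_name from dba_dependencies where referenced_link_name is not null order by owner, name;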
Database links can be very annoying sometimes; this case is just one of them.
Index block split bug in 9i
In his famous index internals presentation Richard Foote mentions a bug in 9i about index block splits when rows are inserted in the order of the index columns. Depending on when you commit your inserts the index size changes dramatically.
While I was trying to find out why a 3-column primary key index takes more space than its table, I recalled that bug and it turned out to be the reason for the space issue. The related bug is 3196414 and it is fixed in 10G.
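The @mystat and @mystat2 scripts used below are small helpers that snapshot and diff session statistics; they are not shown in the post. A minimal stand-in for @mystat, assuming you only need the split statistics, would be a query like this:
SQL> select s.name, m.value from v$statname s, v$mystat m where s.statistic# = m.statistic# and s.name like '%splits%';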
Here is the test case Richard presents in his paper.
SQL> create table t(id number,value varchar2(10));
Table created.
SQL> create index t_ind on t(id);
Index created.
SQL> @mystat split
NAME VALUE
------------------------------ ----------
leaf node splits 0
leaf node 90-10 splits 0
branch node splits 0
SQL> ed
Wrote file afiedt.buf
1 begin
2 for i in 1..10000 loop
3 insert into t values(i,'test');
4 commit;
5 end loop;
6* end;
SQL> r
1 begin
2 for i in 1..10000 loop
3 insert into t values(i,'test');
4 commit;
5 end loop;
6* end;
PL/SQL procedure successfully completed.
SQL> @mystat2 split
NAME VALUE DIFF
------------------------------ ---------- ----------
leaf node splits 35 35
leaf node 90-10 splits 0 0
branch node splits 0 0
SQL> analyze index t_ind validate structure;
Index analyzed.
SQL> select lf_blks, pct_used from index_stats;
LF_BLKS PCT_USED
---------- ----------
36 51
SQL> drop table t;
Table dropped.
I am trying to insert the rows in the order of the primary key column, so what I expect to see is that when an index block fills up there will be a 90-10 split and the index will grow in size. But as the leaf node split statistics show, there are 35 block splits and none of them are 90-10 splits, meaning all of them are 50-50 splits. I end up with 36 leaf blocks and half of each one is empty.
If we try the same inserts but commit after the loop the result changes.
SQL> create table t(id number,value varchar2(10));
Table created.
SQL> create index t_ind on t(id);
Index created.
SQL> @mystat split
NAME VALUE
------------------------------ ----------
leaf node splits 35
leaf node 90-10 splits 0
branch node splits 0
SQL> ed
Wrote file afiedt.buf
1 begin
2 for i in 1..10000 loop
3 insert into t values(i,'test');
4 end loop;
5 commit;
6* end;
SQL> r
1 begin
2 for i in 1..10000 loop
3 insert into t values(i,'test');
4 end loop;
5 commit;
6* end;
PL/SQL procedure successfully completed.
SQL> @mystat2 split
NAME VALUE DIFF
------------------------------ ---------- ----------
leaf node splits 53 53
leaf node 90-10 splits 18 18
branch node splits 0 0
SQL> analyze index t_ind validate structure;
Index analyzed.
SQL> select lf_blks, pct_used from index_stats;
LF_BLKS PCT_USED
---------- ----------
19 94
In this case we see that there have been 18 block splits and all of them were 90-10 splits, as expected. We have 19 leaf blocks and all are nearly full. Depending on where the commit is, we can get an index twice the size it needs to be. When I ran the same test in 10G it did not matter where the commit was; I got 19 leaf blocks in both cases.
I did not test whether this problem happens when several sessions each insert a single row and commit, as in an OLTP system, but I think it is likely, because we have indexes showing this behavior in OLTP systems.
OTN members, don't change your e-mail account!
I am a regular user of OTN and its forums. Last week I was trying to log in to OTN from a public computer and I got the "invalid login" error every time I tried. I was sure I was typing my password correctly but I could not get in. So I tried to get my password reset and sent to my e-mail address. Then I remembered that the e-mail address I used to register for OTN was from my previous employer, meaning I did not have access to it anymore. As OTN does not allow changing the registration e-mail address I was stuck. I sent a request to OTN to get my password delivered to my current e-mail address. Here is the reply I got:
Resolution: Oracle's membership management system does not currently support
the editing of the email address or username in your membership profile.
(It will support this capability in a future release.)
Please create a new account with the new email address you wish to use. However,
it is possible to change the email at which you receive
Discussion Forum "watch" emails (see "Your Control Panel" when logged in).
They tell me to create a new user and forget about my history, watch list, everything. What a user-centric approach this is. If you are an OTN member, do not lose your password and your e-mail account at the same time; you will not find anybody from OTN who is willing to solve your problem and help you recover your password.
I am used to bad behavior and unwillingness to solve problems in Metalink; now I get the same behavior in OTN. Whatever, just wanted to let you know about it.
Materialized view refresh change in 10G and sql tracing
Complete refresh of a single materialized view used to do a truncate and insert on the mview table until 10G. Starting with 10G the refresh does a delete and insert on the mview table instead. This guarantees that the table is never empty in case of an error; the refresh became an atomic operation.
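Note that the pre-10G truncate and insert behavior is still available if you explicitly ask for a non-atomic refresh; the atomic_refresh parameter of dbms_mview.refresh controls this. A sketch, not part of the original test case:
SQL> exec dbms_mview.refresh('MVIEW', method => 'C', atomic_refresh => FALSE);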
There is another difference between 9.2 and 10G in the refresh process, which I have realized when trying to find out why a DDL trigger prevented a refresh operation. In 9.2 the refresh process runs an ALTER SUMMARY statement against the mview while in 10G it does not.
We have a DDL trigger in some of the test environments (all 9.2) for preventing some operations. The trigger checks the type of the DDL and allows or disallows it based on some conditions. After this trigger was put in place some developers started to complain about some periodic mview refresh operations failing a few weeks ago. I knew it was not because of the TRUNCATE because the trigger allowed truncate operations.
So I enabled sql tracing and found out what the refresh process was doing. Here is a simplified test case for it in 10G.
YAS@10G>create table master as select * from all_objects where rownum<=5;
Table created.
YAS@10G>alter table master add primary key (owner,object_name,object_type);
Table altered.
YAS@10G>create materialized view mview as select * from master;
Materialized view created.
YAS@10G>exec dbms_mview.refresh('MVIEW');
PL/SQL procedure successfully completed.
YAS@10G>create or replace trigger ddl_trigger before ddl on schema
2 begin
3 if ora_sysevent<>'TRUNCATE' then
4 raise_application_error(-20001,'DDL NOT ALLOWED');
5 end if;
6 end;
7 /
Trigger created.
YAS@10G>exec dbms_mview.refresh('MVIEW');
PL/SQL procedure successfully completed.
The refresh was successful in 10G even with the DDL trigger in place. In 9.2 it errors out once the DDL trigger is enabled.
SQL> exec dbms_mview.refresh('MVIEW');
BEGIN dbms_mview.refresh('MVIEW'); END;
*
ERROR at line 1:
ORA-04045: errors during recompilation/revalidation of YAS.MVIEW
ORA-20001: DDL NOT ALLOWED
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 820
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 877
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 858
ORA-06512: at line 1
After enabling sql trace and running the refresh again, the trace file shows the problem statement.
SQL> alter session set sql_trace=true;
Session altered.
SQL> exec dbms_mview.refresh('MVIEW');
BEGIN dbms_mview.refresh('MVIEW'); END;
*
ERROR at line 1:
ORA-04045: errors during recompilation/revalidation of YAS.MVIEW
ORA-20001: DDL NOT ALLOWED
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 820
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 877
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 858
ORA-06512: at line 1
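In case you need to find the trace file, on 9.2 and 10G it is written to the directory pointed to by user_dump_dest; the file name below is made up, and either tkprof or a simple grep will surface the offending statement:
SQL> show parameter user_dump_dest
$ tkprof test_ora_12345.trc refresh_trace.txt    # format the raw trace file
$ grep -i "alter summary" test_ora_12345.trc     # or just search the raw file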
The last statement in the trace file is this:
ALTER SUMMARY "YAS"."MVIEW" COMPILE
So, in 9.2 the refresh process runs an ALTER SUMMARY command against the mview. After changing the trigger to allow DDL operations on SUMMARY objects, I could refresh the mview without errors.
SQL> r
1 create or replace trigger ddl_trigger before ddl on schema
2 begin
3 if ora_sysevent<>'TRUNCATE' and ora_dict_obj_type<>'SUMMARY' then
4 raise_application_error(-20001,'DDL NOT ALLOWED');
5 end if;
6* end;
Trigger created.
SQL> exec dbms_mview.refresh('MVIEW');
PL/SQL procedure successfully completed.
Tonguc Yilmaz had a recent post about using 10046 for purposes other than performance tuning, such as finding out which statements a procedure runs. I find uses for sql tracing nearly every day, and the above case is one of them. When you cannot understand why an error happens, it is often very useful to turn on sql tracing and examine the trace file. Errors related to specific bind values and errors raised from a "when others" block (you know we do not like "when others", right?) are a couple of things you can dig into and analyze with sql tracing.
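For reference, the event syntax for 10046 is below; level 12 also records bind values and wait events, which is what makes it handy for chasing bind-related errors. Run the code you want to trace between the two commands (a sketch, assuming you are tracing your own session):
SQL> alter session set events '10046 trace name context forever, level 12';
Session altered.
SQL> alter session set events '10046 trace name context off';
Session altered.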
Disqus
Last week I read a post on Andy C's blog about a service called Disqus, which keeps track of comments on blogs. Later he made another post about it.
I have been looking for a solution to keep track of blog comments, both mine and other people's. Not all blogs have an option to subscribe to the comments; when you comment on a post you need to check back later to see if someone has commented further. I want a central repository where I can see all my comments on a post, all comments made by others on the same post, and all comments made on my own blog.
I tried Cocomment before but I was not satisfied with it. So I decided to give Disqus a try and enabled it on this blog. Laurent Schneider decided to try it too.
Then Andy made another post about some people's concerns about Disqus (one of them being Tim Hall). Their concerns make sense. Who owns the comments one makes, the commenter or the blog owner? Is it sensible to store the comments to your blog not in your blog database, but elsewhere? What if Disqus is not there next year, or is inaccessible for some time? Is it possible to export the comments from Disqus and import them back into the blog?
Some of these may be irrelevant if you are using a public blogging service, like Blogger, because it means you are already storing your posts and comments somewhere else. The question "What if Blogger is not there next year?" comes to mind for example.
A solution suggested by Tim Hall is to dual-post the comments. The comments would then be both in the blog and on Disqus, but this needs the blogging service to provide a way to do it.
Another solution could be a solid export-import utility in Disqus. That way you could export the comments and put them back into the blog whenever you want. Disqus currently has an export utility, but as far as I have read it is not reliable for now.
While I agree with these concerns, I like what Disqus provides. The one-stop page for all comments, the threading of comments, and being able to follow other users are the primary features I like. So I will stick with it, at least for now.