Bar Solutions
Reconnect Plugin
On the PL/SQL Developer forum there was a message by my good friend Erik van Roon. He was having trouble with reconnecting; even my plugin didn’t help him. I figured I should update the plugin.
The plugin hadn’t really been updated since its first version, only some cosmetic changes, like adding new buttons.
I have decided to make use of all the features of the newer versions of PL/SQL Developer. The new version of the plugin now has multiple options. First of all there is Reconnect Main, which is similar to the way the plugin always worked. New options are Reconnect All, which reconnects all the sessions, and Reconnect Single, which reconnects a single session.
If you want to download the new plugin, go to the plugin page.
Constraints
The other day a friend called me with a question on constraints. “I have a table where the combination of two columns must be unique”. My response was: “Use a Unique Constraint on these two columns”.
That should work, however…
“There is one value for the first column, where the second column doesn’t have to be unique.”
Let’s start with a simple script to play with this issue:
prompt create a simple table
create table t
( v1 number
, v2 number
)
/
prompt add a unique constraint on columns v1 and v2
alter table t add constraint uk_2cols unique (v1, v2)
/
prompt add some data
begin
  insert into t(v1, v2) values (1,1);
  insert into t(v1, v2) values (1,2);
  insert into t(v1, v2) values (2,1);
  insert into t(v1, v2) values (2,2);
  insert into t(v1, v2) values (3,1);
  commit;
end;
/
If we add a duplicate record, this will fail:
prompt try to add a duplicate, this will fail
insert into t(v1, v2) values (3,1)
/
So using a Unique Constraint like this doesn’t work. There is a solution to this using a package and some triggers to work around a mutating table problem, but there should be a solution just using SQL and the features of the Oracle Database.
So we drop the constraint; it doesn’t do what we want.
prompt drop the constraint
alter table t drop constraint uk_2cols
/
Remember, two (or more) NULL values are considered unique: a unique constraint is never violated by rows that have NULL in the constrained column. A quick demonstration of this behavior (the demo table is just for illustration):
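create table t_null_demo (v number unique)
/
insert into t_null_demo values (null)
/
-- this second NULL succeeds as well: NULLs are never considered equal
insert into t_null_demo values (null)
/
I use this behavior by creating a virtual column in the table with the following statement: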
prompt add a virtual column
alter table T add v3 as (case when v1 = 3 then null else v1 || v2 end)
/
Then we can add a unique constraint on the virtual column, which now contains NULL values for the rows that don’t have to be unique:
prompt add a unique constraint on this virtual column
alter table t add constraint uk_2cols unique (v3)
/
If we now try the insert again, this works.
prompt now the insert works
insert into t(v1, v2) values (3,1)
/
But the insert where the combination should be unique still fails.
prompt but this one still fails
insert into t(v1, v2) values (1,1)
/
As you can see, you can solve this issue by just using the features provided by the Oracle Database.
Log errors
Just a quick note of something I ran into at my current assignment. I have to import data from external tables into persistent tables. This process normally is just inserting data into the table, but sometimes a file can be sent in more than once.
Originally I thought I would add the /*+ APPEND */ hint to the statement to speed things up. This works flawlessly as long as you are just inserting new records. As soon as you try to insert a record which already exists, you can hit a unique constraint violation (or some other constraint, if defined). So my next idea was to use the LOG ERRORS construction. As soon as you hit an error for a specific record, it gets recorded in a separate table and processing continues with the next record.
But… this doesn’t work when the two are combined. As soon as you add the /*+ APPEND */ hint, the LOG ERRORS clause seems to be ignored.
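A minimal sketch of what I was trying to do, assuming a target table t, a source external table t_ext and the default error log table generated by DBMS_ERRLOG (all names are illustrative):

begin
  -- create the default error log table ERR$_T for table T
  dbms_errlog.create_error_log(dml_table_name => 'T');
end;
/
insert /*+ APPEND */ into t
select * from t_ext
log errors into err$_t ('import run 1') reject limit unlimited
/

With the APPEND hint in place this statement still fails on the first duplicate key; remove the hint and the offending rows end up in ERR$_T while the rest is inserted.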
Recursive Subquery
At my current assignment we are processing files coming from an online system to be inserted into our database (kind of a data warehouse). This is done using external tables and a scheduled job. The job just checks if there is a file available and will process it. The trouble is that the files might not make it to our database, for various reasons.
I want to be able to check if all the files have been processed and no file has gone missing. After processing a file, its name gets logged into a table so I can check this at a later time. Of course, I don’t want to eyeball the list to see if all the files have been processed.
I have been playing around with some queries to find out the gaps. A couple of things to know in advance.
The files are named using a timestamp (‘YYYYMMDDHH24MI’)
The files are always 5 minutes apart from each other.
Let’s first create a test table:
rem create the testdata table
create table testdata
( filename varchar2(4000)
)
/
Then I create some testdata. Using the DBMS_RANDOM.VALUE function I determine which records should be inserted.
rem fill the testdata table
begin
  for d in 10 .. 15 loop -- create six days of test data
    for hr in 0 .. 23 loop -- for every hour of the day
      for mn in 0 .. 59 loop -- for every minute of the hour
        if mod(mn, 5) = 0 then -- only if it's a 5 minute value
          if trunc(dbms_random.value * 10) <> 4 then -- and our randomizer doesn't end up being 4 (create random gaps)
            insert into testdata
              (filename)
            values
              ('201907' || trim(to_char(d, '09')) || trim(to_char(hr, '09')) || trim(to_char(mn, '09')) || '.tst');
          end if;
        end if;
      end loop;
    end loop;
  end loop;
  commit;
end;
/
My first attempt at this query was to find all the gaps between two filenames which weren’t 5 minutes apart. So I started creating the query. Using subquery factoring I can show the steps I took.
Since the table consists of filenames and not dates (or timestamps) I needed to get the date portion of the file first:
select to_timestamp(substr(filename, 1, 12), 'YYYYMMDDHH24MISS') filedate
  from testdata
 order by 1
Using this set of data I can create the next step in the query: determining the preceding and the following date for each date and, while we’re at it, the gap before and after:
with filenames as
 (select to_timestamp(substr(filename, 1, 12), 'YYYYMMDDHH24MISS') filedate
    from testdata
   order by 1)
select lag(filedate) over(order by filedate) previousfiledate
      ,filedate
      ,lead(filedate) over(order by filedate) nextfiledate
      ,(filedate - lag(filedate) over(order by filedate)) gapbefore
      ,(lead(filedate) over(order by filedate) - filedate) gapafter
  from filenames
Wrapping this set of data into yet another factored subquery, I can remove all the rows that have a gap of exactly 5 minutes.
with filenames as
 (select to_timestamp(substr(filename, 1, 12), 'YYYYMMDDHH24MISS') filedate
    from testdata
   order by 1),
gaps as
 (select lag(filedate) over(order by filedate) previousfiledate
        ,filedate
        ,lead(filedate) over(order by filedate) nextfiledate
        ,(filedate - lag(filedate) over(order by filedate)) gapbefore
        ,(lead(filedate) over(order by filedate) - filedate) gapafter
    from filenames)
select *
  from gaps
 where 1 = 1
   and (   (gapafter <> to_dsinterval('0 00:05:00'))
        or (gapbefore <> to_dsinterval('0 00:05:00')))
After I got the results for this query it made me wonder: this query shows me where the gaps in the data are, but it doesn’t tell me exactly which file or files are missing. I still have to figure that out myself. It also shows every gap twice, once after one file and once before the next file. There has got to be a better way to find and fill up the gaps. What if I could just generate all the filenames that should be there and then subtract the filenames that have been recorded?
First I need to create a list of all the possible filenames that exist between the first and last recorded filename. I knew about a feature called recursive subquery factoring, but I had never used it before. Luckily Tim Hall has written a nice post on this subject.
with
/*
 * First determine all the possible dates between the first and the last recorded file
 * using recursive subquery factoring.
 * Thanks to Tim Hall for https://oracle-base.com/articles/11g/recursive-subquery-factoring-11gr2
 */
possibledates(thedate) as
 (-- Anchor member
  select min(to_timestamp(substr(filename, 1, 12), 'YYYYMMDDHH24MI')) thedate
    from testdata
  union all
  -- Recursive member
  select thedate + to_dsinterval('0 00:05:00') thedate
    from possibledates
   where thedate < (select max(to_timestamp(substr(filename, 1, 12), 'YYYYMMDDHH24MI')) from testdata)),
/*
 * Then determine the filenames from these dates
 */
possiblefilenames as
 (select to_char(thedate, 'YYYYMMDDHH24MI') || '.tst' filename
    from possibledates)
/*
 * Using a simple MINUS operation, determine which filenames are missing
 */
select filename from possiblefilenames
minus
select filename from testdata
Not only is this query very fast, it also takes away the need to eyeball the data to find the missing files. It just shows which files are actually missing.
ACE Director
It has now been a week since I was promoted to ACE Director. I want to thank everybody for their wishes. Directly after being promoted I was off to Dublin for the UKOUG Ireland conference. That is why I didn’t respond to every tweet individually. Please don’t think I didn’t appreciate the tweets or wishes on LinkedIn; I just didn’t have the time and the opportunity to respond.
I especially want to thank Jim Czuprynski and Alex Nuijten for nominating me. And Bryn Llewellyn and Maria Colgan for supporting me from the Oracle side.
I also want to thank my employer (Qualogy) for granting me the opportunity to be very active in the community by sending me to all the usergroups all over the world.
And of course a big thanks to the ACE Program for accepting me.
I hope I will be able to maintain this title for a long, long time.
Polymorphic Table Functions – Part 2
In my previous post I wrote about a possible use-case for Polymorphic Table Functions. I also wrote that I had some ideas to implement extra functionality to make the function more generic.
After attending DOAG Konferenz und Ausstellung 2018 and UKOUG Tech18 I finally got around to publishing my new version.
Let’s start off with a table to hold my data. I chose to use the NATO phonetic alphabet for this:
create table t_alphabet
(thecount number
,alphabet varchar2(4000)
)
/
begin
  insert into t_alphabet(alphabet, thecount) values ('Alfa;Bravo;Charlie;Delta;Echo;Foxtrot;Golf', 7);
  insert into t_alphabet(alphabet, thecount) values ('Hotel;India;Juliett;Kilo;Lima;Mike;November', 7);
  insert into t_alphabet(alphabet, thecount) values ('Oscar;Papa;Quebec;Romeo;Sierra;Tango;Uniform', 7);
  insert into t_alphabet(alphabet, thecount) values ('Victor;Whiskey;;X-ray;Yankee;;Zulu', 5);
  insert into t_alphabet(alphabet, thecount) values ('Alfa.Bravo.Charlie.Delta.Echo.Foxtrot.Golf', 7);
  insert into t_alphabet(alphabet, thecount) values ('Hotel.India.Juliett.Kilo.Lima.Mike.November', 7);
  insert into t_alphabet(alphabet, thecount) values ('Oscar.Papa.Quebec.Romeo.Sierra.Tango.Uniform', 7);
  insert into t_alphabet(alphabet, thecount) values ('Victor.Whiskey..X-ray.Yankee..Zulu', 5);
  commit;
end;
/
First of all, I learned from attending the presentations by Andrej Pashchenko (Polymorphic Table Functions in 18c: Einführung und Beispiele) and Keith Laker (Patterns and Use Cases For Polymorphic Tables) that there is no need for global (package) variables to have access to the parameters supplied. The more I can rely on Oracle to take care of the value of variables, the better I like it.
I won’t bore you with all the intermediate versions of the code; let’s jump straight into the ‘final’ result.
The package:
create or replace package separated_ptf is
  function describe(tab        in out dbms_tf.table_t
                   ,cols       in dbms_tf.columns_t default null
                   ,coltosplit in varchar2 default null
                   ,separator  in varchar2 default ';') return dbms_tf.describe_t;

  procedure fetch_rows(coltosplit in varchar2 default null
                      ,separator  in varchar2 default ';');
end separated_ptf;
/
The package body:
create or replace package body separated_ptf as
  function describe(tab        in out dbms_tf.table_t
                   ,cols       in dbms_tf.columns_t default null
                   ,coltosplit in varchar2 default null
                   ,separator  in varchar2 default ';') return dbms_tf.describe_t as
    -- metadata for column to add
    l_new_col dbms_tf.column_metadata_t;
    -- table of columns to add
    l_new_cols dbms_tf.columns_new_t;
    -- make sure the column to split is in the correct format (uppercase with doublequotes)
    l_coltosplit dbms_quoted_id := dbms_assert.enquote_name(str => coltosplit, capitalize => true);
  begin
    -- if the coltosplit parameter is null then
    if coltosplit is null then
      -- mark the first column ReadOnly and don't display it anymore
      tab.column(1).for_read := true;
      tab.column(1).pass_through := false;
    else
      -- if the coltosplit parameter is not null then
      -- check every column from the source table
      for indx in tab.column.first .. tab.column.last loop
        -- if this is the column we want to split then
        if tab.column(indx).description.name = l_coltosplit then
          -- mark this column ReadOnly and don't display it anymore
          tab.column(indx).for_read := true;
          tab.column(indx).pass_through := false;
        end if;
      end loop;
    end if;
    -- add the new columns, as specified in the cols parameter
    for indx in 1 .. cols.count loop
      -- define metadata for column named cols(indx)
      -- that will default to a datatype of varchar2 with
      -- a length of 4000
      l_new_col := dbms_tf.column_metadata_t(name => cols(indx));
      -- add the new column to the list of new columns
      l_new_cols(l_new_cols.count + 1) := l_new_col;
    end loop;
    -- instead of returning NULL we will return a specific
    -- DESCRIBE_T that adds new columns
    return dbms_tf.describe_t(new_columns => l_new_cols);
  end;

  procedure fetch_rows(coltosplit in varchar2 default null
                      ,separator  in varchar2 default ';') is
    -- define a table type of varchar2 tables
    type colset is table of dbms_tf.tab_varchar2_t index by pls_integer;
    -- variable to hold the rowset as retrieved
    l_rowset dbms_tf.row_set_t;
    -- variable to hold the number of rows as retrieved
    l_rowcount pls_integer;
    -- variable to hold the number of put columns
    l_putcolcount pls_integer := dbms_tf.get_env().put_columns.count;
    -- variable to hold the new values
    l_newcolset colset;
    -- get the name of the column to be split from the get columns
    l_coltosplit dbms_quoted_id := trim('"' from dbms_tf.get_env().get_columns(1).name);
  begin
    -- dbms_tf.Trace(dbms_tf.Get_Env);
    -- fetch rows into a local rowset
    -- at this point the rows will have columns
    -- from the table/view/query passed in
    dbms_tf.get_row_set(l_rowset, l_rowcount);
    -- for every row in the rowset...
    for rowindx in 1 .. l_rowcount loop
      -- for every column
      for colindx in 1 .. l_putcolcount loop
        -- split the row into separate values
        -- FUNCTION Row_To_Char(rowset Row_Set_t,
        --                      rid    PLS_INTEGER,
        --                      format PLS_INTEGER default FORMAT_JSON) return VARCHAR2;
        -- splitting the regexp way: http://nuijten.blogspot.com/2009/07/splitting-comma-delimited-string-regexp.html
        l_newcolset(colindx)(rowindx) :=
          trim(separator from
               regexp_substr(json_value(dbms_tf.row_to_char(l_rowset, rowindx), '$.' || l_coltosplit)
                            ,'[^' || separator || ']*' || separator || '{0,1}'
                            ,1
                            ,colindx));
      end loop; -- every column
    end loop; -- every row in the rowset
    -- add the newly populated columns to the rowset
    for indx in 1 .. l_putcolcount loop
      dbms_tf.put_col(columnid => indx, collection => l_newcolset(indx));
    end loop;
  end;
end separated_ptf;
/
The wrapper function:
create or replace function separated_fnc(p_tbl      in table
                                        ,cols       columns default null
                                        ,coltosplit in varchar2 default null
                                        ,separator  in varchar2 default ';')
  return table pipelined row polymorphic using separated_ptf;
/
Back to the improvements I suggested earlier.
Supporting duplicate separators:
Using the regular expression ‘[^;]+’ didn’t make this possible, because two consecutive separators (;;) would be regarded as one. So this had to be changed into ‘[^;]*;{0,1}’ (as the package body shows). This expression says (in my words): find all the characters which are not a ‘;’, followed by zero or one ‘;’. Since this results in a string with a ‘;’ at the end, I had to add the trim function around it.
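To see the difference between the two patterns, here is a quick check against a string with a double separator (the string and the column aliases are just for illustration):

select regexp_substr('Victor;Whiskey;;X-ray', '[^;]+', 1, 3) old_way -- returns 'X-ray': the empty third value is silently skipped
      ,trim(';' from regexp_substr('Victor;Whiskey;;X-ray', '[^;]*;{0,1}', 1, 3)) new_way -- returns NULL for the empty third value
  from dual
/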
Making the column to be split up a parameter
To find the column to be split I need to make the parameter value look the same as the tab.column(indx).description.name value. I could of course add quotes around the parameter myself, but I think it is better to use the built-in sys.dbms_assert.enquote_name function to do this. If the value is not supplied or null, I just (try to) split the first column.
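For illustration, this is what enquote_name does to the parameter value:

select dbms_assert.enquote_name('alphabet') enquoted
  -- returns "ALPHABET", which matches the format of tab.column(indx).description.name
  from dual
/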
Making the separator character a parameter
This seemed like an easy task, just replace every ; in my code by a variable, but when running a couple of tests I received an error which put me on the wrong path.
Calling the function using positioned parameters works like a charm, just as you would expect it:
select *
  from separated_fnc(t_alphabet, columns(first, second, third, fourth, fifth, sixth, seventh), 'alphabet', '.')
/
  THECOUNT FIRST                SECOND               THIRD                FOURTH               FIFTH                SIXTH                SEVENTH
---------- -------------------- -------------------- -------------------- -------------------- -------------------- -------------------- --------------------
         7 Alfa;Bravo;Charlie;D elta;Echo;Foxtrot;Go lf
         7 Hotel;India;Juliett; Kilo;Lima;Mike;Novem ber
         7 Oscar;Papa;Quebec;Ro meo;Sierra;Tango;Uni form
         5 Victor;Whiskey;X-ray ;Yankee;Zulu
         7 Alfa                 Bravo                Charlie              Delta                Echo                 Foxtrot              Golf
         7 Hotel                India                Juliett              Kilo                 Lima                 Mike                 November
         7 Oscar                Papa                 Quebec               Romeo                Sierra               Tango                Uniform
         5 Victor               Whiskey              X-ray                Yankee               Zulu

8 rows selected
But when I tried to use named parameters like this:
select *
  from separated_fnc(t_alphabet
                    ,columns(first, second, third, fourth, fifth, sixth, seventh)
                    ,coltosplit => 'alphabet'
                    ,separator  => '.')
/
it resulted in the following error:
ORA-62573: new column (FIRST) is not allowed with describe only polymorphic table function
After the DOAG conference I looked at this together with Chris Saxon, but we couldn’t figure out what was going on. So we contacted Keith Laker, who told me that it was a bug in the version of the database I am using (18.3.0.0.0) and that it should be fixed in an upcoming (patch) release.
As long as you know about this behavior, I think it is quite a useful function (and with the upcoming fixes it will become even more useful).
If you have any suggestions or maybe ideas for other use cases for Polymorphic Table Functions, please don’t hesitate to use the comments.
Polymorphic Table Functions
I have been working on a presentation on Polymorphic Table Functions. During this time I was looking for a real use case for Polymorphic Table Functions. I came up with an example which is not very useful in real life, but very useful to explain the technique.
At my current job I came across a piece of code that I had to copy and adjust to fit the needs of a specific case. The idea was always the same: I get a table with semicolon-separated values in one column that have to be split into the correct number of columns before comparing the data to the current data in a specific table.
I thought: ‘Maybe I can solve this copy-paste-adjust process by using a Polymorphic Table Function.’
Let’s first set the current scene.
We have two tables. The well known EMP and DEPT tables.
create table emp
(empno    number(4) not null
,ename    varchar2(10)
,job      varchar2(9)
,mgr      number(4)
,hiredate date
,sal      number(7, 2)
,comm     number(7, 2)
,deptno   number(2)
)
/
create table dept
(deptno number(2)
,dname  varchar2(14)
,loc    varchar2(14)
)
/
And we add the well known data:
insert into emp values (7369, 'SMITH', 'CLERK', 7902, to_date('17-DEC-1980', 'DD-MON-YYYY'), 800, null, 20);
insert into emp values (7499, 'ALLEN', 'SALESMAN', 7698, to_date('20-FEB-1981', 'DD-MON-YYYY'), 1600, 300, 30);
insert into emp values (7521, 'WARD', 'SALESMAN', 7698, to_date('22-FEB-1981', 'DD-MON-YYYY'), 1250, 500, 30);
insert into emp values (7566, 'JONES', 'MANAGER', 7839, to_date('2-APR-1981', 'DD-MON-YYYY'), 2975, null, 20);
insert into emp values (7654, 'MARTIN', 'SALESMAN', 7698, to_date('28-SEP-1981', 'DD-MON-YYYY'), 1250, 1400, 30);
insert into emp values (7698, 'BLAKE', 'MANAGER', 7839, to_date('1-MAY-1981', 'DD-MON-YYYY'), 2850, null, 30);
insert into emp values (7782, 'CLARK', 'MANAGER', 7839, to_date('9-JUN-1981', 'DD-MON-YYYY'), 2450, null, 10);
insert into emp values (7788, 'SCOTT', 'ANALYST', 7566, to_date('09-DEC-1982', 'DD-MON-YYYY'), 3000, null, 20);
insert into emp values (7839, 'KING', 'PRESIDENT', null, to_date('17-NOV-1981', 'DD-MON-YYYY'), 5000, null, 10);
insert into emp values (7844, 'TURNER', 'SALESMAN', 7698, to_date('8-SEP-1981', 'DD-MON-YYYY'), 1500, 0, 30);
insert into emp values (7876, 'ADAMS', 'CLERK', 7788, to_date('12-JAN-1983', 'DD-MON-YYYY'), 1100, null, 20);
insert into emp values (7900, 'JAMES', 'CLERK', 7698, to_date('3-DEC-1981', 'DD-MON-YYYY'), 950, null, 30);
insert into emp values (7902, 'FORD', 'ANALYST', 7566, to_date('3-DEC-1981', 'DD-MON-YYYY'), 3000, null, 20);
insert into emp values (7934, 'MILLER', 'CLERK', 7782, to_date('23-JAN-1982', 'DD-MON-YYYY'), 1300, null, 10);
insert into dept values (10, 'ACCOUNTING', 'NEW YORK');
insert into dept values (20, 'RESEARCH', 'DALLAS');
insert into dept values (30, 'SALES', 'CHICAGO');
insert into dept values (40, 'OPERATIONS', 'BOSTON');
We get data from a different system, which is in a semi-colon separated format. So we load that in a couple of staging tables:
create table empstg
(line varchar2(4000)
)
/
create table deptstg
(line varchar2(4000)
)
/
And then we add some data:
insert into empstg values ('7369;SMITH;CLERK;7902;17121980;800; ;20');
insert into empstg values ('7499;ALLEN;SALESMAN;7698;20021981;1600; 300;30');
insert into empstg values ('7521;WARD;SALESMAN;7698;22021981;1250; 500;30');
insert into empstg values ('7566;JONES;MANAGER;7839;02041981; 2975; ;20');
insert into empstg values ('7654;MARTIN;SALESMAN;7698;28091981;1250;1400;30');
insert into empstg values ('7698;BLAKE;MANAGER;7839;01051981; 2850; ;30');
insert into empstg values ('7782;CLARK;MANAGER;7839;09061981; 2450; ;10');
insert into empstg values ('7788;SCOTT;ANALYST;7566;09121982;3000; ;20');
insert into empstg values ('7839;KING;PRESIDENT; ;17111981;5000; ;10');
insert into empstg values ('7844;TURNER;SALESMAN;7698;08091981; 1500;0;30');
insert into empstg values ('7876;ADAMS;CLERK;7788;12011983;1100; ;20');
insert into empstg values ('7900;JAMES;CLERK;7698;03121981; 950; ;30');
insert into empstg values ('7902;FORD;ANALYST;7566;03121981; 3000; ;20');
insert into empstg values ('7934;MILLER;CLERK;7782;23011982;1300; ;10');
insert into empstg values ('2912;BAREL;DEVELOPER;7839;29122017;4000; ;50');
insert into deptstg values ('10;ACCOUNTING;NEW YORK');
insert into deptstg values ('20;RESEARCH;DALLAS');
insert into deptstg values ('30;SALES;NORTH CAROLINA');
insert into deptstg values ('40;OPERATIONS;BOSTON');
insert into deptstg values ('50;DEVELOPMENT;SAN FRANCISCO');
To process the data and merge it into the main tables we use a package. We could have used a MERGE statement, but that implies that all the rows that are the same get an update anyway, which results in a lot of journalling data being written by triggers. Using the EMP and DEPT tables this wouldn’t be too much of a problem, but we are talking about 250k+ rows each time (at least once a day).
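For reference, this is roughly what the MERGE alternative would look like for the DEPT table (a sketch, not the code we used); note that it updates every matching row, whether or not anything changed:

merge into dept d
using (select trim(regexp_substr(line, '[^;]+', 1, 1)) deptno
             ,trim(regexp_substr(line, '[^;]+', 1, 2)) dname
             ,trim(regexp_substr(line, '[^;]+', 1, 3)) loc
         from deptstg) i
on (d.deptno = i.deptno)
when matched then
  update set d.dname = i.dname, d.loc = i.loc
when not matched then
  insert (deptno, dname, loc) values (i.deptno, i.dname, i.loc)
/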
So we want a little more control and only insert/update when it’s really necessary.
create or replace package process_stg is
  procedure dept;

  procedure emp;
end process_stg;
/
create or replace package body process_stg is
  failure_in_forall exception;
  pragma exception_init(failure_in_forall, -24381);
  c_limit constant number := 10;

  procedure dept is
    cursor c_inserts is
      with import as
       (select trim(regexp_substr(line, '[^;]+', 1, 1)) deptno
              ,trim(regexp_substr(line, '[^;]+', 1, 2)) dname
              ,trim(regexp_substr(line, '[^;]+', 1, 3)) loc
          from deptstg stg)
      select i.deptno
            ,i.dname
            ,i.loc
        from import i
        left outer join dept d on (d.deptno = i.deptno)
       where 1 = 1
         and d.deptno is null;
    cursor c_updates is
      with import as
       (select trim(regexp_substr(line, '[^;]+', 1, 1)) deptno
              ,trim(regexp_substr(line, '[^;]+', 1, 2)) dname
              ,trim(regexp_substr(line, '[^;]+', 1, 3)) loc
          from deptstg stg)
      select i.deptno
            ,i.dname
            ,i.loc
        from import i
        join dept d on (d.deptno = i.deptno)
       where 1 = 1
         and (   coalesce(d.dname, '-NULL-') <> coalesce(i.dname, '-NULL-')
              or coalesce(d.loc, '-NULL-') <> coalesce(i.loc, '-NULL-'));
    type data_t is table of c_inserts%rowtype index by pls_integer;
    l_data data_t;
  begin
    open c_inserts;
    loop
      fetch c_inserts bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            insert into dept (deptno, dname, loc)
            values (l_data(indx).deptno, l_data(indx).dname, l_data(indx).loc);
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_inserts;
    --
    open c_updates;
    loop
      fetch c_updates bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            update dept
               set dname = l_data(indx).dname
                 , loc   = l_data(indx).loc
             where 1 = 1
               and deptno = l_data(indx).deptno;
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_updates;
  end dept;

  procedure emp is
    cursor c_inserts is
      with import as
       (select trim(regexp_substr(line, '[^;]+', 1, 1)) empno
              ,trim(regexp_substr(line, '[^;]+', 1, 2)) ename
              ,trim(regexp_substr(line, '[^;]+', 1, 3)) job
              ,trim(regexp_substr(line, '[^;]+', 1, 4)) mgr
              ,trim(regexp_substr(line, '[^;]+', 1, 5)) hiredate
              ,trim(regexp_substr(line, '[^;]+', 1, 6)) sal
              ,trim(regexp_substr(line, '[^;]+', 1, 7)) comm
              ,trim(regexp_substr(line, '[^;]+', 1, 8)) deptno
          from empstg stg)
      select i.empno
            ,i.ename
            ,i.job
            ,i.mgr
            ,i.hiredate
            ,i.sal
            ,i.comm
            ,i.deptno
        from import i
        left outer join emp e on (e.empno = i.empno)
       where 1 = 1
         and e.empno is null;
    cursor c_updates is
      with import as
       (select trim(regexp_substr(line, '[^;]+', 1, 1)) empno
              ,trim(regexp_substr(line, '[^;]+', 1, 2)) ename
              ,trim(regexp_substr(line, '[^;]+', 1, 3)) job
              ,trim(regexp_substr(line, '[^;]+', 1, 4)) mgr
              ,trim(regexp_substr(line, '[^;]+', 1, 5)) hiredate
              ,trim(regexp_substr(line, '[^;]+', 1, 6)) sal
              ,trim(regexp_substr(line, '[^;]+', 1, 7)) comm
              ,trim(regexp_substr(line, '[^;]+', 1, 8)) deptno
          from empstg stg)
      select i.empno
            ,i.ename
            ,i.job
            ,i.mgr
            ,i.hiredate
            ,i.sal
            ,i.comm
            ,i.deptno
        from import i
        left outer join emp e on (e.empno = i.empno)
       where 1 = 1
         and (   coalesce(e.ename, '-NULL-') <> coalesce(i.ename, '-NULL-')
              or coalesce(e.job, '-NULL-') <> coalesce(i.job, '-NULL-')
              or e.mgr <> i.mgr
              or coalesce(to_char(e.hiredate, 'DDMMYYYY'), '-NULL-') <> coalesce(i.hiredate, '-NULL-')
              or e.sal <> i.sal
              or e.comm <> i.comm
              or e.deptno <> i.deptno);
    type data_t is table of c_inserts%rowtype index by pls_integer;
    l_data data_t;
  begin
    open c_inserts;
    loop
      fetch c_inserts bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            insert into emp (empno, ename, job, mgr, hiredate, sal, comm, deptno)
            values (l_data(indx).empno
                   ,l_data(indx).ename
                   ,l_data(indx).job
                   ,l_data(indx).mgr
                   ,to_date(l_data(indx).hiredate, 'DDMMYYYY')
                   ,l_data(indx).sal
                   ,trim(l_data(indx).comm)
                   ,l_data(indx).deptno);
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_inserts;
    --
    open c_updates;
    loop
      fetch c_updates bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            update emp
               set ename    = l_data(indx).ename
                 , job      = l_data(indx).job
                 , mgr      = l_data(indx).mgr
                 , hiredate = to_date(l_data(indx).hiredate, 'DDMMYYYY')
                 , sal      = l_data(indx).sal
                 , comm     = l_data(indx).comm
                 , deptno   = l_data(indx).deptno
             where 1 = 1
               and empno = l_data(indx).empno;
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_updates;
  end emp;
end process_stg;
/
As you can see, the code, especially for the cursors, is pretty much the same. The only difference is the number of columns that are generated from the semicolon-separated line.
I really don’t like doing the same thing over and over again, especially when the only difference is the number of columns and their names. But since this is exactly what changes between the tables, I think there is no way of making this generic in 12c or earlier. But then 18c came into play, providing us with Polymorphic Table Functions.
This is what the documentation says (summary):
Polymorphic Table Functions
Polymorphic Table Functions (PTF) are user-defined functions that can be invoked in the FROM clause. They are capable of processing tables whose row type is not declared at definition time and producing a result table whose row type may or may not be declared at definition time. Polymorphic Table Functions allow application developers to leverage the long-defined dynamic SQL capabilities to create powerful and complex custom functions.
In my own words: Call a function, supplying a table and get a set of columns back. You can supply the names (and number) of columns as a parameter. Also, these columns don’t have to exist in the table, you can create them on the fly. That is exactly what I need. I have different tables with pretty much the same layout but the results I need are completely different.
So I came up with the following Polymorphic Table Function to do what I want. First there is the specification of the package. What I need is the DESCRIBE function (which is mandatory) and a procedure to fetch the rows, where I can alter the results.
create or replace package separated_ptf is
  function describe(tab  in out dbms_tf.table_t
                   ,cols in dbms_tf.columns_t default null) return dbms_tf.describe_t;

  procedure fetch_rows;
end separated_ptf;
/
Then there is the implementation of the package:
create or replace package body separated_ptf as
  g_colcount pls_integer;   -- save the number of columns requested
  g_colname  varchar2(128); -- save the name of the first column

  function describe(tab  in out dbms_tf.table_t
                   ,cols in dbms_tf.columns_t default null) return dbms_tf.describe_t as
    -- metadata for column to add
    l_new_col dbms_tf.column_metadata_t;
    -- table of columns to add
    l_new_cols dbms_tf.columns_new_t; -- := DBMS_TF.COLUMNS_NEW_T();
  begin
    -- mark the first column ReadOnly and don't display it anymore
    tab.column(1).for_read := true;
    tab.column(1).pass_through := false;
    -- save the name of the first column for use in the fetch_rows procedure
    g_colname := tab.column(1).description.name;
    -- save the number of columns for use in the fetch_rows procedure
    g_colcount := cols.count;
    -- add the new columns, as specified in the cols parameter
    for indx in 1 .. cols.count loop
      -- define metadata for column named cols(indx)
      -- that will default to a datatype of varchar2 with
      -- a length of 4000
      l_new_col := dbms_tf.column_metadata_t(name => cols(indx));
      -- add the new column to the list of new columns
      l_new_cols(l_new_cols.count + 1) := l_new_col;
    end loop;
    -- instead of returning NULL we will return a specific
    -- DESCRIBE_T that adds new columns
    return dbms_tf.describe_t(new_columns => l_new_cols);
  end;

  procedure fetch_rows is
    -- define a table type of varchar2 tables
    type colset is table of dbms_tf.tab_varchar2_t index by pls_integer;
    -- variable to hold the rowset as retrieved
    l_rowset dbms_tf.row_set_t;
    -- variable to hold the number of rows as retrieved
    l_rowcount pls_integer;
    -- variable to hold the new values
    l_newcolset colset;
  begin
    -- fetch rows into a local rowset
    -- at this point the rows will have columns
    -- from the table/view/query passed in
    dbms_tf.get_row_set(l_rowset, l_rowcount);
    -- for every row in the rowset...
    for rowindx in 1 .. l_rowcount loop
      -- for every column
      for colindx in 1 .. g_colcount loop
        -- split the row into separate values
        -- splitting the regexp way: http://nuijten.blogspot.com/2009/07/splitting-comma-delimited-string-regexp.html
        l_newcolset(colindx)(rowindx) :=
          trim(regexp_substr(json_value(dbms_tf.row_to_char(l_rowset, rowindx), '$.' || g_colname)
                            ,'[^;]+'
                            ,1
                            ,colindx));
      end loop; -- every column
    end loop; -- every row in the rowset
    -- add the newly populated columns to the rowset
    for indx in 1 .. g_colcount loop
      dbms_tf.put_col(columnid => indx, collection => l_newcolset(indx));
    end loop;
  end;
end separated_ptf;
/
After creating this Polymorphic Table Function we need an interface to use it in a SQL statement:
-- create a 'wrapper' function for the polymorphic table function
CREATE OR REPLACE FUNCTION separated_fnc(p_tbl IN TABLE
                                        ,cols  columns DEFAULT NULL)
  RETURN TABLE PIPELINED ROW POLYMORPHIC USING separated_ptf;
/
Now, with this wrapper function in place we can start using it:
select *
  from separated_fnc(deptstg, columns(deptno, dname, loc))
/
Which is a lot easier than:
select trim(regexp_substr(line, '[^;]+', 1, 1)) deptno
      ,trim(regexp_substr(line, '[^;]+', 1, 2)) dname
      ,trim(regexp_substr(line, '[^;]+', 1, 3)) loc
  from deptstg stg
/
And similarly we can access the same code to retrieve data from the other table:
select *
  from separated_fnc(empstg, columns(empno, ename, job, mgr, hiredate, sal, comm, deptno))
/
That looks pretty much the same as the other one, but it is definitely a lot simpler than:
select trim(regexp_substr(line, '[^;]+', 1, 1)) empno
      ,trim(regexp_substr(line, '[^;]+', 1, 2)) ename
      ,trim(regexp_substr(line, '[^;]+', 1, 3)) job
      ,trim(regexp_substr(line, '[^;]+', 1, 4)) mgr
      ,trim(regexp_substr(line, '[^;]+', 1, 5)) hiredate
      ,trim(regexp_substr(line, '[^;]+', 1, 6)) sal
      ,trim(regexp_substr(line, '[^;]+', 1, 7)) comm
      ,trim(regexp_substr(line, '[^;]+', 1, 8)) deptno
  from empstg stg
/
The new implementation of my package is now like this:
create or replace package body process_stg is
  failure_in_forall exception;
  pragma exception_init(failure_in_forall, -24381);
  c_limit constant number := 10;

  procedure dept is
    cursor c_inserts is
      with import as
       (select *
          from separated_fnc(deptstg, columns(deptno, dname, loc)))
      select i.deptno
            ,i.dname
            ,i.loc
        from import i
        left outer join dept d on (d.deptno = i.deptno)
       where 1 = 1
         and d.deptno is null;
    cursor c_updates is
      with import as
       (select *
          from separated_fnc(deptstg, columns(deptno, dname, loc)))
      select i.deptno
            ,i.dname
            ,i.loc
        from import i
        join dept d on (d.deptno = i.deptno)
       where 1 = 1
         and (   coalesce(d.dname, '-NULL-') <> coalesce(i.dname, '-NULL-')
              or coalesce(d.loc, '-NULL-') <> coalesce(i.loc, '-NULL-'));
    type data_t is table of c_inserts%rowtype index by pls_integer;
    l_data data_t;
  begin
    open c_inserts;
    loop
      fetch c_inserts bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            insert into dept (deptno, dname, loc)
            values (l_data(indx).deptno, l_data(indx).dname, l_data(indx).loc);
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_inserts;
    --
    open c_updates;
    loop
      fetch c_updates bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            update dept
               set dname = l_data(indx).dname
                 , loc   = l_data(indx).loc
             where 1 = 1
               and deptno = l_data(indx).deptno;
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_updates;
  end dept;

  procedure emp is
    cursor c_inserts is
      with import as
       (select *
          from separated_fnc(empstg, columns(empno, ename, job, mgr, hiredate, sal, comm, deptno)))
      select i.empno
            ,i.ename
            ,i.job
            ,i.mgr
            ,i.hiredate
            ,i.sal
            ,i.comm
            ,i.deptno
        from import i
        left outer join emp e on (e.empno = i.empno)
       where 1 = 1
         and e.empno is null;
    cursor c_updates is
      with import as
       (select *
          from separated_fnc(empstg, columns(empno, ename, job, mgr, hiredate, sal, comm, deptno)))
      select i.empno
            ,i.ename
            ,i.job
            ,i.mgr
            ,i.hiredate
            ,i.sal
            ,i.comm
            ,i.deptno
        from import i
        left outer join emp e on (e.empno = i.empno)
       where 1 = 1
         and (   coalesce(e.ename, '-NULL-') <> coalesce(i.ename, '-NULL-')
              or coalesce(e.job, '-NULL-') <> coalesce(i.job, '-NULL-')
              or e.mgr <> i.mgr
              or coalesce(to_char(e.hiredate, 'DDMMYYYY'), '-NULL-') <> coalesce(i.hiredate, '-NULL-')
              or e.sal <> i.sal
              or e.comm <> i.comm
              or e.deptno <> i.deptno);
    type data_t is table of c_inserts%rowtype index by pls_integer;
    l_data data_t;
  begin
    open c_inserts;
    loop
      fetch c_inserts bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            insert into emp (empno, ename, job, mgr, hiredate, sal, comm, deptno)
            values (l_data(indx).empno
                   ,l_data(indx).ename
                   ,l_data(indx).job
                   ,l_data(indx).mgr
                   ,to_date(l_data(indx).hiredate, 'DDMMYYYY')
                   ,l_data(indx).sal
                   ,trim(l_data(indx).comm)
                   ,l_data(indx).deptno);
        exception
          when failure_in_forall then
            dbms_output.put_line(q'[error]');
            dbms_output.put_line(sqlcode);
            dbms_output.put_line(sqlerrm);
            for indx in 1 .. sql%bulk_exceptions.count loop
              dbms_output.put_line('Error ' || indx || ' occurred on index ' || sql%bulk_exceptions(indx).error_index);
              dbms_output.put_line('Oracle error is ' || sqlerrm(-1 * sql%bulk_exceptions(indx).error_code));
            end loop;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_inserts;
    --
    open c_updates;
    loop
      fetch c_updates bulk collect into l_data limit c_limit;
      if l_data.count > 0 then
        begin
          forall indx in l_data.first .. l_data.last save exceptions
            update emp
               set ename    = l_data(indx).ename
                 , job      = l_data(indx).job
                 , mgr      = l_data(indx).mgr
                 , hiredate = to_date(l_data(indx).hiredate, 'DDMMYYYY')
                 , sal      = l_data(indx).sal
                 , comm     = l_data(indx).comm
                 , deptno   = l_data(indx).deptno
             where 1 = 1
               and empno = l_data(indx).empno;
        exception
          when failure_in_forall then null;
        end;
      end if;
      exit when l_data.count < c_limit;
    end loop;
    close c_updates;
  end emp;
end process_stg;
/
There is absolutely some improvement possible to the current implementation, like supporting duplicate separators, making the column to be split a parameter, and making the separator character a parameter as well, but that is a nice project for a later time.
I hope it all makes a bit of sense. If you have any improvements, don’t hesitate to comment.
Comparing queries…
How do you compare a rewritten query to its original version? Most of the time I just run a MINUS operation on the original and the new query. Actually, I execute two: old query (A) MINUS new query (B), and vice versa, (B) MINUS (A). Both should return no rows. That way I thought I had proven that the resultsets of both queries are equal.
But there is a flaw in this assumption.
What if there are duplicate rows in one of the resultsets? The MINUS operator removes a row from the resultset if it exists in both collections. But if one of the collections has the row twice and the other collection has it once, it is removed completely.
Let’s say we have two queries:
-- first query
select 'alpha' nato_spelling from dual
union all
select 'bravo' nato_spelling from dual
union all
select 'bravo' nato_spelling from dual
union all
select 'charlie' nato_spelling from dual
NATO_SPELLING
-------------
alpha
bravo
bravo
charlie
and
-- second query
select 'alpha' nato_spelling from dual
union all
select 'bravo' nato_spelling from dual
union all
select 'charlie' nato_spelling from dual
NATO_SPELLING
-------------
alpha
bravo
charlie
As you can see, by just eye-balling the queries the resultsets are different.
But when you execute the minus operator on the queries you’ll get the impression the resultsets are the same:
select *
  from ((-- first query
         select 'alpha' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'charlie' nato_spelling from dual
        )
        MINUS
        (-- second query
         select 'alpha' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'charlie' nato_spelling from dual
        )
       )
NATO_SPELLING
-------------
select *
  from ((-- second query
         select 'alpha' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'charlie' nato_spelling from dual
        )
        MINUS
        (-- first query
         select 'alpha' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'charlie' nato_spelling from dual
        )
       )
NATO_SPELLING
-------------
It gets worse when you compare queries that include a UNION or a UNION ALL operator:
select *
  from ((-- first query
         select 'alpha' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'charlie' nato_spelling from dual
        )
        UNION
        (-- third query
         select 'x-ray' nato_spelling from dual
         union all
         select 'yankee' nato_spelling from dual
         union all
         select 'zulu' nato_spelling from dual
        )
       )
NATO_SPELLING
-------------
alpha
bravo
charlie
x-ray
yankee
zulu

6 rows selected
and
select *
  from ((-- first query
         select 'alpha' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'bravo' nato_spelling from dual
         union all
         select 'charlie' nato_spelling from dual
        )
        UNION ALL
        (-- third query
         select 'x-ray' nato_spelling from dual
         union all
         select 'yankee' nato_spelling from dual
         union all
         select 'zulu' nato_spelling from dual
        )
       )
NATO_SPELLING
-------------
alpha
bravo
bravo
charlie
x-ray
yankee
zulu

7 rows selected
Clearly the results are different. This is because of how the UNION operator works: it removes duplicates from the resultset. But when I check it using the MINUS operator (both ways):
select *
  from ((select *
           from ((-- first query
                  select 'alpha' nato_spelling from dual
                  union all
                  select 'bravo' nato_spelling from dual
                  union all
                  select 'bravo' nato_spelling from dual
                  union all
                  select 'charlie' nato_spelling from dual
                 )
                 UNION
                 (-- third query
                  select 'x-ray' nato_spelling from dual
                  union all
                  select 'yankee' nato_spelling from dual
                  union all
                  select 'zulu' nato_spelling from dual
                 )
                ))
        MINUS
        (select *
           from ((-- first query
                  select 'alpha' nato_spelling from dual
                  union all
                  select 'bravo' nato_spelling from dual
                  union all
                  select 'bravo' nato_spelling from dual
                  union all
                  select 'charlie' nato_spelling from dual
                 )
                 UNION ALL
                 (-- third query
                  select 'x-ray' nato_spelling from dual
                  union all
                  select 'yankee' nato_spelling from dual
                  union all
                  select 'zulu' nato_spelling from dual
                 )
                ))
       )
NATO_SPELLING
-------------
and
select *
  from ((select *
           from ((-- first query
                  select 'alpha' nato_spelling from dual
                  union all
                  select 'bravo' nato_spelling from dual
                  union all
                  select 'bravo' nato_spelling from dual
                  union all
                  select 'charlie' nato_spelling from dual
                 )
                 UNION ALL
                 (-- third query
                  select 'x-ray' nato_spelling from dual
                  union all
                  select 'yankee' nato_spelling from dual
                  union all
                  select 'zulu' nato_spelling from dual
                 )
                ))
        MINUS
        (select *
           from ((-- first query
                  select 'alpha' nato_spelling from dual
                  union all
                  select 'bravo' nato_spelling from dual
                  union all
                  select 'bravo' nato_spelling from dual
                  union all
                  select 'charlie' nato_spelling from dual
                 )
                 UNION
                 (-- third query
                  select 'x-ray' nato_spelling from dual
                  union all
                  select 'yankee' nato_spelling from dual
                  union all
                  select 'zulu' nato_spelling from dual
                 )
                ))
       )
NATO_SPELLING
-------------
These results tell me the queries have the same resultsets, when in fact they do not.
A simple (extra) check could be to check if both queries have the same number of rows in the resultset.
In my opinion the resultsets are equal when both MINUS queries return no rows AND the counts of the individual queries are the same.
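A minimal sketch of that extra check, wrapping both queries in factored subqueries (the names query_a and query_b are mine):

with query_a as
 (-- first query
  select 'alpha' nato_spelling from dual
  union all
  select 'bravo' nato_spelling from dual
  union all
  select 'bravo' nato_spelling from dual
  union all
  select 'charlie' nato_spelling from dual)
,query_b as
 (-- second query
  select 'alpha' nato_spelling from dual
  union all
  select 'bravo' nato_spelling from dual
  union all
  select 'charlie' nato_spelling from dual)
select (select count(*) from query_a) count_a
      ,(select count(*) from query_b) count_b
  from dual
/

If count_a and count_b differ, the resultsets differ, even when both MINUS queries return no rows.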
Is there something I am still missing here?
I hope not.
Change UUIDs in VirtualBox
If you are anything like me you will have multiple VirtualBox machines on your system. Sometimes you might want to run a copy of a machine for a different purpose, like running an Oracle 11g DevDays instance as a test environment but also running the same VBox for customer testing. If you copy the VBox and try to run it in the manager, you’ll be presented with an error that a hard disk with the same UUID already exists. Here’s how I solved it.
First of all, make a backup copy of the VirtualBox machine you want to change. While this is running you can download the PortableApps UUID-GUID generator or, if you are not running Windows, a similar program. You can also use an online GUID generator.
After the backup has completed you can start changing the UUIDs for the VirtualBox. Open the <virtualboxname>.vbox file in a text editor. There are a couple of UUIDs that need to be changed:
First look for the <Machine> tag (the 2nd tag in the XML file). One of its attributes is uuid={some_uuid}. This is where the generator comes in: just generate a new UUID and paste it here.
Next you need to change the UUIDs for the hard disks. This is a little more tricky. Find the <HardDisk> tag and look for the uuid attribute. This UUID is used multiple times in the XML file, also in the StorageControllers section. The easiest way to keep these in sync is to do a search-and-replace over the entire file: search for the current UUID, replace with a freshly generated one. Before you change the next one, you also need to change the UUID in the hard disk file itself. You do this with the command line utility VBoxManage.
The command is like this:
<path_to_virtualbox>VBoxManage internalcommands sethduuid <filepath> <uuid>
Repeat this process for all the harddisks that are defined. This way you can have multiple instances of the same VirtualBox in your VirtualBox Manager.
You may want to change other settings like MAC Addresses for your network cards, but you can do this using the VBox interface.
Connecting PL/SQL Developer
In SQL Developer you have a lot of options when connecting to the database. You can use the TNS entries defined but you can also give the hostname, port and SID or Servicename for instance. PL/SQL Developer doesn’t supply these options, but you can still use them…
In the logon screen you can choose the database by choosing your TNS entry. But did you know you can put in the complete text of a tnsnames.ora entry here?
So, you can type in:
username: demo
password: <<password>>
database: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=LOCALHOST)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=DEMO)))
But it can be done even simpler:
username: demo
password: <<password>>
database: LOCALHOST:1521/DEMO
Maybe you already knew this, and I have known it for a while, but I thought I would write it down anyway. Hope this helps some of you.
How can we use Oracle to deduplicate our data
Dear Patrick,
We have gone through a merger at our company where we are trying to merge the databases. The problem now is that we have duplicate records in our tables. We can of course go through all the records by hand and check if they exist twice. Another option is to build an application to do this. But using the Oracle Database there must be a better way to do this. Any ideas?
Ramesh Cunar
Dear Ramesh,
Going through all these records by hand seems like quite a time consuming process. I don’t know the number of records in your tables, but I would definitely not do this by hand. Writing an application to do this makes it possible to run the application multiple times, so you can update the rules making it better every time. Writing an application can be time consuming and running it can be as well.
I think your best shot would be to try and find a solution in either PL/SQL or SQL. Depending on the complexity and size of your data you could try any of the following solutions.
You can write a procedure that loops through your data and saves the records in an Associative array. If the record already exists in the array, or at least the key fields are equal, then it should not be added.
CREATE OR REPLACE PROCEDURE ot_deduplicate_aa IS
  TYPE t_ot_emp_aa IS TABLE OF ot_emp%ROWTYPE INDEX BY PLS_INTEGER;
  CURSOR c_ot_emp IS
    SELECT e.empno
          ,e.ename
          ,e.job
          ,e.mgr
          ,e.hiredate
          ,e.sal
          ,e.comm
          ,e.deptno
      FROM ot_emp e;
  r_ot_emp         ot_emp%ROWTYPE;
  l_unique_ot_emps t_ot_emp_aa;

  FUNCTION record_already_exists(record_in        IN ot_emp%ROWTYPE
                                ,collection_inout IN OUT t_ot_emp_aa) RETURN BOOLEAN IS
    l_indx        PLS_INTEGER := collection_inout.first;
    l_returnvalue BOOLEAN := FALSE;
  BEGIN
    IF l_indx IS NOT NULL THEN
      LOOP
        l_returnvalue := l_returnvalue OR
                         ((record_in.ename = collection_inout(l_indx).ename) AND
                          (record_in.job = collection_inout(l_indx).job));
        l_indx := collection_inout.next(l_indx);
        EXIT WHEN l_returnvalue OR (l_indx IS NULL);
      END LOOP;
    END IF;
    RETURN l_returnvalue;
  END record_already_exists;
BEGIN
  OPEN c_ot_emp;
  LOOP
    FETCH c_ot_emp INTO r_ot_emp;
    EXIT WHEN c_ot_emp%NOTFOUND;
    -- check if this record already exists
    IF NOT (record_already_exists(record_in => r_ot_emp, collection_inout => l_unique_ot_emps)) THEN
      l_unique_ot_emps(l_unique_ot_emps.count + 1) := r_ot_emp;
    END IF;
  END LOOP;
  CLOSE c_ot_emp;
  FOR indx IN l_unique_ot_emps.first .. l_unique_ot_emps.last LOOP
    INSERT INTO ot_emp_deduplicated
      (empno, ename, job, mgr, hiredate, sal, comm, deptno)
    VALUES
      (l_unique_ot_emps(indx).empno
      ,l_unique_ot_emps(indx).ename
      ,l_unique_ot_emps(indx).job
      ,l_unique_ot_emps(indx).mgr
      ,l_unique_ot_emps(indx).hiredate
      ,l_unique_ot_emps(indx).sal
      ,l_unique_ot_emps(indx).comm
      ,l_unique_ot_emps(indx).deptno);
  END LOOP;
END ot_deduplicate_aa;
/
You can speed up this process by using bulk processing for both the retrieval and the storing of the data. Watch out that you don’t blow up your memory by retrieving too much data at once. You can use the limit clause to prevent this.
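A minimal sketch of that bulk variant, assuming the same ot_emp table (the limit of 1000 is just an example):

DECLARE
  CURSOR c_ot_emp IS
    SELECT * FROM ot_emp;
  TYPE t_ot_emp_tt IS TABLE OF ot_emp%ROWTYPE INDEX BY PLS_INTEGER;
  l_batch t_ot_emp_tt;
BEGIN
  OPEN c_ot_emp;
  LOOP
    -- fetch at most 1000 rows at a time to keep memory usage bounded
    FETCH c_ot_emp BULK COLLECT INTO l_batch LIMIT 1000;
    FOR indx IN 1 .. l_batch.count LOOP
      NULL; -- apply the duplicate check to each record here
    END LOOP;
    EXIT WHEN l_batch.count < 1000;
  END LOOP;
  CLOSE c_ot_emp;
END;
/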
Another way to go at this is to use the analytic functions in Oracle. The idea behind using analytics is to rank all the records within their own partition. This rank can be rather arbitrary, so we can use ROW_NUMBER as a ranking mechanism. In the PARTITION BY clause we add all the columns we need to check for duplicates on, and to define which record to keep we add the (obligatory) ORDER BY clause. Then, using this new column, we can state that we are just interested in the records where the ROW_NUMBER equals 1.
INSERT INTO ot_emp_deduplicated
  (empno, ename, job, mgr, hiredate, sal, comm, deptno)
  WITH emp_all_rows AS
   (SELECT e.empno, e.ename, e.job, e.mgr, e.hiredate, e.sal, e.comm, e.deptno
          ,row_number() OVER (PARTITION BY e.ename, e.job ORDER BY e.empno ASC) rn
      FROM ot_emp e)
  SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
    FROM emp_all_rows
   WHERE rn = 1
/
You can of course use this same technique to delete the duplicate rows directly from the source table, but there you cannot use the query factoring (WITH clause) in the same way, because a DELETE statement doesn’t support it.
DELETE FROM ot_emp e
 WHERE e.rowid IN
       (SELECT ear.rowid
          FROM (SELECT e.rowid
                      ,row_number() OVER (PARTITION BY e.ename ORDER BY e.empno ASC) rn
                  FROM ot_emp e) ear --<-- EmpAllRows
         WHERE rn > 1)
/
It is probably faster to insert the de-duplicated rows into a new table, then drop the original table and rename the new table. Be aware of any triggers, constraints, indexes etc. that may exist and need to be recreated.
You can check out the full demo code at
https://livesql.oracle.com/apex/livesql/s/ckqg3dkqmvj5omqttapzmrhtg
Or you can send me an email and I will send you the demo script.
Hope this helps in your efforts to de-duplicate your data.
Happy Oracle’ing,
Patrick Barel
This question has been published in OTech Magazine of Winter 2015.
OTN Appreciation Day : Prebuilt Developer VMs
I learnt a lot from the entire Oracle Technology Network and I still do. One of the best features of OTN these days, IMHO, is the ability to download prebuilt virtual machines. Since I am a developer and not an administrator, I don’t like to be bothered with stuff like how much disk space I need, how many cores I should use, etc. I can just download a VirtualBox image, import it and start experimenting with the technology I am interested in, for instance the multitenant features of the Oracle 12c database. The best thing about using a virtual machine is that when you screw up really, really bad, you can just throw away the virtual machine, import the original version again and try again.
Thanks, OTN, for making it possible for me to learn new technologies without having to learn all the administrator stuff.
Oh, and if I need some extra information, or need to find out what an error means and what I can do about it, there is almost always an answer to be found on OTN.
#ThanksOTN
How can we add custom code to existing triggers?
Dear Patrick,
We have bought an application that runs on an Oracle database. There are triggers defined on the tables, but we want to add our own code to these triggers, but we don’t have access to the source code. What is the approach we should follow to accomplish this?
Collin Bratforth
Dear Collin,
There are two types of DML triggers. Statement level triggers and row level triggers. Then for these two types there are two triggering moments, before and after. So you get four trigger moments for each DML operation. The statement level triggers fire once for each statement, the row level triggers fire once for each row. This gives you enough moments to add your code. If however you want to add code to a triggering moment already used by the application, you can just add another trigger which will fire at the same moment as the existing one.
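As a sketch, these are the four timing points for an INSERT, assuming a table t exists (the dbms_output calls are just there to show the firing order):

CREATE OR REPLACE TRIGGER t_before_stmt BEFORE INSERT ON t
BEGIN
  dbms_output.put_line('before statement: fires once per statement');
END;
/
CREATE OR REPLACE TRIGGER t_before_row BEFORE INSERT ON t FOR EACH ROW
BEGIN
  dbms_output.put_line('before row: fires once per row');
END;
/
CREATE OR REPLACE TRIGGER t_after_row AFTER INSERT ON t FOR EACH ROW
BEGIN
  dbms_output.put_line('after row: fires once per row');
END;
/
CREATE OR REPLACE TRIGGER t_after_stmt AFTER INSERT ON t
BEGIN
  dbms_output.put_line('after statement: fires once per statement');
END;
/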
This should give you plenty of possibilities to add your own code. Since Oracle 8i you can define multiple triggers on the same firing event, but if you define two or more triggers on the same firing event, there is no way to tell which trigger will fire first. You might think it is based on the creation order or on the alphabetical order of the object names, but that is not the case. If the code executed in the triggers is not interdependent, there is no problem. But what if one trigger gets the next value of a sequence and in the other trigger you want to use the current value? If the second trigger fires first, the current value of the sequence is either not defined or still holds an old value (previously defined in the session), which is both incorrect.
CREATE TABLE trigger_demo
( ID                   NUMBER
, VALUE                VARCHAR2(30)
, first_trigger_value  VARCHAR2(30)
, second_trigger_value VARCHAR2(30)
)
/
CREATE SEQUENCE trigger_demo_seq START WITH 1 NOCACHE
/
CREATE OR REPLACE TRIGGER first_trigger
  BEFORE INSERT ON trigger_demo
  FOR EACH ROW
BEGIN
  :new.id := trigger_demo_seq.nextval;
  :new.first_trigger_value := 'First Trigger ' || to_char(trigger_demo_seq.currval);
END;
/
CREATE OR REPLACE TRIGGER second_trigger
  BEFORE INSERT ON trigger_demo
  FOR EACH ROW
BEGIN
  :new.second_trigger_value := 'Second Trigger ' || to_char(trigger_demo_seq.currval);
END;
/
INSERT INTO trigger_demo (VALUE) VALUES ('Patrick')
/
SELECT * FROM trigger_demo
/
If you run this code, chances are about fifty-fifty that you will run into this error:
ORA-08002: sequence TRIGGER_DEMO_SEQ.CURRVAL is not yet defined in this session
ORA-06512: at "SCOTT.SECOND_TRIGGER", line 2
ORA-04088: error during execution of trigger
The reason is that SCOTT.SECOND_TRIGGER fires before SCOTT.FIRST_TRIGGER, and trigger_demo_seq.currval is only defined after trigger_demo_seq.nextval has been called, which only happens in SCOTT.FIRST_TRIGGER.
Since Oracle 11g you have the possibility to tell the trigger to fire after another trigger. You can do this by adding the FOLLOWS clause to the trigger. This way you can make sure your trigger gets fired after the other one.
CREATE OR REPLACE TRIGGER second_trigger
  BEFORE INSERT ON trigger_demo
  FOR EACH ROW
  FOLLOWS first_trigger
BEGIN
  :new.second_trigger_value := 'Second Trigger ' || to_char(trigger_demo_seq.currval);
END;
/
INSERT INTO trigger_demo (VALUE) VALUES ('Patrick')
/
SELECT * FROM trigger_demo
/
This way the second_trigger gets fired after the first_trigger. Unfortunately there is no clause to make a trigger fire before another one, so you cannot have your trigger fire before the original code. There is a PRECEDES clause available, but it is reserved for cross-edition triggers, which are part of Edition Based Redefinition; that is a whole different subject.
I hope this answers your question.
Happy Oracle’ing,
Patrick Barel
This question has been published in OTech Magazine of Fall 2015.
What is overloading and how and when do I use it
Dear Patrick,
Recently I heard someone talk about overloading in Java. What is it, is it possible in PL/SQL and if so, how would I use it?
Ramesh Cumar
Dear Ramesh,
Overloading is a technique of creating multiple programs with the same name that can be called with different sets of parameters. It is definitely possible to apply this technique in PL/SQL; in fact, Oracle does this a lot in its own built-in packages. If you take a look at the SYS.STANDARD package you will find a lot of functions called TO_CHAR, but with different parameter sets. You probably never wondered how Oracle can use the same function name for completely different tasks. It's just as easy to write
TO_CHAR(9), which will result in '9', as it is to write TO_CHAR(SYSDATE), which will result in the current date in the format specified in the NLS_DATE_FORMAT parameter, for example 29-12-15 if the format is 'DD-MM-RR'. If you want this value in a different format you can just write TO_CHAR(SYSDATE, 'Month, DDth YYYY') to get 'December, 29th 2015'. As you can see, these are all calls to a function with the same name, but with completely different sets of parameters.
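To see a couple of these overloads in action you could, for instance, run the following (the exact output of the second column depends on your NLS settings):

[PATRICK]SQL>SELECT TO_CHAR(9)
                  , TO_CHAR(SYSDATE)
                  , TO_CHAR(SYSDATE, 'DD-MM-RR')
               FROM dual
             /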
This behavior cannot be realized by making all the parameters optional, like this:
FUNCTION TO_CHAR (num_in      in number default null
                 ,date_in     in date   default null
                 ,datemask_in in varchar2) return varchar2;
Because if you called this function without named parameters, the call
TO_CHAR(SYSDATE) would not work: SYSDATE returns a DATE, while the function expects a NUMBER as its first parameter. It might happen to work because of implicit typecasts, but you get the idea.
Instead, overloading is implemented by defining multiple functions in the same package with the same name but different sets of parameters.
One of the packages you can take a look at, because its implementation is readable (i.e. not wrapped), is the HTP package, which you can use to generate HTML output, for instance in an APEX application. Take, for example, the PRINT procedure. In the package specification you can see there are three implementations available for this procedure:
procedure print (cbuf in varchar2 character set any_cs DEFAULT NULL);
procedure print (dbuf in date);
procedure print (nbuf in number);
The parameters of these functions differ not only in name, but also in data type, which is a requirement for the use of overloading:
Data type and/or number and/or name of parameters must differ
The compiler will not complain if you don't completely comply with this rule, but at runtime you will not be able to call either one of them.
Consider the following package with its implementation
[PATRICK]SQL>CREATE OR REPLACE PACKAGE ol IS
               PROCEDURE p (param_in IN VARCHAR2);
               PROCEDURE p (param_in IN CHAR);
             END ol;
             /
[PATRICK]SQL>CREATE OR REPLACE PACKAGE BODY ol IS
               PROCEDURE p (param_in IN VARCHAR2) IS
               BEGIN
                 dbms_output.put_line(param_in);
               END p;
               PROCEDURE p (param_in IN CHAR) IS
               BEGIN
                 dbms_output.put_line(param_in);
               END p;
             END ol;
             /
If you want to call the procedure there is no way Oracle can decide which one to use.
[PATRICK]SQL>BEGIN
               ol.p('Hello World');
             END;
             /
  ol.p('Hello World');
  *
ERROR at line 2:
ORA-06550: line 2, column 3:
PLS-00307: too many declarations of 'P' match this call
ORA-06550: line 2, column 3:
PL/SQL: Statement ignored
Even if you were using named parameters you would get the same error. What we have here is so-called 'ambiguous overloading'. You can read more about this subject at http://www.stevenfeuerstein.com/learn/building-code-analysis.
So, there is definitely a use for overloading but you have to be careful about the parameters, especially when parameters have default values. If you run into a situation of ambiguous overloading you now know why the compiler didn’t complain, but the runtime engine does.
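To contrast, here is a minimal sketch of an overload that is not ambiguous, because the parameters differ in data type family (the package name ol2 is made up for this example):

[PATRICK]SQL>CREATE OR REPLACE PACKAGE ol2 IS
               PROCEDURE p (param_in IN VARCHAR2);
               PROCEDURE p (param_in IN NUMBER);
             END ol2;
             /

Now ol2.p('Hello World') resolves to the VARCHAR2 version and ol2.p(42) to the NUMBER version, without any complaints from the compiler or the runtime engine.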
Happy Oracle’ing,
Patrick Barel
This question has been published in OTech Magazine of Summer 2015.
When would you use a normal table function?
Dear Patrick,
Last year I did a presentation on table functions at KScope. One of the questions I got was: ‘If pipelined table functions provide their results faster, why would you want to use a normal table function?’ I couldn’t come up with the answer then, maybe you can help?
Erik van Roon
Dear Erik,
Let’s start with explaining a bit what table functions are. Table Functions are functions that return a collection of data and which can be called in a normal SQL statement by using the TABLE() operator. Let’s create a simple function. Please note this is a demonstration only, not something you would normally solve using table functions (or PL/SQL as a whole).
First step is to create a record type in the database:
[PATRICK]SQL>CREATE TYPE emp_rt AS OBJECT
             ( empno NUMBER(4)
             , ename VARCHAR2(10)
             , mgr   NUMBER(4)
             )
             /
Then you need to create a table type in the database:
[PATRICK]SQL>CREATE TYPE emp_tt AS TABLE OF emp_rt
             /
Then it is time for the simple function. The DBMS_LOCK.SLEEP call is in there to show the difference between Table Functions and Pipelined Table Functions.
[PATRICK]SQL>CREATE OR REPLACE FUNCTION tf_emp RETURN emp_tt AS
               l_returnvalue emp_tt;
             BEGIN
               SELECT emp_rt(e.empno, e.ename, e.mgr)
                 BULK COLLECT INTO l_returnvalue
                 FROM emp e
                ORDER BY e.deptno;
               FOR indx IN l_returnvalue.first .. l_returnvalue.last LOOP
                 l_returnvalue(indx).ename := INITCAP(l_returnvalue(indx).ename);
                 dbms_lock.sleep(.25); -- for demo purposes only
               END LOOP;
               RETURN l_returnvalue;
             END;
             /
Now you can call the function in the FROM clause of your SQL statement as if it were a relational table:
[PATRICK]SQL>SELECT *
               FROM TABLE(tf_emp)
             /
Notice that the result is displayed after all records have been processed, i.e. after 3.5 seconds (due to the DBMS_LOCK.SLEEP statement).
Now let’s create a PIPELINED table function, which produces the same result but in a different manner:
[PATRICK]SQL>CREATE OR REPLACE FUNCTION ptf_emp RETURN emp_tt PIPELINED AS
               l_emps        emp_tt;
               l_returnvalue emp_rt;
             BEGIN
               SELECT emp_rt(e.empno, e.ename, e.mgr)
                 BULK COLLECT INTO l_emps
                 FROM emp e
                ORDER BY e.deptno;
               FOR indx IN l_emps.first .. l_emps.last LOOP
                 l_returnvalue := emp_rt(empno => l_emps(indx).empno
                                        ,ename => INITCAP(l_emps(indx).ename)
                                        ,mgr   => l_emps(indx).mgr);
                 PIPE ROW (l_returnvalue);
                 dbms_lock.sleep(.25); -- for demo purposes only
               END LOOP;
               RETURN;
             END;
             /
If you set the arraysize of your SQL*Plus session (or your command window in PL/SQL Developer) to a low value, you can see how the results are returned as they are produced, i.e. 0.25 seconds apart.
[PATRICK]SQL>SET ARRAYSIZE 1
[PATRICK]SQL>SELECT *
               FROM TABLE(ptf_emp)
             /
Now you can see the difference between a Table Function and a Pipelined Table Function. Pipelined Table Functions are best used when you are executing different stages of transformation of your data, for example reading from an OLTP system and writing to a data warehouse system. If you PARALLEL_ENABLE your functions AND your source table can be read in parallel, then you could really see some performance benefits.
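As a sketch of what a parallel-enabled pipelined function could look like (ptf_emp_px is a made-up name; the PARTITION cursor_in BY ANY clause leaves it to Oracle how to distribute the cursor rows over the parallel processes):

[PATRICK]SQL>CREATE OR REPLACE FUNCTION ptf_emp_px (cursor_in IN SYS_REFCURSOR)
               RETURN emp_tt PIPELINED
               PARALLEL_ENABLE (PARTITION cursor_in BY ANY) AS
               l_emp emp_rt;
             BEGIN
               LOOP
                 FETCH cursor_in INTO l_emp;
                 EXIT WHEN cursor_in%NOTFOUND;
                 l_emp.ename := INITCAP(l_emp.ename);
                 PIPE ROW (l_emp);
               END LOOP;
               RETURN;
             END;
             /

You would call it with a CURSOR(...) expression, like the SYS_REFCURSOR example later in this answer; the benefit only materializes when the statement actually runs in parallel.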
But all of this doesn't explain why you should NOT use pipelining in a table function. The ONLY reason I can think of is when you want to be able to call the function from plain PL/SQL. PL/SQL does one call to a function and expects one result from it, not a result spread out over many callbacks.
If you create a procedure like this:
[PATRICK]SQL>CREATE OR REPLACE PROCEDURE call_tf AS
               l_emps emp_tt;
             BEGIN
               l_emps := tf_emp;
               FOR indx IN l_emps.first .. l_emps.last LOOP
                 dbms_output.put_line(l_emps(indx).empno || ' ' ||
                                      l_emps(indx).ename || ' ' ||
                                      l_emps(indx).mgr);
               END LOOP;
             END;
             /
and call this procedure, everything works fine. It takes about 3.5 seconds to complete, due to the DBMS_LOCK.SLEEP call in the tf_emp function.
If you try the same with the pipelined table function, you are stopped at compile time: a direct PL/SQL call to a pipelined table function is not allowed.
[PATRICK]SQL>CREATE OR REPLACE PROCEDURE call_ptf AS
               l_emps emp_tt;
             BEGIN
               l_emps := ptf_emp;
               FOR indx IN l_emps.first .. l_emps.last LOOP
                 dbms_output.put_line(l_emps(indx).empno || ' ' ||
                                      l_emps(indx).ename || ' ' ||
                                      l_emps(indx).mgr);
               END LOOP;
             END;
             /
Warning: Procedure created with compilation errors.

[PATRICK]SQL>sho err
Errors for PROCEDURE CALL_PTF:

LINE/COL ERROR
-------- ---------------------------------------------------------------------
1/10     PLS-00653: aggregate/table functions are not allowed in PL/SQL scope
You can of course wrap the call to the pipelined table function in a cursor like this:
[PATRICK]SQL>CREATE OR REPLACE PROCEDURE call_ptf2 AS
               CURSOR c_emps IS
                 SELECT emp_rt(t.empno, t.ename, t.mgr)
                   FROM TABLE(ptf_emp) t;
               l_emps emp_tt;
             BEGIN
               OPEN c_emps;
               FETCH c_emps BULK COLLECT INTO l_emps;
               CLOSE c_emps;
               FOR indx IN l_emps.first .. l_emps.last LOOP
                 dbms_output.put_line(l_emps(indx).empno || ' ' ||
                                      l_emps(indx).ename || ' ' ||
                                      l_emps(indx).mgr);
               END LOOP;
             END;
             /
Procedure created.
But when you call this procedure you will see that it takes about 3.5 seconds to fetch all the records, effectively using the pipelined table function as a normal table function. This might be your escape if you want to use pipelined table functions in a SQL-only context and still use the same function from PL/SQL.
'But you said pipelined table functions are best used when executing different stages of transformation. That includes multiple PL/SQL functions', you might wonder. That is correct. A pipelined table function may call another pipelined table function. It can, for instance, take a collection as its input, like this:
[PATRICK]SQL>CREATE OR REPLACE FUNCTION ptf_emp3 (emps_in IN emp_tt)
               RETURN emp_tt PIPELINED AS
               l_emp emp_rt;
             BEGIN
               FOR indx IN emps_in.first .. emps_in.last LOOP
                 l_emp := emps_in(indx);
                 l_emp.ename := UPPER(l_emp.ename);
                 PIPE ROW (l_emp);
                 dbms_lock.sleep(.25); -- for demo purposes only
               END LOOP;
               RETURN;
             END;
             /
When you call this function using this:
[PATRICK]SQL>SET ARRAYSIZE 1
[PATRICK]SQL>SELECT *
               FROM TABLE(ptf_emp3(CAST(MULTISET (SELECT *
                                                    FROM TABLE(ptf_emp))
                                         AS emp_tt)))
             /
You will notice the output does not start displaying until after about 3.5 seconds, which tells me the call to ptf_emp must be completed before ptf_emp3 can start doing its work.
If you change the parameter to accept a cursor (and of course change the processing as well) like this:
[PATRICK]SQL>CREATE OR REPLACE FUNCTION ptf_emp4 (cursor_in IN SYS_REFCURSOR)
               RETURN emp_tt PIPELINED AS
               l_emp emp_rt;
             BEGIN
               LOOP
                 FETCH cursor_in INTO l_emp;
                 EXIT WHEN cursor_in%NOTFOUND;
                 l_emp.ename := upper(l_emp.ename);
                 PIPE ROW (l_emp);
                 dbms_lock.sleep(.25); -- for demo purposes only
               END LOOP;
               RETURN;
             END;
             /
And call this function using this statement:
[PATRICK]SQL>SET arraysize 1
[PATRICK]SQL>SELECT *
               FROM TABLE(ptf_emp4(CURSOR (SELECT emp_rt(empno, ename, mgr)
                                             FROM TABLE(ptf_emp))))
             /
You will notice the output starts displaying after about half a second. The total time needed for this function to complete is roughly the same as for the previous one, but this function starts producing output sooner, which might be exactly what you need.
I hope this sheds a bit of light on the subject. I think the bottom line is to always use pipelined table functions instead of normal table functions, except when the function is only meant for PL/SQL or client consumption. You can always work around the implementation restrictions if you need to use the same codebase in both SQL and PL/SQL.
Happy Oracle’ing,
Patrick Barel
This question has been published in OTech Magazine of Spring 2015.
What’s the difference between SEMI-JOIN and ANTI-JOIN?
Dear Patrick,
What is an ANTI-JOIN? And what is the difference between the SEMI-JOIN and the ANTI-JOIN?
Lillian Sturdey
Dear Lillian,
First of all, SEMI-JOIN and ANTI-JOIN are not part of the SQL syntax; they are patterns. You might expect to be able to write something like:
[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               SEMI JOIN emp e ON (e.deptno = d.deptno)
             /
to get all the departments that have at least one employee.
Or:
[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               ANTI JOIN emp e ON (e.deptno = d.deptno)
             /
to get the departments with no employees. But all you get is an error saying your command is not properly ended, which can be read as a syntax error.
ERROR at line 3:
ORA-00933: SQL command not properly ended
Maybe your first idea would be to use a normal join to get all the departments with at least one employee:
[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               JOIN emp e ON (e.deptno = d.deptno)
             /
But this returns a record for every matching row in the EMP table, and we only wanted each unique department.
DEPTNO DNAME          LOC
------ -------------- -------------
    20 RESEARCH       DALLAS
    30 SALES          CHICAGO
    30 SALES          CHICAGO
    20 RESEARCH       DALLAS
    30 SALES          CHICAGO
    30 SALES          CHICAGO
    10 ACCOUNTING     NEW YORK
    20 RESEARCH       DALLAS
    10 ACCOUNTING     NEW YORK
    30 SALES          CHICAGO
    20 RESEARCH       DALLAS
    30 SALES          CHICAGO
    20 RESEARCH       DALLAS
    10 ACCOUNTING     NEW YORK

14 rows selected.
Well, that’s easy enough, you think, just add a DISTINCT to the statement:
[PATRICK]SQL>SELECT DISTINCT d.deptno, d.dname, d.loc
               FROM dept d
               JOIN emp e ON (e.deptno = d.deptno)
             /
Exactly the result we are looking for:
DEPTNO DNAME          LOC
------ -------------- -------------
    20 RESEARCH       DALLAS
    10 ACCOUNTING     NEW YORK
    30 SALES          CHICAGO
But what if the EMP table contains hundreds, thousands, or maybe millions of rows? That would mean the database has to do a lot of work to filter out the distinct values.
A different, and probably better, approach would be to use the SEMI-JOIN pattern. You can use the IN operator like this:
[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
              WHERE d.deptno IN (SELECT e.deptno
                                   FROM emp e)
             /
DEPTNO DNAME          LOC
------ -------------- -------------
    20 RESEARCH       DALLAS
    30 SALES          CHICAGO
    10 ACCOUNTING     NEW YORK
This is exactly what we want to see, but for big tables this is not the correct way to go: for every record in the DEPT table, all the records in the EMP table are checked. Again, if we have a lot of employees, this means a lot of work for the database.
A better SEMI-JOIN to use is the EXISTS operator:
[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
              WHERE EXISTS (SELECT 1
                              FROM emp e
                             WHERE e.deptno = d.deptno)
             /
Please note that with the current optimizer in the database Oracle will rewrite your query to use the best approach for the task. If the inner table (in our example EMP) is rather small, then the IN approach might be the best, in other cases it might be better to use the EXISTS approach. Where in earlier versions you had to think about which way to go (IN is better for small tables, EXISTS is better for big ones), you can now rely on the optimizer to make the correct choice.
If you want to see exactly the opposite of this query, i.e. all departments with no employees, you use an ANTI-JOIN pattern, which is pretty much the same but uses NOT IN or NOT EXISTS. A different approach, which I think is pretty nice, is to use an OUTER JOIN and check for the absence of values in a column of the outer-joined table.
[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
               LEFT OUTER JOIN emp e ON (e.deptno = d.deptno)
              WHERE e.empno IS NULL
             /
DEPTNO DNAME          LOC
------ -------------- -------------
    40 OPERATIONS     BOSTON
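For completeness, the NOT EXISTS variant of the ANTI-JOIN pattern returns the same department:

[PATRICK]SQL>SELECT d.deptno, d.dname, d.loc
               FROM dept d
              WHERE NOT EXISTS (SELECT 1
                                  FROM emp e
                                 WHERE e.deptno = d.deptno)
             /

Be careful with the NOT IN variant: if the subquery can return a NULL (a NULL e.deptno in this case), NOT IN returns no rows at all, so NOT EXISTS is the safer formulation.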
Hope this gives you a bit more insight into this subject and a better understanding of the wonders of the SQL language. Notice there are many ways to reach the same result, but one approach might be more economical than the other.
Happy Oracle’ing,
Patrick Barel
This question has been published in OTech Magazine of Winter 2014.
What is the difference between NVL and COALESCE?
Dear Patrick,
Could you tell me what the difference is between NVL and COALESCE?
Kindest regards,
Paul McCurdey
Dear Paul,
NVL returns the value of the first argument if it doesn’t evaluate to NULL, otherwise it will return the value of the second argument.
COALESCE returns the first argument that doesn’t evaluate to NULL. That can be any one of the arguments.
So they are definitely similar – but there are significant differences.
First of all, COALESCE is a function that's part of the ANSI-92 standard, whereas NVL is an Oracle-specific function made in the 80s, when there were no standards. Since COALESCE is the newer function of the two (available since 9i), it is better equipped for multiple values and it does less work, therefore it's the greener option (doing less work means using fewer resources, like power).
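To illustrate the 'multiple values' part: COALESCE accepts any number of arguments and returns the first one that is not NULL, so a chain of nested NVL calls collapses into a single expression:

[PATRICK]SQL>SELECT COALESCE(NULL, NULL, 'third', 'fourth')
               FROM dual
             /

This returns 'third'; the NVL equivalent would be the much less readable NVL(NULL, NVL(NULL, NVL('third', 'fourth'))).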
How is doing less work better, you might ask? Well, would you want to do the work for which you know the result is never going to be used? I know I wouldn’t. That is one thing COALESCE does for you. It does not evaluate an argument if its result is not needed.
A big advantage of the COALESCE function is its short-circuit evaluation. Where NVL always evaluates both arguments, whether the second one is needed or not, COALESCE only evaluates an argument if it is needed.
For example:
If you run the following statement:
[PATRICK]SQL>SELECT NVL(1, 1/0)
               FROM dual
             /
you will see the
ORA-01476: division by zero
error.
In this statement the first argument (1) is NOT NULL, so the second argument (1/0) should not be returned; but since NVL always evaluates both of its arguments, this statement results in the exception.
The COALESCE function only evaluates the arguments if they are needed.
If you run the following statement:
[PATRICK]SQL>SELECT COALESCE(1, 1/0)
               FROM dual
             /
you will not get an error, since the second argument is not evaluated. In other words it evaluates exprN only if expr(N-1) evaluates to NULL.
A simple test shows the difference again:
First we create a simple package which holds a variable and a function to increase that variable:
[PATRICK]SQL>CREATE OR REPLACE PACKAGE nvl_test IS
               g_value NUMBER := 0;
               FUNCTION increase_value RETURN NUMBER;
             END nvl_test;
             /
[PATRICK]SQL>CREATE OR REPLACE PACKAGE BODY nvl_test IS
               FUNCTION increase_value RETURN NUMBER IS
                 l_returnvalue NUMBER;
               BEGIN
                 dbms_output.put_line('nvl_test.increase_value');
                 nvl_test.g_value := nvl_test.g_value + 1;
                 l_returnvalue := nvl_test.g_value;
                 RETURN l_returnvalue;
               END increase_value;
             END nvl_test;
             /
Then a script to demonstrate what happens. First we display the value of the variable. Then we call the NVL function with NULL as the first argument. As you can see, the function in the package is called, hence the variable is increased. Then another call to NVL, this time with a non-NULL first argument. The function in the package is still called, even though its result is not used.
Then we reset the value of the variable and run the same tests, but this time using the COALESCE function. As you can see, the function is only being called if the previous argument(s) evaluate to NULL.
[PATRICK]SQL>DECLARE
               l_dummy NUMBER;
               l_foo   NUMBER;
             BEGIN
               dbms_output.put_line('====reset package====');
               nvl_test.g_value := 0;
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               l_foo := NVL(NULL, nvl_test.increase_value);
               dbms_output.put_line(l_foo);
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               l_foo := NVL(2912, nvl_test.increase_value);
               dbms_output.put_line(l_foo);
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               dbms_output.put_line('====reset package====');
               nvl_test.g_value := 0;
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               l_foo := coalesce(NULL, nvl_test.increase_value);
               dbms_output.put_line(l_foo);
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
               l_foo := coalesce(2912, nvl_test.increase_value);
               dbms_output.put_line(l_foo);
               l_dummy := nvl_test.g_value;
               dbms_output.put_line(l_dummy);
             END;
             /
====reset package====
0
nvl_test.increase_value
1
1
nvl_test.increase_value
2912
2
====reset package====
0
nvl_test.increase_value
1
1
2912
1
If you run the anonymous block in an IDE where you can step through the code, you can see when the code is executed and when it is bypassed.
So, if you don’t need or want the code executed when the value of a variable or result of a function is not NULL, then you should use COALESCE to prevent this from happening. But there might be a use case in which you always want a piece of code executed whether the first argument is NULL or not. Then you should use (or stick to) NVL.
I think the rule should be: Use COALESCE unless…
Hope this answers your question.
Happy Oracle’ing,
Patrick Barel
This question has been published in OTech Magazine of Fall 2014
How do I get my query results paginated?
Dear Patrick,
I have got a website with a search form. I want to display a limited number of results to the user and have him/her navigate through different pages. Is this possible using plain SQL?
Kindest regards,
Mitchell Ian
Dear Mitchell,
Of course this is possible. It might take some thinking, but that has never hurt anyone (yet). First we need a table with some randomly sorted data in it. In this example I am just using 20 records, but you can use this approach on bigger tables of course.
[PATRICK]SQL>CREATE TABLE t AS
               SELECT LEVEL val#
                    , to_char(LEVEL, '9999') value_in_text
                 FROM dual
              CONNECT BY LEVEL < 21
                ORDER BY dbms_random.random
             /
Table created.
The ORDER BY dbms_random.random is there to ensure the data is inserted in random order. If you just select from this new table, your data will be unordered.
Now we select the first ‘page’ from this table. Our page size is 5 records. So the query will be:
[PATRICK]SQL>SELECT *
               FROM t
              WHERE ROWNUM <= 5
             /

 VAL# VALUE
----- -----
   10    10
   20    20
   16    16
    1     1
   17    17
This results in the first 5 rows from the table. If we want the next 5, rownums 6 through 10, you might try something like this:
[PATRICK]SQL>SELECT *
               FROM t
              WHERE ROWNUM > 5
                AND ROWNUM <= 10
             /

no rows selected
Unfortunately this doesn't work. It appears this query can never return a row with a number between 6 and 10: ROWNUM is assigned as a row is returned, so the first row that would be returned always gets ROWNUM 1, which fails the predicate. The solution to this issue is the use of a subquery:
[PATRICK]SQL>SELECT val#, value_in_text
               FROM (SELECT t.val#, t.value_in_text, ROWNUM rn
                       FROM t)
              WHERE rn > 5
                AND rn <= 10
             /

 VAL# VALUE
----- -----
   13    13
    4     4
    5     5
    3     3
   14    14
In this query we first select all the rows we might need for the pages, and from that result set we select just the rows we are interested in for our page.
If your table is rather big you may want to include the maximum rownum in the inline view.
[PATRICK]SQL>SELECT val#, value_in_text
               FROM (SELECT t.val#, t.value_in_text, ROWNUM rn
                       FROM t
                      WHERE ROWNUM <= 10)
              WHERE rn > 5
                AND rn <= 10
             /

 VAL# VALUE
----- -----
   13    13
    4     4
    5     5
    3     3
   14    14
As you are probably aware, there is no guarantee on the order in which rows are returned unless you specify an ORDER BY clause. But what happens if you just add this ORDER BY to your query? Let's see what happens when you include it in the query for the first page:
[PATRICK]SQL>SELECT *
               FROM t
              WHERE ROWNUM <= 5
              ORDER BY t.val#
             /

 VAL# VALUE
----- -----
   12    12
   13    13
   15    15
   17    17
   19    19
The rows returned are in order, but they are definitely not the first 5 values currently in the table. That is how the SQL engine works: it first fetches 5 rows to honor the ROWNUM predicate, and only then sorts that result before returning it to the caller.
What we should do to get the correct behavior is use a subquery to get the results in order and apply the ROWNUM predicate to that ordered result.
[PATRICK]SQL>SELECT *
               FROM (SELECT *
                       FROM t
                      ORDER BY t.val#)
              WHERE ROWNUM <= 5
             /

 VAL# VALUE
----- -----
    1     1
    2     2
    3     3
    4     4
    5     5
We can now use this to build a query to get the next page of results:
[PATRICK]SQL>SELECT val#, value_in_text
               FROM (SELECT val#, value_in_text, ROWNUM rn
                       FROM (SELECT *
                               FROM t
                              ORDER BY t.val#)
                      ORDER BY rn)
              WHERE rn > 5
                AND rn <= 10
             /

 VAL# VALUE
----- -----
    6     6
    7     7
    8     8
    9     9
   10    10
When you have access to an Oracle 12c database it is a lot easier. To get the first page of the ordered results you can issue this statement:
[PATRICK]SQL>SELECT *
               FROM t
              ORDER BY t.val#
              FETCH FIRST 5 ROWS ONLY
             /
To get another page you can provide the query with an offset of how many rows to skip:
[PATRICK]SQL>SELECT *
               FROM t
              ORDER BY t.val#
             OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY
             /
Under the covers Oracle still executes queries similar to the ones we built earlier, but these are a lot easier to write.
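A generic page query then only needs the page number and page size as bind variables. A sketch (:page and :pagesize are placeholders you would bind from your application):

[PATRICK]SQL>VARIABLE page NUMBER
[PATRICK]SQL>VARIABLE pagesize NUMBER
[PATRICK]SQL>BEGIN
               :page := 2;
               :pagesize := 5;
             END;
             /
[PATRICK]SQL>SELECT *
               FROM t
              ORDER BY t.val#
             OFFSET (:page - 1) * :pagesize ROWS
              FETCH NEXT :pagesize ROWS ONLY
             /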
Hope this sheds a bit of light on your issue.
Happy Oracle’ing,
Patrick Barel
If you have a question you want answered, please send an email to patrick[at]bar-solutions[dot]com. If I know the answer, or can find it for you, maybe I can help.
This question has been published in OTech Magazine of Summer 2014.
Virtual Private Database…
Some time ago I was asked to assist in fixing, or at least finding the cause of, a performance problem. The application ran fine until the Virtual Private Database (VPD) policy was applied. Oracle claims there should be near zero impact on your application when you implement VPD, so how is this possible?
First of all, the policy applied was a rather complex one. A complex query had to be executed to determine whether the current user has access to a record. Let's say this query takes about a second; then I would expect my query to run about a second slower, maybe two. But the query took several minutes to complete when the VPD policy was applied. This didn't make sense to me, so I decided to find out what was really happening.
To do this, I opened up my sandbox database to try and recreate this situation.
First I need to create two new users
create user vpd1 identified by vpd1
/
grant connect, resource to vpd1
/
create user vpd2 identified by vpd2
/
grant connect, resource to vpd2
/
Then I created a simple table to hold the data that should be protected by the VPD policy:
drop table emp purge
/
create table emp
( empno    number(4) not null
, ename    varchar2(10)
, job      varchar2(9)
, mgr      number(4)
, hiredate date
, sal      number(7, 2)
, comm     number(7, 2)
, deptno   number(2)
)
/
insert into emp values (7369, 'SMITH',  'CLERK',     7902, to_date('17-12-1980', 'DD-MM-YYYY'),  800, null, 20);
insert into emp values (7499, 'ALLEN',  'SALESMAN',  7698, to_date('20-02-1981', 'DD-MM-YYYY'), 1600,  300, 30);
insert into emp values (7521, 'WARD',   'SALESMAN',  7698, to_date('22-02-1981', 'DD-MM-YYYY'), 1250,  500, 30);
insert into emp values (7566, 'JONES',  'MANAGER',   7839, to_date('02-04-1981', 'DD-MM-YYYY'), 2975, null, 20);
insert into emp values (7654, 'MARTIN', 'SALESMAN',  7698, to_date('28-09-1981', 'DD-MM-YYYY'), 1250, 1400, 30);
insert into emp values (7698, 'BLAKE',  'MANAGER',   7839, to_date('01-05-1981', 'DD-MM-YYYY'), 2850, null, 30);
insert into emp values (7782, 'CLARK',  'MANAGER',   7839, to_date('09-06-1981', 'DD-MM-YYYY'), 2450, null, 10);
insert into emp values (7788, 'SCOTT',  'ANALYST',   7566, to_date('09-12-1982', 'DD-MM-YYYY'), 3000, null, 20);
insert into emp values (7839, 'KING',   'PRESIDENT', null, to_date('17-11-1981', 'DD-MM-YYYY'), 5000, null, 10);
insert into emp values (7844, 'TURNER', 'SALESMAN',  7698, to_date('08-09-1981', 'DD-MM-YYYY'), 1500,    0, 30);
insert into emp values (7876, 'ADAMS',  'CLERK',     7788, to_date('12-01-1983', 'DD-MM-YYYY'), 1100, null, 20);
insert into emp values (7900, 'JAMES',  'CLERK',     7698, to_date('03-12-1981', 'DD-MM-YYYY'),  950, null, 30);
insert into emp values (7902, 'FORD',   'ANALYST',   7566, to_date('03-12-1981', 'DD-MM-YYYY'), 3000, null, 20);
insert into emp values (7934, 'MILLER', 'CLERK',     7782, to_date('23-01-1982', 'DD-MM-YYYY'), 1300, null, 10);
commit
/
drop table emp_vpd purge
/
create table emp_vpd as select * from emp
/
commit
/
And of course I need to grant access to this table to the newly created users:
grant all on emp_vpd to vpd1
/
grant all on emp_vpd to vpd2
/
On the table I need to create a policy function, so I create a package (which mimics the customer's package, just simpler) to do this:
create or replace package emp_vpd_policy as
  function first_policy(owner_in   in varchar2
                       ,objname_in in varchar2) return varchar2;
  function allowed(empno_in  in number
                  ,deptno_in in number) return number;
end emp_vpd_policy;
/
sho err
create or replace package body emp_vpd_policy as
  function first_policy(owner_in   in varchar2
                       ,objname_in in varchar2) return varchar2 is
  begin
    dbms_output.put_line('first policy');
    if (user = 'VPD1') then
      return 'emp_vpd_policy.allowed(emp_vpd.empno, emp_vpd.deptno)=10';
    elsif user = 'VPD2' then
      return 'emp_vpd_policy.allowed(emp_vpd.empno, emp_vpd.deptno)=20';
    else
      return '1=1';
    end if;
  end first_policy;
  function allowed(empno_in  in number
                  ,deptno_in in number) return number is
  begin
    dbms_output.put_line('emp_vpd_policy.allowed(' || empno_in || ',' || deptno_in || ')');
    return deptno_in;
  end allowed;
end emp_vpd_policy;
/
sho err
and then protect the EMP_VPD table using a policy:
begin
  sys.dbms_rls.add_policy(object_schema   => 'DEMO'
                         ,object_name     => 'EMP_VPD'
                         ,policy_name     => 'EMP_VPD_SEL'
                         ,function_schema => '&myuser'
                         ,policy_function => 'EMP_VPD_POLICY.FIRST_POLICY'
                         ,statement_types => 'SELECT');
end;
/
The package will show what will happen when I perform a select on the table:
conn vpd1/vpd1
set serveroutput on size unlimited
select * from demo.emp_vpd
/
EMPNO ENAME      JOB         MGR HIREDATE         SAL    COMM DEPTNO
----- ---------- --------- ----- ---------- --------- ------- ------
 7782 CLARK      MANAGER    7839 6/9/1981     2450.00             10
 7839 KING       PRESIDENT       11/17/1981   5000.00             10
 7934 MILLER     CLERK      7782 1/23/1982    1300.00             10

first policy
first policy
emp_vpd_policy.allowed(7369,20)
emp_vpd_policy.allowed(7499,30)
emp_vpd_policy.allowed(7521,30)
emp_vpd_policy.allowed(7566,20)
emp_vpd_policy.allowed(7654,30)
emp_vpd_policy.allowed(7698,30)
emp_vpd_policy.allowed(7782,10)
emp_vpd_policy.allowed(7788,20)
emp_vpd_policy.allowed(7839,10)
emp_vpd_policy.allowed(7844,30)
emp_vpd_policy.allowed(7876,20)
emp_vpd_policy.allowed(7900,30)
emp_vpd_policy.allowed(7902,20)
emp_vpd_policy.allowed(7934,10)
In my case this is done rather quickly; there is almost no difference in timing for the query with or without the policy applied. But as you can see, the check is executed for each and every record. Well, not the policy itself, but the function that is referenced in the predicate the policy returns. So if this function takes a lot of time and your table has a lot of records, the query will run for a very long time. There has got to be a better way to do this.
Let's analyze what happens: the actual policy function is executed twice per query. What if we use this behavior to our benefit? In the first pass we can set up an in-memory data structure to hold whatever we need (this may take some time), and in the second pass we can use this data in the actual check.
First we drop the policy so we can create a new one:
begin
  sys.dbms_rls.drop_policy(object_schema => '&myuser'
                          ,object_name   => 'EMP_VPD'
                          ,policy_name   => 'EMP_VPD_SEL');
end;
/
For our implementation we need a Nested Table type to be created in the database:
create or replace type empnos_tt is table of number(4)
/
Then we create a new package to hold the policy function.
create or replace package emp_vpd_pp as
  function sel(owner_in   in varchar2
              ,objname_in in varchar2) return varchar2;
  function read_g_empnos return empnos_tt;
end emp_vpd_pp;
/
sho err
The function SEL will be used in the policy. The function READ_G_EMPNOS is needed to retrieve the data from the package variable. Then the actual implementation of the package:
create or replace package body emp_vpd_pp as
  g_empnos empnos_tt;
  beenhere boolean := false;

  function sel(owner_in   in varchar2
              ,objname_in in varchar2) return varchar2 is
  begin
    if not (beenhere) then
      if user = 'VPD1' then
        begin
          select emp.empno
            bulk collect into g_empnos
            from emp
           where emp.deptno = 10;
        exception
          when others then
            dbms_output.put_line(sqlerrm);
        end;
      elsif user = 'VPD2' then
        begin
          select emp.empno
            bulk collect into g_empnos
            from emp
           where emp.deptno = 20;
        exception
          when others then
            dbms_output.put_line(sqlerrm);
        end;
      end if;
    end if;
    beenhere := not (beenhere);
    if ((user = 'VPD1') or (user = 'VPD2')) then
      return 'emp_vpd.empno in (select column_value from table(emp_vpd_pp.read_g_empnos))';
    else
      return '1=1';
    end if;
  end sel;

  function read_g_empnos return empnos_tt is
  begin
    return (g_empnos);
  end;
begin
  beenhere := false;
end emp_vpd_pp;
/
sho err
In the initialization section of the package we initialize the Boolean variable. When the policy function is executed for the first time (per query), we select the column values we need and save them in the package variable. The second time the policy function is executed, the predicate that is returned uses the values saved in the package variable.
begin
  sys.dbms_rls.add_policy(object_schema   => 'DEMO'
                         ,object_name     => 'EMP_VPD'
                         ,policy_name     => 'EMP_VPD_SEL'
                         ,function_schema => 'DEMO'
                         ,policy_function => 'EMP_VPD_PP.SEL'
                         ,statement_types => 'SELECT');
end;
/
Notice the predicate with the Nested Table is always returned, but the Nested Table itself is only filled during the first execution of the policy function. Using this policy function gives exactly the same result, but the execution improved dramatically: the database only has to execute the expensive query once per query instead of once for every row, and its result can be reused at almost no cost.
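If you want to verify this on your own system, a quick check is to run the same query with timing switched on under both policy implementations; with an expensive policy query and a large table the difference should be obvious:

conn vpd1/vpd1
set timing on
select * from demo.emp_vpd
/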
This post is also available at the AMIS blog
Dot Qualify Everything?
There is talk about Dot Qualifying Everything in your PL/SQL code. But what are the pros and cons of this approach?
Let’s say we have a rather simple function which returns a string with alternatively upper and lower case characters. Normally I would write it as follows:
create or replace function wavey(string_in in varchar2) return varchar2 is
  l_returnvalue varchar2(30);
begin
  for indx in 1 .. length(string_in) loop
    l_returnvalue := l_returnvalue ||
                     case mod(indx, 2)
                       when 0 then upper(substr(string_in, indx, 1))
                       else lower(substr(string_in, indx, 1))
                     end;
  end loop;
  dbms_output.put_line(l_returnvalue);
  return l_returnvalue;
end;
The output for this function using the ENAME column of the EMP table is like this:
sMiTh
aLlEn
wArD
jOnEs
mArTiN
bLaKe
cLaRk
sCoTt
kInG
tUrNeR
aDaMs
jAmEs
fOrD
mIlLeR
But what if there was a malicious user who created a package named DBMS_OUTPUT, which included all the programs of the original DBMS_OUTPUT package, but with some code added? Let's create a simple package like this:
create or replace package dbms_output is
  procedure put_line(a in varchar2);
end;
create or replace package body dbms_output is
  procedure put_line(a in varchar2) is
  begin
    sys.dbms_output.put('changed :');
    sys.dbms_output.put_line(a);
  end;
end;
Notice I only included the PUT_LINE procedure, and I am not really doing anything malicious here. The output of my function would now be:
changed :sMiTh
changed :aLlEn
changed :wArD
changed :jOnEs
changed :mArTiN
changed :bLaKe
changed :cLaRk
changed :sCoTt
changed :kInG
changed :tUrNeR
changed :aDaMs
changed :jAmEs
changed :fOrD
changed :mIlLeR
Not exactly the way I would want. How can I prevent this from happening? The answer is actually really simple: qualify the call to DBMS_OUTPUT with the schema name where the package resides:
create or replace function wavey(string_in in varchar2) return varchar2 is
  l_returnvalue varchar2(30);
begin
  for indx in 1 .. length(string_in) loop
    l_returnvalue := l_returnvalue ||
                     case mod(indx, 2)
                       when 0 then upper(substr(string_in, indx, 1))
                       else lower(substr(string_in, indx, 1))
                     end;
  end loop;
  sys.dbms_output.put_line(l_returnvalue);
  return l_returnvalue;
end;
The output is back to what we expected:
sMiTh
aLlEn
wArD
jOnEs
mArTiN
bLaKe
cLaRk
sCoTt
kInG
tUrNeR
aDaMs
jAmEs
fOrD
mIlLeR
But should you Dot Qualify everything in your code? Like this?
create or replace function demo.wavey(string_in in sys.standard.varchar2)
  return sys.standard.varchar2 is
  l_returnvalue sys.standard.varchar2(30);
begin
  <<theloop>>
  for indx in 1 .. length(wavey.string_in) loop
    wavey.l_returnvalue := wavey.l_returnvalue ||
                           case mod(theloop.indx, 2)
                             when 0 then sys.standard.upper(standard.substr(wavey.string_in, theloop.indx, 1))
                             else sys.standard.lower(standard.substr(wavey.string_in, theloop.indx, 1))
                           end;
  end loop;
  sys.dbms_output.put_line(wavey.l_returnvalue);
  return wavey.l_returnvalue;
end;
I don't think this adds to the readability of the code, but it certainly makes your code safer against malicious users who want to inject code that gets executed by your programs. Please add your thoughts on this subject.