Easy XML - a Programming Oriented Approach
In a previous article I talked about using Natural Oracle Features to get XML data out of an Oracle database, and put data into an Oracle database, without actually having to learn a lot about XML related technologies. Still, there are many who continue to resist using OBJECTS, COLLECTIONS, and INSTEAD-OF-TRIGGERS in their daily work with the Oracle RDBMS. Since the biggest hurdle with XML seems to be parsing it into its component data elements, here is a more programming oriented approach to parsing XML data inside Oracle that should make this crowd happy. Fortunately we still find that we can do a lot of work without having to invest heavily in learning a lot of new technology.
We are going to learn about a small number of special XML operators/functions offered in Oracle SQL. Also, as I mentioned, there is a previous article I wrote about XML in Oracle that you may wish to read. Additionally, though we will be using XPATH in this discussion, I will not be teaching it here. For this article, what we need to know about XPATH will be pretty intuitive. In the end though, there is no substitute for doing your own research, so here are some URLs to get you started.
http://www.orafaq.com/node/1025
http://www.brainbell.com/tutorials/XML/Understanding_XPath.htm
http://www.oracle-base.com/articles/9i/XMLSEQUENCE.php
http://www.adp-gmbh.ch/ora/sql/extractvalue.html
http://www.adp-gmbh.ch/ora/sql/extract.html
What we want to get from this article, is how to parse an XML data packet in a more tranditional programming way without resorting to a lot of coding or use of intermediary constructs. I for one do not want to deal with Document Object Models, and SAX parsers and all that if I don't have to. For this we are going to use some Oracle operators/functions that work with XML data. Speficially we will be using XMLSEQUENCE, EXTRACT, and EXTRACTVALUE.
But, first, we need some data to work with. There was data in the previous article I wrote, so, once again taking advantage of the "lazy gene" which has advanced my developer skills so well over the years, we will use a dumbed down version of that data for this article.
/* drop table timesheet; drop table emp; drop table dept; */ create table dept (dept_id number not null,dname varchar2(10) not null) / alter table dept add constraint dept_pk primary key (dept_id) add constraint dept_Uk1 unique (dname) / insert into dept values (31,'SALES'); insert into dept values (32,'MARKETING'); create table emp (emp_id number not null,ename varchar2(10) not null,dept_id number) / alter table emp add constraint emp_pk primary key (emp_id) add constraint emp_uk1 unique (ename) add constraint emp_fk1 foreign key (dept_id) references dept / insert into emp values (33,'Joe',31); insert into emp values (34,'John',31); insert into emp values (35,'Janet',32); create table timesheet (timesheet_id number not null,week_date date not null,worked_hours number not null,emp_id number not null) / alter table timesheet add constraint timesheet_pk primary key (timesheet_id) add constraint timesheet_uk1 unique (emp_id,week_date) add constraint timesheet_fk1 foreign key (emp_id) references emp / insert into timesheet values (36,to_date('03-mar-2008','dd-mon-rrrr'),40,35); insert into timesheet values (37,to_date('10-mar-2008','dd-mon-rrrr'),40,35); commit /
Truly nothing overly exciting. Some Departments that have employees who may or may not have worked in a particular week and thus may or may not have a timesheet for that week. Note in particular from the data below that only Janet worked in the two weeks we have here, so only she has timesheets in our system. I guess Joe and John were out on the golf course leaving Janet to hold down the fort.
SQL> select * from dept; DEPT_ID DNAME ---------- ---------- 31 SALES 32 MARKETING 2 rows selected. SQL> select * from emp; EMP_ID ENAME DEPT_ID ---------- ---------- ---------- 33 Joe 31 34 John 31 35 Janet 32 3 rows selected. SQL> select * from timesheet; TIMESHEET_ID WEEK_DATE WORKED_HOURS EMP_ID ------------ --------- ------------ ---------- 36 03-MAR-08 40 35 37 10-MAR-08 40 35 2 rows selected.
Sometimes the best way to see a table is just to describe it.
SQL> set linesize 70 SQL> desc dept Name Null? Type ----------------------------------- -------- ------------------- DEPT_ID NOT NULL NUMBER DNAME NOT NULL VARCHAR2(10) SQL> desc emp Name Null? Type ----------------------------------- -------- ------------------- EMP_ID NOT NULL NUMBER ENAME NOT NULL VARCHAR2(10) DEPT_ID NUMBER SQL> desc timesheet Name Null? Type ----------------------------------- -------- ------------------- TIMESHEET_ID NOT NULL NUMBER WEEK_DATE NOT NULL DATE WORKED_HOURS NOT NULL NUMBER EMP_ID NOT NULL NUMBER
You may recall from the previous article that I am somewhat fond of the CURSOR EXPRESSION because it makes for an easy way to pull together hierarchically related data into a single query. Using CURSOR EXPRESSIONS we can easily construct a single query that will get all the data from all three of our tables, composed in a hierarchical format. This query will retreive our data.
select dept.* ,cursor( select emp.* ,cursor( select timesheet.* from timesheet where timesheet.emp_id = emp.emp_id ) timesheet_list from emp where emp.dept_id = dept.dept_id ) emp_list from dept /
In examining this query we would note that we use cursor expressions as a way of collecting a set of rows together as one "clump of stuff" that we can treat as a single column value. Additionally you may notice that we can just as easily nest cursor expressions which lays the foundation for building our hiearchy of data. Hence this query returns department rows. Each DEPT row returned has a list of EMP rows attached to it as a single column. We achieved this by using a cursor expression to get all the EMP rows for a DEPT row. Our cursor expression correlates child rows to the appropriate parent. In turn each EMP row has attached to it, its corresponding TIMESHEET rows again collected using a cursor expression. This is what makes cursor expressions so cool to work with; the ability to quickly nest hierarchies of data within a single query, wherein each nested component is correlated to a parent row. You will please notice how the WHERE clauses of each cursor expression perform the correlation of data back to a parent row.
So, this query would read like this:
for each department add all that department's employee rows for each employee row add all that employee's timesheet rows Dump each the department row
If we execute the above query, we get the following data.
DEPT_ID DNAME EMP_LIST ---------- ---------- -------------------- 31 SALES CURSOR STATEMENT : 3 CURSOR STATEMENT : 3 EMP_ID ENAME DEPT_ID TIMESHEET_LIST ---------- ---------- ---------- -------------------- 33 Joe 31 CURSOR STATEMENT : 4 CURSOR STATEMENT : 4 no rows selected EMP_ID ENAME DEPT_ID TIMESHEET_LIST ---------- ---------- ---------- -------------------- 34 John 31 CURSOR STATEMENT : 4 CURSOR STATEMENT : 4 no rows selected 2 rows selected. DEPT_ID DNAME EMP_LIST ---------- ---------- -------------------- 32 MARKETING CURSOR STATEMENT : 3 CURSOR STATEMENT : 3 EMP_ID ENAME DEPT_ID TIMESHEET_LIST ---------- ---------- ---------- -------------------- 35 Janet 32 CURSOR STATEMENT : 4 CURSOR STATEMENT : 4 TIMESHEET_ID WEEK_DATE WORKED_HOURS EMP_ID ------------ -------------------- ------------ ---------- 36 03-mar-2008 00:00:00 40 35 37 10-mar-2008 00:00:00 40 35 2 rows selected. 1 row selected. 2 rows selected.
Admittedly this is pretty ugly to look at. But we don't really care. What we are interested in is that the query properly nests data such that for each deparment, we have attached the employee rows for that department, and for each employee, we have attached the timesheet rows for that employee. You should take a moment to satisfy yourself that indeed the data above is properly nested.
You may also recall from the previous article, that we used a database package called DBMS_XMLGEN. This package as the name suggests is an Oracle supplied package that can be used to generate XML from the database without a lot of effort. In particular there are several functions in the package that will take the text of a SELECT statement, execute the query, and return the resulting data in XML format. We can therefore take the query above and supply it to one of these DBMS_XMLGEN functions and get out XML instead of the ugly data dump we just looked at.
This (notice we are passing our cursor expression heavey query to the packaged function):
set linesize 9999 set pagesize 999 set feedback 1 set long 99999999 select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload from dual /
Generates this (notice the column name. This was not generated by Oracle, it is our supplied column name (check the query)):
PAYLOAD ---------------------------------------------------------------- <ROWSET> <ROW> <DEPT_ID>31</DEPT_ID> <DNAME>SALES</DNAME> <EMP_LIST> <EMP_LIST_ROW> <EMP_ID>33</EMP_ID> <ENAME>Joe</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> <EMP_LIST_ROW> <EMP_ID>34</EMP_ID> <ENAME>John</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> </EMP_LIST> </ROW> <ROW> <DEPT_ID>32</DEPT_ID> <DNAME>MARKETING</DNAME> <EMP_LIST> <EMP_LIST_ROW> <EMP_ID>35</EMP_ID> <ENAME>Janet</ENAME> <DEPT_ID>32</DEPT_ID> <TIMESHEET_LIST> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>36</TIMESHEET_ID> <WEEK_DATE>03-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>37</TIMESHEET_ID> <WEEK_DATE>10-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> </TIMESHEET_LIST> </EMP_LIST_ROW> </EMP_LIST> </ROW> </ROWSET> 1 row selected.
This is the same data we have seen three times now; first as simple dumps of tables (as seven rows), then as an ugly dump of a query with cursor expressions (as two rows), and now as a dump of XML (in one row).
Let me stress that even though our raw data started as seven rows across three tables, the feedback above from Oracle SQLPLUS clearly tells us there was only 1 ROW SELECTED. This should be obvious since the main table of the query above is the DUAL table which contains only one row. But the fact that our XML data is retuned in a formatted manner makes it easy to misinterpret the results as a series of many rows returned. But this would not be correct. We have only one row, with one column called PAYLOAD (cause that is what we named it in our select statement), whose value is all of our DEPT/EMP/TIMSHEET data as one big clump of XML. It should also be noted that the datatype of this column is XMLTYPE. The particular function we used, DBMS_XMLGEN.GETXMLTYPE returns XMLTYPE, not CLOB, not VARCHAR2. This is good because the functions we are eventually going to use expect this.
Certainly XML is a bit wordy, but if you have an XML editor, you can much more readily see the nesting of data contained in the XML and confirm yet again that this data is nested correctly with respect to our tables and their foreign keys. Reading this XML data stream we see that:
A SALES department has an employee Joe who has no timesheets, and an employee John who has no timesheets. A MARTKETING department has an employee Janet and she does in fact have two timesheets.
OK, LET US MOVE ON TO WHAT WE REALLY WANT TO DO.
If we were given the XML datagram above, how could we parse it into something we can manage inside an Oracle development environment, without a lot of trouble? I find that if a small code snippet is available, looking at it is often the best way to learn an idea, so here is a small code snippet that will do just what we want, take the above XML packet, and parse it into something that looks like actual database data. Where we go with it from there is up to us.
alter session set nls_date_format = 'dd-mon-rrrr hh24:mi:ss'; col dept_id format a10 col dname format a10 col emp_id format a10 col ename format a10 col timesheet_id format a15 col week_date format a20 col worked_hours format a15 set linesize 999 set pagesize 999 with xml_datagram as ( select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload from dual ) select extractvalue(x.column_value,'/ROW/DEPT_ID') DEPT_ID ,extractvalue(x.column_value,'/ROW/DNAME') DNAME ,extractvalue(y.column_value,'/EMP_LIST_ROW/EMP_ID') EMP_ID ,extractvalue(y.column_value,'/EMP_LIST_ROW/ENAME') ENAME ,extractvalue(z.column_value,'/TIMESHEET_LIST_ROW/TIMESHEET_ID') TIMESHEET_ID ,extractvalue(z.column_value,'/TIMESHEET_LIST_ROW/WEEK_DATE') WEEK_DATE ,extractvalue(z.column_value,'/TIMESHEET_LIST_ROW/WORKED_HOURS') WORKED_HOURS from xml_datagram ,table(xmlsequence(extract(xml_datagram.payload,'/ROWSET/ROW')))(+) x ,table(xmlsequence(extract(x.column_value,'/ROW/EMP_LIST/EMP_LIST_ROW')))(+) y ,table(xmlsequence(extract(y.column_value,'/EMP_LIST_ROW/TIMESHEET_LIST/TIMESHEET_LIST_ROW')))(+) z / DEPT_ID DNAME EMP_ID ENAME TIMESHEET_ID WEEK_DATE WORKED_HOURS ---------- ---------- ---------- ---------- --------------- -------------------- --------------- 31 SALES 33 Joe 31 SALES 34 John 32 MARKETING 35 Janet 36 03-mar-2008 00:00:00 40 32 MARKETING 35 Janet 37 10-mar-2008 00:00:00 40 4 rows selected.
OK, don't be put off. This is a cool little statement for it shows several useful Oracle features. Let us disect it from the top down. First we see a WITH clause. I wrote an article about the WITH clause for OraFaq, and here is the URL for it should you not know what the WITH clause is about.
http://www.orafaq.com/node/1879
A possibly incorrect but useful way to think about the WITH clause is; it creates a "VIRTUAL" table, as part of a "PRE-STEP" to the execution of a main SELECT statement. Yes this description is lacking for although Oracle could in fact choose to create an in-memory virtual table for this data, it usually does not, preferring instead to do code merging before execution. But we don't really care about that right now. We can think of the WITH clause as acting this way, because it is an easy way to get our head around how we can exploit the WITH clause. You will please notice that the SQL SELECT of the WITH clause is infact the same query we executed about and page or so back (our cursor expressions heavey query fead to DBMS_XMLGEN.GETXMLTYPE) which generated XML for us. If you skipped this because you were in a hurry, back up and look at it.
With all this in mind we can play some make-believe. Using this WITH clause, we can make believe that:
1) there is a table called XML_DATAGRAM in our database
2) this table has one column on it called PAYLOAD of datatype XMLTYPE (the column name from our query)
3) XML_DATAGRAM has only one row and the PAYLOAD column of this row has our DEPT/EMP/TIMESHEET data formatted as XML
Therefore, we can reference this fictitious XML_DATAGRAM table in our main select statement, which is exactly what we do in the FROM clause.
Why did we do all this? No reason, except that if we are going to unpack some XML using a small number of fancy native Oracle operators/functions, we first need to get some XML from somewhere that we can unpack. This does that for us. This WITH clause is constructing some XML so we can unpack it.
To see how we unpack this XML, let us take a look at the FROM clause of our "unpack" query. What can we say about it?
from xml_datagram ,table(xmlsequence(extract(xml_datagram.payload,'/ROWSET/ROW')))(+) x ,table(xmlsequence(extract(x.column_value,'/ROW/EMP_LIST/EMP_LIST_ROW')))(+) y ,table(xmlsequence(extract(y.column_value,'/EMP_LIST_ROW/TIMESHEET_LIST/TIMESHEET_LIST_ROW')))(+) z /
1) it is interesting to note that there are in fact no real tables to be seen anywhere in this FROM clause. Kind of neat.
2) there are obviously four ROW SOURCES (for lack of a better term), but what is less obvious is that these ROW SOURCES are connected to each other such that each takes as input, data from the ROW SOURCE that came before it. These ROW SOURCES in effect represent a sequence of events happening one after the other, starting with acquisition of each row from the first ROW SOURCE. Indeed, the order of ROW SOURCES in the FROM clause is important here for unlike a "normal" FROM clause where joins are done in a WHERE clause to follow, the connections between ROW SOURCES here, is defined by the act of feeding values between them. The Oracle Parser (at least for some releases) requires the order of ROW SOURCES in the FROM clause from top down (or left right) to be in the order in which data is feed. Thus the first ROW SOURCE of the FROM clause here, must come before the second ROW SOURCE, because column(s) from the first ROW SOURCE in the FROM clause feed into the second. And so on.
3) we see a common combination of SQL operators in action used in our TABLE based ROW SOURCES: TABLE(XMLSEQUENCE(EXTRACT())). This is the magic sequence of operators that turns a single XML snippet containing multiple nodes, into multiple XML snippets that contain a single node, treated as rows; and thus into something that can be treated as a ROW SOURCE.
4) there is some XPATH involved here. You will notice that each TABLE based ROW SOURCE accepts an XML snippet, and an XPATH expression (see EXTRACT). The magic sequence of operators is going to scan the XML snippet and pull from it the nodes defined by the XPATH expression, break each node found by the XPATH expression into its own XLM snippet, and convert them into rows. At the risk of being technically inaccurate, let me explain what is happening because of the TABLE(XMLSEQUENCE(EXTRACT()) operator sequence. EXTRACT will pull out from the input XML snippet those nodes matching the XPATH EXPRESSION and return all found nodes as one smaller yet still single XML snippet. XMLSEQUENCE will break up the single XML snippet of multiple nodes comming out of EXTRACT into multiple XML snippets each of which contains only a single node. TABLE will allow addressing of each XML single node snippet as a row with a column of XMLTYPE called COLUMN_VALUE (some default of Oracle's (We could also have used the VALUE() function instead of the column name, and indeed that may even be more correct, I don't really know, but let us not worry about that, it is not relevent here)). You may recall that our original data contains DEPT with EMPs with TIMESHEETS. We packaged three tables in a hierarchy into our XML data stream. We also have three TABLE based ROW SOURCES in our FROM clause. This are no coinicidence as each TABLE based ROW SOURCE will be responsbile for unpacking the data that came from one table of our thre table hierarchy. Thus the first TABLE based ROW SOURCE will unpack DEPT data. The second will unpack EMP data. The third will unpack TIMEHSEET data. If you are a bit lost, it would pay for you to revisit the XML data stream and get into your head, how the data goes from three tables, to a single hierarchically organized XML data packet.
5) there is an outer-join operator that follows all the TABLE based ROW SOURCES. We will wait to see why this in necessary.
6) we note that each TABLE based ROW SOURCE is given an alias so we can reference it (in this case, X,Y, or Z).
7) LASTLY we note that each TABLE based ROW SOURCE has only one column and that column is generically named COLUMN_VALUE, an oracle default. It may be technically incorrect to reference this attribute, but I find the VALUE() syntax to be most confusing.
Here is an in-depth description of the sequence of events happening in this FROM clause:
1) We get a row off XML_DATAGRAM. As a reminder we remember that XML_DATAGRAM is the virtual table we constructed using a WITH clause only for purposes of building some XML data to work with. And we remember that XML_DATAGRAM has only one row. And we remember that this XML_DATAGRAM has only one column whose name is PAYLOAD. And we remember that the data in this PAYLOAD column is a XML datagram stuffed with our DEPT/EMP/TIMSHEET data.
2A) The PAYLOAD column is passed to the first TABLE based ROW SOURCE in the FROM clause (aliased as X). The EXTRACT operater takes in the XML value in PAYLOAD, and returns an XML value of the nodes defined by the XPATH expression also supplied.
2B) The XMLSEQUENCE operator then takes the XML output from the EXTRACT operator which is a single XML packet containing a set of nodes, and converts this into multiple XML packets each containing a single node.
2C) The TABLE operator turns the sequence of XML nodes into rows. This in effect has unpacked the DEPT rows from the XML data.
3) The EMP list of rows for any given DEPT row is housed in a column called X.COLUMN_VALUE also as XML. This is passed to the second TABLE based ROW SOURCE (aliased as Y). EXTRACT/XMLSEQUENCE/TABLE then do their stuff again. The result is that EMP rows are unpacked.
4) For each EMP row of a DEPT row, there is a list of TIMESHEETS housed in a column called Y.COLUMN_VALUE as XML. This is passed to the third TABLE based ROW SOURCE (aliased as Z) where EXTRACT/XMLSEQUENCE/TABLE once again do some unpacking, this time unpacking the timesheets (if there are any).
5) At this point, all the data from our three tables has been unpacked into simple XML snippets each of which is addressable as a row. We can eventually use the EXTRACTVALUE function in the select clause to get actual data values for DEPT/EMP/TIMESHEET data. We will see this later. At the moment we are mainly intersted in how the nested XML is unpacked to expose our original table data, which is what we have just discussed.
6) GO BACK AND REPEAT 1-5 till done.
We can take a look at some of the intermediary steps if we like.
Here is the unpacking of DEPT nodes. Notice how the outer most node is based on rows from the DEPT table.
with xml_datagram as ( select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload from dual ) select extract(xml_datagram.payload,'/ROWSET/ROW') from xml_datagram / EXTRACT(XML_DATAGRAM.PAYLOAD,'/ROWSET/ROW') ------------------------------------------------------ <ROW> <DEPT_ID>31</DEPT_ID> <DNAME>SALES</DNAME> <EMP_LIST> <EMP_LIST_ROW> <EMP_ID>33</EMP_ID> <ENAME>Joe</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> <EMP_LIST_ROW> <EMP_ID>34</EMP_ID> <ENAME>John</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> </EMP_LIST> </ROW> <ROW> <DEPT_ID>32</DEPT_ID> <DNAME>MARKETING</DNAME> <EMP_LIST> <EMP_LIST_ROW> <EMP_ID>35</EMP_ID> <ENAME>Janet</ENAME> <DEPT_ID>32</DEPT_ID> <TIMESHEET_LIST> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>36</TIMESHEET_ID> <WEEK_DATE>03-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>37</TIMESHEET_ID> <WEEK_DATE>10-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> </TIMESHEET_LIST> </EMP_LIST_ROW> </EMP_LIST> </ROW> 1 row selected.
If you compare this output to the original XML datagram you will notice that it no longer contains the ROWSET gift wrapping. This should make it clearer for us to see that there are indeed two nodes in this XML and each of these nodes maps to the department rows from our original query. But EXTRACT returns only a single XML packet. We need XMLSEQUENCE (or more correctly the combination of TABLE(XMLSEQUENCE()) in order to break out the two department nodes into two XML packets each with one deparment node. This can be seen here.
with xml_datagram as ( select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload from dual ) select x.column_value from xml_datagram ,table(xmlsequence(extract(xml_datagram.payload,'/ROWSET/ROW')))(+) x / COLUMN_VALUE ---------------------------------------------------------- <ROW> <DEPT_ID>31</DEPT_ID> <DNAME>SALES</DNAME> <EMP_LIST> <EMP_LIST_ROW> <EMP_ID>33</EMP_ID> <ENAME>Joe</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> <EMP_LIST_ROW> <EMP_ID>34</EMP_ID> <ENAME>John</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> </EMP_LIST> </ROW> <ROW> <DEPT_ID>32</DEPT_ID> <DNAME>MARKETING</DNAME> <EMP_LIST> <EMP_LIST_ROW> <EMP_ID>35</EMP_ID> <ENAME>Janet</ENAME> <DEPT_ID>32</DEPT_ID> <TIMESHEET_LIST> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>36</TIMESHEET_ID> <WEEK_DATE>03-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>37</TIMESHEET_ID> <WEEK_DATE>10-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> </TIMESHEET_LIST> </EMP_LIST_ROW> </EMP_LIST> </ROW> 2 rows selected.
HAHA! Our one XML packet with two nodes has become two XML packets with one node each. Further, these nodes are now addressable as rows. We now have two rows each with a smaller XML datagram containing the information from one department. So now we can continue.
Here we can see the unpacking of EMP rows.
with xml_datagram as ( select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload from dual ) select extract(extract(xml_datagram.payload,'/ROWSET/ROW'),'/ROW/EMP_LIST/EMP_LIST_ROW') from xml_datagram / EXTRACT(EXTRACT(XML_DATAGRAM.PAYLOAD,'/ROWSET/ROW'),'/ROW/EMP_LIST/EMP_LIST_ROW') -------------------------------------------------------------------------------------------------- <EMP_LIST_ROW> <EMP_ID>33</EMP_ID> <ENAME>Joe</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> <EMP_LIST_ROW> <EMP_ID>34</EMP_ID> <ENAME>John</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> <EMP_LIST_ROW> <EMP_ID>35</EMP_ID> <ENAME>Janet</ENAME> <DEPT_ID>32</DEPT_ID> <TIMESHEET_LIST> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>36</TIMESHEET_ID> <WEEK_DATE>03-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>37</TIMESHEET_ID> <WEEK_DATE>10-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> </TIMESHEET_LIST> </EMP_LIST_ROW> 1 row selected.
Again using TABLE(XMLSEQUENCE()) we can convert our unpacked node data into multiple nodes as rows to get three EMP rows.
with xml_datagram as ( select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload from dual ) select y.column_value from xml_datagram ,table(xmlsequence(extract(xml_datagram.payload,'/ROWSET/ROW')))(+) x ,table(xmlsequence(extract(x.column_value,'/ROW/EMP_LIST/EMP_LIST_ROW')))(+) y / COLUMN_VALUE ---------------------------------------------------------------------------------------------------- <EMP_LIST_ROW> <EMP_ID>33</EMP_ID> <ENAME>Joe</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> <EMP_LIST_ROW> <EMP_ID>34</EMP_ID> <ENAME>John</ENAME> <DEPT_ID>31</DEPT_ID> <TIMESHEET_LIST/> </EMP_LIST_ROW> <EMP_LIST_ROW> <EMP_ID>35</EMP_ID> <ENAME>Janet</ENAME> <DEPT_ID>32</DEPT_ID> <TIMESHEET_LIST> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>36</TIMESHEET_ID> <WEEK_DATE>03-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>37</TIMESHEET_ID> <WEEK_DATE>10-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> </TIMESHEET_LIST> </EMP_LIST_ROW> 3 rows selected.
And continuing on, we can see the unpacking of TIMESHEET rows where we see two nodes.
with xml_datagram as ( select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload from dual ) select extract(extract(extract(xml_datagram.payload,'/ROWSET/ROW') ,'/ROW/EMP_LIST/EMP_LIST_ROW') ,'/EMP_LIST_ROW/TIMESHEET_LIST/TIMESHEET_LIST_ROW') from xml_datagram / EXTRACT(EXTRACT(EXTRACT(XML_DATAGRAM.PAYLOAD,'/ROWSET/ROW'),'/ROW/EMP_LIST/EMP_LIST_ROW'),'/EMP_LIST ---------------------------------------------------------------------------------------------------- <TIMESHEET_LIST_ROW> <TIMESHEET_ID>36</TIMESHEET_ID> <WEEK_DATE>03-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>37</TIMESHEET_ID> <WEEK_DATE>10-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> 1 row selected.
And again the conversion of these two nodes into two rows of an XML snippet.
with xml_datagram as ( select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload from dual ) select z.column_value from xml_datagram ,table(xmlsequence(extract(xml_datagram.payload,'/ROWSET/ROW')))(+) x ,table(xmlsequence(extract(x.column_value,'/ROW/EMP_LIST/EMP_LIST_ROW')))(+) y ,table(xmlsequence(extract(y.column_value,'/EMP_LIST_ROW/TIMESHEET_LIST/TIMESHEET_LIST_ROW')))(+) z / COLUMN_VALUE ---------------------------------------------------------------------------------------------------- <TIMESHEET_LIST_ROW> <TIMESHEET_ID>36</TIMESHEET_ID> <WEEK_DATE>03-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> <TIMESHEET_LIST_ROW> <TIMESHEET_ID>37</TIMESHEET_ID> <WEEK_DATE>10-MAR-08</WEEK_DATE> <WORKED_HOURS>40</WORKED_HOURS> <EMP_ID>35</EMP_ID> </TIMESHEET_LIST_ROW> 4 rows selected.
Hey wait a minute. That says 4 rows selected but I only see two, how come? This is where the outerjoin syntax we saw earlier comes into play. In recalling the original data that went into our XML datagram: Janet has two timesheets. These are the two we see. Joe and John had no timesheets. Using the outerjoin syntax in our FROM clause lets us bring back an empty timesheet for each of these two employess who have none. Thus two plus two makes four.
The above data dumps were simply my trying to show you how the unpacking was working. In the end our real query leads us to be able to do the following:
alter session set nls_date_format = 'dd-mon-rrrr hh24:mi:ss'; col dept_id format a10 col dname format a10 col emp_id format a10 col ename format a10 col timesheet_id format a15 col week_date format a20 col worked_hours format a15 set linesize 999 set pagesize 999 with xml_datagram as ( select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload from dual ) select extractvalue(x.column_value,'/ROW/DEPT_ID') DEPT_ID ,extractvalue(x.column_value,'/ROW/DNAME') DNAME ,extractvalue(y.column_value,'/EMP_LIST_ROW/EMP_ID') EMP_ID ,extractvalue(y.column_value,'/EMP_LIST_ROW/ENAME') ENAME ,extractvalue(z.column_value,'/TIMESHEET_LIST_ROW/TIMESHEET_ID') TIMESHEET_ID ,extractvalue(z.column_value,'/TIMESHEET_LIST_ROW/WEEK_DATE') WEEK_DATE ,extractvalue(z.column_value,'/TIMESHEET_LIST_ROW/WORKED_HOURS') WORKED_HOURS from xml_datagram ,table(xmlsequence(extract(xml_datagram.payload,'/ROWSET/ROW')))(+) x ,table(xmlsequence(extract(x.column_value,'/ROW/EMP_LIST/EMP_LIST_ROW')))(+) y ,table(xmlsequence(extract(y.column_value,'/EMP_LIST_ROW/TIMESHEET_LIST/TIMESHEET_LIST_ROW')))(+) z / DEPT_ID DNAME EMP_ID ENAME TIMESHEET_ID WEEK_DATE WORKED_HOURS ---------- ---------- ---------- ---------- --------------- -------------------- --------------- 31 SALES 33 Joe 31 SALES 34 John 32 MARKETING 35 Janet 36 03-mar-2008 00:00:00 40 32 MARKETING 35 Janet 37 10-mar-2008 00:00:00 40 4 rows selected.
Having been able to roll away the layers of the XML packaging to reveal deeper and deeper layers of data, we can finally use the EXTRACTVALUE() function in our select list to acquire the individual data elements that live at each level of the XML datagram. Remember, each reference to COLUMN_VALUE is to an XML packet from one of the ROW SOURCES, and each contains a few elemental data pieces, and a more detailed node of child rows. Said another way, when ROW SOURCE: X unpacked our department rows, it exposed as simple XML nodes, the DEPT_ID,DNAME column values, and the XML node containing nested EMP rows. When our ROW SOURCE: Y unpacked an EMP row, it exposed the EMP_ID,ENAME column values, and the XML node containing nested TIMESHEET rows. When our ROW SOURCE: Z unpacked a TIMESHEET row, it exposed all the TIMESHEET column values (TIMESHEET_ID,WEEK_DATE,WORKED_HOURS).
So as we see, using TABLE(XMLSEQUENCE(EXTRACT()) and EXTRACTVALUE() allow us to unpack an XML data packet without resorting to use of COLLECTIONS, VIEWS, and INSTEAD-OF-TRIGGERS. We rely on more traditional operator/function style programming to do it. Of course this is not perfect either. It has some things to consider.
1) Consider that there is no real typing of data elments. Everything looks like a string once it is unpacked. So you will have to do some conversion of data in order to make some items like dates and numbers meaningful.
2) Also, this method using SQL is limited in that it can only really handle a simple hierachy of data not a true tree. We can get around this if we want to use PL/SQL instead.
3) If you supply an invalid XPATH expression (eg. meaning an expression for which there is no node in the data), you do not get any error, you simply get no data. This may be either good or bad I guess.
Here is the PL/SQL version of the same unpacking.
set serveroutput on alter session set nls_date_format = 'dd-mon-rrrr hh24:mi:ss'; declare xml_datagram_v xmltype; begin select dbms_xmlgen.getxmltype( ' select dept.*' ||' ,cursor(' ||' select emp.*' ||' ,cursor(' ||' select timesheet.*' ||' from timesheet' ||' where timesheet.emp_id = emp.emp_id' ||' ) timesheet_list' ||' from emp' ||' where emp.dept_id = dept.dept_id' ||' ) emp_list' ||' from dept' ) payload into xml_datagram_v from dual ; for r1 in ( select x.column_value ,extractvalue(x.column_value,'/ROW/DEPT_ID') DEPT_ID ,extractvalue(x.column_value,'/ROW/DNAME') DNAME from table(xmlsequence(extract(xml_datagram_v,'/ROWSET/ROW'))) x ) loop null; dbms_output.put_line('(dept_id,dname)=('||r1.dept_id||','||r1.dname||')'); for r2 in ( select y.column_value ,extractvalue(y.column_value,'/EMP_LIST_ROW/EMP_ID') EMP_ID ,extractvalue(y.column_value,'/EMP_LIST_ROW/ENAME') ENAME from table(xmlsequence(extract(r1.column_value,'/ROW/EMP_LIST/EMP_LIST_ROW'))) y ) loop null; dbms_output.put_line('(emp_id,ename)=('||r2.emp_id||','||r2.ename||')'); for r3 in ( select z.column_value ,extractvalue(z.column_value,'/TIMESHEET_LIST_ROW/TIMESHEET_ID') TIMESHEET_ID ,extractvalue(z.column_value,'/TIMESHEET_LIST_ROW/WEEK_DATE') WEEK_DATE ,extractvalue(z.column_value,'/TIMESHEET_LIST_ROW/WORKED_HOURS') WORKED_HOURS from table(xmlsequence(extract(r2.column_value,'/EMP_LIST_ROW/TIMESHEET_LIST/TIMESHEET_LIST_ROW'))) z ) loop null; dbms_output.put_line('(timesheet_id,week_date,worked_hours)=('||r3.timesheet_id||','||r3.week_date||','||r3.worked_hours||')'); end loop; end loop; end loop; end; / (dept_id,dname)=(31,SALES) (emp_id,ename)=(33,Joe) (emp_id,ename)=(34,John) (dept_id,dname)=(32,MARKETING) (emp_id,ename)=(35,Janet) (timesheet_id,week_date,worked_hours)=(36,03-mar-2008 00:00:00,40) (timesheet_id,week_date,worked_hours)=(37,10-mar-2008 00:00:00,40) PL/SQL procedure successfully completed.
As you can see, we start by constructing some XML to unpack, using the same SQL query we had before. Then we have a FOR loop for each level of data we want to unpack. Loop one unpacks departments, loop two unpacks employees in a department, loop three unpacks timesheets for an employee.
Well, enjoy. Kevin Meade
- Kevin Meade's blog
- Log in to post comments
Comments
Very useful...
Very useful.
It seems that xmlsequence(extract()) could be depricated in favour of xmltable(). I haven't tried it yet, but will if it makes my code more readable.
http://www.liberidu.com/blog/?p=463
how to use ref cursor in for loop,pls give me
Here i tried this but i didn't get the answer,so any one will give me the solution with example as early as possible
declare
type refcur is ref cursor;
ec refcur;
dc refcur
v_dc dept%rowtype;
v_ec emp%rowtype;
begin
for v_dc in ec for select * from dept
loop
print(v_dc.deptno||' '||v_dc.dname);
end loop;
for v_ec in dc for select * from emp
loop
print(v_ec.ename||' '||v_ec.empno);
end loop;
end;
How To Convert .rdf to BI Publisher
Hi
Plaese help me, how to convert
rdf file of oracle apps to bi publisher,
send me code or navigation or examples.
tool used by me. reports 10g, oracle apps 11.5.10, 12
yes though xmltable seems to have DATE issues
Thanks for this tip. Here is a re-write using XMLTABLE. At first glance this is much better. Especially as it seems to allow us to specify actual datatypes for the columns in our virtual table. But, I can't get dates to work. Seems if we want to avoid a format error when using dates, we are forced to use the XML DATE FORMAT yyyy-mm-dd which contains no time component. Thus the hours, minutes, and seconds of our dates get lost. Specifying anything other than 'rrrr-mm-dd' (or an equivelant) causes the error. I hope it is me doing something wrong but I have been unable to locate a solution to it on the web. Seems almost every example of XMLTABLE is either conspicously missing a date in its data, or returns the date as a character string. Wonder if anyone knows a soluction? In any event, here is the equivelant sql using XMLTABLE (misus the date column in timesheet for now).
Run this after creating and loading the tables in the origianl post.
Uncommenting timesheet.week_date above, will cause an error, given our session's date format (bummer).
Kevin
Comparing two rows of a same table using objects types
hi,
I want to write a procedure where i have to compare two rows of a same table.....say there are 4 columns of any data type
consider no primary keys for instance ........
if the data in the row matches then pop up some relative message .
else relative message.
i want this ASAP ....
Thank you Kevin, now all is
Thank you Kevin, now all is working.
XMLTABLE Returning no data
Trying to extract name and ID from the below XML File using XMLTable but it is returning no data
XML File is -
Query I am using:
Tried with above query too
Please guide me on this. Thanks in advance for your help