Traverse CLOB with Connect By (Oracle 19.0.0.0.0, Linux)

Traverse CLOB with Connect By [message #689570]
Thu, 15 February 2024 07:26
Duane (Senior Member)
I'm trying to see if this is possible using a "Connect By", without having to loop over the CLOB.
Goal: insert the values of the CLOB into a table.
Data:
ID,Term,Recruiter||chr(10)||chr(13)
A01,202420,Name 1||chr(10)||chr(13)
A0102,202420,Name 2||chr(10)||chr(13)
A010203,202420,Name 3||chr(10)||chr(13)
A01020304,202420,Name 4||chr(10)||chr(13)
A01,202420,Name 5||chr(10)||chr(13)
A02,202420,Name 6||chr(10)||chr(13)
A03,202420,Name 7||chr(10)||chr(13)
A04,202420,Name 8||chr(10)||chr(13)
A05,202420,Name 9||chr(10)||chr(13)
Table:
CREATE TABLE FILE_CONTENTS
(
ID VARCHAR2(10 CHAR),
TERM VARCHAR2(6 CHAR),
RECRUITER VARCHAR2(50 CHAR)
)
Query:
select txt
  from (select 'ID,Term,Recruiter'||chr(10)||chr(13)||
               'A01,202420,Name 1'||chr(10)||chr(13)||
               'A0102,202420,Name 2'||chr(10)||chr(13)||
               'A010203,202420,Name 3'||chr(10)||chr(13)||
               'A01020304,202420,Name 4'||chr(10)||chr(13)||
               'A01,202420,Name 5'||chr(10)||chr(13)||
               'A02,202420,Name 6'||chr(10)||chr(13)||
               'A03,202420,Name 7'||chr(10)||chr(13)||
               'A04,202420,Name 8'||chr(10)||chr(13)||
               'A05,202420,Name 9'||chr(10)||chr(13) txt
          from dual)
connect by level <= regexp_count(txt, '[^'||chr(10)||chr(13)||']+')
Desired final result: the CLOB values inserted into the table.
FILE_CONTENTS
ID TERM RECRUITER
A01 202420 Name 1
A0102 202420 Name 2
A010203 202420 Name 3
A01020304 202420 Name 4
A01 202420 Name 5
A02 202420 Name 6
A03 202420 Name 7
A04 202420 Name 8
A05 202420 Name 9
Re: Traverse CLOB with Connect By [message #689571 is a reply to message #689570]
Thu, 15 February 2024 08:32
Michel Cadot (Senior Member, Account Moderator)
Split CLOB into lines:
SQL> with
2 data as (
3 select 'ID,Term,Recruiter'||chr(10)||chr(13)||
4 'A01,202420,Name 1'||chr(10)||chr(13)||
5 'A0102,202420,Name 2'||chr(10)||chr(13)||
6 'A010203,202420,Name 3'||chr(10)||chr(13)||
7 'A01020304,202420,Name 4'||chr(10)||chr(13)||
8 'A01,202420,Name 5'||chr(10)||chr(13)||
9 'A02,202420,Name 6'||chr(10)||chr(13)||
10 'A03,202420,Name 7'||chr(10)||chr(13)||
11 'A04,202420,Name 8'||chr(10)||chr(13)||
12 'A05,202420,Name 9'||chr(10)||chr(13) txt
13 from dual
14 )
15 select regexp_substr(txt, '[^'||chr(10)||chr(13)||']+', 1, level) line
16 from data
17 connect by level <= regexp_count(txt,chr(10)||chr(13))
18 /
LINE
---------------------------------------------------------------------------
ID,Term,Recruiter
A01,202420,Name 1
A0102,202420,Name 2
A010203,202420,Name 3
A01020304,202420,Name 4
A01,202420,Name 5
A02,202420,Name 6
A03,202420,Name 7
A04,202420,Name 8
A05,202420,Name 9
10 rows selected.
Split lines into columns:
SQL> col col1 format a10
SQL> col col2 format a10
SQL> col col3 format a10
SQL> with
2 data as (
3 select 'ID,Term,Recruiter'||chr(10)||chr(13)||
4 'A01,202420,Name 1'||chr(10)||chr(13)||
5 'A0102,202420,Name 2'||chr(10)||chr(13)||
6 'A010203,202420,Name 3'||chr(10)||chr(13)||
7 'A01020304,202420,Name 4'||chr(10)||chr(13)||
8 'A01,202420,Name 5'||chr(10)||chr(13)||
9 'A02,202420,Name 6'||chr(10)||chr(13)||
10 'A03,202420,Name 7'||chr(10)||chr(13)||
11 'A04,202420,Name 8'||chr(10)||chr(13)||
12 'A05,202420,Name 9'||chr(10)||chr(13) txt
13 from dual
14 ),
15 lines as (
16 select regexp_substr(txt, '[^'||chr(10)||chr(13)||']+', 1, level) line
17 from data
18 connect by level <= regexp_count(txt,chr(10)||chr(13))
19 )
20 select regexp_substr(line, '[^,]+', 1, 1) col1,
21 regexp_substr(line, '[^,]+', 1, 2) col2,
22 regexp_substr(line, '[^,]+', 1, 3) col3
23 from lines
24 /
COL1 COL2 COL3
---------- ---------- ----------
ID Term Recruiter
A01 202420 Name 1
A0102 202420 Name 2
A010203 202420 Name 3
A01020304 202420 Name 4
A01 202420 Name 5
A02 202420 Name 6
A03 202420 Name 7
A04 202420 Name 8
A05 202420 Name 9
10 rows selected.
Re: Traverse CLOB with Connect By [message #689575 is a reply to message #689574]
Thu, 15 February 2024 11:15
Michel Cadot (Senior Member, Account Moderator)
SQL> col col1 format a10
SQL> col col2 format a10
SQL> col col3 format a10
SQL> with
2 data as (
3 select 'ID,Term,Recruiter'||chr(10)||chr(13)||
4 'A01,202420,Name 1'||chr(10)||chr(13)||
5 'A0102,202420,Name 2'||chr(10)||chr(13)||
6 'A010203,202420,Name 3'||chr(10)||chr(13)||
7 'A01020304,202420,Name 4'||chr(10)||chr(13)||
8 'A01,202420,Name 5'||chr(10)||chr(13)||
9 'A02,202420,Name 6'||chr(10)||chr(13)||
10 'A03,202420,Name 7'||chr(10)||chr(13)||
11 'A04,202420,Name 8'||chr(10)||chr(13)||
12 'A05,202420,Name 9'||chr(10)||chr(13) txt
13 from dual
14 ),
15 lines as (
16 select regexp_substr(txt, '[^'||chr(10)||chr(13)||']+', 1, level) line, level rn
17 from data
18 connect by level <= regexp_count(txt,chr(10)||chr(13))
19 )
20 select regexp_substr(line, '[^,]+', 1, 1) col1,
21 regexp_substr(line, '[^,]+', 1, 2) col2,
22 regexp_substr(line, '[^,]+', 1, 3) col3
23 from lines
24 where rn > 1
25 /
COL1 COL2 COL3
---------- ---------- ----------
A01 202420 Name 1
A0102 202420 Name 2
A010203 202420 Name 3
A01020304 202420 Name 4
A01 202420 Name 5
A02 202420 Name 6
A03 202420 Name 7
A04 202420 Name 8
A05 202420 Name 9
9 rows selected.
Re: Traverse CLOB with Connect By [message #689576 is a reply to message #689575]
Thu, 15 February 2024 13:46
mathguy (Senior Member)
Since Oracle added support for JSON in Oracle 12.1 (and improved it with each new release), a more efficient way of splitting delimited strings emerged - transform the input string into a JSON document and use JSON_TABLE or similar for the splitting. Extensive testing performed by Stew Ashton (and confirmed by numerous other tests) shows that this approach is faster than other, older methods.
I will describe the solution below in some detail. You didn't tell us where the CLOB is coming from; in my illustration I create it on the spot in a WITH clause. (This would not work exactly as written for a true CLOB, meaning one of some size, but that isn't relevant for the current question.)
I only show a SELECT statement that splits everything as needed - you can then use this in an INSERT statement.
with
  input_data (txt) as (
    select to_clob(
           'ID,Term,Recruiter'||chr(10)||chr(13)||
           'A01,202420,Name 1'||chr(10)||chr(13)||
           'A0102,202420,Name 2'||chr(10)||chr(13)||
           'A010203,202420,Name 3'||chr(10)||chr(13)||
           'A01020304,202420,Name 4'||chr(10)||chr(13)||
           'A01,202420,Name 5'||chr(10)||chr(13)||
           'A02,202420,Name 6'||chr(10)||chr(13)||
           'A03,202420,Name 7'||chr(10)||chr(13)||
           'A04,202420,Name 8'||chr(10)||chr(13)||
           'A05,202420,Name 9'||chr(10)||chr(13))
      from dual
  ),
  prep (json_arr) as (
    select '[' || replace(replace(json_array(substr(txt, 1, length(txt) - 2)),
                                  ',', '","'),
                          '\n\r', '"],["') || ']'
      from input_data
  )
select id, term, recruiter
  from prep nested json_arr, '$[*]'
       columns (ord for ordinality,
                id        varchar2(10 char) path '$[0]',
                term      varchar2( 6 char) path '$[1]',
                recruiter varchar2(50 char) path '$[2]')
 where ord > 1
;
The PREP subquery (inline view) converts the input (a comma- and chr10-chr13-delimited string) into a JSON array of JSON arrays. The inner-most transformation, the SUBSTR, removes the line-end marker from the last line (since JSON arrays use delimiters, not terminators). Then the resulting string is passed to JSON_ARRAY. "Array" in this context is not truly an array: we pass a single scalar value (a long string), and we get a JSON array with a single element. This does two things: it prepends [" and appends "] to the string, and it escapes JSON special characters. (Namely: " and \, but also control characters like chr10 and chr13 - which is actually a slight complication.) If we could assume that the "real data" in the CLOB does not include " or \ and such, the code could be simplified a bit; but I think it is good practice to write our code as if we did need to be careful.
Then we replace the commas in the original text with ",", and the chr10-chr13 pairs with "],[". This causes the tokens in the original text to appear enclosed in double-quotes (remember that any double-quotes in the input tokens are now escaped!), and the "lines" of the original text to become sub-arrays in the global JSON array created earlier. These are just mindless text manipulations, they don't use JSON tools; but we were careful enough earlier that this step will work as needed. Two things to note: the REPLACE function now must hunt for the literal characters \n\r, since the chr(10) and chr(13) characters were escaped when we applied JSON_ARRAY; and the order of the two calls to REPLACE is essential. We must replace the original commas first (prepending and appending " to each), and ONLY THEN apply the REPLACE to the line separators, because the latter adds new commas, and those must be handled differently from the original commas in the text.
Finally, we must prepend [ and append ] to the result, since the brackets added by JSON_ARRAY have become the opening [ for the first nested array and the closing ] for the last nested array, respectively.
At this point, it would help to remove the main SELECT from the end of the code, and replace it with SELECT JSON_ARR FROM PREP to see exactly what we did so far.
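Concretely (a sketch, using the JSON_ARR column alias from the query above), the debugging SELECT would be:

```sql
-- Debugging aid: inspect the intermediate JSON built by PREP.
-- Run with the same WITH clauses as the full query above.
select json_arr
  from prep;
```

On the sample data this should show an array of arrays along the lines of [["ID","Term","Recruiter"],["A01","202420","Name 1"],["A0102","202420","Name 2"],...].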
The rest is a trivial application of JSON_TABLE to extract data from an array of arrays. I used the newer (simpler) syntax introduced in Oracle 19, but that's just for fun.
The condition ORD > 1 in the WHERE clause is the one that causes the headers to be excluded. I could have excluded the headers much earlier in the process, but I thought this gives more flexibility (what if tomorrow we must use this on a CLOB where the first three lines are header lines, etc.)
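As mentioned above, the splitting SELECT can then be wrapped in an INSERT. A minimal sketch against the FILE_CONTENTS table from the question (the WITH clause is shortened to two sample lines here, not the full data set):

```sql
insert into file_contents (id, term, recruiter)
with
  input_data (txt) as (
    select to_clob('ID,Term,Recruiter'||chr(10)||chr(13)||
                   'A01,202420,Name 1'||chr(10)||chr(13))
      from dual
  ),
  prep (json_arr) as (
    select '[' || replace(replace(json_array(substr(txt, 1, length(txt) - 2)),
                                  ',', '","'),
                          '\n\r', '"],["') || ']'
      from input_data
  )
select id, term, recruiter
  from prep nested json_arr, '$[*]'
       columns (ord for ordinality,
                id        varchar2(10 char) path '$[0]',
                term      varchar2( 6 char) path '$[1]',
                recruiter varchar2(50 char) path '$[2]')
 where ord > 1;
```

With this two-line sample, only the single data row (A01, 202420, Name 1) would be inserted; the header is filtered out by ORD > 1.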
Re: Traverse CLOB with Connect By [message #689577 is a reply to message #689576]
Thu, 15 February 2024 16:10
Duane (Senior Member)
mathguy wrote on Thu, 15 February 2024 19:46: Since Oracle added support for JSON in Oracle 12.1 ... this approach is faster than other, older methods.
I used your example on a really large CLOB and I'm getting this error: "[Error] Execution (1: 1): ORA-40478: output value too large (maximum: 4000)". The CLOB I used has a data length of 66169.
Is there a workaround for that?
Re: Traverse CLOB with Connect By [message #689579 is a reply to message #689578]
Thu, 15 February 2024 19:27
Duane (Senior Member)
mathguy wrote on Fri, 16 February 2024 01:10I don't understand. I thought you already talked about "working example"; do you mean that this is in fact your first attempt at accessing the 66k+ character CLOB? Where do you get the error? I assume from the very beginning; do you get it if right after the first subquery in the WITH clause, you SELECT TXT FROM INPUT_DATA?
There are workarounds, but let's make sure I understand your question first. This is a distinct question, it has nothing to do with the rest of the problem. Or does it?
That is correct. I have a working example with data much smaller than the 66K+ character CLOB. When I try the example with the 66K+ character CLOB then I get the error I pointed out. I would say it's right at the beginning.
Distinct question. Works fine with smaller data but not the 66K+ CLOB.
[Updated on: Thu, 15 February 2024 19:28]
Re: Traverse CLOB with Connect By [message #689580 is a reply to message #689579]
Thu, 15 February 2024 20:49
Duane (Senior Member)
Ok, I got this to work by adding "returning clob" within the "json_array()" call.
This query does return data, so I must have something wrong in the JSON statement. Actually, removing ORD > 1 returns the HEADER row but no actual data. Not sure what else is wrong.
with
  input_data (txt) as (
    select file_contents txt
      from files
     where id = '1999'
  ),
  prep (json_arr) as (
    select '[' || replace(replace(json_array(substr(txt, 1, length(txt) - 2)
                                             returning clob),
                                  ',', '","'),
                          '\n\r', '"],["') || ']'
      from input_data
  )
select *
  from prep
Re: Traverse CLOB with Connect By [message #689581 is a reply to message #689580]
Thu, 15 February 2024 21:22
Duane (Senior Member)
Must be the data. I wonder if it's the difference between the test data having hard coded values of "||chr(10)||chr(13)||" and the production data having CR/LR.
Test Data:
Select * from prep
[["ID","Term","Recruiter"],["A01","202420","Name 1"],["A0102","202420","Name 2"],["A010203","202420","Name 3"],
66K+ Data:
Select * from prep
[["Campus","Complete","ip","program","term","id","\r\nDL"," "," ","57","209990","88","\r\nDL"," ","I","84","209990","17","\r\nDL"," ","I","50","209990","20","\r\nDL"," ","I","79","209990","26","\r\nDL"," ","I","92","209990","37","\r\nDL"," ","I","57","209990","47","\r\nDL","
I wonder if it's because the header has that extra ",". Data also has an extra "," before the CR/LR.
HEADER:
Campus,Complete,ip,program,term,id,
[Updated on: Thu, 15 February 2024 21:24]
Re: Traverse CLOB with Connect By [message #689582 is a reply to message #689581]
Thu, 15 February 2024 23:46
mathguy (Senior Member)
Good catch on the RETURNING CLOB option to JSON_ARRAY. I forgot that the default is VARCHAR2.
Frankly chr(10) || chr(13) as the line terminator looked odd, but suum cuique as the saying goes. CRLF - that is chr(13) || chr(10) - is the line terminator in Windows; Unix and its descendants use chr(10), which was adopted by Oracle also, and I heard that "old" Mac OS used chr(13) (just for spite I suppose), until they joined the Unix gang; but I haven't seen LFCR - or chr(10) || chr(13) - as line terminator. (As an aside, note that chr(10) is known as LF, for Line Feed; there is no LR.) In any case, I wrote the code for the sample data you posted.
Of course, if the real-life data uses chr(13) || chr(10), then using them swapped in the code won't work. What happened when you changed that in the code? (You did try, right?)
The extra comma in each line (meaning that tokens are also "terminated" by comma, rather than just "delimited" or "separated") should not cause problems. That's because in the approach I showed, that extra comma causes each nested array to have one additional token in the last position, which is an empty string in all cases. In the JSON_TABLE application, I only select the first three elements from each such nested array; if there is (or there isn't) a fourth element, empty or otherwise, does not matter, since it's not referenced either way.
So, if these are the only deviations between the real-life data and the sample you gave us, then simply fixing the part of the code that refers to the line terminator should suffice. But, looking at what you posted - what is that DL after each \r\n at the end of each line? Is there something else in the raw file, in addition to chr(13) || chr(10) at the end of each line?
Where is the raw text? Is it a text file on the server? Under what operating system? Every operating system has tools to look at a raw text file to see what special characters it contains (in Oracle you would use the DUMP function); what do you see in the file - or in the table, using DUMP, if in the Oracle db already?
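For example, a sketch of using DUMP on the first characters of the CLOB (the FILES table and FILE_CONTENTS column are the names from your earlier post; adjust to your actual table):

```sql
-- Inspect the raw bytes at the start of the CLOB to see which
-- line-terminator characters are actually present.
select dump(dbms_lob.substr(file_contents, 40, 1)) first_bytes
  from files
 where id = '1999';
-- chr(13)||chr(10) would appear as ...,13,10,... in the DUMP output.
```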
[Updated on: Thu, 15 February 2024 23:53]
Re: Traverse CLOB with Connect By [message #689587 is a reply to message #689582]
Fri, 16 February 2024 07:44
Duane (Senior Member)
mathguy wrote on Fri, 16 February 2024 05:46: ... Of course, if the real-life data uses chr(13) || chr(10), then using them swapped in the code won't work. What happened when you changed that in the code? (You did try, right?) ... But, looking at what you posted - what is that DL after each \r\n at the end of each line? Is there something else in the raw file, in addition to chr(13) || chr(10) at the end of each line?
"what is that DL after each \r\n at the end of each line?" That is the first column of data. See my data example.
Ok, I flipped the CR/LF in the query and ran it with a small number of records. That worked with the 30 lines of data including the HEADER.
Data:
Campus,Complete,ip,program,term,id,
DL, , ,57,202330,88,
DL, ,I,84,202330,17,
DL, ,I,50,202330,20,
DL, ,I,79,202230,26,
DL, ,I,92,202330,37,
DL, ,I,57,202230,47,
DL, , ,57,202320,19,
DL, ,I,80,202330,26,
DL, ,I,92,202230,27,
DL, ,I,74,202330,33,
DL, ,I,80,202320,42,
DL, ,I,50,202330,46,
DL, ,I,92,202230,48,
DL, , ,57,202330,49,
DL, , ,57,202220,49,
DL, , ,57,202320,56,
DL, , ,57,202330,57,
DL, ,I,57,201810,58,
DL, ,I,84,202130,61,
DL, ,I,89,202330,61,
DL, ,I,84,202330,62,
DL, ,I,51,202330,65,
DL, ,I,50,202230,66,
DL, ,I,92,202320,70,
DL, , ,57,202330,70,
DL, ,I,92,202230,70,
DL, ,I,50,202330,74,
DL, ,I,50,202330,74,
DL, ,I,57,202330,74,
Query:
with
  input_data (txt) as (
    select file_contents txt
      from uploaded_files
     where id = '2'
  ),
  prep (json_arr) as (
    select '[' || replace(replace(json_array(substr(txt, 1, length(txt) - 2)
                                             returning clob),
                                  ',', '","'),
                          '\r\n', '"],["') || ']'
      from input_data
  )
select campus, complete, ip, program, term, id
  from prep nested json_arr, '$[*]'
       columns (ord for ordinality,
                campus   varchar2(2 char)  path '$[0]',
                complete varchar2(1 char)  path '$[1]',
                ip       varchar2(1 char)  path '$[2]',
                program  varchar2(1 char)  path '$[3]',
                term     varchar2(6 char)  path '$[4]',
                id       varchar2(10 char) path '$[5]')
Copying the data above out to 2,060 lines returned "[Error] Execution (1: 1): ORA-01406: fetched column value was truncated".
Code:
with
  input_data (txt) as (
    select file_contents txt
      from uploaded_files
     where id = '3'
  ),
  prep (json_arr) as (
    select '[' || replace(replace(json_array(substr(txt, 1, length(txt) - 2)
                                             returning clob),
                                  ',', '","'),
                          '\r\n', '"],["') || ']'
      from input_data
  )
select campus, complete, ip, program, term, id
  from prep nested json_arr, '$[*]'
       columns (ord for ordinality,
                campus   varchar2(2 char)  path '$[0]',
                complete varchar2(1 char)  path '$[1]',
                ip       varchar2(1 char)  path '$[2]',
                program  varchar2(1 char)  path '$[3]',
                term     varchar2(6 char)  path '$[4]',
                id       varchar2(10 char) path '$[5]')
Table used to store the CLOB data while testing:
CREATE TABLE UPLOADED_FILES
(
ID NUMBER,
FILE_CONTENTS CLOB
)
[Updated on: Fri, 16 February 2024 07:46]
Re: Traverse CLOB with Connect By [message #689588 is a reply to message #689587]
Fri, 16 February 2024 08:02
Duane (Senior Member)
Never mind. It works with 30 lines or 2,060 lines of data. My problem was the HEADER row; adding "ord > 1" solved it.
My only question now is about double quotes (") around the data.
Data:
campus,complete,ip,program,term,id,
"DL", , ,57,"202330",88,
DL, ,I,84,202330,17,
DL, ,I,50,202330,20,
Result: (term data is not returned)
"DL 57 88
DL I 84 17
DL I 50 20
Query:
with
  input_data (txt) as (
    select file_contents txt
      from uploaded_files
     where id = '2'
  ),
  prep (json_arr) as (
    select '[' || replace(replace(json_array(substr(txt, 1, length(txt) - 2)
                                             returning clob),
                                  ',', '","'),
                          '\r\n', '"],["') || ']'
      from input_data
  )
select campus, complete, ip, program, term, id
  from prep nested json_arr, '$[*]'
       columns (ord for ordinality,
                campus   varchar2(2 char)  path '$[0]',
                complete varchar2(1 char)  path '$[1]',
                ip       varchar2(1 char)  path '$[2]',
                program  varchar2(1 char)  path '$[3]',
                term     varchar2(6 char)  path '$[4]',
                id       varchar2(10 char) path '$[5]')
 where ord > 1;
[Updated on: Fri, 16 February 2024 08:03]
Re: Traverse CLOB with Connect By [message #689589 is a reply to message #689588]
Fri, 16 February 2024 08:15
Duane (Senior Member)
Corrected the "program" length.
Query:
with
  input_data (txt) as (
    select file_contents txt
      from uploaded_files
     where id = '2'
  ),
  prep (json_arr) as (
    select '[' || replace(replace(json_array(substr(txt, 1, length(txt) - 2)
                                             returning clob),
                                  ',', '","'),
                          '\r\n', '"],["') || ']'
      from input_data
  )
select campus, complete, ip, program, term, id
  from prep nested json_arr, '$[*]'
       columns (ord for ordinality,
                campus   varchar2(2 char)  path '$[0]',
                complete varchar2(1 char)  path '$[1]',
                ip       varchar2(1 char)  path '$[2]',
                program  varchar2(10 char) path '$[3]',
                term     varchar2(6 char)  path '$[4]',
                id       varchar2(10 char) path '$[5]')
 where ord > 1;
Result: (the left " remains in the data but the right " is removed)
"DL 57 "202330
DL I 84 202330 17
DL I 50 202330 20
[Updated on: Fri, 16 February 2024 08:15]
Re: Traverse CLOB with Connect By [message #689591 is a reply to message #689590]
Fri, 16 February 2024 09:07
Duane (Senior Member)
Yes. I changed the first line to test whether the double quotes would be removed. I could be wrong, but I thought you had said you account for the data having double quotes. It halfway works, as you can see.
Data:
campus,complete,ip,program,term,id,
"DL", , ,57,"202330",88,
DL, ,I,84,202330,17,
DL, ,I,50,202330,20,
Result:
"DL 57 "202330
DL I 84 202330 17
DL I 50 202330 20
Re: Traverse CLOB with Connect By [message #689593 is a reply to message #689592]
Fri, 16 February 2024 10:15
Duane (Senior Member)
You did say that. My fault.
If what I'm reading is correct, then the result set is not correct: only the left double-quote remains in the result set. Where's the closing double-quote on the right side of the data?
Input Data: (Double-quotes around "DL" and "202330")
campus,complete,ip,program,term,id,
"DL", , ,57,"202330",88,
Result: (DL has a double-quote on the left side but nothing on the right side. Same with 202330. The ID value is also missing for that line.)
All in all, thank you for your help and coming up with this query. It's what I needed.
[Updated on: Fri, 16 February 2024 10:15]
Re: Traverse CLOB with Connect By [message #689595 is a reply to message #689594]
Fri, 16 February 2024 10:39
Duane (Senior Member)
Ah, gotcha. Will need to account for that in the data. Never thought of that.
I think you had mentioned that a lot of CSV files have double-quoted values. This particular file doesn't.
We're all good then.
Thanks again.
Re: Traverse CLOB with Connect By [message #689596 is a reply to message #689576]
Fri, 16 February 2024 20:51
Duane (Senior Member)
mathguy wrote on Thu, 15 February 2024 19:46
I only show a SELECT statement that splits everything as needed - you can then use this in an INSERT statement.
Are you accounting for the CR/LF with "length(txt) - 2" within the following statement? Just trying to understand why it's "length(txt) - 2" and not just "length(txt)".
substr(txt, 1, length(txt) - 2)
I just want to make sure I totally understand all the parts of the query.
Re: Traverse CLOB with Connect By [message #689597 is a reply to message #689596]
Fri, 16 February 2024 21:54
mathguy (Senior Member)
I do that to remove the last CR LF from the text. (If I just left LENGTH(TXT) as the last argument, then I wouldn't need SUBSTR at all - the result would be just TXT!)
There is a meaningful distinction between "separated" and "terminated" lists. For example, each "row" in your data is terminated by CR LF - which means that the last "row" includes the terminal CR LF. Similarly, each "token" in each of your "rows" is terminated by a comma; that means that the last character before CR LF in each "row" is always a comma, indicating the end of the last token.
JSON arrays use comma as a delimiter, not as a terminator. A JSON array consisting of three numbers looks like [3,2,-9] - not [3,2,-9,]. The latter array consists of FOUR tokens, the last one of which is empty, which doesn't really make sense: there is no "missing value" concept in JSON. Strict JSON syntax will flag that as invalid JSON, while lax syntax will allow it by simply ignoring the last comma. JSON has the null value (but it must be spelled out, it can't just be a non-existing token) and it supports empty strings (enclosed in double-quotes!), but not completely empty space. Note though that my "manual" manipulation of the string adds double-quotes around every value; that array would look like ["3","2","-9",""] after my manipulations, and the last token is now an empty string, which is valid.
This is about the inner JSON arrays. With regard to the "global" (outer) JSON array, which represents all the data, I replace each CR LF with (essentially) a comma; more characters to enclose tokens and end and begin inner arrays, but as a separator, CR LF becomes comma in the outer JSON array. Here if I allow the last CR LF to stand, I end up with a mess; this means that after the last nested array (representing a row of real data), if I allow the last CR LF to stand (and to be treated as a separator in the outer array!), there will be an extra nested array at the end, consisting in a single empty string (enclosed in double-quotes). I don't want to have to deal with that - it may throw an error, or end up inserting a row of NULL in the table; bad either way. I want CR LF to be a separator in your CLOB, not a terminator - so I simply chop off the last CR LF from the outset, by leaving out the last two characters of your CLOB.
In more delicate situations you would also have to handle the issue of comma being a token terminator, not separator, in each row; but in this case, you don't, for the reason I explained earlier.
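The trailing-terminator chop can be seen in isolation with a tiny sketch (made-up two-row data, not from the thread):

```sql
-- SUBSTR(txt, 1, LENGTH(txt) - 2) drops only the final CR LF,
-- turning the last "row" from terminated into delimited.
select length(txt) len_before,
       length(substr(txt, 1, length(txt) - 2)) len_after
  from (select 'a,1,'||chr(13)||chr(10)||
               'b,2,'||chr(13)||chr(10) txt
          from dual);
-- len_before = 12, len_after = 10; the inner CR LF is untouched.
```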
Re: Traverse CLOB with Connect By [message #689598 is a reply to message #689597]
Sat, 17 February 2024 11:03
Duane (Senior Member)
I meant the last CR/LF but didn't specify. I pulled the query apart, returning sections of the data, to see what was happening. I counted out the characters, so it appeared that's what you were doing.
Thanks for the explanation.