How to compare thousands of pairs of XML documents? [message #133827] |
Mon, 22 August 2005 06:27 |
Uden
Messages: 4 Registered: August 2005
|
Junior Member |
Hi all,
I need to compare two sets of data containing XML documents and fulfil the following requirements:
1) characterise the differences and present the results in a report.
For example: of 1150 XML documents, 99 contain differing fragments, 230 have missing fragments, and 112 have new fragments.
Of the differing fragments, 23 are found in the name part, 46 in the address part and xx in the yy part of the document.
Etc.
2) show the two differing documents side-by-side and highlight the differences.
As a solution for requirement no. 2 I use the Oracle XMLDiff Java class, and this works reasonably well.
My solution for requirement no. 1 is to select the two XML documents, convert them to lowercase to rule out trivial differences such as writing a name as "mr. X" or "Mr. X", and calculate the size of each document. When the sizes differ, I take fragments of the documents (such as the name part, the address part, or other information) and repeat the step for each fragment: calculate its size and compare the results.
In this way I can narrow down the position of the differences, but it is awfully complex and slow for the large and complicated XML documents I have to deal with.
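To make the idea concrete, here is a minimal sketch of the fragment-by-fragment comparison and the report counts from requirement no. 1, written in Python for brevity. It is only an illustration under several assumptions: the documents are already available as strings (fetching them from the xmltype column is not shown), each fragment is a direct child of the root element and occurs at most once, and fragments are compared by their lowercased text rather than by size. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def fragments(xml_text):
    """Map each top-level child tag to its lowercased, whitespace-normalised text.

    Assumption: every fragment is a direct child of the root and occurs once;
    a repeated tag would overwrite the earlier one in this sketch.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        text = ET.tostring(child, encoding="unicode").lower()
        result[child.tag.lower()] = " ".join(text.split())
    return result


def compare_pair(old_xml, new_xml):
    """Return {fragment_name: 'different' | 'missing' | 'new'} for one pair."""
    old, new = fragments(old_xml), fragments(new_xml)
    diffs = {}
    for name in old.keys() | new.keys():
        if name not in new:
            diffs[name] = "missing"
        elif name not in old:
            diffs[name] = "new"
        elif old[name] != new[name]:
            diffs[name] = "different"
    return diffs


def summarise(pairs):
    """Count documents per kind of difference and differing fragments per name."""
    docs_by_kind = Counter()
    differing_by_fragment = Counter()
    for old_xml, new_xml in pairs:
        diffs = compare_pair(old_xml, new_xml)
        # A document counts once per kind, however many fragments are affected.
        for kind in set(diffs.values()):
            docs_by_kind[kind] += 1
        for name, kind in diffs.items():
            if kind == "different":
                differing_by_fragment[name] += 1
    return docs_by_kind, differing_by_fragment
```

Calling summarise() with a list of (old, new) document pairs gives the two sets of counts needed for the report. Only the pairs flagged here would then need the side-by-side view of requirement no. 2.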
Does anyone have a better idea for comparing and characterising the differences between large numbers of XML documents that are stored as complete documents in an xmltype column in the database?
Thanks!