Hello all,
My company is at a very critical architectural decision point
regarding the future of a product, and the decision revolves
entirely around Oracle. I have several ideas on which I would like
your feedback, since they touch on many disparate issues regarding
operating systems and technologies connected to Oracle, and testing
them all myself before coming to a conclusion would presumably take
quite some time.
This is the scenario:
The existing product description and requirements are as follows:
- There is a schema with roughly 10 tables which implements fast
person lookup, "Names List Matching" (NLM), using Oracle Text. The
SQL statement is generated dynamically, because some implementation
details cannot yet be expressed statically or as a template using
bind variables (see the sketch after this list).
- Initially the lookup/search conditions spanned only two tables;
_later_, as customer requirements grew, it became necessary to
search over or display columns located in many other tables,
creating query joins in the region of 6-8 tables. By that point an
ETL process and customer installations already existed, and these
would obviously break if the initial schema were changed in order
to improve performance.
- Currently, depending on the selectivity of the (fuzzy) name
search and the selectivity of the index used when exact/hard match
conditions are specified, a query takes from roughly 300
milliseconds to 10 seconds, also depending on hardware etc. Some
parameters have a very noticeable impact on performance, e.g.
CURSOR_SHARING, which I usually set to FORCE.
- This Names Lists database is not read-only but editable and
transactional: several users or batch processes may be searching
over it at the same time as other users are editing the Names Lists
and, in the worst case, an ETL process is running.
- This database application must be portable and must run stably
and behave consistently across many different platforms: Windows
(W2K/XP), Unix (Solaris, HP-UX), Linux, etc.
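To make the current approach concrete, here is a minimal sketch of
the kind of dynamically generated statement involved. The table and
column names (PERSONS, ADDRESSES, FULL_NAME, :country) are made-up
placeholders, not our actual schema; the real statement joins 6-8
tables as described above.

  -- CURSOR_SHARING=FORCE (mentioned above) rewrites literals as
  -- binds, which matters because the statement text is generated
  -- dynamically.
  ALTER SESSION SET CURSOR_SHARING = FORCE;

  -- Fuzzy name lookup via Oracle Text (assumes a CONTEXT index on
  -- FULL_NAME), joined to further tables for the extra display and
  -- exact-match columns the customers asked for.
  SELECT p.person_id,
         p.full_name,
         a.country_code,
         SCORE(1) AS match_score
    FROM persons   p
    JOIN addresses a ON a.person_id = p.person_id
   WHERE CONTAINS(p.full_name, 'FUZZY(smith, 60, 100, weight)', 1) > 0
     AND a.country_code = :country
   ORDER BY SCORE(1) DESC;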
Management is not happy with this implementation and has asked a
"third party" to come up with a better approach. They are not
currently considering improving the existing solution (I would like
to change that). The main reason for their unhappiness is
performance: they want the generated queries to run consistently
10-200x faster than they currently do. The "third party" came up
with this approach:
- Keep the database schema exactly as it currently is.
- Stop using Oracle Text and implement fuzzy matching themselves
in C/C++; they immediately claimed that C/C++ = faster.
- Continue using the same overall approach for the product, but
have the dynamically generated queries call the C/C++-based
functions from SQL instead, replacing the Oracle Text functionality
(roughly as sketched below).
- They claimed Oracle Text had bugs and needed patches (I tracked
down the three bugs, which are now fixed). The bugs produced some
strange results and were related to the optimizer and Oracle Text;
again, these are now fixed.
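For reference, calling their C/C++ code from SQL would mean an
external procedure call specification roughly along these lines.
This is only a sketch of the mechanism; the library path and the
names nlm_lib / nlm_fuzzy_score are invented for illustration:

  -- Shared library holding the proposed C/C++ matching code.
  CREATE OR REPLACE LIBRARY nlm_lib AS '/opt/nlm/lib/libnlm.so';
  /

  -- PL/SQL call specification so SQL can invoke the C function.
  CREATE OR REPLACE FUNCTION nlm_fuzzy_score (
    p_candidate IN VARCHAR2,
    p_query     IN VARCHAR2
  ) RETURN NUMBER
  AS LANGUAGE C
     LIBRARY nlm_lib
     NAME "nlm_fuzzy_score"
     PARAMETERS (p_candidate STRING, p_query STRING, RETURN DOUBLE);
  /

  -- The dynamically generated query would then filter row by row,
  -- each call going out to the external procedure agent (extproc):
  SELECT p.person_id, p.full_name
    FROM persons p
   WHERE nlm_fuzzy_score(p.full_name, :search_name) >= 60;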
This is how I see it, and these are my arguments against the third
party's proposed solution:
- Implementing the fuzzy matching algorithm in C/C++ is, from my
point of view, simply insane, for these reasons:
- As they plan to generate maximally optimized C/C++ code, this
might not work consistently across platforms. From my C++
experience I know that heavily optimized C/C++ may behave
differently from platform to platform or, worse, that different
versions will have to be maintained in parallel for different
platforms; the likelihood of this increases with the number of
disparate platforms and the different compilers they will have to
use, e.g. on W2K vs. Unix.
- Since C/C++, unlike Java or PL/SQL, is not (memory) managed, the
programmers are not protected and are free to make mistakes that
access and corrupt arbitrary process memory. Chances are they could
not only cause an access violation in the Oracle session process
running the SQL statement that invokes the C/C++ subprograms but,
depending on the OS, I think they could even bring down the
complete Oracle server.
- This solution will not scale. Any performance gains (which I
doubt) from simply using C/C++ will be outweighed by the fact that
the external C/C++ subprograms invoked from SQL will not run in
parallel and will make it harder for the Oracle optimizer to work
out the best execution plans.
- As I explained above, the application is transactional, so (given
that they plan to query their own proprietary C/C++ indexes) they
will have to work out when the data changes, or maintain the Names
List data themselves. In the best case they will end up using
Pro*C/C++, for which a quick search on Metalink turned up more than
160 documented bugs, compared to Oracle Text, where the three bugs
affecting us are already fixed.
- If they plan to do everything themselves they will have to deal
with concurrency, scalability, transactions and portability, which
would be very hard to achieve in C/C++. I can immediately think of
constraints such as multi-threading, which is completely OS
dependent, with different APIs on different platforms; I have never
heard of a multi-threading library in the C/C++ standard library,
not to mention the prospect of pulling in disparate open-source
projects.
- Implementing the solution in C/C++ will take ages of development
time compared to the effort required to improve the existing one.
Implementing the solution in Java would be a valid alternative,
though. Java can be as fast as C++ depending on how it is used,
e.g. by limiting object-oriented overhead and function calls; in
addition, Oracle's native compilation of Java classes can make Java
run as fast as heavily optimized C/C++ (a sketch of how such a Java
stored procedure would be wired up follows this list).
http://www.javaworld.com/javaworld/jw-02-1998/jw-02-jperf_p.html
- Additionally, it will bring many unwelcome additions to the
existing infrastructure: different C/C++ compilers, third-party
libraries and potentially even some open-source components, with
exponentially increased complexity in configuration and
installation.
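For completeness, a minimal sketch of what the Java alternative
inside the database could look like: a static Java method loaded
into Oracle (here via CREATE JAVA SOURCE, equally possible with
loadjava), published through a PL/SQL call specification, and
optionally compiled natively with ncomp. The names NlmMatch and
nlm_fuzzy_score_java are illustrative only, and the scoring body is
a placeholder, not a real fuzzy algorithm:

  -- Load and compile a Java class inside the database.
  CREATE OR REPLACE AND COMPILE JAVA SOURCE NAMED "NlmMatch" AS
  public class NlmMatch {
      // Placeholder scoring routine; the real fuzzy logic goes here.
      public static double fuzzyScore(String candidate, String query) {
          if (candidate == null || query == null) return 0;
          return candidate.equalsIgnoreCase(query) ? 100 : 0;
      }
  }
  /

  -- Call specification so the dynamically generated SQL can use it.
  CREATE OR REPLACE FUNCTION nlm_fuzzy_score_java (
    p_candidate IN VARCHAR2,
    p_query     IN VARCHAR2
  ) RETURN NUMBER
  AS LANGUAGE JAVA
     NAME 'NlmMatch.fuzzyScore(java.lang.String, java.lang.String) return double';
  /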
Thank you very much,
Best Regards,
Giovanni