Hello all,
My company is at a very critical architectural decision point
regarding the future of a product, and the decision revolves
entirely around Oracle. I have several ideas on which I would like
your feedback, since they touch on many disparate issues regarding
operating systems and technologies connected to Oracle, and testing
them all myself before coming to a conclusion would presumably take
quite some time.
This is the scenario:
The existing product description and requirements are as follows:
- There is a schema with roughly 10 tables which implements fast
person lookup, "Names List Matching" (NLM), using Oracle Text. The
SQL statement is generated dynamically, because some implementation
details cannot yet be expressed statically or as a template using
bind variables (see the sketch after this list).
- Initially the lookup/search conditions spanned only two tables;
_later_, as customer requirements grew, it became necessary to
search over or display columns located in many other tables,
creating query joins in the region of 6-8 tables. By that point an
ETL process and customer installations already existed, and these
would obviously break if the initial schema were changed in order
to improve performance.
- Currently, depending on the selectivity of the (fuzzy) name
search and the selectivity of the index used when exact/hard match
conditions are specified, a query takes from roughly 300
milliseconds to 10 seconds, also depending on hardware etc. Some
parameters have a very noticeable impact on performance, e.g.
CURSOR_SHARING, which I usually set to FORCE.
- This Names Lists database is not read-only but editable and
transactional: several users or batch processes may be searching
over it at the same time as other users are editing the Names Lists
and, in the worst case, an ETL process is running.
- This database application must be portable and must run stably
and behave consistently across many different platforms: Windows
(W2K/XP), Unix (Solaris, HP-UX), Linux, etc.
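To make the current approach concrete, here is a minimal sketch of
the kind of dynamically generated statement involved. The table and
column names (PERSONS, ADDRESSES, FULL_NAME, :country) are made-up
placeholders, not our actual schema; the real statement joins 6-8
tables as described above.

  -- CURSOR_SHARING=FORCE (mentioned above) rewrites literals as
  -- binds, which matters because the statement text is generated
  -- dynamically.
  ALTER SESSION SET CURSOR_SHARING = FORCE;

  -- Fuzzy name lookup via Oracle Text (assumes a CONTEXT index on
  -- FULL_NAME), joined to further tables for the extra display and
  -- exact-match columns the customers asked for.
  SELECT p.person_id,
         p.full_name,
         a.country_code,
         SCORE(1) AS match_score
    FROM persons   p
    JOIN addresses a ON a.person_id = p.person_id
   WHERE CONTAINS(p.full_name, 'FUZZY(smith, 60, 100, weight)', 1) > 0
     AND a.country_code = :country
   ORDER BY SCORE(1) DESC;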
Management is not happy with this implementation and has asked a
"third party" to come up with a better approach. They are not
currently considering improving the existing solution (I would like
to change that). The main reason for their unhappiness is
performance: they want the generated queries to run consistently
10-200x faster than they currently do. The "third party" came up
with this approach:
- Keep the database schema exactly as it currently is.
- Stop using Oracle Text and implement fuzzy matching themselves
in C/C++; they immediately claimed that C/C++ = faster.
- Continue using the same overall approach for the product, but
have the dynamically generated queries call the C/C++-based
functions from SQL instead, replacing the Oracle Text functionality
(roughly as sketched below).
- They claimed Oracle Text had bugs and needed patches (I tracked
down the three bugs, which are now fixed). The bugs produced some
strange results and were related to the optimizer and Oracle Text;
again, these are now fixed.
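For reference, calling their C/C++ code from SQL would mean an
external procedure call specification roughly along these lines.
This is only a sketch of the mechanism; the library path and the
names nlm_lib / nlm_fuzzy_score are invented for illustration:

  -- Shared library holding the proposed C/C++ matching code.
  CREATE OR REPLACE LIBRARY nlm_lib AS '/opt/nlm/lib/libnlm.so';
  /

  -- PL/SQL call specification so SQL can invoke the C function.
  CREATE OR REPLACE FUNCTION nlm_fuzzy_score (
    p_candidate IN VARCHAR2,
    p_query     IN VARCHAR2
  ) RETURN NUMBER
  AS LANGUAGE C
     LIBRARY nlm_lib
     NAME "nlm_fuzzy_score"
     PARAMETERS (p_candidate STRING, p_query STRING, RETURN DOUBLE);
  /

  -- The dynamically generated query would then filter row by row,
  -- each call going out to the external procedure agent (extproc):
  SELECT p.person_id, p.full_name
    FROM persons p
   WHERE nlm_fuzzy_score(p.full_name, :search_name) >= 60;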
This is how I see it, and these are my arguments against the third
party's proposed solution:
- Implementing the fuzzy matching algorithm in C/C++ is, from my
point of view, simply insane, for these reasons:
- As they plan to generate maximally optimized C/C++ code, this
might not work consistently across platforms. From my C++
experience I know that heavily optimized C/C++ may behave
differently from platform to platform or, worse, that different
versions will have to be maintained in parallel for different
platforms; the likelihood of this increases with the number of
disparate platforms and the different compilers they will have to
use, e.g. on W2K vs. Unix.
- Since C/C++, unlike Java or PL/SQL, is not (memory) managed, the
programmers are not protected and are free to make mistakes that
access and corrupt arbitrary process memory. Chances are they could
not only cause an access violation in the Oracle session process
running the SQL statement that invokes the C/C++ subprograms but,
depending on the OS, I think they could even bring down the
complete Oracle server.
- This solution will not scale. Any performance gains (which I
doubt) from simply using C/C++ will be outweighed by the fact that
the external C/C++ subprograms invoked from SQL will not run in
parallel and will make it harder for the Oracle optimizer to work
out the best execution plans.
- As I explained above, the application is transactional, so (given
that they plan to query their own proprietary C/C++ indexes) they
will have to work out when the data changes, or maintain the Names
List data themselves. In the best case they will end up using
Pro*C/C++, for which a quick search on Metalink turned up more than
160 documented bugs, compared to Oracle Text, where the three bugs
affecting us are already fixed.
- If they plan to do everything themselves they will have to deal
with concurrency, scalability, transactions and portability, which
would be very hard to achieve in C/C++. I can immediately think of
constraints such as multi-threading, which is completely OS
dependent, with different APIs on different platforms; I have never
heard of a multi-threading library in the C/C++ standard library,
not to mention the prospect of pulling in disparate open-source
projects.
- Implementing the solution in C/C++ will take ages of development
time compared to the effort required to improve the existing one.
Implementing the solution in Java would be a valid alternative,
though. Java can be as fast as C++ depending on how it is used,
e.g. by limiting object-oriented overhead and function calls; in
addition, Oracle's native compilation of Java classes can make Java
run as fast as heavily optimized C/C++ (a sketch of how such a Java
stored procedure would be wired up follows this list).
http://www.javaworld.com/javaworld/jw-02-1998/jw-02-jperf_p.html
- Additionally, it will bring many unwelcome additions to the
existing infrastructure: different C/C++ compilers, third-party
libraries and potentially even some open-source components, with
exponentially increased complexity in configuration and
installation.
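For completeness, a minimal sketch of what the Java alternative
inside the database could look like: a static Java method loaded
into Oracle (here via CREATE JAVA SOURCE, equally possible with
loadjava), published through a PL/SQL call specification, and
optionally compiled natively with ncomp. The names NlmMatch and
nlm_fuzzy_score_java are illustrative only, and the scoring body is
a placeholder, not a real fuzzy algorithm:

  -- Load and compile a Java class inside the database.
  CREATE OR REPLACE AND COMPILE JAVA SOURCE NAMED "NlmMatch" AS
  public class NlmMatch {
      // Placeholder scoring routine; the real fuzzy logic goes here.
      public static double fuzzyScore(String candidate, String query) {
          if (candidate == null || query == null) return 0;
          return candidate.equalsIgnoreCase(query) ? 100 : 0;
      }
  }
  /

  -- Call specification so the dynamically generated SQL can use it.
  CREATE OR REPLACE FUNCTION nlm_fuzzy_score_java (
    p_candidate IN VARCHAR2,
    p_query     IN VARCHAR2
  ) RETURN NUMBER
  AS LANGUAGE JAVA
     NAME 'NlmMatch.fuzzyScore(java.lang.String, java.lang.String) return double';
  /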
Thank you very much,
Best Regards,
Giovanni