Re: Fastest way to count exact number of rows in a very large table

From: Andy Sayer <andysayer_at_gmail.com>
Date: Mon, 5 Oct 2020 22:33:16 +0100
Message-ID: <CACj1VR6x9uZBJZXbToLJcEuVmTF5us9sR+CML8NNvj=FNq7OVw_at_mail.gmail.com>



There’s no point doing it at all :)
If the XTTS succeeds and you don’t end up with a physically corrupt file, all the rows that were in the table before you made the tablespace read only will still be there.

BTW, I have strong doubts that just sticking a query inside a pipelined function will make it any faster, assuming you aren't writing to the table at the same time (which would be a silly thing to do if the goal is to get the same count as on a different system). /*+ parallel */ is valid syntax for running a query in parallel. I'd guess that this specific code would be much slower, as it opens a parallel cursor to get the partition names from some table and needs that cursor open before it can query the actual table - you'll be wasting parallel threads doing nothing.
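A plain parallel count needs none of that scaffolding. A minimal sketch, assuming the table name from later in the thread and an illustrative degree of 16 (size it to your host):

```sql
-- Plain parallel count, no PL/SQL wrapper; the optimizer splits the
-- scan across parallel query slaves by itself.
SELECT /*+ parallel(t 16) */ count(*)
  FROM test_data t;
```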

Mladen, RE: APPROX_FOR_COUNT_DISTINCT, it reduces the CPU impact for count distinct, but you don't have that requirement for a standard count - there's no need to decide whether you've seen the value before. It is incredibly fast (and highly accurate) when compared to count distinct, but a standard count(*) would be quicker (and here the main time taken would probably be from IO).
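To illustrate the distinction, a sketch (UT_ID is one of the key columns mentioned later in the thread; whether it is the right column to count distinct values of is an assumption):

```sql
-- APPROX_COUNT_DISTINCT avoids tracking which values have already been
-- seen, which is what makes an exact distinct count expensive.
SELECT APPROX_COUNT_DISTINCT(ut_id) FROM test_data;  -- fast, approximate

-- A plain count(*) never had that bookkeeping cost to begin with;
-- its runtime is dominated by IO.
SELECT count(*) FROM test_data;                      -- exact
```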

Thanks,
Andy

On Mon, 5 Oct 2020 at 22:00, Reen, Elizabeth <elizabeth.reen_at_citi.com> wrote:

> I assumed that it would be done in parallel. Agreed that
> there was no sense doing it serially.
>
>
>
>
>
> Liz
>
> *From:* [External] ahmed.fikri at t-online.de <ahmed.fikri_at_t-online.de>
> *Sent:* Monday, October 5, 2020 4:35 PM
> *To:* gogala.mladen_at_gmail.com; Reen, Elizabeth [ICG-IT]; Ashoke Mandal
> *Cc:* andysayer_at_gmail.com; list, oracle
> *Subject:* AW: Fastest way to count exact number of rows in a very large
> table
>
>
>
> If you are just interested in getting the exact number of rows in a large
> partitioned table, then the fastest way (in my opinion) is to use PL/SQL. Looping
> over the partitions serially will not help (it is even slower than select
> count(*) from huge_table). You have to create a pipelined function and use
> the parallel_enable feature.
>
> Pseudo code (it only gives the idea; if you adjust it you can reduce the time
> significantly):
>
> create or replace package pkg_test as
>
>   type xxx IS RECORD (column_value VARCHAR2(200));
>
>   type t_parallel_test_ref_cursor IS REF CURSOR RETURN xxx;
>
>   type t_num_list IS TABLE OF NUMBER;
>
>   function fu_count(p_cursor t_parallel_test_ref_cursor) RETURN t_num_list
>     PIPELINED PARALLEL_ENABLE(PARTITION p_cursor BY HASH(column_value));
>
> end;
> /
>
> create or replace package body pkg_test as
>
>   function fu_count_part(p_part_name VARCHAR2) RETURN NUMBER IS
>     v_count NUMBER := 0;
>   BEGIN
>     -- you can try using bind variables here to avoid hard parsing (just
>     -- make sure that partition pruning still works)
>     EXECUTE IMMEDIATE 'select count(*) from huge_table partition (' ||
>       p_part_name || ')' INTO v_count;
>     RETURN v_count;
>   EXCEPTION
>     WHEN OTHERS THEN
>       RETURN 0;
>   END fu_count_part;
>
>   function fu_count(p_cursor t_parallel_test_ref_cursor) RETURN t_num_list
>     PIPELINED PARALLEL_ENABLE(PARTITION p_cursor BY HASH(column_value)) IS
>     x VARCHAR2(200);
>   BEGIN
>     LOOP
>       FETCH p_cursor INTO x;
>       EXIT WHEN p_cursor%NOTFOUND;
>       PIPE ROW (fu_count_part(x));
>     END LOOP;
>     RETURN;
>   END fu_count;
>
> end pkg_test;
> /
>
>
>
> And then this will return the total count (sum the per-partition counts;
> counting the piped rows would only count the partitions):
>
> SELECT SUM(column_value)
>   FROM TABLE(pkg_test.fu_count(CURSOR(
>          SELECT /*+ parallel(t 16) */ partition_name COLUMN_VALUE
>            FROM user_tab_partitions t
>           WHERE table_name = 'HUGE_TABLE')));
>
>
>
> If you really want to cut the query time to less than 10 minutes, just try
> getting the code above working.
>
>
>
> Best regards
>
> Ahmed
>
>
>
>
>
> -----Original-Nachricht-----
>
> Betreff: Re: Fastest way to count exact number of rows in a very large
> table
>
> Datum: 2020-10-05T21:58:55+0200
>
> Von: "Mladen Gogala" <gogala.mladen_at_gmail.com>
>
> An: "elizabeth.reen_at_citi.com" <elizabeth.reen_at_citi.com>, "Ashoke Mandal" <
> ramukam1983_at_gmail.com>
>
>
> You can also try the following:
>
> SELECT /*+ PARALLEL(16) */ 100000*count(*) FROM test_data SAMPLE (0.001);
>
> That would give an almost correct count of the rows in the table. Accuracy
> would be similar to SELECT NUM_ROWS FROM USER_TABLES WHERE
> TABLE_NAME=<TABLE_NAME>;
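>
> For comparison, the optimizer's own estimate can be read straight from the
> dictionary. A sketch, assuming the table name from this thread; accuracy
> depends on when statistics were last gathered:
>
> ```sql
> -- NUM_ROWS is only as fresh as the last DBMS_STATS gather, so check
> -- LAST_ANALYZED alongside it.
> SELECT table_name, num_rows, last_analyzed
>   FROM user_tables
>  WHERE table_name = 'TEST_DATA';
> ```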
>
>
>
> BTW, has anyone played with APPROX_FOR_COUNT_DISTINCT parameter in 19c?
>
> Regards
>
>
>
> On Mon, 2020-10-05 at 19:40 +0000, Reen, Elizabeth wrote:
>
> Since it is partitioned, why don’t you just count each
> partition separately? Have you run stats on the old partitions? Assuming
> that prior years do not change, you should be able to get a count there.
> If you do an import, then you can get the number of rows from the log. A
> transportable tablespace will not mount if there is something wrong.
>
>
>
> Liz
>
>
>
> Liz Reen
> *CPB Database Administration*
>
>
>
> Dev Servers: Oracle | OraaS | Golden Gate | Sybase | MS-SQL |
> MSaaS | Neo4j | HBase - Prod Servers: Golden Gate | Neo4j |
> HBase - Team Email: CPB Database - Hotline: 1 718 248 1416
>
>
>
> ServiceNow For: Oracle | OraaS | Golden Gate: ICG NA EX US CPB ORACLE
> APPLICATION DBAS - For Sybase | MS-SQL | Neo4j | HBase: ICG NA EX CPB
> SYBASE APPLICATION DBAS
>
>
>
>
>
> *From:* [External] Ashoke Mandal <ramukam1983_at_gmail.com>
> *Sent:* Monday, October 5, 2020 3:14 PM
> *To:* Reen, Elizabeth [ICG-IT]
> *Cc:* andysayer_at_gmail.com; ahmed.fikri_at_t-online.de; list, oracle
> *Subject:* Re: Fastest way to count exact number of rows in a very large
> table
>
>
>
> Hello Ahmed/Andy/Jackson/Mark/Gogala/Liz, Thanks for your response to my
> posting. Even though my question was how to improve the query time of a
> query to check row count in a big table, you have also brought up many good
> points related to cross platform migration.
>
>
>
> Here is some information regarding my table:
>
> We have used transportable tablespace for data migration.
>
> The primary key consists of three columns (UT_ID, UT_SEQ, TEST_DATE).
>
> This table is partitioned by date and has one partition for every month.
> So, 12 partitions for every calendar year and has data for the last 20
> years.
>
> After adding a parallel hint as described below the query time went down
> from 2.2 hours to 42 min. *Let me know if you have any more
> recommendations to improve the query time for this select statement.*
>
>
>
> select /*+ parallel */ to_char(count(*), '999,999,999,999') from
> test_data;
>
>
>
> Ashoke
>
>
>
> On Mon, Oct 5, 2020 at 10:40 AM Reen, Elizabeth <elizabeth.reen_at_citi.com>
> wrote:
>
> We just completed such a transition. We kept the Oracle
> version the same so we could see the impact of Linux. Transportable
> tablespaces was how we did it. We were able to move a 17 terabyte database
> in under 10 hours.
>
>
>
>
>
> Liz
>
>
>
>
>
> *From:* [External] oracle-l-bounce_at_freelists.org <
> oracle-l-bounce_at_freelists.org> *On Behalf Of *[External] Andy Sayer
> *Sent:* Friday, October 2, 2020 3:09 PM
> *To:* ahmed.fikri_at_t-online.de
> *Cc:* list, oracle; ramukam1983_at_gmail.com
> *Subject:* Re: Fastest way to count exact number of rows in a very large
> table
>
>
>
> Just because a table has the same number of rows, it doesn't mean it has
> the same data. With 108 billion rows, your data is going to be changing
> quickly; in order to get accurate counts at the right point in time, you're
> going to end up keeping your application offline for a window before and
> after your migration.
>
>
>
> What you need to do is determine where you expect data to go missing and
> work out a way to check.
>
>
>
> This will depend on how you're doing your migration. I would suggest you
> use Cross-Platform Transportable Tablespaces (Doc ID 371556.1), as that
> would allow you to do a physical import and just convert the files to the
> right endianness. This starts by making sure all data has been written to
> your data files (so they can be made read only on the source system). As you're
> working with the physical data files rather than the logical data (rows in
> tables), the only way you're going to lose rows is by corrupting your
> files. You can check for corruption using RMAN once you've imported the
> converted files. No need to count all your rows, and no need to hope that
> that's all you need to compare.
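>
> The corruption check after import can be sketched as follows (an RMAN
> session on the destination; the tablespace name is an assumption):
>
> ```sql
> -- Reads every block of the tablespace's data files and reports both
> -- physical and, with CHECK LOGICAL, intra-block logical corruption.
> RMAN> VALIDATE TABLESPACE my_ts CHECK LOGICAL;
> ```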
>
>
>
> Hope that helps,
>
> Andy
>
>
>
>
>
>
>
> On Fri, 2 Oct 2020 at 19:38, ahmed.fikri_at_t-online.de <
> ahmed.fikri_at_t-online.de> wrote:
>
> Hi Ashoke,
>
>
>
>
>
>
>
> could you send the execution plan of the query too? I think there is no
> general approach for that; it depends on several factors: whether the table
> has indexes (normal/bitmap) and, if it does, the size of the table compared
> to the existing indexes... But generally, parallel processing should help.
>
>
>
>
>
>
>
> Best regards
>
>
>
> Ahmed
>
>
>
> -----Original-Nachricht-----
>
>
>
> Betreff: Fastest way to count exact number of rows in a very large table
>
>
>
> Datum: 2020-10-02T19:45:19+0200
>
>
>
> Von: "Ashoke Mandal" <ramukam1983_at_gmail.com>
>
>
>
> An: "ORACLE-L" <oracle-l_at_freelists.org>
>
>
> Dear All,
>
> I have a table with 108 billion rows and migrating this database from
> Oracle 11g on Solaris to Oracle 12c on Linux.
>
>
>
>
>
>
>
> After the migration I need to compare the row count of this table in both
> the source DB and the destination DB. It takes almost two hours to get the
> row count from this table.
>
>
>
> SQL> select to_char(count(*), '999,999,999,999') from test_data;
>
> TO_CHAR(COUNT(*)
> ----------------
> 108,424,262,144
> Elapsed: 02:22:46.18
>
>
> Could you please suggest some tips to get the row count faster so that it
> reduces the cut-over downtime.
>
> Thanks,
>
>
>
> Ashoke
>
>
> --
>
> Mladen Gogala
> *Database Consultant *
> * Tel: (347) 321-1217*
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Oct 05 2020 - 23:33:16 CEST
