
Pythian Group

Love Your Data

What is Big Data and Do You Really Need it?

Wed, 2015-12-16 12:09

Enhancing efficiency and cost-effectiveness for any company on a data-driven path is imperative to survival in the modern business world. For those who are willing to push the envelope and evolve from a defensive operation into a more transformative competitor, understanding what Big Data is, and if you really need it, is essential.

In the following on-demand webinar, Pythian CTO Alex Gorbachev provides an in-depth guide to help you better understand what exactly big data is, its benefits and use cases, and how to determine whether or not your company needs to implement a big data project.

This webinar is essential if you’re planning or thinking about a big data initiative in 2016, or simply trying to answer the question, “What is big data?” Download this webinar to watch now!

Categories: DBA Blogs

Log Buffer #453: A Carnival of the Vanities for DBAs

Fri, 2015-12-11 08:38

This week, the Log Buffer Edition digs deep into the world of Oracle, SQL Server and MySQL and brings you some of the best blog posts around.

Oracle:

  • Regardless of what type of industry or business you are involved in, the bottom-line goal is to optimize sales; and that involves replacing any archaic tech processes with cutting-edge technology and substituting any existing chaos with results-driven clarity.
  • Oracle Private Cloud Appliance 2.1.1 Released.
  • Every version of the optimizer enhances existing mechanisms and introduces new features, while 12c has introduced some of the most sophisticated transformation to date.
  • PLSQL, syslog and the story of Bolas spiders.
  • Here is why you need to be super careful when using LOBs within triggers.

SQL Server:

  • Phil Factor talks about late in the day for a DBA.
  • This article details SMKs, DMKs and certificates in SQL Server as they relate to Transparent Data Encryption and Encrypted Backups.
  • In traditional relational schema there can be a lot of one-to-many relationships (e.g. Person may have several phones, or several email addresses).
  • Building Apps for Windows 10 with Visual Studio 2015.
  • The GA of System Center Configuration Manager 1511.

MySQL:

  • rows_examined_per_scan, rows_produced_per_join: EXPLAIN FORMAT=JSON answers on question “What number of filtered rows mean?”.
  • Secure communications is a core component of a robust security policy, and MySQL Server 5.7.10 – the first maintenance release of MySQL Server 5.7 – introduces needed improvements in this area.
  • MariaDB 5.5.47 and updated connectors now available.
  • Google Cloud SQL is a fully managed database service that makes it easy to set-up, maintain, manage, and administer your relational MySQL databases in the cloud. Cloud SQL allows you to focus on your applications rather than administering your databases.
  • A long time ago, libmysqlclient came in two versions: one that was thread safe and one that wasn’t. But that was a long time ago. Since MySQL 5.5, the thread safe libmysqlclient_r library has just been a symlink to the libmysqlclient library, which has been thread safe at least since then.

 

Learn more about Pythian’s expertise in Oracle, SQL Server & MySQL.

Categories: DBA Blogs

How to Decipher Oracle Internal Datatype Storage

Fri, 2015-12-11 08:31

What started out as an investigation into how the optimizer deals with predicates that are outside the known range of values became something else when I tried to determine just what Oracle believes the low and high values of the range to be.

I didn’t expect to have anything to add to the topic, as it has been rather well covered; I just wanted to better understand it by creating a few examples that demonstrate what can happen.

As yet, I have not gotten that far.

One of the first things I wanted to know for this is what Oracle believes the low and high values to be.

These can be seen in both DBA_TAB_COLUMNS and DBA_TAB_COL_STATISTICS in the LOW_VALUE and HIGH_VALUE columns.

The DBA_TAB_COL_STATISTICS view is preferred, as these columns are maintained in DBA_TAB_COLUMNS only for backward compatibility with Oracle 7.


SQL> desc dba_tab_col_statistics
 Name              Null?    Type
 ----------------- -------- ------------------------------------
 OWNER                      VARCHAR2(128)
 TABLE_NAME                 VARCHAR2(128)
 COLUMN_NAME                VARCHAR2(128)
 NUM_DISTINCT               NUMBER
 LOW_VALUE                  RAW(1000)
 HIGH_VALUE                 RAW(1000)
 DENSITY                    NUMBER
 NUM_NULLS                  NUMBER
 NUM_BUCKETS                NUMBER
 LAST_ANALYZED              DATE
 SAMPLE_SIZE                NUMBER
 GLOBAL_STATS               VARCHAR2(3)
 USER_STATS                 VARCHAR2(3)
 NOTES                      VARCHAR2(63)
 AVG_COL_LEN                NUMBER
 HISTOGRAM                  VARCHAR2(15)
 SCOPE                      VARCHAR2(7)

The LOW_VALUE and HIGH_VALUE values are stored as RAW, so they must be using Oracle’s internal storage format for whichever datatype the column consists of.

Oracle does supply conversion routines via the DBMS_STATS package.

These routines are deployed as procedures. As Oracle 12c allows functions to be defined within a SQL statement, these procedures can be used in queries written for a 12c database.
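For example, a small inline function can wrap the DBMS_STATS.CONVERT_RAW_VALUE procedure directly in a 12c query. This is only a minimal sketch, assuming a 12c or later database; the raw_to_num name is just one chosen here, and it covers only NUMBER columns:

with
  function raw_to_num(p_raw raw) return number is
    v_num number;
  begin
    -- CONVERT_RAW_VALUE writes the decoded value into its OUT parameter
    dbms_stats.convert_raw_value(p_raw, v_num);
    return v_num;
  end;
select column_name
   , raw_to_num(low_value) low_value
   , raw_to_num(high_value) high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
   and data_type = 'NUMBER'
/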

Using the DBMS_STATS conversion procedure in databases < 12c requires creating functions so that the values may be returned to a SQL statement. While that method will work, it is often not desirable, and may not even be possible, particularly in a production database.

When I say ‘may not even be possible’, I don’t mean that it cannot be done, but that doing so is probably not allowed in many databases.

To create a SQL statement that can show the high and low values, it will be necessary to use some other means.

Let’s start off by creating some data to work with.


define chars='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz'

create table low_high
as
select id
   , mod(id,128) n1
   , substr('&&chars',mod(id,42)+1, 20) c1
   , sysdate-(mod(id,1000)+1) d1
from (
   select level id from dual
   connect by level <= 128 * 1024
)
/

exec dbms_stats.gather_table_stats(ownname => user, tabname => 'LOW_HIGH', method_opt => 'for all columns size auto')

Now that we have a table, let's take a look at the ranges of values.
 Note: I am using the _TAB_COLUMNS views for some queries just to simplify the SQL for demonstration.
col low_value format a40
col high_value format a40

prompt
prompt NUMERIC
prompt

select column_name, low_value, high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
   and data_type = 'NUMBER'
/

prompt
prompt VARCHAR2
prompt

select column_name, low_value, high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
   and data_type = 'VARCHAR2'
/

prompt
prompt DATE
prompt

select column_name, low_value, high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
   and data_type = 'DATE'
/

NUMERIC

COLUMN LOW_VALUE                                HIGH_VALUE
------ ---------------------------------------- ----------------------------------------
ID     C102                                     C30E0B49
N1     80                                       C2021C

2 rows selected.

VARCHAR2

COLUMN LOW_VALUE                                HIGH_VALUE
------ ---------------------------------------- ----------------------------------------
C1     303132333435363738396162636465666768696A 666768696A6B6C6D6E6F70717273747576777879

1 row selected.

DATE

COLUMN LOW_VALUE                                HIGH_VALUE
------ ---------------------------------------- ----------------------------------------
D1     7871030D121C04                           78730C07121C04

1 row selected.

Clearly the values being stored for LOW_VALUE and HIGH_VALUE are of little use to us in their current format.

What can we do?

For the NUMBER and character data types (VARCHAR2, VARCHAR, CHAR) the package UTL_RAW can be used to get the actual values.

Here is an example of converting some of these to a human readable format.


col low_value format 999999999999
col high_value format 999999999999

select  column_name
   , utl_raw.cast_to_number(low_value) low_value
   , utl_raw.cast_to_number(high_value) high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
   and data_type = 'NUMBER'
/

col low_value format a20
col high_value format a20

select  column_name
   , utl_raw.cast_to_varchar2(low_value) low_value
   , utl_raw.cast_to_varchar2(high_value) high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
   and data_type = 'VARCHAR2'
/

COLUMN                             LOW_VALUE    HIGH_VALUE
------------------------------ ------------- -------------
N1                                         0           127
ID                                         1        131072

2 rows selected.

COLUMN                         LOW_VALUE            HIGH_VALUE
------------------------------ -------------------- --------------------
C1                             0123456789abcdefghij fghijklmnopqrstuvwxy

1 row selected.

These values can be verified, as shown here with the N1 column:

SQL> select min(n1), max(n1) from low_high;

 MIN(N1) MAX(N1)
---------- ----------
 0 127

1 row selected.

So far I have done this only with these simple versions of these data types.
Variations such as NVARCHAR2, BINARY_FLOAT and others may require different handling.

What is missing? The DATE column has not yet been handled.

Converting the raw date format to a readable date is not so straightforward, as there does not seem to be any conversion function available for that (if you know of one, please write about it in the comments section of this article).

 

Oracle DATE Format

First it will be necessary to know how Oracle stores a date in the database. Oracle’s internal date format has been documented a number of times and is well known, such as in the following Oracle Support Note:

How does Oracle store the DATE datatype internally? (Doc ID 69028.1)

Oracle dates consist of seven parts: century, year, month of the year, day of the month, and the hours, minutes and seconds after midnight.

The internal representation of this format can be seen by running the script in Example 1.

 

Example 1: dumping the internal date format

alter session set nls_date_format = 'mm/dd/yyyy hh24:mi:ss';
col today format a40
drop table t1;

create table t1
as select sysdate today
from dual;

select to_char(today) today
from t1
union
select dump(today) today
from t1;

TODAY
----------------------------------------
12/09/2015 13:13:57
Typ=12 Len=7: 120,115,12,9,14,14,58

2 rows selected.

The hour, minute and second are all stored in excess-1 notation, so 1 must be subtracted from them to get the correct time. Using excess-1 notation prevents a zero byte from being stored.

The month and day are both stored with the actual value, which can be seen in the SELECT output.

The values for the century and year are stored in excess-100 notation.

This means that 100 must be subtracted from the value before using it.

In the case of the date in Example 1, the year is seen by subtracting 100 from the year byte (115), giving 15.
The century is somewhat different. Not only must 100 be subtracted from the value (120), the result must then be multiplied by 100, giving 2000.

The following example demonstrates how the components of a date can be extracted from the information returned by the dump() function.

col cyear format 9999
col month format a2
col day format a2
col hour format 99
col minute format 99
col second format 99

select
        -- extract the century and year information from the
        -- internal date format
        -- century = (century byte -100) * 100
        (
                to_number(
                        -- parse out integer appearing before first comma
                        substr( startup_dump, 1, instr(startup_dump,',')-1) - 100
                ) * 100
        )
        +
        -- year = year byte - 100
        (
                to_number(
                        substr(
                                startup_dump,
                                -- get position of 2nd comma
                                instr(startup_dump,',',2)+1,
                                -- get position of 2nd comma - position of 1st comma
                                instr(startup_dump,',',1,2) - instr(startup_dump,',',1,1) -1
                        )
                )
                - 100
        ) cyear
         , substr(
            startup_dump,
            instr(startup_dump,',',1,2)+1,
            instr(startup_dump,',',1,3) - instr(startup_dump,',',1,2) -1
         ) month
         , substr(
            startup_dump,
            instr(startup_dump,',',1,3)+1,
            instr(startup_dump,',',1,4) - instr(startup_dump,',',1,3) -1
         ) day
         , to_number(substr(
            startup_dump,
            instr(startup_dump,',',1,4)+1,
            instr(startup_dump,',',1,5) - instr(startup_dump,',',1,4) -1
         ))-1 hour
         , to_number(substr(
            startup_dump,
            instr(startup_dump,',',1,5)+1,
            instr(startup_dump,',',1,6) - instr(startup_dump,',',1,5) -1
         ))-1 minute
         , to_number(substr(
            startup_dump,
            instr(startup_dump,',',1,6)+1
         ))-1 second
from (
        -- return just the date bytes from the dump()
        select substr(dump(startup_time),15) startup_dump
        from v$instance
) a

SQL> /

CYEAR MO DA HOUR MINUTE SECOND
----- -- -- ---- ------ ------
 2015 11 18   17     33     32

1 row selected.

Note: the internal format for SYSDATE is not the same as dates stored in a table.
This is also true for TIMESTAMP and SYSTIMESTAMP.

The internal format for TIMESTAMP columns can be seen in this OraFaq Article.
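To see the difference for yourself, dump a stored DATE column and SYSDATE side by side; the stored column should report Typ=12 Len=7, while SYSDATE typically reports Typ=13 Len=8. A quick sketch using the T1 table from Example 1:

-- stored DATE column: internal type 12, 7 bytes
select dump(today) stored_dump from t1;

-- SYSDATE: returned as internal type 13, 8 bytes
select dump(sysdate) sysdate_dump from dual;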

 

Putting it All Together

So, now we can make use of this to examine the values Oracle stores to bind the ranges of columns, this time including the DATE columns.

col low_value format a20
col high_value format a20
col table_name format a10 head 'TABLE'
col data_type format a20
col column_name format a6 head 'COLUMN'

set linesize 200 trimspool on
set pagesize 60

select
   us.table_name,
   uc.data_type,
   us.column_name,
   case
      when uc.data_type in ('VARCHAR2','VARCHAR','CHAR')  then
         utl_raw.cast_to_varchar2(us.low_value)
      when uc.data_type = 'NUMBER' then
         to_char(utl_raw.cast_to_number(us.low_value) )
      when uc.data_type = 'DATE' then
         -- extract the century and year information from the
         -- internal date format
         -- century = (century byte -100) * 100
         to_char((
            to_number(
                  -- parse out integer appearing before first comma
                  substr( substr(dump(us.low_value),15), 1, instr(substr(dump(us.low_value),15),',')-1) - 100
            ) * 100
         )
         +
         -- year = year byte - 100
         (
            to_number(
                  substr(
                     substr(dump(us.low_value),15),
                     -- get position of 2nd comma
                     instr(substr(dump(us.low_value),15),',',2)+1,
                     -- get position of 2nd comma - position of 1st comma
                     instr(substr(dump(us.low_value),15),',',1,2) - instr(substr(dump(us.low_value),15),',',1,1) -1
                  )
            )
            - 100
         )) --current_year
                  || '-' ||
                  lpad(
                     substr(
                        substr(dump(us.low_value),15),
                        instr(substr(dump(us.low_value),15),',',1,2)+1,
                        instr(substr(dump(us.low_value),15),',',1,3) - instr(substr(dump(us.low_value),15),',',1,2) -1
                     ) -- month
                     ,2,'0'
                  )
                  ||  '-' ||
                  lpad(
                     substr(
                        substr(dump(us.low_value),15),
                        instr(substr(dump(us.low_value),15),',',1,3)+1,
                        instr(substr(dump(us.low_value),15),',',1,4) - instr(substr(dump(us.low_value),15),',',1,3) -1
                     ) -- day
                     ,2,'0'
                  )
                  || ' ' ||
                  lpad(
                     to_char(to_number(
                        substr(
                              substr(dump(us.low_value),15),
                              instr(substr(dump(us.low_value),15),',',1,4)+1,
                              instr(substr(dump(us.low_value),15),',',1,5) - instr(substr(dump(us.low_value),15),',',1,4) -1
                        )
                     )-1)
                     ,2,'0'
                  ) -- hour
                  || ':' ||
                  lpad(
                     to_char(
                        to_number(
                              substr(
                              substr(dump(us.low_value),15),
                              instr(substr(dump(us.low_value),15),',',1,5)+1,
                              instr(substr(dump(us.low_value),15),',',1,6) - instr(substr(dump(us.low_value),15),',',1,5) -1
                              )
                        )-1
                     )
                     ,2,'0'
                  ) -- minute
                  || ':' ||
                  lpad(
                     to_char(
                        to_number(
                              substr(
                              substr(dump(us.low_value),15),
                              instr(substr(dump(us.low_value),15),',',1,6)+1
                              )
                        )-1
                     )
                     ,2,'0'
                  ) --second
         else 'NOT SUPPORTED'
         end low_value,
         -- get the high value
   case
      when uc.data_type in ('VARCHAR2','VARCHAR','CHAR')  then
         utl_raw.cast_to_varchar2(us.high_value)
      when uc.data_type = 'NUMBER' then
         to_char(utl_raw.cast_to_number(us.high_value) )
      when uc.data_type = 'DATE' then
         -- extract the century and year information from the
         -- internal date format
         -- century = (century byte -100) * 100
         to_char((
            to_number(
                  -- parse out integer appearing before first comma
                  substr( substr(dump(us.high_value),15), 1, instr(substr(dump(us.high_value),15),',')-1) - 100
            ) * 100
         )
         +
         -- year = year byte - 100
         (
            to_number(
                  substr(
                     substr(dump(us.high_value),15),
                     -- get position of 2nd comma
                     instr(substr(dump(us.high_value),15),',',2)+1,
                     -- get position of 2nd comma - position of 1st comma
                     instr(substr(dump(us.high_value),15),',',1,2) - instr(substr(dump(us.high_value),15),',',1,1) -1
                  )
            )
            - 100
         )) --current_year
                  || '-' ||
                  lpad(
                     substr(
                        substr(dump(us.high_value),15),
                        instr(substr(dump(us.high_value),15),',',1,2)+1,
                        instr(substr(dump(us.high_value),15),',',1,3) - instr(substr(dump(us.high_value),15),',',1,2) -1
                     ) -- month
                     ,2,'0'
                  )
                  ||  '-' ||
                  lpad(
                     substr(
                        substr(dump(us.high_value),15),
                        instr(substr(dump(us.high_value),15),',',1,3)+1,
                        instr(substr(dump(us.high_value),15),',',1,4) - instr(substr(dump(us.high_value),15),',',1,3) -1
                     ) -- day
                     ,2,'0'
                  )
                  || ' ' ||
                  lpad(
                     to_char(to_number(
                        substr(
                              substr(dump(us.high_value),15),
                              instr(substr(dump(us.high_value),15),',',1,4)+1,
                              instr(substr(dump(us.high_value),15),',',1,5) - instr(substr(dump(us.high_value),15),',',1,4) -1
                        )
                     )-1)
                     ,2,'0'
                  ) -- hour
                  || ':' ||
                  lpad(
                     to_char(
                        to_number(
                              substr(
                              substr(dump(us.high_value),15),
                              instr(substr(dump(us.high_value),15),',',1,5)+1,
                              instr(substr(dump(us.high_value),15),',',1,6) - instr(substr(dump(us.high_value),15),',',1,5) -1
                              )
                        )-1
                     )
                     ,2,'0'
                  ) -- minute
                  || ':' ||
                  lpad(
                     to_char(
                        to_number(
                              substr(
                              substr(dump(us.high_value),15),
                              instr(substr(dump(us.high_value),15),',',1,6)+1
                              )
                        )-1
                     )
                     ,2,'0'
                  ) --second
         else 'NOT SUPPORTED'
         end high_value
from all_tab_col_statistics us
join all_tab_columns uc on uc.owner = us.owner
   and uc.table_name = us.table_name
   and uc.column_name = us.column_name
   and us.owner = USER
   and us.table_name = 'LOW_HIGH'
order by uc.column_id

SQL> /

TABLE      DATA_TYPE            COLUMN LOW_VALUE            HIGH_VALUE
---------- -------------------- ------ -------------------- --------------------
LOW_HIGH   NUMBER               ID     1                    131072
LOW_HIGH   NUMBER               N1     0                    127
LOW_HIGH   VARCHAR2             C1     0123456789abcdefghij fghijklmnopqrstuvwxy
LOW_HIGH   DATE                 D1     2013-03-13 17:27:03  2015-12-07 17:27:03

4 rows selected.

Verify the D1 column values

SQL>  select min(d1) min_d1, max(d1) max_d1 from low_high;

MIN_D1              MAX_D1
------------------- -------------------
2013-03-13 17:27:03 2015-12-07 17:27:03

1 row selected.

And there you have it. We can now see in human-readable form the low and high values that Oracle has stored for each column. While it is a rather complex SQL statement, it really is not difficult to understand once you know the purpose behind it. And the beauty of this script is that no functions or procedures need to be created to make use of it.

If you would like to add TIMESTAMP or any other datatype to the script, please do so!
The SQL can be found here in the Low-High GitHub repo.

Now that the values can be viewed, the next task will be to put the script to use by using some examples to see how Oracle handles predicates outside the known range of values.

 

Categories: DBA Blogs

Benchmarking Google Cloud SQL Instances

Thu, 2015-12-10 11:00

Google Cloud SQL is a fully managed database service that makes it easy to set-up, maintain, manage, and administer your relational MySQL databases in the cloud. Cloud SQL allows you to focus on your applications rather than administering your databases. Hosted on Google Cloud Platform, Cloud SQL provides a database infrastructure for applications running anywhere.

To evaluate the performance of Google’s Cloud SQL Instances, we ran the tpcc-mysql benchmarking utility with a scale factor of 500 warehouses. The purpose of running tpcc-mysql on each system was to determine the throughput of each instance under varying loads (number of connections or threads).

While the benchmark tests were being run, we took regular snapshots of the following data using MySQL Workbench.

  • Number of select transactions executed per second
  • Number of insert transactions executed per second
  • Number of update transactions executed per second
  • InnoDB Disk Reads – MB/s
  • InnoDB Disk Writes – MB/s
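These counters can also be sampled directly from the server. A minimal sketch (the status variables are cumulative, so take two samples and divide the delta by the interval to get per-second rates; Workbench reads the equivalent status variables):

SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Com_select', 'Com_insert', 'Com_update',
     'Innodb_data_read', 'Innodb_data_written');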
Testing Methodology

Following is the methodology used for performing the tpcc benchmarks:

    1. All the testing was done with the tpcc-mysql package, with a scale factor of 500 Warehouses.
    2. All testing was done using a Google Compute VM
      • Ubuntu 15.04 machine
      • Specifications: n1-standard-4 – 4 vCPU, 15 GB Memory
      • Zone: us-central1-a
    3. The following Cloud SQL instances were benchmarked:

Cloud SQL Instance   Cloud SQL Instance Type            Zone           Memory
D8                   Standard instances                 us-central1-a  4G
D16                  Standard instances                 us-central1-a  8G
D32                  Standard instances                 us-central1-a  16G
db-n1-standard-4     Performance instances – Standard   us-central1-a  15G
db-n1-standard-8     Performance instances – Standard   us-central1-a  30G
db-n1-standard-16    Performance instances – Standard   us-central1-a  60G
db-n1-highmem-2      Performance instances – Hi Memory  us-central1-a  13G
db-n1-highmem-4      Performance instances – Hi Memory  us-central1-a  26G
db-n1-highmem-8      Performance instances – Hi Memory  us-central1-a  52G
db-n1-highmem-16     Performance instances – Hi Memory  us-central1-a  104G
  1. We used 100, 200, 300 and 400 concurrent user sessions (threads).
  2. We used a scale factor of 500 warehouses, which roughly translates to 50 GB of data in the database.
  3. We performed four iterations of tests per instance by varying the number of connections (threads) per run. Tests were performed using 100, 200, 300 and 400 threads.
  4. For each set of user sessions, we performed a half-hour-long run, gathering data for new order transactions every 10 seconds.
  5. The following data was gathered:
    • Throughput expressed as tpm-C – Orders Processed per Minute
    • Number of select transactions executed per second
    • Number of insert transactions executed per second
    • Number of update transactions executed per second
    • InnoDB Disk Reads – MB/s
    • InnoDB Disk Writes – MB/s
Benchmarking Results – DML Transactions Per Minute

The following graph represents the transactions for each instance under varying loads for SELECT transactions.

  • For 100, 200 and 300 threads, db-n1-highmem-16 outperformed the rest by a large margin
  • For 400 threads, db-n1-highmem-8 and db-n1-highmem-16 performed similarly
  • db-n1-standard-8 and db-n1-standard-16 performed the same for all thread counts
  • The rest of the instances performed similarly

The following graph represents the transactions for each instance under varying loads for INSERT transactions.

  • For 100, 200 and 300 threads, db-n1-highmem-16 outperformed the rest by a large margin
  • For 400 threads, db-n1-highmem-8 and db-n1-highmem-16 performed the same
  • db-n1-highmem-4, db-n1-standard-8 and db-n1-standard-16 performed the same for all thread counts
  • The rest of the instances performed similarly

The following graph represents the transactions for each instance under varying loads for UPDATE transactions.

  • For 100, 200 and 300 threads, db-n1-highmem-16 outperformed the rest by a large margin
  • db-n1-standard-8 and db-n1-standard-16 performed the same for all thread counts
  • The rest of the instances performed similarly
Conclusion

The following graph represents the overall throughput of all instances under varying loads. A few notable conclusions follow.

  • The greatest throughput was achieved at 100 threads
  • db-n1-standard-16 (60G) performed very close to db-n1-highmem-16 (104G)
  • db-n1-highmem-8 (52G) performance was appreciably lower than db-n1-standard-16 (60G)
  • Google Cloud SQL Performance instances performed significantly better than the currently offered Google Cloud SQL Standard instances, by 100-200%

Want to learn more? Schedule an assessment with one of our cloud experts.

Categories: DBA Blogs

Why Locking Oracle Accounts is a Bad Idea

Thu, 2015-12-10 08:59

 

Time and time again I run into database accounts that are marked “LOCKED” or “EXPIRED & LOCKED”. The main problem here lies with how Oracle handles a failed login attempt when the account is locked. In this blog I will discuss why locking Oracle accounts is a bad idea.

Let’s consider the following scenario:

create user scott identified by tiger account lock;

User created.

select username, account_status from dba_users where username='SCOTT';

USERNAME                       ACCOUNT_STATUS
------------------------------ --------------------------------
SCOTT                          LOCKED

 

So what happens if I put on my black hat, and try to get into this database? I may probe for some common users, and just happen to come across this:

connect scott/abc
ERROR:
ORA-28000: the account is locked
Warning: You are no longer connected to ORACLE.

 

What Oracle does there is give me a very valuable piece of information: it tells me that this user exists in the database. Why is that important?

Let’s see what we can find out – without even being able to connect, based solely on the account status of some common accounts:

 

USERNAME		       ACCOUNT_STATUS
------------------------------ --------------------------------
ANONYMOUS		       EXPIRED & LOCKED
APEX_030200		       LOCKED
APEX_PUBLIC_USER	       LOCKED
CTXSYS			       EXPIRED & LOCKED
DIP			       EXPIRED & LOCKED
EXFSYS			       EXPIRED & LOCKED
FLOWS_FILES		       LOCKED
OLAPSYS 		       EXPIRED & LOCKED
ORACLE_OCM		       EXPIRED & LOCKED
OUTLN			       EXPIRED & LOCKED
SQLTXADMIN		       EXPIRED & LOCKED
WMSYS			       EXPIRED & LOCKED
XDB			       EXPIRED & LOCKED
XS$NULL 		       EXPIRED & LOCKED

 

Simply by trying to connect to some of these, and Oracle telling me that the account is locked, I now know that the database has all of the following installed:

 

– APEX
– OLAP
– Oracle Text
– XML Database

 

That’s a lot of information I was just given for free. Depending on the components I’d find, I could also deduce that the Oracle JVM is installed in the database. And this frequently hits the news with newly discovered vulnerabilities.

In essence this means that by locking your accounts, you leave the door open way wider than you’re thinking. It’s a totally counter-productive way of doing things.

So what’s better?

The best approach is a very simple one. Putting my white hat back on, I just assign the user an impossible password hash, like so:

alter user scott account unlock identified by values 'impossible';

 

It’s not possible for this user to ever log in while this hash is in place. And if we try, all we get is:

SQL> connect scott/abc
ERROR:
ORA-01017: invalid username/password; logon denied

 

Warning: You are no longer connected to ORACLE.

The second thing you’d want to do is ensure that those users’ passwords never expire. Or you’d end up with the same EXPIRED & LOCKED status again.
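One way to do that is to attach those accounts to a profile whose password never expires. A minimal sketch (the profile name below is made up; check it against your password policy first):

create profile no_expiry limit
  password_life_time unlimited;

alter user scott profile no_expiry;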

Happy unlocking, and stay secure! :)

 

Discover more about our expertise in the world of Oracle.

Categories: DBA Blogs

The End of an Era – Preparing for a SQL Server 2005 Upgrade

Wed, 2015-12-09 07:02

 

With the impending End of Support for SQL Server 2005 coming up in April 2016, most companies know that they need to find an upgrade path in order to benefit from the new features.

There are many upgrade paths and targets you can take to get out of SQL Server 2005 and into 2014, or even straight into the cloud with Azure SQL Database or Azure SQL Data Warehouse.

Here at Pythian we have created a handy 45-minute webinar covering upgrade scenarios and upgrade tools, and we even share our own process for approaching SQL Server upgrade projects. We’re also including demos of upgrading to 2014 and to Azure SQL Database.

As much as we’ve enjoyed working with SQL Server 2005, it’s time to say goodbye and jump 10 years ahead.

 

And just for fun, let’s go down memory lane and analyze what an amazing milestone SQL Server 2005 was for Microsoft data professionals.

SSIS was introduced. And still going strong after 10 years with countless improvements.

DMVs and DMFs were introduced. How did we ever do engine troubleshooting before!? Such a huge difference from SQL 2000 and was the foundation of the insane amount of instrumentation that is built into the product today.

CLR was introduced. .NET into the engine, a whole new world of programming capabilities opened.

Row multi-versioning was introduced in the engine. Opening up new levels of concurrency and closing the gap on a major piece of functionality that Oracle had from day 1.

Table partitioning was introduced. Another major feature needed to take SQL Server into the VLDB and warehousing space.

Database mirroring was introduced. Shared-nothing, easy high availability without the need for shared storage. Obviously the foundation of the current AlwaysOn Availability Groups.

DDL triggers, XML support, ranking functions, recursive CTEs, separation of user and schema. Not much to add here.

And that’s not a complete list. Truly, the 2005 release finally placed SQL Server as a real enterprise-grade relational database system. No wonder it took 5 years, as still to this day, SQL 2005 is the foundation that has taken us all the way to 2016 and the massive world of cloud relational databases on Azure. It also kick started many careers (mine included) as more and more companies began to see the potential of the product and adopted Microsoft’s data platform.

If you’re considering upgrading from SQL Server 2005, let us know. We’ll connect you with one of our SQL Server experts to review your requirements and discuss how we can work with your team to ensure the success of your upgrade.

Categories: DBA Blogs

Locks, Blocks and Deadlocks – What’s the Difference?

Fri, 2015-12-04 07:42

 

We are often paged by development teams talking about locks, blocks or deadlocks, and some people use the terms incorrectly.

There is a big difference between the three, and each will be explained at a high level in this post:

 

Lock
A lock is acquired when a process accesses a piece of data that another concurrent process may need at the same time. By locking the piece of data we ensure that we are able to act on that data the way we expect.

For example, if we read the data, we usually want to ensure that we read the latest data. If we update the data, we need to ensure no other process is updating it at the same time, etc.
Locking is the mechanism that SQL Server uses in order to protect data integrity during transactions.

 

Block
A block (or blocking lock) occurs when two processes need access to the same piece of data concurrently: one process locks the data and the other has to wait for it to complete and release the lock. As soon as the first process is complete, the blocked process resumes operation. The blocking chain is like a queue: once the blocking process is complete, the next processes can continue. In a normal server environment, infrequent blocking locks are acceptable. But if blocking locks are common (rather than infrequent), there is probably some kind of design or query implementation problem, and the blocking may be causing performance issues.
A block can be described like this:

Block

A blocking situation may not resolve by itself (e.g. if the blocking process did not complete its transaction properly) or may take a long time to complete. In these extreme situations, the blocking process may need to be killed and/or redesigned.
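To see who is blocking whom at any given moment, the engine's DMVs can be queried directly. A minimal sketch (assuming SQL Server 2005 or later, where these DMVs are available):

-- sessions currently waiting on another session's lock
SELECT r.session_id,
       r.blocking_session_id,   -- the session holding the lock
       r.wait_type,
       r.wait_time,             -- milliseconds spent waiting so far
       t.text AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;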

 

Deadlock
A deadlock occurs when one process is blocked and waiting for a second process to complete its work and release locks, while the second process at the same time is blocked and waiting for the first process to release the lock.

In a simplified way, the deadlock would look like this:

Deadlock

In a deadlock situation, the processes are already blocking each other so there needs to be an external intervention to resolve the deadlock. For that reason, SQL Server has a deadlock detection and resolution mechanism where one process needs to be chosen as the “deadlock victim” and killed so that the other process can continue working. The victim process receives a very specific error message indicating that it was chosen as a deadlock victim and therefore the process can be restarted via code based on that error message.
This article explains how SQL Server detects and resolves deadlocks. Deadlocks are considered a critical situation in the database world because processes are simply being killed automatically. Although deadlocks are resolved by SQL Server and do not need manual intervention, they can and should be prevented.
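Because the victim receives error 1205, application code can catch that specific error and retry the transaction. A minimal retry sketch (assuming SQL Server 2012 or later for THROW; the retry count and delay are arbitrary placeholders):

DECLARE @retries INT = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- ... the work that occasionally loses a deadlock goes here ...
        COMMIT TRANSACTION;
        BREAK;  -- success, leave the retry loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        SET @retries = @retries - 1;
        IF ERROR_NUMBER() = 1205 AND @retries > 0
            WAITFOR DELAY '00:00:01';   -- brief pause, then retry
        ELSE
            THROW;                      -- not a deadlock, or out of retries
    END CATCH;
END;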

 

Lock-Avoiding Design Strategies

Some of the strategies are described here:

“There are a few design strategies that can reduce the occurrence of blocking locks and deadlocks:

  • Use clustered indexes on high-usage tables.
  • Avoid high row count SQL statements that can cause a table lock. For example, instead of inserting all rows from one table to another all at once, put a single INSERT statement in a loop and insert one row at a time.
  • Break long transactions up into many shorter transactions. With SQL Server, you can use “bound connections” to control the execution sequence of the shorter transactions.
  • Make sure that UPDATE and DELETE statements use an existing index.
  • If you use nested transactions, be sure there are no commit or rollback conflicts.”

My additions:

  • Access objects always in the same order (i.e.: update Table1, Table2 and Table3 rather than sometimes Table2 first).
  • Don’t schedule long data updating processes to run concurrently, if possible.
  • Keep transactions as short as possible.

 

Discover more about our expertise in SQL Server.

Categories: DBA Blogs

Log Buffer #452: A Carnival of the Vanities for DBAs

Fri, 2015-12-04 07:28

This Log Buffer Edition covers top Oracle, SQL Server and MySQL blog posts of the week.

Oracle:

  • In Oracle EBS 12.0 and 12.1 the Workflow notification system was not enabled to send e-mail notifications to users or roles who happened to have multiple e-mail addresses associated to them.
  • Just how can a SQL Developer user quickly build out a SQL script for a database user that will include ALL of their privileges, roles, and system grants?
  • Oracle BI 12c has been released for some time now. There are a few changes in the way it is installed compared to the previous 11g releases. This post is about installing and configuring OBIEE 12c with detailed step-by-step instructions (Linux x86-64 in this case).
  • In today’s digital economy, customers want effortless engagements and answers to their questions regardless of how they connect with a brand.
  • Upgrade to Oracle Database 12c and Avoid Query Regression.

SQL Server:

  • Continuous integration (CI) is the process of ensuring that all code and related resources in a development project are integrated regularly and tested by an automated build system.
  • SSIS Issues after Master DB Corruption – “Please Recreate Master Key” When Running Package.
  • Check FileSize and LogUsage for all DBs.
  • Other Users Cannot Execute SSIS Packages after migration.
  • How to Get Started Using SQL Server in Azure.

MySQL:

  • Amazon Aurora in sys bench benchmarks.
  • “Data” and “Performance” is where MySQL Cluster’s heart is. In-memory performance and always-up drives our agenda. The Percona Live Data Performance Conference is coming up with two submitted sessions about Cluster.
  • Fixing errant transactions with mysqlslavetrx prior to a GTID failover.
  • MariaDB CONNECT storage engine handles access to JSON files through standard SQL. It comes with a set of UDFs (user defined functions) to manipulate the JSON format. This JSON content can be stored in a normal text column.
  • Become a ClusterControl DBA: Managing your Database Configurations.

 

Learn more about Pythian’s expertise in Oracle, SQL Server & MySQL.

Categories: DBA Blogs

How to Troubleshoot an ORA-28030 Error

Fri, 2015-12-04 07:22

ORA-28030: Server encountered problems accessing LDAP directory service.
Cause: Unable to access LDAP directory service.
Action: Please contact your system administrator.

 

There are many possible causes of this error when you are trying to log in to the database with your Oracle Internet Directory (OID) authentication. A sample of the error is shown below:

SQL> conn howie@dbtest
Enter password:
ERROR:
ORA-28030: Server encountered problems accessing LDAP directory service


Warning: You are no longer connected to ORACLE.

 

Here is how I usually troubleshoot this kind of issue, with two examples.

First of all, you need to enable the trace to dump the actual errors in the database:

SQL> alter system set events '28033 trace name context forever, level 9';

 

Second, reproduce the error:

SQL> conn howie@dbtest
Enter password:
ERROR:
ORA-28030: Server encountered problems accessing LDAP directory service

 

Third, disable the trace:

SQL> alter system set events '28033 trace name context off';

After checking the trace files, I found the errors below. They are related to the DNS configuration for the OID server lnx-ldap. Check /etc/hosts or DNS to make sure the OID server lnx-ldap and port 3131 are reachable.

KZLD_ERR: failed to open connection to lnx-ldap:3131
KZLD_ERR: 28030
KZLD_ERR: failed from kzldob_open_bind.

Or you may see errors like the following. This happens because the wallet files were corrupted; you need to recreate the wallet and make sure the wallet path is defined properly:

kzld_discover received ldaptype: OID
KZLD_ERR: failed to get cred from wallet
KZLD_ERR: Failed to bind to LDAP server. Err=28032
KZLD_ERR: 28032
KZLD is doing LDAP unbind
KZLD_ERR: found err from kzldini.

There are many possible causes of ORA-28030; in this blog I am simply giving you hints for identifying the root cause.

Hope it helps!

 

Discover more about our expertise in the world of Oracle.

Categories: DBA Blogs

Step-by-Step Upgrades to Cloudera Manager and CDH

Fri, 2015-12-04 07:03

 

Introduction

Lately, several of our security-conscious clients have expressed a desire to install and/or upgrade their Hadoop distribution on cluster nodes that do not have access to the internet. In such cases the installation needs to be performed using local repositories. Since I could not find a step-by-step procedure to accomplish this, I thought I would publish one myself.

The following step-by-step procedure has been implemented using the following configuration and specifications:

Cloudera Manager Node : m3.large EC2 Instance running Centos 6.5 (CentOS-6.5-GA-03.3-f4325b48-37b0-405a-9847-236c64622e3e-ami-6be4dc02.2 (ami-8997afe0))
Name Node: m3.large EC2 Instance running Centos 6.5 (CentOS-6.5-GA-03.3-f4325b48-37b0-405a-9847-236c64622e3e-ami-6be4dc02.2 (ami-8997afe0))
Data Nodes (3): m3.large EC2 Instance running Centos 6.5 (CentOS-6.5-GA-03.3-f4325b48-37b0-405a-9847-236c64622e3e-ami-6be4dc02.2 (ami-8997afe0))

Existing Version of Cloudera Manager: 5.4.3
Existing Version of CDH: 5.4.2

Upgrade to Version of Cloudera Manager: 5.5.0
Upgrade to Version of CDH: 5.5.0

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.6 (Santiago)

# cat /proc/version
Linux version 2.6.32-504.16.2.el6.x86_64 (mockbuild@x86-028.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-9) (GCC) ) #1 SMP Tue Mar 10 17:01:00 EDT 2015

 

Upgrade Steps

We will be completing the upgrade of the cluster in two steps. In the first step only Cloudera Manager will be upgraded to version 5.5. Once the cluster has been verified to be functional with Cloudera Manager 5.5, we will upgrade CDH to version 5.5.

 

1. Upgrade Cloudera Manager

 

1. Let’s start by creating the local repository for Cloudera Manager. Download the latest version of Cloudera Manager from the links below on the Local Repository Host:

# wget -r --no-parent --reject "index.html*" "http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.5/"
# wget "http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera"

2. Copy the downloaded files to the pub/repos/cloudera-manager directory on the Local Repository Host. After that, start a local web server with pub/repos as the root directory. You may use any web server, including Python SimpleHTTPServer or Apache. Following are the steps to use SimpleHTTPServer:

# cd pub/repos
# nohup python -m SimpleHTTPServer 8000 &

Expected output for http://Local Repository Host:8000/pub/repos/cloudera-manager:

82485_1

Expected output for http://Local Repository Host:8000/pub/repos/cloudera-manager/RPMS/x86_64

82485_2

3. Make sure the local repository for Cloudera Manager is configured on ap-hdpen1t.oneadr.net:

$ cat /etc/yum.repos.d/cloudera-manager.repo
[cloudera-manager]
name=Cloudera Manager package mirror
baseurl=http://Local Repository Host:8000/pub/repos/cloudera-manager
gpgkey=http://Local Repository Host:8000/pub/repos/cloudera-manager/RPM-GPG-KEY-cloudera
gpgcheck=1

4. Log on to Cloudera Manager. Stop the Cloudera Management Service:

82485_3

5. Make sure all services are stopped. Sample screens after stopping below:

82485_4

6. Stop the Hadoop Cluster:

7. SSH to the Cloudera Manager Server. Stop the Cloudera Manager Service:

# sudo service cloudera-scm-server status
cloudera-scm-server (pid 6963) is running...
# sudo service cloudera-scm-server stop
Stopping cloudera-scm-server: [ OK ]
# sudo service cloudera-scm-server status
cloudera-scm-server is stopped

8. Before proceeding with the upgrade, make sure you back up the Cloudera Manager databases used by CDH services like Hive Metastore, Oozie, Sentry etc.

9. When you are ready to upgrade, issue the command to upgrade Cloudera Manager:

# yum upgrade cloudera-manager-server cloudera-manager-daemons

Make sure the Upgrade Version for Cloudera Manager is as below:

82485_8

10. To verify if the upgrade is successful issue the following command:

# rpm -qa 'cloudera-manager-*'
cloudera-manager-daemons-5.5.0-1.cm550.p0.61.el6.x86_64
cloudera-manager-agent-5.5.0-1.cm550.p0.61.el6.x86_64
cloudera-manager-server-5.5.0-1.cm550.p0.61.el6.x86_64

11. Start the Cloudera Manager:

# service cloudera-scm-server start
Starting cloudera-scm-server: [ OK ]

12. Monitor the Cloudera Manager Server Log for errors. The Cloudera Manager Server console is ready for use once you see the “Started Jetty Server” message in the log:

# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log

13. Log on to Cloudera Manager. You should now see the following screen. Note the running version:

82485_9

14. Choose Option as below to upgrade Cloudera Manager Agents. Press Continue:

82485_10

15. Choose Custom Repository:

In first box add: http://Local Repository Host:8000/pub/repos/cloudera-manager
In second box add: http://Local Repository Host:8000/pub/repos/cloudera-manager/RPM-GPG-KEY-cloudera

82485_11

Press Continue.

 

16. Check JDK/Java options as below and press Continue:

82485_12

17. Provide SSH credentials and Press Continue:

82485_13

18. Cloudera Manager will now upgrade the Agents:

82485_14

19. Verify Completion. Press Continue:

82485_15

20. Inspect Hosts for Correctness. Press Continue:

82485_16

21. You should now see a Confirmation Screen as below:

82485_17

22. Upgrade Cloudera Management Service. Press Continue:

82485_18

23. Confirm Restart of Cloudera Management Service:

82485_19

24. Verify Cloudera Management Service restarted. Press Finish:

82485_20

25. On the Cloudera Manager Home Screen, choose Deploy Client Configuration:

82485_21

26. Verify Client Configurations Deployed:

82485_22

27. Start the Cluster:

82485_23

28. Verify Services on the Cluster are Active:

82485_24

29. Verify Cloudera Manager Version:

82485_25

30. Verify Agents Upgraded. Issue the following commands on all nodes:

# rpm -qa 'cloudera-manager-*'
cloudera-manager-daemons-5.5.0-1.cm550.p0.61.el6.x86_64
cloudera-manager-agent-5.5.0-1.cm550.p0.61.el6.x86_64

31. Congratulations! The upgrade of Cloudera Manager was successful.

 

2. Upgrade Cloudera Distribution

Now that Cloudera Manager has been upgraded, let’s upgrade CDH to version 5.5.

1. Download the latest version of CDH from the links below on the Local Repository Host:

wget http://archive.cloudera.com/cdh5/parcels/5.5/CDH-5.5.0-1.cdh5.5.0.p0.8-el6.parcel
wget http://archive.cloudera.com/cdh5/parcels/5.5/CDH-5.5.0-1.cdh5.5.0.p0.8-el6.parcel.sha1
wget http://archive.cloudera.com/cdh5/parcels/5.5//manifest.json

2. Create/refresh the local repository for CDH by copying the downloaded files to the pub/repos/cloudera-cdh5/ directory on the Local Repository Host.
Expected output for http://Local Repository Host:8000/pub/repos/cloudera-cdh5/

82485_26

3. Back up HDFS metadata using the following command:

$ whoami
hdfs
$ hdfs dfsadmin -fetchImage ~
15/11/27 19:23:58 INFO namenode.TransferFsImage: Opening connection to http://ip-10-169-250-118.ec2.internal:50070/imagetransfer?getimage=1&txid=latest
15/11/27 19:23:58 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
15/11/27 19:23:58 INFO namenode.TransferFsImage: Transfer took 0.09s at 2715.91 KB/s
$ ls -l
total 244
-rw-rw-r--. 1 hdfs hdfs 244838 Nov 27 19:23 fsimage_0000000000000015418

4. Back up the databases used for the various CDH services. The following screen shows the database details used for various services like Oozie, HUE, Sentry etc:

82485_27

5. Log on to Cloudera Manager.

6. Verify the parcel download setting is pointing to the local repository for CDH. Press the Parcels icon on the Cloudera Manager Home Page. Press Edit settings:

82485_28

7. Choose the following Option to start upgrade of CDH:

82485_29

8. Choose version 5.5:

82485_30

9. Make sure you have backed up all databases:

82485_31

10. The following screen indicates that we are all set to proceed. Press Continue:

82485_32

11. CDH Version 5.5 parcels will now be downloaded, distributed to all nodes and unpacked. Press Continue:

82485_33

12. Hosts will be inspected for correctness. Press Continue:

82485_34

13. Verify that no party is using the HH-TEST Cluster. Choose Full Cluster Restart. Press Continue:

82485_35

14. The HH-TEST cluster will now be stopped, upgraded and restarted. Press Continue:

82485_36

15. The confirmation screen should now show the upgraded version of CDH. Press Continue:

82485_37

16. Review additional post-upgrade instructions. Press Finish.

17. Verify CDH version on Cloudera Manager Home Page:

82485_38

18. Verify CDH version on back-end. SSH to any node in the cluster:

$ hadoop version
Hadoop 2.6.0-cdh5.5.0
Subversion http://github.com/cloudera/hadoop -r fd21232cef7b8c1f536965897ce20f50b83ee7b2
Compiled by jenkins on 2015-11-09T20:37Z
Compiled with protoc 2.5.0
From source with checksum 98e07176d1787150a6a9c087627562c
This command was run using /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/hadoop-common-2.6.0-cdh5.5.0.jar
$ hadoop fs -ls /
Found 3 items
drwxrwxr-x   - solr solr                0 2015-11-26 20:58 /solr
drwxrwxrwt   - hdfs supergroup          0 2015-11-27 02:29 /tmp
drwxr-xr-x   - hdfs supergroup          0 2015-11-27 02:29 /user

19. This completes the upgrade of CDH.


 

Discover more about our expertise in Hadoop.

Categories: DBA Blogs

DATA COLLECTOR AND POLICY-BASED MANAGEMENT: PART 3

Fri, 2015-12-04 06:32

 

In my previous post I talked about how to create and use a policy against your SQL Server instance (don’t forget to check out part 1 in this series too). Now we will talk about the Data Collector and how to configure it in your environment.

What is the Data Collector?

The Data Collector (DC) is one of the main components in the set of tools for data collection provided by SQL Server. With the DC we can define a centralized storage point for all the metrics collected across the SQL Server instances in your infrastructure, and these metrics can come from multiple sources, not just performance counters.

To increase the efficiency of the metrics collected, you must adjust the DC to each environment in your infrastructure (development, staging, production). The DC stores all the collected information in a Management Data Warehouse (MDW) and allows you to set different retention periods for each metric that will be collected.

As the DC has a programming interface (API), we can customize collections for any other type of desired metric. However, in this article we will focus only on the three system collection sets of the DC: Disk Usage, Query Statistics and Server Activity. Figure 28 shows how the DC fits into the strategy for collecting and managing data in a SQL Server database.

 

Image28
Figure 28. Data collection strategy.

 

The Data Collector Architecture

Before starting the implementation of the DC, it is necessary to understand which components are part of this feature. They are:

  • Target: An instance of the SQL Server database engine that supports the process of collecting metrics using the DC.
  • Target Type: Defines the type of target from which the metrics will be collected. For example, a SQL Server instance has different metrics than a SQL Server database itself.
  • Data provider: A data source that will provide metrics for the collector type.
  • Collector Type: A wrapper for the SQL Server Integration Services (SSIS) packages that provide the mechanism for collecting the metrics and storing them in the MDW.
  • Collection Item: An item that defines which metrics will be collected, how often they will be gathered and how long the stored metrics will be retained.
  • Collector Set: A set of Collection Items.
  • Collection Mode: The way the metrics are collected and stored in the MDW. The metrics can be collected on an ongoing basis (cached mode) or on a sporadic schedule (non-cached mode).
  • Management Data Warehouse (MDW): The relational database used to store all the collected metrics.

 Note: In SQL Server 2014 we have the following collector types: Generic T-SQL Query, Generic SQL Trace, Performance Counters and Query Activity.

The Figure 29 shows the dependencies and relationships between the components of the DC.

 

Image29
Figure 29. Relationship between the components of the DC.

The data provider is an external component in the DC architecture which, by definition, has an implicit relationship with the target. A data provider is specific to a particular target and provides metrics through views, performance counters and Windows Management Instrumentation (WMI) components that are consumed by the DC.

We can see from Figure 29 that a collector type is associated with a particular target, and that this relationship also defines how the metrics will be collected and what their storage schema is. The collector type also provides the location of the MDW, which can be on the server that is running the collection or on a centralized server.

A collection item has a collection frequency and can only be created within a collector set. The collector set, in turn, is created on the SQL Server instance that will be monitored through the DC and consists of one or more collection items. Collection of the metrics defined in the collector set is accomplished through jobs executed by the SQL Server Agent service, and the collected metrics are stored in the MDW periodically through predefined schedules.

Figure 30 shows the Disk Usage system collector set, configured with the collection mode set to non-cached, using two collection items of the Generic T-SQL Query collector type; the metrics are collected every 60 seconds and retained in the MDW for 730 days.

 

Image30
Figure 30. Definition of the Disk Usage system collector set.

It is important to note that the DC is fully integrated with the SQL Server Agent service and using Integration Services, using both intensively. After the DC configuration, the process of collecting and recording of metrics is accomplished by a set of SQL Server Agent Jobs created and started automatically.
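The collector sets behind those jobs can be inspected directly from the catalog views in msdb. A minimal sketch (assuming the DC has already been configured on the instance):

SELECT cs.collection_set_id,
       cs.name,
       cs.is_running,
       cs.collection_mode,        -- 0 = cached, 1 = non-cached
       cs.days_until_expiration   -- MDW retention for this set
FROM msdb.dbo.syscollector_collection_sets AS cs;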

 

Management Data Warehouse (MDW)

So we can use the metric collection through the DC, though you must first perform the configuration of the MDW that will be the relational database responsible for storing all the metrics collected by the collector sets.

For this we can use an existing relational database and configure it as a MDW. However, it is recommended that you set a new database, because during the configuration process of the MDW several schemas and tables relating to DC will be created. The schemas are generated automatically after the configuration of DC are the core and the snapshot. A third schema, custom_snapshots, name will be created when a collector set as customized is set by the administrator of the Bank.

The main schema of the MDW is core, because it contains the tables, stored procedures and views that are available to all collector types and that are used to organize and identify the collected metrics. To ensure the integrity and security of the MDW, all database objects belonging to the core schema can only be changed by members of the db_owner and mdw_admin database roles.

Table 2 lists all the existing tables in the core schema and their respective descriptions.

Table2

Table 2. Core schema tables.

 

The snapshots schema, in turn, owns the objects required for storing the metrics collected by the system collection sets. The tables in this schema can only be changed by members of the mdw_admin database role.

Table 3 shows which tables are used by the Server Activity and Query Statistics system collection sets, created after the DC is configured.

Table3
Table 3. Tables used by collection sets.

 

The custom_snapshots schema, in turn, has the tables and views that are created when a custom collection set is configured. Any custom collection set that needs a new table to store collected metrics can create tables in this schema. The tables can be added by any member of the mdw_writer database role.

 

Configuring the Data Collector

To exemplify metric collection using the DC, we have the instance VITADB\SQLCMS, responsible for hosting the MDW database, and the instances VITADB\SQLINSTANCE1 and VITADB\SQLINSTANCE2, which will have their metrics collected through the collector sets. That said, the first step in setting up the DC is the creation of the MDW on the VITADB\SQLCMS instance, as in the following steps:

1) Through Object Explorer, select the Management folder.

2) Right-click on Data Collection -> Tasks -> Configure Management Data Warehouse.

3) In the dialog box that appears (see Figure 31), select the VITADB\SQLCMS instance and create the MDW database via the New button.

4) Select which logins have access to the MDW database (Figure 32), and then click Finish.

 

Image31
Figure 31. Creation of the MDW.

 

Image32
Figure 32. Definition of permissions to the MDW.

 

Note: Members of the mdw_admin database role have SELECT, INSERT, UPDATE and DELETE permissions, in addition to being able to change any MDW schema and run the DC maintenance Jobs. Members of the mdw_writer database role have permission to upload the collected metrics to the MDW. Members of the mdw_reader database role have only SELECT permission on the MDW.

After the creation and configuration of the MDW on the VITADB\SQLCMS instance, you must start the process of collecting metrics on the instances VITADB\SQLINSTANCE1 and VITADB\SQLINSTANCE2 by configuring the system collector sets on each of the instances and directing the collected metrics to the MDW database.

To configure the system collector sets, we have the following steps:

1) Through Object Explorer, select the Management folder.

2) Right-click on Data Collection -> Tasks -> Configure Data Collection.

3) In the dialog box that appears (Figure 33), connect to the VITADB\SQLCMS instance, select the MDW database and the desired collector set, and then click Finish. For this example we will use the System Data Collection Sets, which are created automatically after setup.

 

Image33
Figure 33. Collector set definition.

Once the configuration is complete, three system collector sets are created: Disk Usage, Query Statistics and Server Activity. The Disk Usage collector set collects metrics about the growth of the data files (.mdf and .ndf) and log files (.ldf) of the user and system databases on the instance monitored by the DC. With this information it is possible to know the daily growth trend, in MB, of the files examined.

Table 4 shows the properties of the Disk Usage system collector set.

Table4

Table 4. Collector set properties of system Disk Usage.

The Server Activity system collector set, in turn, collects server activity metrics: performance statistics, blocking chains and general information about memory, CPU and network.

Table 5 shows the properties of the Server Activity system collector set.

Table5

Table 5. Collector set properties of Server Activity system.

Finally, the Query Statistics system collector set collects metrics for queries executed against the databases monitored by the DC, such as statistics, execution plans, the queries that are most costly in terms of disk, CPU and memory usage, and the queries that took the longest to complete.

Table 6 shows the properties of the Query Statistics system collector set.

Table6

Table 6. Properties of the collector set of Query Statistics System.

 

Viewing the Collected Metrics

The metrics collected by the DC can be accessed directly with T-SQL queries. However, after configuring the system collector sets, some standard reports become available for viewing. To access them, right-click on Data Collection -> Reports -> Management Data Warehouse. After the configuration of the collector sets, three reports are available. We will look at each of them in the following subtopics.
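Before going through the reports, here is a minimal sketch of querying the MDW directly; it assumes the default snapshots.disk_usage table created by the Disk Usage collection set, and the column names should be verified against your MDW, as they may vary:

-- Run against the MDW database: latest samples collected by the Disk Usage set
SELECT TOP (10)
       database_name,
       dbsize,           -- data file size
       logsize,          -- log file size
       collection_time
FROM   snapshots.disk_usage
ORDER BY collection_time DESC;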

 

Server Activity History

All the information available in this report relates to the use of database server resources, such as the total CPU and memory allocated, the biggest wait types in SQL Server and the IOPS values, among others. All of this information is extremely useful for troubleshooting and tuning.

Figure 34 shows the top of the Server Activity History report, extracted from the VITADB\SQLINSTANCE2 instance.

Image34

Figure 34. The top of the report Server Activity History.

 

At the top of the report we can see which SQL Server instance the metrics are displayed for and the date and time the report was requested. Below this information you can select the time period, during which the metrics were collected, to be loaded into the report. Each of the graphs presented contains information about the operating system (green lines) and about SQL Server (blue lines).

 

Figure 35 shows the bottom of the Server Activity History report, extracted from the VITADB\SQLINSTANCE2 instance.

 

Image35
Figure 35. The bottom of the Server Activity History report.

 

These reports are also extremely useful in the process of performance analysis and troubleshooting, because they display the biggest wait types and what main types of activities that occur in the instance. From any of these reports we can visualize more details by selecting one of the lines or data bars and performing a drill-down on the desired information.

Note: Immediately after configuring the DC there will be little information to load into the reports; the more metrics are collected and stored in the MDW, the greater the detail available through the reports.

 

Disk Usage Summary

This report lists the size of the databases monitored by the DC and their average growth over a period of time. The metrics displayed by the report are separated into the data files and the log files of the monitored databases. As shown in Figure 36, each of the data files and log files has information about the initial size, the current size and the average growth per day in MB.

 

Image36
Figure 36. Disk Usage Summary Report.

 

Query Statistics History

The most common cause of performance issues found in SQL Server is inefficiently written T-SQL commands. Therefore, collecting performance metrics for these queries is an essential part of the tuning process. By default, you can view the 10 queries that consume the most CPU, but you can change this filter and view the queries that carried out the most IO operations, ran the longest, performed the most physical reads or carried out the most logical writes.

 

Figure 37 shows the report Query Statistics History, extracted from the VITADB\SQLINSTANCE2 instance.

 

Image37
Figure 37. Query Statistics History Report.

 

Recommendations for Configuration of DC

To ensure minimal impact when monitoring a SQL Server environment with the DC, adopt the following recommendations:

  • Use a centralized server for the MDW; this provides a single location for running and viewing reports.
  • All SQL Server database servers that will be monitored by the DC should be part of the same domain.
  • When creating a custom collector set using the Generic SQL Trace collector type, define a set of filters so that only the metrics that are really necessary are collected; this way the MDW doesn't store unnecessary information.
  • Before you create a custom collector set using performance counters, make sure the Server Activity system collector set is not already collecting that metric.
  • If metrics are collected by multiple T-SQL queries with the same frequency, combine them in a single collector set. Doing this reduces the amount of memory used by the DC executable (dcexec.exe) while gathering metrics. Similarly, combine multiple collection items of type Performance Counters into a single collection item whenever possible.
  • Combine multiple collection items in a single collector set whenever possible. The only reasons to create separate collector sets are different retention periods or different collection schedules.
  • A collector set using the Cached collection mode always keeps a collection process running. If the metrics are collected often, this is more efficient than starting and stopping the collection process every time new metrics need to be collected. In contrast, a collector set using the Non-Cached collection mode doesn't keep a collection process running most of the time; a new collection process is started according to the predefined schedule and stopped again afterwards, avoiding excessive use of the server's hardware resources. So, if metrics are collected rarely, the Non-Cached collection mode is more efficient than leaving the collection process on standby most of the time. As a general rule, if the metric needs to be collected every five minutes or more often, consider configuring the collector set with the Cached collection mode. If the metrics can be collected less often than every five minutes, it is recommended to configure the collector set with the Non-Cached collection mode.
  • The higher the collection frequency, the greater the overhead on the database server, so always configure the lowest frequency that meets the collection requirement.

 

Conclusion

As described earlier, beginning with SQL Server 2008 we have two tools that facilitate the management of a SQL Server environment consisting of multiple instances: Policy-Based Management and the Data Collector.

Using the PBM you can create policies that assess certain conditions in existing objects on the instance of a SQL Server database. These policies can be designed manually or imported through XML files available after the installation of the database engine. Policies can be evaluated manually (OnDemand), following a predefined schedule (OnSchedule) or at the time a particular property of a SQL Server object is changed (OnChange) and can also be evaluated on multiple SQL Server instances at once, through the Central Management Server functionality.

With the DC we have a feature that collects metrics for all SQL Server instances and stores them in a centralized database, called the Management Data Warehouse. Through the configuration of the DC and the Management Data Warehouse, three system collector sets are created. They collect metrics concerning the utilization of server hardware resources (CPU, memory, disk and network), the growth of the data and log files of the monitored databases, and the most costly T-SQL queries executed on the database server, all of which is accomplished through collection Jobs defined in SQL Server Agent.

It is worth noting that the DC also provides a wide range of reports, so that the metrics collected by the collector sets can be evaluated during troubleshooting and tuning processes. Finally, note that the DC is a complete monitoring tool that needs to be configured carefully to avoid placing a heavy load on the database servers.

Until next time!

 

Discover more about our expertise in SQL Server.

Categories: DBA Blogs

AWS Redshift Cluster Sizing

Thu, 2015-12-03 15:50

 

Sizing a database is one of the primary DBA functions no matter what the technology. Though Redshift is highly managed for us, we must still address this task.

The first thing to note is that in sizing a cluster, we start with an estimated need for storage capacity, since the amount of storage available per node of the cluster is a fixed amount. While you get the disk space you pay for, AWS guidelines and user experience show that performance can suffer when space becomes tight (>80%). So when sizing the cluster for a specific capacity, an extra 20% will need to be tacked onto your calculations.
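For example, if you expect to store roughly 3 TB of data, plan for about 3 TB / 0.8, or roughly 3.75 TB of raw node storage, so the cluster stays comfortably under the 80% mark.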

AWS currently offers two types of instances for Redshift clusters, dense compute (dc1.*) or dense storage (ds2.*) servers. The dc1 series offers about 6x the CPU and 6x the memory per terabyte of storage. These are good for use cases where there is a regular load of heavy analytics querying involving multiple joins. The ds2 series is more cost effective when the tables are highly denormalized and the analytics can be offloaded into a BI tool such as Microstrategy. It is possible to migrate to a new node type through snapshots, but the process will involve some downtime.

To address a few points of the Redshift architecture, note that only the compute nodes hold storage so the leader node is not considered (nor do you have to pay for it). Each storage disk is replicated to two others for redundancy, but these additional disks are not part of the calculations or specifications used for sizing (though they affect the disk monitoring calculations that we see later). Here we examine example clusters on a budget of around $10k using yearly contracted instances (saving 30+% over on-demand instances):

(7) dc1.large with 160Gb SSD @ $1380/year = 1120Gb @ $9660/year.
(2) ds2.xlarge with 2Tb HDD @ $4295/year = 4000Gb @ $8590/year.

The dc1 instances in this case are around 4x as expensive per terabyte (though still quite the bargain compared to hosting the cluster in-house) and give a 30-80% performance gain, depending on the benchmark. And while you can always add nodes to accommodate data growth, you can only add nodes of the same instance type, which could potentially become quite expensive if you're using the small disk capacity dc1 instances.

Once your cluster is up, it is vital to monitor disk and CPU utilization so you know when to add new nodes. It is highly advisable to watch the graphs under the Performance tab in the Redshift Console as you add new load to the cluster.

There are built-in Cloudwatch alarms for disk usage, and these should be configured to alert above 70%. I like to know well in advance when it is getting there, so I regularly use Period = 5 minutes, Statistic = average, over 1 consecutive period, but since loads and vacuums can create usage surge spikes, you might want to configure the alert over more or longer periods. While Cloudwatch is great for this monitoring, it is convenient to also be able to compute capacity. There are several ways to query disk usage that render subtly different results, unfortunately none of which will yield the stats given by Cloudwatch. Here's an example for a 6-node 12Tb cluster that the Console currently shows as 32% used on each node, yet the query reports 23%:

-- Each disk is mirrored to two others, so raw partition numbers count each block three times; divide by 3
select
  host
  ,sum(capacity)/3 as total
  ,sum(used)/3 as used
  ,sum(capacity)/3 - sum(used)/3 as free
  ,(((sum(used)/3)/(sum(capacity)/3.00)) * 100.00) as pct_used
from STV_PARTITIONS
group by host;

host | total   | used   | free    | pct_used
0    | 1904780 | 450450 | 1454330 | 23.65
1    | 1904780 | 449895 | 1454885 | 23.62
2    | 1904780 | 449776 | 1455004 | 23.61
3    | 1904780 | 450673 | 1454107 | 23.66
4    | 1904780 | 451483 | 1453297 | 23.7
5    | 1904780 | 447840 | 1456940 | 23.51

The point here is to be wary of querying disk space and rely on Cloudwatch and the Console.

Compression encoding on your tables can save substantial space (50-75% on average depending on the encoding) and can improve performance by 100+%. Encoding can also increase CPU usage, so we want to monitor it as we implement encoding. A cluster can be brought to a standstill by just one node hitting 100% CPU utilization, so we also need to set up Cloudwatch alarms to make sure we average < 70% or don't hit spikes > 90% for more than 10 minutes. Once we bump up against those metrics, it's time to add another node.
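As a quick sketch of how you might work with encodings (the table and column names below are placeholders, not from the example cluster), Redshift can recommend an encoding per column for an existing table, and encodings can be declared when a table is created:

-- Sample the table and suggest a compression encoding per column
analyze compression my_schema.my_table;

-- Declare encodings explicitly when (re)creating a table
create table my_schema.my_table_encoded (
  event_id   bigint      encode delta,
  event_type varchar(32) encode lzo,
  created_at timestamp   encode delta32k
);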

 

Discover more about our expertise in the Cloud.

Categories: DBA Blogs

Syncing Inconsistent MySQL Slaves

Wed, 2015-12-02 11:52

 

Checksum is a standard practice among DBAs to verify the data consistency across replicated nodes. In this post we’re going to review the syncing options for an inconsistent MySQL slave of Galera cluster node.

Here we’re assuming a setup of regular replication to a MySQL instance from one of the Galera cluster nodes.
PXC-slave-checksum
In the usual MySQL replication setup, standard practice involves using the pt-table-checksum tool to identify the discrepancies and pt-table-sync to bring them back in sync. The checksum tool, pt-table-checksum, can run across Galera cluster nodes to verify data consistency and confirm whether the MySQL slave is consistent with a chosen primary node.

What happens if this Galera cluster’s regular MySQL slave sees data inconsistency on it? Will pt-table-sync work there? The answer to this depends…
pt-table-sync, when used with --sync-to-master, causes it to take locks on the master, but Galera doesn't like those lock attempts.

You may ask, why locks on a master?

Because pt-table-sync treats the master as the source and the slave as the destination. It will lock the table (--lock=1) on the master and apply changes on the master, which will eventually be replicated to the slave, thus causing the sync.

 

Respective snippet of code from pt-table-sync script V.2.2.15:

	lock_server(src => $src, dst => $dst, %args);
	...
	$exit_status |= sync_a_table(
               src   => $src,
               dst   => $dst,
               where => 1,  # prevents --where from being used
               diff  => $diff,
               %args,
            );
	 ...
         unlock_server(src => $src, dst => $dst, %args);

Again, coming back to our point, pt-table-sync wouldn't work well on Galera with --sync-to-master. Let's give it a try:

PTDEBUG=1 ./pt-table-sync --verbose --execute --replicate=pyth.checksum --sync-to-master --tables slave_repl.tmp h=localhost,u=root,p=$pw > slave_repl.tmp.sql

If you run the above command (on slave) with debug you will note following error:

	# TableSyncer:6114 4650 Committing DBI::db=HASH(0x177cf18)
	Deadlock found when trying to get lock; try restarting transaction at line 6115 while doing slave_repl.tmp on localhost

(without PTDEBUG you won’t see much on slave except it will report nothing changed!)

Great, so why the error? Let’s again check the code snippet (pt-table-sync V.2.2.15):

	sub lock_table {
	   my ( $self, $dbh, $where, $db_tbl, $mode ) = @_;
	   my $query = "LOCK TABLES $db_tbl $mode";
	   PTDEBUG && _d($query);
	...

As you can see, it calls LOCK TABLES, and Galera, as we know, doesn't support explicit locking because of the conflict with multi-master replication. That's the reason for the error above.

Okay, let's continue. Upon executing pt-table-sync on the slave, the "master"'s (Galera node's) error log will show the following error:

	2015-08-27 14:45:07 6988 [Warning] WSREP: SQL statement was ineffective, THD: 17, buf: 1399
	QUERY: commit
	 => Skipping replication

 

We already have a bug report in place and if it affects you, go ahead and mark it so.

So how would you fix this?

Easy Answer: Do a complete rebuild of slave from a fresh data backup of the cluster node.
Desync the cluster node, take the backup and restore it on a slave machine and setup replication.

But let’s think about an alternate method other than a complete restore…

“fixing using pt-slave-restart?”

But pt-table-sync is “not Galera Ready” as they say!!

Even then, pt-table-sync can help us understand the differences and that’s the long answer :)

 

It can still work and prepare the SQL for you using the --print --verbose options.

./pt-table-sync --verbose --print --replicate=pyth.checksum --sync-to-master --tables slave_repl.tmp h=localhost,u=root,p=$pw > slave_repl.tmp.sql

So, all you need to do is run the SQL against the slave to fix the discrepancies. You may choose to desync the node and run pt-table-sync to generate the differential SQL. You'll still need to confirm that the slave got synced and the discrepancies are resolved by re-running pt-table-checksum.

Our steps to resync a PX Cluster’s slave using pt-table-sync are as follows:

(Note: It’s advisable to stop writes on cluster to fix the discrepancies on slave. “Why?” “Explained later.”)

– Desync master node:

set global wsrep_desync=ON;

– Generate differential SQL:

./pt-table-sync --verbose --print --replicate=pyth.checksum --sync-to-master --tables slave_repl.tmp h=localhost,u=root,p=$pw > slave_repl.tmp.sql

– Review the sql generated and execute them on slave.

– Once slave is synced, you can:

set global wsrep_desync=OFF;

– Finally rerun the pt-table-checksum to verify the discrepancies.

That concludes our solution.

 

“Wait, but why desync?”
hmm… Well wsrep_desync is a dynamic variable which controls whether the node can participate in Flow Control.

 

“hold on!! Flow control??”
Galera has synchronous replication, wherein a node provides feedback to the rest of the group: fellas-I'm-late-hold-on OR okay-let's-continue-the-job. This communication feedback is flow control. (You should read the Galera documentation & Jay's post.)

When we set wsrep_desync=ON on the master, it will continue to replicate writesets in and out as usual, but flow control will no longer take care of the desynced node. So, the other nodes of the group won't worry about the desynced node lagging behind. Thus, by desyncing we're making sure that our operations on one node are not affecting the whole cluster. This should answer why writes need to be stopped before starting to sync.
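If you want to see how much flow control is actually kicking in, and to confirm the node state after toggling wsrep_desync, the Galera status counters can be checked on the node (a quick sketch):

-- Fraction of time replication has been paused by flow control
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';

-- Shows Synced, or Donor/Desynced while the node is desynced
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';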

Hope this helps.

 

Discover more about our expertise in MySQL.

Categories: DBA Blogs

Data Collector and Policy-Based Management: Part 2

Wed, 2015-12-02 11:20

 

In my previous post we talked about the architecture of Policy-Based Management. In this post we will create and run a policy against some SQL Server Instances.

Creating a Policy

In SQL Server 2014 you are allowed to create a policy manually via T-SQL, export existing policies in an instance of a SQL Server database or import policies made available by Microsoft. When designing a policy we can use advanced conditions that allow almost unlimited amounts of validation on database objects existing in an instance.

To facilitate the understanding of all the components used in a policy and how they interact with each other, we will create a policy from scratch. To do this, you must first define a condition and then the policy that uses this condition. With the policy set, it will be possible to categorize it and define which targets should be evaluated.

In order to illustrate this in a practical way, we will use PBM to define a policy that evaluates whether all databases in a given SQL Server instance have the AutoShrink property disabled.

Creating a Condition

The first step in the creation of the policy is the setting for the condition, which can be accomplished through SQL Server Management Studio (SSMS) or by using T-SQL commands.

To create a condition with SSMS, we have the following steps:

1)  Using the Object Explorer, open the folder Management and then the Policy Management folder (see Figure 8):

Image8
Figure 8. Folder Policy Management.

 

2)  Within that folder, right-click the folder Conditions and select the New Condition.

3)  This will open the window for the creation of the new condition, as shown in Figure 9.

 

Image9

Figure 9. Condition creation window.

 

According to this figure, the condition was created with the following options:

  • Name: The condition was created with the name AutoShrink False.
  • Facet: The facet used was Database.
  • Expression: The @AutoShrink property was evaluated and its value must be equal to False.

Additionally, you can include a detailed description about the condition through the Description, as shown in Figure 10.

 

Image10
Figure 10. Condition description window.

The second option for creating the condition is through the system stored procedure sp_syspolicy_add_condition. Listing 1 shows the creation of the AutoShrink False condition, with the same options set in SSMS.

Listing 1. Creating the condition with T-SQL.

Declare @condition_id int

EXEC msdb.dbo.sp_syspolicy_add_condition @name=N'AutoShrink False', @description=N'', @facet=N'Database', @expression=N'<Operator>
  <TypeClass>Bool</TypeClass>
  <OpType>EQ</OpType>
  <Count>2</Count>
  <Attribute>
    <TypeClass>Bool</TypeClass>
    <Name>AutoShrink</Name>
  </Attribute>
  <Function>
    <TypeClass>Bool</TypeClass>
    <FunctionType>False</FunctionType>
    <ReturnType>Bool</ReturnType>
    <Count>0</Count>
  </Function>
</Operator>', @is_name_condition=0, @obj_name=N'', @condition_id=@condition_id OUTPUT

Select @condition_id

GO

 

Creating a Policy

Now that we have a condition, we can define a policy that uses it. Like the condition, the policy can be configured either through SQL Server Management Studio or through T-SQL.

First we create a policy using the SSMS:

1)  Using the Object Explorer, open the folder Management and then the Policy Management folder (see Figure 8).

2)  Within that folder, right-click on the Policies folder and select New Policy.

3) This will open the window for the creation of the new policy, as shown in the Figure 11.

Image11

Figure 11. Window for policy creation.

 

According to the Figure 11, during the creation of the policy the following options were set:

  • Name: The policy was created with the name Checks AutoShrink.
  • Check Condition: The condition evaluated by the policy is AutoShrink False.
  • Against Targets: The policy should be evaluated against every existing database on the SQL Server instance.
  • Evaluation Mode: The evaluation mode selected was On Demand, which determines that this policy must be evaluated manually.
  • Server Restriction: For this policy there are no conditions restricting the database servers.

Note: The target won't always be a database. The targets change based on the context and the condition being evaluated. For example, if we create a policy to standardize the names of new tables using the Tables facet, the Against Targets option will display All Tables.

The evaluation modes available in the list depend on the facet used in the condition. All facets support the On Demand and On Schedule modes, but On Change: Prevent depends on the ability to use DDL triggers to roll back the transaction. The On Change: Log Only evaluation mode depends on the changes to the facet being captured by an event.

To facilitate the maintenance and management of configured policies, we can add more details through the Description, like Figure 12.

Image12

Figure 12. Policy description window.

 

In addition, you can also perform the creation of policies using the system stored procedures SP_SYSPOLICY_ADD_OBJECT_SET, SP_SYSPOLICY_ADD_TARGET_SET, SP_SYSPOLICY_ADD_TARGET_SET_LEVEL and SP_SYSPOLICY_ADD_POLICY.

Listing 2 shows the creation of the Checks AutoShrink policy, with the same options set in SSMS.

Listing 2. Creation of the policy with T-SQL.


Declare @object_set_id int

EXEC msdb.dbo.sp_syspolicy_add_object_set @object_set_name=N'Checks AutoShrink_ObjectSet_1', @facet=N'Database', @object_set_id=@object_set_id OUTPUT

GO

Declare @target_set_id int

EXEC msdb.dbo.sp_syspolicy_add_target_set @object_set_name=N'Checks AutoShrink_ObjectSet_1', @type_skeleton=N'Server/Database', @type=N'DATABASE', @enabled=True, @target_set_id=@target_set_id OUTPUT

GO

EXEC msdb.dbo.sp_syspolicy_add_target_set_level @target_set_id=@target_set_id, @type_skeleton=N'Server/Database', @level_name=N'Database', @condition_name=N'', @target_set_level_id=0

GO

Declare @policy_id int

EXEC msdb.dbo.sp_syspolicy_add_policy @name=N'Checks AutoShrink', @condition_name=N'AutoShrink False', @execution_mode=0, @policy_id=@policy_id OUTPUT, @root_condition_name=N'', @object_set=N'Checks AutoShrink_ObjectSet_1'

GO

Note: All policies created are stored in the msdb system database. After we create our policies, we have to make sure msdb is part of the backup strategy used in the environment, because this makes it possible to rebuild all the policies in case of a migration or a disaster in the current environment.
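A minimal sketch of adding msdb to the backup routine is shown below; the backup path is just a placeholder:

BACKUP DATABASE msdb
TO DISK = N'D:\Backup\msdb_policies.bak'   -- hypothetical path
WITH INIT, COMPRESSION;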

 

Importing Policies

In order to increase the efficiency in the administration of a database environment, we can import policies made available by Microsoft during the SQL Server installation process. These policies are stored in XML format files that are located in the Tools directory where SQL Server was installed.

The import of predefined policies has some advantages, because along with the policies all necessary conditions for the correct operation are created. In addition to the predefined policies we can import an XML file that was generated based on customized policies already created by the database administrator using the PBM.

To run the import of a policy we must perform the following steps:

1)  Using the Object Explorer, open the Management folder (see Figure 8).

2)  Within that folder, right-click the Policies folder and select the option Import Policy.

3)  This will open the window to import the XML file with policy settings, as shown in the Figure 13.

Image13

Figure 13. Importing a policy.

 

To avoid duplication during the import, select Replace duplicates with items imported to overwrite any policy and condition that has the same name as the policy being imported. When an existing policy is overwritten, the information from validations already carried out by that policy is not lost.

We can also preserve the default configuration of the policy being imported, as well as change it after the import, as Figure 14 shows.

Image14

Figure 14 . Definition of the status of the policy.

 

After importing the XML file containing the definitions of the policy, we can see it in the Policies folder. As shown in Figure 15, the new policy was created with the name Data and Log File Location.

Image15

Figure 15. Policy created through the import process.


It is interesting to note that, unlike the previously created policy, the policy created through the import process has a server restriction that limits the validation of its conditions to instances of SQL Server using the Enterprise or Standard editions.

Figure 16 shows the General information that was automatically inserted when imported through the policy XML file. The category, description and hyperlinks were populated, facilitating the process of documentation that states the reason why this policy has been implemented.

Image16

Figure 16. Description of the policy created through the import process.

 

Exporting Policies

Just as we can import policies from XML files, we can export policies already created to files in this format. From these files we can import the same policies on other SQL Server servers.

To perform the export policy procedure in XML format, there are two options:

  1. Exporting an existing policy.
  2. Exporting the current state of a facet.

To export an existing policy to an XML file, simply execute the following steps in SQL Server Management Studio:

1)  Using the Object Explorer, open the Management folder (Figure 8).

2)  Within that folder, open the folder Policy Management -> Policies to list existing policies (see Figure 17).

3)  Right-click the policy that you want to export and select Export Policy option (see Figure 18).

4)  As shown in Figure 19, select the desired location for the XML file and its name.

Image17

Figure 17. List of existing policies.

 

 

Image18
Figure 18. Export the policy.

 

 

Image19
Figure 19. Selecting policy storage location.

 

You can also export a policy by sourcing the current setting from the properties of a facet. For example, after setting up a customization for facet properties called Surface Area Configuration, you can export these settings to a file in XML format as follows:

1)  Through the Object Explorer, right-click the SQL Server instance and select Facets (see Figure 20).

2)  Select the facet you want, and then click Export Current State as Policy (Figure 21).

3)  According to Figure 22, fill in the name of the policy, the name of the condition and the export target of policy.

 

Image20
Figure 20. Listing the facets.

 

Image21
Figure 21. Exporting the current state of the facet as a policy.

 

Image22
Figure 22. Setting the policy storage location.

 

Policy Evaluation

Policy evaluation is the process in which we execute a policy against a given target and review the results. PBM allows a policy to be evaluated on a single instance or on a group of instances using the CMS. As the purpose of this article is the management of multiple instances, we will use the CMS to evaluate the Checks AutoShrink policy, created earlier, on two SQL Server database instances.

Table 1 shows the SQL Server instances that will be used to evaluate the policy defined through PBM.

Name of the Instance | Version         | Edition    | Build
VITADB\SQLCMS        | SQL Server 2014 | Enterprise | 12.0.2000
VITADB\SQLINSTANCE1  | SQL Server 2014 | Enterprise | 12.0.2000
VITADB\SQLINSTANCE2  | SQL Server 2014 | Enterprise | 12.0.2000

Table 1. Instances used by PBM

 

To use the VITADB\SQLCMS instance as our management server, we must perform the following steps:

1) Right-click the Central Management Servers option and select the option Register Central Management Server (see Figure 23).

2) In the dialog box New Server Registration, fill in the connection information, in accordance with the Figure 24.

3) Right-click the VITADB\SQLCMS instance and select New Server Registration (Figure 25).

4) Repeat the procedure described in step 3 and record the instances VITADB\SQLINSTANCE1 and VITADB\SQLINSTANCE2.

5) Right-click the VITADB\SQLCMS instance and select Evaluate Policies.

6) In the dialog box, select the instance that holds the list of policies to be evaluated and, as shown in Figure 26, click the Evaluate button to start the validation of the selected rule.

 

Image23
Figure 23. Starting the creation of a central management instance.

 

 

Image24

Figure 24. Connection properties.

 

 

Image25
Figure 25. Registering a new server in the CMS.

 

 

Image26
Figure 26. Selecting the policy to be evaluated.

 

After evaluating the Checks AutoShrink policy we can see, as in Figure 27, that there is a database named DBTeste1 on the VITADB\SQLINSTANCE1 instance and a database named DBTeste2 on the VITADB\SQLINSTANCE2 instance that are out of compliance with the policy. That is, they have the AutoShrink property enabled, contrary to the previously set condition, which determines that the correct state is for the property to be disabled.

 

Image27
Figure 27. Result of the policy Checks AutoShrink.

As you can see, with PBM we managed to create and evaluate policies on one or more database instances. In this way we have simpler and more efficient management of an environment composed of multiple instances.

In the last part of this series, we will take a look at the Data Collector tool and how to use it in relation to a centralized monitoring of instances of SQL Server.

 

Discover more about our expertise in SQL Server.

Categories: DBA Blogs

SQL 2016 – Improve Performance with the Query Store feature: Part 1

Mon, 2015-11-30 15:19

 

After playing with this great new feature for some time and speaking about it during a SQL Saturday in Montreal, I can say that I am very excited about Query Store in SQL 2016!
Performance tuning is our bread and butter here at Pythian and I think that almost every DBA will agree that, unless you have some 3rd party tool (and even if you have such a tool), it’s not always straight forward to tune T-SQL code. It’s usually a tedious process and it’s a science by itself that requires a good background and understanding of how things work “behind the scenes”, as well as how to help the optimizer “make” the right decisions.

Here are some great links to start with:

If you don’t have a dev/testing environment, you can even start with Azure DB which has this feature already. Just note that this feature is currently at a CTP version, meaning that it is not yet supported. Azure DB is cheap and affordable. Another option would be using a dev/test Azure Virtual Machine with SQL Server 2016 on it.
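If you want to try it on a SQL Server 2016 test database, enabling Query Store is a one-liner; the database name below is a placeholder and the defaults can be tuned later:

ALTER DATABASE [MyTestDB] SET QUERY_STORE = ON;
ALTER DATABASE [MyTestDB] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);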

 

Good luck and feel free leave us questions or comments!

 

Discover more about our expertise in SQL Server.

Categories: DBA Blogs

Data Collector and Policy-Based Management: Part 1

Fri, 2015-11-27 14:23

 

This article will address the main techniques for the efficient management of multiple SQL Server 2014 instances using the Data Collector and Policy-Based Management tools. In addition, it will demonstrate in a practical way how to configure each of the tools, as well as how to extract and analyze the collected metrics.

With the exponential growth in the amount of data generated by applications comes increased complexity in managing database environments for the database administrator. This growth, combined with the low cost of storage media and servers, created a scenario in which the database administrator went from administering dozens of databases to administering hundreds of them.

Since the main responsibilities of a database administrator are to ensure the integrity, performance and stability of all SQL Server instances under their administration, the greater the number of instances and databases used within an enterprise, the greater the difficulty in monitoring and managing such an environment in a proactive and automated fashion.

For this type of scenario, SQL Server makes it possible to centralize both the execution of an administrator's routine tasks and the collection of performance metrics from all existing instances and databases, through the Data Collector (DC) and Policy-Based Management (PBM) tools. For example, suppose all databases that have the recovery model set to Full must perform a backup of the log file every hour. Instead of this policy existing only as a concept, requiring you to check manually on all database servers, you can use PBM to physically create a policy and ensure that it is applied at once on all SQL Server instances.

In order to facilitate the understanding of the management of multiple instances, the presentation of the tools will be performed in the following order: first we will look at the Policy-Based Management and then we will know the Data Collector.

 

What is the Policy-Based Management?

Policy-Based Management (PBM) is a feature available since SQL Server 2008 that enables the creation and implementation of policies on SQL Server instances. PBM works similarly to the group policies created through Active Directory.

Note: Group policies provide centralized management of applications and users, by means of rules created by system administrators and that can be applied at various levels of the directory structure defined in Active Directory.

PBM applies a policy to a particular target, for example a database, a table, a view or a stored procedure, and then checks whether the target complies with the rules of the policy. If the target does not comply, it is possible either to enforce the rules of the policy or to raise an alert to the database administrator so he/she knows of the violation.

One of the great advantages of PBM is the ability to apply a policy to multiple SQL Server instances at once, facilitating the administration and management of the corporation's entire database infrastructure.

Many features of SQL Server 2014, such as Resource Governor, Data Compression and In-Memory OLTP, need the Enterprise or Developer editions. This is not the case for PBM, which is available in all editions of SQL Server, including Express (although with the Express Edition it is not possible to create a Central Management Server).

As soon as a SQL Server 2014 instance is installed, it is possible to create and evaluate policies against any existing SQL Server in your environment, including versions prior to 2014.

 

Policy-Based Management Components

The PBM is composed of three main components: Policies, Conditions and Facets, as shown in Figure 1. These components are arranged in a sort of hierarchical order for using the PBM. A facet is required for creating a condition, and the condition is necessary for the creation of policies. The policies, in turn, are applied to specific targets .

Image1
Figure 1. Policy Management in SQL Server 2014.

 

Targets

The targets are the objects managed by a particular policy and can be of various types: servers, databases, instances, stored procedures, etc. An important detail is that you can use more than one target at the same time in a policy. For example, we can have a policy which states that only object names starting with the db_ prefix are correct and validate this rule on the tables, functions and stored procedures of one or more instances at the same time.

 

Facets

A facet is a group of properties that relate to a particular target. SQL Server 2014 has 86 facets, each containing several different properties. This allows the use of hundreds of properties in the creation of a policy.

You can view the properties of a facet by expanding the Facets folder and double-clicking any of the options. For example, the Data File facet has several properties, such as the maximum size of the data file, the number of reads and writes and whether the data file is online, as shown in Figure 2.

Note: The facets are available as read-only, i.e. it is not possible to create customized facets or modifications of existing ones. Currently new facets can be included only by Microsoft, through service packs or by upgrading the version of SQL Server.

Image2
Figure 2. Facet Properties Data File.

 

Conditions

A condition can be described as a rule for a policy to be evaluated. Basically, the rule checks a target and, if this target is not in accordance with the policy, it fails. It is worth mentioning that a policy can evaluate only one condition, however it is possible that a condition has different rules for different properties. A condition can be viewed by expanding the Conditions folder and double-clicking any one of the available options, as shown in Figure 3.

Note: On a SQL Server instance there will be no customized conditions unless they are imported or created manually; that is, initially there will only be the system conditions.

Image3
Figure 3. Condition created on Facet Database.

 

Policies

The policies are complete packages that include conditions, facets, targets, assessment modes and Server restrictions (the evaluation modes and the server are discussed in the next topic).

When created, the policies are stored in the msdb system database, but you can export them to an XML format. This portability allows database administrators to share and compare the created policies with greater ease. To view a policy, expand the Policies folder and double-click any one of the options, as shown in Figure 4.

Image4
Figure 4. Details of a policy created

Note: On a SQL Server instance there will be no customized policies unless they are imported or created manually; that is, there will be only the initial policies.
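Since the policies live in msdb, a quick way to list what exists on an instance is to query the PBM catalog views; this is a sketch, and the exact column names should be verified against your version:

SELECT p.name AS policy_name,
       c.name AS condition_name,
       p.is_enabled
FROM msdb.dbo.syspolicy_policies AS p
JOIN msdb.dbo.syspolicy_conditions AS c
  ON c.condition_id = p.condition_id;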

 

Policy Evaluation Modes

PBM has four distinct evaluation modes for a policy, which determine how the evaluation will occur against a predefined target. The following evaluation modes may be available, depending on the facet used in the policy:

  • On Demand: This evaluation mode specifies that the evaluation should occur manually. By default, any policy with this evaluation mode is disabled automatically after it is created. However, even though it is disabled, it can still be evaluated at any time.
  • On Schedule: By selecting this mode you can schedule the policy to be evaluated at any time. You can select an existing schedule or create a new one that meets your needs. Creating a schedule allows you to set options such as the recurrence of execution, the frequency of execution per day and per hour, and for how long the policy should be executed. For example, you could run a particular policy for the next two weeks.
  • On Change: Log Only: When you select this mode, the policy will be evaluated only when a change is made to the specified target. If the change violates the policy, the change is still carried out and the results of the violation are stored in the event log and in the msdb system database. This evaluation mode helps the database administrator without affecting the performance of the environment.
  • On Change: Prevent: This evaluation mode is very similar to On Change: Log Only; that is, the evaluation happens the moment an event makes a change to the target. But unlike the Log Only option, Prevent rolls back any change which violates the policy.

Figure 5 shows an example of a policy and the evaluation modes available for it.

Image5
Figure 5. Evaluation Modes.

 

Server Restrictions

In conjunction with the targets and the facets, server restrictions are another way to control how a policy is evaluated. A server restriction is nothing more than a condition used to exclude particular servers from a policy through the Server facet.

With PBM, you can create a server restriction to limit the evaluation of a policy to instances of SQL Server using the Standard or Enterprise editions. When this policy is applied, it will not be evaluated on instances that do not use these specific editions.

 

Management of Policies

SQL Server 2014 has some features that facilitate the management and evaluation of policies created. One of these features is the ability to create categories to group similar policies and use the Central Management Server (CMS) to execute the policies throughout the database environment.

 

Categories

Categories are logical groups of one or more policies that assist in their management and execution. For example, you can create a policy group that will be evaluated only in test or development environments. When a policy is created, you can specify a category in the Description options, as shown in Figure 6.

Image6

Figure 6. Category definition

 

Central Management Server (CMS)

CMS functionality is not part of the architecture of the PBM, but has become extremely important in the use of policies in a SQL Server database consisting of multiple servers.

Through the CMS you can specify a database instance (or more than one) to be a central management server and store a list of registered instances that can be organized into one or more groups, as shown in Figure 7.

Image7
Figure 7. Central Management Server (CMS)

 

Alerts

Once the policies are configured and implemented, there is no need to constantly check the servers to make sure that they are in accordance with the conditions set out in the policies. Instead, we can use the SQL Server Agent alerts to receive notifications automatically when a policy is violated.
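As a sketch, an alert can be attached to the error numbers that PBM raises when a policy fails; the message IDs 34050 through 34053 are commonly cited for the different evaluation modes, but verify them in sys.messages for your version, and the operator name below is a placeholder:

EXEC msdb.dbo.sp_add_alert
     @name = N'PBM - scheduled policy failure',
     @message_id = 34052,   -- assumed PBM error number for On Schedule failures; verify before use
     @severity = 0,
     @enabled = 1;

EXEC msdb.dbo.sp_add_notification
     @alert_name = N'PBM - scheduled policy failure',
     @operator_name = N'DBA Team',   -- hypothetical operator
     @notification_method = 1;       -- 1 = e-mail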

In the next post of this 3 part series we will learn how to create a policy and how to use it.

 

Discover more about our expertise in SQL Server.

Categories: DBA Blogs

SQL On The Edge #5 – Custom Visualizations in PowerBi

Fri, 2015-11-27 13:39

 

Have you ever used Microsoft’s PowerBi service? If you have, have you used it within the last year? The reason I ask is that the current PowerBi service is so radically different from the initial release that pretty much only the name is what they have in common. Today I’m going to do a short summary of how we got here, where the service is, where it’s going and the actual topic for my video, the new awesome custom visualization functionality!

A Short History Lesson

A few years ago, Microsoft went on the direction of empowering business analysts on what is one of the most common business applications in the world: Excel. With this push, they started releasing some amazing plugins for Excel: PowerPivot, PowerQuery, PowerView, PowerMap. Suddenly we could import millions of rows from all kinds of sources into Excel! And transform them! And visualize them! Then with the release of Office 365 and Sharepoint Online, a service was created to make it easy to share and consume all these Excel-born reports. And thus PowerBi was born but it required all these other tools and subscriptions. It didn’t catch on.

Fast Forward To Today

This initial offering of PowerBi had too many external dependencies. You needed Excel with all the bells and whistles to do the reports and then all these other satellite cloud services. Thankfully someone saw and executed on a clearer vision: one service, no dependencies, no cost of entry, fully mobile friendly. It’s been a very interesting journey from a market perspective to see how MS released something, course corrected and then went head-on with their improved vision.

The PowerBi desktop designer is free. The mobile apps (all major mobile OS’es AND Windows Phone), free as well. The service itself also has a free fully functional tier, it’s only limited by the amount of data and the enterprise sharing capabilities. Add the ease of use of the tools and the natural language query capabilities and this is now a strong tool that can finally become a contender with the Tableau and Qlikviews of the world.

No, it’s not perfect but it is growing and an insane amount of new features are getting added constantly. New vendors are adding content packs and now custom visualizations have made an appearance.

Community Involvement

The idea behind the first batch of custom visuals was great. MS opened a contest for people to submit their best custom visuals and the community responded with amazing creativity and participation. Not only do these immediately provide more value to every current user of PowerBi but they also serve as examples for further development of more custom visuals.

The full gallery of custom visuals can be found in the PowerBi Visual Gallery.

And if you don’t have your PowerBi account, try it out, it’s free! Head over to the official PowerBi site.

Demo Time

For the demo of working with a PowerBi project on the desktop, online and importing a custom visual, let’s check out the video! Enjoy!

 

 

Discover more about our expertise in SQL Server.

Categories: DBA Blogs

Log Buffer #451: A Carnival of the Vanities for DBAs

Fri, 2015-11-27 12:59

 

The show goes on. This Log Buffer Edition picks some blogs which are discussing new and old features of Oracle, SQL Server and MySQL.

Oracle:

  • Directory Usage Parameters (ldap.ora) list the host names and port number of the primary and alternate LDAP directory servers.
  • Data Visualization Cloud Service (DVCS) is a new Oracle Cloud Service. It is a subset offering of the currently supported Business Intelligence Cloud Service (BICS).
  • ORA-24247: network access denied by access control list (ACL).
  • Latches are low level serialization mechanisms, which protect memory areas inside the SGA. They are lightweight and less sophisticated than enqueues and can be acquired and released very quickly.
  • handling disks for ASM – when DB, Linux and Storage admins work together.

SQL Server:

  • How to use the Performance Counter to measure performance and activity in Microsoft Data Mining.
  • Phil Factor demonstrates a PowerShell-based technique taking the tedium out of testing SQL DML.
  • Sandeep Mittal provides an introduction to the COALESCE function and shows us how to use it.
  • Hadoop many flavors of SQL.
  • Installing and Getting Started With Semantic Search.

MySQL:

  • Support for storing and querying JSON within SQL is progressing for the ANSI/ISO SQL Standard, and for MySQL 5.7.
  • Loss-less failover using MySQL semi-syncronous replication and MySQL Fabric!
  • Memory consumption The binary format of the JSON data type should consume more memory.
  • This post compares a B-Tree and LSM for read, write and space amplification. The comparison is done in theory and practice so expect some handwaving mixed with data from iostat and vmstat collected while running the Linkbench workload.
  • If you do not have a reliable network access (i.e. in some remote places) or need something really small to store your data you can now use Intel Edison.

 

Learn more about Pythian’s expertise in Oracle SQL Server & MySQL.

Categories: DBA Blogs

Delayed Durability and Impact on Mirroring

Fri, 2015-11-27 12:39
Objective

To test the delayed durability feature with mirroring in high performance and high safety modes. The idea is to confirm what the performance improvement of the transactions is, and whether it has any benefit on high latency networks.

 

Test Scenario

We have two databases configured with mirroring in high safety mode, which requires that commits happen at the secondary first. One database, called "DalayedTEST", has delayed durability enabled in FORCED mode. I have configured performance counters to check the latency and performance of each database every second, adding mirroring, transaction, lock and physical disk counters so we can compare the performance with and without delayed durability in a mirrored environment. Then we are going to switch mirroring to high performance mode and see its behaviour as well. No witness is configured.
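For reference, this is roughly how the delayed durability setting is applied to the test database (using the database name from the scenario; FORCED makes every commit lazy regardless of session settings):

ALTER DATABASE [DalayedTEST] SET DELAYED_DURABILITY = FORCED;
-- Other possible values: DISABLED (the default) and ALLOWED (per-transaction opt-in)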

 

1img

2img

Using SQLQueryStress tool I am loading 5000 transactions to a log table with 50 threads enabled. This is equivalent to 50 persons loading 5000 records to the same table at the same time. Should be a good test to analyze behaviour of the databases.

Normal Database Test (Full Transactions) High Safety Mode

3img

 

Counters

 

Mirroring

4img

 

 

Locks and Transactions

5img

 

Physical Disk

6img

 

Failover

A failover happens 30 seconds after starting the test. The database is able to fail over and the record count is 98850.

7img

 

8img

Delayed Database Test (Delayed Transactions) High Safety Mode

9img

 

 

Counters

 

Mirroring

10img

 

Locks and Transactions

11img

 

Physical Disk

12img

 

 

Failover

A failover happens 30 seconds after starting the test. The database is able to fail over and the record count is 165757.

13img

Conclusion – High Safety Mode

Mirroring High Safety Mode

conclusionHighSafety

Notes:

Having delayed transactions enabled in a database with high safety mirroring improves performance under high contention scenarios. In this case having lots of transactions running at the same time and requesting for the same table object proved to be better, as the execution time was faster when using delayed transactions.

When checking the mirroring behaviour, one can see that the mirroring log queue is bigger with delayed transactions. The difference is that with full transactions the queue is processed sequentially, which decreases the queue as the transactions complete. As a result, you will see a high spike at the beginning of the stress test, which is the full queue, and then it decreases over time. When using delayed transactions one can see spikes spread evenly every 20 seconds or so; this means that one batch is processed, then another batch is processed, and this carries on until the queue is empty.

So does sequential versus batch processing of the mirror queue make a difference in latency? Clearly yes: transaction delay is three times worse with delayed transactions, because the batches are processed cyclically, which saturates the endpoint on every iteration and keeps transactions waiting for a commit acknowledgement from the mirroring partner. In short, delayed transactions under high contention are faster locally, with fewer logical reads and writes, but slow down the mirroring communication because it works in batches.

But was total execution time better with delayed transactions?

Delayed transactions improve execution time but increase mirroring delay, so it is simply a matter of doing the math. Local processing was so much faster that it compensated for the higher mirroring transaction delay, making this scenario faster overall (I used a local network). However, if you add other variables to the stress test, such as a high-latency network for the mirroring traffic, the local gain might not compensate and total execution time can end up about the same or worse.
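The queueing and delay described above can also be sampled from T-SQL rather than Performance Monitor; a sketch using the standard mirroring counters exposed through sys.dm_os_performance_counters:

-- Observe the mirroring log send queue and transaction delay on the principal.
SELECT object_name, counter_name, instance_name, cntr_value
FROM   sys.dm_os_performance_counters
WHERE  object_name LIKE '%Database Mirroring%'
  AND  counter_name IN ('Log Send Queue KB', 'Transaction Delay');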

 

Now, let's see the numbers in high-performance mode (a sketch of switching the mirroring session follows):
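Switching an existing mirroring session from high safety to high performance is a single setting on the principal; a minimal sketch, reusing the database name from the post:

-- Switch the mirroring session to high-performance (asynchronous) mode.
ALTER DATABASE DalayedTEST SET PARTNER SAFETY OFF;

-- Switch back to high-safety (synchronous) mode when needed.
-- ALTER DATABASE DalayedTEST SET PARTNER SAFETY FULL;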

Normal Database Test (Full Transactions) High Performance Mode

[Figure: test run results (full transactions, high-performance mode)]

 

Counters

Mirroring

[Figure: mirroring counters (full transactions, high-performance mode)]

Locks and Transactions

[Figure: lock and transaction counters (full transactions, high-performance mode)]

Physical Disk

[Figure: physical disk counters (full transactions, high-performance mode)]

 

Delayed Database Test (Delayed Transactions) High Performance Mode

[Figure: test run results (delayed transactions, high-performance mode)]

 

Counters

Mirroring

[Figure: mirroring counters (delayed transactions, high-performance mode)]

Locks and Transactions

[Figure: lock and transaction counters (delayed transactions, high-performance mode)]

Physical Disk

[Figure: physical disk counters (delayed transactions, high-performance mode)]

 

Conclusion – High Performance Mode

Mirroring High Performance Mode

[Figure: results comparison, high-performance mode]

Notes:

With high-performance mirroring, delayed transactions did not improve performance, even though the indicators show less locking and fewer logical reads. It seems the mirroring queue was not handled as efficiently: the queue is considerably bigger with delayed transactions, and despite running in high-performance mode the overall execution time was the worst in this test environment. After realizing this I ran further tests to see whether the results were consistent; generally speaking the total execution time was almost the same, with a difference of 4-10 seconds at most.

 

General Conclusion

Across all the tests, when using delayed transactions the mirroring queue and the amount of data transferred are consistently and considerably larger than with full transactions.

When using high-safety mode, delayed transactions show better disk performance, which compensates for the higher mirroring/network values and results in a faster execution time. This can change on a high-latency network, where performance may end up the same or worse.

When using high-performance mode, delayed transactions show fair-to-good disk performance, but not good enough to compensate for the higher mirroring/network values, resulting in the same or slower execution time.

I would recommend enabling delayed transactions only to fix high contention, and avoiding them on high-latency or very busy networks when mirroring in high-performance mode. These recommendations are based on a small test in a controlled environment and will not necessarily extend to every environment out there. The rule of thumb is that delayed transactions do impact mirroring performance, and mirroring is itself known to introduce certain performance issues depending on the system, so do not take the decision to use delayed transactions lightly; test first in a staging environment before rolling into production.
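For reference, delayed durability does not have to be forced for the whole database; a sketch of the scoping options (dbo.LoadLog is the hypothetical table from the earlier workload sketch):

-- Database-level setting: DISABLED (default), ALLOWED or FORCED.
ALTER DATABASE DalayedTEST SET DELAYED_DURABILITY = ALLOWED;

-- With ALLOWED, individual transactions opt in at commit time, limiting
-- the durability risk to the statements that actually need the speed-up.
BEGIN TRANSACTION;
    INSERT INTO dbo.LoadLog (Payload) VALUES ('contended insert');
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);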

 

Discover more about our expertise in SQL Server.

Categories: DBA Blogs

ORACLE E-BUSINESS SUITE: VIRTUAL HOST NAMES SOLUTION

Fri, 2015-11-27 11:39

 

This blog post is a continuation of an earlier post with my musings on Oracle EBS support for virtual host names.

Actually, most parts of Oracle E-Business Suite work with virtual host names without any problem. The only component that doesn’t work with virtual host names is the Concurrent Managers. The Concurrent Managers expect the node name defined in the Concurrent Manager definition screen to match the host name the FNDLIBR executable reads at the server level. Having the virtual host name as an alias in the server’s hosts file doesn’t cut it for FNDLIBR, because it reads the server’s host name using the Unix gethostname system call.

This behaviour of FNDLIBR can be worked around by overriding the Unix gethostname system call using the LD_PRELOAD mechanism. There is already a prebuilt program on GitHub that does exactly this, called fakehostname. I have tested it and verified that it works with Oracle EBS 11i, R12.0 and R12.1 without any problem.

Here is a demo:


$ hostname
ebs
$ export LD_PRELOAD=/home/oracle/fakehost/libfakehostname.so.1
$ export MYHOSTNAME=ebsfakehost
$ hostname
ebsfakehost
$ export MYHOSTNAME=newebshost
$ hostname
newebshost

 

This utility makes the Concurrent Managers think they are running on the virtual host by overriding the gethostname system call. However, this method of getting EBS to work with virtual host names no longer works with EBS R12.2, because R12.2 ships with a mix of 32-bit and 64-bit components. Earlier releases of EBS, such as 11i, 12.0 and 12.1, are 32-bit only, even though they run on 64-bit platforms. We can get EBS R12.2 working by putting both the 32-bit and 64-bit versions of the fakehostname library in LD_PRELOAD, but EBS then emits too many warning messages about not being able to load 32-bit/64-bit libraries, which defeats the whole purpose of having a simple solution.

I am working on another way of getting virtual host names working in EBS R12.2. I will post that in my next blog post. Stay tuned!

 

Discover more about our expertise in the world of Oracle.

Categories: DBA Blogs