
DBA Blogs

TRUNCATEing a Table makes an UNUSABLE Index VALID again

Hemant K Chitale - Sun, 2016-05-22 09:54
Here's something I learned from Jonathan Lewis some time ago.

If you set an Index to be UNUSABLE but later issue a TRUNCATE TABLE, the Index becomes VALID again --- i.e. the Index gets updated with rows subsequently inserted.

SQL> connect hemant/hemant
Connected.
SQL> drop table target_data purge;

Table dropped.

SQL> create table target_data as select * from source_data where 1=2;

Table created.

SQL> create index target_data_ndx_1
2 on target_data(owner, object_type, object_name);

Index created.

SQL> insert /*+ APPEND */ into target_data
2 select * from source_data;

367156 rows created.

SQL> commit;

Commit complete.

SQL> col segment_name format a30
SQL> select segment_name, segment_type, bytes/1048576
2 from user_segments
3 where segment_name like 'TARGET_DATA%'
4 order by 1;

SEGMENT_NAME SEGMENT_TYPE BYTES/1048576
------------------------------ ------------------ -------------
TARGET_DATA TABLE 49
TARGET_DATA_NDX_1 INDEX 19

SQL>
SQL> col index_name format a30
SQL> select index_name, status
2 from user_indexes
3 where table_name = 'TARGET_DATA';

INDEX_NAME STATUS
------------------------------ --------
TARGET_DATA_NDX_1 VALID

SQL>


So, I have a VALID Index on my Table.

I now make it UNUSABLE and add rows to it.

SQL> alter index target_Data_ndx_1 unusable;

Index altered.

SQL> select status
2 from user_indexes
3 where index_name = 'TARGET_DATA_NDX_1';

STATUS
--------
UNUSABLE

SQL> insert /*+ APPEND */ into target_data
2 select * from source_data;

367156 rows created.

SQL> commit;

Commit complete.

SQL> select index_name, status
2 from user_indexes
3 where table_name = 'TARGET_DATA';

INDEX_NAME STATUS
------------------------------ --------
TARGET_DATA_NDX_1 UNUSABLE

SQL> select segment_name, segment_type, bytes/1048576
2 from user_segments
3 where segment_name like 'TARGET_DATA%'
4 order by 1;

SEGMENT_NAME SEGMENT_TYPE BYTES/1048576
------------------------------ ------------------ -------------
TARGET_DATA TABLE 104

SQL>


Oracle actually drops the Index segment (so you don't see it in USER_SEGMENTS) when it is set to UNUSABLE, although the Index definition is still present.  The Index doesn't "grow" because the Segment doesn't exist.
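(As an aside, and not part of this demonstration: the conventional way to re-instantiate the dropped segment and make such an Index usable again is an explicit rebuild, roughly like this.)

alter index target_data_ndx_1 rebuild;

The point of this post, of course, is that a TRUNCATE has the same revalidating side effect without an explicit rebuild.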

Let me TRUNCATE the table.

SQL> truncate table target_data;

Table truncated.

SQL> select segment_name, segment_type, bytes/1048576
2 from user_segments
3 where segment_name like 'TARGET_DATA%'
4 order by 1;

SEGMENT_NAME SEGMENT_TYPE BYTES/1048576
------------------------------ ------------------ -------------
TARGET_DATA TABLE .0625
TARGET_DATA_NDX_1 INDEX .0625

SQL> select index_name, status
2 from user_indexes
3 where table_name = 'TARGET_DATA';

INDEX_NAME STATUS
------------------------------ --------
TARGET_DATA_NDX_1 VALID

SQL>


Immediately after the TRUNCATE TABLE, the Index Segment is re-instantiated and the Index becomes VALID again.  So inserting rows will update the Index.  My last explicit command against the Index was ALTER INDEX ... UNUSABLE, but that is clearly no longer its current state!

SQL> insert /*+ APPEND */ into target_data
2 select * from source_data;

367156 rows created.

SQL> commit;

Commit complete.

SQL> select segment_name, segment_type, bytes/1048576
2 from user_segments
3 where segment_name like 'TARGET_DATA%'
4 order by 1;

SEGMENT_NAME SEGMENT_TYPE BYTES/1048576
------------------------------ ------------------ -------------
TARGET_DATA TABLE 49
TARGET_DATA_NDX_1 INDEX 19

SQL>


So, repopulating the Table has expanded the Index again.
.
.
.


Categories: DBA Blogs

“What do you mean there’s line breaks in the address?” said SQLLDR

RDBMS Insight - Fri, 2016-05-20 18:55

I had a large-ish CSV to load and a problem: line breaks inside some of the delimited fields.

Like these two records:

one, two, "three beans", four
five, six, "seven
beans", "eight wonderful beans"

SQL Loader simply won’t handle this, as plenty of sad forum posts attest. The file needs pre-processing and here is a little python script to do it, adapted from Jmoreland91’s solution on Stack Overflow.

import csv

def hrtstrip(inputfile, outputfile, newtext):
    """Rewrite a CSV file, replacing line breaks inside quoted fields with newtext."""
    print("Input file " + inputfile)
    print("Output file " + outputfile)
    with open(inputfile, "r") as infile, open(outputfile, "w") as outfile:
        # QUOTE_NONNUMERIC keeps text fields quoted on the way out;
        # lineterminator='\n' avoids the blank lines csv.writer can otherwise add.
        writer = csv.writer(outfile, delimiter=',', quotechar='"',
                            quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
        for record in csv.reader(infile):
            # Strip embedded newlines from every field before writing the row back out.
            writer.writerow(tuple(s.replace("\n", newtext) for s in record))
    print("All done")

Thanks to Jmoreland91 for this. If you use it, give him an updoot.

edit – Jason Bucata (@tech31842) tweeted me another StackOverflow with a number of scripts in assorted languages: http://stackoverflow.com/questions/33994244/how-to-remove-newlines-inside-csv-cells-using-regex-terminal-tools

Categories: DBA Blogs

Application Development Platform (Java Cloud, Application Container Cloud, Developer Cloud) New Release 16.2.1

Oracle Cloud has already moved to the latest 16.2.1 release of Application Development platform with Java Cloud Service,  Application Container Cloud Service, and ...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Understanding The Database Options – AutoClose

Pythian Group - Fri, 2016-05-20 13:03

In this blog post we’ll cover how AutoClose works, and why it’s recommended to disable this property.

When the SQL Server service is started, it locates the master system database and requests an exclusive lock on its data and log files.

After that, SQL Server reads the sys.master_files system view to find all the data files (.mdf, .ndf) and transaction log files (.ldf) of every database stored on the instance, and also requests an exclusive lock on these files to initialize each of the databases.

The first situation in which the AutoClose property can hurt performance is the acquisition of this exclusive lock on the data and log files.

If the property is off, this lock is held from service startup until the service is stopped. If the property is enabled, however, the lock is released as soon as there is no more activity in the database, and the data and log files become available to any other process.

Initially this may seem very interesting, because we could then manipulate the data and log files and perform some administrative tasks, such as a backup.

Now imagine that during any backup, an application needs to access the database, what would happen?

SQL Server would return an error reporting that it could not acquire an exclusive lock on the files, and the database initialization would fail.

Another big performance problem resulting from the AutoClose property is related to two memory areas: the Buffer Cache and the Plan Cache.

Whenever a query is executed, SQL Server generates an execution plan that is kept in an area of memory called the Plan Cache. This area stores the generated execution plans so that they can be reused if the query is executed again.

After the execution plan is generated, all the query operators are executed and the data pages read by the query are stored in an area of memory called the Buffer Cache. This area keeps data pages in memory so that new accesses to the disk subsystem are avoided, optimizing subsequent I/O requests.

When the AutoClose property is enabled and there are no more connections to the database, all the data pages and execution plans for that database are removed from memory, creating a big drop in performance the next time the database is used.

Let's run a small demonstration just to make this behavior clear.

First, enable the AutoClose property on the AdventureWorks2012 database, as in the script below:

USE master
GO
ALTER DATABASE [AdventureWorks2012] SET AUTO_CLOSE ON WITH NO_WAIT
GO

Then let’s perform some queries in the database AdventureWorks2012, as script below:

USE AdventureWorks2012
GO
SELECT * FROM Person.person
GO
SELECT * FROM Sales.salesorderheader

 

After executing the queries, we can use the DMV sys.dm_os_buffer_descriptors to analyze the number of data pages that have been allocated in memory for the AdventureWorks2012 database, as illustrated in Figure 1:

Figure 1 – Data pages that have been allocated in memory

With the DMVs sys.dm_exec_cached_plans and sys.dm_exec_sql_text we can check the execution plans that were stored in memory for the executed queries, as illustrated in Figure 2.

Figure 2 – Execution plans stored in memory

So when all connections to the AdventureWorks2012 database are closed, all these memory areas are cleaned out, as the queries and Figure 3 below show:

--Amount of data pages in memory
SELECT
    COUNT(*) AS TotalPages,
    DB_NAME(database_id) AS DBName
FROM
    sys.dm_os_buffer_descriptors
GROUP BY
    DB_NAME(database_id)
ORDER BY
    1 DESC

--Amount of in-memory execution plans
SELECT
    COUNT(*) AS TotalPlans
FROM
    sys.dm_exec_cached_plans cp
CROSS APPLY
    sys.dm_exec_sql_text(cp.plan_handle) st
WHERE
    st.[dbid] = 7            -- database_id of AdventureWorks2012 in this demo
    AND cp.objtype = 'Adhoc'

Figure 3 – Memory usage after all connections are closed

From this demonstration it is easy to conclude that the AutoClose property should always be disabled, given the performance problems it can bring to a high-performance database.
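To turn the property back off, the command is simply the counterpart of the script used at the start of the demonstration:

USE master
GO
ALTER DATABASE [AdventureWorks2012] SET AUTO_CLOSE OFF WITH NO_WAIT
GO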

Categories: DBA Blogs

An Effective Approach to Migrate Dynamic Thrift Data to CQL: Part 2

Pythian Group - Fri, 2016-05-20 12:22

Note that this post is Part 2 of a 3-part blog series. If you haven’t read Part 1 of this series, please do so before continuing. Part 1 gives some background knowledge of the problem we’re trying to solve, as well as some fundamental concepts used in the following discussion. The chapter number sequencing also follows that from Part 1. (note: Part 3 can be found here)

4. Static and Dynamic Cassandra Table Definition

In Thrift, Cassandra tables can be defined either statically or dynamically. When a table is defined dynamically, an application can add new columns on the fly, and the column definition for each storage row doesn't necessarily need to be the same. Although this is a bit more flexible, it can be problematic as well, because the dynamic column definition information is only available in the application and invisible to the outside world.

In CQL, however, tables are always statically defined, which means that any column and its type information has to be defined in advance before it can be used. Each row in a CQL table has exactly the same columns and type definitions.

In CQL, the dynamism of a Thrift table definition can be achieved through clustering columns and/or more advanced column types like collections and user defined types (UDTs).
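As a quick illustration (a sketch only, not the migration approach discussed later in this series; the table name blog_tags is hypothetical), a map collection can hold arbitrary name/value pairs that a Thrift application would otherwise have stored as dynamic columns:

CREATE TABLE blog_tags (
    key int PRIMARY KEY,
    author text,
    tags map<text, text>   -- arbitrary tag-name/tag-value pairs per row
);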

4.1 Static Table Definition

In Thrift, a statically defined table has column_metadata information in the table definition statement, as follows:

create column family avg_grade
    with key_validation_class = Int32Type
     and comparator = UTF8Type
     and column_metadata = [
       {column_name: class_id, validation_class: Int32Type},
       {column_name: grade, validation_class: FloatType}
     ]

A strictly equivalent CQL table definition is like this (note the “WITH COMPACT STORAGE” table property):

create table avg_grade (
    key int primary key,
    class_id int,
    grade float
) with compact storage

A table statically defined in either the Thrift utility (cassandra-cli) or the CQL utility (cqlsh) can be accessed in the other with no problem. One difference between the Thrift and CQL definitions is that in the CQL definition the row key has a name, but in the Thrift definition it doesn't. In this case, when accessing a table defined in Thrift, CQL uses a default name ("key") for the row key.

4.2 Dynamic Table Definition

In Thrift, a dynamically defined table does NOT have column_metadata information in the table definition statement. Typically, time-series applications adopt dynamic table definitions. For example, for a sensor monitoring application, we may use the sensor id as the storage row key and, for each sensor, record the different event values detected by that sensor within a period of time. An example table definition is as follows:

create column family sensor_data
   with key_validation_class = Int32Type
    and comparator = TimeUUIDType
    and default_validation_class = DecimalType;

Suppose that for this table, 2 events are recorded for sensor 1 and 1 event is recorded for sensor 2. The output from the cassandra-cli utility is as below:

-------------------
RowKey: 1
=> (name=1171d7e0-14d2-11e6-858b-5f3b22f4f11c, value=21.5, timestamp=1462680314974000)
=> (name=23371940-14d2-11e6-858b-5f3b22f4f11c, value=32.1, timestamp=1462680344792000)
-------------------
RowKey: 2
=> (name=7537fcf0-14d2-11e6-858b-5f3b22f4f11c, value=11.0, timestamp=1462680482367000)

The output above shows that the columns for each row are dynamically generated by the application and can differ between rows. In CQL, a strictly equivalent table definition and output for the above dynamic Thrift table is as below:

CREATE TABLE sensor_data (
    key int,
    column1 timeuuid,
    value decimal,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
key  | column1                              | value
-----+--------------------------------------+-------------------------
   1 | 1171d7e0-14d2-11e6-858b-5f3b22f4f11c | 0E-1077248000
   1 | 23371940-14d2-11e6-858b-5f3b22f4f11c | -8.58993459E-1077939396
   2 | 7537fcf0-14d2-11e6-858b-5f3b22f4f11c | 0E-1076232192

Since the columns are generated on the fly by the application, CQL doesn't know the column names in advance. So it uses the default column name "column1" (and also the default key name "key") in its definition. Functionally, this can be transformed into an equivalent, more descriptive definition by using the "ALTER TABLE" CQL command to rename the columns (e.g. "key" to "sensor_id", "column1" to "event_time"), as below:

CREATE TABLE sensor_data (
    sensor_id int,
    event_time timeuuid,
    value decimal,
    PRIMARY KEY (sensor_id, event_time)
) WITH COMPACT STORAGE
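As a sketch, the renames themselves would look like this in cqlsh (quoting "key" simply avoids any clash with the CQL keyword; only primary key columns can be renamed in CQL, which both of these are):

ALTER TABLE sensor_data RENAME "key" TO sensor_id;
ALTER TABLE sensor_data RENAME column1 TO event_time;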
4.3 Mixed Table Definition

In Thrift, a table can also be in a mixed mode, which means that when the table is created, part of its column information is defined in column_metadata, just like in a static table, but during runtime a Thrift based application can still add more columns on the fly.

The table below, blogs, is such an example. This table has one statically defined column, "author". There are also 3 more columns (tags:category, tags:rating, and tags:recommend) for RowKey 1 that were defined on the fly.

create column family blogs
    with key_validation_class = Int32Type
     and comparator = UTF8Type
     and column_metadata = [
       {column_name: author, validation_class: UTF8Type}
     ]
-------------------
RowKey: 1
=> (name=author, value=Donald, timestamp=1462720696717000)
=> (name=tags:category, value=music, timestamp=1462720526053000)
=> (name=tags:rating, value=A, timestamp=1462720564590000)
=> (name=tags:recommend, value=Yes, timestamp=1462720640817000)
-------------------
RowKey: 2
=> (name=author, value=John, timestamp=1462720690911000)

When examining the table in CQL, in both the table schema and the data output we can only see the statically defined columns. The dynamically added columns are NOT displayed.

CREATE TABLE blogs (
    key int PRIMARY KEY,
    author text
) WITH COMPACT STORAGE
key  | author
-----+--------
1    | Donald
2    | John
Categories: DBA Blogs

Links for 2016-05-19 [del.icio.us]

Categories: DBA Blogs

Restore database schema from full expdp backup

Learn DB Concepts with me... - Thu, 2016-05-19 15:01

Import schema from full db expdp backup:

In some situations you might want to restore a single schema from an entire EXPDP backup. In this example I want to explain how to import a single schema from a full DB expdp backup.





Let's back up the full database using EXPDP:

F:\>expdp atoorpu directory=dpump dumpfile=fulldb_%U.dmp logfile=fulldb.log full=y compression=all parallel=8

Export: Release 11.2.0.4.0 - Production on Tue May 17 14:55:21 2016

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
Password:

UDI-28002: operation generated ORACLE error 28002
ORA-28002: the password will expire within 7 days

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Master table "ATOORPU"."SYS_EXPORT_SCHEMA_01" successfully loaded/unloaded
Starting "ATOORPU"."SYS_EXPORT_SCHEMA_01":  atoorpu/******** directory=dpump dumpfile=fulldb_%U.dmp logfile=fulldb.log full=y compression=all parallel=8

.
.
.
.
.
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/INDEX
. . exported "SCOTT"."DEPT"                             4.976 KB       4 rows
. . exported "SCOTT"."EMP"                              5.625 KB      14 rows
. . exported "SCOTT"."SALGRADE"                         4.890 KB       5 rows
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/FUNCTIONAL_INDEX/INDEX
.
.
.
.
.
Master table "ATOORPU"."SYS_EXPORT_FULL_01" successfully loaded/unloaded
******************************************************************************
Dump file set for ATOORPU.SYS_EXPORT_FULL_01 is:
  C:\DPUMP\FULLDB_01.DMP
  C:\DPUMP\FULLDB_02.DMP
  C:\DPUMP\FULLDB_03.DMP
  C:\DPUMP\FULLDB_04.DMP
  C:\DPUMP\FULLDB_05.DMP
  C:\DPUMP\FULLDB_06.DMP
  C:\DPUMP\FULLDB_07.DMP
  C:\DPUMP\FULLDB_08.DMP
Job "ATOORPU"."SYS_EXPORT_FULL_01" successfully completed at Tue May 17 14:27:13 2016 elapsed 0 00:05:36

Restore schema into a temp SCOTT1 schema:


Now that I have completed my full DB backup, I want to restore just the SCOTT schema from the full backup into a temp schema called SCOTT1. For this I will need to pass two parameters:
1. schemas = the schema name you want to import
2. remap_schema = remaps that schema to the temp SCOTT1 schema


F:\>impdp atoorpu directory=dpump dumpfile=fulldb_%U.dmp logfile=fulldb_imp.log schemas=SCOTT remap_schema=SCOTT:SCOTT1

Import: Release 11.2.0.4.0 - Production on Tue May 17 14:57:21 2016

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
Password:

UDI-28002: operation generated ORACLE error 28002
ORA-28002: the password will expire within 7 days

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Master table "ATOORPU"."SYS_IMPORT_SCHEMA_01" successfully loaded/unloaded
Starting "ATOORPU"."SYS_IMPORT_SCHEMA_01":  atoorpu/******** directory=dpump dumpfile=fulldb_%U.dmp logfile=fulldb_imp.log schemas=SCOTT remap_schema=SCOTT:SCOTT1
Processing object type DATABASE_EXPORT/SCHEMA/USER
Processing object type DATABASE_EXPORT/SCHEMA/GRANT/SYSTEM_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/ROLE_GRANT
Processing object type DATABASE_EXPORT/SCHEMA/DEFAULT_ROLE
Processing object type DATABASE_EXPORT/SCHEMA/PROCACT_SCHEMA
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/TABLE_DATA
. . imported "SCOTT1"."DEPT"                             4.976 KB       4 rows
. . imported "SCOTT1"."EMP"                              5.625 KB      14 rows
. . imported "SCOTT1"."SALGRADE"                         4.890 KB       5 rows
. . imported "SCOTT1"."BONUS"                                0 KB       0 rows
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/INDEX
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/CONSTRAINT
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/INDEX/STATISTICS/INDEX_STATISTICS
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/CONSTRAINT/REF_CONSTRAINT
Processing object type DATABASE_EXPORT/SCHEMA/TABLE/STATISTICS/TABLE_STATISTICS
Job "ATOORPU"."SYS_IMPORT_SCHEMA_01" successfully completed at Tue May 17 14:57:34 2016 elapsed 0 00:00:08


Restore SCOTT into the SCOTT schema and replace existing tables:

If you want to restore it into the same SCOTT schema and replace the existing tables, use this:


impdp atoorpu directory=dpump dumpfile=fulldb_%U.dmp logfile=fulldb_imp.log schemas=SCOTT TABLE_EXISTS_ACTION=REPLACE


If you want to restore it into the same SCOTT schema but append (add) rows to the existing tables, use this:
impdp atoorpu directory=dpump dumpfile=fulldb_%U.dmp logfile=fulldb_imp.log schemas=SCOTT TABLE_EXISTS_ACTION=APPEND
Categories: DBA Blogs

A Test Drive With Azure

Pythian Group - Thu, 2016-05-19 11:55

While recently reading the latest Microsoft Azure updates, I found that we could try some VMs from the Azure Marketplace for free, even without any Azure subscription. The feature is called “Azure test drive”. I found it interesting and decided to give it a try and see what it could offer. Here I am going to share my “driving” experience with the service.

 

I opened a browser where I was not logged in to Azure, and went to the Test drives on Microsoft Azure webpage. There you can see a list of different VMs including DataStax, NetApp DataVault, and others; dozens of services are listed there. I picked “SoftNAS Cloud NAS”, clicked, and was redirected to the product page. On this page I had two options to create a VM. The main one was the usual “Create Virtual Machine” button, and the second one was located a bit lower and stated “Or, take a free test drive first”. I correctly assumed that was exactly what I needed (not rocket science), and clicked it. I was offered either to log in or to create a new “Orbitera” account for Azure Marketplace test drives.

 

The new service did not require an Azure account, but still asked me to create one on another service. I didn’t have that account yet, and opted to create a new one. On the account creation page I answered the standard questions, filled out the forms, and at the bottom confirmed that I was not a robot and pushed “sign up”. I got an email with a validation link just thirty seconds later.
After validating my account I was able to sign in and got to the test drive page. There I found a button saying “Launch Test Drive”, an introduction video, and a couple of guides in PDF to download. One of the guides was a SoftNAS Cloud NAS demo which helped get the most out of the test drive, and the other was about the “SoftNAS Cloud NAS” product itself. I pushed the button and got a counter showing how many minutes remained to complete the deployment of my test VM.

 

It took less than 7 minutes to get a link to my SoftNAS interface and credentials. By the way, you don’t need to sit and wait while it is being created; you get an email when the VM is ready to use and another when your time is up. The test drive VM lives only one hour, and you have a counter on the test drive page showing the time left for your exercise. I got an hour to test SoftNAS. I later tried another test drive for SUSE HPC and got only 30 minutes for it. I think the test time depends on the service you want to try out and how much time you may need to go through all the options.
Interestingly, I got a warning connecting to my newly created SoftNAS. It looked like the certificate for the web interface was not valid. My Firefox browser complained “The owner of softnas33027publicipdns.eastus.cloudapp.azure.com has configured their website improperly.” Of course you can dismiss that warning and add the certificate to the exceptions but, I think, it would be better for SoftNAS to fix it.

 

So, I played with the test VM, shared a couple of volumes, and was able to mount them on another VM and on my Mac. It worked pretty well and allowed me to familiarize myself with the interface and options. When my hour was up, the SoftNAS went down and all my mounts became unavailable. If you need more time you can just fire up another trial and use one more hour. Yes, it would be a brand new VM, and no configuration would be saved from the previous one, but it allows you to explore features you weren’t able to try before your time ran out.

 

I liked the service. It provides an opportunity to test and play with new products in the Marketplace and decide whether they suit you or not. And you can easily show somebody how to work with SoftNAS, how to share and mount a new NFS volume, or do other things. I hope the test drive program keeps growing and we see more and more new brands among the available products.

Categories: DBA Blogs

Log Buffer #474: A Carnival of the Vanities for DBAs

Pythian Group - Wed, 2016-05-18 14:46

This Log Buffer Edition covers Oracle, SQL Server and MySQL blogs from across the planet.

Oracle:

You might be thinking “Nothing is more frustrating than encountering a string column that is full of dates”. But there is something worse than that.

Unique constraint WWV_FLOW_WORKSHEET_RPTS_UK violated.

Understanding query slowness after platform change

Database Migration and Integration using AWS DMS

Oracle BPM 12c: Browsing the SOAINFRA

SQL Server:

Adding PK Exceptions to SQLCop Tests

Is a RID Lookup faster than a Key Lookup?

Performance Surprises and Assumptions : DATEADD()

Generate INSERT scripts from SQL Server queries and stored procedure output

PowerShell Desired State Configuration: Pull Mode

Continuous Delivery from the 19th Century to TODAY

MySQL:

ProxySQL versus MaxScale for OLTP RO workloads

Properly removing users in MySQL

MySQL/MariaDB cursors and temp tables

Quick start MySQL testing using Docker (on a Mac!)

Query Rewrite plugin can harm performance

Categories: DBA Blogs

Join Pythian at the 2016 Cloud Computing Expo

Pythian Group - Wed, 2016-05-18 09:46

 

It won’t be too long before we’re touching down in New York City to attend another Cloud Computing Expo, June 7-9 at the Javits Centre.

If you plan on attending the show, be sure to stop by Pythian’s booth (#424) to chat with our technical experts about your cloud strategy development and to find out how Pythian is currently helping companies maximize performance and reduce costs when moving to the cloud.

Our experts will also be presenting on a wide range of insightful cloud topics, which you won’t want to miss.

 

Our Speakers:

Chris Presley (Solutions Architect) will be speaking on the factors that you should consider when moving a database onto a public cloud.

Alex Lovell-Troy (Director, DevOps) will be guiding listeners from Configuration Management to Cloud Orchestration.

Warner Chaves (Principal Consultant) will be providing a tour of Data Platforms as a Service.

 

Interested in meeting up at the show? Drop us a line, we’d love to connect.

We’re looking forward to seeing you there!

Categories: DBA Blogs

Partition Storage -- 8 : Manually Sizing Partitions

Hemant K Chitale - Wed, 2016-05-18 09:44
As demonstrated in the first post in this series, the default size for Table Partitions is to start with an Initial 8MB Extent.  The fourth post demonstrated how to resize a Partition that has already been created and populated.

Here is a method to pre-size the Initial Extent. I start with 3 partitions of different sizes.

SQL> create table my_part_tbl_init_sized (id_column number(6), data_column varchar2(100))
2 partition by range (id_column)
3 (partition p_100 values less than (101) segment creation immediate
4 storage (initial 64K next 64K) tablespace hemant,
5 partition p_200 values less than (201) segment creation immediate
6 storage (initial 1M next 1M) tablespace hemant,
7 partition p_max values less than (maxvalue) segment creation immediate
8 storage (initial 8M next 1M) tablespace hemant)
9 /

Table created.

SQL>
SQL> select partition_name, initial_extent, next_extent, tablespace_name
2 from user_tab_partitions
3 where table_name = upper('my_part_tbl_init_sized')
4 order by partition_position
5 /

PARTITION_NA INITIAL_EXTENT NEXT_EXTENT TABLESPACE_NAME
------------ -------------- ----------- ------------------------------
P_100 65536 65536 HEMANT
P_200 1048576 1048576 HEMANT
P_MAX 8388608 1048576 HEMANT

SQL>
SQL> select partition_name, bytes/1024, extents
2 from user_segments
3 where segment_name = upper('my_part_tbl_init_sized')
4 and segment_type = 'TABLE PARTITION'
5 order by 1
6 /

PARTITION_NA BYTES/1024 EXTENTS
------------ ---------- ----------
P_100 64 1
P_200 1024 1
P_MAX 8192 1

SQL>


However, the caveat about Partition Extent Sizes when SPLIT is used as shown in the third post still applies.

SQL> alter table my_part_tbl_init_sized
2 split partition p_max
3 at (301)
4 into (partition p_300, partition p_max)
5 /

Table altered.

SQL>
SQL> select partition_name, initial_extent, next_extent, tablespace_name
2 from user_tab_partitions
3 where table_name = upper('my_part_tbl_init_sized')
4 order by partition_position
5 /

PARTITION_NA INITIAL_EXTENT NEXT_EXTENT TABLESPACE_NAME
------------ -------------- ----------- ------------------------------
P_100 65536 65536 HEMANT
P_200 1048576 1048576 HEMANT
P_300 8388608 1048576 HEMANT
P_MAX 8388608 1048576 HEMANT

SQL>
SQL> select partition_name, bytes/1024, extents
2 from user_segments
3 where segment_name = upper('my_part_tbl_init_sized')
4 and segment_type = 'TABLE PARTITION'
5 order by 1
6 /

PARTITION_NA BYTES/1024 EXTENTS
------------ ---------- ----------
P_100 64 1
P_200 1024 1
P_300 8192 1
P_MAX 8192 1

SQL>


So, SPLITting P_MAX to create P_300 created P_300 with an 8MB extent because the sizing is inherited from P_MAX.  How do I change the Partition Extent sizes on SPLIT ?

SQL> alter table my_part_tbl_init_sized
2 split partition p_max
3 at (501)
4 into (partition p_500 storage (initial 64K next 64K) tablespace hemant, partition p_max)
5 /

Table altered.

SQL>
SQL> alter table my_part_tbl_init_sized
2 split partition p_500
3 at (401)
4 into (partition p_400, partition p_500)
5 /

Table altered.

SQL>
SQL> select partition_name, initial_extent, next_extent, tablespace_name
2 from user_tab_partitions
3 where table_name = upper('my_part_tbl_init_sized')
4 order by partition_position
5 /

PARTITION_NA INITIAL_EXTENT NEXT_EXTENT TABLESPACE_NAME
------------ -------------- ----------- ------------------------------
P_100 65536 65536 HEMANT
P_200 1048576 1048576 HEMANT
P_300 8388608 1048576 HEMANT
P_400 65536 65536 HEMANT
P_500 65536 65536 HEMANT
P_MAX 8388608 1048576 HEMANT

6 rows selected.

SQL> select partition_name, bytes/1024, extents
2 from user_segments
3 where segment_name = upper('my_part_tbl_init_sized')
4 and segment_type = 'TABLE PARTITION'
5 order by 1
6 /

PARTITION_NA BYTES/1024 EXTENTS
------------ ---------- ----------
P_100 64 1
P_200 1024 1
P_300 8192 1
P_400 64 1
P_500 64 1
P_MAX 8192 1

6 rows selected.

SQL>


This time, when I split P_MAX I deliberately specified a size for P_500.  Then, when I split P_500 further, the new P_400 inherited the sizing from P_500.
.
.
.

Categories: DBA Blogs

Another reason why you should use the Data Guard Broker for your #Oracle Standby

The Oracle Instructor - Wed, 2016-05-18 01:00

The Data Guard Broker is recommended for various reasons, this one is less obvious: It prevents a Split-Brain problem that may otherwise occur in certain situations. Let me show you:

[oracle@uhesse ~]$ dgmgrl sys/oracle@prima
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> show configuration;

Configuration - myconf

  Protection Mode: MaxAvailability
  Members:
  prima - Primary database
    physt - Physical standby database 

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 18 seconds ago)

This is my setup with 12c, but the demonstrated behavior is the same with 11g already. I will cause a crash of the primary database now, without damaging any files – like a power outage on the primary site:

[oracle@uhesse ~]$ ps -ef | grep smon
oracle    6279     1  0 08:30 ?        00:00:00 ora_smon_prima
oracle    6786     1  0 08:32 ?        00:00:00 ora_smon_physt
oracle    7168  3489  0 08:43 pts/0    00:00:00 grep --color=auto smon
[oracle@uhesse ~]$ kill -9 6279

Don’t do that at home:-) Now the primary is gone, but of course I can failover to the standby:

[oracle@uhesse ~]$ dgmgrl sys/oracle@physt
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> failover to physt;
Performing failover NOW, please wait...
Failover succeeded, new primary is "physt"

So far so good, my end users can continue to work now on the new primary. But what happens when the power outage is over and the ex-primary comes back up again?

[oracle@uhesse ~]$ sqlplus sys/oracle@prima as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed May 18 08:47:30 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORACLE instance started.

Total System Global Area 1258291200 bytes
Fixed Size		    2923920 bytes
Variable Size		  452985456 bytes
Database Buffers	  788529152 bytes
Redo Buffers		   13852672 bytes
Database mounted.
ORA-16649: possible failover to another database prevents this database from
being opened

The DMON background process of the new primary communicates with the DMON on the ex-primary, telling it that there cannot be two primary databases within the same Data Guard Broker configuration! Try the same scenario without the broker and you will observe the ex-primary coming up until status OPEN. Just wanted to let you know:-)


Tagged: Data Guard
Categories: DBA Blogs

An Effective Approach to Migrate Dynamic Thrift Data to CQL, Part 1

Pythian Group - Tue, 2016-05-17 15:00
1. Introduction

Recently Pythian’s Cassandra team worked on one customer’s request to copy the data of several Cassandra tables between two Cassandra clusters (not Cassandra data centers). The original approach we used to copy the data was the Cassandra COPY TO/FROM commands, because the size of the data to be copied was not large (several hundred megabytes per table). The execution of the commands was successful, but for one of the tables we copied, the application complained about missing data. We examined the data for that table using the Cassandra cqlsh utility and found no discrepancy. After a further discussion with the customer, we realized that the source tables were created and manipulated by a Thrift based application, and the application can dynamically create different columns for different rows, although each row does share a common set of statically defined columns. It is the data in these dynamic columns that was missing during the data copy process.

We addressed the issue, but in the end we felt that we should write something about Cassandra data migration from Thrift to CQL because this is quite a common problem faced by many existing Cassandra users right now, considering that Cassandra is gradually phasing out Thrift and replacing it with CQL.

This post is the first part of a three-post series. In this post (Part 1), we’re going to dive into the details of the Cassandra storage engine (pre-3.0 version) and explore some fundamental concepts that are key to better understanding the discussion in the following posts. In the next post (Part 2), we’re going to explore how Cassandra tables can be defined statically, dynamically, or in a mixed mode in Thrift and what the corresponding table definitions in CQL are. In the last post (Part 3), we’ll present an effective approach to migrate dynamically generated data in Thrift into a statically defined CQL table, without suffering any data loss.

1.1. Cassandra Transition from Thrift to CQL API

Apache Thrift is a software framework developed at Facebook for “scalable cross-language services development”. In the early days of Cassandra, the Thrift based API was the only way to develop Cassandra client applications. But with the maturity of CQL (Cassandra query language), Cassandra is gradually moving away from the Thrift API to the CQL API. Along with this trend,

  • Thrift based client drivers are not officially supported.
  • The Thrift API will not get new Cassandra features; it exists simply for backward compatibility purposes.
  • The CQL based “cqlsh” utility is replacing the Thrift based “cassandra-cli” utility as the main command-line tool to interact with Cassandra.

Please note that the latest version of CQL is 3.x (3.4 as of May, 2016). CQL2 is deprecated and removed for Cassandra 2.2 and later.  In the discussion below, we will simply use CQL to refer to CQL 3.

2. Overview of Internal Cassandra Data Storage Structure

Please note that since Cassandra 3.0, the underlying storage engine for Cassandra has gone through a lot of changes. The discussion in this post series is for pre-3.0 Cassandra (v 2.2 and before).

At a very high level, a Cassandra table (or column family, by the old term) can be seen as a map of sorted maps in the following format:

     Map<RowKey, SortedMap<ColumnKey, ColumnValue>>

A graphical representation looks like below:

[Screenshot: graphical representation of the map-of-sorted-maps storage structure]

Please note that although there are more complex structures such as Super Column and Composite Column, the basic idea remains the same and the representation above is good enough for us to describe the problem in this document.

Internally, this storage structure is what both the Thrift and CQL APIs are based on. The difference is that the Thrift API manipulates the storage structure directly, while the CQL API does so through an abstraction layer and presents the data to the user in a tabular form similar to what SQL does for a relational database.

In order to make this clearer, let’s use an example to compare the outputs of the cassandra-cli and cqlsh command utilities, which are based on the Thrift and CQL protocols respectively. The table schema is defined as below (in CQL format):

CREATE TABLE song_tags (
   id uuid,
   tag_name text,
   PRIMARY KEY (id, tag_name)
) WITH COMPACT STORAGE

This simple table is used to maintain song tags. After inserting several rows into this table, we examined the table content using both the Thrift based “cassandra-cli” utility and the CQL based “cqlsh” utility. The result is as below:

[Screenshot: the same song_tags data as displayed by cassandra-cli and by cqlsh]

From the example above, it can be easily noticed that between Thrift and CQL, the terms “row” and “column” don’t share the same meaning and this causes some confusion when people do Thrift to CQL migration. For example,

  • In Thrift, one row means one data partition that is determined by the partition key definition. Each row has multiple columns, or more precisely “cells”. Each cell contains the time-stamp of when it was created. The name/key of the cell/column is not necessarily the name as defined in the table definition, especially when there are clustering column(s) defined (just as in the example above).
  • In CQL, one row could be one partition, or could be one part of a partition. It really depends on how the partition key and clustering key are designed.

Due to such differences, I’m going to use the following terms in this document for clarification:

  • “CQL Row”, or simply “Row”, refers to a row in CQL context
  • “Storage Row”, refers to a row in Thrift context
  • “Column”, refers to a column in both CQL or Thrift contexts
  • “Cell” particularly refers to a column in Thrift context
  • “Table” refers to a table in CQL context, or a column family in Thrift context
  • “Partition” refers to a data partition determined by the hash key. In Thrift context, “Partition” and “Storage Row” have the same meaning. In CQL context, one “Partition” includes multiple “CQL Rows”.
3. Compact Storage

In CQL, a table property called COMPACT STORAGE was created for backward compatibility.  As the name suggests, tables created with this directive consume less storage space than those created without it.

To make this clear, we again use an example. Two tables are created to keep track of the average student grades for classes in a school. The table definitions are exactly the same except that one (avg_grade) is defined without the COMPACT STORAGE property, while the other (avg_grade2) is defined with it. The same records are inserted into both tables, as below:

CREATE TABLE avg_grade (
    student_id int,
    class_id int,
    grade double,
    PRIMARY KEY (student_id, class_id)
)

CREATE TABLE avg_grade2 (
    student_id int,
    class_id int,
    grade double,
    PRIMARY KEY (student_id, class_id)
) WITH COMPACT STORAGE

insert into avg_grade(student_id, class_id, grade) values (1, 1, 75.2);
insert into avg_grade(student_id, class_id, grade) values (2, 1, 81.3);

insert into avg_grade2(student_id, class_id, grade) values (1, 1, 75.2);
insert into avg_grade2(student_id, class_id, grade) values (2, 1, 81.3);

The statements are executed in the cqlsh utility. The data is then flushed from memory to disk with the “nodetool flush” command. After that, the sstable2json utility is used to examine the contents of the SSTable data files for both tables.
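For reference, those two steps look roughly like this from the shell (a sketch: the keyspace name school and the exact SSTable file name are assumptions and will differ on your system):

# Flush the memtables of the keyspace holding the two tables to disk
nodetool flush school

# Dump one of the resulting SSTable data files to JSON for inspection
sstable2json /var/lib/cassandra/data/school/avg_grade2/school-avg_grade2-ka-1-Data.db

Below is the output: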

[Screenshot: sstable2json output for avg_grade (non-compact) and avg_grade2 (compact storage)]

From the output above, we can see that the table NOT in compact storage mode has more cells within each Storage Row (e.g. the extra cell with an empty value for the clustering column “class_id”) and that each cell stores more metadata (e.g. the name of the “grade” column is repeated in each row). So, purely from a storage perspective, having a table defined with compact storage can save quite some storage space, especially when we’re dealing with many columns and/or complex column types like collections.

In Thrift, tables are always stored in compact storage mode. In CQL, tables are by default stored in non-compact storage mode, unless they are defined with the “COMPACT STORAGE” property. As a result, CQL tables without the “COMPACT STORAGE” property are not visible in Thrift based utilities like cassandra-cli.

When a CQL table is defined with the COMPACT STORAGE property, it gets the benefit of saving some disk space. However, there are also some caveats to pay attention to. For example:

  • It cannot have new columns added or existing columns dropped.
  • If it has a compound primary key (multiple columns), then at most one column can be defined as not part of the key.
  • It cannot have a column defined with non-frozen collection types.
    • Note that for people who are not familiar with the concept of “frozen” vs. “non-frozen” collections, a frozen collection serializes all sub-components of the collection into one single value when stored, which is treated as a blob and the whole value must be updated at once. On the contrary, a non-frozen collection allows updates on individual elements.
Categories: DBA Blogs

Characterset Conversion Conundrums

Pythian Group - Mon, 2016-05-16 10:30

Every now and then a database needs to be migrated from one characterset to another. The driver behind this is usually to either synchronize all databases to a single standard or to support new characters or symbols like €,

Categories: DBA Blogs

Replication between Tungsten clusters

Pythian Group - Mon, 2016-05-16 09:52
Replication between Tungsten clusters

The process I will describe in this post will allow you to configure replication between Tungsten clusters. The most common use case I have seen for this is a dedicated ETL cluster.

The setup will look like this:

 

[Diagram: composite core cluster (east and west) replicating to a dedicated ETL cluster on each side]

So we will have a composite datasource (compositeprod) composed of east and west clusters, and two ETL clusters, one on each side.

The described setup allows failover or switchover within a single datacenter (e.g. db1.east -> db2.east) or to the Disaster Recovery Site, (e.g. db1.east -> db1.west) for the core cluster.

At the time of this article’s publication it is not possible to replicate between two composite clusters, so each ETL cluster needs to be standalone. This feature is expected for the Tungsten 5.0 release.

We will install a standalone replicator on the ETL hosts to bring data in from the core cluster. This assumes you have already completed the required prerequisites.

Configuration

Using the .ini files installation method, the config file for the core cluster hosts would be:

[defaults]
application-password=****
application-port=9999
application-user=app
home-directory=/opt/continuent
mysql-connectorj-path=/opt/continuent/connector-java/mysql-connector-java-5.1.24/mysql-connector-java-5.1.24-bin.jar
replication-password=****
replication-port=3306
replication-user=tungsten
start-and-report=true

[east]
master=db1.east
members=db1.east,db2.east,db3.east

[west]
master=db1.west
members=db1.west,db2.west,db3.west

[compositeprod]
dataservice-composite-datasources=east,west

The config file for etl.east hosts would be:

[defaults]
application-password=****
application-port=9999
application-user=etl
home-directory=/opt/continuent
mysql-connectorj-path=/opt/continuent/connector-java/mysql-connector-java-5.1.24/mysql-connector-java-5.1.24-bin.jar
replication-password=****
replication-port=3306
replication-user=tungsten
start-and-report=true

[defaults.replicator]
home-directory=/opt/continuent/replicator
executable-prefix=tr
rmi-port=10002
thl-port=2113
property=replicator.log.slave.updates=false

[etl_east]
topology=clustered
master=etl1.east
members=etl1.east,etl2.east,etl3.east

[compositeprod]
topology=cluster-alias
master=db1.east
members=db1.east,db2.east,db3.east

[core_etl_east]
topology=cluster-slave
master-dataservice=compositeprod
slave-dataservice=etl_east

For etl.west hosts:

[defaults]
application-password=****
application-port=9999
application-user=etl
home-directory=/opt/continuent
mysql-connectorj-path=/opt/continuent/connector-java/mysql-connector-java-5.1.24/mysql-connector-java-5.1.24-bin.jar
replication-password=****
replication-port=3306
replication-user=tungsten
start-and-report=true

[defaults.replicator]
home-directory=/opt/continuent/replicator
executable-prefix=tr
rmi-port=10002
thl-port=2113
property=replicator.log.slave.updates=false

[etl_west]
topology=clustered
master=etl1.west
members=etl1.west,etl2.west,etl3.west

[compositeprod]
topology=cluster-alias
master=db1.west
members=db1.west,db2.west,db3.west

[core_etl_west]
topology=cluster-slave
master-dataservice=compositeprod
slave-dataservice=etl_west
Installation

Now we need to install the cluster by running the Tungsten installer on each core and ETL host:

./install/continuent-tungsten-4.0.2-6072082/tools/tpm install

Also on ETL hosts only, we need to install the standalone replicator:

./install/tungsten-replicator-4.0.0-18/tools/tpm install

Each ETL host will end up with 2 different replicator processes:

  1. The replicator that brings data in from the core cluster
  2. The replicator that reads data from the ETL cluster master to replicate to the ETL slaves

With the ‘executable-prefix‘ option, it is easy to reference each one:

  1. tr_trepctl status will show info from the core -> ETL replicator
  2. trepctl status will show info from the ETL cluster replicator
Tips & tricks

One thing to keep in mind is that for the aliases to actually work, we need to add the following line to .bashrc:

. "/opt/continuent/replicator/share/env.sh"

as there is a bug where the installer only adds the aliases from the cluster installation, and ignores the ones from the standalone replicator installation.

The [defaults.replicator] tag is helpful to pass options to the standalone replicator (it is ignored by the cluster installation), as the same tungsten.ini file will be read by both the cluster and standalone replicator products.

In this case I am using it to specify different THL and RMI ports, so the standalone replicator does not collide with the cluster replicator.

We also need to tell the core -> ETL replicator not to log its updates to the binlog; otherwise the changes would be picked up by the ETL cluster replicator as well and end up being applied twice on the ETL slaves, resulting in consistency errors.

Do you have any questions about replication between Tungsten clusters? Let me know in the comments!

 

Categories: DBA Blogs

Links for 2016-05-15 [del.icio.us]

Categories: DBA Blogs

Partner Webcast – Oracle Database In-Memory: Accelerate Business

Businesses must compete in today’s high-speed, always-on world where requirements are more demanding than ever. That’s easier said than done, especially when decision-makers must wait hours—in some...

We share our skills to maximize your revenue!
Categories: DBA Blogs