Pythian Group

Official Pythian Blog - Love Your Data

COLLABORATE16 IOUG – Call For Papers

Thu, 2015-09-03 11:48

There’s so many ways to proceed
To get the knowledge you need
One of the best
Stands out from the rest
COLLABORATE16 – indeed!

Why not be part of the show
By sharing the stuff that you know
Got something to say
For your colleagues each day
Call for papers –> let’s go

I believe many of you would agree that regardless of how insignificant you believe your corner of Oracle technology may be, everyone has something to say. I attended my first show in Anaheim, CA, USA in 1990 and started presenting at shows the year after in Washington, DC, USA. It’s not hard to get over the hump, moving from “I would love to present a paper at a show but I just don’t have the koyich” to “wow, that was fun”. The only way you will ever get the strength is to do it (and do it and do it …).

Some suggestions for getting started …

  1. Co-present with a colleague
  2. Collaborate through paper and slides development WITH your colleague rather than parcel off portions to one another then merge at the end.
  3. Be cautious of trying to cover too much in too little time (I once attended a session at IOUW [a precursor to COLLABORATE] where the presenter had over 400 slides to cover in 45 minutes).
  4. Ask for assistance from seasoned presenters (mentor/protégé type relationship).
  5. Go slowly at first and set yourself some realistic but aggressive goals.

The experience of presenting at shows is rewarding and I for one do it as much as I can … for example, Ensuring Your Physical Standby is Usable and Time to Upgrade to 12c (posting of 2015 presentation details pending).

The confidence gained, personal koyich, and rewards of presenting at events are lifelong and can help propel your career into the ionosphere. Speaking of confidence, 20 months ago I started playing bridge. Now look where my experience presenting at shows and writing for Oracle Press got me … check this out :).

Surprises surprises abound
With the new confidence found
Presenting is great
Get now on your plate
In no time you’ll be so renowned

 

Discover more about our expertise in the world of Oracle.

Categories: DBA Blogs

Oracle EBS R12.2: Restarting Online Patching Enablement patch

Thu, 2015-09-03 11:33

If you are in the process of upgrading to Oracle E-Business Suite 12.2.4, you will have gone through a critical phase in the upgrade: applying the Online Patching Enablement patch:

13543062:R12.AD.C.

It’s very common to run into errors with this patch on the first try and to have to apply it a couple of times in order to get all issues fixed and online patching enabled. The recommended command to apply this patch is:

adpatch options=hotpatch,forceapply

When the time comes to re-apply the patch to fix problems, if you use the same command, you will notice that the patch completes normally almost instantly and nothing actually happens in the back end. This is because of a specific adpatch feature: by default, adpatch skips jobs that are marked as having run successfully in previous runs or as part of another patch. So we have to force it to re-run those jobs, which can be done with the command below:

adpatch options=hotpatch,forceapply,nocheckfile

Sometimes we run into cases where the Online Patching Enablement patch completes as “normal” and the online patching feature gets enabled, yet a schema or two have failed to have the EBR feature enabled. As soon as the APPS schema is EBR enabled by this patch, adpatch is disabled and we are forced to use the adop utility from then on, even though other custom schemas failed to get enabled. In this scenario, we can still re-apply the Online Patching Enablement patch using adpatch after setting the environment variable below:

export ENABLE_ADPATCH=YES
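
Putting the pieces together, a re-run in this scenario looks roughly like the sketch below. The patch directory is a placeholder; adjust paths for your own environment:

# Hypothetical re-run of the Online Patching Enablement patch with adpatch,
# after APPS is already EBR enabled and adop has become the default tool.
export ENABLE_ADPATCH=YES        # allow adpatch to run again
cd $PATCH_TOP/13543062           # placeholder: directory where the patch was unzipped
adpatch options=hotpatch,forceapply,nocheckfile   # nocheckfile forces previously "successful" jobs to re-run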

I see that the online patching enablement exercise is a unique experience for every customer. Do post your experiences with this patch in the comments section. I’d love to hear your story!

Discover more about Pythian’s expertise in the world of Oracle.

Categories: DBA Blogs

VMware Debuts SQL Server DBaaS Platform

Thu, 2015-09-03 11:14


Yesterday at VMworld, VMware announced its entry into the managed database platform market with the introduction of vCloud Air SQL. This new service is an on-demand, managed service offering of Microsoft SQL Server. It’s meant to further the adoption of hybrid operations, since it can be used to extend on-premises SQL Server use into the cloud.

Currently the two major players in this space are Amazon RDS and Azure SQL. Both of those offerings are significantly more mature and feature-rich than VMware’s service as outlined in the early access User Guide.

The beta version of vCloud Air SQL has a number of initial limitations such as:

  • SQL Server licensing is not included or available, meaning that the vCloud Air SQL platform uses a “bring your own license” (BYOL) model. This requires that you have an enterprise agreement with Software Assurance in order to leverage license mobility for existing instances.
  • SQL 2014 is not currently offered, only SQL 2008 & SQL 2012 are supported at this time.
  • SQL Server Instances are limited to 150GB
  • Service tiers are limited to three choices at launch and altering the service tier of an existing instance is not supported at this time.

Although there are a number of limitations, reviewing the early access beta documentation reveals some interesting details about this service offering:

  • “Instant” snapshot capabilities appear to be superior to any competitor’s managed service offering. These features will be appealing to organizations leveraging DevOps and automated provisioning methodologies.
  • Persistent storage is solid state (SSD) based and will likely be more performant than competing HDD offerings.
  • A new cloud service named vCloud Air SQL DR is planned as a companion product. This service will integrate with an organization’s existing on-premises SQL Server instances. Once integrated, it will provide a variety of cloud based disaster recovery options leveraging Asynchronous replication topologies.

If you want to try this new service, VMware is offering a $300 Credit for first time vCloud Air users HERE.

Discover more about Pythian’s expertise in SQL Server.

 

 

Categories: DBA Blogs

Autoconfig in Oracle EBS R12.2

Thu, 2015-09-03 08:28

All seasoned Oracle Apps DBAs know that Autoconfig is the master utility that can configure the whole E-Business Suite instance. In E-Business Suite releases 11i, 12.0, and 12.1, running Autoconfig recreated all the relevant configuration files used by the Apache server. If the context file had the correct settings, then the configuration files would include the correct settings after running Autoconfig. This is not the case anymore in Oracle E-Business Suite 12.2. Some of the Apache config files are now under Fusion Middleware control, namely httpd.conf, admin.conf, and ssl.conf. All other Apache config files are still under Autoconfig control, but these 3 critical config files include the main configuration pieces like the web port, SSL port, etc.

So if you have to change the port used by the EBS instance, you have to log into the WebLogic admin console, change the port there, and then sync the context XML file using adSyncContext.pl. This utility gets the current port values from the WebLogic console and updates the XML file with the new values. Once the context XML file is in sync, we have to run Autoconfig to sync the other config files and database profile values so they pick up the new web port. A rough sketch of this workflow is shown below.
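
As a minimal, hypothetical sketch (run on the run file system after changing the port in the WebLogic console; exact arguments and prompts can vary with your AD/TXK patch level):

# Sync the context file from the WebLogic configuration, then run AutoConfig
perl $AD_TOP/bin/adSyncContext.pl contextfile=$CONTEXT_FILE
$ADMIN_SCRIPTS_HOME/adautocfg.sh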

Similarly, if you want to change the JVM arguments or class path, you have to run another utility called adProvisionEBS.pl to make those changes from the command line, or log into the WebLogic admin console to make them. Interestingly, a few of the changes done in the WebLogic admin console or Fusion Middleware control are automatically synchronized with the context XML file by the adRegisterWLSListeners.pl script that runs in the background all the time. But Apache config file changes are not picked up by this script, so Apache changes have to be synchronized manually.

There are a few My Oracle Support notes that can help you understand these utilities a little more, such as 1676430.1 and 1905593.1. But understand that Autoconfig is a different ball game in Oracle E-Business Suite R12.2.

Discover more about Pythian’s expertise in the world of Oracle.

Categories: DBA Blogs

Amazon RDS Migration Tool

Wed, 2015-09-02 15:06

Amazon has just released their RDS Migration Tool, and Pythian has recently undertaken training to use it for our clients. I wanted to share my initial thoughts on the tool, give some background on its internals, and provide a walk-through of the functionality it will be most commonly used for.

There are many factors to consider when evaluating cloud service providers, including cost, performance, and high availability and disaster recovery options. One of the most critical and overlooked elements of any cloud offering, though, is the ease of migration. Often, weeks are spent evaluating all of the options, only to discover after the choice is made that it will take hours of expensive downtime to complete the migration, and that there is no good rollback option in the case of failure.

In order to reduce the friction inherent in the move to a DBaaS offering, Amazon has developed an RDS Migration tool. This is an in-depth look at this new tool, which will be available after September 1, 2015. Contact Pythian to start a database migration.

With the introduction of the RDS Migration tool, Amazon has provided a powerful engine capable of handling much more than basic migration tasks. It works natively with Oracle, SQL Server, Sybase, MySQL, PostgreSQL, Redshift (target only), Aurora (target only), and provides an ODBC connector for all other source systems. The engine is powerful enough to handle fairly complex transformations and replication topologies; however, it is a migration tool and isn’t intended for long-term use.

Architecture

Amazon’s RDS Migration Tool architecture is very simple. It consists of your source system, an AWS VM with the Migration Tool installed on it, and the target RDS instance.

Each migration is broken up into Tasks. Within a Task, a source and target database are defined, along with the ability to transform the data, filter the tables or data being moved, and perform complex transformations.

Tasks can be scheduled to run at particular times, can be paused and resumed, and can alert on success or failure. It’s important to note that if a task is paused while a table is loading, that table will be reloaded completely from the beginning when the task resumes.

Within a running task, the following high-level steps are performed:
• Data is pulled from the source using a single thread per table
• Data is converted into a generic data type
• All transformations are applied
• Data is re-converted into the target system’s datatype and inserted
• After the initial load, if specified, the tool monitors for updates to data and applies them in near real-time

While processing the data, each table has a single thread reading from it, and any updates are captured using the source system’s native change data capture utility. Changes are not applied until after the initial load is completed. This is done to avoid overloading the source system, where it’s assumed client applications will still be running.

Performance Considerations

There are several factors which might limit the performance seen when migrating a database.

Network Bandwidth
Network bandwidth is probably the biggest contributor to performance issues across data centers, and there is no magic button when moving to RDS. If the database is simply too big or too busy for the network to handle the data being sent across, then other options may need to be explored or used in conjunction with this tool.

Some workarounds to consider when network performance is slow include:
• Set up AWS Direct Connect
• Use a bulk-load utility, and then use the tool to catch up on transactions
• Only migrate data from a particular point in time

RDS Migration Tool Server CPU
The migration tool converts all data into a common data type before performing any transformations, then converts it into the target database’s data types. This is obviously very heavy on the server’s CPU, and this is where the main performance bottlenecks on the server are seen.

Capacity of Source database
This tool uses a single SELECT statement to migrate the data, and then returns for any changed data after the initial bulk load is completed. On a busy system, this can be a lot of undo and redo data to migrate, and the source system needs to be watched closely to ensure the log files don’t grow out of control.

Capacity of Target database
In the best case scenario, this will be the limiter as it means all other systems are moving very fast. Amazon does recommend disabling backups for the RDS system while the migration is running to minimize logging.

Walkthrough

The following walkthrough looks at the below capabilities of this tool in version 1.2:

• Bulk Data Migration to and from the client’s environment and Amazon RDS
• Near Real-Time Updates to data after the initial load is completed
• The ability to transform data or add auditing information on the fly
• Filtering capabilities at the table or schema level

You will need to have set up network access to your databases for the RDS Migration Tool.

1. After confirming access with your account manager, access the tool by opening the AWS console, selecting EC2, and choosing AMIs.
AWS Console

2. Select the correct AMI and build your new VM. Amazon recommends an M4.large or M4.xlarge.

3. After building the new VM, you will need to install the connectors for your database engine. In this example, we’ll be using Oracle Instant Client 12.1.0.2 and MySQL ODBC Connector 5.2.7.

  • For the SQL Server client tools, you will need to stop the Migration services before installing.

4. Access the Migration Tool

  • Within VM: http://localhost/AmazonRDSMigrationConsole/
  • Public URL: https://[VM-DNS]/AmazonRDSMigrationConsole/
    • Username/Password is the Administrator login to the VM

5. The first screen after logging in displays all of your current tasks and their statuses.
RDS Migration Tool Home Screen

6. Clicking on the Tasks menu in the upper-left corner will bring up a drop-down menu to access Global Settings. From here, you can set Notifications, Error Handling, Logging, etc…
RDS Migration Tool Global Settings

7. Back on the Tasks menu, click the Manage Databases button to add the source and target databases. As mentioned earlier, this walkthrough will be an Oracle to Aurora migration. Aurora targets are a MySQL database for the purposes of this tool.
RDS Migration Tool Manage Databases Pop-Up

8. After defining your connections, close the Manage Databases pop-up and select New Task. Here, you can define if the task will perform a bulk-load of your data and/or if it will attempt to apply changes made.
RDS Migration Tool New Task

9. After closing the New Task window, simply drag & drop the source and target connectors into the task.

10. By selecting Task Settings, you can now define task level settings such as number of threads, truncate or append data, and define how a restart is handled when the task is paused. You can also override the global error handling and logging settings here.

  • The best practice recommendation is to find the largest LOB value in your source database and set that as the max LOB size in the task. Setting this value allows the task to optimize LOB handling, and will give the best performance.

RDS Migration Tool Task Settings

11. Select the Table Selection button to choose which tables will be migrated. The tool uses wildcard searches to allow any combination of tables to exclude or include. For example, you can:

  • Include all tables in the database
  • Include all tables in a schema or set of schemas
  • Exclude individual tables and bring over all remaining tables
  • Include individual tables and exclude all remaining tables

The tool has an Expand List button which will display all tables that will be migrated.

In this screenshot, all tables in the MUSER08 schema that start with T1 will be migrated, while all tables that start with T2 will be excluded EXCEPT for the T20, T21, T22, & T23 tables.
RDS Migration Tool Table Selection

12. After defining which tables will be migrated, select an individual table and choose the Table Settings button. Here you can add transformations for the individual tables, add new columns or remove existing ones, and filter the data that is brought over.

In this screenshot, the T1 table records will only be brought over if the ID is greater than or equal to 50 and the C1 column is LIKE ‘Migrated%’
RDS Migration Tool Table Settings

13. Select the Global Transformations button. Like the table selection screen, you use wildcards to define which tables these transformations will be applied to.
You can:

  • Rename the schema
  • Rename the table
  • Rename columns
  • Add new columns
  • Drop existing columns
  • Change the column data types

In this screenshot, a new column named MigratedDateTime will be created on all tables and populated with the current DateTime value.
RDS Migration Tool Global Transformations

14. Finally, save the task and choose Run. This will kick off the migration process and bring up the Monitoring window. From here, you can see the current task’s status, notifications, and errors, as well as get an idea of the remaining time.
RDS Migration Tool Monitoring Window

Categories: DBA Blogs

You work for a software company. You don’t? Think again.

Tue, 2015-09-01 09:26

If someone asks what business your company is in, you might say transportation, or networks, or retail, or a hundred other possibilities. But what you should be saying is that you are also in the software business.

At its core, your company is a software company. Or it should be.

Why? Because your competitors are growing in number, emerging from nowhere, and aggressively using digital strategies to win against you.

To be successful, you must continually innovate and differentiate your company, no matter what your industry. You must do things better, faster, and cheaper. And you must engage your customers and your partners in new and meaningful ways. It doesn’t matter whether you’re a bank, a pharmaceutical company, or a logistics provider. Think like a startup and use software to stay one step ahead.

This connection can be easy if your business is already using software to provide differentiating product features or services. If you sell goods online, or you deliver content, or you offer software as a service, you know you’re in the software business. You probably focus on being more responsive and agile, that is, delivering new features faster than ever before and using data to gain business insights and to optimize the user experience.

For those companies who don’t initially think of themselves as software companies, it’s a little more interesting. In time, they will realize that software is what is differentiating them.

For example, Redline Communications thinks of itself as a wireless infrastructure company that delivers wireless networks in remote locations. In actuality, it uses software to add new features to its network components. It also uses software to expand a network’s capacity on demand, and to troubleshoot problems. Redline might manufacture hardware but it is solidly in the software business.

Pythian is often described as an IT services company, but it is undoubtedly a software company. Nobody at Pythian touches a customer’s critical production system or data without going through a software portal called Adminiscope that secures access and records all activity. Pythian doesn’t sell software, but it is absolutely in the software business.

Then there are the companies that would not traditionally be classified as technology businesses at all, but have clearly made the connection. And that doesn’t mean just having an online presence. Take retailer Neiman Marcus, a company that has consciously entered the software space with the development of apps like “Snap. Find. Shop.”, a tool that lets users take photos of a product they want and helps them track it down. They know they need to engage customers more personally, and the way to do that is through software that enables them to interact with customers, and to understand and respond to buying behaviors and preferences.

KAR Auction Services, which you might know as a car auction company, has stated publicly that they no longer want to be a car auction company that uses technology but “a technology company that sells cars”. They know that software will drive the future of their business.

It is increasingly difficult to sell, deliver, or support any product or service without using software. It is increasingly difficult to truly understand your business without being data driven, and data is the byproduct of software. It is increasingly difficult to recruit employees without using software. Your customers and your employees expect you to be agile and responsive, and software helps you meet those expectations, and then measures, monitors, analyzes, and integrates data to keep you ahead of the game.

In today’s hyper-competitive world, your company must use software and technology to become agile in order to respond to ever-changing customer needs. Then you must remain aggressive by measuring, monitoring, evaluating, and responding to data about your products and services as well as their impact on your customers and their environment. Whether it’s customer feedback about product features or changing market trends, you need to be ready to react and iterate your products and processes at lightning speed. Software is the one thing that’s going to enable that.

So what does it mean to use software to be competitive? It means departing from tradition. It means empowering IT to go beyond cutting costs to transform the business. It means empowering everyone in the company to innovate around software. It means encouraging radical disruptive ideas on how to change the business. And it means putting a digital strategy at the heart of your planning. And this is certainly what your competition is doing.

Categories: DBA Blogs

asmcmd> a better “du”

Fri, 2015-08-28 14:28

I discovered ASM with a 10.1.0.3 RAC running on Linux Itanium, and that was a big adventure. At that time there was no asmcmd. In 2005, Oracle released Oracle 10gR2 and asmcmd came into the picture, and we figured out how to make it work with a 10gR1 ASM. We were very excited to have a command line for ASM until… we tried it! Let’s call a spade a spade: it was very poor…

Ten years later, Oracle has released 11gR1, 11gR2, and 12cR1, and asmcmd has been improved, but the “ASM shell” remains very weak, especially the “du” command:

ASMCMD> du
Used_MB Mirror_used_MB
 556178 556178
ASMCMD> du .
Used_MB Mirror_used_MB
 556178 556178
ASMCMD> du *
Used_MB Mirror_used_MB
 556265 556265
ASMCMD> ls
ASM_CONFIG/
DATA/
FRA/
LOG/
ASMCMD>

Why doesn’t “du *” act as it does in any Unix shell? How do I know the size of each subdirectory in my current directory?

 

Nowadays, we tend to have dozens of instances running on the same server sharing the same ASM:


[oracle@higgins ~]$ ps -ef | grep pmon | wc -l
30
[oracle@higgins ~]$

So should I run one “du” per database (directory) to know the size used by each database? What if I keep one month of archivelogs in my FRA? Should I wait for the month of February so that I have only 28 “du” commands to run (in a non-leap year!) if I want to know the size of the archivelogs generated each day?

 

This is why I wrote this piece of code to have a “du” under ASM that makes my life easier every day:

[oracle@higgins ~]$ cat asmdu.sh
#!/bin/bash
#
# du of each subdirectory in a directory for ASM
#
D=$1

if [[ -z $D ]]
then
 echo "Please provide a directory !"
 exit 1
fi

(for DIR in `asmcmd ls ${D}`
 do
     echo ${DIR} `asmcmd du ${D}/${DIR} | tail -1`
 done) | awk -v D="$D" ' BEGIN {  printf("\n\t\t%40s\n\n", D " subdirectories size")           ;
                                  printf("%25s%16s%16s\n", "Subdir", "Used MB", "Mirror MB")   ;
                                  printf("%25s%16s%16s\n", "------", "-------", "---------")   ;}
                               {
                                  printf("%25s%16s%16s\n", $1, $2, $3)                         ;
                                  use += $2                                                    ;
                                  mir += $3                                                    ;
                               }
                         END   { printf("\n\n%25s%16s%16s\n", "------", "-------", "---------");
                                 printf("%25s%16s%16s\n\n", "Total", use, mir)                 ;} '
[oracle@higgins ~]$
Let's see it in action with some real-life examples:
[oracle@higgins ~]$. oraenv
ORACLE_SID = [+ASM] ? +ASM
The Oracle base remains unchanged with value /oracle
[oracle@higgins ~]$./asmdu.sh DATA

DATA subdirectories size

Subdir  Used MB Mirror MB
------  ------- --------
DB01/    2423    2423
DB02/    2642    2642
DB03/    321201  321201
DB04/    39491   39491
DB05/    180753  180753
DB06/    4672    4672
DB07/    1431    1431
DB08/    2653    2653
DB09/    70942   70942
DB10/    96001   96001
DB11/    57322   57322
DB12/    70989   70989
DB13/    4639    4639
DB14/    40800   40800
DB15/    13397   13397
DB16/    15279   15279
DB17/    19020   19020
DB18/    8886    8886
DB19/    4671    4671
DB20/    14994   14994
DB21/    502245  502245
DB22/    4839    4839
DB23/    10169   10169
DB24/    7772    7772
DB25/    7828    7828
DB26/    112109  112109
DB27/    5564    5564
DB28/    16895   16895
------  ------- ---------
Total   1639627 1639627
[oracle@higgins ~]$

 

Another one with many archivelogs directories :
[oracle@higgins ~]$./asmdu.sh FRA/THE_DB/ARCHIVELOG/

 FRA/THE_DB/ARCHIVELOG/ subdirectories size

 Subdir       Used MB Mirror MB
 ------        ------ ---------
 2015_02_19/    114   114
 2015_02_20/    147   147
 2015_02_21/    112   112
 2015_02_22/    137   137
 2015_02_23/    150   150
 2015_02_24/    126   126
 2015_02_25/    135   135
 2015_02_26/    130   130
 2015_02_27/    129   129
 2015_02_28/    119   119
 2015_03_01/    146   146
 2015_03_02/    150   150
 2015_03_03/    128   128
 2015_03_04/    134   134
 2015_03_05/    44    44
 2015_05_27/    28    28
 2015_05_28/    95    95
 2015_05_29/    76    76
 2015_05_30/    187   187
 2015_05_31/    78    78
 2015_06_01/    111   111
 2015_06_02/    105   105
 2015_06_03/    43    43
 2015_06_04/    142   142
 2015_06_05/    42    42
 2015_06_06/    84    84
 2015_06_07/    70    70
 2015_06_08/    134   134
 2015_06_09/    77    77
 2015_06_10/    143   143
 2015_06_11/    2     2
 2015_06_21/    14    14
 2015_06_22/   14918 14918
 ------       ------- ---------
 Total         18250   18250

[oracle@higgins ~]$

This example is a very nice one as it shows us that 2015 is not a leap year, and that some archivelogs are still on disk even though they probably shouldn’t be. That’s good information, as v$log_history does not contain this information anymore:

SQL> select trunc(FIRST_TIME), count(*) from v$log_history group by trunc(FIRST_TIME) order by 1 ;

TRUNC(FIR COUNT(*)
--------- ----------
22-JUN-15 402

SQL>

Hope it also makes your life easier,

Have a good day :)

Categories: DBA Blogs

Simplify Oracle Tracing with Creative Scripting

Fri, 2015-08-28 14:26

Running a SQL trace is something that all DBAs do to varying degrees. Let’s say you are working on optimizing a SQL statement, and experimenting with some different hints for indexes and optimizer directives. This kind of effort typically goes something like this:

  • modify the SQL statement
  • enable tracing
  • run the statement
  • disable tracing
  • disconnect
  • retrieve the trace file
  • use a profiler to process the trace file
    this might be Method-R mrskew, Oracle tkprof, or something of your own.
  • delete the trace file if no longer needed

That process is OK if all you need to do is look at a couple of trace files, but quickly becomes tedious for any serious optimization effort as there will be many iterations of this process.  This is the kind of job that just cries out for some simple automation.

Let’s walk through automating much of this process using SQL*Plus, ssh, and some profiling tools.

First let’s consider the environment:

  • Oracle 11.2 database on a remote server
  • Workstation has 11.2 client software installed
  • ssh is setup for connecting to the oracle user on the database server
  • some profiling tools are available

Let’s get started with the script that is the subject of our ‘tuning’ effort.

-- sql2trace.sql
select * from dual;

As you can see there is not really going to be any tuning done in this article; it is all about the process.

The following script, tracefile_identifier_demo.sql, is used to set up the trace environment by collecting some information about the database host and the process owner, and then setting the tracefile_identifier parameter. The values for these are then used to set sqlplus define variables.

-- tracefile_identifier_demo.sql

-- column variables to capture host, owner and tracefile name
col tracehost new_value tracehost noprint
col traceowner new_value traceowner noprint
col tracefile new_value tracefile noprint

set term off head off feed off

-- get oracle owner
select username traceowner from v$process where pname = 'PMON';

-- get host name
select host_name tracehost from v$instance;

-- set tracefile identifier
alter session set tracefile_identifier = 'MYTRACEFILE';

select value tracefile from v$diag_info where name = 'Default Trace File';

set term on head on feed on

-- do your tracing here
alter session set events '10046 trace name context forever, level 12';

-- run your SQL here
@@sql2trace

alter session set events '10046 trace name context off';

-- disconnect to ensure all trace data flushed
-- the disconnect must be done in the called script
-- otherwise the values of the defined vars are lost

-- now get the trace file, or other processing
--@@mrskew '&&traceowner@&&tracehost' '&&tracefile'
@@tkprof '&&traceowner@&&tracehost' '&&tracefile'

This article began as an idea to write about tracefile_identifier, hence the script name.

Most of this script is quite straightforward:

  • set column command initiated define variables to capture host, process owner and tracefile name
  • collect the data
  • enable tracing
  • run the target script
  • disable tracing
  • call the tkprof.sql script to run tkprof

The interesting bit is found in tkprof.sql.

-- tkprof.sql

col ssh_target new_value ssh_target noprint
col scp_filename new_value scp_filename noprint

set term off feed off verify off echo off

select '&&1' ssh_target from dual;
select '&&2' scp_filename from dual;

set feed on term on verify on
disconnect

host ssh &&ssh_target 'cat &&scp_filename' | tkprof /dev/stdin ./tkprof.out sort=exeqry sys=no
host cat ./tkprof.out

There are a couple of things to take notice of in tkprof.sql. Did you notice the disconnect statement? There are a couple of points of interest about that. Prior to 11g it was necessary to disconnect from Oracle to ensure that all cursors were closed and all STAT and row source operation rows were written to the trace file. Disconnecting the session is not necessary in Oracle 11g+.

Another interesting bit about this disconnect statement is its placement. At first the disconnect statement was in the main script. The problem was that the define variables would all lose their values prior to calling the tkprof.sql script, and so the call would fail; that is why the disconnect command is in the called script.

Finally, the trace output is retrieved via ssh and piped to tkprof. Notice that there is no need to actually copy the file; rather, the contents of the file are simply sent to STDOUT and piped to tkprof.

The tkprof command does not read from STDIN. If, for instance, you try this: cat somefile | tkprof - ./tkprof.out sort=exeqry, tkprof will exit with an error that an input file is needed. That problem is circumvented by using the file /dev/stdin.
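
For comparison, the working form of the same pipeline simply substitutes /dev/stdin for the trace file name (somefile is a placeholder for any local trace file):

# tkprof reading trace data from a pipe via /dev/stdin
cat somefile | tkprof /dev/stdin ./tkprof.out sort=exeqry sys=no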

Put it all together and it looks like this:

11:34:11 JKSTILL@oravm > @tracefile_identifier_demo

Session altered.

Elapsed: 00:00:00.00

D
-
X

1 row selected.

Elapsed: 00:00:00.00

Session altered.

Elapsed: 00:00:00.00

TKPROF: Release 11.2.0.3.0 - Development on Thu Aug 27 11:34:18 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Host key fingerprint is de:ad:be:ed:a2:d6:63:4b:rx:77:fd:1c:e1:36:2b:88
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|         .  .    |
|        S  +.    |
|        ..ox.o   |
|       o+.F.* o  |
|      99+o.o.= . |
|     . ..+y.ooo  |
+-----------------+

TKPROF: Release 11.2.0.3.0 - Development on Thu Aug 27 11:34:18 2015

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Trace file: /dev/stdin
Sort options: exeqry
********************************************************************************
count    = number of times OCI procedure was executed
cpu      = cpu time in seconds executing
elapsed  = elapsed time in seconds executing
disk     = number of physical reads of buffers from disk
query    = number of buffers gotten for consistent read
current  = number of buffers gotten in current mode (usually for update)
rows     = number of rows processed by the fetch or execute call
********************************************************************************

SQL ID: a5ks9fhw2v9s1 Plan Hash: 272002086

select *
from
 dual


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          2          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0          2          0           1

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 90
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  TABLE ACCESS FULL DUAL (cr=2 pr=0 pw=0 time=22 us cost=2 size=2 card=1)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  log file sync                                   1        0.00          0.00
  SQL*Net message from client                     2        0.00          0.00
********************************************************************************

SQL ID: 06nvwn223659v Plan Hash: 0

alter session set events '10046 trace name context off'


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.00          0          0          0           0

Misses in library cache during parse: 0
Parsing user id: 90

********************************************************************************
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          2          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        6      0.00       0.00          0          2          0           1

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       3        0.00          0.00
  SQL*Net message from client                     3        0.00          0.00
  log file sync                                   1        0.00          0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          3           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        1      0.00       0.00          0          0          3           1

Misses in library cache during parse: 0

    2  user  SQL statements in session.
    1  internal SQL statements in session.
    3  SQL statements in session.
********************************************************************************
Trace file: /dev/stdin
Trace file compatibility: 11.1.0.7
Sort options: exeqry
       1  session in tracefile.
       2  user  SQL statements in trace file.
       1  internal SQL statements in trace file.
       3  SQL statements in trace file.
       3  unique SQL statements in trace file.
     218  lines in trace file.
       0  elapsed seconds in trace file.

The same process was used to run the trace data through the Method-R mrskew command:

-- mrskew.sql

col ssh_target new_value ssh_target noprint
col scp_filename new_value scp_filename noprint

set term off feed off verify off echo off

select '&&1' ssh_target from dual;
select '&&2' scp_filename from dual;

set feed on term on verify on
--disconnect
host ssh &&ssh_target 'cat &&scp_filename' | mrskew

The results of calling mrskew.sql  rather than tkprof.sql:

CALL-NAME                    DURATION       %  CALLS      MEAN       MIN       MAX
---------------------------  --------  ------  -----  --------  --------  --------
SQL*Net message from client  0.003733   74.1%      3  0.001244  0.001004  0.001663
log file sync                0.001300   25.8%      1  0.001300  0.001300  0.001300
SQL*Net message to client    0.000008    0.2%      3  0.000003  0.000002  0.000003
PARSE                        0.000000    0.0%      2  0.000000  0.000000  0.000000
FETCH                        0.000000    0.0%      2  0.000000  0.000000  0.000000
CLOSE                        0.000000    0.0%      2  0.000000  0.000000  0.000000
EXEC                         0.000000    0.0%      2  0.000000  0.000000  0.000000
---------------------------  --------  ------  -----  --------  --------  --------
TOTAL (7)                    0.005041  100.0%     15  0.000336  0.000000  0.001663

These scripts can all be found at https://github.com/jkstill/tracefile_identifier

If you have ideas about how to improve these, please feel free to clone the repo, make some changes and issue a pull request.

If you don’t know what all of that means, might I suggest this article?  Git for Beginners

The next time you have some tracing to do, why not give this method a try?  Doing so will save you time and make you more productive.

 

Categories: DBA Blogs

Pillars of PowerShell: SQL Server – Part 2

Fri, 2015-08-28 14:24
Introduction

This is the seventh and final post in the series on the Pillars of PowerShell. The previous posts in the series are:

  1. Interacting
  2. Commanding
  3. Debugging
  4. Profiling
  5. Windows OS
  6. SQL Server – Part 1

In this final post I am going to touch on SQL Server Management Objects (SMO) with PowerShell. SMO is one of the most widely used methods and, to me, offers the most versatile way of working with SQL Server. It can be a bit tedious to work with, since you will be using raw .NET objects instead of cmdlets, but it offers so much more compared to SQLPS. In this post I am just going to touch on the basics of loading SMO and how you can connect to an instance of SQL Server (or multiple instances). I am going to end by showing you a function I published a few years ago and still use fairly frequently to this day.

Loading SMO

As with SQLPS, you have to load SMO into your PowerShell session before you can utilize it. SMO is what is referred to as an “assembly”: basically a collection of types and other objects that form a logical unit of functionality for interacting with various parts of SQL Server. With SQL Server 2012 and above you can import the SQLPS module and it will automatically import the associated version of SMO. However, since SQLPS loads more than just SMO, it can take time for that to complete before your script continues. In that regard, you can shave off some time by just loading SMO directly, without all the overhead of the SQLPS module. You will commonly see the following line of code used to load SMO into your session:

[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO')

SMO_Load_1

Generally this command is going to load the highest version registered in the GAC on your machine. In the screenshot you may see the version is “13.0.0”; this is from the SQL Server Management Studio preview (July 2015) that is installed on my machine. Now, with PowerShell, things change over time, and using LoadWithPartialName is actually the version 1 method of loading SMO. This method is no longer supported, but it still works for now. In PowerShell 2.0 a cmdlet was added to do this for you called Add-Type. If you were to just type in Add-Type ‘Microsoft.SqlServer.Smo’ when you have multiple versions, you are going to get an error similar to this:

SMO_Load_2

In this situation you have to specify the assembly you want to load, so there is a bit more to doing this with SMO. You can load an assembly by specifying the file itself or by the assembly name along with 4 bits of information:

  1. Name
  2. Version
  3. Culture
  4. PublicKeyToken

To date, Microsoft always uses the same Culture and PublicKeyToken on almost all of their assemblies that come out of Redmond. So the only thing lacking is the version, which is going to be in the format of a 4-part version number, 0.0.0.0. If you have worked with SQL Server and you are familiar with the build numbers, you simply need to know that “10” is SQL Server 2008, “11” is SQL Server 2012, “12” is SQL Server 2014, and “13” is going to be SQL Server 2016. So, if I want to load the SQL Server 2012 SMO into my session I simply use this command:

Add-Type -AssemblyName "Microsoft.SqlServer.Smo, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91"

The first connection…

To connect to a single instance of SQL Server with Windows Authentication you can use the following:

$srvObject = New-Object Microsoft.SqlServer.Management.Smo.Server "MyServer"

Once you hit enter, it will make a connection to your instance, and the variable $srvObject will then contain properties and methods that you can use to manipulate the server-level objects of your instance. If you recall from the previous pillars in this series, this is where Get-Member comes in really handy for exploring. As an example, let’s say you wanted to get information similar to what SELECT @@VERSION returns in T-SQL. You simply need to know the properties that hold this information and pipe the object to select:

 
$srvObject | select Product, VersionString, Edition, OSVersion 

In PowerShell it is good to start out with the mindset “if I write it for one server, I might as well write it to handle multiple”. What I mean by this is that you get to the point of developing a script into a tool. If I wanted to turn the above bit of code into something I can reuse, and run for one instance or 50 instances, it just takes a bit of work and you are there before you know it:

function Get-SqlVersion {
 [cmdletbinding()]
 param (
 [string[]]$server
 )
 
 $allServers = @()
 $props = @{ServerName="";Product="";Version="";Edition="";OSVersion=""}
 foreach ($s in $server) {
 $srvObject = New-Object Microsoft.SqlServer.Management.Smo.Server $s

 $cserver = New-Object psobject -Property $props
 $cserver.ServerName = $s
 $cserver.Product = $srvObject.Product
 $cserver.Version = $srvObject.VersionString
 $cserver.Edition = $srvObject.Edition
 $cserver.OSversion = $srvObject.OSVersion
 $allServers += $cserver
 }
 
 $allServers
}

Now, don’t let this scare you, as it may look more complicated than it is. You could just put two lines inside the foreach loop that create your server object and select the properties, and you would be done. It is best, though, when you start to write functions, that the output of your function is an object. That is the only additional step I take, using New-Object psobject to create a PowerShell object with the properties ServerName, Product, Version, Edition, and OSVersion. In the event you expand on this function in the future and want to pipe its output to another cmdlet or custom bit of code, it will be in a more formal object type for you to work against.

Golden Nugget

One of the things I got annoyed with fairly quickly when troubleshooting an instance of SQL Server was having to search through the error log(s). You could be dealing with the default of 6 logs for an instance or up to 99 of them. There is some T-SQL code out there from people that iterates through each log for you, but I just prefer to use PowerShell. I published this code on my personal blog back in December of 2014. You can find the write-up and code here: Search-SqlErrorLog. It will be good practice for you to try to understand it on your own, but I include help information just in case.

This is one of the few times I wrote a function that only works with one server at a time. You can do some one-liner tricks with the pipeline to easily call it for multiple servers:

"server1","server2" | foreach {Search-SqlErrorLog -server $_ -all -value "^backup"}

The output of this function provides the number of the log the match was found in, the date, the process (if noted in the log), and the text found matching the value you provided (which can be a regex expression; the “^” means the start of the string):

search_sqlerrorlog

The End

I hope you learned something new in this series on PowerShell, and good scripting to you all.

Categories: DBA Blogs

Migration of Oracle Database to Amazon RDS using Golden Gate

Fri, 2015-08-28 14:15

Amazon RDS is a web service used to manage databases, like Oracle, in the cloud. Small- and medium-sized enterprises with databases of normal load, volume, and SLA can certainly leverage the ease and cost efficiency Amazon RDS offers.

Two methods are widely used to migrate databases with minimal downtime: Oracle Data Guard and Oracle GoldenGate. AWS RDS doesn’t support Data Guard, but luckily it does support Oracle GoldenGate, though there are some version constraints.

The following components are involved when migrating a database from on-premises to AWS RDS:

— Source database on premises
— Oracle GoldenGate Hub on EC2 instance
— Target database on AWS RDS

Now there could be different topologies for the above 3 components, but we are just using this topology for simplicity. For details on this topology, refer to this very fine and simple Appendix: Using Oracle GoldenGate with Amazon RDS.

Generally and roughly, the steps used to migrate an on-premises Oracle database to AWS RDS could be as follows:

— Create the target database targetdb in AWS RDS with the same parameters as the source database sourcedb.

— Create the same tablespaces on targetdb in AWS RDS as exist in the source database sourcedb.

— Create the same non-default users on targetdb in AWS RDS as exist in the source database sourcedb.

— Create the same non-default roles on targetdb in AWS RDS as exist in the source database sourcedb, and assign these roles to the users on targetdb.

— Export data/objects from the non-default schemas of the sourcedb database up to a specific SCN.

— Import those data/objects into the targetdb database.

— Configure the GoldenGate extract process on sourcedb; for configuration see this.

— Configure the GoldenGate replicat process on targetdb; for configuration see this.

— Set up the Oracle GoldenGate (GG) hub on EC2; for configuration see this.

— Start the GG extract process on sourcedb.

— Start the GG replicat process on targetdb from that SCN onward, until it catches up on all the changes generated on sourcedb during the export/import window (see the sketch after this list).

— Then plan the cutover time for applications to switch to the new AWS RDS database after stopping the replicat process on targetdb.

— Clean up sourcedb.
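
To make the SCN handoff concrete, below is a minimal, hypothetical sketch of the export and the replicat start. The SCN value, schema name, dump file names, and the ext1/rep1 process names are placeholders, and it assumes the dump has already been transferred to and imported into the RDS instance:

# On sourcedb: note the current SCN, then export the application schema as of that SCN
sqlplus -s / as sysdba <<'EOF'
select current_scn from v$database;
EOF
expdp system schemas=APP_SCHEMA flashback_scn=1234567 \
      directory=DATA_PUMP_DIR dumpfile=app_schema.dmp logfile=app_schema_exp.log

# After the import into targetdb (AWS RDS) is done, start replication on the GG hub
# from that SCN so only the changes made after the export are applied
ggsci <<'EOF'
START EXTRACT ext1
START REPLICAT rep1, AFTERCSN 1234567
EOF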

These are just the skeleton steps and need refining and proper planning. It’s always good to first thoroughly test such action plans. But as you can see, Oracle GoldenGate is a viable tool for migrating databases to AWS RDS. Pythian has a full range of skills, experience, and capabilities to oversee such migrations, as using GoldenGate for migrations is part of our daily routine. And yes, even if AWS RDS is a cloud service, you still need a DBA :)

Categories: DBA Blogs

Three Hidden Azure SQL Database Gotchas

Fri, 2015-08-28 13:35

Azure SQL Database is Microsoft’s Database as a Service (DBaaS) platform offering. It allows end users to leverage the power of SQL Server in the cloud without the expense and complexity of building a private infrastructure. Additionally, this offering simplifies database maintenance tasks while providing seamless high availability and disaster recovery capabilities.

Although DBaaS offerings are still crawling out of their infancy, with the correct planning and use cases, implementing an Azure SQL Database solution can be a relatively straightforward process. However, as this platform continues to mature, you can expect to encounter some “ghosts in the machine”. Hopefully this post will allow you to avoid some of these unexpected behaviors.

1. What’s in a name?

Azure SQL servers all share the same public domain, database.windows.net, and access is controlled through IP white-lists and user credentials. Until recently, Azure SQL Database dynamically allocated server names comprised of long random strings, for security purposes and because each Azure server name must be globally unique. However, Microsoft recently provided the ability to allocate specific server names specified by the end user, i.e. MyServerName.database.windows.net.

This feature is a more than welcome addition, particularly for organizations that wish to pre-configure connection strings for cloud implementations.

The hidden gotcha resides in the implementation of this feature. Once you create a server with a user-defined name, the Azure cloud reserves that name for you within the Azure fabric. If for any reason you remove the server, you will be unable to recreate it using the same name for at least 5 days. When you attempt to recreate the server, you will receive the message “Specified server name is already used”, as depicted below:

Bl1

Microsoft is aware of this limitation; however, at this time, the only way to correct the situation is to contact Microsoft Support and have them remove the Azure fabric metadata manually.

Additionally, it should be noted that you can only specify a specific Azure SQL Database Server name in the preview portal. This feature is not available in the standard portal or via the New-AzureSqlDatabaseServer Cmdlet in PowerShell.

2. You can change the performance tier at any time, unless you can’t.

One of the fantastic benefits of leveraging Azure SQL Database is the ability to switch service tiers at any time, without service disruption, in order to leverage pay-per-minute costing efficiencies.

Unfortunately, another hidden gotcha may rear its ugly head during the switching process. Organizations that utilize BCP processes against an Azure SQL instance need to be wary when performing a service level switch. BCP operations often simply “Hang” when switching between service levels. The only resolution for this issue is to terminate the process and re-initiate once the tier switch has been completed.

3. I know you’re there, but I can’t see you.

Just like all cloud offerings, Azure SQL Database continues to mature and improve. However, you need to be prepared for some management inconsistencies. The preview portal is aptly named, and although some functions are only available within the preview portal, you may need to frequently revert to the standard portal for a more consistent experience.

As an example, I have a client who switched databases between the standard and premium tiers and vice versa. These databases no longer display in the preview portal at all. However, they do appear correctly in the standard portal, as shown in the CIA-level redacted screen captures below.

BL_Combo2015-08-24_14-08-57

 

Categories: DBA Blogs

Trust and confidence from Pythian

Fri, 2015-08-28 13:25

Recently I “inherited” some new responsibilities at work. It’s not the first time during the 11 or so of the last 16 years I have spent at Pythian. Throughout my employment at Pythian, I have continually been given new titles based on new roles I have taken on. For me, besides the enjoyment I have been lucky to have at Pythian, this trust and confidence are two of the biggest contributors to one’s longevity with a company.

For Pythian and me, it all started one spring afternoon in about 1998. Paul and Steve had been doing the Pythian-thing for a year or more, and were looking for assistance getting “off-the-ground” so to speak. That endeavour was part of the reason for our new association and it’s been a magic carpet ride since. I did leave at one point for almost 6 years, but returned in early 2011. Between 1998 and 2011, the size of the company changed, but it was still the same old company.

I now manage the day-to-day operations of the consulting group and take pride in the work I do. Touché all you people out there in Pythian-land.

Categories: DBA Blogs

Creating an Oracle Database Cloud Service

Fri, 2015-08-28 13:10

Back in late June of 2015, Larry Ellison launched several public cloud services, and one of those services was the public DBaaS. Today, I had the opportunity to try out this new service. This blog post will examine how to create the service and how to connect to it with sqlcli. As with any cloud service, it all happens in the background, saving you from doing tedious configuration steps before you can start using your service.

2015-08-21_1319

In my case, it took about 30 minutes from when I clicked on Create Service to when I could start using my database.

So the first thing that you have to do, obviously, is access the Oracle Cloud My Services application. If you do not currently have access, speak with your sales rep or cloud administrator, but remember that this application is not free. Once you have access, click on the Oracle Database Cloud Service link and the following page will come up. Click on “Create Service”:

Once you have done that, we need to choose the type of service we want and the billing frequency. As I have talked about in previous posts, it all depends on your business needs and abilities. The difference between choosing a “Cloud Service” and a “Cloud Service – Virtual Image” is that in the first option the database and the database instance are created for you, whereas in the “Virtual Image” option you will need to create them yourself, so choose carefully. One of the good things that comes with the first option is cloud patching, whereas with the “Virtual Image” option you have to do this yourself.

As of the writing of this post, Oracle offers two database versions – 11.2.0.4 and 12.1.0.2. I chose the latter.

2015-08-18_1256

 

In the Edition section, we get to choose the type of service we will get when choosing the Cloud Software Edition. Unlike the previous choice, here we choose the bells and whistles that you will be licensed to use in this database. I won’t include the differences between the two here, but you can view them on cloud.oracle.com in the PaaS section, under Database. In my case, I just chose the regular Enterprise Edition:

In the Details section, we can set the characteristics of the database service. It is important to select the “Compute Shape” correctly, as this is critical to your usage billing. It is also good to know that one OCPU (Oracle CPU) is equivalent to a 3.0 GHz 2012 Intel Xeon with Hyper-Threading enabled. You will also have to add a public SSH key to access your compute node; you can learn how to create one here. This is where you also set the usable storage, the system/administrator password for the database, the name of the SID, the version (in this case 12.1.0.2), and the name of the PDB. Last, but not least, you will choose your backup destination. In my case, I just chose local storage, but you can choose the Oracle Database Backup Service if you have one.
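
If you have never generated an SSH key pair before, a minimal sketch of doing so on Linux or OS X looks like the following (the key file name is just an example; keep the private key safe and paste the contents of the .pub file into the service form):

  ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa_oracle_cloud -C 'oracle-cloud'
  cat ~/.ssh/id_rsa_oracle_cloud.pub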

 

 

Last, but not least, you will get a confirmation of the service you are about to create. I didn’t copy this particular screenshot when I created it, but here is a similar one, so you get the gist.

 

Once you click on create, you can select the service and see the details of the creation process, as well as some others, like the Public IP, Port, etc.

Once the DB and VM are allocated, you need to go back to the Oracle Cloud My Services application  and go to the Oracle Compute Cloud Service console. This is to enable the security rule that will allow us to connect to port 1521 for this DB.

 


In the page that comes up, go to the Network section, and you will see a set of Security Rules, which you will find disabled.

In my case, I enabled the “dbaas/test-orcl/db/ora_p2_dblistener” rule.


In this particular case – and I want to emphasize this – I am not concerned with security, so I also enabled the Security List for Inbound/Outbound Policy traffic.
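
Before connecting, it can be worth verifying that the listener port is actually reachable from your workstation. If netcat is available, a quick check along these lines (substitute the public IP shown in your service details) should succeed:

  nc -zv <public_ip> 1521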


 

Once I had done this, I was ready to connect to my DB via sqlcli just as I would connect to any other DB:

Renes-iMac:bin Rene$ ./sql system@***.***.****.****:1521:ORCL

SQLcl: Release 4.2.0.15.177.0246 RC on Fri Aug 21 11:41:42 2015

Copyright (c) 1982, 2015, Oracle. All rights reserved.


Password? (**********?) ************
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Oracle Label Security option 

SQL> select name from v$database;


NAME 
---------
ORCL 

SQL> set lines 200 pages 9999


SQL> COLUMN PDB_NAME FORMAT A15

SQL> 
SQL> SELECT PDB_ID, PDB_NAME, STATUS FROM CDB_PDBS ORDER BY PDB_ID;


 PDB_ID PDB_NAME STATUS 
---------- --------------- ---------
 2 PDB$SEED NORMAL 
 3 PDB1 NORMAL 

SQL> alter session set container=PDB1;


Session altered.

Conclusion

As you can see, it is quite easy to request a database service and start using it. You will have to build your case for using the public cloud, but once you do, you will see that using a cloud database service is no different from using an on-premise one.

Note– This was originally published on rene-ace.com

Categories: DBA Blogs

Log Buffer #438: A Carnival of the Vanities for DBAs

Fri, 2015-08-28 12:08

This Log Buffer Edition covers Oracle, MySQL, and SQL Server blog posts from the last week.

Oracle:

Integrating Telstra Public SMS API into Bluemix

Adaptive Query Optimization in Oracle 12c : Ongoing Updates

First flight into the Oracle Mobile Cloud Service

Oracle 12C Problem with datapatch. Part 2, the “fix”

oracle applications r12 auto start on linux

SQL Server:

Email Formatted HTML Table with T-SQL

SQL Server 2016 – Introduction to Stretch Database

Soundex – Experiments with SQLCLR Part 3

An Introduction to Real-Time Communication with SignalR

Strange Filtered Index Problem

MySQL:

Announcing Galera Cluster 5.5.42 and 5.6.25 with Galera 3.12

doing nothing on modern CPUs

Single-threaded linkbench performance for MySQL 5.7, 5.6, WebScale and MyRocks

Identifying Insecure Connections

MyOraDump, Oracle dump utility, version 1.2

Categories: DBA Blogs

Log Buffer #437: A Carnival of the Vanities for DBAs

Fri, 2015-08-28 12:07

This Log Buffer Edition goes out deep into the vistas of the database world and brings out a few of the good posts published during the week from Oracle, SQL Server, and MySQL.

Oracle:

Overriding Default Context-Sensitive Action Enablement

This is an alternative to if… then… else… elsif… end if when you want to use conditional statements in PL/SQL.

Achieving SAML interoperability with OAM OAuth Server

Release of BP02 for Oracle Identity Manager 11.1.2.3

IT Business Edge: Oracle Ties Mobile Security to Identity and Access Management

SQL Server:

How to render PDF documents using SQL CLR. Also a good introduction on creating SQL CLR functions.

What is DNX?

SQL Server Performance dashboard reports

Using Microsoft DiskSpd to Test Your Storage Subsystem

Connect to Salesforce Data as a Linked Server

MySQL:

Optimizing PXC Xtrabackup State Snapshot Transfer

Adding your own collation to MySQL

Monitoring your Amazon Aurora Databases using MONyog

How much could you benefit from MySQL 5.6 parallel replication?

MySQL checksum

The post Log Buffer #437: A Carnival of the Vanities for DBAs appeared first on Pythian - Data Experts Blog.

Categories: DBA Blogs

Are you ready to be a private cloud service provider?

Thu, 2015-08-20 20:35

When defining what a cloud service is, we need to know that it is not a technology per se, but an architectural and operational paradigm. It is a self-service computing environment offering the ability to create, consume, and pay for services. In this architecture, computing resources are elastically supplied from a shared pool and charged for based on metered use, and service catalogs provide a menu of options and service levels.

According to IDC, “total cloud IT infrastructure spending (server, disk storage, and ethernet switch) will grow by 21% year over year to $32 billion in 2015, accounting for approximately 33% of all IT infrastructure spending, which will be up from about 28% in 2014. Private cloud IT infrastructure spending will grow by 16% year over year to $12 billion, while public cloud IT infrastructure spending will grow by 25% in 2015 to $21 billion.”

This means that the growth of this architecture (private, public, or hybrid) will not stop for the foreseeable future, so we first need to understand what drives it and how to translate our current architecture into a 3rd platform architecture.

[Image source: IDC 3rd Platform Study]

The principles of a cloud architecture support the following necessary capabilities:

  • Resource pooling – The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand.
  • Rapid elasticity – Capabilities can be elastically provisioned and released so that services can be adjusted to suit each client’s demand, without any of the underlying changes being apparent to the client or end user.
  • On-demand self-service – Consumers can provision computing capabilities, such as server time and network storage, as needed automatically, without requiring human interaction with each service provider.
  • Measured service – Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and the consumer.
  • Broad network access – Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms.
Business Drivers

Cloud will not be a true fit for everybody or for every case. We need to understand and determine the business drivers before we implement a cloud architecture.

  1. Increase our agility within the enterprise by providing:
    1. The ability to remove certain manual procedures and let the end user be a self-service consumer
    2. A well-defined service catalog
    3. The capability to adapt to workload changes by provisioning or deprovisioning system resources
  2. Reduce enterprise costs by:
    1. Using shared system resources for our different applications and internal business divisions
    2. Measuring the actual usage of system resources to show the benefit of our architecture
    3. Automating mundane and routine tasks
  3. Reduce enterprise risks by:
    1. Having greater control of the resources we have and how they are being used
    2. Applying more unified security across our business
    3. Providing different levels of high availability to our enterprise
Service Catalog

The most critical part when defining any type of service is defining what it is we are going to provide. Take McDonald’s, for example. When we get to the counter, there is a well-defined catalog of the products we can consume in that establishment – a certain range of hamburgers and junk food. To put it more plainly, we can’t go into McDonald’s and order a pizza or Italian food, as that is not in their business or service catalog.

When defining our business enterprise service catalog, we need to define the What, as to what type of service we want to provide, what service levels we want to provide, what policies we are going to apply to the service, and what our capabilities are to provide it.

The business service catalog will translate into a technical enterprise catalog, defining every detail of how we will provide our business services. Here we need to define the How. How are we going to deploy the service? How are we going to provide the service levels? How are we going to apply the business policies and how are we going to manage our services?

As mentioned, this is not a technology but an architecture, and like any architecture, we first must understand where we are in order to know where we are going. So, within our current organization, we first need to capture our existing assets, skills, and processes so that we can then validate the future state of our architecture.

Meter, Charge, and Optimize

Business consumers want to know what they are consuming and what it costs, even if they don’t actually want to pay for the service. Additionally, from an operational perspective, as different tenants start sharing the same piece of platform or infrastructure, there needs to be accountability on the usage, or else resources may be over-allocated. To mitigate this, we often meter the usage and optionally chargeback [or show back] the tenants. Though an IT organization may not actually charge back its LOBs, this provides a transparent mechanism to budget resources and optimize the cloud platform on an ongoing basis.

Conclusion

These are just a few points to be aware of if you want to become a private cloud provider, but they are also helpful for any cloud architecture, as we need to understand what drives the change, what it is we are going to provide, and how we are going to deliver and measure the services that we are providing.

Note– This was originally published on rene-ace.com

The post Are you ready to be a private cloud service provider? appeared first on Pythian - Data Experts Blog.

Categories: DBA Blogs

Git for Beginners

Thu, 2015-08-20 20:04
git, simplified

Perhaps you’ve come across a great cache of publicly available SQL scripts that would be very useful in monitoring your databases, and these scripts are hosted on github.  Getting those scripts is as simple as clicking the Download button.

What if, however, you wish to contribute to the script library?

Or perhaps you would like to collaborate with coworkers on a project and want to host the files on github.

How do you get the files to your local server so that changes can be saved and pushed to the master repo?

Github is often the answer for that.

Some time ago github was probably considered by most IT folks as a tool for developers.  That has changed, as now git and github are popularly used to manage changes and allow collaboration on many kinds of projects that require file management.

If you are reading this blog, you are probably a DBA.  What better way to manage SQL scripts and allow others to contribute than with github?

Let’s simplify the use of git and make it usable for casual users – in other words, for DBAs who want to access a SQL repo but don’t want to relearn git every time they need to use it.

The methods shown here are not the same ones that would be used by a team of developers. Typically developers would create a fork of a project, clone that fork, modify files, and then issue pull requests to the main repo owner. There would also be branches to the development tree, merging, etc.

For this demo, there will still be a need to fork your own copy of the repo, but that is as far as it will go at this time.

Read more about creating a fork: https://help.github.com/articles/fork-a-repo/

In the spirit of keeping this simple, there will be no branching in this demo; I’ll only show the basics required to contribute to a project.

With simplicity as a goal, the following steps are to be performed in this demo:

  • Create a copy (fork) of the main repo in github
  • Clone the repo to a work environment (my linux workstation)
  • Add a file to the local repo
  • Commit the changes and push to my forked repo on github
  • Issue a ‘pull request’ asking the main repo admin to include my changes

So while it will be necessary to create a fork of the project, we won’t be dealing with branches off the mainline.

 Assumptions:

– you already have a github account

– git is installed on your laptop, server, whatever.

Git Repos

Two users will be used for this demo: jkstill and pytest.

The following repos will be used.

Main Repo: https://github.com/jkstill/git-demo

Developer’s (you) repo: https://github.com/pytest/git-demo

The Main Repo is public, so you can run this demo using your own account if you like.

Fork the Repo

The following steps were performed by the pytest user on github.

Login to https://github.com/ using a browser.

Navigate to https://github.com/jkstill/git-demo

Click on the ‘Fork’ icon and follow any instructions; this should only take a few seconds.

After forking this repo as pytest, my browser was now directed to https://github.com/pytest/git-demo

ssh key setup

This only needs to be done once.

The following examples are for github user pytest.

The pytest account will be used to demonstrate the concepts. Later I will explain more about ssh usage as it pertains to github, but for now this is probably sufficient.

create a new ssh key for use with github
   ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa_pytest_github -C 'github'
add key to github account

While logged in to your github account in a browser, find the account settings icon.

The icon for account settings is in upper right corner of browser window.

Navigate to the Add SSH Key section.

account settings -> SSH Keys -> Add SSH Key

The key added will be the public key. So in this case, the contents of ~/.ssh/id_rsa_pytest_github.pub would be pasted into the text box that appears when the Add SSH Key button is pushed.
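
One simple way to display the public key so it can be copied and pasted is:

  cat ~/.ssh/id_rsa_pytest_github.pub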

authenticate to github – the ‘git@github.com’ is required

Make sure to authenticate the key with github.

   ssh -i ~/.ssh/id_rsa_pytest_github -t git@github.com

Here is a successful example:

> ssh -i ~/.ssh/id_rsa_github -t git@github.com

Host key fingerprint is DE:AD:BE:EF:2b:00:2b:36:63:1b:56:4d:eb:df:a6:42

+--[ RSA 2048]----+
|        .        |
|       + .       |
|      . B .      |
|     o * +       |
|    Y * S        |
|   + O o . .     |
|    .   Z . o    |
|       . . t     |
|        . .      |
+-----------------+
PTY allocation request failed on channel 0
Hi pytest! You've successfully authenticated, but GitHub does not provide shell access.
Clone the REPO

Now you are ready to clone the newly forked repo to your workstation. At this point, it is assumed that git is already installed in your development environment. If git is not installed, then you will need to install it. There are many resources available for whichever platform you are working on; installation will not be covered here.

The following command will clone your forked copy of the repo in the current directory:

> git clone https://github.com/pytest/git-demo
Cloning into 'git-demo'...
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 7 (delta 0), reused 7 (delta 0), pack-reused 0
Unpacking objects: 100% (7/7), done.
Checking connectivity... done

> cd git-demo
/home/jkstill/github/pytest/git-demo

> ls -la
total 20
drwxr-xr-x 3 jkstill dba 4096 Aug 18 15:45 .
drwxr-xr-x 4 jkstill dba 4096 Aug 18 15:45 ..
drwxr-xr-x 8 jkstill dba 4096 Aug 18 15:45 .git
-rw-r--r-- 1 jkstill dba  113 Aug 18 15:45 .gitignore
-rw-r--r-- 1 jkstill dba   47 Aug 18 15:45 README.md

Note: it is possible to use the ~/.ssh/config file to specify multiple ssh keys for use with git. This is useful when you may be using multiple accounts.

The command I used to do this operation is below as I do have multiple accounts:

  git clone git-as-pytest:pytest/git-demo

You can read more about this in a later section of this article.

Now cd to the new repo:  cd git-demo

There should be two files and a directory as seen in the previous example.

Modify or add a script

Now you can modify a script or add a new script and then commit to your local repo.

In this case, we will add a script fra_config.sql to the local repo.

-- fra_config.sql
-- show location and size of FRA

col fra_location format a30
col fra_size format a16

select fra_location, fra_size from (
   select name, value
   from v$parameter2
   where name like 'db_recovery_file_dest%'
)d
pivot ( max(value) for name in (
      'db_recovery_file_dest' as FRA_LOCATION,
      'db_recovery_file_dest_size' as FRA_SIZE
   )
)
/

Modified files can be seen with git status:

> git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       fra_config.sql
nothing added to commit but untracked files present (use "git add" to track)

Now add the file to the list of those that should be tracked and check the status again:

> git add fra_config.sql


> git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   fra_config.sql
#

As we are happy with the results, it is time to commit to the local repo:

> git commit -m 'Added the new file fra_config.sql'
[master 86eaf7c] Added the new file fra_config.sql
1 file changed, 18 insertions(+)
create mode 100644 fra_config.sql

> git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#   (use "git push" to publish your local commits)
#
nothing to commit, working directory clean

Shouldn’t we have put a date in that file? OK, a date and time was added, changes to the file displayed, the file was added to the list of those to commit, and the commit made:

> git diff fra_config.sql | cat
diff --git a/fra_config.sql b/fra_config.sql
index 03b98fd..37c58ac 100644
--- a/fra_config.sql
+++ b/fra_config.sql
@@ -1,6 +1,7 @@

-- fra_config.sql
-- show location and size of FRA
+-- jkstill 2015-08-18 16:03:00 PDT

col fra_location format a30
col fra_size format a16

> git add fra_config.sql

> git commit -m 'added timestamp'
[master 83afd35] added timestamp
1 file changed, 1 insertion(+)

> git status
# On branch master
# Your branch is ahead of 'origin/master' by 2 commits.
#   (use "git push" to publish your local commits)
#
nothing to commit, working directory clean

Committing can and should be done frequently, as the commit affects only the local repository.

This makes it possible to see (and retrieve) incremental changes to a file as you work on it.
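
For example, the local history of a single file can be reviewed, and an earlier version of it recovered, with commands along these lines (the commit SHA is a placeholder taken from your own git log output):

  git log --oneline fra_config.sql
  git show <commit_sha>:fra_config.sql
  git checkout <commit_sha> -- fra_config.sql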

Once you are satisfied with all changes, push the changes to the repo. Notice that git status knows that 2 commits have been performed locally that are not seen in the master repository.

Configure the Remote

Before pushing to the main repo, there is a little more configuration work to do. While this method is not strictly necessary, it does simplify the use of git.

You will need to edit the file ~/.ssh/config; create it if it does not already exist.

Here’s my example file where a host git-as-pytest has been created. This host will be used to connect to github.

GSSAPIAuthentication no
VisualHostKey=yes

Host git-as-pytest
  HostName github.com
  User git
  IdentityFile /home/jkstill/.ssh/id_rsa_pytest_github
  IdentitiesOnly yes

Now edit the file ./.git/config. Find the [remote "origin"] section and change the url as seen in this example.

[core]
  repositoryformatversion = 0
  filemode = true
  bare = false
  logallrefupdates = true
[remote "origin"]
  #url = https://github.com/pytest/git-demo
  url = git-as-pytest:pytest/git-demo.git
  fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
  remote = origin
  merge = refs/heads/master

Now you should be able to push the changes to the master repo:

> git push origin master
Counting objects: 7, done.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 787 bytes | 0 bytes/s, done.
Total 6 (delta 2), reused 0 (delta 0)
To git-as-pytest:pytest/git-demo.git
788e5b1..83afd35  master -> master

The changes to your files can be seen in your repo on github.com

Issue a PULL request

Once you think the file or files are ready to be included in the master repository, you will issue a pull request to the admin of the master repo.

The repo admin can then pull the changes and examine them. Once it has been determined that the changes can be made to the master repo, the admin will push the changes.

Issuing the pull request

View the repo in your browser, press the ‘pull request’ icon and follow the instructions. This action will cause an email to be sent to the repo admin with a URL to view the pull request. The admin can then examine and test the changes, and merge the pull request (if appropriate) into the mainline.
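
A rough sketch of how the admin might review those changes from the command line, rather than in the web interface, is to fetch the contributor’s fork and diff it against the mainline (the remote name here is arbitrary, and the URL is this demo’s fork):

  git remote add pytest https://github.com/pytest/git-demo
  git fetch pytest
  git log master..pytest/master
  git diff master pytest/master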

If the pull request results in your changes being merged, github will send you an email.

After the Pull request has been merged

Now other users can get the updates with the following commands:

  git pull
  git status

git pull merges the changes from the repo on github into the local one, and git status shows the state of the working directory after the merge.

As there is the possibility of overwriting files you are working on, be sure this is the right thing to do.
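
If you do have uncommitted local changes that you are not ready to lose, one common approach (a sketch, not the only option) is to stash them around the pull:

  git stash
  git pull
  git stash pop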

Now that you have the basics, you can get started.

Please feel free to use  the https://github.com/jkstill/git-demo repo to follow along with the steps shown here.

The post Git for Beginners appeared first on Pythian - Data Experts Blog.

Categories: DBA Blogs

Difference Between Oracle’s Table and Mongo’s Collection

Thu, 2015-08-20 11:44

Roughly speaking, the notion of ‘Tables’ in Oracle is similar to MongoDB’s ‘Collections’. They are NOT identical though. Before we examine the differences between Oracle’s Table and MongoDB’s Collection, let’s see what Table in Oracle and Collection in MongoDB are.

Table in Oracle:

A table in Oracle is made up of a fixed number of columns for any number of rows. Every row in a table has the same columns.

Collection in MongoDB:

A collection in MongoDB is made up of documents. The concept of Documents is similar to rows in a table, but it’s not identical. A document can have its own unique set of columns. In MongoDB, columns are called fields.

So in MongoDB, fields are defined at the document level (or we can say in Oracle lingo that columns are defined at the row level), whereas in Oracle the columns are defined at the table level.

That is actually the main difference between Oracle’s table and Mongo’s collection. There are other, subtler differences as well – for example, collections are schema-less, whereas a table in Oracle has to belong to some schema.

Example of an Oracle table:

EMP

EMPID   NAME    CITY
1       Smith   Karachi
2       Adam    Lahore
3       Jim     Wah Cantt
4       Ken     Quetta

CREATE TABLE EMP (EMPID  NUMBER(5),NAME VARCHAR2(20),CITY VARCHAR2(25));

INSERT INTO EMP VALUES (1,'SMITH','KARACHI');
INSERT INTO EMP VALUES (2,'ADAM','LAHORE');
INSERT INTO EMP VALUES (3,'JIM','WAH CANTT');
INSERT INTO EMP VALUES (4,'KEN','QUETTA');

Select * from EMP;

In the above example, the table is ‘EMP’, with 4 rows. All 4 rows have the same fixed set of columns: EMPID, NAME, and CITY.

Example of a MongoDB Collection:

db.EMP.insert({EMPID: '1', NAME: 'Smith', CITY: 'Karachi'})
db.EMP.insert({EMPID: '2', NAME: 'Adam', CITY: 'Wah Cantt', Designation: 'CTO'})
db.EMP.insert({EMPID: '3', NAME: 'Jim', Designation: 'Technician'})
db.EMP.insert({EMPID: '4', NAME: 'Ken'})

> db.EMP.find()

{ "_id" : ObjectId("55d44757283d7d463aec4cc1"), "EMPID" : "1", "NAME" : "Smith", "CITY" : "Karachi" }
{ "_id" : ObjectId("55d44757283d7d463aec4cc2"), "EMPID" : "2", "NAME" : "Adam", "CITY" : "Wah Cantt", "Designation" : "CTO" }
{ "_id" : ObjectId("55d44757283d7d463aec4cc3"), "EMPID" : "3", "NAME" : "Jim", "Designation" : "Technician" }
{ "_id" : ObjectId("55d44757283d7d463aec4cc4"), "EMPID" : "4", "NAME" : "Ken" }

In the above example, we first inserted 4 documents into the collection ‘EMP’. Notice that the 4 documents have different numbers of fields. The db.EMP.find() command displays these documents.

Hope that helps……

The post Difference Between Oracle’s Table and Mongo’s Collection appeared first on Pythian - Data Experts Blog.

Categories: DBA Blogs

Log Buffer #436: A Carnival of the Vanities for DBAs

Fri, 2015-08-14 08:00

This Log Buffer Edition covers the top blog posts of the week from the Oracle, SQL Server and MySQL arenas.

Oracle:

  • Momentum and activity regarding the Data Act is gathering steam, and off to a great start too. The Data Act directs the Office of Management and Budget (OMB) and the Department of the Treasury (Treasury) to establish government-wide financial reporting data standards by May 2015.
  • RMS has a number of async queues for processing new item location, store add, warehouse add, item and po induction. We have seen rows stuck in the queues and needed to release the stuck AQ Jobs.
  • We have a number of updates to partitioned tables that are run from within pl/sql blocks which have either an execute immediate ‘alter session enable parallel dml’ or execute immediate ‘alter session force parallel dml’ in the same pl/sql block. It appears that the alter session is not having any effect as we are ending up with non-parallel plans.
  • Commerce Cloud, a new flexible and scalable SaaS solution built for the Oracle Public Cloud, adds a key new piece to the rich Oracle Customer Experience (CX) applications portfolio. Built with the latest commerce technology, Oracle Commerce Cloud is designed to ignite business innovation and rapid growth, while simplifying IT management and reducing costs.
  • Have you used R12: Master Data Fix Diagnostic to Validate Data Related to Purchase Orders and Requisitions?

SQL Server:

  • SQL Server 2016 Community Technology Preview 2.2 is available
  • What is Database Lifecycle Management (DLM)?
  • SSIS Catalog – Path to backup file could not be determined
  • SQL SERVER – Unable to Bring SQL Cluster Resource Online – Online Pending and then Failed
  • Snapshot Isolation Level and Concurrent Modification Collisions – On Disk and In Memory OLTP

MySQL:

  • A Better Approach to all MySQL Regression, Stress & Feature Testing: Random Coverage Testing & SQL Interleaving.
  • What is MySQL Package Verification? Package verification (Pkgver for short) refers to black box testing of MySQL packages across all supported platforms and across different MySQL versions. In Pkgver, packages are tested in order to ensure that the basic user experience is as it should be, focusing on installation, initial startup and rudimentary functionality.
  • With the rise of agile development methodologies, more and more systems and applications are built in series of iterations. This is true for the database schema as well, as it has to evolve together with the application. Unfortunately, schema changes and databases do not play well together.
  • MySQL replication is a process that allows you to easily maintain multiple copies of MySQL data by having them copied automatically from a master to a slave database.
  • In Case You Missed It – Breaking Databases – Keeping your Ruby on Rails ORM under Control.

The post Log Buffer #436: A Carnival of the Vanities for DBAs appeared first on Pythian - Data Experts Blog.

Categories: DBA Blogs

Thoughts on Google Cloud Dataflow

Thu, 2015-08-13 15:20

Google Cloud Dataflow is a data processing tool developed by Google that runs in the cloud. Dataflow is an easy to use, flexible tool that delivers completely automated scaling. It is deeply tied to the Google cloud infrastructure, making it very powerful for projects running in Google Cloud.

Dataflow is an attractive resource management and job monitoring tool because it automatically manages all of the Google Cloud resources, including creating and tearing down  Google Compute Engine resources, communicating with Google Cloud Storage, working with Google Cloud Pub/Sub, aggregating logs, etc.

Cloud Dataflow has the following major components:

SDK – The Dataflow SDK provides a programming model that simplifies and abstracts out the processing of large amounts of data. Dataflow only provides a Java SDK at the moment, which is a barrier for non-Java programmers. More on the programming model later.

Google Cloud Platform Managed Services – This is one of my favourite features in Dataflow. Dataflow manages and ties together components, such as Google Compute Engine, spins up and tears down VMs, manages BigQuery, aggregates logs, etc.

These two components can be used together to create jobs.

Being programmatic, Dataflow is extremely flexible. It works well for both batch and streaming jobs. Dataflow excels at high-volume computations and provides a unified programming model, which is very efficient and rather simple considering how powerful it is.

The Dataflow programming model simplifies the mechanics of large-scale data processing and abstracts out a lot of the lower level tasks, such as cluster management, adding more nodes, etc. It lets you focus on the logical aspect of your pipeline and not worry about how the job will run.

The Dataflow pipeline consists of four major abstractions:

  • Pipelines – A pipeline represents a complete process on a dataset or datasets. The data could be brought in from external data sources. It could then have a series of transformation operations, such as filter, joins, aggregation, etc., applied to the data to give it meaning and to achieve its desired form. This data could be then written to a sink. The sink could be within the Google Cloud platform or external. The sink could even be the same as the data source.
  • PCollections – PCollections are datasets in the pipeline. PCollections could represent datasets of any size. These datasets could be bounded (fixed size – such as national census data) or unbounded (such as a Twitter feed or data from weather sensors). PCollections are the input and output of every transform operation.
  • Transforms – Transforms are the data processing steps in the pipeline. Transforms take one or more PCollections, apply some transform operations to those collections, and then output to a PCollection.
  • I/O Sinks and Sources – The Source and Sink APIs provide functions to read data into and out of collections. The sources act as the roots of the pipeline and the sinks are its endpoints. Dataflow has a set of built-in sinks and sources, but it is also possible to write custom sinks and sources for your own data sources.

Dataflow also plans to add integration for Apache Flink and Apache Spark. Adding Spark and Flink integration would be a huge feature, since it would open up the possibility of using MLlib, Spark SQL, and Flink’s machine-learning capabilities.

One of the use cases we explored was to create a pipeline that ingests streaming data from several POS systems using Dataflow’s streaming APIs. This data can be then joined with customer profile data that is ingested incrementally on a daily basis from a relational database. We can then run some filtering and aggregation operations on this data. Using the sink for BigQuery, we can insert the data into BigQuery and then run queries. What makes this so attractive is that in this whole process of ingesting vast amounts of streaming data, there was no need to set up clusters or networks or install software, etc. We stayed focused on the data processing and the logic that went into it.

To summarize, Dataflow is the only data processing tool that completely manages the lower level infrastructure for you. It removes the need for API calls to monitor load, spin up and tear down VMs, aggregate logs, and so on, and lets you focus on the logic of the task at hand. The abstractions are very easy to understand and work with, and the Dataflow API also provides a good set of built-in transform operations for tasks such as filtering, joining, grouping, and aggregation. Dataflow integrates really well with all components in the Google Cloud Platform; however, Dataflow does not have SDKs in any language besides Java, which is somewhat restrictive.

The post Thoughts on Google Cloud Dataflow appeared first on Pythian - Data Experts Blog.

Categories: DBA Blogs