
Pythian Group

Official Pythian Blog - Love Your Data

Everyone Should Have Firefighter Mentality

Wed, 2014-05-21 07:43

Australia, one of the most arid countries in the world, is known for its bushfires. The part of Australia where I live doesn’t have many bushfires, but we do have our share of building fires, like any other part of the world.

Firefighting is one occupation many kids want to take up when they grow up. Kids are inspired by the shiny, clean, smart appearance of firefighter crews passing majestically, sirens blaring, in their immaculately clean trucks.

While volunteering at one of the fire stations in my suburb on a quiet day, I found the firefighters doing nothing but polishing their trucks, cleaning their gear, making sure everything was in optimal order, and waiting on their toes for the next emergency.

No matter what field you are in or what profession you follow; no matter which rung of the corporate ladder you are on; whether you are a full-time employee, a contractor, or a business owner, the fact is that there are ample quiet times: small chunks of hushed, still periods during each workday.

Those still moments are the “firefighter” time. Don’t let that time go to waste. Clean your desk and your tools. Polish your skills, and think about yourself and your career. Identify your areas for improvement, and grow your network. Read, write, or help out others.

In other words, get ready for that fire.

Categories: DBA Blogs

Log Buffer #371, A Carnival of the Vanities for DBAs

Fri, 2014-05-16 07:56

Fueled by the massive growth of data and propelled by the mammoth future potential of its applications, bloggers across the database technologies are finding new things to explore in light of previous knowledge. This Log Buffer Edition covers it all.


Oracle:

To configure an instrumentation watch, you first need to know what instrumentation is, and how to instrument applications or servers.

Why Choose to Deploy Agile PLM in the Cloud?

One of the things that makes JSF different from other approaches to server-side Java web development is that it offers a much higher level of abstraction and transparently maintains UI state.

Step by step instructions for setting up a development environment for doing development with Hippo CMS.

Oracle Solaris 11.2 at the Atlanta OpenStack Summit

SQL Server:

RAID and Its Impact on your SQL Performance.

Microsoft Azure Diagnostics Part 1: Introduction

Using Encrypted Backups in SQL Server 2014

A new free plug-in for Management Studio from Red Gate gives you access to all the scripts at SQLServerCentral, including your own briefcase.

Validate File Properties with PowerShell Prior to Import into SQL Server


MySQL:

Benchmark: SimpleHTTPServer vs pyclustercheck (twisted implementation)

Archival and Analytics – Importing MySQL data into Hadoop Cluster using Sqoop

Cross your Fingers for Tech14, see you at OSCON

New Tungsten Replicator 2.2.1 now available

MySQL May Newsletter is Available!

Categories: DBA Blogs

Oracle Passwords: How to Use Punctuation Symbols

Wed, 2014-05-14 07:04

You can’t make a password strong enough, but at least you can try. With at least one upper-case character, one lower-case character, one number, one punctuation mark or symbol, and a length greater than 8 characters, you have a password that can be considered reasonably safe.
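As a rough illustration, the rule of thumb above can be expressed as a small check. This is a hypothetical helper for illustration only, not an Oracle feature:

```python
import string

def is_decent_password(pw: str) -> bool:
    """Check the rule of thumb above: more than 8 characters, with at
    least one upper-case letter, one lower-case letter, one digit, and
    one punctuation mark or symbol."""
    return (
        len(pw) > 8
        and any(c.isupper() for c in pw)
        and any(c.islower() for c in pw)
        and any(c.isdigit() for c in pw)
        and any(c in string.punctuation for c in pw)
    )

print(is_decent_password("My1s!pass"))  # True
print(is_decent_password("password1"))  # False: no upper case, no symbol
```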

In Oracle, if you embed punctuation marks in your passwords, you have to use single quotes around the password when using the orapwd utility from the command line; otherwise the shell may interpret special characters (in bash, an unquoted or double-quoted ! triggers history expansion). If you are altering the password from the sqlplus utility, you need to use double quotes around the password.

Examples of both are given below, changing the sys password:

From orapwd:

$ORACLE_HOME/dbs $ orapwd file=orapwtest password="My1s!pass" entries=40 force=y
-bash: !pass": event not found

$ORACLE_HOME/dbs $ orapwd file=orapwtest password='My1s!pass' entries=40 force=y

Changed successfully.

From sqlplus:

SQL> alter user sys identified by 'My1s!pass';
alter user sys identified by 'My1s!pass
ERROR at line 1:
ORA-00988: missing or invalid password(s)

SQL> alter user sys identified by "My1s!pass";

User altered.


Categories: DBA Blogs

The Hadoop and the Hare

Mon, 2014-05-12 07:38

I speak to a lot of people who are terribly concerned with “real-time”. Can data get into the warehouse in real-time? Can we record everything the user does on the site in real-time? Real-time is a magic phrase, because it really means “so fast that nobody cares”. In a business sense, it usually means that nobody has really thought about the latency requirements, so they’re going to push for the lowest latency possible. This is a dangerous habit, and you need to fight against it at every turn. Some of the best, cheapest, most scalable solutions will only be available when you accept that you can wait for an hour (or even hours!) to see some results.

Case in point: a client asked for a NoSQL store to hold user event data. It would have to handle millions of concurrent users, all generating dozens of events within minutes. They were (rightly) concerned about write performance and about scaling this NoSQL store horizontally to handle the volume of writes. When asked why they needed the events, they didn’t know: they wanted to do exploratory analytics and see if there were interesting metrics to be mined. I proposed dumping the events on a queue (Kafka, Kinesis) or just writing logs for Flume to pick up. Rather than asking a data store to make all of these messages available for querying immediately, do things asynchronously, run some batch jobs to process the data in Hadoop, and see if interesting data comes out. The Hadoop cluster and the ingestion pipeline are an order of magnitude less expensive and risky than the cool, real-time solution, which confers no immediate benefits.
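The asynchronous pattern described here, where producers fire events at a queue and a batch job drains them later, can be sketched in a few lines. This is a toy in-process stand-in for a durable queue like Kafka or Kinesis, not their actual APIs:

```python
import json
from queue import Queue

# Toy stand-in for a durable queue such as Kafka or Kinesis.
event_queue = Queue()

def record_event(user_id, action):
    """Producer side: fire-and-forget. No data store has to index
    this event immediately, so writes stay cheap."""
    event_queue.put(json.dumps({"user": user_id, "action": action}))

def drain_batch(max_events=1000):
    """Consumer side: a periodic batch job (e.g. an hourly Hadoop
    ingest) drains whatever has accumulated and processes it offline."""
    batch = []
    while not event_queue.empty() and len(batch) < max_events:
        batch.append(json.loads(event_queue.get()))
    return batch

record_event("u1", "click")
record_event("u2", "view")
print(len(drain_batch()))  # 2
```

The point of the sketch is the decoupling: latency moves from the write path (where it is expensive) to the batch schedule (where it is cheap to tune).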

Maybe this client will decide that batch analytics are good enough, and they’ll stick with using Hive or Impala for querying. Maybe they’ll discover a useful metric, or a machine learning technique they want to feed back into their user-facing application. Many specific metrics can be stored as simple counters per user, in a store like Redis, once they’ve been identified as valuable. Machine learning techniques are interesting because (cold-start situations aside) they should be stable: the output shouldn’t change dramatically for small amounts of new data, so new information only needs to be considered periodically. In both cases, the requirements should be narrowed down as far as possible before deciding to invest in “real-time”.

In another case, a client presented the classic data warehousing problem: store large volumes of events at a fine grain, and slice and dice them in every imaginable way. Once again, the concept of reducing latency came up. How quickly can we get events into the warehouse so people can start querying them? The answer is that we can make anything happen, but it would be needlessly expensive and perform poorly. The main goal of the project was to provide historical reporting: there was no indication that clients would want to do the same kind of pivots and filtering on data only from the past minute. The low-latency application would be cool and interesting to develop, but it would be much simpler to find which reports users want to be low-latency, then precompute them and store them in a more appropriate way.

A more appropriate store is Dynamo, Cassandra, or your preferred NoSQL key-value store. Precompute the aggregates you know users want with very low latency, and store them keyed on time at a very fine grain: you get the benefit of high write throughput here, but at the cost of limited query complexity. Once the data is no longer interesting, because it is visible in the warehouse with much more grain along the various dimensions, drop it from the NoSQL store.
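The time-keyed precomputation described above can be sketched with a plain dictionary standing in for the key-value store. The key shape (metric, user, minute bucket) is an assumption for illustration, not a prescribed schema:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Toy stand-in for a key-value store such as Cassandra or Dynamo:
# aggregates are precomputed and keyed on (metric, user, minute bucket),
# trading query flexibility for very cheap writes and point lookups.
kv = defaultdict(int)

def minute_bucket(ts: datetime) -> str:
    """Fine-grained time key: one bucket per minute."""
    return ts.strftime("%Y-%m-%dT%H:%M")

def increment(metric, user_id, ts):
    """Write path: a counter bump, the cheapest possible write."""
    kv[(metric, user_id, minute_bucket(ts))] += 1

def read(metric, user_id, ts):
    """Read path: a single key lookup; no scans, no pivots."""
    return kv[(metric, user_id, minute_bucket(ts))]

now = datetime(2014, 5, 12, 7, 38, tzinfo=timezone.utc)
increment("page_views", "u1", now)
increment("page_views", "u1", now)
print(read("page_views", "u1", now))  # 2
```

Anything richer than a point lookup goes to the warehouse; this store only serves the handful of low-latency reports that were identified in advance.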

Starting with a relatively slow batch platform gives very high flexibility at a low cost, and with little development effort. Once your users, internal or external, have explored the data set and started complaining about latency, that is the time to build specialized pipelines for those specific use cases. Since Kafka, Kinesis, and Flume all support streaming consumption as well as batch, they will serve as good connection points for your “real-time” pipeline.

Categories: DBA Blogs