
Pythian Group

Official Pythian Blog - Love Your Data

EM12c : Login to GUI with the correct password causes authentication failure

Thu, 2015-05-21 16:47

So the other day I was trying to log in to my EM12c R4 environment with the SSA_ADMINISTRATOR user, and I got the error:

“Authentication failed. If problem persists, contact your system administrator”

I was quite sure that the password I had was correct, so I tried with the SYSMAN user and got the same error. I still wanted to verify that I had the correct password, so I used the SYSMAN user to log in to the repository database directly, and that was successful, so I knew something was wrong there.


SQL> connect sysman/
Enter password:
Connected.

So I went to the <gc_inst>/em/EMGC_OMS1/sysman/log/emoms.log and saw the following error:


2015-05-18 21:22:06,103 [[ACTIVE] ExecuteThread: '15' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR audit.AuditManager auditLog.368 - Could not Log audit data, Error:java.sql.SQLException: ORA-14400: inserted partition key does not map to any partition
ORA-06512: at "SYSMAN.MGMT_AUDIT", line 492
ORA-06512: at "SYSMAN.MGMT_AUDIT", line 406
ORA-06512: at line 1

This led me to believe that JOB_QUEUE_PROCESSES was set to 0, but that wasn't the case: it was set to 50. Still, 50 is too low a value for this parameter, so I bumped it up to 1000 and reran the EM12c repository DBMS Scheduler jobs as per the documentation in 1498456.1:


SQL> show parameter JOB_QUEUE_PROCESSES

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes integer 50
SQL> alter system set JOB_QUEUE_PROCESSES=1000 scope = both;

System altered.

SQL> show parameter both
SQL> show parameter job

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes integer 1000
SQL> connect / as sysdba
Connected.
SQL> alter system set job_queue_processes = 0;

System altered.

SQL> connect sysman/alyarog1605
Connected.
SQL> exec emd_maintenance.remove_em_dbms_jobs;

PL/SQL procedure successfully completed.

SQL> exec gc_interval_partition_mgr.partition_maintenance;

PL/SQL procedure successfully completed.

SQL> @$OMS_HOME/sysman/admin/emdrep/sql/core/latest/admin/admin_recompile_invalid.sql SYSMAN
old 11: AND owner = upper('&RECOMPILE_REPOS_USER')
new 11: AND owner = upper('SYSMAN')
old 26: dbms_utility.compile_schema(upper('&RECOMPILE_REPOS_USER'),FALSE);
new 26: dbms_utility.compile_schema(upper('SYSMAN'),FALSE);
old 41: WHERE owner = upper('&RECOMPILE_REPOS_USER')
new 41: WHERE owner = upper('SYSMAN')
old 84: AND owner = upper('&RECOMPILE_REPOS_USER')
new 84: AND owner = upper('SYSMAN')
old 104: AND ds.table_owner = upper('&RECOMPILE_REPOS_USER')
new 104: AND ds.table_owner = upper('SYSMAN')

PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.

SQL> connect / as sysdba
Connected.
SQL> alter system set job_queue_processes = 1000;

System altered.

SQL> connect sysman/
Enter password:
Connected.
SQL> exec emd_maintenance.submit_em_dbms_jobs;

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

After this I bounced the OMS, but I still kept getting the same error. Though the scheduler jobs were now fixed, I was seeing the following error in the emoms.log:


2015-05-18 22:29:09,573 [[ACTIVE] ExecuteThread: '15' for queue: 'weblogic.kernel.Default (self-tuning)'] WARN auth.EMRepLoginFilter doFilter.450 - InvalidEMUserException caught in EMRepLoginFilter: Failed to login using repository authentication for user: SSA_ADMIN
oracle.sysman.emSDK.sec.auth.InvalidEMUserException: Failed to login using repository authentication for user: SSA_ADMIN

So what I did was update the SYSMAN.MGMT_AUDIT_MASTER table and run the procedure MGMT_AUDIT_ADMIN.ADD_AUDIT_PARTITION, as stated in document id 1493151.1:


oracle $ sqlplus

Enter user-name: sysman
Enter password:

SQL> update mgmt_audit_master set prepopulate_days=5 where prepopulate_days is null;

1 row updated.

SQL> select count(1) from mgmt_audit_master where prepopulate_days is null;

COUNT(1)
----------
0

SQL> exec mgmt_audit_admin.add_audit_partition;

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

Once I did this, I was able to log in with all my EM12c administrators without any issues:


oracle@em12cr4.localdomain [emrep] /home/oracle
oracle $ emcli login -username=ssa_admin
Enter password

Login successful

Conclusion

Even though JOB_QUEUE_PROCESSES was not set to 0, it was still the cause of the failure, as 50 is too low a value for this parameter. So be careful when setting this parameter, and be sure to follow the latest installation guidelines.

Note: This was originally published on rene-ace.com

Categories: DBA Blogs

Log Buffer #423: A Carnival of the Vanities for DBAs

Wed, 2015-05-20 16:36

This Log Buffer edition covers Oracle, SQL Server and MySQL blog posts from all over the blogosphere!


Oracle:

Hey DBAs: you know you can install and run Oracle Database 12c on different platforms, but if you install it in an Oracle Solaris 11 zone, you gain additional advantages.

Here is a video with Oracle VP of Global Hardware Systems Harish Venkat talking with Aaron De Los Reyes, Deputy Director at Cognizant, about his company’s explosive growth and how they managed business functions, applications, and supporting infrastructure for success.

Oracle Unified Directory is an all-in-one directory solution with storage, proxy, synchronization and virtualization capabilities. While unifying the approach, it provides all the services required for high-performance enterprise and carrier-grade environments. Oracle Unified Directory ensures scalability to billions of entries. It is designed for ease of installation, elastic deployments, enterprise manageability, and effective monitoring.

Understanding Flash: Summary – NAND Flash Is A Royal Pain In The …

Extracting Oracle data & Generating JSON data file using ROracle.

SQL Server:

It is no good doing some or most of the aspects of SQL Server security right. You have to get them all right, because any effective penetration of your security is likely to spell disaster. If you fail in any of the ways that Robert Sheldon lists and describes, then you can’t assume that your data is secure, and things are likely to go horribly wrong.

How does a column store index compare to a (traditional) row store index with regards to performance?

Learn how to use the TOP clause in conjunction with the UPDATE, INSERT and DELETE statements.

Did you know that scalar-valued, user-defined functions can be used in DEFAULT/CHECK CONSTRAINTs and computed columns?

Tim Smith blogs about how to measure a behavioral streak with SQL Server, an important skill for determining ROI and extrapolating trends.

Pilip Horan shows us how to run an SSIS Project as a SQL Job.

MySQL:

Encryption is an important component of secure environments. Being an intangible property, security doesn’t get enough attention when it comes to describing various systems: “encryption support” is often all the detail you can get when asking how secure a system is. Other important details are often omitted, but the devil is in the details, as we know. In this post I will describe how we secure backup copies in TwinDB.

The fsfreeze command is used to suspend and resume access to a file system. This allows consistent snapshots to be taken of the filesystem. fsfreeze supports Ext3/4, ReiserFS, JFS and XFS.

Shinguz: Controlling worldwide manufacturing plants with MySQL.

MySQL 5.7.7 was recently released (it is the latest MySQL 5.7, and is the first “RC” or “Release Candidate” release of 5.7), and is available for download.

Upgrading Directly From MySQL 5.0 to 5.6 With mysqldump.

One of the cool new features in 5.7 Release Candidate is Multi Source Replication.

 

Learn more about Pythian’s expertise in Oracle, SQL Server and MySQL.

Categories: DBA Blogs

fsfreeze in Linux

Thu, 2015-05-14 10:17

The fsfreeze command is used to suspend and resume access to a file system. This allows consistent snapshots to be taken of the filesystem. fsfreeze supports Ext3/4, ReiserFS, JFS and XFS.

A filesystem can be frozen using the following command:

# /sbin/fsfreeze -f /data

Now if you write to this filesystem, the process/command will get stuck. For example, the following command will be stuck in the D (UNINTERRUPTIBLE_SLEEP) state:

# echo "testing" > /data/file

Only after the filesystem is unfrozen using the following command can it continue:

# /sbin/fsfreeze -u /data

As per the fsfreeze man page, “fsfreeze is unnecessary for device-mapper devices. The device-mapper (and LVM) automatically freezes filesystem on the device when a snapshot creation is requested.”
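For storage that does not freeze the filesystem for you (for example, an EBS-backed volume snapshotted through the AWS CLI), the usual pattern is freeze, snapshot, unfreeze. A minimal sketch, assuming a configured aws CLI and placeholder mount point and volume ID:

#!/bin/bash
# Hypothetical example: take a consistent EBS snapshot of the volume behind /data.
# Not needed for LVM snapshots, which freeze the filesystem automatically.
set -e
MOUNT_POINT=/data
VOLUME_ID=vol-0123456789abcdef0                # placeholder volume ID

/sbin/fsfreeze -f "$MOUNT_POINT"               # block new writes and flush dirty pages
trap '/sbin/fsfreeze -u "$MOUNT_POINT"' EXIT   # always thaw, even if the snapshot fails

aws ec2 create-snapshot --volume-id "$VOLUME_ID" \
    --description "consistent snapshot of $MOUNT_POINT"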

fsfreeze is provided by the util-linux package in RHEL systems. Along with userspace support, fsfreeze also requires kernel support.
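A quick sanity check of both pieces, assuming an RPM-based system, is to see which package owns the binary and note the running kernel:

# Confirm the userspace tool and the kernel in use (custom kernels may lack freeze support).
rpm -qf "$(command -v fsfreeze)"   # expected to resolve to util-linux (or util-linux-ng)
uname -r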

For example, in the following case, fsfreeze was used on the ext4 filesystem of an AWS CentOS node:

# fsfreeze -f /mysql
fsfreeze: /mysql: freeze failed: Operation not supported

From strace we found that ioctl is returning EOPNOTSUPP:

fstat(3, {st_dev=makedev(253, 0), st_ino=2, st_mode=S_IFDIR|0755, st_nlink=4, st_uid=3076, st_gid=1119, st_blksize=4096, st_blocks=8, st_size=4096, st_atime=2014/05/20-10:58:56, st_mtime=2014/11/17-01:39:36, st_ctime=2014/11/17-01:39:36}) = 0
ioctl(3, 0xc0045877, 0) = -1 EOPNOTSUPP (Operation not supported)

From the latest upstream kernel source:

static int ioctl_fsfreeze(struct file *filp)
{
	struct super_block *sb = file_inode(filp)->i_sb;

	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;

	/* If filesystem doesn't support freeze feature, return. */
	if (sb->s_op->freeze_fs == NULL)
		return -EOPNOTSUPP;

	/* Freeze */
	return freeze_super(sb);
}

EOPNOTSUPP is returned when a filesystem does not support the feature.

When we tested freezing ext4 on CentOS using a standard AWS community AMI, fsfreeze worked fine.

This means that the issue was specific to the kernel of the affected system. It turned out that the AMI used to build the system had a customized kernel without fsfreeze support.

Categories: DBA Blogs

Ingest a Single Table from Microsoft SQL Server Data into Hadoop

Wed, 2015-05-13 15:13
Introduction

This blog describes a best-practice approach to ingesting data from SQL Server into Hadoop. The case scenario is described as under:

  • Single table ingestion (no joins)
  • No partitioning
  • Complete data ingestion (trash old and replace new)
  • Data stored in Parquet format
Pre-requisites

This example has been tested using the following versions:

  • Hadoop 2.5.0-cdh5.3.0
  • Hive 0.13.1-cdh5.3.0
  • Sqoop 1.4.5-cdh5.3.0
  • Oozie client build version: 4.0.0-cdh5.3.0
Process Flow Diagram

[process_flow1 diagram]

Configuration
  • Create the following directory/file structure (one per data ingestion process). For a new ingestion program, please adjust the directory/file names as per requirements. Make sure to replace the <table_name> tag with your table name.

<table_name>_ingest
  + hive-<table_name>
      create-schema.hql
  + oozie-properties
      <table_name>.properties
  + oozie-<table_name>-ingest
      + lib
          kite-data-core.jar
          kite-data-mapreduce.jar
          sqljdbc4.jar
      coordinator.xml
      impala_metadata.sh
      workflow.xml
  • The ingestion process is invoked using an oozie workflow. The workflow invokes all steps necessary for data ingestion including pre-processing, ingestion using sqoop and post-processing.
oozie-<table_name>-ingest
This directory stores all files that are required by the oozie workflow engine. These files should be stored in HDFS for oozie to function properly.

oozie-properties
This directory stores the <table_name>.properties file, which holds the oozie variables such as database users, name node details, etc. used by the oozie process at runtime.

hive-<table_name>
This directory stores a file called create-schema.hql, which contains the schema definition of the HIVE tables. This file needs to be run in HIVE only once.
  • Configure files under oozie-<table_name>-ingest
1.   Download kite-data-core.jar and kite-data-mapreduce.jar files from http://mvnrepository.com/artifact/org.kitesdk
2.  Download sqljdbc4.jar from https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx
3.  Configure coordinator.xml. Copy and paste the following XML.

<coordinator-app name="<table_name>-ingest-coordinator" frequency="${freq}" start="${startTime}" end="${endTime}" timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  <action>
    <workflow>
      <app-path>${workflowRoot}/workflow.xml</app-path>
      <configuration>
        <property>
          <name>partition_name</name>
          <value>${coord:formatTime(coord:nominalTime(), 'YYYY-MM-dd')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>

4.  Configure workflow.xml. This workflow has three actions:

a) mv-data-to-old – Deletes old data before refreshing new
b) sqoop-ingest-<table_name> – Sqoop action to fetch table from SQL Server
c) invalidate-impala-metadata – Revalidate Impala data after each refresh

Copy and paste the following XML.

<workflow-app name="<table_name>-ingest" xmlns="uri:oozie:workflow:0.2">

  <start to="mv-data-to-old" />

  <action name="mv-data-to-old">
    <fs>
      <delete path='${sqoop_directory}/<table_name>/*.parquet' />
      <delete path='${sqoop_directory}/<table_name>/.metadata' />
    </fs>
    <ok to="sqoop-ingest-<table_name>"/>
    <error to="kill"/>
  </action>

  <action name="sqoop-ingest-<table_name>">
    <sqoop xmlns="uri:oozie:sqoop-action:0.3">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${nameNode}/user/${wf:user()}/_sqoop/*" />
      </prepare>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <arg>import</arg>
      <arg>--connect</arg>
      <arg>${db_string}</arg>
      <arg>--table</arg>
      <arg>${db_table}</arg>
      <arg>--columns</arg>
      <arg>${db_columns}</arg>
      <arg>--username</arg>
      <arg>${db_username}</arg>
      <arg>--password</arg>
      <arg>${db_password}</arg>
      <arg>--split-by</arg>
      <arg>${db_table_pk}</arg>
      <arg>--target-dir</arg>
      <arg>${sqoop_directory}/<table_name></arg>
      <arg>--as-parquetfile</arg>
      <arg>--compress</arg>
      <arg>--compression-codec</arg>
      <arg>org.apache.hadoop.io.compress.SnappyCodec</arg>
    </sqoop>
    <ok to="invalidate-impala-metadata"/>
    <error to="kill"/>
  </action>

  <action name="invalidate-impala-metadata">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>${impalaFileName}</exec>
      <file>${impalaFilePath}</file>
    </shell>
    <ok to="fini"/>
    <error to="kill"/>
  </action>

  <kill name="kill">
    <message>Workflow failed with error message ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>

  <end name="fini" />

</workflow-app>

5. Configure impala_metadata.sh. This file will execute commands to revalidate impala metadata after each restore. Copy and paste the following data.

#!/bin/bash
export PYTHON_EGG_CACHE=./myeggs
impala-shell -i <hive_server> -q "invalidate metadata <hive_db_name>.<hive_table_name>"
  • Configure files under oozie-properties. Create file oozie.properties with contents as under. Edit the parameters as per requirements.

# Coordinator schedulings
freq=480
startTime=2015-04-28T14:00Z
endTime=2029-03-05T06:00Z

jobTracker=<jobtracker>
nameNode=hdfs://<namenode>
queueName=<queue_name>

rootDir=${nameNode}/user/<user>/oozie
workflowRoot=${rootDir}/<table_name>-ingest

oozie.use.system.libpath=true
oozie.coord.application.path=${workflowRoot}/coordinator.xml

# Sqoop settings
sqoop_directory=${nameNode}/data/sqoop

# Hive/Impala Settings
hive_db_name=<hive_db_name>
impalaFileName=impala_metadata.sh
impalaFilePath=/user/oozie/<table_name>-ingest/impala_metadata.sh

# MS SQL Server settings
db_string=jdbc:sqlserver://<sql_server_host>;databaseName=<sql_server_db_name>
db_username=<sql_server_username>
db_password=<sql_server_password>
db_table=<table_name>
db_columns=<columns>
  • Configure files under hive-<table_name>. Create a new file create-schema.hql with contents as under.

DROP TABLE IF EXISTS <hive_table_name>;

CREATE EXTERNAL TABLE <hive_table_name> (<column definitions>)
STORED AS PARQUET
LOCATION 'hdfs:///data/sqoop/<table_name>';

Deployment
  • Create new directory in HDFS and copy files
$ hadoop fs -mkdir /user/<user>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <directory>/<table_name>/oozie-<table_name>-ingest/lib /user/<user>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <directory>/<table_name>/oozie-<table_name>-ingest/coordinator.xml /user/<user>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <directory>/<table_name>/oozie-<table_name>-ingest/impala_metadata.sh /user/<user>/oozie/<table_name>-ingest
$ hadoop fs -copyFromLocal <directory>/<table_name>/oozie-<table_name>-ingest/workflow.xml /user/<user>/oozie/<table_name>-ingest
  • Create new directory in HDFS for storing data files
$ hadoop fs -mkdir /user/SA.HadoopPipeline/oozie/<table_name>-ingest
$ hadoop fs -mkdir /data/sqoop/<table_name>
  • Now we are ready to select data in HIVE. Go to URL http://<hive_server>:8888/beeswax/#query.
a. Choose existing database on left or create new.
b. Paste contents of create-schema.hql in Query window and click Execute.
c. You should now have an external table in HIVE pointing to data in hdfs://<namenode>/data/sqoop/<table_name>
  • Create Oozie job
$ oozie job -run -config /home/<user>/<directory>/<table_name>/oozie-properties/oozie.properties

Validation and Error Handling
  • At this point an oozie job should be created. To validate the oozie job creation open URL http://<hue_server>:8888/oozie/list_oozie_coordinators. Expected output as under. In case of error please review the logs for recent runs.
[oozie1 screenshot]
  • To validate the oozie job is running open URL http://<hue_server>:8888/oozie/list_oozie_workflows/ . Expected output as under. In case of error please review the logs for recent runs.
[oozie2 screenshot]
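The same checks can also be made from the oozie command-line client; a rough sketch, where the Oozie server URL (default port 11000) and the job IDs are placeholders:

# List recent coordinators, then drill into one coordinator and one workflow run.
oozie jobs -oozie http://<oozie_server>:11000/oozie -jobtype coordinator -len 10
oozie job -oozie http://<oozie_server>:11000/oozie -info 0000123-150513123456789-oozie-oozi-C
oozie job -oozie http://<oozie_server>:11000/oozie -log 0000123-150513123456789-oozie-oozi-W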
  • To validate data in HDFS execute the following command. You should see a file with *.metadata extension and a number of files with *.parquet extension.
$ hadoop fs -ls /data/sqoop/<table_name>/
  • Now we are ready to select data in HIVE or Impala.
    For HIVE go to URL http://<hue_server>:8888/beeswax/#query
    For Impala go to URL http://<hue_server>:8888/impala
    Choose the newly created database on the left and execute the following SQL: select * from <hive_table_name> limit 10
    You should see rows returned from the newly ingested data.
Categories: DBA Blogs

Log Buffer #422: A Carnival of the Vanities for DBAs

Fri, 2015-05-08 06:54

This Log Buffer Edition picks, chooses and gleans some of the top-notch blog posts from Oracle, SQL Server and MySQL.

Oracle:

  • The standard images that come with devstack are very basic
  • Oracle is pleased to announce the release of Oracle VM VirtualBox 5.0 BETA 3
  • Monitoring Parallel Execution using Real-Time SQL Monitoring in Oracle Database 12c
  • Accessing your Cloud Integration API end point from Javascript
  • Are You Ready for The Future of Oracle Database?

SQL Server:

  • SQL Monitor Custom Metric: Plan Cache; Cache Pages Total
  • Generating A Password in SQL Server with T-SQL from Random Characters
  • This article explains how default trace can be used for auditing purposes when combined with PowerShell scripts
  • How to FTP a Dynamically Named Flat File
  • Alan Cooper helped to debug the most widely-used PC language of the late seventies and early eighties, BASIC-E, and, with Keith Parsons, developed C-BASIC. He then went on to create Tripod, which morphed eventually into Visual Basic in 1991.

MySQL:

  • There’s a new kid on the block in the NoSQL world – Azure DocumentDB
  • Spring Cleaning in the GIS Namespace
  • MySQL replication is among the top features of MySQL. In replication, data is replicated from one MySQL Server (also known as the Master) to another MySQL Server (also known as the Slave). MySQL Binlog Events is a set of libraries which work on top of replication and open the door to a myriad of use cases, like extracting data from binary log files, building applications to support heterogeneous replication, filtering events from binary log files and much more.
  • New to pstop – vmstat style stdout interface
  • The Perfect Server – Ubuntu 15.04 (Vivid Vervet) with Apache, PHP, MySQL, PureFTPD, BIND, Postfix, Dovecot and ISPConfig 3
Categories: DBA Blogs

OpenTSDB and Google Cloud Bigtable

Wed, 2015-05-06 02:15

Data comes in different shapes. One of these shapes is called a time series. A time series is basically a sequence of data points recorded over time. If, for example, you measure the height of the tide every hour for 24 hours, then you will end up with a time series of 24 data points. Each data point will consist of the tide height in meters and the hour it was recorded at.

Time series are very powerful data abstractions. There are a lot of processes around us that can be described by a simple measurement and the point in time the measurement was taken. You can discover patterns in your website users’ behavior by measuring the number of unique visitors every couple of minutes. This time series will help you discover trends that depend on the time of day or day of the week, seasonal patterns, etc. Monitoring a server’s health by recording metrics like CPU utilization, memory usage and active transactions in a database at a frequent interval is an approach that all DBAs and sysadmins are very familiar with. The real power of time series is in providing a simple mechanism for different types of aggregations and analytics. It is easy to find, for example, minimum and maximum values over a given period of time, or to calculate averages, sums and other statistics.
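As a toy illustration of such aggregations (the data below is made up), a small series of hourly tide readings can be reduced to its minimum, maximum and average with nothing more than awk:

# Hourly tide heights as "epoch,height_m" (fabricated sample data).
cat > tide.csv <<'EOF'
1431993600,1.2
1431997200,1.5
1432000800,1.9
1432004400,1.4
EOF

awk -F, '
  NR == 1 || $2 < min { min = $2 }
  NR == 1 || $2 > max { max = $2 }
  { sum += $2; n++ }
  END { printf "min=%.1f max=%.1f avg=%.2f over %d points\n", min, max, sum/n, n }
' tide.csv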

Building a scalable and reliable database for time series data has been a goal of companies and engineers for quite some time. With ever-increasing volumes of both human and machine generated data, the need for such systems is becoming more and more apparent.

OpenTSDB and HBase

There are different database systems that support time series data. Some of them (like Oracle) provide functionality to work with time series that is built on top of their existing relational storage. There are also some specialized solutions like InfluxDB.

OpenTSDB is somewhere in between these two approaches: it relies on HBase to provide scalable and reliable storage, but implements its own logic layer for storing and retrieving data on top of it.

OpenTSDB consists of a tsd process that handles all read/write requests to HBase and several protocols to interact with tsd. OpenTSDB can accept requests over Telnet or HTTP APIs, or you can use existing tools like tcollector to publish metrics to OpenTSDB.
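For a flavor of those interfaces, a data point can be pushed to a tsd roughly as sketched below; the host, metric and tag are made up, and 4242 is assumed to be the default tsd port.

# Telnet-style interface: put <metric> <unix_timestamp> <value> <tag>=<value> ...
echo "put sys.cpu.user $(date +%s) 42 host=web01" | nc tsd.example.com 4242

# HTTP API: POST a JSON data point to /api/put
curl -s -X POST http://tsd.example.com:4242/api/put \
  -H 'Content-Type: application/json' \
  -d '{"metric": "sys.cpu.user", "timestamp": '"$(date +%s)"', "value": 42, "tags": {"host": "web01"}}'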

OpenTSDB relies on the scalability and performance properties of HBase to handle high volumes of incoming metrics. Some of the largest OpenTSDB/HBase installations span dozens of servers and process ~280k writes per second (numbers from http://www.slideshare.net/HBaseCon/ecosystem-session-6).

A lot of different tools complete the OpenTSDB ecosystem, from various metrics collectors to GUIs. This makes OpenTSDB one of the most popular ways to handle large volumes of time series information, and one of the major HBase use cases as well. The main challenge with this configuration is that you will need to host your own (potentially very large) HBase cluster and deal with all related issues, from hardware procurement to resource management, Java garbage collection, etc.

OpenTSDB and Google Cloud Bigtable

If you trace HBase ancestry you will soon find out that it all started when Google published a paper on a scalable data store called Bigtable. Google has been using Bigtable internally for more than a decade as a back end for the web index, Google Earth and other projects. The publication of the paper initiated the creation of Apache HBase and Apache Cassandra, both very successful open source projects.

The latest release of Bigtable as a publicly available Google Cloud service gives you instant access to all the engineering effort that was put into Bigtable at Google over the years. Essentially, you are getting a flexible, robust HBase-like database that lacks some of the inherited HBase issues, like Java GC stalls. And it’s completely managed, meaning you don’t have to worry about provisioning hardware, handling failures, software installs, etc.

What does this mean for OpenTSDB and time series databases in general? Well, since HBase is built on the Bigtable foundation, it is actually API compatible with Google Cloud Bigtable. This means that applications that work with HBase could be switched to work with Bigtable with minimal effort. Be aware of some of the existing limitations, though. Pythian engineers are working on integrating OpenTSDB with Google Cloud Bigtable instead of HBase, and we hope to share results with the community shortly. Having Bigtable as a back end for OpenTSDB opens a lot of opportunities. It will provide you with a managed, cloud-based time series database which can be scaled on demand and doesn’t require much maintenance effort.

There are some challenges that we have to deal with, especially around the client that OpenTSDB uses to connect to HBase. OpenTSDB uses its own implementation of the HBase client called AsyncHBase. It is compatible at the wire protocol level with HBase 0.98, but uses a custom async Java library to allow for asynchronous interaction with HBase. This custom implementation allows OpenTSDB to perform HBase operations much faster than using the standard HBase client.

While HBase API 1.0.0 introduced some asynchronous behavior with BufferedMutator, it is not a trivial task to replace AsyncHBase with the standard HBase client, because it is tightly coupled with the rest of the OpenTSDB code. Pythian engineers are trying out several ideas on how to make the transition to the standard client look seamless from an OpenTSDB perspective. Once we have the standard HBase client working, connecting OpenTSDB to Bigtable should be simple.

Stay tuned.

Categories: DBA Blogs