Skip navigation.

Feed aggregator

Deploying a Private Cloud at Home — Part 5

Pythian Group - Mon, 2014-10-20 13:05

Today’s blog post is part five of seven in a series dedicated to Deploying Private Cloud at Home, where I will be demonstrating how to configure Compute node and OpenStack services on the compute node. We have already installed the MySQL Python library on compute node in previous posts.

  1. Install OpenStack compute packages on the node
    yum install -y openstack-nova-compute openstack-utils
  2. Configure Nova compute service
    openstack-config --set /etc/nova/nova.conf database connection mysql://nova:Youre_Password@controller/nova
    openstack-config --set /etc/nova/nova.conf DEFAULT auth_strategy keystone
    openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_uri http://controller:5000
    openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_host controller
    openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_protocol http
    openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_port 35357
    openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_user nova
    openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_tenant_name service
    openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_password Your_Password
    openstack-config --set /etc/nova/nova.conf DEFAULT rpc_backend qpid
    openstack-config --set /etc/nova/nova.conf DEFAULT qpid_hostname controller
    openstack-config --set /etc/nova/nova.conf DEFAULT my_ip Your_Compute_node_IP
    openstack-config --set /etc/nova/nova.conf DEFAULT vnc_enabled True
    openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_listen
    openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_proxyclient_address Your_Compute_node_IP
    openstack-config --set /etc/nova/nova.conf DEFAULT novncproxy_base_url http://controller:6080/vnc_auto.html
    openstack-config --set /etc/nova/nova.conf DEFAULT glance_host controller
  3. Start the Compute service and its dependencies. Configure them to start automatically when the system boots
    service libvirtd start
    service messagebus start
    service openstack-nova-compute start
    chkconfig libvirtd on
    chkconfig messagebus on
    chkconfig openstack-nova-compute on
  4. Enable IP forwarding
    perl -pi -e 's,net.ipv4.ip_forward = 0,net.ipv4.ip_forward = 1,' /etc/sysctl.conf
    perl -pi -e 's,net.ipv4.conf.default.rp_filter = 1,net.ipv4.conf.default.rp_filter = 0,' /etc/sysctl.conf
    echo "net.ipv4.conf.all.rp_filter=0" >> /etc/sysctl.conf
    sysctl -p
  5. Install legacy networking components and Flat DHCP
    yum install -y openstack-nova-network openstack-nova-api

    We are using legacy networking and single NIC on both controller and compute nodes. Flat and public interfaces will be the same on below configuration. In this case, it is etho replace with the one you have on your system.

    openstack-config --set /etc/nova/nova.conf DEFAULT network_api_class
    openstack-config --set /etc/nova/nova.conf DEFAULT security_group_api nova
    openstack-config --set /etc/nova/nova.conf DEFAULT network_manager
    openstack-config --set /etc/nova/nova.conf DEFAULT firewall_driver nova.virt.libvirt.firewall.IptablesFirewallDriver
    openstack-config --set /etc/nova/nova.conf DEFAULT network_size 254
    openstack-config --set /etc/nova/nova.conf DEFAULT allow_same_net_traffic False
    openstack-config --set /etc/nova/nova.conf DEFAULT multi_host True
    openstack-config --set /etc/nova/nova.conf DEFAULT send_arp_for_ha True
    openstack-config --set /etc/nova/nova.conf DEFAULT share_dhcp_address True
    openstack-config --set /etc/nova/nova.conf DEFAULT force_dhcp_release True
    openstack-config --set /etc/nova/nova.conf DEFAULT flat_network_bridge br0
    openstack-config --set /etc/nova/nova.conf DEFAULT flat_interface eth0
    openstack-config --set /etc/nova/nova.conf DEFAULT public_interface eth0
  6. Start the services and configure them to start when the system bootsservice openstack-nova-network start
    service openstack-nova-metadata-api start
    chkconfig openstack-nova-network on
    chkconfig openstack-nova-metadata-api on
  7. Restart networking
    service network restart


This completes the configuration of compute node. Stay tuned for part six where we will configure network services on controller node

Categories: DBA Blogs


Yann Neuhaus - Mon, 2014-10-20 12:37

I've published recently my script to check index fragmentation. But then do you COALESCE or REBUILD? Well there is also another option - ALTER INDEX SHRINK SPACE. Let's compare all those index defragmentation operations.

Fragmented index

I have an index created when the table had 1 million rows and then I deleted 90% of the rows. Here is the index state from:

  • dbms_space.space_usage
  • index_stats after an analyze index validate structure
  • my index fragmentation checking script with 4 buckets
and here are the results:

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0          0       2230    2304

------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      3    2304     100404     2226     2225        5         404    1622013       10

         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     250280         45         714          91     557 oooo
    250730 ->     500370         45         714          91     557 oooo
    500820 ->     750010         45         714          91     556 oooo
    750460 ->     999660         45         714          91     556 oooo

I have 2226 leaf blocks, the index height is 3 with 5 branch blocks. The leaves are only 91.
However dbms_space shows only full blocks: the deleted entries are still there, just marked as deleted, and the blocks are still seen as full.


Now let's COALESCE:

SQL> alter index DEMO_N coalesce;

And before checking the same values about the index space I measure the amount of work that has been done by the operation (from v$mystat):

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                          22067
db block gets                                                         32818
session logical reads                                                 32886
db block changes                                                      40601
undo change vector size                                            35199264
redo size                                                          47878800

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0       2004          0          0        227    2304

------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      3    2304     100000      223      222        5           0    1591530       88

         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     500800        450        7158          11      56
    505310 ->     752020        449        7132          11      56
    756530 ->     998730        443        7038          12      55

COALESCE is an online operation that defragment the leaf blocks. We have now only 223 leaf blocks that are 90% full (because my pctfree is the default 10%). But the index stil has the same height and still has 2300 blocks. Where are the reclaimed 2000 blocks? They are available if the index need a new block (for a block split). They are seen as FS2 (at least 25 to 50% free space) by dbms_space because they still contain the deleted rows, but they are fully reclaimable anyway.


Back with the same table, and doing a SHRINK SPACE COMPACT instead of COALESCE:

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                          28794
db block gets                                                         40940
session logical reads                                                 41527
db block changes                                                      49387
undo change vector size                                            36990460
redo size                                                          51848880

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0       2003        227    2304

------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      3    2304     100000      223      222        5           0    1591530       88

         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     500800        450        7158          11      56
    505310 ->     752020        449        7132          11      56
    756530 ->     998730        443        7038          12      55

So what is the difference? Slightly more work (about 20% more logical reads and block changes) for the same result. Except that now the reclaimed blocks are in FS4 (75 to 100% free space).


What if we use SHRINK SPACE instead of SHRINK SPACE COMPACT?

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                          29352
db block gets                                                         45496
session logical reads                                                 46190
db block changes                                                      50032
undo change vector size                                            36981524
redo size                                                          51901500

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0          0        227     240

------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      3     240     100000      223      222        5           0    1591530       88

         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     500800        450        7158          11      56
    505310 ->     752020        449        7132          11      56
    756530 ->     998730        443        7038          12      55

With tables, the shrink space lowers the high water mark. Here it is the same idea: in addition to the shrink space the reclaimed blocks are no more allocated to the index, so it can be used for other segments in the tablespace. We see that from dbms_space: the index is now 240 blocks only.


The previous requires a lock only for a short duration (according that we did the shrink space compact before). The rebuild needs a Share lock on the table during the whole operation, blocking concurrent DML.

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
db block gets                                                           953
redo entries                                                           1832
db block changes                                                       1906
session logical reads                                                  4019
undo change vector size                                                9152
redo size                                                            173732

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0          0        222     256

------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      2     256     100000      222      221        1           0    1591520       90

         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     501250        451        7170          11      56
    505760 ->     749300        451        7170          11      55
    753810 ->     997350        448        7117          11      55

The result is the same as the previous operation (SHRINK SPACE) except that the index height has decreased. A rebuild is the right operation if the index blevel has become too high. And we did that offline but with much less work. Minimal undo and redo. And small blocks to read (when the index is still usable the rebuild can use the current index to rebuild the new segment).


Last operation, possible only in Enterprise Edition, is the rebuild online which doesn't need to lock the table.

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                            660
db block changes                                                        876
db block gets                                                          1419
session logical reads                                                  4989
undo change vector size                                               24932
redo size                                                            114924

      :UNF       :FS1       :FS2       :FS3       :FS4      :FULL  BLOCKS
---------- ---------- ---------- ---------- ---------- ---------- -------
         0          0          1          0          0        222     256

------- ------- ---------- -------- -------- -------- ----------- ---------- --------
      2     256     100000      222      221        1           0    1591520       90

         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->     248690        452        7170          11      56
    253200 ->     501250        451        7170          11      56
    505760 ->     749300        451        7170          11      55
    753810 ->     997350        448        7117          11      55

Here we don't see an overhead to do it online. This is because my table is small (my testcase has only one column which is the indexed one). On a real table you will probably see that the online rebuild takes longer than the offline one, because it cannot use the current index. But anyway, the fact that it is online means that the duration is not a big issue.


This is an example on a table that had a massive purge: all blocks were touched. In that case the REBUILD is the right solution. However if you are in Standard Edition and cannot do it online, then you will probably do a SHRINK SPACE COMPACT because it lets you do a SHRINK SPACE (need a quick locks but for a very short duration - to do on a period of low activity).
COALESCE will make sense here only if you know you will insert back a lot of rows, so that you don't need to deallocate the blocks from the index.

Now, what to do if the free space to reclaim is only on small part of the index blocks? As in the following case:

         N to          N rows/block bytes/block %free space  blocks free
---------- -- ---------- ---------- ----------- ----------- ------- -----
        10 ->      50468        374        5974          26     112 o
     50917 ->     100756        449        7179          11     112
    101205 ->     151044        449        7179          11     112
    151493 ->     201332        449        7179          11     112
    201781 ->     251620        449        7179          11     112
    252069 ->     301908        449        7179          11     112
    302357 ->     351747        449        7179          11     111
    352196 ->     401586        449        7179          11     111
    402035 ->     451425        449        7179          11     111
    451874 ->     501264        449        7179          11     111
    501713 ->     551103        449        7179          11     111
In that case a COALESCE is probably the best because you keep the blocks allocated to the index for future growth. And the few blocks reclaimed will probably not change the index height anyway. However, if you did a small purge that concern only a small part of the index and want to reclaim the space for other segments, then the SHRINK SPACE is the right solution. But do you really need to reclaim space in that case?

Now you see the usage for my index fragmentation script: I don't need only the average free space. I need to have a clear picture of the fragmentation in order to decide what to do and how.

OCP 12C – ADR and Network Enhancements

DBA Scripts and Articles - Mon, 2014-10-20 11:59

ADR enhancements In oracle 12c the Automatic Diagnostic Repository contains a new log directories with 2 subdirectories : DDL Debug The DDL log When you active the DDL logging in Oracle 12c using enable_ddl_logging=true, Oracle writes all DDL operations into the specific DDL log instead of writting it to the alert log. To enable DDL logging [...]

The post OCP 12C – ADR and Network Enhancements appeared first on Oracle DBA Scripts and Articles (Montreal).

Categories: DBA Blogs

OCP 12C – Emergency Monitoring, Real-Time ADDM

DBA Scripts and Articles - Mon, 2014-10-20 11:48

Emergency Monitoring Emergency monitoring is meant for extreme circumstances where it’s impossible for you to connect to the database because the database is hung. Emergency monitoring allows you to connect to the database in diagnostic mode and run a lightweight analysis to see what’s happening. You can access real-time performance data from ASH and access [...]

The post OCP 12C – Emergency Monitoring, Real-Time ADDM appeared first on Oracle DBA Scripts and Articles (Montreal).

Categories: DBA Blogs

First Fluid applications to be delivered for PeopleSoft HCM 9.2 on October 27th

Javier Delgado - Mon, 2014-10-20 10:58

During Oracle OpenWorld 2014 the announcement was made that the first functionalities taking advantage of the new Fluid interface capabilities provided in PeopleTools 8.54 would be made available together with the PeopleSoft HCM 9.2 Update Image 9.
Now, according to My Oracle Support, this image is going to be released on next October 27th. Although you need to have PeopleSoft HCM 9.2 and PeopleTools 8.54 to apply these enhancements to your environments, you will still be able to download the Update Image virtual machine and play around with the first delivered Fluid applications.
So, unless the image delivery date is delayed, we should be able to enjoy the first Fluid applications in less than two weeks.

Note - Oct 20th 2014: Although the image was originally announced for today, Oracle has just posted a new availability date for October 27th, 2014.

Avro MapReduce Jobs in Oozie

Pythian Group - Mon, 2014-10-20 07:56

Normally when using Avro files as input or output to a MapReduce job, you write a Java main[] method to set up the Job using AvroJob. That documentation page does a good job of explaining where to use AvroMappers, AvroReducers, and the AvroKey and AvroValue (N.B. if you want a file full of a particular Avro object, not key-value pair of two Avro types, use AvroKeyOutputWriter as the OutputFormat, AvroKey as the key and NullWritable as the value).

Sometimes (like if you’re using Oozie), you need to set everything up without using AvroJob as a helper. The documentation is less clear here, so here’s a list of Hadoop keys and the appropriate values (for MRv2):

  • avro.schema.output.key - The JSON representation of the output key’s Avro schema. For large objects you may run afoul of Oozie’s 100,000 character workflow limit, in which case you can isolate your Avro job in a subflow
  • avro.schema.output.value – Likewise, if you’re emitting key-value pairs instead of using AvroKeyOutputWriter, put your value’s JSON schema here
  • avro.mapper - your mapper class that extends AvroMapper. You can also use a normal Mapper (with the normal Mapper configuration option), but you’ll have to handle coverting the AvroKey/AvroValue yourself
  • avro.reducer - likewise, a class that extends AvroReducer
  • mapreduce.job.output.key.class - always AvroKey
  • mapreduce.job.output.value.class – AvroValue or NullWritable, as above
  • mapreduce.input.format.class  - if you’re reading Avro files as Input, you’ll need to set this to
  • - AvroKey, if you’re using a subclass of AvroMapper. If you write your own Mapper, you can pick
  • - AvroKey or NullWritable, unless you write a Mapper without subclassing AvroMapper
  • io.serializations  – AvroJob set this value to the following:,,,

With these configuration options you should be able to set up an Avro job in Oozie, or any other place where you have to set up your MapReduce job manually.

Categories: DBA Blogs

An Introduction to Extended Data Types in Oracle 12c

Pythian Group - Mon, 2014-10-20 07:55

One of the lesser known new features that comes as a boon to many developers and DBAs is the provision of implicit handling of large data strings using scalar data types like VARCHAR2 and RAW.

When creating tables, each column must be assigned a data type, which determines the nature of the values that can be inserted into the column. Common data types include number, date, and varchar2. These data types are also used to specify the nature of arguments for PL/SQL programs like functions and procedures.

When choosing a data type, you must carefully consider the data you plan to store and the operations you may want to perform upon it. Making good decisions at the table design stage reduces the potential negative downstream impact on space utilization and performance. Space is a consideration since some data types occupy a fixed length, consuming the same number of bytes, no matter what data is actually stored in it.

In pre-12c databases, long characters strings of more than 4000 bytes had to be handled using creative solutions including: CLOB or LONG data types and multiple columns or variables. These approaches led to inefficient unnecessarily complex designs and added processing overheads.

12c introduced the MAX_STRING_SIZE system parameter that allows string data types to be much larger when the parameter is changed from its default value of STANDARD to EXTENDED. The VARCHAR2 data type, stores variable length character data from 1 to 4000 bytes if MAX_STRING_SIZE=STANDARD or up to 32767 bytes if MAX_STRING_SIZE=EXTENDED.

RAW and NVARCHAR2 data types are affected in similar ways.


Potential issues to consider:

  • Internally, extended data types are stored out-of-line using LOBs, but these cannot be manipulated using the DBMS_LOB interface.
  • When changing the MAX_STRING_SIZE parameter, objects may be updated invalidating dependent objects, so ideally, change this parameter during a maintenance window in your important databases.
  • List partitioning on EDT columns may potentially exceed the 4096 byte limit for the partition bounds. The DEFAULT partition may be used for data values that exceed the 4096 byte limit or a hash function may be used on the data to create unique identifiers smaller than 4096 bytes.
  • Indexing EDT columns may fail with “maximum key length exceeded” errors. For example, databases with an 8k default block size support a maximum key length of approximately 6400 bytes. A suggested work-around is to use a virtual column or function-based index to effectively shorten the index key length.



This feature will no doubt be improved and the shortcomings will be dealt with in future releases—but for now, it offers a clean and elegant mechanism for handling large character data within existing applications requiring minimal code changes.



Categories: DBA Blogs

Old Castles

Pete Scott - Mon, 2014-10-20 06:12
Living here on the Kent Coast we are quite blessed with the number of castles within half and hour’s drive of our cottage. English Heritage manages several nearby castles or forts. The nearest, Richborough, is out and out Roman. We had a lot of Romans roaming around here, they even strolled past my cottage along […]

Connecting to Pivotal Cloud Foundry Ops Metrics using Java VisualVM

Pas Apicella - Sun, 2014-10-19 21:34
The Pivotal Ops Metrics tool is a JMX extension for Elastic Runtime. Pivotal Ops Metrics collects and exposes system data from Cloud Foundry components via a JMX endpoint. Use this system data to monitor your installation and assist in troubleshooting. Below is the tile once installed and available with Pivotal Cloud Foundry Ops Manager

Once installed and configured, metrics for Cloud Foundry components automatically report to the JMX endpoint. Your JMX client uses the credentials supplied to connect to the IP address of the Pivotal Ops Metrics JMX Provider at port 44444

1. Start jvisualvm

2. Under plugin ensure you have the VisualVm-Mbeans plugin installed as shown below, or install it to be able to view the MBeans.

3. Create a JMX connection as shown below

4. Finally the CF MBeans can be viewed as shown below.

More Information

Deploying Pivotal Ops Metrics
Categories: Fusion Middleware

SQLShell accessing Pivotal GemFire XD 1.3

Pas Apicella - Sun, 2014-10-19 19:09
I stumbled open SQLShell recently as per the URL below. Below I will show how you can connect to Pivotal GemFireXD using SQLShell. I used this to export query results using CSV output.

Note: Assuming SQLShell is already installed and instructions below are for Mac OSX

1. Create a file in $HOME/.sqlshell/config as shown below, I just took the sample it ships with. Notice how I have added an alias for "gemfirexd", highlighted below.

# ---------------------------------------------------------------------------
# initialization file for SQLShell

#colspacing: 2

# Driver aliases.
postgresql = org.postgresql.Driver
postgres = org.postgresql.Driver
mysql = com.mysql.jdbc.Driver
sqlite = org.sqlite.JDBC
sqlite3 = org.sqlite.JDBC
oracle = oracle.jdbc.driver.OracleDriver
access = sun.jdbc.odbc.JdbcOdbcDriver
gemfirexd = com.pivotal.gemfirexd.jdbc.ClientDriver

historyDir: ${env.HOME}/.sqlshell

aliases: post
url: jdbc:postgresql://localhost:5432/sampledb
driver: postgres
user: ${}
history: $vars.historyDir/postgres.hist

driver: mysql
url: jdbc:mysql://localhost:3306/sampledb
user: ${}
history: $vars.historyDir/mysql.hist

aliases: sqlite3
url: jdbc:sqlite:/tmp/sample.db
driver: sqlite
history: $vars.historyDir/sqlite3.hist

aliases: ora
schema: example
url: jdbc:oracle:thin:@localhost:1521:sampledb
user: ${}
driver: oracle
history: $vars.historyDir/scrgskd

driver: access
url: jdbc:odbc:Driver={Microsoft Access Driver (*.mdb)};DBQ=/tmp/sample.mdb;DriverID=22}

2. Add Pivotal GemFireXd client driver "gemfirexd-client.jar" to "/Applications/sqlshell/lib"

3. With Pivotal GemFireXD cluster up and running connect and run some commands as shown below.
[Mon Oct 20 11:56:10 papicella@:~/vmware/software/sqlshell ] $ sqlshell gemfirexd,jdbc:gemfirexd://localhost:1527
SQLShell, version 0.8.1 (2012/03/16 09:43:31)
Copyright (c) 2009-2011 Brian M. Clapper
Using JLine
Type "help" for help. Type ".about" for more information.

sqlshell> .set schema APP

sqlshell> .show tables
sqlshell> select * from dept;
Execution time: 0.21 seconds
Retrieval time: 0.6 seconds
7 rows returned.

------ ---------- --------

sqlshell> .capture to /tmp/results.csv
Capturing result sets to: /tmp/results.csv

sqlshell> select * from emp where deptno = 10;
Execution time: 0.18 seconds
Retrieval time: 0.5 seconds
3 rows returned.

----- ------ --------- ---- --------------------- ---- ---- ------
7782 CLARK MANAGER 7839 1981/06/09 00:00:00.0 2450 NULL 10
7839 KING PRESIDENT NULL 1981/11/17 00:00:00.0 5000 NULL 10
7934 MILLER CLERK 7782 1982/01/23 00:00:00.0 1300 NULL 10

sqlshell> .capture off
No longer capturing query results.

Categories: Fusion Middleware

Y Combinator companies has more funding than the sum total of all remaining accelerators

Nilesh Jethwa - Sun, 2014-10-19 13:05

After finishing our call with Bed Bugs , we decided to check out what the startup scene looks like. We used the data from seed-db to let our analytical juices flowing.

First we asked what is the top most program (duh!!) but by how much and who are next in the list and so on.

Like most Data scientists who believe in the power of simple bar graphs we used our first “chart weapon” of choice and here it is what it rendered.


Y Combinator is freaking huge like a dinasaur, infact very much resembles the grass eating Sauropods. In fact we had to create a chart that was 3000 pixels wide just to accommodate all.


See the resemblance between the chart and the Sauropod?

To get better perspective we rendered it in a Treemap as shown


Looking at the treemap, Y Combinator occupies more than the sum total of all the remaining accelerators. That is super amazing but the problem our charts were not coming up beautiful. YC is clearly the outlier and was causing us difficulty to understand the remainder startup ecosystem.

We said, lets cut off the head to dig deeper.

The moment we filtered out YC from our analysis, all of the regions became colorful and that was certainly a visual treat.


Now we could clearly see what are the other accelerators/programs that are roughly the same size.

For example,

TechStars Boulder and AngelPad are roughly the same

TechStars NYC, TechStars Boston and 500Startups are in the same club

Similarly DreamIT, fbFund and Mucker Lab share the same color.

Now let us try to see from the location angle


So we re-established that YC is freaking huge and having them on a chart with other accelerators does not create beautiful visualizations.


Plan depth

Jonathan Lewis - Sun, 2014-10-19 11:20

A recent posting on OTN reminded me that I haven’t been poking Oracle 12c very hard to see which defects in reporting execution plans have been fixed. The last time I wrote something about the problem was about 20 months ago referencing; but there are still oddities and irritations that make the nice easy “first child first” algorithm fail because the depth calculated by Oracle doesn’t match the level that you would get from a connect-by query on the underlying plan table. Here’s a simple fail in 12c:

create table t1
	rownum 			id,
	lpad(rownum,200)	padding
from	all_objects
where	rownum <= 2500

create table t2
select	* from t1

-- call dbms_stats to gather stats

explain plan for
	case mod(id,2)
		when 1 then (select max( from t1 where <=
		when 0 then (select max( from t1 where >=
	end id
from	t2

select * from table(dbms_xplan.display);

It ought to be fairly clear that the two inline scalar subqueries against t1 should be presented at the same level in the execution hierarchy; but here’s the execution plan you get from Oracle:

| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT     |      |  2500 | 10000 | 28039   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE      |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  4 |     TABLE ACCESS FULL| T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   5 |  TABLE ACCESS FULL   | T2   |  2500 | 10000 |    11   (0)| 00:00:01 |

Predicate Information (identified by operation id):
   2 - filter("T1"."ID"<=:B1)
   4 - filter("T1"."ID">=:B1)

As you can see, the immediate (default?) visual impression you get from the plan is that one of the subqueries is subordinate to the other. On the other hand if you check the id and parent_id columns from the plan_table you’ll find that lines 1 and 3 are both direct descendents of line 0 – so they ought to have the same depth. The plan below is what you get if you run the 8i query from utlxpls.sql against the plan_table.

SQL> select id, parent_id from plan_table;

        ID  PARENT_ID
---------- ----------
         1          0
         2          1
         3          0
         4          3
         5          0

| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
| SELECT STATEMENT          |          |     2K|    9K|  28039 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  TABLE ACCESS FULL        |T2        |     2K|    9K|     11 |       |       |

So next time you see a plan and the indentation doesn’t quite seem to make sense, perhaps a quick query to select the id and parent_id will let you check whether you’ve found an example where the depth calculation produces a misleading result.


Update 20th Oct 2014

A question via twitter – does the error also show up with dbms_xplan.display_cursor(), SQL tuning sets, AWR, etc. or is it just a defect of explain plan. Since the depth is (probably) a derived value for display purposes that Oracle doesn’t use internally for executing the plan I would be inclined to assume that the defect is universal, but I’ve only checked it through explain plan/display, and through execution/display_cursor().




Learning Spark Lightning-Fast Big Data Analytics by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia; O'Reilly Media

Surachart Opun - Sat, 2014-10-18 12:45
Apache Spark started as a research project at UC Berkeley in the AMPLab, which focuses on big data analytics. Spark is an open source cluster computing platform designed to be fast and general-purpose for data analytics - It's both fast to run and write. Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly much quicker than with disk-based systems like Hadoop MapReduce. Users can write applications quickly in Java, Scala or Python. In additional, it's easy to run standalone or on EC2 or Mesos. It can read data from HDFS, HBase, Cassandra, and any Hadoop data source.
If you would like a book about Spark - Learning Spark Lightning-Fast Big Data Analytics by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. It's a great book for who is interested in Spark development and starting with it. Readers will learn how to express MapReduce jobs with just a few simple lines of Spark code and more...
  • Quickly dive into Spark capabilities such as collect, count, reduce, and save
  • Use one programming paradigm instead of mixing and matching tools such as Hive, Hadoop, Mahout, and S4/Storm
  • Learn how to run interactive, iterative, and incremental analyses
  • Integrate with Scala to manipulate distributed datasets like local collections
  • Tackle partitioning issues, data locality, default hash partitioning, user-defined partitioners, and custom serialization
  • Use other languages by means of pipe() to achieve the equivalent of Hadoop streaming
With Early Release - 7 chapters. Explained Apache Spark overview, downloading and commands that should know, programming with RDDS (+ more advance) as well as working with Key-Value Pairs, etc. Easy to read and Good examples in a book. For people who want to learn Apache Spark or use Spark for Data Analytic. It's a book, that should keep in shelf.

Book: Learning Spark Lightning-Fast Big Data Analytics
Authors: Holden KarauAndy KonwinskiPatrick WendellMatei ZahariaWritten By: Surachart Opun
Categories: DBA Blogs

Going dotty – Generating a Filename containing a parameter value in SQL*Plus

The Anti-Kyte - Sat, 2014-10-18 12:01

As I have alluded to previously, I was not born in the UK.
Nope, my parents decided to up-sticks and move from London all the way to the
other side of the world, namely Auckland.
Then they had me. Then they came back.
To this day, they refuse to comment on whether these two events were related.

I went back to New Zealand a few years ago.
As I wandered around places that I hadn’t seen since I was five, it was strange how memories that I had forgotten came flooding back.
That last sentence doesn’t make much sense. It’s probably more accurate to say that memories I hadn’t thought about for years came flooding back.

I recently remembered something else I once knew, and then forgot – namely how to generate a SQL*Plus file name which includes a parameter value.

The scenario

I’ve got a script that lists all of the employees in a given department :

accept deptno prompt 'Enter Department ID : '
spool department.lis

select first_name, last_name
from hr.employees
where department_id = &deptno
order by employee_id

spool off

Now, rather than it just creating a file called department.lis, I want to create a file that includes the department number I’m querying.

Obvious…but wrong

You might think the following is a reasonable attempt to do this :

accept deptno prompt 'Enter Department ID : '
spool department_&deptno.lis

select first_name, last_name
from hr.employees
where department_id = &&deptno
order by employee_id

spool off

Unfortunately, SQL*Plus insists on being obtuse and outputting the following file :


It is at this point that a colleague came to the rescue ( thanks William)…

Going dotty

This will do the job…

accept deptno prompt 'Enter Department ID : '

spool department_&deptno..lis

select first_name, last_name
from hr.employees
where department_id = &deptno
order by employee_id

spool off

Run this and we not only get :

Enter Department ID : 10
old   3: where department_id = &deptno
new   3: where department_id = 10

-------------------- -------------------------
Jennifer             Whalen


…we get a file, appropriately named :


The magic here is that the “.” character delimits the variable substitution.
Just to prove the point, we can do the same with a positional parameter :

set verify off

spool department_&1..lis

select first_name, last_name
from hr.employees
where department_id = &1
order by employee_id

spool off

…run this and we get :

SQL> @position_param.sql 10

-------------------- -------------------------
Jennifer             Whalen


…and the appropriate file…


On that note, I’m off to the pub. Now, where did I leave my keys ?

Filed under: Oracle, SQL Tagged: spool; filename including a variable value, SQL*Plus

Video and Slides - Data Caching Strategies for Oracle Mobile Application Framework

Andrejus Baranovski - Sat, 2014-10-18 09:08
I have recorded a video tutorial, based on my OOW'14 session - Data Caching Strategies for Oracle Mobile Application Framework. ADF developers who could not attend OOW'14 in San Francisco, this is for you !

Here you can view the slides:

Data Caching Strategies for Oracle Mobile Application Framework
Watch the first part of the tutorial:

Watch the second part of the tutorial:

Described solution is based on the sample application from the blog post - Transactional Data Caching for ADF Mobile MAF Application.

Bandwidth and Latency

Hemant K Chitale - Sat, 2014-10-18 08:40
Here is, verbatim, an article I posted on Linked-In yesterday  [For other posts on Linked-In, view my Linked-In profile] :

Imagine an 8-lane highway. Now imagine a 4-lane highway. Which has the greater bandwidth ?Imagine your organisation sends its employees on a wekend "retreat" by bus. You have the choice of two locations, one that is 200kilometres away and the other is 80kilometres away. Assume that buses travel at a constant speed of 80kmph. Which resort will your employees get to faster ?The first question is about bandwidth. The second is about latency.(Why should I assume a fixed speed for the buses ? Because, I can assume a fixed speed at which electrons transfer over a wire or photons over a light channel).Expand the question further. What if the organisation needs to send 32 employees in a 40-seater bus. Does it matter that the bus can travel on an 8-lane highway versus a 4-lane highway (assuming minimal other traffic on the highways at that time) ?Too often, naive "architects" do not differentiate between the two. If my organisation needs to configure a standby (DR) location for the key databases and has a choice of two locations but varying types of network services, it should consider *both* bandwidth and latency. If the volume of redo is 1000MBytes per minute and this, factoring overheads for packetizing the "data", translates to 167Mbits per second, should I just go ahead and buy bandwidth of 200Mbits per second ? If the two sites have two different network services providers offering different bandwidths, should I simply locate at the site with the greater bandwidth ? What if the time it takes to synchronously write my data to site "A" is 4ms and the time to site "B" is 8ms ? Should I not factor the latency ? (I am assuming that the "write to disk" speed of hardware at either site is the same -- i.e. the hardware is the same). I can then add the complications of network routers and switches that add to the latency. Software configurations, flow-control mechanisms, QoS definitions and hardware configuration can also impact bandwidth and latency in different ways.Now, extend this to data transfers ("output" or "results") from a database server to an application server or end-user. If the existing link is 100Mbps and is upgraded to 1Gbps, the time to "report" 100 rows is unlikely to change as this time is a function of the latency. However, if the number of concurrent users grows from 10 to 500, the bandwidth requirement may increase and yet each user may still have the same "wait" time to see his results (assuming that there are no server hardware constraints returning results for 500 users).On the flip side, consider ETL servers loading data into a database. Latency is as important as bandwidth. An ETL scheme that does "row-by-row" loads relies on latency, not bandwidth. Increasing bandwidth doesn't help such a scheme.Think about the two.
Categories: DBA Blogs

Bed bugs in Boston – Analysis of Boston 311 public dataset

Nilesh Jethwa - Fri, 2014-10-17 11:22

Digging into the Boston public Dataset can reveal interesting and juicy facts.

Even though there is nothing juicy about Bed bugs but the data about Boston open cases for Bed bugs is quite interesting and worth looking at.

We uploaded the entire 50 mb data dump which is around 500K rows into the Data Visualizer and filtered the category for Bed Bugs. Splitting the date into its date hierarchy components we then plotted the month on the Y axis.

InfoCaptor : Analytics & dashboards

It seems that the City of Boston started collecting this data around 2011 and has only partial data for that year.

Interestingly, the number of Bed bug cases seem to rise during the summer months.

Now if we break the lines into Quarters (we just add the quarter hierarchy to the mix)

InfoCaptor : Analytics & dashboards


Public, private health care systems possess security vulnerabilities

Chris Foot - Fri, 2014-10-17 11:12

System and database administrators from health care institutions are facing several challenges.

On one hand, many are obligated to migrate legacy applications to state-of-the-art electronic health record solutions. In addition, they need to ensure the information contained in those environments is protected.

Operating systems, network configurations and a wealth of other factors can either make or break security architectures. If these components are unable to receive frequent updates from vendor-certified developers, it can cause nightmares for database administration professionals. 

Windows XP no longer a valid option 
When Microsoft ceased to provide to support for Windows XP in early April, not as many businesses upgraded to Windows 7 or 8 as the software vendor's leaders had hoped. This means those using XP will not receive regular security updates, leaving them open to attacks as hackers work to find vulnerabilities with the OS.

Despite continuous warnings from Microsoft and the IT community, Information Security Buzz contributor Rebecca Herold believes that the a large percentage of medical devices currently in use are running XP. Her allegations are based on reports submitted by health care electronics producers that stated they leverage XP for the sensors' graphical user interfaces, as well as to create a connection to external databases. 

Because Microsoft has yet to release the source code of XP, health care companies using these implementations have no way of identifying vulnerabilities independently. Even if the source code was distributed, it's unlikely that the majority of medical providers could use in-house resources to search for security flaws. The only way to defend the servers linked with devices running XP is to employ database active monitoring. 

Public sector experiencing vulnerabilities apparently isn't picture-perfect, either. Fed Scoop reported that white hat hackers working for the U.S. Department of Health and Human Services' Office of the Inspector General discovered that personally identifiable information was secured, but some data controlled by the Centers for Medicare and Medicaid Services lacked adequate protection

After an assessment of CMS and databases was completed, the IG advised the organization to encode files with an algorithm approved by Federal Information Processing Standards 140-2. However, authorities at the CMS deduced this wasn't necessary. 

Although this wasn't the first audit of (and it likely won't be the last), the information held within its servers is too valuable for cybercriminals to ignore. Setting up an automated, yet sophisticated intrusion detection program to notify DBAs when user activity appears inconsistent is a step the CMS should strongly consider taking. 

The post Public, private health care systems possess security vulnerabilities appeared first on Remote DBA Experts.

JPMorgan hack joins list of largest data breaches in history [VIDEO]

Chris Foot - Fri, 2014-10-17 09:02


Hi, welcome to RDX. With news about data breaches sweeping the Web on a regular basis, it's no surprise that the latest victim was a major U.S. bank.

According to Bloomberg, hackers gained access to a server operated by JPMorgan Chase, stealing data on 76 million homes and 7 million small businesses.

After further investigation, the FBI discovered the hackers gained access to a server lacking two-factor authentication. From there, the hackers found fractures in the bank's custom software, through which JPMorgan's security team unknowingly gave them access to data banks.

To prevent such attacks from occurring, firms should regularly assess their databases and solutions to find vulnerabilities.

Thanks for watching! Be sure to visit us next time for info on RDX's security services.

The post JPMorgan hack joins list of largest data breaches in history [VIDEO] appeared first on Remote DBA Experts.