For the past year at the AppsLab we have been exploring the possibilities of advanced user interactions using BLE beacons. A couple of days ago, Google (unofficially) announced that one of its Chrome teams is working on what I’m calling the gBeacon. They are calling it the Physical Web.
This is how they describe it:
“The Physical Web is an approach to unleash the core superpower of the web: interaction on demand. People should be able to walk up to any smart device – a vending machine, a poster, a toy, a bus stop, a rental car – and not have to download an app first. Everything should be just a tap away.
The Physical Web is not shipping yet nor is it a Google product. This is an early-stage experimental project and we’re developing it out in the open as we do all things related to the web. This should only be of interest to developers looking to test out this feature and provide us feedback.”
Here is a short rundown of how iBeacon works versus the Physical Web beacons:
The iBeacon profile advertises a 30-byte packet containing three values that combined form a unique identifier: UUID, Major and Minor. The mobile device actively listens for these packets. When it gets close to one, it queries a database (in the cloud) or uses hard-coded values to determine what it needs to do or show for that beacon. Generally, the UUID identifies a common organization, the Major value identifies an asset within that organization, and the Minor identifies a subset of assets belonging to that Major.
For example, if I’m close to the Oracle campus and I have an Oracle application that is actively listening for beacons, then as I get within reach of any beacon my app can trigger certain interactions related to the whole organization (“Hello Noel, welcome to Oracle.”). The application had to query a database to know what that UUID represents. As I reach building 200, my application picks up another beacon that contains a Major value of, let’s say, 200. My app will again query to see what it represents (“You are in building 200.”). Finally, when I get close to our new Cloud UX Lab, a beacon inside the lab will broadcast a Minor ID that represents the lab (“This is the Cloud UX Lab, want to learn more?”).
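To make that lookup step concrete, here is a minimal, purely hypothetical sketch of the kind of beacon-to-content mapping an app’s backend might query when it hears an advertisement; the table and column names are mine for illustration and are not part of any Apple or Oracle API:

-- Hypothetical lookup table a backend might hold for its deployed beacons.
CREATE TABLE beacon_actions (
  uuid    VARCHAR2(36),   -- shared by the whole organization
  major   NUMBER,         -- e.g. a building
  minor   NUMBER,         -- e.g. a room or asset within that building
  message VARCHAR2(200)
);

-- The app reads UUID/Major/Minor from the advertisement and asks what to show.
SELECT message
FROM   beacon_actions
WHERE  uuid  = :advertised_uuid
AND    major = :advertised_major
AND    minor = :advertised_minor;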
iBeacons are designed to work as a fully closed ecosystem where only the deployed devices (app+beacons+db) know what a beacon represents. Today I can walk into the Apple Store and use a Bluetooth app to “sniff” BLE devices, but unless I know what their UUID/Major/Minor values represent, I cannot do anything with that information. Only the official Apple Store app knows what to do when it is near the beacons around the store (“Looks like you are looking for a new iPhone case.”).
As you can see, the iBeacon approach is a “push” method where the device proactively pushes actions to you. In contrast, the Physical Web beacon proposes to act as a “pull,” or on-demand, method.
The Physical Web gBeacon will advertise a 28-byte packet containing an encoded URL. Google wants to use the familiar and established method of URLs to tell an application, or an OS, where to find information about physical objects. They plan to use context (physical and virtual) to rank what might be most important to you at the current time and display it.
The Physical Web approach is designed to be a “pull” discovery service where the user will most likely initiate the interaction. For example, when I arrive at the Oracle campus, I can start an application that scans for nearby gBeacons, or I can open my Chrome browser and do a search. The application or browser will use context to rank nearby objects alongside the search results. It can also use calendar data, email or Google Now to narrow down interests. A background process with “push” capabilities could also be implemented. This process could have filters that alert the user to nearby objects of interest. These interest rules could be predefined or inferred by using Google’s intelligence-gathering systems like Google Now.
The main difference between the two approaches is that iBeacon is a closed ecosystem (app+beacons+db) while the Physical Web is intended to be a public, self-discoverable (app/OS+beacons+www) physical extension of the web, although the Physical Web could also be restricted by using protected websites and encrypted URLs.
Both approaches address the common misconception about these technologies: “Am I going to be spammed as soon as I walk inside a mall?” The answer is no. iBeacon is an opt-in service within an app, and the Physical Web beacons will mostly work on demand or will have filter subscriptions.
So there you have it. Which method do you prefer?
As Oracle OpenWorld 2014 comes to a close, we wanted to reflect on the week and provide some highlights for you all!
We say this every year, but this year's event was one of the best ones yet. We had more than 35 scheduled sessions, plus user group sessions, 10 live product demos, and 7 hands-on labs devoted to Oracle WebCenter and Oracle Business Process Management (Oracle BPM) solutions. This year's Oracle OpenWorld provided broad and deep insight into next-generation solutions that increase business agility, improve performance, and drive personal, contextual, and multichannel interactions.
Oracle WebCenter & BPM Customer Appreciation Reception
Our 8th annual Oracle WebCenter & BPM Customer Appreciation Reception was held for the second year at San Francisco’s Old Mint, a National Historic Landmark. This was a great evening of networking and relationship building, where the Oracle WebCenter & BPM community had the chance to mingle and make new connections. Many thanks to our partners Aurionpro, AVIO Consulting, Bezzotech, Fishbowl Solutions, Keste, Redstone Content Solutions, TekStream & VASSIT for sponsoring!
Oracle Fusion Middleware Innovation Awards
Oracle Fusion Middleware Innovation honors Oracle customers for their cutting-edge solutions using Oracle Fusion Middleware. Winners were selected based on the uniqueness of their business case, business benefits, level of impact relative to the size of the organization, complexity and magnitude of implementation, and the originality of architecture. This year’s winners for WebCenter were Bank of Lebanon and McAfee.
This year’s winners for the BPM category were State Revenue Office, Victoria and Vertafore.
Oracle Appreciation Event at Treasure Island
We stayed up past our bedtimes rocking to Aerosmith and hip-hopping to Macklemore & Ryan Lewis and Spacehog at the Oracle Appreciation Event. These award-winners—plus free-flowing networking, food, and drink—made Wednesday evening magical at Treasure Island. Once we arrived on Treasure Island, we saw that it had been transformed and we were wowed by the 360-degree views of Bay Area skylines (with an even better view from the top of the Ferris wheel). We tested our skills playing arcade games between acts, and relaxed and enjoyed ourselves after a busy couple of days.
Cloud was one of the shining spotlights at OOW this year. For WebCenter and BPM, we had dedicated hands-on labs for Documents Cloud Service and Process Cloud Service at the InterContinental. In addition, we had live demos including Documents Cloud Service, Process Cloud Service and Oracle Social Network (OSN) throughout the week. Documents Cloud Service and OSN were featured prominently in Thomas Kurian’s OOW keynote (from the 46-minute mark) and the FMW General Session (from the 40-minute mark).
The Oracle WebCenter & BPM Community
Oracle OpenWorld is unmatched in providing you with opportunities to interact and engage with other WebCenter & BPM customers and experts from among our partner and employee communities. It was great to see everyone, make new connections and reconnect with old friends. We look forward to seeing you all again next year!
Suppose it's your job to identify SQL that may run slower in the about-to-be-upgraded Oracle Database. It's tricky because no two systems are alike. Just because the SQL run time is faster in the test environment doesn't mean the decision to upgrade is a good one. In fact, it could be disastrous.
For example: if a SQL statement runs 10 seconds in production and 20 seconds in QAT, but the production system is twice as fast as QAT, is that a problem? It's difficult to compare SQL run times when the same SQL resides in different environments.
In this posting, I present a way to remove the CPU speed differences, so an appropriate "apples to apples" SQL elapsed time comparison can be made, thereby improving our ability to more correctly detect risky SQL that may be placed into the upgraded production system.
And, there is a cool, free, downloadable tool involved!
Why SQL Can Run Slower In Different Environments
There are a number of reasons why a SQL's run time is different in different systems. An obvious reason is a different execution plan. A less obvious and much more complex reason is a workload intensity or type difference. In this posting, I will focus on CPU speed differences. Actually, what I'll show you is how to remove the CPU speed differences so you can appropriately compare two SQL statements. It's pretty cool.
The Mental Gymnastics
If a SQL statement's elapsed time in production is 10 seconds and 20 seconds in QAT, that’s NOT an issue IF the production system is twice as fast.
If this makes sense to you, then what you did was mentally adjust one of the systems so it could be appropriately compared. This is how I did it:
10 seconds in production * production is 2 times as fast as QA = 20 seconds
And in QA the SQL ran in 20 seconds… so really they ran “the same” in both environments. If I am considering placing the SQL from the test environment into the production environment, then this scenario does not raise any risk flags. The "trick" is determining that "production is 2 times as fast as QA" and then creatively using that information.
Determining The "Speed Value"
Fortunately, there are many ways to determine a system's "speed value." Basing the speed value on Oracle's ability to process buffers in memory has many advantages: a real load is not required or even desired; real Oracle code is being run at a particular version; real operating systems are being run; and the processing of an Oracle buffer highly correlates with CPU consumption.
Keep in mind, this type of CPU speed test is not an indicator of scalability (the benefit of adding additional CPUs) in any way, shape or form. It is simply a measure of brute-force Oracle buffer cache logical IO processing speed based on a number of factors. If you are architecting a system, other tests will be required.
As you might expect, I have a free tool you can download to determine the "true speed" rating. I recently updated it to be more accurate, require fewer Oracle privileges, and also show the execution plan of the speed test tool SQL. (A special thanks to Steve for the execution plan enhancement!) If the execution plan used in the speed tool is different on the various systems, then obviously we can't expect the "true speeds" to be comparable.
You can download the tool HERE.
How To Analyze The Risk
Before we can analyze the risk, we need the "speed value" for both systems. Suppose a faster system means its speed rating is larger. If the production system speed rating is 600 and the QAT system speed rating is 300, then production is deemed "twice as fast."
Now let's put this all together and quickly go through three examples.
This is the core math:
standardized elapsed time = sql elapsed time * system speed value
So if the SQL elapsed time is 25 seconds and the system speed value is 200, then the standardized "apples-to-apples" elapsed time is 5000, which is 25 * 200. The "standardized elapsed time" is simply a way to compare SQL elapsed times; it is not what users will feel and not the true SQL elapsed time.
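If you capture the elapsed times and speed values for both systems, the comparison is easy to script. Here is a minimal sketch of that calculation; the table and column names are hypothetical and are not part of the downloadable tool:

-- Hypothetical comparison table: one row per SQL statement, holding its elapsed
-- time and each system's speed value. Flag SQL whose standardized QAT time
-- exceeds its standardized PRD time, i.e. SQL that is truly slower in QAT.
SELECT sql_id,
       qat_elapsed * qat_speed AS qat_standardized,
       prd_elapsed * prd_speed AS prd_standardized,
       CASE
         WHEN qat_elapsed * qat_speed > prd_elapsed * prd_speed THEN 'RISK'
         ELSE 'OK'
       END AS upgrade_flag
FROM   sql_speed_comparison;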
To make this a little more interesting, I'll quickly go through three scenarios focusing on identifying risk.
1. The SQL truly runs the same in both systems.
Here is the math:
QAT standardized elapsed time = 20 seconds X 300 = 6000 seconds
PRD standardized elapsed time = 10 seconds X 600 = 6000 seconds
In this scenario, the true speed situation is QAT = PRD. This means the SQL effectively runs just as fast in QAT as in production. If someone says the SQL is running slower in QAT and therefore presents a risk to the upgrade, you can confidently say it's because the PRD system is twice as fast! In this scenario, the QAT SQL will not be flagged as presenting a significant risk when upgrading from QAT to PRD.
2. The SQL runs faster in production.
Now suppose the SQL runs for 30 seconds in QAT and for 10 seconds in PRD. Someone might say, "Well, of course it runs slower in QAT, because QAT is slower than the PRD system." Really? Everything is OK? Again, to make a fair comparison, we must compare the systems using a standardizing metric, which I have been calling the "standardized elapsed time."
Here are the scenario numbers:
QAT standardized elapsed time = 30 seconds X 300 = 9000 seconds
PRD standardized elapsed time = 10 seconds X 600 = 6000 seconds
In this scenario, the QAT standardized elapsed time is greater than the PRD standardized elapsed time. This means the QAT SQL is truly running slower in QAT compared to PRD. Specifically, the slower SQL in QAT cannot be fully explained by the slower QAT system. Said another way, while we expect the SQL in QAT to run slower than in the PRD system, we didn't expect it to be quite so slow in QAT. There must be another reason for this slowness, which we are not accounting for. In this scenario, the QAT SQL should be flagged as presenting a significant risk when upgrading from QAT to PRD.
3. The SQL runs faster in QAT.
In this final scenario, the SQL runs for 15 seconds in QAT and for 10 seconds in PRD. Suppose someone was to say, "Well of course the SQL runs slower in QAT. So everything is OK." Really? Everything is OK? To get a better understanding of the true situation, we need to look at their standardized elapsed times.
QAT standardized elapsed time = 15 seconds X 300 = 4500 seconds
PRD standardized elapsed time = 10 seconds X 600 = 6000 seconds
In this scenario, the QAT standardized elapsed time is less than the PRD standardized elapsed time. This means the QAT SQL is actually running faster in QAT, even though the QAT wall time is 15 seconds and the PRD wall time is only 10 seconds. So while most people would flag this QAT SQL as "high risk," we know better! We know the QAT SQL is actually running faster in QAT than in production! In this scenario, the QAT SQL will not be flagged as presenting a significant risk when upgrading from QAT to PRD.
Identifying risk is extremely important while planning for an upgrade. It is unlikely the QAT and production systems will be identical in every way. This mismatch makes identifying risk more difficult. One of the common differences between systems is their CPU processing speed. What I demonstrated was a way to remove the CPU speed differences, so an appropriate "apples to apples" SQL elapsed time comparison can be made, thereby improving our ability to more correctly detect risky SQL that may be placed into the upgraded production system.
Looking at the "standardized elapsed time" based on Oracle LIO processing is important, but it's just one reason why a SQL may have a different elapsed time in a different environment. One of the big "gotchas" in load testing is comparing production performance to a QAT environment with a different workload. Creating an equivalent workload on different systems is extremely difficult to do. But with some very cool math and a clear understanding of performance analysis, we can also create a more "apples-to-apples" comparison, just like we have done with CPU speeds. But I'll save that for another posting.
All the best in your Oracle performance work!
ADF UI Shell with Alta UI - clean and light:
Fishbowl will host a series of webinars this month about integrating the Google Search Appliance with Oracle WebCenter or Liferay Portal. Our new product, the GSA Portal Search Suite, fully exposes Google features within portals while maintaining the existing look and feel.
The first webinar, “The Benefits of Google Search for your Oracle WebCenter or Liferay Portal”, will be held on Wednesday, October 15 from 12:00-1:00 PM CST. This webinar will focus on the benefits of using the Google Search Appliance, which offers best-in-class relevancy and the impressive search features, such as spell check and document preview, that Google users are accustomed to.
The second webinar, “Integrating the Google Search Appliance and Oracle WebCenter or Liferay Portal”, further explains how Fishbowl’s GSA Portal Search Suite helps improve the process of setting up a GSA with a WebCenter or Liferay Portal. This product uses configurable portlets so users can choose which Google features to enable and provides single sign-on between the portal and the GSA. The webinar will be held on Wednesday, October 22 from 12:00-1:00 PM CST.
For more information on the GSA Portal Search Suite, read our previous blog post on the topic.
The post Upcoming Webinar Series: Using Google Search with your Oracle WebCenter or Liferay Portal appeared first on Fishbowl Solutions' C4 Blog.
Today’s blog post completes our three-part series with excerpts from our latest white paper, Microsoft Hadoop: Taming the Big Challenge of Big Data. In the first two posts, we discussed the impact of big data on today’s organizations, and its challenges.
Today, we’ll be sharing what organizations can accomplish by using the Microsoft Hadoop solution:
- Improve agility. Because companies now have the ability to collect and analyze data essentially in real time, they can more quickly discover which business strategies are working and which are not, and make adjustments as necessary.
- Increase innovation. By integrating structured and unstructured data sources, the solution provides decision makers with greater insight into all the factors affecting the business and encourages new ways of thinking about opportunities and challenges.
- Reduce inefficiencies. Data that currently resides in conventional data management systems can be migrated into Parallel Data Warehouse (PDW) for faster information delivery.
- Better allocate IT resources. The Microsoft Hadoop solution includes a powerful, intuitive interface for installing, configuring, and managing the technology, freeing up IT staff to work on projects that provide higher value to the organization.
- Decrease costs. Previously, because of the inability to effectively analyze big data, much of it was dumped into data warehouses on commodity hardware, which is no longer required thanks to Hadoop.
Download our full white paper to learn which companies are currently benefiting from Hadoop, and how you can achieve the maximum ROI from the Microsoft Hadoop solution.
If you average out all of Oracle's new product development, it comes to a rate of one new product release every working day of the year. And I think they saved up bunches for OOW. It was difficult to keep up.
It was also difficult to physically keep up with things at OOW, as Oracle utilized the concept of product centers and spread things out over even more of downtown San Francisco this year. For example, Cloud ERP products were centered in the Westin on Market Street. Cloud HCM was located at the Palace Hotel. Sales Cloud took over the 2nd floor of Moscone West. Higher Education focused around the Marriott Marquis. Anything UX, as well as many other hands-on labs, happened at the InterContinental Hotel. And, of course, JavaOne took place at the Hilton on Union Square along with the surrounding area. The geographical separation required even more in the way of making tough choices about where to be and when to be there.
With all that, I think I've figured out a way to organize my own take on the highlights from OOW - with a tip o' the hat to Oracle's Thomas Kurian. Thomas sees Oracle as based around five product lines: engineered systems, database, middleware, packaged applications, and cloud services. The more I consider this framework, the more it makes sense to me. So my plan is to organize the news from OOW around these five product lines over the next few posts here. We'll see if we can't find some clarity in the avalanche.
*.* @YOURSERVERADDRESS:YOURSERVERPORT ## for UDP
*.* @@YOURSERVERADDRESS:YOURSERVERPORT ## for TCP
For rsyslog:
[root@centos01 ~]# grep centos /etc/rsyslog.conf
*.* @centos01:7777
Coming back to Flume, I used the Simple Example as a reference and changed it a bit, because I wanted it to write to HDFS.
[root@centos01 ~]# grep "^FLUME_AGENT_NAME\=" /etc/default/flume-agent
[root@centos01 ~]# cat /etc/flume/conf/flume.conf
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
#a1.sources.r1.type = netcat
a1.sources.r1.type = syslogudp
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 7777
# Describe the sink
#a1.sinks.k1.type = logger
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:8020/user/flume/syslog/%Y/%m/%d/%H/
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.batchSize = 10000
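# rollSize = 0 disables size-based file rolling, so files roll after rollCount
# events (time-based rolling still applies via the sink's default rollInterval)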
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 10000
a1.sinks.k1.hdfs.filePrefix = syslog
a1.sinks.k1.hdfs.round = true
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[root@centos01 ~]# /etc/init.d/flume-agent start
Flume NG agent is not running [FAILED]
Starting Flume NG agent daemon (flume-agent): [ OK ]
Then I tested by logging in over ssh to generate some syslog events.
[root@centos01 ~]# tail -0f /var/log/flume/flume.log
06 Oct 2014 16:35:40,601 INFO [hdfs-k1-call-runner-0] (org.apache.flume.sink.hdfs.BucketWriter.doOpen:208) - Creating hdfs://localhost:8020/user/flume/syslog/2014/10/06/16//syslog.1412588139067.tmp
06 Oct 2014 16:36:10,957 INFO [hdfs-k1-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.renameBucket:427) - Renaming hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067.tmp to hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067
[root@centos01 ~]# hadoop fs -ls hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067
14/10/06 16:37:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 flume supergroup 299 2014-10-06 16:36 hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067
[root@centos01 ~]# hadoop fs -cat hdfs://localhost:8020/user/flume/syslog/2014/10/06/16/syslog.1412588139067
14/10/06 16:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
sshd: Accepted password for surachart from 192.168.111.16 port 65068 ssh2
sshd: pam_unix(sshd:session): session opened for user surachart by (uid=0)
su: pam_unix(su-l:session): session opened for user root by surachart(uid=500)
su: pam_unix(su-l:session): session closed for user root
Looks good... Anyway, it needs more adapting...
Written By: Surachart Opun http://surachartopun.com
This is the first entry in a series of random articles about some useful internals-to-know of the awesome Oracle Database In-Memory column store. I intend to write about Oracle’s IM stuff that’s not already covered somewhere else and also about some general CPU topics (that are well covered elsewhere, but not always so well known in the Oracle DBA/developer world).
Before going into further details, you might want to review the Part 0 of this series and also our recent Oracle Database In-Memory Option in Action presentation with some examples. And then read this doc by Intel if you want more info on how the SIMD registers and instructions get used.
There’s a lot of talk about the use of your CPUs’ SIMD vector processing capabilities in the Oracle In-Memory module, so let’s start by checking whether it’s enabled in your database at all. We’ll look at Linux/Intel examples here.
The first generation of SIMD extensions in the Intel Pentium world was called MMX. It added 8 new MMn registers, 64 bits each. Over time the registers got widened, and more registers and new features were added. The extensions were called Streaming SIMD Extensions (SSE, SSE2, SSSE3, SSE4.1, SSE4.2) and Advanced Vector Extensions (AVX and AVX2).
The currently available AVX2 extensions provide 16 x 256-bit YMMn registers, and AVX-512 in the upcoming Knights Landing microarchitecture (year 2015) will provide 32 x 512-bit ZMMn registers for vector processing.
So how do you check which extensions your CPU supports? On Linux, the “flags” column in /proc/cpuinfo easily provides this info.
Let’s check the Exadatas in our research lab:
$ grep "^model name" /proc/cpuinfo | sort | uniq model name : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz $ grep ^flags /proc/cpuinfo | egrep "avx|sse|popcnt" | sed 's/ /\n/g' | egrep "avx|sse|popcnt" | sort | uniq popcnt sse sse2 sse4_1 sse4_2 ssse3
So the highest SIMD extension support on this Exadata V2 is SSE4.2 (No AVX!)
$ grep "^model name" /proc/cpuinfo | sort | uniq model name : Intel(R) Xeon(R) CPU X5670 @ 2.93GHz $ grep ^flags /proc/cpuinfo | egrep "avx|sse|popcnt" | sed 's/ /\n/g' | egrep "avx|sse|popcnt" | sort | uniq popcnt sse sse2 sse4_1 sse4_2 ssse3
Exadata X2 also has SSE4.2 but no AVX.
$ grep "^model name" /proc/cpuinfo | sort | uniq model name : Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz $ grep ^flags /proc/cpuinfo | egrep "avx|sse|popcnt" | sed 's/ /\n/g' | egrep "avx|sse|popcnt" | sort | uniq avx popcnt sse sse2 sse4_1 sse4_2 ssse3
The Exadata X3 supports the newer AVX too.
My laptop (Macbook Pro late 2013):
The Exadata X4 has not yet arrived in our lab, so I’m using my laptop as an example of the latest available CPU with AVX2:
Update: Jason Arneil commented that the X4 does not have AVX2-capable CPUs (but the X5 will).
$ grep "^model name" /proc/cpuinfo | sort | uniq model name : Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz $ grep ^flags /proc/cpuinfo | egrep "avx|sse|popcnt" | sed 's/ /\n/g' | egrep "avx|sse|popcnt" | sort | uniq avx avx2 popcnt sse sse2 sse4_1 sse4_2 ssse3
The Core-i7 generation supports everything up to the current AVX2 extension set.
So, which extensions is Oracle actually using? Let’s check!
As Oracle needs to run different binary code on CPUs with different capabilities, some of the In-Memory Data (kdm) layer code has been duplicated into separate external libraries – and then gets dynamically loaded into Oracle executable address space as needed. You can run pmap on one of your Oracle server processes and grep for libshpk:
$ pmap 21401 | grep libshpk
00007f0368594000   1604K r-x--  /u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libshpksse4212.so
00007f0368725000   2044K -----  /u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libshpksse4212.so
00007f0368924000     72K rw---  /u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libshpksse4212.so
My (educated) guess is that the “shpk” in libshpk above stands for oS dependent High Performance [K]ompression. “s” prefix normally means platform dependent (OSD) code and this low-level SIMD code sure is platform and CPU microarchitecture version dependent stuff.
Anyway, the above output from an Exadata X2 shows that SSE4.2 SIMD HPK libraries are used on this platform (and indeed, X2 CPUs do support SSE4.2, but not AVX).
Let’s list similar files from $ORACLE_HOME/lib:
$ cd $ORACLE_HOME/lib
$ ls -l libshpk*.so
-rw-r--r-- 1 oracle oinstall 1818445 Jul  7 04:16 libshpkavx12.so
-rw-r--r-- 1 oracle oinstall    8813 Jul  7 04:16 libshpkavx212.so
-rw-r--r-- 1 oracle oinstall 1863576 Jul  7 04:16 libshpksse4212.so
So, there are libraries for AVX and AVX2 in the lib directory too (the “12” suffix for all file names just means Oracle version 12). The AVX2 library is almost empty though (and the nm/objdump commands don’t show any Oracle functions in it, unlike in the other files).
Let’s run pmap on a process on my new laptop (which supports AVX and AVX2) to see if the AVX2 library gets used:
$ pmap 18969 | grep libshpk
00007f85741b1000   1560K r-x--  libshpkavx12.so
00007f8574337000   2044K -----  libshpkavx12.so
00007f8574536000     72K rw---  libshpkavx12.so
Despite my new laptop supporting AVX2, only the AVX library is used (the AVX2 library is named libshpkavx212.so). So it looks like the AVX2 extensions are not used yet in this version (this is the first Oracle 12.1.0.2 GA release, without any patches). I’m sure this will be added soon, along with more features and bugfixes.
To be continued …
An updated version of the Oracle BigDataLite VM came out a couple of weeks ago, and as well as updating the core Cloudera CDH software to the latest release it also included Oracle Big Data SQL, the SQL access layer over Hadoop that I covered on the blog a few months ago (here and here). Big Data SQL takes the SmartScan technology from Exadata and extends it to Hadoop, presenting Hive tables and HDFS files as Oracle external tables and pushing down the filtering and column-selection of data to individual Hadoop nodes. Any table registered in the Hive metastore can be exposed as an external table in Oracle, and a BigDataSQL agent installed on each Hadoop node gives them the ability to understand full Oracle SQL syntax rather than the cut-down SQL dialect that you get with Hive.
There are two immediate use cases that come to mind when you think about Big Data SQL in the context of BI and data warehousing: you can use Big Data SQL to include Hive tables in regular Oracle set-based ETL transformations, giving you the ability to reference Hive data during part of your data load; and you can also use Big Data SQL as a way to access Hive tables from OBIEE, rather than having to go through the Hive or Impala ODBC drivers. Let’s start off in this post by looking at the ETL scenario using ODI12c as the data integration environment, and I’ll come back to the BI example later in the week.
You may recall from a couple of earlier posts this year on ETL and data integration on Hadoop that I looked at a scenario where I wanted to geo-code web server log transactions using an IP address range lookup file from a company called MaxMind. To determine the country for a given IP address you need to locate the IP address of interest within the ranges listed in the lookup file, something that’s easy to do with a full SQL dialect such as that provided by Oracle.
In my case, I’d want to join my Hive table of server log entries with a Hive table containing the IP address ranges, using the BETWEEN operator – except that Hive doesn’t support any type of join other than an equi-join. You can use Impala and a BETWEEN clause there, but in my testing anything other than a relatively small log-file Hive table took massive amounts of memory to do the join, as Impala works in-memory, which effectively ruled out doing the geo-lookup set-based. I then went on to do the lookup using Pig and a Python API into the geocoding database, but then you’ve got to learn Pig, and I finally came up with my best solution using Hive streaming and a Python script that called that same API – but each of these approaches is fairly involved and requires a bit of skill and experience from the developer.
But this of course is where Big Data SQL could be useful. If I could expose the Hive table containing my log file entries as an Oracle external table and then join that within Oracle to an Oracle-native lookup table, I could do my join using the BETWEEN operator and then output the join results to a temporary Oracle table; once that’s done I could then use ODI12c’s sqoop functionality to copy the results back down to Hive for the rest of the ETL process. Looking at my Hive database using SQL*Developer 4.0.3’s new ability to work with Hive tables I can see the table I’m interested in listed there:
and I can also see it listed in the DBA_HIVE_TABLES static view that comes with Big Data SQL on Oracle Database 12c:
SQL> select database_name, table_name, location
  2  from dba_hive_tables
  3  where table_name like 'access_per_post%';

DATABASE_N TABLE_NAME                     LOCATION
---------- ------------------------------ --------------------------------------------------
default    access_per_post                hdfs://bigdatalite.localdomain:8020/user/hive/ware
                                          house/access_per_post
default    access_per_post_categories     hdfs://bigdatalite.localdomain:8020/user/hive/ware
                                          house/access_per_post_categories
default    access_per_post_full           hdfs://bigdatalite.localdomain:8020/user/hive/ware
                                          house/access_per_post_full
There are various ways to create the Oracle external tables over Hive tables in the linked Hadoop cluster, including using the new DBMS_HADOOP package to create the Oracle DDL from the Hive metastore table definitions, or using SQL*Developer Data Modeler to generate the DDL from modelled Hive tables, but if you know the Hive table definition and it’s not too complicated, you might as well just write the DDL statement yourself using the new ORACLE_HIVE external table access driver. In my case, to create the corresponding external table for the Hive table I want to geo-code, it looks like this:
CREATE TABLE access_per_post_categories(
  hostname     varchar2(100),
  request_date varchar2(100),
  post_id      varchar2(10),
  title        varchar2(200),
  author       varchar2(100),
  category     varchar2(100),
  ip_integer   number)
organization external
  (type oracle_hive
   default directory default_dir
   access parameters(com.oracle.bigdata.tablename=default.access_per_post_categories));
Then it’s just a case of importing the metadata for the external table over Hive, and the tables I’m going to join to and then load the results into, into ODI’s repository and then create a mapping to bring them all together.
Importantly, I can create the join between the tables using the BETWEEN clause, something I just couldn’t do when working with Hive tables on their own.
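As a rough sketch of what that join looks like in SQL terms (the geocoding table name and its columns below are hypothetical, standing in for an Oracle-native table built from the MaxMind ranges), the mapping effectively generates something along these lines:

-- Hypothetical illustration: the Hive-backed external table joined to an
-- Oracle-native IP range lookup table using BETWEEN, which HiveQL's
-- equi-join-only restriction would not allow.
SELECT a.hostname,
       a.request_date,
       a.post_id,
       g.country_name
FROM   access_per_post_categories a
JOIN   geoip_country_ranges g
ON     a.ip_integer BETWEEN g.start_ip_integer AND g.end_ip_integer;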
Running the mapping then joins the webserver log lookup table to the geocoding IP address range lookup table through the Oracle SQL engine, removing all the complexity of using Hive streaming, Pig or the other workaround solutions I used before. What I can then do is add a further step to the mapping to take the output of my join and use that to load the results back into Hive, like this:
I’ll then use the IKM SQL to Hive-HBase-File (SQOOP) knowledge module to set up the export from Oracle into Hive.
Now, when I run the mapping I can see the initial table join taking place between the Oracle native table and the Hive-sourced external table, and the results then being exported back into Hadoop at the end using the Sqoop KM.
Finally, I can view the contents of the downstream Hive table loaded via Sqoop, and see that it does in fact contain the country name for each of the page accesses.
Oracle Big Data SQL isn’t a solution suitable for everyone; it only runs on the BDA and requires Exadata for the database access, and it’s an additional license cost on top of the base BDA software bundle. But if you’ve got it available it’s an excellent way to blend Hive and Oracle data, and a great way around some of the restrictions around HiveQL and the Hive JDBC/ODBC drivers. More on this topic later next week, when I’ll look at using Big Data SQL in conjunction with OBIEE 11g.
Why should anyone else in the world care what I like best about myself?
I have no idea. That is for sure. But, hey, what can I say? This is the world we live in (I mean: the artificial environment humans have created, mainly to avoid actually living in and on our amazing world).
It is an age of, ahem, sharing. And, ahem, advertising. Actually, first and foremost, advertising.
Anyway, screw all that. Here's what I like best about myself:
I love to be with kids. And I am, to put it stupidly but perhaps clearly, a kid whisperer.
Given the choice between spending time with an adult or spending time with a child, there is no contest. None at all. It's a bit of a compulsion, I suppose, but....
If there is a child in the room, I pay them all of my attention, I cannot stop myself from doing this. It just happens. Adults, for the most part, disappear. I engage with a child as a peer, another whole human. And usually children respond to me instantly and with great enthusiasm.
Chances are, if your child is between, say, three months and five years old, we will be fast friends within minutes. Your cranky baby might fall asleep in my arms, as I sing Moonshadow to her or whisper nonsense words in her ear. Your shy three-year-old son might find himself talking excitedly about a snake he saw on a trail that day (he hadn't mentioned it to you). Your teenage daughter might be telling me about playing games on her phone and how she doesn't think her dad realizes how much she is doing it.
I have the most amazing discussions with children. And though I bet this will sound strange to you: some of my favorite and most memorable conversations have been with five-month-old babies. How is this possible, you might wonder. They can't even talk. Well, you can find out. Just try this at home with your baby:
Hold her about a foot away from your face, cradled in your arms. Look deeply and fully into her eyes. Smile deeply. And then say something along these lines, moving your mouth slowly: "Ooooh. Aaaaah. Maaaaa. Paaaaa." And then she will (sometimes) answer back, eyes never leaving yours....and you have a conversation. Your very first game of verbal Ping Pong.
I suppose I could try to explain the feeling of pure happiness I experience at moments like this. I don't think, though, that written language is good for stuff like that. It's better for recording knowledge needed to destroy more and more of our planet to make humans comfortable.
And with my granddaughter, oh, don't even get me started. Sometimes I will be talking to her, our heads close together, and realize her face has gone into this kind of open, relaxed state in which she is rapt, almost in a trance, absorbing everything I am saying, the sound of my voice, my mouth moving. Just taking it all in. You'd better believe that I put some thought into what I am saying to this incredibly smart and observant "big girl." (who turns three in three weeks)
Here's another "try this at home" with your three year old (or two or four): talk about shadows. Where do they come from/ How do they relate to your body? Why does their shape change as the day goes on? Loey and I have had fun with shadows several times.
I have always been this way. I have no idea why. I have this funny feeling that it might actually be at least in some small way the result of a genetic mutation. I have a nephew who resembles me in several different, seemingly unconnected ways, including this love of and deep affinity for children.
I don't think that many people understand what I am doing when I spend time with children. I am called a "doting" grandfather. It offends me, though I certainly understand that no offense was intended.
I don't dote on Loey. Instead, I seek out every opportunity to share my wonder of our world and life with her, and to help her understand and live in the world as effectively as possible. What this has meant lately is that I talk with her a lot about trees, how much I love them, how amazing they are.
One day at the park, as we walked past the entrance to the playground, I noticed a very small oak sapling - in essence, a baby oak tree.
When we got inside the park, there was a mature oak towering over our stroller. I asked Loey if she wanted to see a baby tree. She said yes, so I picked her up to get close to the mature oak's leaf. I showed her the shape of the leaf, and the big tree to which it was attached.
Then I took her outside and we looked at the sapling. I showed her how the leaves on this tiny baby tree were the same shape and size as those on the big tree. That's how we knew it was a baby of that big tree. And it certainly was interesting that the leaves would be the same size on the tiny sapling. It held her attention throughout. That was deeply satisfying.
Mostly what I do is look children directly in the eyes, give them my full attention, smile with great joy at seeing them. Babies are deeply hard-wired to read faces. They can see in the wrinkles around my widened eyes and the smile that is stretching across my face that I love them, accept them fully. And with that more or less physical connection established, they seem to relax, melt, soften with trust. They know they can trust me, and they are absolutely correct.
In that moment, I would do anything for them.
This wisdom (that's how I see it) to accept the primacy of our young, my willingness to appear to adults as absolutely foolish, but to a child appear as a bright light, making them glow right back at me:
That is what I like best about me.
I’m on record as noting and agreeing with an industry near-consensus that Spark, rather than Tez, will be the replacement for Hadoop MapReduce. I presumed that Hortonworks, which is pushing Tez, disagreed. But Shaun Connolly of Hortonworks suggested a more nuanced view. Specifically, Shaun tweeted thoughts including:
Tez vs Spark = Apples vs Oranges.
Spark is general-purpose engine with elegant APIs for app devs creating modern data-driven apps, analytics, and ML algos.
Tez is a framework for expressing purpose-built YARN-based DAGs; its APIs are for ISVs & engine/tool builders who embed it
[For example], Hive embeds Tez to convert its SQL needs into purpose-built DAGs expressed optimally and leveraging YARN
That said, I haven’t yet had a chance to understand what advantages Tez might have over Spark in the use cases that Shaun relegates it to.
- The Twitter discussion with Shaun was a spin-out from my research around streaming for Hadoop.
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
UnhideWhenUsed="false" QFormat="true" Name="Title"/>
UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/>
UnhideWhenUsed="false" QFormat="true" Name="Strong"/>
UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/>
UnhideWhenUsed="false" Name="Table Grid"/>
UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/>
UnhideWhenUsed="false" Name="Light Shading"/>
UnhideWhenUsed="false" Name="Light List"/>
UnhideWhenUsed="false" Name="Light Grid"/>
UnhideWhenUsed="false" Name="Medium Shading 1"/>
UnhideWhenUsed="false" Name="Medium Shading 2"/>
UnhideWhenUsed="false" Name="Medium List 1"/>
UnhideWhenUsed="false" Name="Medium List 2"/>
UnhideWhenUsed="false" Name="Medium Grid 1"/>
UnhideWhenUsed="false" Name="Medium Grid 2"/>
UnhideWhenUsed="false" Name="Medium Grid 3"/>
UnhideWhenUsed="false" Name="Dark List"/>
UnhideWhenUsed="false" Name="Colorful Shading"/>
UnhideWhenUsed="false" Name="Colorful List"/>
UnhideWhenUsed="false" Name="Colorful Grid"/>
UnhideWhenUsed="false" Name="Light Shading Accent 1"/>
UnhideWhenUsed="false" Name="Light List Accent 1"/>
UnhideWhenUsed="false" Name="Light Grid Accent 1"/>
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/>
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/>
UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/>
UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/>
UnhideWhenUsed="false" QFormat="true" Name="Quote"/>
UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/>
UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/>
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/>
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/>
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/>
UnhideWhenUsed="false" Name="Dark List Accent 1"/>
UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/>
UnhideWhenUsed="false" Name="Colorful List Accent 1"/>
UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/>
UnhideWhenUsed="false" Name="Light Shading Accent 2"/>
UnhideWhenUsed="false" Name="Light List Accent 2"/>
UnhideWhenUsed="false" Name="Light Grid Accent 2"/>
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/>
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/>
UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/>
UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/>
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/>
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/>
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/>
UnhideWhenUsed="false" Name="Dark List Accent 2"/>
UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/>
UnhideWhenUsed="false" Name="Colorful List Accent 2"/>
UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/>
UnhideWhenUsed="false" Name="Light Shading Accent 3"/>
UnhideWhenUsed="false" Name="Light List Accent 3"/>
UnhideWhenUsed="false" Name="Light Grid Accent 3"/>
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/>
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/>
UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/>
UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/>
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/>
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/>
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/>
UnhideWhenUsed="false" Name="Dark List Accent 3"/>
UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/>
UnhideWhenUsed="false" Name="Colorful List Accent 3"/>
UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/>
UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
UnhideWhenUsed="false" Name="Light List Accent 4"/>
UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
UnhideWhenUsed="false" Name="Dark List Accent 4"/>
UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
UnhideWhenUsed="false" Name="Light List Accent 5"/>
UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
UnhideWhenUsed="false" Name="Dark List Accent 5"/>
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
UnhideWhenUsed="false" Name="Light List Accent 6"/>
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
/* Style Definitions */
mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
mso-fareast-font-family:"Times New Roman";
Setting up passwordless ssh root connections using dcli is fast and simple, and it will make it easy later to execute commands on all servers using this utility.
In order to do that you should have either:
DNS resolution to all Database and Storage nodes OR have them registered in /etc/hosts
1) Create a parameter file that contains all the server names you want to reach via dcli. Typically we have a cell_group for the storage cells, a dbs_group for the database servers, and an all_group for both of them.
The parameter file should contain only the server names, in short format; for instance, all_group on an Exadata quarter rack will have one entry per database node and storage cell.
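A hypothetical example for a quarter rack, which has two database nodes and three storage cells (substitute your own short hostnames):

exadb01
exadb02
exacel01
exacel02
exacel03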
2) As root user create ssh equivalence:
ssh-keygen -t rsa
3) Distribute the key to all servers
dcli -g ./all_group -l root -k -s '-o StrictHostKeyChecking=no'
4) Test the equivalence by running a simple command on all nodes:
dcli -g all_group -l root hostname
The genesis of this post is that:
- Hortonworks is trying to revitalize the Apache Storm project, after Storm lost momentum; indeed, Hortonworks is referring to Storm as a component of Hadoop.
- Cloudera is talking up what I would call its human real-time strategy, which includes but is not limited to Flume, Kafka, and Spark Streaming. Cloudera also sees a few use cases for Storm.
- This all fits with my view that the Current Hot Subject is human real-time data freshness — for analytics, of course, since we’ve always had low latencies in short-request processing.
- This also all fits with the importance I place on log analysis.
- Cloudera reached out to talk to me about all this.
Of course, we should hardly assume that what the Hadoop distro vendors favor will be the be-all and end-all of streaming. But they are likely to at least be influential players in the area.
In the parts of the problem that Cloudera emphasizes, the main tasks that need to be addressed are:
- Getting data into the plumbing from whatever systems it’s being generated in. This is the province of Flume, one of Cloudera’s earliest projects. I’d add that this is also one of the core competencies of Splunk.
- Getting data where it needs to go. Flume can do this. Kafka, a publish/subscribe messaging system, can do it in a more general way, because streams are sent to a Kafka broker, which then re-streams them to their ultimate destination.
- Processing data in flight. Storm can do this. Spark Streaming can do it more easily. Spark Streaming is or soon will be a part of every serious Hadoop distribution. Flume can do some lightweight processing as well.
- Serving up data for further query. Cloudera would like you to do this via HBase or Impala. But Oracle is a fine choice too, and indeed a popular choice among Cloudera customers.
I guess there’s also a step of receiving data out of the plumbing system. Cloudera and I glossed over that aspect when we talked, but I’ll say:
- Spark commonly lives over HDFS (Hadoop Distributed File System).
- Flume feeds HDFS. Flume was also hacked years ago — rah-rah open source! — to feed Kafka instead, and also to be fed by it.
Cloudera has not yet decided whether to make Kafka part of CDH (which stands for Cloudera Distribution yada yada Hadoop). Considerations in that probably include:
- Kafka has impressive adoption among high-profile internet companies, but not so much among conventional enterprises.
- Surely not coincidentally, Kafka is missing features in areas such as security (e.g. it lacks Kerberos integration).
- Kafka lacks cool capabilities to let you configure rather than code, although Cloudera thinks that in some cases you can work around this problem by marrying Kafka and Flume.
I still find it bizarre that a messaging system would be named after an author famous for writing about depressingly inescapable situations. Also, I wish that:
- Kafka had something to do with transformations.
- The name Kafka had been used by a commercial software company, which could offer product trials.
Highlights from the Storm vs. Spark Streaming vs. Samza part of my discussion with Cloudera include:
- Storm has a companion project Trident that makes Storm somewhat easier to program and/or configure. But Trident only has some of the usability advantages of Spark Streaming.
- Cloudera sees no advantages to Samza, a Kafka companion project, when compared with whichever of Spark Streaming or Storm + Trident is better suited to a particular use case.
- Cloudera likes the rich set of primitives that Spark Streaming inherits from Spark. Cloudera also notes that, if you learn to program over Spark for any reason, then you will in particular have learned how to program over Spark Streaming.
- Spark Streaming lets you join Spark Streaming data to other data that Spark can get access to. I agree with Cloudera that this is an important advantage.
- Cloudera sees Storm’s main advantages as being in latency. If you need 10-200 millisecond latency, Storm can give you that today while Spark Streaming can’t. However, Cloudera notes that to write efficiently to your persistent store — which Cloudera fondly hopes but does not insist will be HBase or Impala — you may need to micro-batch your writes anyway.
Also, Spark Streaming has a major advantage over bare Storm in whether you have to manually configure your topology, but I wasn’t clear as to how far Trident closes that particular gap.
Cloudera and I didn’t particularly talk about data-consuming technologies such as BI, predictive analytics, or analytic applications, but we did review use cases a bit. Nothing too surprising jumped out. Indeed, the discussion reminded me of a 2007 list I did of applications — other than extreme low-latency ones — for CEP (Complex Event Processing).
- Top-of-mind were things that fit into one or more of the buckets “internet”, “retail”, “recommendation/personalization”, “security” or “anti-fraud”.
- Transportation/logistics got mentioned, to which I replied that the CEP vendors had all seemed to have one trucking/logistics client each.
- At least in theory, there are potentially huge future applications in health care.
In general, candidate application areas for streaming-to-Hadoop match those that involve large volumes of machine-generated data.
Edit: Shortly after I posted this, Storm creator Nathan Marz put up a detailed and optimistic post about the history and state of Storm.
Complete information about the security fix availability should be reviewed in the following MOS document before applying the fix:
Responses to common Exadata security scan findings (Doc ID 1405320.1)
The security fix is available for download from:
The summary installation instructions are as follows:
1) Download getPackage/bash-3.2-33.el5_11.4.x86_64.rpm
2) Copy bash-3.2-33.el5_11.4.x86_64.rpm into /tmp at both database and storage nodes.
3) Remove the exadata-sun-computenode-exact rpm:
rpm -e exadata-sun-computenode-exact
4) On compute nodes install bash-3.2-33.el5_11.4.x86_64.rpm using this command:
rpm -Uvh /tmp/bash-3.2-33.el5_11.4.x86_64.rpm
5) On storage nodes install bash-3.2-33.el5_11.4.x86_64.rpm using this command:
rpm -Uvh --nodeps /tmp/bash-3.2-33.el5_11.4.x86_64.rpm
6) Remove /tmp/bash-3.2-33.el5_11.4.x86_64.rpm from all nodes
As a side effect of applying this fix, during future upgrades on the database nodes a warning will appear stating:
The "exact package" was not found and it will use minimal instead.
That's a normal and expected message and will not interfere with the upgrade.
After the very welcome tradition of breakfast at Lori's Diner, I had time to register and then get myself down to Moscone South for my first session of the day. I'd planned to listen to Paul Vallee's security talk because I'd been unable to register for Gwen Shapira's Analyzing Twitter data with Hadoop session but noticed spare seats as I passed the room, so switched. I love listening to Gwen talk on any subject because her enthusiasm is contagious. A few of the demos went a little wrong but I still got a nice overview of the various components of a Hadoop solution (which is an area I've never really looked at much) so the session flew by. Good stuff.
Next up was Yet-another-Oracle-ACE-Director Arup Nanda's presentation on Demystifying Cache Buffer Chains. The main reason I attended was to see how he presented the subject, and I wasn't expecting to learn too much, but it's an important subject, particularly now that I'm working more often with RAC and consolidated environments. CBC latch waits are on my radar once more!
Next up was 12 things about 12c, a session of 12 speakers given 5 minutes each to talk about, well, 12c stuff. Debra Lilley organised this and, despite all the concerns she'd expressed leading up to it, it went very smoothly, so hats off to Debra and to the speakers for behaving themselves with the timing! I was particularly concerned that we kicked off with Jonathan Lewis. The big problem with putting him on first: would he actually be able to stay within the time constraints, given that he gets excited and wants to talk about things in more depth? He did do it, but it was tough as he raced towards the finishing line.
The only thing that bugged me about this was that I hadn't realised it was two session slots (which makes complete sense if I'd performed some simple maths!), and it was very annoying when they kicked everyone out of the room at half-time before readmitting them. Yes, there are rules, but this was one of the more stupid ones. It annoyed me enough that I decided to skip the second half and attend the Enkitec panel session instead.
What an amazing line-up of Exadata geek talent they had on one stage for Expert Oracle Exadata: Then and Now ....
Including most of the authors of the original book as well as the authors who are writing the next edition which should be out before the end of the year.
From left-to-right : Karl Arao, Martin Bach, Kerry Osborne, Andy Colvin, Tanel Poder and Frits Hoogland.
They talked a little about the original version of the book (largely based on V2) and how far Exadata had come since then, but it was a pretty open session with questions shooting around all over the place and great fun. Nice way for me to wrap up my user group conference activities for the day and head out into the sun for Larry's Opening Keynote.
First we had the traditional vendor blah-blah-blah about stuff I couldn't care less about but, in shocking news, I actually enjoyed it! Maybe it's because it was Intel and so I'm probably more interested in what they're doing, but it was pretty ok stuff. All the keynotes are available online here.
Then it was LarryTime. He seemed on pretty good form by recent standards, although I can summarise it simply as Cloud, Cloud and more Cloud. There's no getting away from the fact that it's been quite the about-turn from him in his attitude towards the Cloud. I did appreciate the "we're only just getting started" message, and I suppose I've become inured to how accurate the actual facts are in his presentations and to the attacks on competitors, so I sort of enjoy his keynotes more than most.
At this stage, the jetlag was biting *hard* and I ended up missing yet another ACE dinner, but from all the reports I heard it was the best ever by some distance, so I was gutted to miss out on it. But when your body is saying sleep while you're walking, sometimes you have to listen to it! Then again, when it decides to wake you up again at 2:30, perhaps you should tell it to go and take a running jump!
Oracle's OpenWorld has ended. It was the first time I attended this great event, and it really is a "great" event:
- 60000 attendees from 145 countries
- 500 partners or customers in the exhibit hall
- 400 demos in the DEMOgrounds
- 2500 sessions