Skip navigation.

Other

Webinar: 21st Century Education Goes Digital with Oracle WebCenter

Oracle Corporation Banner 21st Century Education Goes Digital with Oracle WebCenter

Learn how The Digital Campus with WebCenter can address top-of-mind issues for creating exceptional digital learning experiences, put content in context for the user and optimize business processes

The global education market is under-going a fundamental transformation — from the printed textbook and physical classroom to newer digital, online and mobile experiences.  Today, students can learn anywhere, anytime, from anyone on any device, bridging administrative and academic systems into single universal view.

Oracle WebCenter is at the center of innovation and engagement for any digital enterprise looking to empower exceptional experiences for students, faculty, administrators and researchers. It powerfully connects people, processes, and information with the most complete portfolio of portal, content management, Web experience management and collaboration technologies to enable student success.

Join this special event featuring the University of Pretoria, Fishbowl Solutions and Oracle, whose experts will illustrate successful design patterns and solution delivery for:

  • Student Portals. Create rich, interactive student experiences
  • Digital Repository. Deliver advanced content capture, tagging and sharing while securing enterprise data
  • Admissions. Leverage image capture and business process design to enable improved self-service

Attendees will benefit from the use-case insights and strategies of a world re-knowned university as well as a pre-built solution approach from Oracle and solutions partner Fishbowl to enable a truly modern digital campus.

Audio information:

Dial in Numbers: U.S / Canada: 877-698-7943 (toll free)
International: 706-679-0060(chargeable)
Passcode:
solutions2 Red Button Top Register Now Red Button Bottom

Calendar Sep 11, 2014
10:00 AM PT |
01:00 PM ET

If you are an employee or official of a government organization, please click here for important ethics information regarding this event. Hardware and Software Engineered to Work Together Copyright © 2014, Oracle Corporation and/or its affiliates.
All rights reserved. Contact Us | Legal Notices and Terms of Use | Privacy Statement SEO100151617

Oracle Corporation – Worldwide Headquarters, 500 Oracle Parkway, OPL – E-mail Services, Redwood Shores, CA 94065, United States

Your privacy is important to us. You can login to your account to update your e-mail subscriptions or you can opt-out of all Oracle Marketing e-mails at any time.

Please note that opting-out of Marketing communications does not affect your receipt of important business communications related to your current relationship with Oracle such as Security Updates, Event Registration notices, Account Management and Support/Service communications.

The post Webinar: 21st Century Education Goes Digital with Oracle WebCenter appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

An idealized log management and analysis system — from whom?

DBMS2 - Sun, 2014-09-07 06:38

I’ve talked with many companies recently that believe they are:

  • Focused on building a great data management and analytic stack for log management …
  • … unlike all the other companies that might be saying the same thing :)
  • … and certainly unlike expensive, poorly-scalable Splunk …
  • … and also unlike less-focused vendors of analytic RDBMS (which are also expensive) and/or Hadoop distributions.

At best, I think such competitive claims are overwrought. Still, it’s a genuinely important subject and opportunity, so let’s consider what a great log management and analysis system might look like.

Much of this discussion could apply to machine-generated data in general. But right now I think more players are doing product management with an explicit conception either of log management or event-series analytics, so for this post I’ll share that focus too.

A short answer might be “Splunk, but with more analytic functionality and more scalable performance, at lower cost, plus numerous coupons for free pizza.” A more constructive and bottoms-up approach might start with: 

  • Agents for any kind of machine that admits streams of data.
  • Parsers that:
    • Immediately identify explicit name-value pairs in popular formats such as JSON or XML.
    • Also immediately extract a significant fraction of all implicit fields in text strings — timestamps for sure, but also a lot else. (Splunk is the current gold standard for such capabilities.)
    • Allow you to easily write rules for more such extractions.
  • Immediate indexing in line with everything the parsers do.
  • Easy import of log files, relational tables, and other relevant data structures.
  • Queries that can exploit all the indexes, at least up to the functionality level of SQL 2003 analytics (including windowing) and StreamSQL, of course with …
  • … blazing scalable performance.
  • Strong workload management and concurrent performance support. (Teradata is the gold standard for such capabilities in the analytic sphere.)
  • Various other mature-DBMS features, e.g. in backup, manageability, and uptime.

Further, there would be numerous styles of business intelligence interface, at least including:

  • Generic BI like we generally see for tabular data.
  • Constantly-changing displays of streaming data.
  • BI with an event-series orientation.
  • Strong alerting.
  • Mobile versions of everything.

And there would be good support for quick-turnaround, easily-operationalized predictive analytics, of the sort that’s fairly central to the visions for Kiji and Spark.

The data management part of that is particularly hard, in that:

  • Different architectures seem naturally well-suited for different parts of the problem.
  • Maturing a new data management product is always difficult, costly and slow.

My thoughts on strengths and weaknesses of some obvious log data management contenders start:

  • Oracle, IBM, and Microsoft have a lot of heft in all things database. But while each of those vendors has great resources and occasionally impressive pieces of new database engineering, none shows much evidence of framing, let alone solving, the problem in the right way(s).
  • SAP owns Sybase, HANA, several old CEP companies, and Business Objects. Add them to the Oracle/IBM/Microsoft list.
  • Teradata has a lot going for them. Their core analytic data management strengths are obvious. They’ve owned Aster for a while, and Aster innovated nPath quite some time ago. They recently added Hadapt, a leader in schema-on-need, as well as Revelytix, which has some good ideas in dataset management. Like most other DBMS vendors, however, Teradata doesn’t yet have much of a story for streaming data, and anyhow the most optimistic case for Teradata involves the difficult task of stitching together disparate data management technologies.
  • HP Vertica has a decent position as well. Probably more proven in general concurrent, scalable performance than others in their peer group (Netezza, Greenplum, et al.), Vertica also was relatively early in innovations relevant to log analysis, including a range of time series/event series features and its own schema-on-need effort. Vertica was also founded by people who were also streaming pioneers (there were heavily overlapping groups of academics behind StreamBase, Vertica and VoltDB), but it’s not clear how that background is reflected in present Vertica product.
  • Splunk, of course, has a complete stack. At the data acquisition and parsing layers, it’s second to none, and it has a considerable set of log-appropriate BI capabilities as well. And for data management it in effect is stitching together two different inverted-list data stores, plus Hadoop.
  • Hadoop distribution vendors such as Cloudera, MapR or Hortonworks offer typically bundle a range of relevant capabilities. HDFS (Hadoop Distributed File System) is the default place to dump entire logs. In most distros, Spark offers a new approach to streaming. Impala, Drill and so on offer query. Flume gathers the log data in the first place. But a lot of the cooler capabilities are immature or unproven, and in some cases that’s putting it mildly.

In the interest of length, I’ll omit discussion of smaller vendors, except to say that Platfora’s integrated-stack event series analytics story deserves attention, and I’m disappointed that I never hear about Sumo Logic. And I don’t know a lot about companies positioned as SIEM (Security Information and Event Management), especially now that SenSage has left the scene.

Categories: Other

Migrating Existing PeopleSoft Attachments into the Managed Attachments Solution

This post comes from Fishbowl’s Mark Heupel. Mark is an Oracle Webcenter consultant, and he has worked on a few different projects over the last year helping customers integrate WebCenter with Oracle E-Business Suite and PeopleSoft. One of WebCenter’s strengths is it provides these integrations out-of-the-box, including a document imaging integration to automate invoice processing with WebCenter’s capture, forms recognition and imaging capabilities, as well as workflows leveraging Oracle Business Process Management. Mark discusses WebCenter’s integration with PeopleSoft and its managed attachments solution below.

Application Integration

Oracle’s Managed Attachments solution enables business users in PeopleSoft to attach, scan, and retrieve document attachments stored in an Oracle WebCenter Content Server repository.

One of the issues that our clients face when moving to Oracle’s Managed Attachments solution is determining what to do with the attachments that already exist in PeopleSoft. We at Fishbowl have come up with a method to migrate these attachments into WebCenter Content in bulk while still maintaining the attachments’ context within PeopleSoft.

A high-level view of the solution is as follows. Queries are written on the PeopleSoft side to export each of the attachments, as well as a file containing each attachment’s metadata and PeopleSoft contextual information, to a network share. This is a task done by a PeopleSoft administrator. We then use our Enterprise Batchloader product to bulk load these files into WebCenter Content. We’ve written a customization that overrides the set of services that qualify for Managed Attachments to include our Enterprise Batchloader service. Since the context of the attachments is included in the metadata file, the Enterprise Batchloader check-ins work in the same way that a normal check-in from Managed Attachments would and the attachments retain their PeopleSoft context. Let’s get into the details of how this works.

Managed Attachments Overview

In order to understand the migration strategy, we first need to understand how Managed Attachments works under the covers. The important piece to know for this migration is that the table that stores the Managed Attachment object information on the WebCenter side is the AFObjects table. This table stores the PeopleSoft context information as well as the dDocName of each of the attachments currently being stored in WebCenter. Here is an example of what the AFObjects table looks like:

AFObjects Table

Each row in this table represents one PeopleSoft attachment being managed in WebCenter Content. The dAFApplication, dAFBusinessObjectType, and dAFBusinessObject fields make up the context for where the attachment is located in PeopleSoft. The dAFApplication field represents the application, the dAFObjectType field represents the page, and the dAFBusinessObject field is a pipe delimited list of the primary key values from the page where the attachment is located in PeopleSoft. The dDocName field is simply the dDocName of the content item in WebCenter.

When a user clicks the Managed Attachments link on the PeopleSoft screen a request is made over to WebCenter that contains the contextual page information from PeopleSoft (dAFApplication, dAFBusinessObjectType, and dAFBusinessObject). Using this contextual information, a query is then made against the AFObjects table to find the content IDs of the attachments that should be returned back to the user. A similar request is made when a user checks in a document through the Managed Attachments screen in PeopleSoft. The PeopleSoft context information is sent to WebCenter, the document is checked in, and then a row is inserted into the AFObjects table that contains the PeopleSoft contextual information as well as the dDocName of the newly checked-in document.

Loading Content into WebCenter

In order to be able to successfully load a large number of content items into WebCenter, while still maintaining the correct PeopleSoft context, we had to write a customization to hook into the existing Managed Attachments check-in functionality. The AppAdapterCore component, one of the two components installed on WebCenter for Managed Attachments, contains the core Managed Attachments code. This component contains a list of services such as CHECKIN_NEW that, when called with the PeopleSoft contextual information in the binder (dAFApplication, dAFObjectType, and dAFObject), executes the query that inserts a row into the AFObjects table. The customization that we wrote overrides the list of services specified in the AppAdapterCore component to include our Enterprise Batchloader check-in services. By doing so, we’re able to hook into the same insert query that Managed Attachments already uses, assuming we have placed the correct PeopleSoft context information in the binder.

Here is an example of what a standard Enterprise Batchloader blf (batch load file) would look like:

Batch Load File
As you can see, the file simply contains the action to take (insert), the location of the primary file, and the required metadata fields for WebCenter. In order to assign the correct PeopleSoft context we simply need to specify the dAFApplication, dAFObjectType, and dAFObject fields in the blf file:

Batch Load File 2

This effectively places each of those fields into the binder in WebCenter. When Enterprise Batchloader is run and performs its check-ins into WebCenter, the Managed Attachments functionality gets called and a row is inserted into the AFObjects table for each attachment that specifies the PeopleSoft context information. As long as the correct PeopleSoft contextual information is placed into the Enterprise Batchloader blf file, we’re able to bulk load as many attachments as needed into WebCenter while still retaining the correct PeopleSoft context information for use with the Managed Attachments solution.

I hope this provides you with an example of how your existing PeopleSoft Managed Attachments content could be migrated to WebCenter. After all, getting this content into WebCenter has many additional benefits, such as version control, renditions, retention management and the ability to surface this content to WebCenter-based mobile apps and portals. If you have questions or would like to engage with Fishbowl on such projects, please email info@fishbowlsolutions.com.

 

The post Migrating Existing PeopleSoft Attachments into the Managed Attachments Solution appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

Notes from a visit to Teradata

DBMS2 - Sun, 2014-08-31 03:17

I spent a day with Teradata in Rancho Bernardo last week. Most of what we discussed is confidential, but I think the non-confidential parts and my general impressions add up to enough for a post.

First, let’s catch up with some personnel gossip. So far as I can tell:

  • Scott Gnau runs most of Teradata’s development, product management, and product marketing, the big exception being that …
  • … Darryl McDonald run the apps part (Aprimo and so on), and no longer is head of marketing.
  • Oliver Ratzesberger runs Teradata’s software development.
  • Jeff Carter has returned to his roots and runs the hardware part, in place of Carson Schmidt.
  • Aster founders Mayank Bawa and Tasso Argyros have left Teradata (perhaps some earn-out period ended).
  • Carson is temporarily running Aster development (in place of Mayank), and has some sort of evangelism role waiting after that.
  • With the acquisition of Hadapt, Teradata gets some attention from Dan Abadi. Also, they’re retaining Justin Borgman.

The biggest change in my general impressions about Teradata is that they’re having smart thoughts about the cloud. At least, Oliver is. All details are confidential, and I wouldn’t necessarily expect them to become clear even in October (which once again is the month for Teradata’s user conference). My main concern about all that is whether Teradata’s engineering team can successfully execute on Oliver’s directives. I’m optimistic, but I don’t have a lot of detail to support my good feelings.

In some quick-and-dirty positioning and sales qualification notes, which crystallize what we already knew before:

  • The Teradata 1xxx series is focused on cost-per-bit.
  • The Teradata 2xxx series is focused on cost-per-query. It is commonly Teradata’s “lead” product, at least for new customers.
  • The Teradata 6xxx series is supposed to be above to do “everything”.
  • The Teradata Aster “Discovery Analytics” platform is sold mainly to customers who have a specific high-value problem to solve. (Randy Lea gave me a nice round dollar number, but I won’t share it.) I like that approach, as it obviates much of the concern about “Wait — is this strategic for us long-term, given that we also have both Teradata database and Hadoop clusters?”

Also:

  • 1xxx and 2xxx systems are meant to be I/O-constrained. 6xxx systems are meant to be constrained mainly by CPU, but every system will be I/O-constrained at some point.
  • There is at least one example of a Very Well Known organization buying Teradata’s Hadoop-only appliance despite not otherwise being a Hadoop customer. Teradata concedes, however, that this is not a common occurrence.
  • Customers are increasingly using co-location rather than their own data centers. Many colo organizations charge more or less strictly by floor space. Hence, there’s a push for maximum processing density per rack, power density and weight be damned.

Speaking of not being CPU-constrained — I heard 7-10% as an estimate for typical Hadoop utilization, and also 10-15%. While I didn’t ask, I presume these figures assume traditional MapReduce types of Hadoop workloads. I’m not sure why these figures are yet lower than eBay’s long-ago estimates of Hadoop “parallel efficiency”.

Like Carson used to do, Jeff shared a variety of hardware and networking tidbits with me. In particular:

  • Jeff is confident in Moore’s Law continuing for at least 5 more years. (I think that’s a near-consensus; the 2020s, however, are another matter.)
  • Teradata still uses SAS rather than SATA for all disk (spinning or solid-state) controllers. They’re now seeing 6-700 MB/sec/device on SSDs (Solid State Disk), up from 3-400.
  • SSD prices are down 60% over the past 6 months, vs. much slower declines previously.
  • Formerly a SanDisk/Pliant partisan, Teradata now thinks there are multiple vendors of good SSDs. (I’m not sure whether they’d be happy if I said which one they currently like best.)
  • Jeff foresees InfiniBand and Ethernet more or less merging. Right now Teradata is using a lot of 56 Gb/sec InfiniBand.

Since Oliver is now a Teradata mucky-muck, I asked about virtual data marts, an idea that he pretty much invented or at least popularized back in his eBay days. Comments included:

  • Teradata now calls them Data Labs.
  • Adoption is very high.
  • One major feature is “time boxing” — they expire after a period of time unless you renew them.
  • Analysis of virtual data mart usage is a good guide as to what you might want to add to your permanent data warehouse.

And I’ll stop here, although I hope that a couple more-focused posts will also eventually flow from the visit.

Categories: Other

Subscription Notifier Version 4.0 Enables WebCenter Users to Create Custom Content Email Notifications

Fishbowl Solutions’ Subscription Notifier has been used by many of our customers for years to manage business content stored in Oracle WebCenter Content. Subscription Notifier automatically sends email notifications based on scheduled queries. Fishbowl released version 4.0 of the product last week, and it includes several significant updates.

Now, users of Subscription Notifier can:

  • Attach native or web-viewable files to notification emails
  • Send individual notification emails for each content item
  • Configure hourly notification schedules
  • Run subscription side effects without sending emails

In addition to the latest updates, the product also offers a host of other features that enable WebCenter users to keep track of their high-value content.

You begin by naming the subscription and specifying whether emails should be sent for items matching the query. The scheduler lets you specify exactly when you want email notifications to go out (note the hourly option, new with version 4.0).

 

SubNoti general settings

The email settings specify who you want to send emails to and how they should appear to recipients. The new “Attach Content” feature gives you the option of sending web-viewable or native files, which provides a way for recipients who don’t use Oracle WebCenter to still see important files. Using the query builder is very simple and determines what content items are included in the subscription. Advanced users also have the option to write more complex queries using SQL.

SubNoti email

The Current Subscription Notifications page gives a summary of all subscriptions. In Version 4.0, simple changes such as enabling, disabling, or deleting subscriptions can be done here.

SubNoti current subscription notifications

Subscription Notifier is a very useful tool for any organization that needs to keep tabs on a large amount of business content. It is part of Fishbowl’s Administration Suite, which also includes Advanced User Security Mapping, Workflow Solution Set, and Enterprise BatchLoader. This set of products works together to simplify the most common administrative tasks in Oracle WebCenter Content.

To learn more about Subscription Notifier, visit Fishbowl’s website or read the press release announcing Version 4.0.

The post Subscription Notifier Version 4.0 Enables WebCenter Users to Create Custom Content Email Notifications appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

“Freeing business analysts from IT”

DBMS2 - Thu, 2014-08-14 06:21

Many of the companies I talk with boast of freeing business analysts from reliance on IT. This, to put it mildly, is not a unique value proposition. As I wrote in 2012, when I went on a history of analytics posting kick,

  • Most interesting analytic software has been adopted first and foremost at the departmental level.
  • People seem to be forgetting that fact.

In particular, I would argue that the following analytic technologies started and prospered largely through departmental adoption:

  • Fourth-generation languages (the analytically-focused ones, which in fact started out being consumed on a remote/time-sharing basis)
  • Electronic spreadsheets
  • 1990s-era business intelligence
  • Dashboards
  • Fancy-visualization business intelligence
  • Planning/budgeting
  • Predictive analytics
  • Text analytics
  • Rules engines

What brings me back to the topic is conversations I had this week with Paxata and Metanautix. The Paxata story starts:

  • Paxata is offering easy — and hopefully in the future comprehensive — “data preparation” tools …
  • … that are meant to be used by business analysts rather than ETL (Extract/Transform/Load) specialists or other IT professionals …
  • … where what Paxata means by “data preparation” is not specifically what a statistician would mean by the term, but rather generally refers to getting data ready for business intelligence or other analytics.

Metanautix seems to aspire to a more complete full-analytic-stack-without-IT kind of story, but clearly sees the data preparation part as a big part of its value.

If there’s anything new about such stories, it has to be on the transformation side; BI tools have been helping with data extraction since — well, since the dawn of BI. The data movement tool I used personally in the 1990s was Q+E, an early BI tool that also had some update capabilities.* And this use of BI has never stopped; for example, in 2011, Stephen Groschupf gave me the impression that a significant fraction of Datameer’s usage was for lightweight ETL.

*Q+E came from Pioneer Software, the original predecessor of Progress DataDirect, which first came to fame in association with Microsoft Excel and the invention of ODBC.

More generally, I’d say that there are several good ways for IT to give out data access, the two most obvious of which are:

  • “Semantic layers” in BI tools.
  • Data copies in departmental data marts.

If neither of those works for you, then most likely either:

  • Your problem isn’t technology.
  • Your problem isn’t data access.

And so we’ve circled back to what I wrote last month:

Data transformation is a better business to enter than data movement. Differentiated value in data movement comes in areas such as performance, reliability and maturity, where established players have major advantages. But differentiated value in data transformation can come from “intelligence”, which is easier to excel in as a start-up.

What remains to be seen is whether and to what extent any of these startups (the ones I mentioned above, or Trifacta, or Tamr, or whoever) can overcome what I wrote in the same post:

When I talk with data integration startups, I ask questions such as “What fraction of Informatica’s revenue are you shooting for?” and, as a follow-up, “Why would that be grounds for excitement?”

It will be interesting to see what happens.

Categories: Other

Actian Vector Hadoop Edition

DBMS2 - Thu, 2014-08-07 05:12

I have a small blacklist of companies I won’t talk with because of their particularly unethical past behavior. Actian is one such; they evidently made stuff up about me that Josh Berkus gullibly posted for them, and I don’t want to have conversations that could be dishonestly used against me.

That said, Peter Boncz isn’t exactly an Actian employee. Rather, he’s the professor who supervised Marcin Zukowski’s PhD thesis that became Vectorwise, and I chatted with Peter by Skype while he was at home in Amsterdam. I believe his assurances that no Actian personnel sat in on the call. :)

In other news, Peter is currently working on and optimistic about HyPer. But we literally spent less than a minute talking about that

Before I get to the substance, there’s been a lot of renaming at Actian. To quote Andrew Brust,

… the ParAccel, Pervasive and Vectorwise technologies are being unified under the Actian Analytics Platform brand. Specifically, the ParAccel technology … is being re-branded Actian Matrix; Pervasive’s technologies are rechristened Actian DataFlow and Actian DataConnect; and Vectorwise becomes Actian Vector.

and

Actian … is now “one company, with one voice and one platform” according to its John Santaferraro

The bolded part of the latter quote is untrue — at least in the ordinary sense of the word “one” — but the rest can presumably be taken as company gospel.

All this is by way of preamble to saying that Peter reached out to me about Actian’s new Vector Hadoop Edition when he blogged about it last June, and we finally talked this week. Highlights include: 

  • Vectorwise, while being proudly multi-core, was previously single-server. The new Vector Hadoop Edition is the first version with node parallelism.
  • Actian’s Vector Hadoop edition uses HDFS (Hadoop Distributed File System) and YARN to manage an Actian-proprietary file format. There is currently no interoperability whereby Hadoop jobs can read these files. However …
  • … Actian’s Vector Hadoop edition relies on Hadoop for cluster management, workload management and so on.
  • Peter thinks there are two paying customers, both too recent to be in production, who between then paid what I’d call a remarkable amount of money.*
  • Roadmap futures* include:
    • Being able to update and indeed trickle-update data. Peter is very proud of Vectorwise’s Positional Delta Tree updating.
    • Some elasticity they’re proud of, both in terms of nodes (generally limited to the replication factor of 3) and cores (not so limited).
    • Better interoperability with Hadoop.

Actian actually bundles Vector Hadoop Edition with DataFlow — the old Pervasive DataRush — into what it calls “Actian Analytics Platform – Hadoop SQL Edition”. DataFlow/DataRush has been working over Hadoop since the latter part of 2012, based on a visit with my then clients at Pervasive that December.

*Peter gave me details about revenue, pipeline, roadmap timetables etc. that I’m redacting in case Actian wouldn’t like them shared. I should say that the timetable for some — not all — of the roadmap items was quite near-term; however, pay no attention to any phrasing in Peter’s blog post that suggests the roadmap features are already shipping.

The Actian Vector Hadoop Edition optimizer and query-planning story goes something like this:

  • Vectorwise started with the open-source Ingres optimizer. After a query is optimized, it is rewritten to reflect Vectorwise’s columnar architecture. Peter notes that these rewrites rarely change operator ordering; they just add column-specific optimizations, whatever that means.
  • Now there are rewrites for parallelism as well.
  • These rewrites all seem to be heuristic/rule-based rather than cost-based.
  • Once Vectorwise became part of the Ingres company (later renamed to Actian), they had help from Ingres engineers, who helped them modify the base optimizer so that it wasn’t just the “stock” Ingres one.

As with most modern MPP (Massively Parallel Processing) analytic RDBMS, there doesn’t seem to be any concept of a head-node to which intermediate results need to be shipped. This is good, because head nodes in early MPP analytic RDBMS were dreadful bottlenecks.

Peter and I also talked a bit about SQL-oriented HDFS file formats, such as Parquet and ORC. He doesn’t like their lack of support for columnar compression. Further, in Parquet there seems to be a requirement to read the whole file, to an extent that interferes with Vectorwise’s form of data skipping, which it calls “min-max indexing”.

Frankly, I don’t think the architectural choice “uses Hadoop for workload management and administration” provides a lot of customer benefit in this case. Given that, I don’t know that the world needs another immature MPP analytic RDBMS. I also note with concern that Actian has two different MPP analytic RDBMS products. Still, Vectorwise and indeed all the stuff that comes out Martin Kersten and Peter’s group in Amsterdam has always been interesting technology. So the Actian Vector Hadoop Edition might be worth taking a look at before you redirect your attention to products with more convincing track records and futures.

Categories: Other

Big Data in the Cloud at Google I/O

William Vambenepe - Tue, 2014-07-01 00:55

Last week was a great party for the entire Google developer family, including Google Cloud Platform. And within the Cloud Platform, Big Data processing services. Which is where my focus has been in the almost two years I’ve been at Google.

It started with a bang, when our fearless leader Urs unveiled Cloud Dataflow in the keynote. Supported by a very timely demo (streaming analytics for a World Cup game) by my colleague Eric.

After the keynote, we had three live sessions:

In “Big Data, the Cloud Way“, I gave an overview of the main large-scale data processing services on Google Cloud:

  • Cloud Pub/Sub, a newly-announced service which provides reliable, many-to-many, asynchronous messaging,
  • the aforementioned Cloud Dataflow, to implement data processing pipelines which can run either in streaming or batch mode,
  • BigQuery, an existing service for large-scale SQL-based data processing at interactive speed, and
  • support for Hadoop and Spark, making it very easy to deploy and use them “the Cloud Way”, well integrated with other storage and processing services of Google Cloud Platform.

The next day, in “The Dawn of Fast Data“, Marwa and Reuven described Cloud Dataflow in a lot more details, including code samples. They showed how to easily construct a streaming pipeline which keeps a constantly-updated lookup table of most popular Twitter hashtags for a given prefix. They also explained how Cloud Dataflow builds on over a decade of data processing innovation at Google to optimize processing pipelines and free users from the burden of deploying, configuring, tuning and managing the needed infrastructure. Just like Cloud Pub/Sub and BigQuery do for event handling and SQL analytics, respectively.

Later that afternoon, Felipe and Jordan showed how to build predictive models in “Predicting the future with the Google Cloud Platform“.

We had also prepared some recorded short presentations. To learn more about how easy and efficient it is to use Hadoop and Spark on Google Cloud Platform, you should listen to Dennis in “Open Source Data Analytics“. To learn more about block storage options (including SSD, both local and remote), listen to Jay in “Optimizing disk I/O in the cloud“.

It was gratifying to see well-informed people recognize the importance of these announcement and partners understand how this will benefit their customers. As well as some good press coverage.

It’s liberating to now be able to talk freely about recent progress on our quest to equip Google Cloud users with easy to use data processing tools. Everyone can benefit from Google’s experience making developers productive while efficiently processing data at large scale. With great power comes great productivity.

Categories: Other