
Some stuff on my mind, September 28, 2014

DBMS2 - Sun, 2014-09-28 18:21

1. I wish I had some good, practical ideas about how to make a political difference around privacy and surveillance. Nothing else we discuss here is remotely as important. I presumably can contribute an opinion piece to, more or less, the technology publication(s) of my choice; that can have a small bit of impact. But I’d love to do better than that. Ideas, anybody?

2. A few thoughts on cloud, colocation, etc.:

  • The economies of scale of colocation-or-cloud over operating your own data center are compelling. Most of the reasons you outsource hardware manufacture to Asia also apply to outsourcing data center operation within the United States. (The one exception I can think of is supply chain.)
  • The arguments for cloud specifically over colocation are less persuasive. Colo providers can even match cloud deployments in rapid provisioning and elastic pricing, if they so choose.
  • Surely not coincidentally, I am told that Rackspace is deemphasizing cloud, reemphasizing colocation, and making a big deal out of Open Compute. In connection with that, Rackspace has pulled back from its leadership role in OpenStack.
  • I’m hearing much more mention of Amazon Redshift than I used to. It seems to have a lot of traction as a simple and low-cost option.
  • I’m hearing less about Elastic MapReduce than I used to, although I imagine usage is still large and growing.
  • In general, I get the impression that progress is being made in overcoming the inherent difficulties in cloud (and even colo) parallel analytic processing. But it all still seems pretty vague, except for the specific claims being made for traction of Redshift, EMR, and so on.
  • Teradata recently told me that in colocation pricing, it is common for floor space to be everything, with power not separately metered. But I don’t think that trend is a big deal, as it is not necessarily permanent.
  • Cloud hype is of course still with us.
  • Other than the above, I stand by my previous thoughts on appliances, clusters and clouds.

3. As for the analytic DBMS industry:

  • Concurrency is still a challenge. But otherwise …
  • … great SQL query performance isn’t something to get excited about any more, especially in immature systems.
  • Be careful about systems that have great performance when intermediate result sets fit into RAM, but not when they spill to disk. In particular, watch for this problem in the Hadoop/Spark world.
  • Vendors are getting better about ANSI SQL coverage (SQL 99 Analytics, windowing, etc. …)
  • “Runs on Hadoop” isn’t an exciting claim unless you can mix and match SQL and generic Hadoop processing in the same jobs against the same data, even though lesser forms of SQL/Hadoop integration might also help with some aspects of TCO (Total Cost of Ownership).
  • More generally, what’s needed is:
    • The ability to mix SQL and other kinds of analytic processing.
    • The ability to mix traditional tabular data, JSON, and log data.
    • The ability to mix data in place with data that’s trickling/streaming in.

4. Meanwhile, the analytic ease of use story remains popular, in business intelligence and predictive analytics/data science alike. Marketers typically oversimplify it to their own detriment, however, just as they do performance stories.

5. On the short-request side:

  • NoSQL is still going gangbusters.
  • NewSQL still isn’t, except that I haven’t talked with MemSQL for a while and they were doing well when I did.
  • Transparent sharding has stagnated as a business, good technology notwithstanding, and the vendors are pivoting.

6. Finally, one vendor note — Sharmila assures me by brief email that things are going gangbusters at ClearStory. This is unsurprising, as ClearStory exemplifies several trends I believe in, including robust analytic stacks, strong data navigation, Spark, and the incorporation of broad varieties of data.

And of course ClearStory also empowers business analysts to make do without IT involvement, like the other cool analytic kids do.

Categories: Other

Meet Fishbowl’s WebCenter Experts at OpenWorld

Oracle OpenWorld will be held from September 28-October 2 in San Francisco.

Fishbowl Solutions will once again be at Oracle OpenWorld this year to connect with fellow WebCenter users! The event is now only a few days away, and our team is really looking forward to discussing how our value-add solutions can help your organization.

Our booth in the exhibition hall will be located at 2036 Moscone South, and will feature demos of Mobile ECM, the Google Search Appliance, Portal Solution Accelerator, and SharePoint integration, as well as a free iPad giveaway. We will also have many representatives on hand to answer your WebCenter content, portal, or imaging questions. All exhibition halls will be open from 10:00 a.m. – 6:00 p.m. on Monday and Tuesday, and from 9:30 a.m. – 3:30 p.m. on Wednesday.

Other activities at this year’s event include:

  • Sunday, September 28
    A Successful Oracle WebCenter Upgrade: What You Need to Know
    12:00 PM-12:45 PM, Moscone South 305. This session’s speakers share facts, tips, and best practices from successful Oracle WebCenter 11g upgrades that you can apply to your own upgrade. The session includes a fact-sharing discussion on upgrades; use case stories from Oracle WebCenter customers; and a roundtable forum during which attendees will be able to ask questions specific to their Oracle WebCenter Content, Oracle WebCenter Portal, or Oracle WebCenter Imaging upgrade.
  • Wednesday, October 1
    Automate Financial Processes for PeopleSoft and Oracle E-Business Suite
    12:45 PM-1:30 PM, Moscone West 3018
  • Wednesday, October 1
    Oracle WebCenter for Education and Research
    2:00 PM-2:45 PM, Marriott Marquis Golden Gate C3. Digital, social, and mobile technologies are creating new and transformational education experiences to engage students, faculty, parents, and administrators in their collective pursuit of student success. This session features case studies from higher education and K–12 that illustrate the power of Oracle WebCenter in enabling twenty-first-century learning.
  • Monday, September 29
    Oracle WebCenter and Oracle BPM Customer Appreciation Reception
    6:30 PM-8:30 PM, Old Mint, Old Mint Plaza. Register for the reception here.

If you’d like to meet with any of Fishbowl’s representatives at the event, feel free to email info@fishbowlsolutions.com. To learn more about what we’ll be doing at OpenWorld this year, download our Focus On guide. See you in San Francisco!

The post Meet Fishbowl’s WebCenter Experts at OpenWorld appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

Data as an asset

DBMS2 - Sun, 2014-09-21 21:49

We all tend to assume that data is a great and glorious asset. How solid is this assumption?

  • Yes, data is one of the most proprietary assets an enterprise can have. Any of the Goldman Sachs big three* — people, capital, and reputation — is easier to lose or imitate than data.
  • In many cases, however, data’s value diminishes quickly.
  • Determining the value derived from owning, analyzing and using data is often tricky — but not always. Examples where data’s value is pretty clear start with:
    • Industries which long have had large data-gathering research budgets, in areas such as clinical trials or seismology.
    • Industries that can calculate the return on mass marketing programs, such as internet advertising or its snail-mail predecessors.

*”Our assets are our people, capital and reputation. If any of these is ever diminished, the last is the most difficult to restore.” I love that motto, even if Goldman Sachs itself eventually stopped living up to it. If nothing else, my own business depends primarily on my reputation and information.

This all raises the idea – if you think data is so valuable, maybe you should get more of it. Areas in which enterprises have made significant and/or successful investments in data acquisition include: 

  • Actual scientific, clinical, seismic, or engineering research.
  • Actual selling of (usually proprietary) data, with the straightforward economic proposition of “Get once, sell to multiple customers more cheaply than they could get it themselves.” Examples start:
    • This is the essence of the stock quote business. And Michael Bloomberg started building his vast fortune by adding additional data to what the then-incumbents could offer, for example by getting fixed-income prices from Cantor Fitzgerald.*
    • Multiple marketing-data businesses operate on this model.
    • Back when there was a small but healthy independent paper newsletter and directory business, its essence was data.
    • And now there are many online data selling efforts, in niches large and small.
  • Internet ad-targeting businesses. Making money from your great ad-targeting technology usually involves access to lots of user-impression and de-anonymization data as well.
  • Aggressive testing by internet businesses, of substantive offers and marketing-display choices alike. At the largest, such as eBay, you’ll rarely see a page that doesn’t have at least one experiment on it. Paper-based direct marketers take a similar approach. Call centers perhaps should follow suit more than they do.
  • Surveys, focus groups, etc. These are commonly expensive and unreliable (and the cheap internet ones commonly irritate people who do business with you). But sometimes they are, or seem to be, the only kind of information available.
  • Free-text data. On the whole I’ve been disappointed by the progress in text analytics. Still — and this overlaps with some previous points — there’s a lot of information in text or narrative form out there for the taking.
    • Internally you might have customer emails, call center notes, warranty reports and a lot more.
    • Externally there’s a lot of social media to mine.

*Sadly, Cantor Fitzgerald later became famous for being hit especially hard on 9/11/2001.

And then there’s my favorite example of all. Several decades ago, especially in the 1990s, supermarkets and mass merchants implemented point-of-sale (POS) systems to track every item sold, and then added loyalty cards through which they bribed their customers to associate their names with their purchases. Casinos followed suit. Airlines of course had loyalty/frequent-flyer programs too, which were heavily related to their marketing, although in that case I think loyalty/rewards were truly the core element, with targeted marketing just being an important secondary benefit. Overall, that’s an awesome example of aggressive data gathering. But here’s the thing, and it’s an example of why I’m confused about the value of data — I wouldn’t exactly say that grocers, mass merchants or airlines have been bastions of economic success. Good data will rarely save a bad business.


Categories: Other

Misconceptions about privacy and surveillance

DBMS2 - Mon, 2014-09-15 11:07

Everybody is confused about privacy and surveillance. So I’m renewing my efforts to consciousness-raise within the tech community. For if we don’t figure out and explain the issues clearly enough, there isn’t a snowball’s chance in Hades our lawmakers will get it right without us.

How bad is the confusion? Well, even Edward Snowden is getting it wrong. A Wired interview with Snowden says:

“If somebody’s really watching me, they’ve got a team of guys whose job is just to hack me,” he says. “I don’t think they’ve geolocated me, but they almost certainly monitor who I’m talking to online. Even if they don’t know what you’re saying, because it’s encrypted, they can still get a lot from who you’re talking to and when you’re talking to them.”

That is surely correct. But the same article also says:

“We have the means and we have the technology to end mass surveillance without any legislative action at all, without any policy changes.” The answer, he says, is robust encryption. “By basically adopting changes like making encryption a universal standard—where all communications are encrypted by default—we can end mass surveillance not just in the United States but around the world.”

That is false, for a myriad of reasons, and indeed is contradicted by the first excerpt I cited.

What privacy/surveillance commentators evidently keep forgetting is:

  • There are many kinds of privacy-destroying information. I think people frequently overlook just how many kinds there are.
  • Many kinds of organization capture that information, can share it with each other, and gain benefits from eroding or destroying privacy. Similarly, I think people overlook just how pervasive the incentive is to snoop.
  • Privacy is invaded through a variety of analytic techniques applied to that information.

So closing down a few vectors of privacy attack doesn’t solve the underlying problem at all.

Worst of all, commentators forget that the correct metric for danger is not just harmful information use, but chilling effects on the exercise of ordinary liberties. But in the interest of space, I won’t reiterate that argument in this post.

Perhaps I can refresh your memory why each of those bulleted claims is correct. Major categories of privacy-destroying information (raw or derived) include:

  • The actual content of your communications – phone calls, email, social media posts and more.
  • The metadata of your communications — who you communicate with, when, how long, etc.
  • What you read, watch, surf to or otherwise pay attention to.
  • Your purchases, sales and other transactions.
  • Video images, via stationary cameras, license plate readers in police cars, drones or just ordinary consumer photography.
  • Monitoring via the devices you carry, such as phones or medical monitors.
  • Your health and physical state, via those devices, but also inferred from, for example, your transactions or search engine entries.
  • Your state of mind, which can be inferred to various extents from almost any of the other information areas.
  • Your location and movements, ditto. Insurance companies also want to put monitors in cars to track your driving behavior in detail.

Of course, these categories overlap. For example, information about your movements can be derived not just from your mobile phone, but also from your transactions, from surveillance cameras, and from the health-monitoring devices that are likely to become much more pervasive in the future.

So who has reason to invade your privacy? Unfortunately, the answer boils down to “just about everybody”. In particular:

  • Any internet or telecom business would like to know, in great detail, what you are doing with their offerings, along with any other information that might influence what you’re apt to buy or do next.
  • Anybody who markets or sells to consumers wants to know similar things.
  • Similar things are true of anybody who worries about credit or insurance risk.
  • Anybody who worries about fraud wants to know who you’re connected to, and also wants to match you against any known patterns of fraud-related behavior.
  • Anybody who hires employees wants to know who might be likely to work hard, get sick or quit.
  • Similarly, they’d like to know who does or might engage in employee misconduct.
  • Medical researchers and caregivers have some of the most admirable reasons for wanting to violate privacy.

And that’s even without mentioning the most obvious suspects — law enforcement and national security agencies of many kinds, who can be presumed, in at least certain cases, to be able to get any information that’s available to any other organization.

Finally, my sense is:

  • People appreciate the potential of fancy-schmantzy language and image recognition.
  • The graph analysis done on telecom metadata is so simple that people generally “get” what’s going on. (A tiny sketch after this list shows just how simple.)
  • Despite all the “big data analytics” hype, commentators tend to forget just how powerful machine learning/predictive analytics privacy intrusions could be. Those psychographic clustering techniques devised to support advertising and personalization could be applied in much more sinister ways as well.
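To illustrate how simple that second bullet really is, here is a minimal sketch (my own illustration with made-up records, nothing drawn from a real system) of building a who-contacts-whom graph from call-metadata-style records and ranking a person's most frequent contacts:

// Minimal illustration (hypothetical data): a contact graph from call-detail-style records.
var calls = [
	{ from: "alice", to: "bob",   timestamp: "2014-09-01T10:00:00Z" },
	{ from: "alice", to: "bob",   timestamp: "2014-09-01T18:20:00Z" },
	{ from: "alice", to: "carol", timestamp: "2014-09-01T11:30:00Z" },
	{ from: "bob",   to: "alice", timestamp: "2014-09-02T09:15:00Z" }
];

// Adjacency map: caller -> { contact: number of calls }
function buildContactGraph(records) {
	var graph = {};
	records.forEach(function (r) {
		graph[r.from] = graph[r.from] || {};
		graph[r.from][r.to] = (graph[r.from][r.to] || 0) + 1;
	});
	return graph;
}

// Rank a person's contacts by call frequency.
function topContacts(graph, person) {
	var edges = graph[person] || {};
	return Object.keys(edges).sort(function (a, b) { return edges[b] - edges[a]; });
}

console.log(topContacts(buildContactGraph(calls), "alice")); // ["bob", "carol"]

That is the whole trick; the power comes from the scale of the data, not the sophistication of the analysis.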


Categories: Other

Webinar: 21st Century Education Goes Digital with Oracle WebCenter


Learn how The Digital Campus with WebCenter can address top-of-mind issues for creating exceptional digital learning experiences, put content in context for the user, and optimize business processes.

The global education market is undergoing a fundamental transformation — from the printed textbook and physical classroom to newer digital, online and mobile experiences. Today, students can learn anywhere, anytime, from anyone on any device, bridging administrative and academic systems into a single universal view.

Oracle WebCenter is at the center of innovation and engagement for any digital enterprise looking to empower exceptional experiences for students, faculty, administrators and researchers. It powerfully connects people, processes, and information with the most complete portfolio of portal, content management, Web experience management and collaboration technologies to enable student success.

Join this special event featuring the University of Pretoria, Fishbowl Solutions and Oracle, whose experts will illustrate successful design patterns and solution delivery for:

  • Student Portals. Create rich, interactive student experiences
  • Digital Repository. Deliver advanced content capture, tagging and sharing while securing enterprise data
  • Admissions. Leverage image capture and business process design to enable improved self-service

Attendees will benefit from the use-case insights and strategies of a world-renowned university as well as a pre-built solution approach from Oracle and solutions partner Fishbowl to enable a truly modern digital campus.

Audio information:

Dial-in numbers: U.S./Canada: 877-698-7943 (toll free)
International: 706-679-0060 (chargeable)
Passcode: solutions2

Register Now

Sep 11, 2014
10:00 AM PT | 01:00 PM ET


The post Webinar: 21st Century Education Goes Digital with Oracle WebCenter appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

An idealized log management and analysis system — from whom?

DBMS2 - Sun, 2014-09-07 06:38

I’ve talked with many companies recently that believe they are:

  • Focused on building a great data management and analytic stack for log management …
  • … unlike all the other companies that might be saying the same thing :)
  • … and certainly unlike expensive, poorly-scalable Splunk …
  • … and also unlike less-focused vendors of analytic RDBMS (which are also expensive) and/or Hadoop distributions.

At best, I think such competitive claims are overwrought. Still, it’s a genuinely important subject and opportunity, so let’s consider what a great log management and analysis system might look like.

Much of this discussion could apply to machine-generated data in general. But right now I think more players are doing product management with an explicit conception either of log management or event-series analytics, so for this post I’ll share that focus too.

A short answer might be “Splunk, but with more analytic functionality and more scalable performance, at lower cost, plus numerous coupons for free pizza.” A more constructive and bottoms-up approach might start with: 

  • Agents for any kind of machine that emits streams of data.
  • Parsers that:
    • Immediately identify explicit name-value pairs in popular formats such as JSON or XML.
    • Also immediately extract a significant fraction of all implicit fields in text strings — timestamps for sure, but also a lot else. (Splunk is the current gold standard for such capabilities; a minimal sketch of this kind of extraction follows this list.)
    • Allow you to easily write rules for more such extractions.
  • Immediate indexing in line with everything the parsers do.
  • Easy import of log files, relational tables, and other relevant data structures.
  • Queries that can exploit all the indexes, at least up to the functionality level of SQL 2003 analytics (including windowing) and StreamSQL, of course with …
  • … blazing scalable performance.
  • Strong workload management and concurrent performance support. (Teradata is the gold standard for such capabilities in the analytic sphere.)
  • Various other mature-DBMS features, e.g. in backup, manageability, and uptime.
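To make the parser bullets above concrete, here is a minimal sketch (my own illustration, not any vendor's code) of extracting explicit JSON name-value pairs, an implicit timestamp, and rule-driven fields from a raw log line:

// Illustrative log-line parser (not production code, not any vendor's implementation).
var ISO_TIMESTAMP = /\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\b/;

function parseLogLine(line, userRules) {
	var fields = {};

	// 1. Explicit name-value pairs: if the line contains a JSON payload, take it as-is.
	var jsonStart = line.indexOf("{");
	if (jsonStart >= 0) {
		try {
			var payload = JSON.parse(line.slice(jsonStart));
			Object.keys(payload).forEach(function (k) { fields[k] = payload[k]; });
		} catch (e) { /* not valid JSON; ignore and fall through */ }
	}

	// 2. Implicit fields: pull a timestamp out of the free text.
	var ts = line.match(ISO_TIMESTAMP);
	if (ts) { fields.timestamp = ts[0]; }

	// 3. User-defined extraction rules: { name, pattern with one capture group }.
	(userRules || []).forEach(function (rule) {
		var m = line.match(rule.pattern);
		if (m) { fields[rule.name] = m[1]; }
	});

	return fields;
}

// Hypothetical web-server-ish line with a trailing JSON payload:
var line = '2014-09-07 06:38:12 host=web01 GET /index {"user":"cm","status":200}';
console.log(parseLogLine(line, [{ name: "host", pattern: /host=(\S+)/ }]));
// -> { user: "cm", status: 200, timestamp: "2014-09-07 06:38:12", host: "web01" }

A real system would of course do this at ingest time, feed every extracted field to the indexer, and handle far messier formats; but the parsing layer itself is conceptually this straightforward.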

Further, there would be numerous styles of business intelligence interface, at least including:

  • Generic BI like we generally see for tabular data.
  • Constantly-changing displays of streaming data.
  • BI with an event-series orientation.
  • Strong alerting.
  • Mobile versions of everything.

And there would be good support for quick-turnaround, easily-operationalized predictive analytics, of the sort that’s fairly central to the visions for Kiji and Spark.

The data management part of that is particularly hard, in that:

  • Different architectures seem naturally well-suited for different parts of the problem.
  • Maturing a new data management product is always difficult, costly and slow.

My thoughts on strengths and weaknesses of some obvious log data management contenders start:

  • Oracle, IBM, and Microsoft have a lot of heft in all things database. But while each of those vendors has great resources and occasionally impressive pieces of new database engineering, none shows much evidence of framing, let alone solving, the problem in the right way(s).
  • SAP owns Sybase, HANA, several old CEP companies, and Business Objects. Add them to the Oracle/IBM/Microsoft list.
  • Teradata has a lot going for them. Their core analytic data management strengths are obvious. They’ve owned Aster for a while, and Aster innovated nPath quite some time ago. They recently added Hadapt, a leader in schema-on-need, as well as Revelytix, which has some good ideas in dataset management. Like most other DBMS vendors, however, Teradata doesn’t yet have much of a story for streaming data, and anyhow the most optimistic case for Teradata involves the difficult task of stitching together disparate data management technologies.
  • HP Vertica has a decent position as well. Probably more proven in general concurrent, scalable performance than others in their peer group (Netezza, Greenplum, et al.), Vertica also was relatively early in innovations relevant to log analysis, including a range of time series/event series features and its own schema-on-need effort. Vertica was also founded by people who were also streaming pioneers (there were heavily overlapping groups of academics behind StreamBase, Vertica and VoltDB), but it’s not clear how that background is reflected in present Vertica product.
  • Splunk, of course, has a complete stack. At the data acquisition and parsing layers, it’s second to none, and it has a considerable set of log-appropriate BI capabilities as well. And for data management it in effect is stitching together two different inverted-list data stores, plus Hadoop.
  • Hadoop distribution vendors such as Cloudera, MapR or Hortonworks typically bundle a range of relevant capabilities. HDFS (Hadoop Distributed File System) is the default place to dump entire logs. In most distros, Spark offers a new approach to streaming. Impala, Drill and so on offer query. Flume gathers the log data in the first place. But a lot of the cooler capabilities are immature or unproven, and in some cases that’s putting it mildly.

In the interest of length, I’ll omit discussion of smaller vendors, except to say that Platfora’s integrated-stack event series analytics story deserves attention, and I’m disappointed that I never hear about Sumo Logic. And I don’t know a lot about companies positioned as SIEM (Security Information and Event Management), especially now that SenSage has left the scene.

Categories: Other

Migrating Existing PeopleSoft Attachments into the Managed Attachments Solution

This post comes from Fishbowl’s Mark Heupel. Mark is an Oracle WebCenter consultant, and he has worked on a few different projects over the last year helping customers integrate WebCenter with Oracle E-Business Suite and PeopleSoft. One of WebCenter’s strengths is that it provides these integrations out-of-the-box, including a document imaging integration to automate invoice processing with WebCenter’s capture, forms recognition and imaging capabilities, as well as workflows leveraging Oracle Business Process Management. Mark discusses WebCenter’s integration with PeopleSoft and its Managed Attachments solution below.

Application Integration

Oracle’s Managed Attachments solution enables business users in PeopleSoft to attach, scan, and retrieve document attachments stored in an Oracle WebCenter Content Server repository.

One of the issues that our clients face when moving to Oracle’s Managed Attachments solution is determining what to do with the attachments that already exist in PeopleSoft. We at Fishbowl have come up with a method to migrate these attachments into WebCenter Content in bulk while still maintaining the attachments’ context within PeopleSoft.

A high-level view of the solution is as follows. Queries are written on the PeopleSoft side to export each of the attachments, as well as a file containing each attachment’s metadata and PeopleSoft contextual information, to a network share. This is a task done by a PeopleSoft administrator. We then use our Enterprise Batchloader product to bulk load these files into WebCenter Content. We’ve written a customization that overrides the set of services that qualify for Managed Attachments to include our Enterprise Batchloader service. Since the context of the attachments is included in the metadata file, the Enterprise Batchloader check-ins work in the same way that a normal check-in from Managed Attachments would and the attachments retain their PeopleSoft context. Let’s get into the details of how this works.

Managed Attachments Overview

In order to understand the migration strategy, we first need to understand how Managed Attachments works under the covers. The important piece to know for this migration is that the table that stores the Managed Attachment object information on the WebCenter side is the AFObjects table. This table stores the PeopleSoft context information as well as the dDocName of each of the attachments currently being stored in WebCenter. Here is an example of what the AFObjects table looks like:

AFObjects Table

Each row in this table represents one PeopleSoft attachment being managed in WebCenter Content. The dAFApplication, dAFBusinessObjectType, and dAFBusinessObject fields make up the context for where the attachment is located in PeopleSoft. The dAFApplication field represents the application, the dAFBusinessObjectType field represents the page, and the dAFBusinessObject field is a pipe-delimited list of the primary key values from the page where the attachment is located in PeopleSoft. The dDocName field is simply the dDocName of the content item in WebCenter.

When a user clicks the Managed Attachments link on the PeopleSoft screen a request is made over to WebCenter that contains the contextual page information from PeopleSoft (dAFApplication, dAFBusinessObjectType, and dAFBusinessObject). Using this contextual information, a query is then made against the AFObjects table to find the content IDs of the attachments that should be returned back to the user. A similar request is made when a user checks in a document through the Managed Attachments screen in PeopleSoft. The PeopleSoft context information is sent to WebCenter, the document is checked in, and then a row is inserted into the AFObjects table that contains the PeopleSoft contextual information as well as the dDocName of the newly checked-in document.
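As a conceptual illustration of that lookup (my sketch, not Oracle's or Fishbowl's actual code), the retrieval behind the Managed Attachments screen amounts to a filtered select on AFObjects. The table and column names come from the description above; the query-runner function and the sample context values are hypothetical placeholders:

// Conceptual sketch of the Managed Attachments lookup described above (illustrative only).
function getAttachmentDocNames(runQuery, context) {
	var sql =
		"SELECT dDocName FROM AFObjects " +
		"WHERE dAFApplication = ? " +
		"  AND dAFBusinessObjectType = ? " +
		"  AND dAFBusinessObject = ?";
	// dAFBusinessObject is the pipe-delimited list of the page's primary key values.
	return runQuery(sql, [
		context.dAFApplication,
		context.dAFBusinessObjectType,
		context.dAFBusinessObject
	]);
}

// Hypothetical context sent from PeopleSoft when a user opens the Managed Attachments screen:
// getAttachmentDocNames(runQuery, {
//	dAFApplication: "PSFT",
//	dAFBusinessObjectType: "VOUCHER_PAGE",
//	dAFBusinessObject: "BU001|VCHR123"
// });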

Loading Content into WebCenter

In order to be able to successfully load a large number of content items into WebCenter, while still maintaining the correct PeopleSoft context, we had to write a customization to hook into the existing Managed Attachments check-in functionality. The AppAdapterCore component, one of the two components installed on WebCenter for Managed Attachments, contains the core Managed Attachments code. This component contains a list of services such as CHECKIN_NEW that, when called with the PeopleSoft contextual information in the binder (dAFApplication, dAFObjectType, and dAFObject), executes the query that inserts a row into the AFObjects table. The customization that we wrote overrides the list of services specified in the AppAdapterCore component to include our Enterprise Batchloader check-in services. By doing so, we’re able to hook into the same insert query that Managed Attachments already uses, assuming we have placed the correct PeopleSoft context information in the binder.

Here is an example of what a standard Enterprise Batchloader blf (batch load file) would look like:

Batch Load File
As you can see, the file simply contains the action to take (insert), the location of the primary file, and the required metadata fields for WebCenter. In order to assign the correct PeopleSoft context we simply need to specify the dAFApplication, dAFObjectType, and dAFObject fields in the blf file:

Batch Load File 2

This effectively places each of those fields into the binder in WebCenter. When Enterprise Batchloader is run and performs its check-ins into WebCenter, the Managed Attachments functionality gets called and a row is inserted into the AFObjects table for each attachment that specifies the PeopleSoft context information. As long as the correct PeopleSoft contextual information is placed into the Enterprise Batchloader blf file, we’re able to bulk load as many attachments as needed into WebCenter while still retaining the correct PeopleSoft context information for use with the Managed Attachments solution.
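To show the idea in code, with the caveat that the real blf syntax is what is pictured in the screenshots above and the record layout below is purely illustrative: for each exported attachment the migration just has to emit the standard check-in metadata plus the three PeopleSoft context fields, with the business object built as a pipe-delimited list of primary key values:

// Illustrative only: the actual Enterprise Batchloader blf format may differ from this layout.
// Field names come from the post; the record structure and sample metadata are hypothetical.
function buildBlfRecord(attachment) {
	var record = {
		Action: "insert",
		primaryFile: attachment.filePath,            // exported file on the network share
		dDocTitle: attachment.title,                 // standard WebCenter check-in metadata...
		dSecurityGroup: attachment.securityGroup,    // ...required by the content server
		// PeopleSoft context fields picked up by the Managed Attachments check-in services:
		dAFApplication: attachment.application,      // the PeopleSoft application
		dAFObjectType: attachment.pageName,          // the PeopleSoft page
		dAFObject: attachment.primaryKeys.join("|")  // pipe-delimited primary key values
	};
	// Serialize as simple name=value lines (again, illustrative rather than the exact blf syntax).
	return Object.keys(record).map(function (k) { return k + "=" + record[k]; }).join("\n");
}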

I hope this provides you with an example of how your existing PeopleSoft Managed Attachments content could be migrated to WebCenter. After all, getting this content into WebCenter has many additional benefits, such as version control, renditions, retention management and the ability to surface this content to WebCenter-based mobile apps and portals. If you have questions or would like to engage with Fishbowl on such projects, please email info@fishbowlsolutions.com.

 

The post Migrating Existing PeopleSoft Attachments into the Managed Attachments Solution appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

Notes from a visit to Teradata

DBMS2 - Sun, 2014-08-31 03:17

I spent a day with Teradata in Rancho Bernardo last week. Most of what we discussed is confidential, but I think the non-confidential parts and my general impressions add up to enough for a post.

First, let’s catch up with some personnel gossip. So far as I can tell:

  • Scott Gnau runs most of Teradata’s development, product management, and product marketing, the big exception being that …
  • … Darryl McDonald runs the apps part (Aprimo and so on), and no longer is head of marketing.
  • Oliver Ratzesberger runs Teradata’s software development.
  • Jeff Carter has returned to his roots and runs the hardware part, in place of Carson Schmidt.
  • Aster founders Mayank Bawa and Tasso Argyros have left Teradata (perhaps some earn-out period ended).
  • Carson is temporarily running Aster development (in place of Mayank), and has some sort of evangelism role waiting after that.
  • With the acquisition of Hadapt, Teradata gets some attention from Dan Abadi. Also, they’re retaining Justin Borgman.

The biggest change in my general impressions about Teradata is that they’re having smart thoughts about the cloud. At least, Oliver is. All details are confidential, and I wouldn’t necessarily expect them to become clear even in October (which once again is the month for Teradata’s user conference). My main concern about all that is whether Teradata’s engineering team can successfully execute on Oliver’s directives. I’m optimistic, but I don’t have a lot of detail to support my good feelings.

In some quick-and-dirty positioning and sales qualification notes, which crystallize what we already knew before:

  • The Teradata 1xxx series is focused on cost-per-bit.
  • The Teradata 2xxx series is focused on cost-per-query. It is commonly Teradata’s “lead” product, at least for new customers.
  • The Teradata 6xxx series is supposed to be able to do “everything”.
  • The Teradata Aster “Discovery Analytics” platform is sold mainly to customers who have a specific high-value problem to solve. (Randy Lea gave me a nice round dollar number, but I won’t share it.) I like that approach, as it obviates much of the concern about “Wait — is this strategic for us long-term, given that we also have both Teradata database and Hadoop clusters?”

Also:

  • 1xxx and 2xxx systems are meant to be I/O-constrained. 6xxx systems are meant to be constrained mainly by CPU, but every system will be I/O-constrained at some point.
  • There is at least one example of a Very Well Known organization buying Teradata’s Hadoop-only appliance despite not otherwise being a Hadoop customer. Teradata concedes, however, that this is not a common occurrence.
  • Customers are increasingly using co-location rather than their own data centers. Many colo organizations charge more or less strictly by floor space. Hence, there’s a push for maximum processing density per rack, power density and weight be damned.

Speaking of not being CPU-constrained — I heard 7-10% as an estimate for typical Hadoop utilization, and also 10-15%. While I didn’t ask, I presume these figures assume traditional MapReduce types of Hadoop workloads. I’m not sure why these figures are yet lower than eBay’s long-ago estimates of Hadoop “parallel efficiency”.

Like Carson used to do, Jeff shared a variety of hardware and networking tidbits with me. In particular:

  • Jeff is confident in Moore’s Law continuing for at least 5 more years. (I think that’s a near-consensus; the 2020s, however, are another matter.)
  • Teradata still uses SAS rather than SATA for all disk (spinning or solid-state) controllers. They’re now seeing 600-700 MB/sec/device on SSDs (Solid State Disks), up from 300-400.
  • SSD prices are down 60% over the past 6 months, vs. much slower declines previously.
  • Formerly a SanDisk/Pliant partisan, Teradata now thinks there are multiple vendors of good SSDs. (I’m not sure whether they’d be happy if I said which one they currently like best.)
  • Jeff foresees InfiniBand and Ethernet more or less merging. Right now Teradata is using a lot of 56 Gb/sec InfiniBand.

Since Oliver is now a Teradata mucky-muck, I asked about virtual data marts, an idea that he pretty much invented or at least popularized back in his eBay days. Comments included:

  • Teradata now calls them Data Labs.
  • Adoption is very high.
  • One major feature is “time boxing” — they expire after a period of time unless you renew them.
  • Analysis of virtual data mart usage is a good guide as to what you might want to add to your permanent data warehouse.

And I’ll stop here, although I hope that a couple more-focused posts will also eventually flow from the visit.

Categories: Other

Subscription Notifier Version 4.0 Enables WebCenter Users to Create Custom Content Email Notifications

Fishbowl Solutions’ Subscription Notifier has been used by many of our customers for years to manage business content stored in Oracle WebCenter Content. Subscription Notifier automatically sends email notifications based on scheduled queries. Fishbowl released version 4.0 of the product last week, and it includes several significant updates.

Now, users of Subscription Notifier can:

  • Attach native or web-viewable files to notification emails
  • Send individual notification emails for each content item
  • Configure hourly notification schedules
  • Run subscription side effects without sending emails

In addition to the latest updates, the product also offers a host of other features that enable WebCenter users to keep track of their high-value content.

You begin by naming the subscription and specifying whether emails should be sent for items matching the query. The scheduler lets you specify exactly when you want email notifications to go out (note the hourly option, new with version 4.0).

 

SubNoti general settings

The email settings specify who you want to send emails to and how they should appear to recipients. The new “Attach Content” feature gives you the option of sending web-viewable or native files, which provides a way for recipients who don’t use Oracle WebCenter to still see important files. The query builder is simple to use, and it determines which content items are included in the subscription. Advanced users also have the option to write more complex queries using SQL.

SubNoti email

The Current Subscription Notifications page gives a summary of all subscriptions. In Version 4.0, simple changes such as enabling, disabling, or deleting subscriptions can be done here.

SubNoti current subscription notifications

Subscription Notifier is a very useful tool for any organization that needs to keep tabs on a large amount of business content. It is part of Fishbowl’s Administration Suite, which also includes Advanced User Security Mapping, Workflow Solution Set, and Enterprise BatchLoader. This set of products works together to simplify the most common administrative tasks in Oracle WebCenter Content.

To learn more about Subscription Notifier, visit Fishbowl’s website or read the press release announcing Version 4.0.

The post Subscription Notifier Version 4.0 Enables WebCenter Users to Create Custom Content Email Notifications appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

“Freeing business analysts from IT”

DBMS2 - Thu, 2014-08-14 06:21

Many of the companies I talk with boast of freeing business analysts from reliance on IT. This, to put it mildly, is not a unique value proposition. As I wrote in 2012, when I went on a history of analytics posting kick,

  • Most interesting analytic software has been adopted first and foremost at the departmental level.
  • People seem to be forgetting that fact.

In particular, I would argue that the following analytic technologies started and prospered largely through departmental adoption:

  • Fourth-generation languages (the analytically-focused ones, which in fact started out being consumed on a remote/time-sharing basis)
  • Electronic spreadsheets
  • 1990s-era business intelligence
  • Dashboards
  • Fancy-visualization business intelligence
  • Planning/budgeting
  • Predictive analytics
  • Text analytics
  • Rules engines

What brings me back to the topic is conversations I had this week with Paxata and Metanautix. The Paxata story starts:

  • Paxata is offering easy — and hopefully in the future comprehensive — “data preparation” tools …
  • … that are meant to be used by business analysts rather than ETL (Extract/Transform/Load) specialists or other IT professionals …
  • … where what Paxata means by “data preparation” is not specifically what a statistician would mean by the term, but rather generally refers to getting data ready for business intelligence or other analytics.

Metanautix seems to aspire to a more complete full-analytic-stack-without-IT kind of story, but clearly sees the data preparation part as a big part of its value.

If there’s anything new about such stories, it has to be on the transformation side; BI tools have been helping with data extraction since — well, since the dawn of BI. The data movement tool I used personally in the 1990s was Q+E, an early BI tool that also had some update capabilities.* And this use of BI has never stopped; for example, in 2011, Stephen Groschupf gave me the impression that a significant fraction of Datameer’s usage was for lightweight ETL.

*Q+E came from Pioneer Software, the original predecessor of Progress DataDirect, which first came to fame in association with Microsoft Excel and the invention of ODBC.

More generally, I’d say that there are several good ways for IT to give out data access, the two most obvious of which are:

  • “Semantic layers” in BI tools.
  • Data copies in departmental data marts.

If neither of those works for you, then most likely either:

  • Your problem isn’t technology.
  • Your problem isn’t data access.

And so we’ve circled back to what I wrote last month:

Data transformation is a better business to enter than data movement. Differentiated value in data movement comes in areas such as performance, reliability and maturity, where established players have major advantages. But differentiated value in data transformation can come from “intelligence”, which is easier to excel in as a start-up.

What remains to be seen is whether and to what extent any of these startups (the ones I mentioned above, or Trifacta, or Tamr, or whoever) can overcome what I wrote in the same post:

When I talk with data integration startups, I ask questions such as “What fraction of Informatica’s revenue are you shooting for?” and, as a follow-up, “Why would that be grounds for excitement?”

It will be interesting to see what happens.

Categories: Other

Actian Vector Hadoop Edition

DBMS2 - Thu, 2014-08-07 05:12

I have a small blacklist of companies I won’t talk with because of their particularly unethical past behavior. Actian is one such; they evidently made stuff up about me that Josh Berkus gullibly posted for them, and I don’t want to have conversations that could be dishonestly used against me.

That said, Peter Boncz isn’t exactly an Actian employee. Rather, he’s the professor who supervised Marcin Zukowski’s PhD thesis that became Vectorwise, and I chatted with Peter by Skype while he was at home in Amsterdam. I believe his assurances that no Actian personnel sat in on the call. :)

In other news, Peter is currently working on and optimistic about HyPer. But we literally spent less than a minute talking about that.

Before I get to the substance, there’s been a lot of renaming at Actian. To quote Andrew Brust,

… the ParAccel, Pervasive and Vectorwise technologies are being unified under the Actian Analytics Platform brand. Specifically, the ParAccel technology … is being re-branded Actian Matrix; Pervasive’s technologies are rechristened Actian DataFlow and Actian DataConnect; and Vectorwise becomes Actian Vector.

and

Actian … is now “one company, with one voice and one platform” according to its John Santaferraro

The bolded part of the latter quote is untrue — at least in the ordinary sense of the word “one” — but the rest can presumably be taken as company gospel.

All this is by way of preamble to saying that Peter reached out to me about Actian’s new Vector Hadoop Edition when he blogged about it last June, and we finally talked this week. Highlights include: 

  • Vectorwise, while being proudly multi-core, was previously single-server. The new Vector Hadoop Edition is the first version with node parallelism.
  • Actian’s Vector Hadoop edition uses HDFS (Hadoop Distributed File System) and YARN to manage an Actian-proprietary file format. There is currently no interoperability whereby Hadoop jobs can read these files. However …
  • … Actian’s Vector Hadoop edition relies on Hadoop for cluster management, workload management and so on.
  • Peter thinks there are two paying customers, both too recent to be in production, who between them paid what I’d call a remarkable amount of money.*
  • Roadmap futures* include:
    • Being able to update and indeed trickle-update data. Peter is very proud of Vectorwise’s Positional Delta Tree updating.
    • Some elasticity they’re proud of, both in terms of nodes (generally limited to the replication factor of 3) and cores (not so limited).
    • Better interoperability with Hadoop.

Actian actually bundles Vector Hadoop Edition with DataFlow — the old Pervasive DataRush — into what it calls “Actian Analytics Platform – Hadoop SQL Edition”. DataFlow/DataRush has been working over Hadoop since the latter part of 2012, based on a visit with my then clients at Pervasive that December.

*Peter gave me details about revenue, pipeline, roadmap timetables etc. that I’m redacting in case Actian wouldn’t like them shared. I should say that the timetable for some — not all — of the roadmap items was quite near-term; however, pay no attention to any phrasing in Peter’s blog post that suggests the roadmap features are already shipping.

The Actian Vector Hadoop Edition optimizer and query-planning story goes something like this:

  • Vectorwise started with the open-source Ingres optimizer. After a query is optimized, it is rewritten to reflect Vectorwise’s columnar architecture. Peter notes that these rewrites rarely change operator ordering; they just add column-specific optimizations, whatever that means.
  • Now there are rewrites for parallelism as well.
  • These rewrites all seem to be heuristic/rule-based rather than cost-based.
  • Once Vectorwise became part of the Ingres company (later renamed to Actian), they got help from Ingres engineers in modifying the base optimizer so that it wasn’t just the “stock” Ingres one.

As with most modern MPP (Massively Parallel Processing) analytic RDBMS, there doesn’t seem to be any concept of a head-node to which intermediate results need to be shipped. This is good, because head nodes in early MPP analytic RDBMS were dreadful bottlenecks.

Peter and I also talked a bit about SQL-oriented HDFS file formats, such as Parquet and ORC. He doesn’t like their lack of support for columnar compression. Further, in Parquet there seems to be a requirement to read the whole file, to an extent that interferes with Vectorwise’s form of data skipping, which it calls “min-max indexing”.
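For readers who haven't run into the term, min-max indexing (elsewhere called zone maps) is easy to illustrate. The sketch below is a generic conceptual example of the technique, not Vectorwise's implementation: keep the min and max of a column for each block of rows, and skip any block whose range cannot possibly satisfy the predicate.

// Generic illustration of min-max indexing / data skipping (conceptual, not Vectorwise code).
function buildMinMaxIndex(blocks) {
	return blocks.map(function (rows) {
		return { min: Math.min.apply(null, rows), max: Math.max.apply(null, rows), rows: rows };
	});
}

function scanRange(index, lo, hi) {
	var hits = [];
	index.forEach(function (block) {
		if (block.max < lo || block.min > hi) { return; } // skip the whole block unread
		block.rows.forEach(function (v) {
			if (v >= lo && v <= hi) { hits.push(v); }
		});
	});
	return hits;
}

// Three blocks of, say, a roughly time-ordered column:
var index = buildMinMaxIndex([[1, 5, 9], [10, 14, 19], [20, 25, 30]]);
console.log(scanRange(index, 12, 18)); // only the middle block is read -> [14]

The complaint above, as I understand it, is that a format which forces you to read the whole file defeats the point of maintaining such block-level statistics.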

Frankly, I don’t think the architectural choice “uses Hadoop for workload management and administration” provides a lot of customer benefit in this case. Given that, I don’t know that the world needs another immature MPP analytic RDBMS. I also note with concern that Actian has two different MPP analytic RDBMS products. Still, Vectorwise and indeed all the stuff that comes out of Martin Kersten and Peter’s group in Amsterdam has always been interesting technology. So the Actian Vector Hadoop Edition might be worth taking a look at before you redirect your attention to products with more convincing track records and futures.

Categories: Other

Javascript Driven ADF Taskflows for WebCenter Portal

This is a continuation from my previous post - Developing WebCenter Content Cross Platform iDoc Enabled Components for Mobile, ADF, Sharepoint, Liferay.

You can see a video of a JIVE Forums integration built as a JS taskflow vs. an ADF taskflow running in WebCenter Portal here -

Click here for hi-resolution

This post is aimed at web developers, designers, and marketing web teams who aren’t familiar with ADF but want to create reusable dynamic taskflows without needing to learn ADF or Java. The goal is to provide interactive dynamic regions using JavaScript, HTML and CSS, with custom frameworks like jQuery designed not to conflict with the ADF JS environment.

Read on for a step-by-step run-through of creating JS-driven taskflows -

    1. You will need to download JDeveloper – I’m using JDev 11.1.1.7.0 for WebCenter Portal 11g where I will deploy my custom taskflow driven entirely with Javascript.
    2. Run through the following Oracle guide to setup your project to extend Portal (11.1.1.8.3) - Developing Components for WebCenter Portal Using JDeveloper
    3. Add new taskflow to library by right-clicking WebCenterSpacesExtensions and selecting “New…”
    4. Add ADF Task Flow (JSF)
      (screenshot 1)
    5. Name the xml file, leaving the Directory the JDev default
      (screenshot 2)
    6. Double click the new xml file and drag a View element into the diagram from the Component Palette
      (screenshot 3)
    7. Rename “view1” to “[taskflow name]View”.
    8. Double click the new view to create a page fragment.
      Update the directory and add \taskflows\[taskflow name]\view
      This will make it easier to sort through in the future when you develop more taskflows.
      (screenshot 4)
    9. Edit the JSFF and display code in source view.
      (screenshot 5)
    10. Replace with the following -
      <?xml version='1.0' encoding='UTF-8'?>
      <jsp:root xmlns:jsp="http://java.sun.com/JSP/Page" version="2.1"
                xmlns:af="http://xmlns.oracle.com/adf/faces/rich"
                xmlns:f="http://java.sun.com/jsf/core">
      <af:resource type="javascript">
      <![CDATA[
      /**
       * CREATE BASE JS CONTAINER OBJ
       * This is base class to assist PSA javascript methods to init after page loaded.
       * You can add this script in the head of you template instead of the portlet.
       */
      var FB = window.FB || {},
      	Base = Base || (function() {
      		return {
      			//create multi-cast delegate.
      			onPortalInit: function(function1, function2) {
      				return function() {
      					if (function1) {
      						function1();
      					}
      					if (function2) {
      						function2();
      					}
      				}
      			},
      			//used for chaining methods
      			chainPSA: function() {}
      		}
      	})();
      
      //Use Base method if FB.Base hasn't been created
      FB.Base = FB.Base || Base;
      /************************/
      
      
      
      
      /**
       * CREATE CHAIN WRAPPER
       * Chain method will initialise from Base requirejs core script
       */
      FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {
      	//set base mustache template name to load and inject
      	var vUID = 'FB_sampleContainer_${pageFlowScope.containerID}', //(UID) Unique Classname to inject template into - can't use IDs in portal 
      		oConstructor = {
      			vTemplate: 		'import/tpl/sampleTpl', //location of sampleTpl.mustache to load
      			oParams: { //Obj list of default params pulled from sample.xml Input definition
      				title:			'${pageFlowScope.title}',
      				displayTitle: 	'${pageFlowScope.displayTitle}',
      				activeUser: 	'${pageFlowScope.activeUser}'
      			},
      			containerID: 		vUID
      		};
      	
      	//check if array exists from other custom JS Portlets
      	if (typeof(FB.loadTemplate) === 'object') {
      		FB.loadTemplate.portletUIDList.push(vUID);
      	//create empty object
      	} else {
      		FB.loadTemplate = {
      			portletUIDList:[vUID],
      			portlets: {}
      		};
      	}
      
      	//inject params
      	FB.loadTemplate.portlets[vUID] = oConstructor;
      
      });
      /************************/
      ]]>
      </af:resource>
      
      
      <!-- Sample template will be injected here -->
      <af:panelGroupLayout layout="vertical" id="FB-SampleContainer" styleClass="FB_sampleContainer_#{pageFlowScope.containerID} portlet-sampleContainer"></af:panelGroupLayout>
      <!-- xSample template will be injected here -->
      
      
      </jsp:root>

      OVERVIEW:

      This is where the mustache template will be injected to provide the sample component functionality.

    11. <af:panelGroupLayout layout="vertical" id="FB-SampleContainer" styleClass="FB_sampleContainer_#{pageFlowScope.containerID} portlet-sampleContainer"></af:panelGroupLayout>

      The oConstructor specifies the configuration of the component to inject.
      vTemplate points to a JS file that requireJS imports and configures the base multiUploader components from the params defined.

      oParams contains all configuration for the app. At the moment these are scoped params associated with the taskflow that you can allow the user to define, and then use within your sample component as JS vars.

      var vUID = 'FB_sampleContainer_${pageFlowScope.containerID}', //(UID) Unique Classname to inject template into - can't use IDs in portal 
      		oConstructor = {
      			vTemplate: 		'import/tpl/sampleTpl', //location of sampleTpl.mustache to load
      			oParams: { //Obj list of default params pulled from sample.xml Input definition
      				title:			'${pageFlowScope.title}',
      				displayTitle: 	'${pageFlowScope.displayTitle}',
      				activeUser: 	'${pageFlowScope.activeUser}'
      			},
      			containerID: 		vUID
      		};

      A simple check to see if other components exist on the page; the new component’s UID is appended to the JS array “portletUIDList”, and a JS object holding the component params is stored under “portlets”.

      //check if array exists from other custom JS Portlets
      	if (typeof(FB.loadTemplate) === 'object') {
      		FB.loadTemplate.portletUIDList.push(vUID);
      	//create empty object
      	} else {
      		FB.loadTemplate = {
      			portletUIDList:[vUID],
      			portlets: {}
      		};
      	}
      
      	//inject params
      	FB.loadTemplate.portlets[vUID] = oConstructor;

      Finally, the JS configuration is wrapped in a JS chain wrapper that will only initialise once requireJS has loaded all of its core base libraries, such as jQuery.

      FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {
      
      //code
      
      });

      Make sure that within your ADF template you have set up the requireJS core and have the following code to initialise FB.Base.chainPSA and loop through the custom taskflows to display on the page -

      //load JS Components
      		if (FB.Base.chainPSA) {
      			FB.Base.chainPSA();
      		}

      //loop and request all templates required
      			//(assumed from the earlier code) gather the UIDs registered by each taskflow
      			var aPortletList = FB.loadTemplate.portletUIDList,
      				lPortletList = aPortletList.length,
      				x = 0;

      			for (x;x<lPortletList;x++) {
      				var vPortletUID 	= aPortletList[x],
      					oPortlet 		= FB.loadTemplate.portlets[vPortletUID];

      				//define temp object info to pass into script when init
      				define('temp'+x, oPortlet);

      				//request and initialise portlet template & pass params
      				require([oPortlet.vTemplate,'temp'+x], function(tpl,oPortlet) {
      					console.log('[IMPORTED TEMPLATE]',tpl.component,oPortlet);
      					tpl.init(oPortlet);
      				});
      			}

    12. To add taskflow parameters open the xml file again.
    13. Select Overview tab bottom left of the screen.
      Select the Parameters side tab.
      Add the following four example params -
      (screenshot 6)
      You will see these when we add and edit the taskflow to a portal page in WebCenter Composer.
    14. Deploy the taskflow to WebCenter Portal following the last steps in the Oracle guide. Once the new taskflow / spaces extension project has been deployed, load WebCenter Portal.
      The following screenshots are from PS5; the UI has changed in PS7, but you should be able to work out the differences.
    1. Go into administration area of the portal and select the “Resources” Tab
    2. Select the “Resource Catalogs” from the items on the left under the “Structure” heading.
      A list of Resource Catalogs will be available. You can create a new one or use an existing one. Make sure the one you are updating is the one being used by the portal you want to add the taskflow into.
      [Screenshot 7]
    3. Select the resource catalogue and choose Edit from the Edit menu drop-down.
      [Screenshot 8]
    4. A window will appear; here you can add folders and choose where you want your components to appear.
      I have created a Demo Taskflow folder.
    5. Select “Add From Library” from the Add dropdown menu.
      [Screenshot 9]
    6. Drill into Taskflows and add your taskflow – I am adding the sample taskflow I created earlier.
    7. Go into your portal, create a new page, and add the new taskflow.
      Here is an example of the Jive Forums that I recreated as a JS driven taskflow.
      [Screenshots 11 and 12]
    8. And the final output of the taskflow on the page.
      [Screenshot 13]


The post Javascript Driven ADF Taskflows for WebCenter Portal appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

Developing WebCenter Content Cross Platform iDoc Enabled Components for Mobile, ADF, Sharepoint, Liferay

So over the last couple of months I’ve been thinking and tinkering with code, wondering, “What’s the best approach for creating WebCenter Content (WCC) components that I can consume and reuse across multiple platforms and environments?”
Is it Pagelet Producer, or maybe an iframe? Those solutions just weren’t good enough or didn’t allow the flexibility I really wanted.

I needed a WCC solution that could easily be consumed into mobile, either Cordova (hybrid app) or ADF Mobile (AMX views), and that worked on different devices/platforms as well as in any enterprise app, i.e. SharePoint (.Net), Liferay, WebCenter Portal (ADF), or even consumed into the new WebCenter Content ADF WebUI. It also needed to provide the added advantage that there would not be multiple branches of code or redevelopment of the component for each platform and environment.

And in the famous words of Victor Frankenstein: “It’s Alive!!”

After tinkering around and trying different approaches, this is the solution I created to support the above model.
I’m not saying this is the right approach, or that it is supported by the enterprise vendors, but it is an approach that is reusable and works across all of these enterprise apps.

 

[VIDEO CONVERTING]…

Here’s a quick video of a drag/drop MultiUploader component I created for WebCenter Content Classic that I can reuse on .Net and ADF WebCenter Portal/Content as well as mobile.

Read on to find out more on how this was achieved.

1) First, I’m going to dig into WebCenter Content and explain the underlying structure of the component.

To create a flexible base model, I created a light Javascript framework, very similar to AngularJS or ReactJS.

This would be the base component that would enable additional components on the page with the use of Mustache (JS templates) to drive and inject dynamic functional areas of content into a specified DOM node by ID or className.
Any change of layout within the component is handled via an AJAX request to a cached Mustache template, which updates the DOM when needed (similar to ADF’s PPR). Any user interaction is handled through event-driven actions from the imported templates.
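
As a rough illustration of that pattern (this is not the actual framework code; Mustache.render and jQuery are assumed to be available, and the template path, class name and data fields are invented), a component could cache a fetched template and re-render its DOM node on demand:

//minimal sketch of template caching + DOM injection (illustrative only)
var tplCache = {};

function renderInto(vClassName, vTplPath, oData) {
	//serve from the cache when possible to avoid repeat AJAX round trips
	if (tplCache[vTplPath]) {
		$('.' + vClassName).html(Mustache.render(tplCache[vTplPath], oData));
		return;
	}
	$.ajax({ url: vTplPath, dataType: 'text' }).done(function(sTpl) {
		tplCache[vTplPath] = sTpl; //cache the raw template text
		$('.' + vClassName).html(Mustache.render(sTpl, oData));
	});
}

//example: refresh an area after a state change, similar to a PPR partial refresh
renderInto('FB_multiCheckin', 'import/tpl/multiUploadTpl.mustache', { title: 'My Uploads' });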

RequireJS supplies a flexible module-loading framework, so I do not need to worry about conflicts between JS libraries; it is also used to load in Mustache templates and additional JS functionality when needed.

You’re probably thinking that there are going to be a lot of AJAX requests going back and forth and it’s going to be slow. Just check out the video – the answer is not really. The Mustache templates are going to be smaller than the average image you load on a page.

So, as an example, for the MultiUploader I only have one Mustache template, which is 9kb. All interaction is handled by two JS files that are 39kb uncompressed.

2) As mentioned, a base model WCC component, “FishbowlModuleLoader”, loads in and initiates all other components on the page, and it only loads and caches required templates and JS files as and when they are needed. There is no point in loading all templates and JS functionality on a page if they are not needed; loading on demand improves the performance and interaction of the component.

3) Following is a quick overview of how the WCC component “FishbowlMultiUploader” works.

WebCenter Content Resource Asset

This is the base structure of the Content Component configuration, “fb_multi_upload_page_body”. It is consumed into a custom template, “MULTI_UPLOAD_PAGE”, which is requested via a custom service request, “?IdcService=GET_FB_MULTI_UPLOAD_PAGE”.

<!--
Name:           fb_multi_upload_page_body
Author:         John Sim  [18/06/2014]
Parameters:		
Description:	Page Body for Multi Checkin used in MULTI_UPLOAD_PAGE template
-->
<@dynamichtml fb_multi_upload_page_body@>
[[% FB fb_multi_upload_page_body Template body MULTI_UPLOAD_PAGE %]]

<div id="FB-multiCheckin" class="FB_multiCheckin"></div>

<script>
/**
 * CREATE CHAIN WRAPPER
 * Chain method will load from Base ModuleLoader requirejs core script
 */
FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {
	//set base mustache template name to load and inject
	var vUID = 'FB_multiCheckin', //(UID) Unique Classname to inject template into - can't use IDs in portal 
		oConstructor = {
			vTemplate: 'import/tpl/multiUploadTpl', //location of template.mustache to load
			oParams: { //Obj list of default params pulled from multiUploader.xml Input definition
				maxUploadSize:			'10mb',
				defaultDocType:			('<$multiUploadDefaultType$>' !== '')? '<$multiUploadDefaultType$>': 'Document', 
				defaultSecurityGroup:		('<$multiUploadDefaultSecurityGroup$>' !== '')? '<$multiUploadDefaultSecurityGroup$>': 'Public',
				defaultAccount:			'Workspace/'+userName, 
				author:				(typeof(userName) !== 'undefined')? userName: '', 
				httpEnterpriseCgiPath: 		(typeof(httpEnterpriseCgiPath) !== 'undefined')? httpEnterpriseCgiPath: '',
				idcToken: 			(typeof(idcToken) !== 'undefined')? idcToken: '',
				httpWebRoot: 			(typeof(httpWebRoot) !== 'undefined')? httpWebRoot: '',
				enableTagging:			true,
				enableEmails:			true,
				enableBarcode:			true,
				enableCheckinProfiles: 		true,
				showHelpOption: 		true
			},
			containerID: 		vUID
		};
	
	//check if array exists from other custom JS Portlets
	if (typeof(FB.loadTemplate) === 'object') {
		FB.loadTemplate.portletUIDList.push('FB_multiUploadContainer_' + vUID);
	//create empty object
	} else {
		FB.loadTemplate = {
			portletUIDList:['FB_multiUploadContainer_' + vUID],
			portlets: {}
		};
	}

	//inject params
	FB.loadTemplate.portlets['FB_multiUploadContainer_' + vUID] = oConstructor;
});
/************************/
</script>


<@end@>

This is the element that the Mustache template will be injected into to provide the multiUpload component functionality.

<div id="FB-multiCheckin" class="FB_multiCheckin"></div>

The oConstructor specifies the configuration of the component to inject.
vTemplate points to a JS file that RequireJS imports; it configures the base multiUploader component from the params defined.

oParams contains all configuration for the app; at the moment, these are mostly hard coded, but could be defined as iDoc Variables when you install and enable the component within WCC.

var vUID = 'FB_multiCheckin', //(UID) Unique Classname to inject template into - can't use IDs in portal 
		oConstructor = {
			vTemplate: 'import/tpl/multiUploadTpl', //location of template.mustache to load
			oParams: { //Obj list of default params pulled from multiUploader.xml Input definition
				maxUploadSize:			'10mb',
				defaultDocType:			('<$multiUploadDefaultType$>' !== '')? '<$multiUploadDefaultType$>': 'Document', 
				defaultSecurityGroup:		('<$multiUploadDefaultSecurityGroup$>' !== '')? '<$multiUploadDefaultSecurityGroup$>': 'Public',
				defaultAccount:			'Workspace/'+userName, 
				author:				(typeof(userName) !== 'undefined')? userName: '', 
				httpEnterpriseCgiPath: 		(typeof(httpEnterpriseCgiPath) !== 'undefined')? httpEnterpriseCgiPath: '',
				idcToken: 			(typeof(idcToken) !== 'undefined')? idcToken: '',
				httpWebRoot: 			(typeof(httpWebRoot) !== 'undefined')? httpWebRoot: '',
				enableTagging:			true,
				enableEmails:			true,
				enableBarcode:			true,
				enableCheckinProfiles: 		true,
				showHelpOption: 		true
			},
			containerID: 		vUID
		};

This is a simple check to see if other components exist on the page; the new component’s UID is appended to the JS array “portletUIDList”, and a JS object holding the component params is stored in “portlets”.

//check if array exists from other custom JS Portlets
	if (typeof(FB.loadTemplate) === 'object') {
		FB.loadTemplate.portletUIDList.push('FB_multiUploadContainer_' + vUID);
	//create empty object
	} else {
		FB.loadTemplate = {
			portletUIDList:['FB_multiUploadContainer_' + vUID],
			portlets: {}
		};
	}

	//inject params
	FB.loadTemplate.portlets['FB_multiUploadContainer_' + vUID] = oConstructor;

Finally, the JS configuration is wrapped in the chain wrapper, which will only initialize once RequireJS has loaded in all of its core base libraries, like jQuery.

FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {

//code

});

 

4) So let’s take a look at how the base component “FishbowlModuleLoader” works.

Essentially, this defines the FB.Base.chainPSA chain wrapper method in the header; it does not need jQuery or any other library.

<!--
Name:           std_html_head_declarations
Author:         John Sim  [18/06/2014]
Parameters:		
Description:	Add required header resources
-->
<@dynamichtml std_html_head_declarations@>
[[% FB std_html_head_declaration Update head add JS libs for module loader %]]

<$include super.std_html_head_declarations$>

<script>
/**
 * CREATE BASE JS CONTAINER OBJ
 * DO NOT ADD JQUERY: this is a base class to assist PSA javascript methods to init after the page has loaded.
 */
var FB = window.FB || {},
	Base = Base || (function() {
		return {
			//create multi-cast delegate.
			onPortalInit: function(function1, function2) {
				return function() {
					if (function1) {
						function1();
					}
					if (function2) {
						function2();
					}
				}
			},
			//used for chaining methods
			chainPSA: function() {}
		}
	})();

//Use Base method if FB.Base hasn't been created
FB.Base = FB.Base || Base;
/************************/
</script>

<@end@>

You could cache this and put it in a script file; I’ve just put it inline to make it easier for you to read.
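
To make the chaining behaviour concrete, here is a small standalone usage sketch (the component names and log messages are invented): each component wraps the current chain with its own init function, and a single call later runs them all in registration order.

//each component wraps the existing chain with its own init function
FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {
	console.log('component A init');
});
FB.Base.chainPSA = FB.Base.onPortalInit(FB.Base.chainPSA, function() {
	console.log('component B init');
});

//later, once the core libraries are ready, one call runs every registered init in order
FB.Base.chainPSA(); //logs "component A init" then "component B init"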

In the footer, we define RequireJS and the configuration that loads in the base libraries we need for all components, i.e. jQuery and maybe a few others (a rough sketch of what that config file might contain follows the listing below).
We also set up fb.core.js as our base script to import and load in the core framework I built to handle routing and DOM event interaction, as well as global vars.

<!--
Name:           std_page_end
Author:         John Sim  [18/06/2014]
Parameters:		
Description:	Component Module Loader RequireJS setup
-->
<@dynamichtml std_page_end@>
[[% FB std_page_end Add Module Loader RequireJS lib %]]

<$include super.std_page_end$>


<!-- Init FB Component Module Loader -->
<script src="<$HttpWebRoot$>resources/FishbowlModuleLoader/js/core/config.js"></script>
<script src="<$HttpWebRoot$>resources/FishbowlModuleLoader/js/libs/requirejs/require.min.js" data-main="fb.core"></script>
<!-- Init FB Component Module Loader -->
<@end@>

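The config.js file referenced above is not listed in the post. Judging from the modules that fb.core.js requires, it would presumably declare a global require configuration (RequireJS picks up a pre-defined var require object) mapping module names to paths; every path below is an assumption for illustration only.

//hypothetical config.js: global RequireJS configuration, loaded before require.min.js
var require = {
	baseUrl: '/cs/resources/FishbowlModuleLoader/js',
	paths: {
		//plugins and libraries referenced from fb.core.js and the templates
		'domReady':        'libs/requirejs/domReady',
		'jquery':          'libs/jquery/jquery.min',
		'Mustache':        'libs/mustache/mustache.min',
		'Moment':          'libs/moment/moment.min',
		'ftlabsFastClick': 'libs/fastclick/fastclick.min',
		'fb':              'core/fb',
		//core framework modules
		'import/Layout':     'core/import/Layout',
		'import/Action':     'core/import/Action',
		'import/Navigation': 'core/import/Navigation',
		'import/Global':     'core/import/Global'
	},
	waitSeconds: 30
};
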
fb.core.js is where the magic begins:

// REQUIREJS Base configuration
require([
	//Dom ready req plugin
	'domReady',
	
	
	//core 
	'import/Layout',
	'import/Action',
	'import/Navigation',
	'import/Global',
	
	
	//Plugins
	'Moment',		//date plugin momentjs
	'ftlabsFastClick', 	//fix touch 300ms delay
	'fb'			//fb global methods
	

	
], function(domReady, Layout){
console.info('[ALL MODULES LOADED]');

	domReady(function() {
		console.info('[DOM READY]');
		
		//initialise layout DOM events ie click, touch etc.
		Layout.init();
		
		//load JS Components
		if (FB.Base.chainPSA) {
			FB.Base.chainPSA();
		}
		
		//check if any JS driven template containers exist
		if (typeof(FB.loadTemplate) !== 'undefined') {
			var aPortletList 	= FB.loadTemplate.portletUIDList,
				lPortletList 	= aPortletList.length,
				x 				= 0;
				
			//loop and request all templates required
			for (x;x<lPortletList;x++) {
				var vPortletUID 	= aPortletList[x],
					oPortlet 		= FB.loadTemplate.portlets[vPortletUID];
				
				//define temp object info to pass into script when init	
				define('temp'+x, oPortlet);
				
				//request and initialise portlet template & pass params
				require([oPortlet.vTemplate,'temp'+x], function(tpl,oPortlet) {
					console.log('[IMPORTED TEMPLATE]',tpl.component,oPortlet);
					tpl.init(oPortlet);
				});
			}
		}
		
	});
	
});

Once the DOM has fully loaded, FB.Base.chainPSA() is invoked. This sets up and configures the FB.loadTemplate object, which contains all of the information for the components that need to be loaded into the page.

Here we loop through and load in all of the templates, passing the component configuration across to each template to be initialized (a sketch of what one of these template modules might look like follows the loop):

//loop and request all templates required
			for (x;x<lPortletList;x++) {
				var vPortletUID 	= aPortletList[x],
					oPortlet 		= FB.loadTemplate.portlets[vPortletUID];
				
				//define temp object info to pass into script when init	
				define('temp'+x, oPortlet);
				
				//request and initialise portlet template & pass params
				require([oPortlet.vTemplate,'temp'+x], function(tpl,oPortlet) {
					console.log('[IMPORTED TEMPLATE]',tpl.component,oPortlet);
					tpl.init(oPortlet);
				});
			}

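The template modules themselves (e.g. import/tpl/multiUploadTpl) are not listed here, but from the loop above each one has to be an AMD module exposing a component name and an init(oConstructor) method. A rough sketch, with the markup and field names invented purely for illustration:

//hypothetical template module: import/tpl/multiUploadTpl.js
define(['Mustache'], function(Mustache) {

	//inline markup for brevity; the real component would load a .mustache file instead
	var sTpl = '<div class="fb-uploader"><h2>{{title}}</h2><button class="fb-add">Add files</button></div>';

	return {
		component: 'multiUploadTpl',

		//called by fb.core.js with the oConstructor object built in the resource include
		init: function(oConstructor) {
			var oNode = document.querySelector('.' + oConstructor.containerID);

			//render the params into the template and inject into the container node
			oNode.innerHTML = Mustache.render(sTpl, {
				title: oConstructor.oParams.title || 'Multi Upload'
			});

			//wire up DOM events for the injected markup here
		}
	};
});
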
And that’s all there is to it.

5) Let’s dig into WebCenter Portal now. How can you reuse all that code you’ve written for WebCenter Content Classic within ADF?

Easy: let’s create a JS driven taskflow template that we can dump into the resource catalogue and then drag, drop, and reuse on any page wherever it is needed.

I’ve created a new post for this part:
Read on here to find out how to create JS Driven Taskflow templates.

 

Some gotchas:

Some things to think about if you do decide to use this approach.

  1. You will need to make sure that all AJAX requests are made on the same domain,
    1. or enable CORS on UCM so that it accepts cross-domain requests. (Mobile works cross-domain; see the sketch below.)
  2. WCC needs to be accessible from the user’s browser.
    1. You can set up a proxy service that only allows access to the custom services you require, to lock down access to the rest of the UCM environment if needed.
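
To make the cross-domain point concrete, here is a hedged sketch of how another platform (a .Net page, a Cordova shell, etc.) might pull the component markup via the custom service; the host name is a placeholder, and the request only succeeds once the proxy or CORS arrangement above is in place.

//illustrative only: request the custom WCC service from another platform
//'https://ucm.example.com' is a placeholder for your accessible UCM or proxy host
var xhr = new XMLHttpRequest();
xhr.open('GET', 'https://ucm.example.com/cs/idcplg?IdcService=GET_FB_MULTI_UPLOAD_PAGE');
xhr.withCredentials = true; //send the UCM session cookie if the browser already has one
xhr.onload = function() {
	if (xhr.status === 200) {
		//hand the returned markup to whatever container the host app provides
		document.getElementById('wccComponentHost').innerHTML = xhr.responseText;
	}
};
xhr.send();

In practice the host app would still need to load the module loader scripts itself (innerHTML alone will not execute embedded script tags), so treat this purely as a transport example.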

And finally, one thing that comes to mind here: I am using static Mustache templates, but there is nothing stopping you from creating a custom WCC service to generate Mustache templates with embedded iDoc if you want.

The post Developing WebCenter Content Cross Platform iDoc Enabled Components for Mobile, ADF, Sharepoint, Liferay appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other

Teradata bought Hadapt and Revelytix

DBMS2 - Wed, 2014-07-23 02:29

My client Teradata bought my (former) clients Revelytix and Hadapt.* Obviously, I’m in confidentiality up to my eyeballs. That said — Teradata truly doesn’t know what it’s going to do with those acquisitions yet. Indeed, the acquisitions are too new for Teradata to have fully reviewed the code and so on, let alone made strategic decisions informed by that review. So while this is just a guess, I conjecture Teradata won’t say anything concrete until at least September, although I do expect some kind of stated direction in time for its October user conference.

*I love my business, but it does have one distressing aspect, namely the combination of subscription pricing and customer churn. When your customers transform really quickly, or even go out of existence, so sometimes does their reliance on you.

I’ve written extensively about Hadapt, but to review:

  • The HadoopDB project was started by Dan Abadi and two grad students.
  • HadoopDB tied a bunch of PostgreSQL instances together with Hadoop MapReduce. Lab benchmarks suggested it was more performant than the coyly named DBx (where x=2), but not necessarily competitive with top analytic RDBMS.
  • Hadapt was formed to commercialize HadoopDB.
  • After some fits and starts, Hadapt was a Cambridge-based company. Former Vertica CEO Chris Lynch invested even before he was a VC, and became an active chairman. Not coincidentally, Hadapt had a bunch of Vertica folks.
  • Hadapt decided to stick with row-based PostgreSQL, Dan Abadi’s previous columnar enthusiasm notwithstanding. Not coincidentally, Hadapt’s performance never blew anyone away.
  • Especially after the announcement of Cloudera Impala, Hadapt’s SQL-on-Hadoop positioning didn’t work out. Indeed, Hadapt laid off most or all of its sales and marketing folks. Hadapt pivoted to emphasize its schema-on-need story.
  • Chris Lynch, who generally seems to think that IT vendors are created to be sold, shopped Hadapt aggressively.

As for what Teradata should do with Hadapt:

  • My initial thought for Hadapt was to just double down, pushing the technology forward, presumably including a columnar option such as the one Citus Data developed.
  • But upon reflection, if it made technical sense to merge the Aster and Hadapt products, that would be better yet.

I herewith apologize to Aster co-founder and Hadapt skeptic Tasso Argyros (who by the way has moved on from Teradata) for even suggesting such heresy. :)

Complicating the story further:

  • Impala lets you treat data in HDFS (Hadoop Distributed File System) as if it were in a SQL DBMS. So does Teradata SQL-H. But Hadapt makes you decide whether the data is in HDFS or the SQL DBMS, and it can’t be in both at once. Edit: Actually, see Dan Abadi’s comments below.
  • Impala and Oracle’s new SQL-H competitor have daemons running on every data node. So does one option in Hadapt. But I don’t think SQL-H does that yet.

I was less involved with Revelytix than with Hadapt (although I’m told I served as the “catalyst” for the original Teradata/Revelytix partnership). That said, Teradata — like Oracle — is always building out a data integration suite to cover a limited universe of data stores. And Revelytix’ dataset management technology is a nice piece toward an integrated data catalog.

Related posts

Categories: Other

Data integration as a business opportunity

DBMS2 - Sun, 2014-07-20 21:59

A significant fraction of IT professional services industry revenue comes from data integration. But as a software business, data integration has been more problematic. Informatica, the largest independent data integration software vendor, does $1 billion in revenue. INFA’s enterprise value (market capitalization after adjusting for cash and debt) is $3 billion, which puts it way short of other category leaders such as VMware, and even sits behind Tableau.* When I talk with data integration startups, I ask questions such as “What fraction of Informatica’s revenue are you shooting for?” and, as a follow-up, “Why would that be grounds for excitement?”

*If you believe that Splunk is a data integration company, that changes these observations only a little.

On the other hand, several successful software categories have, at particular points in their history, been focused on data integration. One of the major benefits of 1990s business intelligence was “Combines data from multiple sources on the same screen” and, in some cases, even “Joins data from multiple sources in a single view”. The last few years before application servers were commoditized, data integration was one of their chief benefits. Data warehousing and Hadoop both of course have a “collect all your data in one place” part to their stories — which I call data mustering — and Hadoop is a data transformation tool as well.

And it’s not as if successful data integration companies have no value. IBM bought a few EAI (Enterprise Application Integration) companies, plus top Informatica competitor Ascential, plus Cast Iron Systems. DataDirect (I mean the ODBC/JDBC guys, not the storage ones) has been a decent little business through various name changes and ownerships (independent under a couple of names, then Intersolv/Merant, then independent again, then Progress Software). Master data management (MDM) and data cleaning have had some passable exits. Talend raised $40 million last December, which is a nice accomplishment if you’re French.

I can explain much of this in seven words: Data integration is both important and fragmented. The “important” part is self-evident; I gave examples of “fragmented” a couple years back. Beyond that, I’d say:

  • A new class of “engine” can be a nice business — consider for example Informatica/Ascential/Ab Initio, or the MDM players (who sold out to bigger ETL companies), or Splunk. Indeed, much early Hadoop adoption was for its capabilities as a data transformation engine.
  • Data transformation is a better business to enter than data movement. Differentiated value in data movement comes in areas such as performance, reliability and maturity, where established players have major advantages. But differentiated value in data transformation can come from “intelligence”, which is easier to excel in as a start-up.
  • “Transparent connectivity” is a tough business. It is hard to offer true transparency, with minimal performance overhead, among enough different systems for anybody to much care. And without that you’re probably offering a low-value/niche capability. Migration aids are not an exception; the value in those is captured by the vendor of what’s being migrated to, not by the vendor who actually does the transparent translation. Indeed …
  • … I can’t think of a single case in which migration support was a big software business. (Services are a whole other story.) Perhaps Cast Iron Systems came closest, but I’m not sure I’d categorize it as either “migration support” or “big”.

And I’ll stop there, because I’m not as conversant with some of the new “smart data transformation” companies as I’d like to be.

Related links

Categories: Other

The point of predicate pushdown

DBMS2 - Tue, 2014-07-15 07:52

Oracle is announcing today what it’s calling “Oracle Big Data SQL”. As usual, I haven’t been briefed, but highlights seem to include:

  • Oracle Big Data SQL is basically data federation using the External Tables capability of the Oracle DBMS.
  • Unlike independent products — e.g. Cirro — Oracle Big Data SQL federates SQL queries only across Oracle offerings, such as the Oracle DBMS, the Oracle NoSQL offering, or Oracle’s Cloudera-based Hadoop appliance.
  • Also unlike independent products, Oracle Big Data SQL is claimed to be compatible with Oracle’s usual security model and SQL dialect.
  • At least when it talks to Hadoop, Oracle Big Data SQL exploits predicate pushdown to reduce network traffic.

And by the way – Oracle Big Data SQL is NOT “SQL-on-Hadoop” as that term is commonly construed, unless the complete Oracle DBMS is running on every node of a Hadoop cluster.

Predicate pushdown is actually a simple concept:

  • If you issue a query in one place to run against a lot of data that’s in another place, you could spawn a lot of network traffic, which could be slow and costly. However …
  • … if you can “push down” parts of the query to where the data is stored, and thus filter out most of the data, then you can greatly reduce network traffic.

“Predicate pushdown” gets its name from the fact that portions of SQL statements, specifically ones that filter data, are properly referred to as predicates. They earn that name because predicates in mathematical logic and clauses in SQL are the same kind of thing — statements that, upon evaluation, can be TRUE or FALSE for different values of variables or data.

The most famous example of predicate pushdown is Oracle Exadata, with the story there being:

  • Oracle’s shared-everything architecture created a huge I/O bottleneck when querying large amounts of data, making Oracle inappropriate for very large data warehouses.
  • Oracle Exadata added a second tier of servers each tied to a subset of the overall storage; certain predicates are pushed down to that tier.
  • The I/O between Exadata’s two sets of servers is now tolerable, and so Oracle is now often competitive in the high-end data warehousing market.

Oracle evidently calls this “SmartScan”, and says Oracle Big Data SQL does something similar with predicate pushdown into Hadoop.

Oracle also hints at using predicate pushdown to do non-tabular operations on the non-relational systems, rather than shoehorning operations on multi-structured data into the Oracle DBMS, but my details on that are sparse.

Related link

Categories: Other

21st Century DBMS success and failure

DBMS2 - Mon, 2014-07-14 00:37

As part of my series on the keys to and likelihood of success, I outlined some examples from the DBMS industry. The list turned out too long for a single post, so I split it up by millennia. The part on 20th Century DBMS success and failure went up Friday; in this one I’ll cover more recent events, organized in line with the original overview post. Categories addressed will include analytic RDBMS (including data warehouse appliances), NoSQL/non-SQL short-request DBMS, MySQL, PostgreSQL, NewSQL and Hadoop.

DBMS rarely have trouble with the criterion “Is there an identifiable buying process?” If an enterprise is doing application development projects, a DBMS is generally chosen for each one. And so the organization will generally have a process in place for buying DBMS, or accepting them for free. Central IT, departments, and — at least in the case of free open source stuff — developers all commonly have the capacity for DBMS acquisition.

In particular, at many enterprises either departments have the ability to buy their own analytic technology, or else IT will willingly buy and administer things for a single department. This dynamic fueled much of the early rise of analytic RDBMS.

Buyer inertia is a greater concern.

  • A significant minority of enterprises are highly committed to their enterprise DBMS standards.
  • Another significant minority aren’t quite as committed, but set pretty high bars for new DBMS products to cross nonetheless.
  • FUD (Fear, Uncertainty and Doubt) about new DBMS is often justifiable, about stability and consistent performance alike.

A particularly complex version of this dynamic has played out in the market for analytic RDBMS/appliances.

  • First the newer products (from Netezza onwards) were sold to organizations who knew they wanted great performance or price/performance.
  • Then it became more about selling “business value” to organizations who needed more convincing about the benefits of great price/performance.
  • Then the behemoth vendors became more competitive, as Teradata introduced lower-price models, Oracle introduced Exadata, Sybase got more aggressive with Sybase IQ, IBM bought Netezza, EMC bought Greenplum, HP bought Vertica and so on. It is now hard for a non-behemoth analytic RDBMS vendor to make headway at large enterprise accounts.
  • Meanwhile, Hadoop has emerged as serious competitor for at least some analytic data management, especially but not only at internet companies.

Otherwise I’d say: 

  • At large enterprises, their internet operations perhaps excepted:
    • Short-request/general-purpose SQL alternatives to the behemoths — e.g. MySQL, PostgreSQL, NewSQL — have had tremendous difficulty getting established. The last big success was the rise of Microsoft SQL Server in the 1990s. That’s why I haven’t mentioned the term mid-range DBMS in years.
    • NoSQL/non-SQL has penetrated large enterprises mainly for a few specific use cases, for example the lists I posted for MongoDB or graph databases.
  • Internet-only companies have few inertia issues when it comes to database managers. They’ll consider anything they regard as being in their price ballpark (which is however often restricted to open source). I think part of the reason is that as quickly as they rewrite their applications, DBMS are vastly less “strategic” to them than they are to most larger enterprises.
  • The internet operations of large companies — especially large retailers — in many cases behave like internet-only companies, but in many other cases behave like the rest of the enterprise.

The major reasons for DBMS categories to get established in the first place are:

  • Performance and/or scalability (many examples).
  • Developer features (for example dynamic schema).
  • License/maintenance cost (for example several open source categories).
  • Ease of installation and administration (for example open source again, and also data warehouse appliances).

Those same characteristics are major bases for competition among members of a new category, although as noted above behemoth-loyalty can also come into play.

Cool-vs.-weird tradeoffs are somewhat secondary among SQL DBMS.

  • There’s not much of a “cool” factor, because new products aren’t that different in what they do vs. older ones.
  • There’s not a terrible “weird” factor either, but of course any smaller offering faces FUD, and also …
  • … appliances are anti-strategic for many buyers, especially ones who demand a smooth path to the cloud.

They’re huge, however, in the non-SQL world. Most non-SQL data managers have a major “weird” factor. Fortunately, NoSQL and Hadoop both have huge “cool” cred to offset it. XML/XQuery unfortunately did not.

Finally, in most DBMS categories there are massive issues with product completeness, more in the area of maturity than that of whole product. The biggest whole product issues are concentrated on the matter of interoperating with other software — business intelligence tools, packaged applications (if relevant to the category), etc. Most notably, the handful of DBMS that are certified to run SAP share a huge market that other DBMS can’t touch. But BI tools are less of a differentiator — I yawn when vendors tell me they are certified for/partnered with MicroStrategy, Tableau, Pentaho and Jaspersoft, and I’m surprised at any product that isn’t.

DBMS maturity has a lot of aspects, but the toughest challenges are concentrated in two main areas:

  • Reliability, especially but not only in short-request use cases.
  • Performance across a great variety of use cases. I observe frequently that performance in best-case scenarios, performance in the lab and performance in real-world environments are much further apart than vendors like to think.

In particular:

  • Maturity demands seem to be much higher for SQL DBMS than for NoSQL.
    • I think this is one of several reasons NoSQL has been much more successful than NewSQL.
    • It’s why I think MarkLogic’s “Enterprise NoSQL” positioning is a mistake.
  • As for MySQL:
    • MySQL wasn’t close to reliable enough for enterprises to trust it until InnoDB became the default storage engine.
    • MySQL 5 point releases have added major features, or decent performance for major features. I’ll confess to having lost track of what’s been fixed and what’s still missing.
    • In saying all that I’m holding MySQL to a much higher maturity standard than I’m holding NoSQL — because that’s what I think enterprise customers do.
  • PostgreSQL “should” be doing a lot better than it is. I have an extremely low opinion of its promoters, and not just for personal reasons. (That said, the personal reasons don’t just apply to EnterpriseDB anymore. I’ve also run out of patience waiting for Josh Berkus to retract untruths he posted about me years ago.)
  • SAP HANA checks boxes for performance (In-memory rah rah rah!!) and whole product (Runs SAP!!). That puts it well ahead of most other newish SQL DBMS, purely analytic ones perhaps excepted.
  • Any other new short-request SQL DBMS that sounds like it has traction is also memory-centric.
  • Analytic RDBMS are in most respects held to lower maturity standards than DBMS used for write-intensive workloads. Even so, products in the category are still frequently tripped up by considerations of concurrent performance and mixed workload management.

Related links

There have been 1,470 previous posts in the 9-year history of this blog, many of which could serve as background material for this one. A couple that seem particularly germane and didn’t already get linked above are:

Categories: Other

Big Data in the Cloud at Google I/O

William Vambenepe - Tue, 2014-07-01 00:55

Last week was a great party for the entire Google developer family, including Google Cloud Platform. And within the Cloud Platform, Big Data processing services. Which is where my focus has been in the almost two years I’ve been at Google.

It started with a bang, when our fearless leader Urs unveiled Cloud Dataflow in the keynote. Supported by a very timely demo (streaming analytics for a World Cup game) by my colleague Eric.

After the keynote, we had three live sessions:

In “Big Data, the Cloud Way“, I gave an overview of the main large-scale data processing services on Google Cloud:

  • Cloud Pub/Sub, a newly-announced service which provides reliable, many-to-many, asynchronous messaging,
  • the aforementioned Cloud Dataflow, to implement data processing pipelines which can run either in streaming or batch mode,
  • BigQuery, an existing service for large-scale SQL-based data processing at interactive speed, and
  • support for Hadoop and Spark, making it very easy to deploy and use them “the Cloud Way”, well integrated with other storage and processing services of Google Cloud Platform.

The next day, in “The Dawn of Fast Data“, Marwa and Reuven described Cloud Dataflow in a lot more details, including code samples. They showed how to easily construct a streaming pipeline which keeps a constantly-updated lookup table of most popular Twitter hashtags for a given prefix. They also explained how Cloud Dataflow builds on over a decade of data processing innovation at Google to optimize processing pipelines and free users from the burden of deploying, configuring, tuning and managing the needed infrastructure. Just like Cloud Pub/Sub and BigQuery do for event handling and SQL analytics, respectively.

Later that afternoon, Felipe and Jordan showed how to build predictive models in “Predicting the future with the Google Cloud Platform“.

We had also prepared some recorded short presentations. To learn more about how easy and efficient it is to use Hadoop and Spark on Google Cloud Platform, you should listen to Dennis in “Open Source Data Analytics“. To learn more about block storage options (including SSD, both local and remote), listen to Jay in “Optimizing disk I/O in the cloud“.

It was gratifying to see well-informed people recognize the importance of these announcement and partners understand how this will benefit their customers. As well as some good press coverage.

It’s liberating to now be able to talk freely about recent progress on our quest to equip Google Cloud users with easy to use data processing tools. Everyone can benefit from Google’s experience making developers productive while efficiently processing data at large scale. With great power comes great productivity.

Categories: Other

Using multiple data stores

DBMS2 - Wed, 2014-06-18 10:03

I’m commonly asked to assess vendor claims of the kind:

  • “Our system lets you do multiple kinds of processing against one database.”
  • “Otherwise you’d need two or more data managers to get the job done, which would be a catastrophe of unthinkable proportion.”

So I thought it might be useful to quickly review some of the many ways organizations put multiple data stores to work. As usual, my bottom line is:

  • The most extreme vendor marketing claims are false.
  • There are many different choices that make sense in at least some use cases each.

Horses for courses

It’s now widely accepted that different data managers are better for different use cases, based on distinctions such as:

Vendors are part of this consensus; already in 2005 I observed

For all practical purposes, there are no DBMS vendors left advocating single-server strategies.

Vendor agreement has become even stronger in the interim, as evidenced by Oracle/MySQL, IBM/Netezza, Oracle’s NoSQL dabblings, and various companies’ Hadoop offerings.

Multiple data stores for a single application

We commonly think of one data manager managing one or more databases, each in support of one or more applications. But the other way around works too; it’s normal for a single application to invoke multiple data stores. Indeed, all but the strictest relational bigots would likely agree: 

  • It’s common and sensible to manage authentication and authorization data in its own data store. Commonly, the data format is LDAP (Lightweight Directory Access Protocol).
  • It’s common and sensible to manage the “content” and “e-commerce transaction records” aspects of websites separately.
  • Even beyond that case, there are often performance reasons to manage BLOBs (Binary Large OBjects) outside your relational database.
  • Internet “interaction” data is also often best managed outside an RDBMS, in part because of its very non-tabular data structures.

The spectacular 2010 JP Morgan Chase outage was largely caused, I believe, by disregard of these precepts.

There also are cases in which applications dutifully get all their data via SQL queries, but send those queries to two or more DBMS. Teradata is proud that its systems can support rather transactional queries (for example in call-center use cases), but the same application may read from and write to a true OLTP database as well.

Further, many OLTP (OnLine Transaction Processing) applications do some fraction of their work via inbound or outbound messaging. Many buzzwords can come into play here, including but not limited to:

  • SOA (Service-Oriented Architecture). This is the most current and flexible one.
  • EAI (Enterprise Application Integration). This was a hot concept in the late 1990s, but was generally implemented with difficulties that SOA was later designed to alleviate.
  • Message-oriented middleware (MOM) and Publish/Subscribe. These are even older, and overlap greatly.

Finally, every dashboard that combines information from different data stores could be assigned to this category as well.

Multiple storage approaches in a single DBMS

In theory, a single DBMS could operate like two or more different ones glued together. A few functions should or must be centralized, such as administration, and communication with the outside world (connection handling, parsing, etc.). But data storage, query execution and so on could for the most part be performed by rather loosely coupled subsystems. And so you might have the best of both worlds — something that’s multiple data stores in the ways you want that diversity, but a single system in how it fits into your environment.

I discussed this idea last year with cautious optimism, writing:

So will these trends succeed? The forgoing caveats notwithstanding, my answers are more Yes than No.

  • …  multi-purpose DBMS will likely always have performance penalties, but over time the penalties should become small enough to be affordable in most cases.
  • Machine-generated data and “content” both call for multi-datatype DBMS. And taken together, those are a large fraction of the future of computing. Consequently …
  • … strong support for multiple datatypes and DMLs is a must for “general-purpose” RDBMS. Oracle and IBM [have] been working on that for 20 years already, with mixed success. I doubt they’ll get much further without a thorough rewrite, but rewrites happen; one of these decades they’re apt to get it right.

In 2005 I had been more ambivalent, in part because my model was a full 1990s-dream “universal” DBMS:

IBM, Oracle, and Microsoft have all worked out ways to have integrated query parsing and query optimization, while letting storage be more or less separate. More precisely, Oracle actually still sticks everything into one data store (hence the lack of native XML support), but allows near-infinite flexibility in how it is accessed. Microsoft has already had separate servers for tabular data, text, and MOLAP, although like Sybase, it doesn’t have general datatype extensibility that it can expose to customers, or exploit itself to provide a great variety of datatypes. IBM has had Oracle-like extensibility all along, although it hasn’t been quite as aggressive at exploiting it; now it’s introduced a separate-server option for XML.

That covers most of the waterfront, but I’d like to more explicitly acknowledge three trends:

  • Among other things, Hadoop is a collection of DBMS (HBase, Impala, et al.) that in some cases are very loosely coupled to each other. The question is less how well the various data stores work together, and more how mature any one of them is on its own.
  • The multiple-data-models idea has been extended into schema-on-need, which is sometimes but not always housed in Hadoop.
  • Even on the relational side, multiple storage capabilities exist in one product.
    • Vertica was designed that way from the get-go. (Like the old joke about police duos, one is to read and one is to write.)
    • IBM, Microsoft and Oracle have all recently added some kind of in-memory columnar capability.
    • Teradata, Aster (before Teradata bought them), Greenplum and Vertica all added some variant on row/column dual stores.

Related links

Categories: Other

Announcing Fishbowl’s Technical Support Offerings for Oracle WebCenter


Supporting an enterprise software system like Oracle WebCenter is no easy task. Technical complexities, customizations, and multiple versions make it difficult to resolve issues quickly and keep the system up and running. Without a dedicated and knowledgeable support team, WebCenter environments can suffer from system downtime, poor performance, and frustrated users.

Join Fishbowl Solutions for a webinar as they discuss their Oracle WebCenter technical support offerings. These offerings include specific technical services to support WebCenter administrators and end users, as well as customized environments. If you are a WebCenter administrator, power user, or an IT director/manager who oversees your company’s WebCenter environment, this webinar is for you. Come hear how Fishbowl’s support offerings could help you increase up-time, improve SR issue resolution, and ensure overall user satisfaction.

Attendees of this webinar will learn:

  • The reasons Fishbowl is best positioned to be your single point of contact for Oracle WebCenter technical support
  • The support services Fishbowl offers and what each includes
  • The benefits Cascade Corporation has already realized with Fishbowl’s Enterprise Support offering for Oracle WebCenter

Date: Thursday, June 12th
Time: 1:00 – 2:00 PM EST, 12 – 1:00 PM CST

Register: https://www2.gotomeeting.com/register/941379506

 

The post Announcing Fishbowl’s Technical Support Offerings for Oracle WebCenter appeared first on Fishbowl Solutions' C4 Blog.

Categories: Fusion Middleware, Other