Searching for a specific document out of millions can be a daunting task, particularly if you don’t know exactly what you’re searching for. Often the title is on the tip of your tongue, and you would know it if you saw it. Wouldn’t it be nice if search tools accommodated vague criteria just as easily as pinpoint queries? What if users could take a heap of guesses and whittle it down into a small set of relevant results?
We set out to solve this problem for WebCenter Content by leveraging a tool designed to handle massive amounts of data. Microsoft’s PivotViewer feeds off data sources and molds them into views for the end user to consume. The most popular example of this technology in action is the Netflix catalog at http://netflixpivot.cloudapp.net, where 1000 movies from Netflix Instant are pulled down and organized by year, cast, rating, and more. The applications for such powerful control over this data are clear for anyone who can’t remember the name of the movie that starred so-and-so and was released at the turn of the century. We immediately recognized the value of this within the domain of Digital Asset Management, and so we brought it to WebCenter.
PivotViewer is a control for Microsoft Silverlight, which is installed as a browser plugin much like Flash. Once it receives a collection of data that it can understand, PivotViewer organizes the data by common attributes called facets, allowing documents to be sorted and filtered on any metadata field. The thumbnail rendition is pulled in to represent the document in the canvas. Silverlight operates asynchronously, meaning that it doesn’t need to wait for every image to download before it can be used.
This control is made accessible on the main search result page. In practice, users can perform a quick search for the latest documents or use existing search methods to gather a large set of documents, and drill down from those results using PivotViewer. All that it needs is a QueryText parameter in the URL.
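To make that concrete, here is a minimal sketch of how such a link might be built; only the QueryText parameter name comes from the component, while the host and page path are hypothetical placeholders:

```python
from urllib.parse import urlencode

def pivot_search_url(base_url: str, query_text: str) -> str:
    """Build a link into the PivotViewer result page for a full-text query.

    QueryText is the one parameter the control needs; base_url is a
    hypothetical content-server path, not the component's actual one.
    """
    return f"{base_url}?{urlencode({'QueryText': query_text})}"

# Pipe a broad full-text search into PivotViewer for faceted drill-down.
print(pivot_search_url("http://ucm.example.com/cs/pivotviewer", "red background"))
```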
For example, say I was looking for a PowerPoint presentation that held an important piece of information, but I could only reliably identify it by its red background. I would first use the full-text search for fragments of content, narrowing candidates down to 200 or so results. These would be piped into PivotViewer to show two dozen red-colored documents. Using the metadata filters, I would select the Presentation document type and the date range of its release, yielding 2 documents. This process allows quick retrieval in spite of the vague search criteria, and is much more precise than wading through 10 pages of possibilities.
Selecting a document brings up a short list of content information; these fields can be customized for each distribution of the component. Each of these fields is a hyperlink that can quickly create a filter on its value. Say that a collection of documents was checked in together: by finding one document and filtering on its Release Date and Document Type, the entire collection is immediately available to me. I can also create a filter across all fields with a keyword search.
Complementing the default grid layout is a bar chart representation of results along any metadata field. This view is helpful for identifying patterns within data, allowing me to actively pivot on fields and drill down on interesting pockets of documents. Every action is recorded in a breadcrumb trail at the top of the control, so if I ever get lost, a few clicks will undo the filters I’ve added and get me back to where I was.
All of these features are packed into a content server component and ready to be installed in a few clicks. Contact our sales team at email@example.com to discuss your search needs and schedule a demo.
Senior Software Consultant AJ LaVenture recaps the new Multi-Upload and Batch Metadata Editor, just released from Fishbowl Solutions!
This software component allows you to upload and tag lots of content at once, saving time and frustration.
Hear about all the benefits now: http://bit.ly/17UG1rU
2. Numerous vendors are blending SQL and JSON management in their short-request DBMS. It will take some more work for me to have a strong opinion about the merits/demerits of various alternatives.
The default implementation — one example would be Clustrix’s — is to stick the JSON into something like a BLOB/CLOB field (Binary/Character Large Object), index on individual values, and treat those indexes just like any others for the purpose of SQL statements; a generic sketch follows the list below. Drawbacks include:
- You have to store or retrieve the JSON in whole documents at a time.
- If you are spectacularly careless, you could write JOINs with odd results.
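For illustration, here is roughly what the pattern looks like, sketched with SQLite's json1 functions rather than Clustrix's actual syntax: the document sits whole in a CLOB-style column, and an expression index on one extracted value serves ordinary SQL predicates.

```python
import sqlite3, json

# Generic illustration of JSON-in-a-CLOB with value indexing (SQLite json1
# syntax, not any particular short-request vendor's).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")  # CLOB-ish
db.execute("CREATE INDEX docs_year ON docs (json_extract(body, '$.year'))")
db.execute("INSERT INTO docs (body) VALUES (?)",
           (json.dumps({"title": "Gladiator", "year": 2000}),))

# The expression index serves this predicate like any other index would,
# but note the first drawback above: the JSON is still read and written
# a whole document at a time.
row = db.execute("SELECT body FROM docs "
                 "WHERE json_extract(body, '$.year') = 2000").fetchone()
print(json.loads(row[0])["title"])
```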
IBM DB2 is one recent arrival to the JSON party. Unfortunately, I forgot to ask whether IBM’s JSON implementation was based on IBM DB2 pureXML when I had the chance, and IBM hasn’t gotten around to answering my followup query.
3. Nor has IBM gotten around to answering my followup queries on the subject of BLU, an interesting-sounding columnar option for DB2.
4. Numerous clients have asked me whether they should be active in DBaaS (DataBase as a Service). After all, Amazon, Google, Microsoft, Rackspace and salesforce.com are all in that business in some form, and other big companies have dipped toes in as well.
I’m skeptical that one can succeed both in that market and in selling database software, for reasons including:
- Nobody I can think of has done so.
- The value propositions are different.
- DBaaS is about having administration be so easy that you, the customer, don’t need to worry about it.
- Database software is about one or more of:
- Development ease.
- Big-enterprise/legacy-vendor considerations.
I’m also skeptical about service-only DBaaS strategies, because users will naturally resist vendor lock-in.
But despite all my skepticism, DBaaS is an area I should probably learn more about.
5. I plan to spend more time looking at machine learning and other advanced analytics. I doubt they’ll soon match the past few years’ hype about “big data analytics”, but even the reality of modern analytics looks like it’s getting more interesting. Ditto if somebody has an interesting twist on more traditional predictive analytics.
6. Three years ago, I wrote:
- It is inevitable* that governments and other constituencies will obtain huge amounts of information, which can be used to drastically restrict everybody’s privacy and freedom.
- To protect against this grave threat, multiple layers of defense are needed, technical and legal/regulatory/social/political alike.
- One particular layer is getting insufficient attention, namely restrictions upon the use (as opposed to the acquisition or retention) of data.
*And indeed in many ways even desirable
It is now frighteningly obvious that the US is becoming a high-surveillance society. The Boston Marathon bombing added three new elements to an already snowballing trend:
- A revelation that the FBI could track Tamerlan Tsarnaev’s communication content without any known warrant.
- A further revelation that the police know how to put on large paramilitary displays of force (and that the public generally approves).
- An increased belief that widespread video surveillance of public places is a Good Thing.
I need to write more about privacy.
Fishbowl Solutions was recently featured on Oracle’s Blog during WebCenter Partners Week, showcasing our mobile application for iPhone/Android – FishbowlToGo. Mobility Product Manager Kim Negaard authored a post detailing how our newest mobility venture helps WebCenter customers get the most from their investment.
Access Oracle WebCenter Content on your iPhone or Android with FishbowlToGo
Fishbowl Solutions has been working with Oracle WebCenter customers since 2010 to extend WebCenter Content to mobile devices. We started working with mobile sales force enablement and have since extended our offerings to meet expanding customer needs. We are excited to announce the release of our newest mobile app, FishbowlToGo.
Read the whole blog post here: http://bit.ly/ZHLDxX
Today I would like to share something Fishbowl Solutions has been working on internally for a little while now and started to implement at customer sites – Fishbowl Multi-Upload and Batch Metadata Editor.
This component was initially developed as part of Fishbowl’s Innovation Event. The first-place and second-place ideas meshed very well together, delivering a seamless method for bulk contribution and metadata editing. It is now in production at a customer site, with several modifications made to satisfy their requirements.
This was done using a combination of several APIs: Plupload (Oracle uses an older version of this for the drag-and-drop upload feature within the WebCenter Spaces Document Explorer Taskflow), Handsontable, and jQuery, in addition to extensively utilizing my WebCenter Content and jQuery framework for calling WebCenter Content services, plus Fishbowl’s overall knowledge of WebCenter and web development techniques.
The most compelling feature that was added for this deployment is “Profile Awareness”. By this I mean all aspects of the profile and rules set up within the content server are taken into consideration. This includes, but is not limited to:
- Metadata field state (hidden, edit, info only, required, excluded)
- Custom field labels
- Standard lists and profile-restricted lists for drop-downs
- Date selection
- Default values for profiles
- Metadata field ordering if rule is set as a group
Here is a walkthrough of the features and the use case they support:
- A user has several images to upload and knows they belong within a certain profile. They navigate to the upload page and drag and drop the files into the drop area.
- All of the items are now checked into the content server, into a private workspace for that user. Within the workspace you can filter the uploaded items by keyword and categorize content by profile. (Note: you can also tag content without profiles.)
- As there are bound to be erroneous uploads of duplicates or extra files, supporting a delete function was crucial.
- The user is now ready to check the boxes for the items they want to categorize and tag with final metadata. Here we present the user with a spreadsheet within the browser. This is built using the Handsontable jQuery plugin, which supports common Excel-like features: copy/paste, undo, and cell dragging. UCM is integrated to provide a high level of context while editing this data: dropdowns, date fields, required fields, and info-only fields aid in user tagging.
- Columns have built-in support for dropdowns and dates.
- Once the user is done editing, they can execute an update. The table provides feedback in real time as each item is updated, and the result is relayed to the user via row highlighting and an error/exception table informing them of any failures.
This expands upon the use case of updating content already in the system with the spreadsheet. (Note: that use case is still supported, but locked down to administrators only.)
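For the technically curious, here is a rough sketch of what that row-by-row update loop can look like. UPDATE_DOCINFO is a standard WebCenter Content service, but the endpoint, credentials, and metadata fields below are illustrative placeholders rather than the component's actual code:

```python
import requests  # third-party HTTP library

# Hypothetical content server endpoint; adjust for your instance.
CS_URL = "http://ucm.example.com/cs/idcplg"

def update_row(session, row):
    """Push one spreadsheet row's metadata back to the content server."""
    params = {
        "IdcService": "UPDATE_DOCINFO",
        "IsJson": "1",                 # ask for a JSON response instead of HTML
        "dID": row["dID"],             # revision id of the item being updated
        "dDocName": row["dDocName"],
        "xComments": row.get("xComments", ""),
    }
    resp = session.post(CS_URL, data=params, auth=("sysadmin", "password"))
    return resp.ok

# Iterate rows gathered from the Handsontable grid, reporting status per row
# so the UI can highlight failures as they happen.
rows = [{"dID": "1501", "dDocName": "DOC-0001", "xComments": "tagged in bulk"}]
with requests.Session() as s:
    for row in rows:
        print(row["dDocName"], "updated" if update_row(s, row) else "failed")
```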
I hope this post conveys the power Fishbowl can provide by combining ideas from an innovation event with the years of experience Fishbowl has within the WebCenter Content (UCM) world to make contribution and bulk editing of content easier. For more information, feel free to reach out to us at 952-465-3400 or firstname.lastname@example.org.
Senior Software Consultant
Collaborate brought Fishbowl Solutions to Denver, Colorado this year. Overall, it was another well-coordinated and well-attended event. Special kudos go to Al Hoof and Dave Chaffee of the WebCenter SIG for IOUG (Independent Oracle Users Group). They spend a lot of time scheduling the WebCenter sessions, scanning the attendees who go to those sessions, and providing a friendly face each morning. Thanks again guys.
Here are some themes and hot topics that I picked up on this year. Some of these were discussed last year as well, but based on session attendance and booth traffic this year, these topics seemed to stand out even more and attendees dug deeper into benefits and ROI.
User Experience – Portals and Intranets
Some customers that deployed the initial versions of WebCenter Portal 11g struggled to roll it out on a large scale. Additionally, user feedback was pretty negative. Overall performance was poor and usability was marginal. Since those initial versions or patch sets, specifically PS2 and PS3, WebCenter Portal has become much more stable and usable. Customers have seen this as well, and they are now looking to evolve their initial deployments and create that next-generation intranet or portal.
One of the key considerations moving forward though is user experience. They want their portals and intranets to provide the flash or sizzle that makes them inviting, but they also want the navigation to be intuitive and the contribution capabilities to be open yet governed. They are also looking for the overall user experience to be personalized, so that users have similar yet distinct experiences that keep them coming back. Lastly, they want their portals and intranets to be that true one-stop shop that has always been the goal but has been hard to achieve. This means that they want to integrate data from other business applications, such as customer purchase history from PeopleSoft or JD Edwards, or employee expenses from E-Business Suite. The customers we talked to really stressed not only getting internal or external users to visit the intranet or portal, but also getting them to stay to consume or share information, and to keep coming back. Again, the goal customers are trying to achieve is to provide one view into the business processes or information that users need daily from one site – instead of having to jump between or open multiple applications to complete tasks or connect with others.
It was good timing for Fishbowl Solutions to be able to talk about portal and intranet use cases, and how those use cases could be further enabled or extended using our Intranet In A Box solution. WebCenter customers are no different than other enterprise application customers – they all would like a starting point and “accelerator” to begin using the system. Fishbowl’s Intranet In A Box, as detailed in American Axle’s Collaborate presentation and white paper, helps them do just that. It also incorporates and enables user experience capabilities and application integrations, providing that portal or intranet jumpstart to build an enterprise system.
Mobile Content Management
No surprise that, once again, mobility and mobile content management were popular topics at Collaborate. In fact, they have been popular topics for many years now. I remember back to Collaborate 2010, which took place around the same time that the Apple iPad was released. Fishbowl Solutions announced its mobile strategy – extending WebCenter Content to smartphones – at this event as well, and it seems the excitement for mobile ECM has been building ever since.
2013 finds Oracle, and more specifically WebCenter, customers thinking about or planning their mobile strategy as it applies to content management. This seems to be the next evolutionary step for most organizations, which is being driven in part by the rise of the tablet and other mobile devices in the workplace. See our recent Mobile Tablet Application webinar for more details on tablet usage in the workplace. What used to be more of a pull from employees – I have a tablet, where are my business-enabled mobile applications and content? – is turning into a push from the business, with governance and use case policies being put into place for mobile technologies. The reality is that the mobile-enabled employee is the more productive employee, so organizations are providing this enablement but doing so with proper control and oversight.
This applies to extending high-value sales and marketing collateral, stored in Oracle WebCenter Content, to mobile devices as well. Customers that we talked to at Collaborate were aware of Oracle’s Application Developer Framework (ADF) Mobile, which Oracle announced in October of last year. We received questions on what the differences are between that solution and our Mobile ECM offerings, including our tablet and phone apps. The easy answer is that while Oracle ADF Mobile can be used to create feature-rich, powerful mobile applications, if WebCenter customers want to consume, share and interact with WebCenter assets from their mobile devices, they would have to build such an application themselves – pretty much from scratch. Fishbowl offers packaged mobile offerings for iOS and Android, and we have customers in production with Apple iPads and Android tablets – including Banner Engineering (Collaborate presentation and white paper).
Document Imaging – ROI
The last topic I would like to mention is document imaging. I’m not sure how many document imaging sessions there were at Collaborate, but I know the two I attended were packed. Document imaging and capture technologies continue to represent sure-fire ways to reduce business process costs. The most popular process where these technologies have been applied is invoice processing. Fishbowl Solutions was fortunate to partner with Land O’ Lakes for a presentation – here is their white paper as well – on their document imaging use case, and what really resonated in their presentation was the amount of manual invoice steps they were able to eliminate with document capture and imaging.
What stood out to me most from conversations at Collaborate was how hungry WebCenter customers were to realize ROI. Having made significant investments in the WebCenter stack over the years, they were looking for projects that would produce hard-dollar, measurable ROI. That isn’t to discredit how WebCenter is being used for websites, portals, or records management, but many times these use cases represent overhead that is much harder to measure. Document Imaging, and specifically Oracle’s end-to-end invoice processing system, helps organizations reduce invoice processing costs by reducing labor costs and late fees, while also making it possible for organizations to realize early pay discounts. For these reasons, I expect to see more WebCenter customers ramp up imaging projects over the next few years.
Collaborate returns to Las Vegas next year and will be held at the Venetian. Until then, good luck with your WebCenter projects, and feel free to contact Fishbowl if you need any assistance.
Email delivery of posts has been screwed up; multiple people tell me they haven’t gotten their email for months. (In the future, please tell me of such difficulties!) So it’s time for a change, and I’m asking for your advice as to what you’d suggest for our mailing list.
Yes, I’m asking via a blog post, even though the core problem is that people who want to see my posts via e-mail aren’t getting them. Please work with me on this anyway.
My two basic questions are:
- What should be the frequency of delivery? To date, it’s been nightly (at least in theory).
- What delivery technology should be used? To date, it’s been FeedBlitz.
1. The nightly scheduling has been an artifact of an RSS-to-email link that no longer seems stable. So I’m thinking of just manually pasting each post into a list email, in which case:
- Posts could be sent without delay.
- Every post would be delivered by separate mail. (As opposed to having only one post per night be mailed, while others just get linked to.)
It’s a bit more work for me, but probably nothing dire. Does lower latency sound good to everybody?
2. The main technical options seem to be:
- Free services oriented to discussion lists, such as Yahoo Groups, but set to announce-only. These have very basic functionality.
- Commercial services oriented to marketing email lists, such as Aweber or MailChimp. Does anybody have favorable or unfavorable experience with particular services? Most vendors surely use one or another, but it’s tough to guess which they’ve selected just based on their spammy and pabulum-filled “informative” communications, given the customizability those services provide.
Any thoughts would be most welcomed.
3. And while I’m at it — what should I do for social/sharing buttons? Presumably, if I included buttons that made it easy for you to tweet links to my posts, submit them to Hacker News, etc., more of you would do so. Which specific options would you like to use?
- Google+?
- Hacker News?
Anything else? I’d like to omit the more dubious possibilities, as offering everything could be a lot of clutter …
Learn how to extend the content management capabilities of Oracle WebCenter Content to Apple iPads and Android tablets. Plan to attend this webinar to see how Fishbowl’s Mobile Library Tablet Application enables users to find, store, view, organize and share content from tablets while away from their desks. We will discuss the underlying technologies that make this app possible, as well as how these applications are able to securely deliver targeted content for sales reps and other field workers.
Tablets for Sales Enablement
Mobile Content Management can increase the effectiveness of customer, prospect and partner meetings by enabling sales professionals to access and share information directly from their iPads or Android tablets with clients during sales meetings. Sharing high-value collateral from mobile devices, as opposed to PowerPoint presentations or paper brochures, can increase the engagement level for such meetings, eliminate the need for paper catalogs, and make the sales professional more efficient.
Tablets for Field Workers & Procedure Management
The Fishbowl Mobile Library can provide field workers such as contractors, maintenance crews and construction teams with offline access to project plans, standard procedures, or installation instructions. This means your team is never without the information they need to resolve issues, keep projects moving, and respond appropriately to disasters.
Thursday May 2, 2013
1pm EST, 12pm CST
My quick reaction to the Actian/ParAccel deal was negative. A few challenges to my views then emerged. They didn’t really change my mind.
Amazon did a deal with ParAccel that amounted to:
- Amazon got a very cheap license to a limited subset of ParAccel’s product …
- … so that it could launch a service called Amazon Redshift.
- Amazon also invested in ParAccel.
Some argue that this is great for ParAccel’s future prospects. I’m not convinced.
No doubt there are and will be Redshift users, evidently including Infor. But so far as I can tell, Redshift uses very standard SQL, so it doesn’t seed a ParAccel market in terms of developer habits. The administration/operation story is similar. So outside of general validation/bragging rights, Redshift is not a big deal for ParAccel.
OEMs and bragging rights
It’s not just Amazon and Infor; there’s also a MicroStrategy deal to OEM ParAccel — I think it’s the real ParAccel software in that case — for a particular service, MicroStrategy Wisdom. But unless I’m terribly mistaken, HP Vertica, Sybase IQ and even Infobright each have a lot more OEMs than ParAccel, just as they have a lot more customers than ParAccel overall.
This OEM success is a great validation for the idea of columnar analytic RDBMS in general, but I don’t see where it’s an advantage for ParAccel vs. the columnar leaders.
As I admitted in the comment thread to my first Actian/ParAccel post, I’m confused about what kind of concurrent usage ParAccel can really support. The data I have, e.g. in the link immediately above, is not conclusive. Googling suggests that VectorWise was at one user per core a couple of years ago, supportive of my hypothesis that it doesn’t have some big concurrency edge on ParAccel. But to repeat — I don’t really know.
DBMS acquisitions in the past
My history blog on DBMS acquisitions yielded more favorable examples than I was expecting. (Of course, I omitted a lot of small and boring failures.) And DBMS conglomerates are the rule more than the exception, with IBM, Sybase, Teradata and Oracle all adopting acquisition-aided multi-DBMS strategies, at least to some extent.
That said, Sybase is the main example of a vendor of a slow-growth DBMS (Adaptive Server Enterprise) doing well with a faster-growing one (Sybase IQ). Perhaps not coincidentally, Actian’s latest management team draws significantly on Sybase. So yes; ParAccel is now owned by a company run by guys who know something about selling columnar DBMS.
But the whole thing would be more convincing if Ingres had shown more life under Actian’s ownership, or indeed at any point in the past 20 years. My bottom line is that Actian was floundering badly in the DBMS market 1 1/2 years ago, and not a lot of favorable news has emerged in the interim — except, quite arguably, for the management changes and acquisitions themselves.
Actian, which already owns VectorWise, is also buying ParAccel. The argument for why this kills VectorWise is simple. ParAccel does most things VectorWise does, more or less as well. It also does a lot more:
- ParAccel scales out.
- ParAccel has added analytic platform capabilities.
- I don’t know for sure, but I’d guess ParAccel has more mature management/plumbing capabilities as well.
One might conjecture that ParAccel is bad at highly concurrent, single-node use cases, and VectorWise is better at them — but at the link above, ParAccel bragged of supporting 5,000 concurrent connections. Besides, if one is just looking for a high-use reporting server, why not get Sybase IQ?? Anyhow, Actian hasn’t been investing enough in VectorWise to make it a major market player, and they’re unlikely to start now that they own ParAccel as well.
But I expect ParAccel to fail too. Reasons include:
- ParAccel’s small market share and traction.
- The disruption of any acquisition like this one.
- My general view of Actian as a company.
2 years after being acquired, Vertica — which conceptually has always been ParAccel’s closest competitor — has finally taken major hits on engineering staffing. Even so, I expect HP Vertica to reopen what was once a large technology and momentum gap vs. ParAccel.
My views on Actian start:
- Actian is attempting to build a database software conglomerate on the cheap, starting with Ingres, ParAccel, VectorWise, Pervasive (itself a small conglomerate) and Versant.
- Actian hasn’t accomplished much with Ingres, its original acquisition.
- Actian hasn’t accomplished much with VectorWise.
- Actian’s brief, embarrassing pivot away from database software was a joke. (The comments at that link also show VectorWise’s positioning as very different in September, 2011 than it is now.)
- I’ve had some very bad experiences with Actian management, although it seems to have largely turned over since then.
- I can’t identify the folks to make this work at the acquired pieces either (even though I think well of a few of them, e.g. Mike Hoskins and Rick Glick).
I.e., building a database conglomerate is hard, and Actian isn’t up to the challenge.
Actian has three main paths it can follow for synergy:
- Acquire a lot of pieces and flip the whole thing for more money to a foolish buyer. This strategy worked splendidly for Autonomy, and to some extent for Sybase as well. But it’s a longshot, and not necessarily a win for customers even if investors do well.
- Sell a bunch of disparate products through the same sales force. Tough to execute. And at best it raises sales coverage up to the level of that for the most successful product — and Actian doesn’t really have successful new products.
- Integrate the technologies. Blech. You don’t integrate DBMS with wildly different architectures, as Informix died trying in the 1990s.
I don’t see enough opportunity there for the whole thing to work out, with sales synergy being the best opportunity to prove me wrong.
I talk with a lot of companies, and repeatedly hear some of the same application themes. This post is my attempt to collect some of those ideas in one place.
1. So far, the buzzword of the year is “real-time analytics”, generally with “operational” or “big data” included as well. I hear variants of that positioning from NewSQL vendors (e.g. MemSQL), NoSQL vendors (e.g. Aerospike), BI stack vendors (e.g. Platfora), application-stack vendors (e.g. WibiData), log analysis vendors (led by Splunk), data management vendors (e.g. Cloudera), and of course the CEP industry.
Yeah, yeah, I know — not all the named companies are in exactly the right market category. But that’s hard to avoid.
Why this gold rush? On the demand side, there’s a real or imagined need for speed. On the supply side, I’d say:
- There are vast numbers of companies offering data-management-related technology. They need ways to differentiate.
- Doing analytics at short-request speeds is an obvious data-management-related challenge, and not yet comprehensively addressed.
2. More generally, most of the applications I hear about are analytic, or have a strong analytic aspect. The three biggest areas — and these overlap — are:
- Customer interaction
- Network and sensor monitoring
- Game and mobile application back-ends
Also arising fairly frequently are:
- Algorithmic trading
- Risk measurement
- Law enforcement/national security
- Stakeholder-facing analytics
I’m hearing less about quality, defect tracking, and equipment maintenance than I used to, but those application areas have anyway been ebbing and flowing for decades.
3. Much of customer interaction revolves around recommendation and personalization. In connection with that I’ll remind you:
- Multiple sources say that 5 millisecond response is a real need. Srini Srinivasan explained why in a January comment.
- The results of the recommendation and personalization can be delivered in many different ways — product recommendations, ads, special offers, email, snail mail, call center scripts and more. This is the paradigmatic example for my skepticism about complete analytic applications.
4. Networks and sensors emit the epitome of machine-generated data. Data sources include web logs, network logs (in the IT sense), telecommunication networks, other utilities (e.g. electric), vehicle fleets, and more. Application themes include:
- Human monitoring, via some kind of real-time business intelligence view. I hear about that a lot.
- Various kinds of automated response. (Security is an obvious example.)
- Integration with other kinds of application, data source, or use case.
As one example of the last point, Oliver Ratzesberger told me years ago that eBay had up-to-the-minute BI cubes integrating customer response and log data, for the purpose of quickly detecting technology problems. Acunu recently told me that similar applications are one of their sales focuses.
5. In another example, games and mobile applications can be a lot like websites in terms of the analytics that support them (all the more so if we’re talking about games with in-app purchases). Two special features come up repeatedly, however — leaderboards for games, and geospatial data sent by mobile devices.
6. Algorithmic trading is flashy because of the sums of money involved, and because of what is often hyper-low latency; I’ve even heard 50 microseconds, and that’s a slightly out of date figure for a sequence of several atomic operations. But otherwise it’s not one of the more interesting areas to me, for at least two reasons:
- It depends on a lot of latency-specific stuff, such as hand-crafted hardware.
- The participants are secretive — understandably so, as they’re literally in a race with each other — and don’t reveal much.
Another reason I don’t study it much is that high-frequency trading could be devastated at any time by some simple regulatory changes.
7. I finally figured out one of the big drivers for better risk analysis. Banks need to keep capital lying around to cover a fraction of the risk they take on. If they can estimate the risk more precisely, and come up with a lower number, then they need to keep less capital. If the requirement is, say, 8% of estimated risk, shaving $1 billion off the estimate frees up $80 million. That’s a lot like finding large bags of money.
8. Anti-fraud applications arise in many industries, with many different kinds of data and latency requirement. For example:
- Insurers don’t want to pay bogus claims. They usually have weeks to think about that problem.
- Telcos don’t want to provision services for customers who will defraud them. They have to decide at call-center speed.
- Similarly, retailers don’t want to accept bogus returns.
- Stockbrokers don’t want rogue traders to defeat their controls. A lot of data and analysis go into that mission, as billions of dollars — literally — can be at stake.
9. And finally, the recent Boston Marathon bombing has brought law-enforcement/anti-terrorism applications to the fore. The Boston Globe criticized difficulties in information sharing, but the money quote is:
The FBI followed up by checking government databases and looking for things such as “derogatory telephone communications, possible use of online sites associated with the promotion of radical activity, associations with other persons of interest, travel history and plans, and education history,” according to FBI Supervisory Agent Jason J. Pack. “The FBI also interviewed Tamerlan Tsarnaev and family members. The FBI did not find any terrorism activity.”
Neither the telephone intercept nor the web-surfing tracking is a capability the government routinely admits, unless there was something like a wiretap order that I so far haven’t seen reported.
- Government surveillance is even more inevitable than when I wrote in 2010 that freedom can only be preserved by limiting government USES of data.
- Stakeholder-facing analytics isn’t much better understood than when I wrote about it in 2010.
- I wrote up a different list of analytic use cases back in 2006.
- The continued drop in high-frequency trading latency strengthens my 2009 contrast between the speed of a turtle and the speed of light; we’re now over a 3 * 10^10 difference between the speed of trading and the speed of generic planning, and many turtles walk well faster than 1 cm/sec.
The third of my three MySQL-oriented clients I alluded to yesterday is MemSQL. When I wrote about MemSQL last June, the product was an in-memory single-server MySQL workalike. Now scale-out has been added, with general availability today.
MemSQL’s flagship reference is Zynga, across 100s of servers. Beyond that, the company claims (to quote a late draft of the press release):
Enterprises are already using distributed MemSQL in production for operational analytics, network security, real-time recommendations, and risk management.
All four of those use cases fit MemSQL’s positioning in “real-time analytics”. Besides Zynga, MemSQL cites penetration into traditional low-latency markets — financial services (various subsectors) and ad-tech.
Highlights of MemSQL’s new distributed architecture start:
- There are two kinds of MemSQL node — “aggregator” and “leaf”.
- Aggregators are a kind of head node. You can have a bunch of them.
- Leafs run full single-server MemSQL. You can have a bunch of them too.
- MemSQL has two query optimizers. One kind runs on the aggregator nodes, and thinks about the whole cluster. The other runs on the leafs, and only thinks about its own node.
- Much of the join and aggregation work is done on the aggregator nodes, but I didn’t pursue that issue in much detail.
- It is good policy — and supported — to replicate small dimension/reference tables across the cluster. These are replicated to aggregator and leaf nodes alike. (This tells us that some joins are indeed done on the leafs.)
- MemSQL replication can be synchronous or asynchronous. It can be used for high availability.
- MemSQL writes (whether primary or replicated) go to a buffer. The buffer size can be 0 or positive, in a tradeoff of durability vs. the likelihood of a disk I/O bottleneck.
- MemSQL has many virtual nodes on each physical (leaf) node. (This is pretty much an industry-standard best practice, as it helps with elasticity, recovery from node failure, and so on.)
- Compression is still a future feature.
- So is online schema change.
- Leaf nodes have cost-based optimizers.
- MemSQL’s aggregator (cluster-wide) optimizer is mainly heuristic, but is supposed to get more cost-based in future releases.
- In some releases it will be possible to keep MemSQL running while upgrading the software. But that’s not a promise for releases that change how replication works.
And which not-easily-parallelized aggregate did MemSQL implement first? The same one Platfora did — COUNT DISTINCT.
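Here is a toy illustration of why that aggregate resists naive parallelism (my sketch, not MemSQL's code): per-leaf distinct counts can't simply be summed, so each leaf has to ship its set of distinct values (or a compact sketch of it) to an aggregator for merging.

```python
# Why COUNT(DISTINCT x) is hard to parallelize: the same value may appear
# on several leaves, so leaf-level counts can't just be added up.

leaves = [
    ["red", "blue", "red"],   # rows living on leaf 1
    ["blue", "green"],        # rows living on leaf 2
]

# Wrong: summing per-leaf distinct counts double-counts "blue".
naive = sum(len(set(rows)) for rows in leaves)             # 4

# Right: each leaf ships its distinct *set*; the aggregator merges them.
merged = set().union(*(set(rows) for rows in leaves))
exact = len(merged)                                        # 3

print(naive, exact)
```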
Last week, I edited press releases back-to-back-to-back for three clients, all with announcements at this week’s Percona Live. The ones with embargoes ending today are Tokutek and GenieDB.
Tokutek’s news is that they’re open sourcing much of TokuDB, but holding back hot backup for their paid version. I approve of this strategy — “doesn’t lose data” is an important feature, and well worth paying for.
I kid, I kid. Any system has at least a bad way to do backups — e.g. one that involves slowing performance, or perhaps even requires taking applications offline altogether. So the real points of good backup technology are:
- To keep performance steady.
- To make the whole thing as easy to manage as possible.
GenieDB is announcing a Version 2, which is basically a performance release. So in lieu of pretending to have much article-worthy news, GenieDB is taking the opportunity to remind folks of its core marketing messages, with catchphrases such as “multi-regional self-healing MySQL”. Good choice; indeed, I wish more vendors would adopt that marketing tactic.
Along the way, I did learn a bit more about GenieDB. In particular:
- GenieDB is now just backed by a hacked version of InnoDB (no more Berkeley DB Java Edition).
- Why hacked? Because GenieDB appends a Lamport timestamp to every row, which somehow leads to a need to modify how indexes and caching work. (A generic sketch of the Lamport-clock idea follows this list.)
- Benefits of the change include performance and simpler (for the vendor) development.
- An arguable disadvantage of the switch is that GenieDB no longer can use Berkeley DB’s key-value interface — but MySQL now has one of those too.
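For readers unfamiliar with the concept, here is a minimal sketch of the Lamport-timestamp idea (illustrative only; GenieDB's actual row format and conflict-resolution rules are their own):

```python
# Each replica keeps a counter, bumps it on every local write, and
# fast-forwards past any timestamp it receives. Tagging each row with
# (timestamp, origin) lets concurrent updates to the same row be ordered
# deterministically across regions, with no synchronized clocks needed.

class Replica:
    def __init__(self, name):
        self.name, self.clock, self.rows = name, 0, {}

    def write(self, key, value):
        self.clock += 1
        self.rows[key] = (self.clock, self.name, value)  # stamp rides with the row

    def receive(self, key, stamped):
        ts, origin, value = stamped
        self.clock = max(self.clock, ts) + 1             # fast-forward past sender
        # Last-writer-wins; the replica name breaks exact timestamp ties.
        if key not in self.rows or (ts, origin) > self.rows[key][:2]:
            self.rows[key] = stamped

us, eu = Replica("us"), Replica("eu")
us.write("k", "from US")
eu.write("k", "from EU")
eu.receive("k", us.rows["k"])
print(eu.rows["k"])  # (1, 'us', 'from US'): the tuple comparison picked a winner
```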
I also picked up some GenieDB company stats I didn’t know before — 9 employees and 2 paying customers.
Teradata is announcing its new high-end systems, the Teradata 6700 series. Notes on that include:
- Teradata tends to get 35-55% (roughly speaking) annual performance improvements, as measured by its internal blended measure Tperf. A big part of this is exploiting new-generation Intel processors.
- This year the figure is around 40%.
- The 6700 is based on Intel’s Sandy Bridge.
- Teradata previously told me that Ivy Bridge — the next one after Sandy Bridge — could offer a performance “discontinuity”. So, while this is just a guess, I expect that next year’s Teradata performance improvement will beat this year’s.
- Teradata has now largely switched over to InfiniBand.
Teradata is also talking about data integration and best-of-breed systems, with buzzwords such as:
- Teradata Unified Data Architecture.
- Fabric-based computing, even though this isn’t really about storage.
- Teradata SQL-H.
The upshot is that Teradata has at least 6 kinds of rack or cabinet it wants to sell you — along with software to connect them — of which it really thinks you should get at least 3:
- The 4 main Teradata-software appliances:
- Active Enterprise Data Warehouse (the new 6700). Teradata thinks every sufficiently large enterprise should have one of these.
- Extreme Performance Appliance (Teradata 4xxx), based on solid-state drives (which are also used in the 6xxx systems). At least I think so; the 4xxx wasn’t in the most recent slide deck I saw.
- Data Warehouse Appliance (Teradata 2700).
- Extreme Data Appliance (Teradata 1650).
- The Teradata Aster Big Analytics Appliance, running Aster and Hadoop software. Teradata basically thinks everybody should have one of these too.
- A separate cabinet for special-purpose “Teradata Managed Servers”. While there’s some space for Managed Servers in other Teradata appliances, Teradata now offers so many such capabilities that it thinks you will likely need a separate rack for those as well. These include (partial list):
- Viewpoint system management.
- Teradata Unity.
- Data movement, which is not the same thing as Teradata Unity.
- Data loading, which is yet something else.
- Generic compute (notably, to run SAS).
Even that doesn’t exhaust the possibilities:
- The 36 InfiniBand ports Teradata can fit into a cabinet aren’t always enough, it suggests, so it presumably will sell you free-standing Mellanox switches as an alternative.
- That slide deck split the Big Analytics Appliance back out into Aster and Hadoop options.
- There also seems to be a SAS-specific modeling appliance.
And you can have — or in some cases must have — Teradata Managed Server nodes in other kinds of Teradata appliance.
Finally, Teradata also offers a stand-alone single- or several-node Teradata 670 Data Mart Appliance, notes on which include:
- The Teradata 670's entry price is under $1/2 million, if you want to use it as your first Teradata system (something that evidently is happening, mainly outside the Americas).
- Another use for the Teradata 670 is for physical — as opposed to virtual — data mart spin-out.
- The primary use for the Teradata Data Mart Appliance, however, seems to be test/development for larger Teradata systems.
- The Teradata Data Mart Appliance is one of the options for placing in a separate managed-server Teradata rack.
As vendors so often do, Teradata has caused itself some naming confusion. SQL-H was introduced as a facility of Teradata Aster, to complement SQL-MR.* But while SQL-MR is in essence a set of SQL extensions, SQL-H is not. Rather, SQL-H is a transparency interface that makes Hadoop data responsive to the same code that would work on Teradata Aster …
*Speaking of confusion — Teradata Aster seems to use the spellings SQL/MR and SQL-MR interchangeably.
… except that now there’s also a SQL-H for regular Teradata systems as well. While it has the same general features and benefits as SQL-H for Teradata Aster, the details are different, since the underlying systems are.
I hope that’s clear.
I talked Friday with Deep Information Sciences, makers of DeepDB. Much like TokuDB — albeit with different technical strategies — DeepDB is a single-server DBMS in the form of a MySQL engine, whose technology is concentrated around writing indexes quickly. That said:
- DeepDB’s indexes can help you with analytic queries; hence, DeepDB is marketed as supporting OLTP (OnLine Transaction Processing) and analytics in the same system.
- DeepDB is marketed as “designed for big data and the cloud”, with reference to “Volume, Velocity, and Variety”. What I could discern in support of that is mainly:
- DeepDB has been tested at up to 3 terabytes at customer sites and up to 1 billion rows internally.
- Like most other NewSQL and NoSQL DBMS, DeepDB is append-only, and hence could be said to “stream” data to disk.
- DeepDB’s indexes could at some point in the future be made to work well with non-tabular data.*
- The Deep guys have plans and designs for scale-out — transparent sharding and so on.
*For reasons that do not seem closely related to product reality, DeepDB is marketed as if it supports “unstructured” data today.
Other NewSQL DBMS seem “designed for big data and the cloud” to at least the same extent DeepDB is. However, if we’re interpreting “big data” to include multi-structured data support — well, only half or so of the NewSQL products and companies I know of share Deep’s interest in branching out. In particular:
- Akiban definitely does. (Note: Stay tuned for some next-steps company news about Akiban.)
- Tokutek has planted a small stake there too.
- Key-value-store-backed NuoDB and GenieDB probably lean that way. (And SanDisk evidently shut down Schooner’s RDBMS while keeping its key-value store.)
- VoltDB, Clustrix, ScaleDB and MemSQL seem more strictly tabular, except insofar as text search is a requirement for everybody. (Edit: Oops; I forgot about Clustrix’s approach to JSON support.)
Edit: MySQL has some sort of an optional NoSQL interface, and hence so presumably do MySQL-compatible TokuDB, GenieDB, Clustrix, and MemSQL.
Also, some of those products do not today have the transparent scale-out that Deep plans to offer in the future.
Among the 10 people listed as part of Deep Information Sciences’ team, I noticed 2 who arguably had DBMS industry experience, in that they worked at virtualization vendor Virtual Iron, and stayed on for a while after Virtual Iron was bought by Oracle. One of them, Chief Scientist & Architect Tom Hazel, also was at Akiban for a few months, where he did actually work on a DBMS. Other Deep Information Sciences notes include:
- Deep has 25 or so people in all.
- Deep had a recent $10 million funding round.
- Deep Information Sciences is the former Cloudtree, which as of February, 2011 was pursuing quite a different strategy. (Evidently there was a pivot.) Deep was founded in 2010.
- There are 2 paying customers for DeepDB, even though it’s still in beta, and 8 trials. A similar number of trials and strategic partners are queued up.
- DeepDB general availability is expected later this quarter.
Although our call was blessedly technical, we didn’t have a chance to go through the DeepDB architecture in great detail. That said, DeepDB seems to store data in all of 3 ways:
- An in-memory row store.
- An on-disk row store with a very different architecture.
- Indexes, which can also serve as a column store.
Notes on that include:
- DeepDB’s in-memory row store is designed to manage single rows as much as possible, rather than pages. Indeed, there are “aspects of tries”, although we didn’t drill down into what exactly that meant.
- Indexes are streamed to disk no less than once every 15 seconds, by default, and perhaps with latency as low as 10 milliseconds.
- Perhaps the most important point I didn’t grasp is “segments”. The data and indexes on disk are stored in segments, which can be of different sizes, and which may each carry some summary data/metadata/whatever. Somehow, this is central to DeepDB’s design.
- In what is evidently a design focus, DeepDB tries to get the benefit of “in-memory data” that isn’t actually taking up RAM. B-trees can point at rows that aren’t actually in memory. Segments evicted from cache can leave some metadata or summary data behind.
- DeepDB’s compression story seems to be a work in progress.
- There’s prefix compression already, at least in the indexes, which Deep just calls “compaction” (a generic sketch follows this list).
- Other compression is working in the lab, but not scheduled for Version 1.0.
- Block compression seems to be in play.
- Delta compression was mentioned once.
- Dictionary compression wasn’t mentioned at all.
- DeepDB apparently will keep compressed data in cache, then decompress it to operate on it.
- Different segments can be compressed/uncompressed differently.
- DeepDB’s on-disk row store is append-only. Time-travel is being worked on. While I forgot to ask, it seems likely that DeepDB has MVCC (Multi-Version Concurrency Control).
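As a generic illustration of the prefix compression ("compaction") point above, and emphatically not DeepDB's implementation: sorted index keys can be stored as (shared-prefix length, suffix) pairs relative to their predecessors, which pays off when sorted neighbors share long prefixes.

```python
def compress(sorted_keys):
    """Encode each key as (chars shared with predecessor, remaining suffix)."""
    out, prev = [], ""
    for key in sorted_keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        out.append((shared, key[shared:]))
        prev = key
    return out

def decompress(entries):
    """Rebuild the full keys by replaying shared prefixes."""
    keys, prev = [], ""
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys

keys = ["customer/1001", "customer/1002", "customer/1010", "customers/1"]
packed = compress(keys)
print(packed)  # [(0, 'customer/1001'), (12, '2'), (11, '10'), (8, 's/1')]
assert decompress(packed) == keys
```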
And finally: DeepDB in its current form is a “drop-in” InnoDB replacement, but not necessarily bug-compatible.
I have been using the analogy that sometimes getting WebCenter projects started, progressed or completed is like climbing a mountain. Customers aren’t always sure where to begin, how to stay on path, or what obstacles may lie ahead. Most customers seem to want to evolve their WebCenter use cases, say from standard content management to an enterprise portal, but not knowing such things as the amount of effort required, technical complexities, and deployment options tends to keep such projects at the base of the proverbial WebCenter mountain.
What better place to start your trek up that mountain than Denver, Colorado – site of Collaborate 13. Fishbowl Solutions will be there, and we would enjoy discussing your WebCenter projects and how we might assist in helping those projects get started, progressed and completed – avoiding the cliffs and jagged rocks along the way. We would also like to share with you some new and exciting ways that your trek can be made easier through our value-add WebCenter solutions. Here is a quick description of the solutions we will highlight at Collaborate 13:
- Mobile Applications: Access WebCenter Content on Apple and Android mobile devices
- Google Search Appliance Connector: Improve the relevancy of search results across your WebCenter-based systems
- Intranet In A Box: Framework to Build a Next-Generation Intranet in 60 Days
- WebCenter Upgrade Package: Comprehensive plan to move to 11g
These solutions will be demonstrated in our booth – #1277 – and will be discussed across our six presentations. Be sure to check out our Collaborate 13 page for all the details on our Collaborate activities. We look forward to helping you start your WebCenter ascent at Collaborate 13.
Last year, back in February, we had PS5, and now with PS6 of the WebCenter Suite released yesterday, I can say it’s all just getting better and better!
A rundown of the new JDev 11.1.1.7.0 enhancements can be seen here.
Here are some of the items that caught my eye; you may have seen them on the Twitter stream in a couple of early tweets before the official release.
First up is the new Skyros skin (I’m presuming it’s named after the Greek island).
It’s a very clean, great-looking skin; it uses a lot of CSS3 properties instead of hundreds of images to structure components such as tabs, and it degrades nicely for older browsers (IE8 and below), e.g. rounded corners become square.
There are also a few new skin selector properties that tidy up the structure for better skinning development – I’ll try to post some updates later on to give you a rundown of some of the new skinning enhancements.
You can see it in action in the new PS6 ADF Faces Rich Client here.
There are a few DVT extras like Sunburst, although I’m sorry to say I’ve never been too impressed with the DVTs you get out of the box.
PanelGridLayout makes it across from R2 into R1 PS6 and looks promising.
It follows the CSS3 specs for grid layout, so it can be optimized for layout performance, and it is also the recommended UI layout component for most pages.
The runtime code editor: finally, colour-coded goodness!
I believe it’s using CodeMirror – great job.
The File Uploader also looks interesting; I haven’t tried it out yet. Drag-and-drop support with a Java fallback for older browsers is intriguing – looking forward to seeing it in action.
Hmm. I probably should have broken this out as three posts rather than one after all. Sorry about that.
Discussions of DBMS performance are always odd, for starters because:
- Workloads and use cases vary greatly.
- In particular, benchmarks such as the YCSB or TPC-H aren’t very helpful.
- It’s common for databases or at least working sets to be entirely in RAM — but it’s not always required.
- Consistency and durability models vary. What’s more, in some systems — e.g. MongoDB — there’s considerable flexibility as to which model you use.
- In particular, there’s an increasingly common choice in which data is written synchronously to RAM on 2 or more servers, then asynchronously to disk on each of them. Performance in these cases can be quite different from when all writes need to be committed to disk. Of course, you need sufficient disk I/O to keep up, so SSDs (Solid-State Drives) can come in handy. (A write-concern sketch follows this list.)
- Many workloads are inherently single node (replication aside). Others are not.
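That RAM-synchronous, disk-asynchronous choice maps directly onto MongoDB's write concerns, for example. A small pymongo sketch (the connection string is a hypothetical placeholder): w=2 waits for two replica-set members to acknowledge the write in memory, while j=False declines to wait for the on-disk journal.

```python
from pymongo import MongoClient, WriteConcern

# Hypothetical replica set; substitute your own hosts and set name.
client = MongoClient("mongodb://db.example.com:27017/?replicaSet=rs0")

# w=2: block until two members hold the write in RAM.
# j=False: don't wait for the journal to reach disk.
events = client.get_database("app").get_collection(
    "events",
    write_concern=WriteConcern(w=2, j=False),
)
events.insert_one({"type": "click", "user": 42})  # returns once 2 nodes have it
```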
MongoDB and 10gen
I caught up with Ron Avnur at 10gen. Technical highlights included:
- MongoDB’s tunable consistency seems really interesting, with numerous choices available at the program-statement level.
- All rumored performance problems notwithstanding, Ron asserts that MongoDB often “kicks butt” in actual proof-of-concept (POC) bake-offs.
- Ron cites “12 different language bindings” as a key example of developer functionality giving 10gen an advantage vs. Ron’s previous employer MarkLogic.
- 10gen is working hard on management tools, security, and so on.
- Ron claims that the “MongoDB loses data” knock is a relic of the distant — i.e. 1-2 years ago — past.
- We had the same “Who needs joins?” discussion that I used to have with MarkLogic — Ron’s former company — and which MarkLogic has since disavowed.
- There’s nothing special about MongoDB’s b-tree indexes. (I mention that because Tokutek thinks it offers a faster MongoDB indexing option.)
While this wasn’t a numbers-oriented conversation, business highlights included:
- A lot of MongoDB’s competition is RDBMS — Oracle, SQL Server, MySQL, etc.
- MongoDB’s top NoSQL competitor is Cassandra. 10gen sees less Couchbase than before, and also less HBase than Cassandra.
- There’s yet another favorable MongoDB soft metric — 50,000 registrants for free online education, 2/3 outside the US.
I can add that anecdotal evidence from other industry participants suggests there’s a lot of MongoDB mindshare.
Specific traditional-enterprise use cases we discussed focused on combining data from heterogeneous systems. Specifically mentioned were:
- Reference data/360-degree customer view.
- Reference data about securities.
- Aggregation of analytic results from various analytic systems across an enterprise (for risk management).
DBAs’ roles in development
A lot of marketing boils down to “We don’t need no stinking DBAs!!!” I’m thinking in particular of:
- Hadoop and/or exploratory BI* messaging that positions against the alleged badness of “traditional data warehousing”.
*See in particular the comments to that post.
The worst-case data warehousing scenario is indeed pretty bad. It could feature:
- Much internal discussion and politicking to determine the One True Way to view various data fields, with …
- … lots of ongoing bureaucratic safeguards in the area of data governance.
- Long additional efforts in the area of performance tuning.
- Data integration projects up the wazoo.
But if the goal is just to grab some data from an existing data warehouse, perhaps add in some additional data from the outside, and start analyzing it — well, then there are many attempted solutions to that problem, including from within the analytic RDBMS world. The question is whether the data warehouse administrators try to help — which usually means “Here’s your data; now go away and stop bothering me!” — or whether they focus on “business prevention”.
Meanwhile, on the NoSQL side:
- The smart folks at WibiData felt the need for schema-definition tools over HBase.
- Per Ron Avnur, MongoDB users are clamoring for consistency-rule specification via an administrative (rather than programmatic) UI.
It’s the old loose-/tight-coupling trade-off. Traditional relational practices offer a clean interface between database and code, but bundle the database characteristics for different applications tightly together. NoSQL tends to tie the database for any one app tightly to that app, at the cost of difficulties if multiple applications later try to use the same data. Either can make sense, depending on (for example):
- How it seems natural to organize your development and data administration talent.
- Whether the app is likely to survive long enough that you’ll want to run many other applications against the same database.