Dylan Wan
Data Mining Scoring Development Process
I think that the process of building a data mining scoring engine is similar to developing an application.
It has the same phases: requirement analysis, functional design, technical design, coding, testing, deployment, and so on.
Data Warehouse for Big Data: Scale-Up vs. Scale-Out
Found a very good paper: http://research.microsoft.com/pubs/204499/a20-appuswamy.pdf
This paper discusses whether using Hadoop as the analytics infrastructure is the right approach.
It is hard to argue with the industry trend. However, Hadoop is not new anymore. It is time for people to calm down and rethink its real benefits.
Technologies behind Oracle Transactional Business Intelligence (OTBI)
Oracle Transactional Business Intelligence (OTBI) is one of the business intelligence solutions provided as part of Fusion Applications.
To build real-time BI, the major challenge is making sure that it performs well and has no, or minimal, interference with the core objective of the transactional application: online processing.
This is the reason why we need Oracle Business Intelligence Applications (OBIA) for Fusion Applications. The idea is to keep only the minimal processing of detecting and capturing changes in the transactional system, and to leave everything else, such as preparing and consolidating the data for reporting, to BI Applications.
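To illustrate this division of labor, here is a minimal sketch of timestamp-based incremental change capture, the simplest form of the idea. The OrderRow shape and the lastUpdateDate column are invented for illustration; this is not how Fusion Applications or BI Applications actually implement change capture.

import java.sql.Timestamp;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Minimal sketch of timestamp-based change capture: the transactional
// system only pays the cost of stamping rows on update; everything
// downstream (transform, conform, aggregate) happens outside it.
public class ChangeCapture {

    record OrderRow(long id, Timestamp lastUpdateDate) {}

    // Return only the rows touched since the last extract ran.
    static List<OrderRow> changedSince(List<OrderRow> rows, Timestamp lastExtract) {
        return rows.stream()
                   .filter(r -> r.lastUpdateDate().after(lastExtract))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Timestamp lastExtract = Timestamp.from(Instant.parse("2013-06-01T00:00:00Z"));
        List<OrderRow> rows = List.of(
            new OrderRow(1, Timestamp.from(Instant.parse("2013-05-30T10:00:00Z"))),
            new OrderRow(2, Timestamp.from(Instant.parse("2013-06-02T09:30:00Z"))));
        System.out.println(changedSince(rows, lastExtract)); // only row 2 is extracted
    }
}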
Here are some of the technologies available to make OTBI possible:
1. SQL Trimming from ADF
ADF stands for Application Development Framework. It is the application development framework used in developing Fusion Applications. In general, it is a declarative, metadata-driven framework that lets application developers define the data model, the data access layer, and the UI rendering, and put validation logic and processing in the middle tier.
The underlying data model, in most cases, is still the relational model defined in the Fusion Apps transactional database under a third normal form (3NF) design.
The key enabling technology ADF provides to OTBI is the “Composite VO,” or “Composite View Object.” For me, its value is that it can generate the database SQL for us based on the metadata. Unlike the database technology of building database views, the ADF engine can look further down into the entity objects (EOs) included in the view object and selectively choose which entities are needed in a given SQL. If the view object includes two tables (EOs), a primary EO for data at the line level and another EO for getting the parent data, and the query (Composite VO) does not include any column from the parent EO, the SQL generated by ADF will not include the parent table in the join.
This is superior to the old approach of building business views.
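To make the join-trimming behavior concrete, here is a toy sketch in plain Java. This is not the ADF API; the table and column names are made up, and a real Composite VO derives this from metadata rather than hand-written code.

import java.util.List;
import java.util.Set;

// Toy illustration of the join-trimming idea behind Composite VOs:
// the parent table is joined only when the query actually projects
// one of its columns.
public class JoinTrimmer {

    static final Set<String> PARENT_COLUMNS = Set.of("h.ORDER_STATUS", "h.CUSTOMER_ID");

    static String buildSql(List<String> projected) {
        boolean needsParent = projected.stream().anyMatch(PARENT_COLUMNS::contains);
        StringBuilder sql = new StringBuilder("SELECT ")
            .append(String.join(", ", projected))
            .append(" FROM ORDER_LINES l");
        if (needsParent) {
            sql.append(" JOIN ORDER_HEADERS h ON h.HEADER_ID = l.HEADER_ID");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Only line-level columns: the parent table is trimmed away.
        System.out.println(buildSql(List.of("l.LINE_ID", "l.AMOUNT")));
        // A parent column appears: the join is added back.
        System.out.println(buildSql(List.of("l.LINE_ID", "h.ORDER_STATUS")));
    }
}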
If you are a Java programmer and would like to get a feel for what a Composite View Object looks like and how it works, here is a good blog post:
Do you know what is a Composite View Object?
2. BI Platform – ADFQuery to Composite VO
This enabling technology is provided by the BI platform and is available as a Java library. It adds a layer on top of the ADF composite VO: without our writing any Java code, it creates the composite VO on the fly. It allows us to query data from the ADF engine by sending it an XML block called ADFQuery.
This doc shows some of the ADFQuery XML blocks.
http://docs.oracle.com/cd/E15586_01/fusionapps.1111/e20836/adf.htm#BIEMG3435
To see better examples, look in the NQQuery.log files.
ADFQuery is a query language like SQL: there is a section for the column projection, the join criteria expressed through view links, and the filter expressed through view criteria.
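The sketch below prints a schematic ADFQuery-style block with those three sections. The element names are illustrative only, not the exact ADFQuery schema; consult the documentation linked above, or the NQQuery.log files from your own environment, for real examples.

// Schematic only: the XML element names below are invented to show the
// three sections (projection, view-link joins, view-criteria filters),
// not the real ADFQuery schema.
public class AdfQuerySketch {
    public static void main(String[] args) {
        String adfQuery = """
            <ADFQuery>
              <!-- column projection -->
              <Projection>
                <Attribute>
                  <Name>InvoiceAmount</Name>
                  <ViewObject>InvoiceVO</ViewObject>
                </Attribute>
              </Projection>
              <!-- join criteria expressed through view links -->
              <JoinSpec>
                <ViewLink>InvoiceToSupplierVL</ViewLink>
              </JoinSpec>
              <!-- filter expressed through a view criteria -->
              <Filter>
                <ViewCriteria>
                  <Item attribute="InvoiceStatus" operator="=" value="OPEN"/>
                </ViewCriteria>
              </Filter>
            </ADFQuery>
            """;
        System.out.println(adfQuery);
    }
}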
Here are other enabling technologies behind OTBI.
3. ADFQuery generation from BI Server
4. SQL Bypass Database
5. Relational to Dimensional Mapping (Physical Layer to Business Model Layer)
6. SELECT Physical in initialization block
7. ADFQuery Initialization block
8. Physical Lookup function from BI platform
9. Logical Lookup function from BI platform
10. Data Security enabled at the VO layer via Fusion AppCore
11. Applcore Tree Flattening
12. Applcore Business Intelligence Column Flatten VO (BICVO)
13. BI Flexfield VO generator
14. BI Extender via Import Wizard
15. BI View Object created based on the BI EE Logical SQL (BIJDBC)
16. Effective Date VO with as of date filter
17. ADF Application Module to BI variable interface
and more…
Regardless, the goal of these technologies is to give users real-time data access to the Fusion Apps. There is little we can do to provide features like data snapshots, pre-built aggregates, multiple currencies, data consolidation and conformance, and cross-subject-area analysis, and, most important, to make queries with complex logic perform in a reasonable time without interfering with the transactional system.
Using Load Plan for managing your ETL task in BI Apps 11.1.1.7.1 (1)
One of the major changes introduced in BI Apps 11.1.1.7.1 is the way we manage the ETL task sequence and trim unnecessary tasks.
This functionality was earlier accomplished using DAC. The problem we frequently faced was that the DAC repository and the Informatica (INFA) repository are maintained as two separate repositories: we had to keep the task names exactly in sync in order to use DAC to manage the execution of the Informatica workflow tasks.
The Load Plan and the Load Plan Generator were designed to address this requirement.
Here is a good article that describes the story.
Load Plan Generator – An Inside Look

BI Apps 11.1.1.7.1 (BI Apps in ODI) is available on OTN
OTBI vs. OBIA
Why we do not use PowerConnect to access PeopleSoft Tree
1. It does not allow you to pass parameters to the PeopleSoft connection. This may change later, but it was a big issue when we tried to address customer issues.
2. It requires EFFDT as an option and expects people to change the EFFDT using the Mapping Editor. How can a business user do that every month?
3. It asks for a tree name. Many PeopleSoft tree structures support multiple trees; a tree is just the header of a hierarchy. Whenever you add a new tree, you need to create a new mapping!
Given these demands (all of these requirements came from customers), it does not make sense to use PowerConnect.
We had no choice but to stop using it.
Why do we not report by the Gregorian calendar?
- The number of days is different in each calendar month.
- The week and month cannot be aligned. The number of weekends is different in each calendar month.
- The number of working days is different in each calendar month. It ends up that the number of days in each quarter is also different.
- The period closing day falls on a different day of the week in each period. The accounting department prefers to always close the period on a given day of the week, such as Wednesday or Friday.
13 Period Calendar
In this posting, I will discuss the following topics:
* What is the 13 period calendar?
* Who uses the 13 period calendar?
* How is it different from the 4-4-5 calendar?
A small sketch of the period structure follows.
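As background for the first question: a 13 period calendar divides the year into 13 periods of exactly four weeks (28 days), or 364 days in total, so every period starts on the same day of the week; an extra 53rd week is added every few years to catch up with the solar year. Here is a minimal sketch of that structure; the start date is arbitrary and the 53rd-week rule is omitted.

import java.time.LocalDate;

// Sketch of a 13-period calendar: 13 periods x 4 weeks x 7 days = 364
// days, so every period starts on the same day of the week. (Real
// fiscal calendars add a 53rd week every few years to catch up with
// the 365/366-day solar year; that rule is omitted here.)
public class ThirteenPeriodCalendar {
    public static void main(String[] args) {
        LocalDate yearStart = LocalDate.of(2013, 1, 7); // a Monday, chosen for illustration
        for (int period = 1; period <= 13; period++) {
            LocalDate start = yearStart.plusDays((period - 1) * 28L);
            LocalDate end = start.plusDays(27);
            System.out.printf("Period %2d: %s .. %s (starts on %s)%n",
                period, start, end, start.getDayOfWeek());
        }
    }
}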
Essbase and IBM DB2
I read an interesting article, IBM DB2 Minus OLAP, in SQL Server magazine. Essbase was OEM-ed and re-branded by IBM as IBM DB2 OLAP Server for ten years. The relationship ended two years ago.
Many DB2 customers actually built their custom analytics applications on top of Essbase.
Oracle BI Applications and Embedded BI, Part II
This is a topic I wrote about six months ago. In Part I of this series of articles, I mentioned that a warehouse-like architecture is required in a heterogeneous environment. I want to elaborate more on this. In future posts, I will also describe the integration technology I learned about for supporting embedded BI.
Key Roles involved in a BI Data Warehouse Project
To develop or deploy a BI solution for your organization, you need to have the right people involved at the right time. Here are the typical roles involved in a BI data warehouse project.
- Project Sponsor
- Project Manager
- Functional Analyst
- SME (Subject Matter Expert)
- BI Architect
- ETL Developers
- DBA
The job description and responsibilities are listed in this table:
DSS and BI
I found a very old book, called Decision Support Systems: An Organizational Perspective, in a library last weekend. It was written by Peter Keen, an author of several popular books that have helped many business managers and users understand the value of information technology. His DSS book drew my attention because he is also the author of my textbook, Network in Actions.
The DSS book uses a very typical and conventional categorization system, which puts IT systems into three types:
Transactional Systems, Structured Decision Systems, and Decision Support Systems.
These categories are based on the classification of decisions into structured, unstructured, and partially structured decisions. His focus is the third category, DSS. Peter believes that a DSS should assist in solving semi-structured problems. A DSS should support, not replace, the managers.
I feel that the above is a very good framework for viewing the role of an analytics app. A BI analytics application should be a DSS solution. However, BI analytics apps can do much more than just decision support; BI may also help with structured decision making.
BI is not just a collection of reports. The design of a BI analytics app needs to consider what business decisions need to be made and what kind of information is helpful for making those decisions.
Data Warehouse Project Lifecycle
Here is the typical lifecycle of a data warehouse deployment project:
0. Project Scoping and Planning
Project Triangle - Scope, Time, and Resource.
- Scope - Determine the scope of the project: what would you like to accomplish? This can be defined by the questions to be answered, the number of logical stars, and the number of OLTP sources.
- Time - What is the target date for the system to be available to the users?
- Resource - What is our budget? What are the role and profile requirements of the resources needed to make this happen?
1. Requirement
- What are the business questions? How can the answers to these questions change business decisions or trigger actions?
- What are the roles of the users? How often do they use the system? Do they do any interactive reporting, or do they just view predefined reports in guided navigation?
- How do you measure? What are the metrics?
2. Front-End Design
- The front-end design needs to cover both interactive analysis and the designed analytics workflow.
- How does the user interact with the system?
- What is their analysis process?
3. Warehouse Schema Design
- Dimensional modeling - define the dimensions and facts, and define the grain of each star schema (see the sketch after this list).
- Define the physical schema, depending on the technology decision. If you use relational technology, design the database tables.
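As a generic illustration of dimensional modeling and grain, here is a sketch of a star schema expressed as plain Java records: one fact table at order-line grain carrying a foreign key to each dimension plus additive measures. All names are invented and not tied to any particular warehouse model.

// Illustrative star schema at order-line grain: each fact row carries
// one foreign key per dimension plus additive measures.
public class StarSchema {
    record DateDim(int dateKey, String calendarDate, String fiscalPeriod) {}
    record CustomerDim(int customerKey, String customerName, String segment) {}
    record ProductDim(int productKey, String productName, String category) {}

    // Grain: one row per order line.
    record SalesFact(int dateKey, int customerKey, int productKey,
                     double quantity, double amount) {}

    public static void main(String[] args) {
        SalesFact f = new SalesFact(20130107, 42, 7, 2.0, 199.98);
        System.out.println(f);
    }
}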
4. OLTP to data warehouse mapping
- Logical mapping - table-to-table and column-to-column mapping. Also define the transformation rules.
- You may need to perform OLTP data profiling. How often does the data change? What is the data distribution?
- ETL design - include data staging and the detailed ETL process flow (a simplified sketch follows this list).
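Here is a toy sketch of that staged flow: extract and land the source rows as-is, transform them by resolving natural keys to warehouse surrogate keys, then load. A real ETL tool such as Informatica or ODI expresses this declaratively; all names here are invented for illustration.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Toy staging flow for one mapping: extract source rows, stage them
// unchanged, apply the transformation rules, then load the target.
public class StagedEtl {

    record SourceRow(String customerCode, double amount) {}
    record TargetRow(int customerKey, double amount) {}

    public static void main(String[] args) {
        // 1. Extract + stage: land source rows as-is.
        List<SourceRow> staging = List.of(
            new SourceRow("C001", 100.0),
            new SourceRow("C002", 250.0));

        // 2. Transform: resolve natural keys to warehouse surrogate keys.
        Map<String, Integer> keyLookup = Map.of("C001", 1, "C002", 2);
        List<TargetRow> target = staging.stream()
            .map(r -> new TargetRow(keyLookup.get(r.customerCode()), r.amount()))
            .collect(Collectors.toList());

        // 3. Load: here, just print what would be inserted.
        target.forEach(System.out::println);
    }
}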
5. Implementation
- Create the warehouse and ETL staging schema
- Develop the ETL programs
- Create the logical to physical mapping in the repository
- Build the end user dashboard and reports
6. Deployment
- Install the analytics reporting and ETL tools.
- Perform the specific setup and configuration for the OLTP, ETL, and data warehouse systems.
- Sizing of the system and database
- Performance Tuning and Optimization
7. Management and Maintenance of the system
- Ongoing support of the end-users, including security, training, and enhancing the system.
- You need to monitor the growth of the data.