Dylan's BI Notes
My notes about Business Intelligence, Data Warehousing, OLAP, and Master Data Management
Updated: 8 hours 51 min ago
Predictive Analytics and AI/ML
We have heard about advanced analytics, which is often described by dividing analytics into three main types. While many articles explore these concepts, such as "Prescriptive vs. Predictive Analytics: Examples & Use Cases", I aim to approach these categories from the standpoint of a software vendor and describe how they relate to machine learning practices. The […]
Categories: BI & Warehousing
Semantic Layer for Data Scientist
I recently read a good buyer’s guide from AtScale: The Buyer’s Guide to the Best Semantic Layer Tools for Data and Analytics. I think the buyer’s guide is fair, not because the company I worked for also has such a semantic layer, but because I really feel that the drawback of the vendor-specific semantic layer […]
Categories: BI & Warehousing
Cloud Database and Cloud DataLake
The term DataLake was invented to describe the data storage and the fact that, after Hadoop and HDFS were introduced, you can have a cheaper way and place to store your data without using a traditional database. By traditional, I mean an RDBMS (relational database management system). Cheaper is not just about cost; it is […]
Categories: BI & Warehousing
ML Data Engineering and Feature Store
A typical ML process flow is: load the data; explore and clean the data; create features; create the ML model; deploy the ML model for inference/prediction. The problem with this flow is that it ignores the fact that the process has to be repeatable and the data needs to be reused. In the real world, […]
Categories: BI & Warehousing
Getting Data into Cloud
When I worked on data warehousing technologies, we extracted the data from the source. The “Extract” is the first step in ETL (or ELT). The extraction was typically done using a SQL connection to the database that holds the transactional data. When we started introducing cloud-based storage, or the Data Lake, many of […]
Categories: BI & Warehousing
Time Series
A time series is defined as a series of data, typically the values of a variable that may change over time. A set of statistical methods was developed for analyzing such data. Those methods help to understand and interpret the data, and once the data is understood, the model can be used […]
Categories: BI & Warehousing
Migrating from OBI to Incorta
I am sharing my experience of migrating from OBI to Incorta. Process: start with the Incorta EBS Blueprint; configure and customize for the deploying company; optionally, demo the Fusion Connector; preview and demo to business users using their own data; provide the existing OBI dashboard usage analysis to help prioritize the replacement project; provide the lineage […]
Categories: BI & Warehousing
Oracle App Cloud and Incorta
OTBI is great. But when people are migrating from Oracle EBS to Oracle Cloud App and would like to view the data from both EBS and Oracle Cloud, Incorta becomes a cost-saving, quick-to-implement solution that does not require implementing a data warehouse. Incorta is not a data warehouse, although it does have the data […]
Categories: BI & Warehousing
Scalable Distributed BI Architecture
Incorta, a scalable distributed BI system...
Categories: BI & Warehousing
Is ETL still necessary?
ETL stands for Extract, Transform, and Load. The very existence of Extract and Load implies that the source data and target data are stored separately, so you need to extract from the source and load the data into the target data store. Extract and Load won’t go away if the data used for reporting is not stored […]
Categories: BI & Warehousing
Is Star Schema necessary?
A star schema describes the data by fact and dimension. From one angle, it is a data modeling technique for designing a data warehouse based on relational database technology. In the old OLAP world, even though a cube also links to the dimensions that describe the measure, we typically would not call it a star schema. […]
Categories: BI & Warehousing
Incremental ETL : Streaming via Micro-Batch
A modern analytic application takes the approach of streaming data to perform a process similar to traditional data warehousing incremental ETL. Actually, if we look into Spark Streaming in detail, the concept of streaming in Spark and incremental ETL are the same: Spark Streaming is micro-batch based streaming. Each micro-batch is much like […]
Categories: BI & Warehousing
Incremental ETL – The last refresh date
There are multiple ways to model the last refresh date. In OBIA, DAC and Informatica based ETL, the last refresh date is maintained within DAC. It is maintained at the level of the source tables that populate the data. Oracle BI DAC User Guide > About Refresh Dates and DAC’s Incremental Load Strategy In OBIA […]
Categories: BI & Warehousing
Use Bit to represent groups
Here I am providing an alternative approach to supporting group membership in MySQL. It is a commonly seen requirement that a group may have multiple members and a person may be added to multiple groups. This many-to-many relationship is typically modeled in an intersection table. When the group membership is being used as […]
Categories: BI & Warehousing
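The excerpt above is truncated, so the following is not the author's MySQL implementation. It is only a minimal Python sketch of the general idea of encoding group membership as bits in one integer instead of rows in an intersection table. The group names and bit positions are hypothetical.

```python
# Hypothetical groups: each group is assigned one bit position (an assumption
# made for illustration, not taken from the original post).
GROUP_BITS = {"finance": 0, "sales": 1, "hr": 2}


def add_to_group(mask: int, group: str) -> int:
    """Return the membership mask with the group's bit switched on."""
    return mask | (1 << GROUP_BITS[group])


def remove_from_group(mask: int, group: str) -> int:
    """Return the membership mask with the group's bit switched off."""
    return mask & ~(1 << GROUP_BITS[group])


def is_member(mask: int, group: str) -> bool:
    """True if the group's bit is set in the mask."""
    return bool(mask & (1 << GROUP_BITS[group]))


def shares_group(mask_a: int, mask_b: int) -> bool:
    """True if two membership masks have at least one group in common."""
    return (mask_a & mask_b) != 0
```

A single integer column can then hold all of a person's memberships, at the cost of a fixed upper bound on the number of groups (one per bit).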
Schema On Read?
I first saw “create external table” in Oracle DBMS 11g. It was created for the purpose of loading data. When Hive was introduced, a lot of data had already been created in HDFS. Hive was introduced to provide a SQL interface on these data. Using the external table concept is a natural part of the design. […]
Categories: BI & Warehousing
Preserve Surrogate Key During Upgrade
The generated surrogate key is used everywhere in the data warehouse. What do we do during an upgrade? Here are some approaches: 1. Full Refresh: You can perform a full refresh of the data warehouse. The surrogate keys will be regenerated. The FK will be updated. Obviously, this is not a good approach. There are problems […]
Categories: BI & Warehousing
Unified Data Model or Not
Do we need to store the data all together in the same place? Do we need to use the same data model? Do we need to put data into the cloud? Storing the data in a central place is not necessary, as nowadays I do not really know where the data are stored. If we talk […]
Categories: BI & Warehousing
How to – Incremental ETL
This is a very basic topic. It is an ETL 101 question that comes up a lot in interviews. Even as we move to different storage and different processing frameworks, the concepts are still important. The idea is simple: you do not need to keep extracting and updating all data in the data store that are […]
Categories: BI & Warehousing
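As a rough illustration of the incremental idea in the excerpt above (which is truncated, so this is not the author's procedure), here is a minimal Python sketch that only processes rows changed since the last refresh date. The row layout (`id`, `last_update`) and the function name are assumptions for illustration.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_refresh):
    """Upsert only rows changed after last_refresh; return the new refresh date.

    source_rows: iterable of dicts with hypothetical keys "id" and "last_update".
    target: dict keyed by id, standing in for the target table.
    """
    new_refresh = last_refresh
    for row in source_rows:
        # Skip rows that have not changed since the previous run.
        # Note: a strict ">" assumes no row arrives later with a timestamp
        # equal to the stored refresh date.
        if row["last_update"] > last_refresh:
            target[row["id"]] = row  # insert or overwrite (upsert)
            new_refresh = max(new_refresh, row["last_update"])
    return new_refresh


# Example: the first run loads everything, the second run only the new row.
rows = [
    {"id": 1, "last_update": datetime(2024, 1, 1), "amount": 10},
    {"id": 2, "last_update": datetime(2024, 1, 5), "amount": 20},
]
warehouse = {}
refresh_date = incremental_load(rows, warehouse, datetime.min)
rows.append({"id": 3, "last_update": datetime(2024, 1, 9), "amount": 30})
refresh_date = incremental_load(rows, warehouse, refresh_date)
```

The refresh date returned by one run is stored and passed to the next run, so each run touches only the rows modified in between.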
Use Surrogate Key in Data Warehouse
Using surrogate keys is part of the dimensional modeling technique for populating a data warehouse using a relational database. The original idea was to generate sequence-based IDs and use them between the fact and dimension tables, so we can avoid using concatenated strings or composite keys to join. Also, due to […]
Categories: BI & Warehousing
Prebuilt BI Contents should replace BI Tools
Most school districts need the same kinds of reports and dashboards for measuring the performance of students, teachers, and schools. They do not really need IT to build reports for them if the vendors can provide the reports OOTB. There is hardly any need for a custom reporting tool for building […]
Categories: BI & Warehousing