I have decided to start a series of blog posts on Oracle's in-database SQL analytics. The aim of this series of posts is to provide some historical background to show how SQL analytics has evolved and then start to explore some of the core concepts behind our SQL analytics followed by an examination of some of the key features and functions. The reason for creating this series of posts is that in talking to customers and partners it is clear to me that many project teams are just not aware of all the SQL features inside the Oracle Database that support enhanced reporting, data sampling, advanced aggregations, statistical analysis, pattern matching and spreadsheet-like data modelling.
While many teams opt to use external analytical engines and/or use bespoke application code, Oracle believes that the most efficient way to process data is: a)inside the database (taking the analytics to the data) and b)using SQL as the analytical language. The following post provides some historical background to the evolution of our in-database analytics and explains why using SQL as the default development language for analytics delivers so many important benefits.
Flexible and Comprehensive Development Framework
Oracle’s in-database analytical SQL offers an agile analytical development platform that enhances both query performance and developer productivity. It supports all types of processing, ranging from real-time analysis to interactive decision support to operational batch reporting. Using Oracle Database both corporate developers and independent software vendors can take advantage of a rich set of SQL based analytical features within their projects and applications.
First some historical background information. Let’s look at how the list of SQL in-database analytics has evolved over time:
In versions 5, 6 and 7 we laid down the basic foundations for SQL analytics with functions for aggregations and advance data processing. The development of what we today call SQL Analytics really started with the release of Oracle 8i. In this release we we extended the aggregation capabilities by creating the multi-dimensional-like CUBE and ROLLUP features along with analytic window functions (more on this in a future blog post).
As part of Oracle 9i we introduced the concept of grouping sets, hypothetical ranking (the rank or percentile that a row would hypothetically have if inserted into a specified data set which is useful for what-if analysis), first/last functions and the ability to create inverse distribution.
With 10g we added the SQL model clause that brings a spreadsheet-like approach to data processing (more on this in a future post). We added advanced analytical processing in the form of data mining features and functions.
In 11g we added the pivot/unpivot features that allows you to create spreadsheet-like crosstab reports directly from a table using simple SQL. We extended the capabilities of the window functions and we delivered the recursive WITH syntax that provides more flexibility for recursive processing compared to the CONNECT BY clause. And we added more data mining features and functions.
With 12c we have introduced generic top-n filtering along with big data features such as pattern matching. Obviously this is just a high level summary of core areas of functionality that we have delivered over the last 15-20 years. Here is a more detailed view of what is available in Oracle Database 12c - remember all these features and functions are delivered free of charge, ready to use straight out the box:
As you can see there is a very large library of free SQL analytical functions available for DBAs and developers to incorporate into their workflows and applications.
What are the advantages of using SQL as the foundation for delivering sophisticated analytics?
The key benefits provided by Oracle’s in-database analytical SQL are:
- Enhanced developer productivity
- Improved manageability
- Minimized learning effort
- Investment protection (through ANSI SQL compliance)
- Increased performance
1) Enhanced developer productivity
2) Improved Manageability
SELECT person, region, sales FROM(SELECT person, region, sales,SUM (sales) OVER (PARTITION BY region) r_salesSUM (sales) OVER () t_salesFROM sales_table)WHERE r_sales > 0.5 * t_sales AND sales > 0.18 * r_sales;
3) Minimized Learning Effort
4) ANSI SQL compliance
5) Improved performance
Using SQL to solve business problems
- Who are the top ten sales-reps in each region?
- We can use the Rank function
- What is the 90-day moving average of stock price?
- We can create a moving average by using the Window feature
- What is the percentage growth of Jan-2010 sales over Jan-2009?
- We can use a period-over-period comparisons
- What are January’s sales as a percentage of the entire year’s?
- Compare aggregates on different levels
What is next...
- Enhanced reporting
- Data Sampling
- Advanced aggregations
- Statistical analysis
- Pattern matching
- Spreadsheet-like data modelling