Wednesday, 28 January 2015

January Edition of Oracle DW and Big Data Magazine

 

Oracle Data Warehouse and Big Data Magazine JANUARY Edition for Customers + Partners


The latest edition of our monthly data warehouse and big data magazine for Oracle customers and partners is now available. It brings together all the most important announcements and videos from our data warehouse and big data product management blogs, Oracle press releases, and videos posted on the Oracle Media Network and Oracle Facebook pages. Click here to view the JANUARY Edition.


Please share this link http://flip.it/fKOUS with colleagues and via your social media sites.


This magazine is optimized for display on tablets and smartphones using the Flipboard app, which is available from the Apple App Store and the Google Play store.





Thursday, 22 January 2015

MATCH_RECOGNIZE and the Optimizer

If you have already been working with the new 12c pattern matching feature then you will probably have spotted some new keywords appearing in your explain plans. Essentially, there are four new keywords that you need to be aware of:
  • MATCH RECOGNIZE
  • SORT
  • BUFFER
  • DETERMINISTIC FINITE AUTO
The first three bullet points are reasonably obvious (at least I hope they are!) but just in case… the keywords MATCH RECOGNIZE refer to the row source for evaluating the MATCH_RECOGNIZE clause. The SORT keyword means the row source sorts the data before running it through the state machine to find the matches.
 
The last keyword is the most interesting one and is linked to the use of a “state machine”, as mentioned in the previous sentence. Its presence or absence affects the performance of your pattern matching query. The importance of this keyword is based on the way that pattern matching is performed: to search for a pattern containing a specific set of events we build something called a “state machine”. At this point I will turn to Wikipedia for a definition of a state machine:
 
…a mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time; the state it is in at any given time is called the current state. It can change from one state to another when initiated by a triggering event or condition; this is called a transition…
 
 
The classic example of a state machine is a traffic light, which moves through a given sequence of events in a set order and always in that order: Red -> Red & Amber -> Green -> Amber. The traffic light model can also be viewed as a “deterministic” state machine. This implies that for every state there is exactly one transition for a given input, i.e. it is not possible to have two different transitions leading out of a particular state. With our traffic light state model it is clear that given a red light state there is only one next transition, which is to red & amber.
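Just to make this concrete, here is a sketch of how that traffic light sequence could be expressed as a pattern matching query. The table traffic_log and its columns log_time and colour are made-up names purely for illustration; note that every state appears exactly once in the PATTERN clause, so this is exactly the kind of deterministic pattern we are discussing:

SELECT *
FROM traffic_log
MATCH_RECOGNIZE (
   ORDER BY log_time
   MEASURES RED.log_time AS cycle_start
   ONE ROW PER MATCH
   PATTERN (RED REDAMBER GREEN AMBER)
   DEFINE
      RED      AS RED.colour      = 'RED',
      REDAMBER AS REDAMBER.colour = 'RED_AMBER',
      GREEN    AS GREEN.colour    = 'GREEN',
      AMBER    AS AMBER.colour    = 'AMBER'
);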
 
Let’s use our normal stock ticker sample schema that tracks stock market prices for three ticker symbols.
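If you do not already have this schema, a minimal version looks something like the following (the column types are my assumption and the sample rows are just for illustration):

CREATE TABLE Ticker (
   symbol VARCHAR2(10),
   tstamp DATE,
   price  NUMBER
);

-- a few sample rows for one symbol
INSERT INTO Ticker VALUES ('ACME', DATE '2011-04-01', 12);
INSERT INTO Ticker VALUES ('ACME', DATE '2011-04-02', 17);
INSERT INTO Ticker VALUES ('ACME', DATE '2011-04-03', 19);

Now let’s look at two very similar pattern matching queries: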
 
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
   PARTITION BY symbol ORDER BY tstamp
   MEASURES STRT.tstamp AS start_tstamp,
            LAST(DOWN.tstamp) AS bottom_tstamp,
            LAST(UP.tstamp) AS end_tstamp
   ONE ROW PER MATCH
   AFTER MATCH SKIP TO LAST UP
   PATTERN (STRT DOWN* UP*)
   DEFINE
      DOWN AS DOWN.price < PREV(DOWN.price),
      UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;

SELECT *
FROM Ticker
MATCH_RECOGNIZE (
   PARTITION BY symbol ORDER BY tstamp
   MEASURES STRT.tstamp AS start_tstamp,
            LAST(DOWN.tstamp) AS bottom_tstamp,
            LAST(UP.tstamp) AS end_tstamp
   ONE ROW PER MATCH
   AFTER MATCH SKIP TO LAST UP
   PATTERN (STRT DOWN UP)
   DEFINE
      DOWN AS DOWN.price < PREV(DOWN.price),
      UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;


Match Recognize keywords in Optimizer explain plan for the first query, PATTERN (STRT DOWN* UP*)


Match Recognize keywords in Optimizer explain plan for the second query, PATTERN (STRT DOWN UP)

 

Note that the key difference between the two SQL statements is the PATTERN clause. The first statement checks for zero or more instances of two different events: 1) where the price in the current row is less than the price in the previous row and 2) where the price in the current row is more than the price in the previous row. The second statement checks for only one instance of each down-up event. This difference in the definition of the pattern results in different explain plans, where the plan for the second statement includes the key phrase “DETERMINISTIC FINITE AUTO”.

The phrase “DETERMINISTIC FINITE AUTO” means that the state machine we constructed is deterministic and thus, when running the sorted rows through the state machine, we do not do any backtracking (I will write a separate blog post on this topic very soon as it is a key concept in pattern matching; for the moment I will simply point you to the Wikipedia page on backtracking, where I personally found the section headed “Description of the method” the most useful). The key benefit of a “DETERMINISTIC FINITE AUTO” plan is that execution is more efficient because no backtracking is needed.
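If you want to check which plan shape you are getting for your own queries, the standard EXPLAIN PLAN workflow is all you need. Here is a quick sketch using a trimmed version of the second query from above:

EXPLAIN PLAN FOR
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
   PARTITION BY symbol ORDER BY tstamp
   MEASURES STRT.tstamp AS start_tstamp
   ONE ROW PER MATCH
   AFTER MATCH SKIP TO LAST UP
   PATTERN (STRT DOWN UP)
   DEFINE
      DOWN AS DOWN.price < PREV(DOWN.price),
      UP AS UP.price > PREV(UP.price)
) MR;

-- display the plan and look for the DETERMINISTIC FINITE AUTO keywords
-- on the MATCH RECOGNIZE line
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());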

When we analyze the PATTERN clause and build the corresponding state machine we are able to detect a deterministic finite automaton by checking the state machine. If any state has two or more outgoing transitions then we regard the state machine as non-deterministic; similarly, if any final state is followed by a non-final state then the state machine is regarded as non-deterministic. At the moment we can only detect a few trivial cases such as PATTERN (A B C), PATTERN (A B+), PATTERN (A B*), etc.
 
The first example of these patterns that we can detect is shown above (see the second statement, where we have the STRT DOWN UP pattern) and the other two examples of these types of deterministic patterns are shown below:
 
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
   PARTITION BY symbol ORDER BY tstamp
   MEASURES STRT.tstamp AS start_tstamp,
            LAST(DOWN.tstamp) AS bottom_tstamp,
            LAST(UP.tstamp) AS end_tstamp
   ONE ROW PER MATCH
   PATTERN (STRT DOWN UP+)
   DEFINE
      DOWN AS DOWN.price < PREV(DOWN.price),
      UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;

SELECT *
FROM Ticker
MATCH_RECOGNIZE (
   PARTITION BY symbol ORDER BY tstamp
   MEASURES STRT.tstamp AS start_tstamp,
            LAST(DOWN.tstamp) AS bottom_tstamp,
            LAST(UP.tstamp) AS end_tstamp
   ONE ROW PER MATCH
   PATTERN (STRT DOWN UP*)
   DEFINE
      DOWN AS DOWN.price < PREV(DOWN.price),
      UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;

Match Recognize keywords in Optimizer explain plan for PATTERN (STRT DOWN UP+)


Match Recognize keywords in Optimizer explain plan for PATTERN (STRT DOWN UP*)

 
For PATTERN (A | B) or PATTERN (A B+ C) we simply regard the state machine as non-deterministic; therefore, the explain plans only contain the keywords MATCH RECOGNIZE (SORT) as shown below:
 
SELECT *
FROM Ticker
MATCH_RECOGNIZE (
   PARTITION BY symbol ORDER BY tstamp
   MEASURES STRT.tstamp AS start_tstamp,
            LAST(DOWN.tstamp) AS bottom_tstamp,
            LAST(UP.tstamp) AS end_tstamp
   ONE ROW PER MATCH
   PATTERN (STRT | DOWN | UP)
   DEFINE
      DOWN AS DOWN.price < PREV(DOWN.price),
      UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;

SELECT *
FROM Ticker
MATCH_RECOGNIZE (
   PARTITION BY symbol ORDER BY tstamp
   MEASURES STRT.tstamp AS start_tstamp,
            LAST(DOWN.tstamp) AS bottom_tstamp,
            LAST(UP.tstamp) AS end_tstamp
   ONE ROW PER MATCH
   PATTERN (STRT DOWN* UP)
   DEFINE
      DOWN AS DOWN.price < PREV(DOWN.price),
      UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.start_tstamp;

Match Recognize keywords in Optimizer explain plan for PATTERN (STRT | DOWN | UP)

Match Recognize keywords in Optimizer explain plan for PATTERN (STRT DOWN* UP)
 
 
Within the current version of 12c (12.1.2) we do not check the mutual exclusiveness of the DEFINE predicates when detecting a deterministic state machine; therefore, the execution plan defaults to a MATCH RECOGNIZE (SORT) style plan, where we may or may not have to use backtracking. Obviously, as we continue to develop the MATCH_RECOGNIZE feature we will expand our ability to detect a deterministic state machine, which means we will process your pattern more efficiently.
 
In summary, if you want the most efficient execution plan then try to define your pattern in such a way that we are able to create a deterministic state machine. This assumes, of course, that backtracking is not needed within each partition/data set in order to identify the required pattern (more on this in my next blog post).
  
Hope this information is useful. If you have any questions then feel free to contact me directly (keith.laker@oracle.com).
  


Wednesday, 10 December 2014

Analytical SQL scripts now on Github

After this year’s OpenWorld I ran a 1-day workshop on analytical SQL for our Data Warehouse and Big Data Global Leaders customers. This was part of the Global Leaders ‘Days with Development’ programme. We had a packed room at the Oracle Conference Center and I was very lucky to have Stew Ashton, Technical Architect, BNP Paribas, Dr. Holger Friedrich, CTO at sumIT AG, and Joerg Otto, Head of DB Engineering, IDS GmbH, co-present with me and explain how they were using analytical SQL in their projects.

 The workshop covered the following topics:

  • Analytic SQL concepts and foundations
  • Analytic functions for reporting
  • Analytic functions for aggregation
  • More advanced and new 12c features: Pattern Matching
  • SQL Model clause

For the workshop I created a comprehensive slide deck (I will post the presentation shortly on our OTN home page) which included code samples and explain plans to highlight the key benefits of using our analytical SQL features and functions. The great news is that I now have a repository for analytical SQL code samples on Github. To kick things off I have posted all the SQL scripts that I created for this workshop, so you can now download and work through a series of use cases that explain how to use window functions, intelligently aggregate data, manipulate rows and columns of data, find patterns and create what-if scenarios. Below is my repository home page where you can download the code:

Github Repository for Analytical SQL

 

So what is Github?

At a simple level, it is an online version control system (and a lot more!) that stores and manages modifications to code script files within a central repository. Its key benefit is that it makes it very easy for developers to work together on a common project. This environment makes it easy to download a new version of code scripts, make changes, and upload the revisions to those files. Everyone can then see these new changes, download them, and contribute. This system is very popular with developers so we have decided to join this community and make our SQL scripts available via this site. It is the ability to “collaborate” which is most important for me.

To help you get started there is a great section on the website called “Set Up Git”. If like me you are using a Mac then GitHub has a Mac client! You can use it without ever touching the command line interface (which can be a little frustrating at times!).

You can contribute too!

It would be great if you could contribute your own scripts to this repository so I can build up a library of scripts for analytical SQL. All you need to do is create an account on Github, search for the analytical SQL repository and then either download the repository as a zip file or use the “Clone in Desktop” option. What I want to do is build up a series of well documented use cases and when we have enough content then I will create industry specific folders to help organize the content.

So my new repository is now live, please have a look and feel free to upload your own scripts that show how you have used analytical SQL to solve specific business problems within your projects. Looking forward to lots of files arriving into this great new repository. 

 


Friday, 5 December 2014

Part 5: X-Charging for Sandboxes

This is the next part in my on-going series of posts on the topic of how to successfully manage sandboxes within an Oracle data warehouse environment. In Part 1 I provided an overview of sandboxing (key characteristics, deployment models) and introduced the concept of a lifecycle called BOX’D (Build, Observe, X-Charge and Drop). In Part 2 I briefly explored the key differences between data marts and sandboxes. Part 3 explored the Build-phase of our lifecycle and Part 4 explored the Observe-phase, so we have now arrived at the X-Charge part of our model.

To manage the chargeback process for our sandbox environment we are going to use the new Enterprise Manager 12c Cloud Management pack. For more information, visit the EM home page on OTN.

Why charge for providing sandbox services? The simple answer is that placing a price or cost on a service ensures that the resources are used wisely. If a project team incurs zero costs for their database environment then there is no incentive to evaluate the effectiveness of the data set, and the cost-benefit calculation for the project is skewed by the lack of real-world cost data. This type of approach is the main reason why sandbox projects evolve over time into “production” data marts. Even if the project is not really delivering on its expected goals there is absolutely no incentive to kill the project and free up resources. Therefore, by not knowing the cost, it is impossible to establish the value.

The benefit of metering and x-charging is that it enables project teams to focus on the real value of their analysis. If all analysis is free then it is almost impossible to quantify the benefits or costs of a particular analysis. Project teams can also use x-charging as a way to adjust their consumption of resources and control their IT costs. It benefits the IT team as it enables them to achieve higher utilisation rates across their servers. Most importantly, the cost-element attached to running a sandbox acts as a strong incentive to finalize and shutdown sandboxes, ensuring that they do not morph into uncontrolled marts.

There is a fantastic whitepaper that explores the much wider subject of metering and chargeback within a cloud environment, which is available on the Enterprise Manager webpage: click here to view the whitepaper.

Overview

Enterprise Manager 12c uses the rich monitoring and configuration data that is collected for Enterprise Manager targets as the basis for a metering and chargeback solution. Enterprise Manager Chargeback provides the administrator with:

  • Assignment of rates to metered resources
  • Management of a cost center hierarchy
  • Assignment of resources to cost centers
  • Usage and charge-back reports

This set of features can be used to implement a chargeback regime for analytical sandboxes. There is a rich set of API’s that allow you to extract metering and charge data so that it can be incorporated into enterprise billing solutions such as Oracle Billing and Revenue Management application.

Setting up a x-charging framework for our analytical sandboxes involves three key stages:

  • Creating chargeback plans for resources and database options
  • Defining users and cost centers to “take” charges
  • Reporting on usage and charges
Let’s look at each of these stages in more detail:
 

Step 1: Creating charge plans

A Charge Plan is created by the DBA and it defines the metered resources along with the associated rates. Enterprise Manager Chargeback offers two types of Charge Plan – Universal Charge Plan and Extended Charge Plans.

The Universal Charge Plan is the simplest way to enable chargeback for sandboxes and is probably adequate for the vast majority of projects. It contains just 3 metrics:

  • CPU Usage
  • Memory Allocation
  • Storage Allocation

and the DBA can set the rates for each metric as shown here:

 

Charge Plans

 

Even with this basic profile you can implement quite sophisticated charging models. It is possible to vary the rates used in charge calculations by month/period; each “period” is known as a “Reporting Cycle”. If rates are modified, the updated rates will be used to re-calculate the charges for all days from the first of the current period onwards.
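To make the arithmetic concrete, the charge for a period is essentially metered usage multiplied by the plan rates. The sketch below shows the idea as a query; the table and column names (sandbox_usage, charge_rates and so on) are entirely made up for illustration, since in practice Enterprise Manager calculates and stores all of this for you in its own repository:

-- Hypothetical illustration of a universal charge plan calculation:
-- charge = CPU usage * CPU rate + memory * memory rate + storage * storage rate
SELECT u.sandbox_name,
       SUM(u.cpu_usage        * r.cpu_rate)     +
       SUM(u.memory_alloc_gb  * r.memory_rate)  +
       SUM(u.storage_alloc_gb * r.storage_rate) AS period_charge
FROM   sandbox_usage u
       JOIN charge_rates r ON r.plan_name = u.plan_name
WHERE  u.usage_date >= TRUNC(SYSDATE, 'MM')  -- current reporting cycle
GROUP  BY u.sandbox_name;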

Some projects may need access to analytical features that are costed database options. For example, if a project needs to build data mining models then it will require the Oracle Advanced Analytics option. Alternatively, supporting semantic analysis or social network analysis requires the use of the spatial and graph option. Extended Charge Plans allow the DBA to factor in charging for database options alongside the standard charging metrics of the Universal Charge Plan. For database options it makes sense to make use of the ability to create fixed cost charges to effectively “rent out” each option for each sandbox environment. Of course, if a project suddenly decides it needs access to a specific type of analytical option, such as in-memory, it is simply a case of adding the relevant cross-charge item to the profile for the specific sandbox and the project team can start using that feature right away (assuming the database instance has the correct options pre-installed).

Charge Plans Extended

 

Step 2: Setting up users and cost centres

When administering a self-service analytic sandbox, it is necessary to meter resource consumption for each self-service user. These costs then need to be rolled up to an aggregate level, such as cost centers, to generate a total charge for each department/project-team accessing the sandbox. For ease of administration and chargeback the self-service users can be represented within a Cost Center structure. Each cost center contains a list of “consumers” who have access to the sandbox and, of course, its associated resources. The cost centers can be organized in a hierarchical fashion to support aggregation and drill down within the cost analysis or billing reports. A typical cost center hierarchy within a project might look something like this:

Cost center hierarchy

Step 3: Chargeback Reports

Any chargeback solution will involve reporting so that users can understand how their use of sandbox (storing data, running reports etc) translates to charges. Enterprise Manager provides reports that show both resource usage and charging information. This is broken down into two categories of reports: summary and trending reports.

Summary Reports show information related to charges or resource utilisation broken down by cost center, target type and resource. These reports allow both sandbox owners and business users to drill down and quickly analyse charges in terms of the type of target (database instance, host operating environment, virtual machine etc.) or cost centers, as shown below.

EM summary report

Trending Reports show metric or charge trends over time and are useful for project teams who want to see how their charges change over time. At an aggregate level the I.T. team can use this information to help them with capacity planning. A report of CPU usage is shown below.

EM trend report

What’s missing?

While this latest version of Enterprise Manager has some great features for managing analytical sandboxes, it would be really useful if the project team could enter a total budget for their sandbox. This budget could then be shown on graphs such as the trending report. It would be useful to know how much of the budget has been spent, how many days/periods of budget remain based on current spending patterns, etc. Of course, once the budget has been used up it would be useful if the sandbox could be locked. This would focus the minds of the project team and ensure that a sandbox does not evolve into a “live” data mart. Which brings us nicely to the next blog post, which will be on the final part of our lifecycle model: ensuring that sandboxes have a “Drop” phase.

If you want more information about how to setup the chargeback plans then there is a great video on the Oracle Learning Library: Oracle Enterprise Manager 12c: Setup and Use Chargeback.

Sunday, 16 November 2014

Oracle Data Warehouse and Big Data Magazine NOVEMBER Edition

 

The latest edition of our monthly data warehouse and big data magazine for Oracle customers and partners is now available. It brings together all the most important announcements and videos taken from our data warehouse and big data product management blogs, press releases, videos posted on Oracle Media Network and our Facebook pages. Click here to view the November Edition


Please share this link http://flip.it/fKOUS with your colleagues and via your social media sites.


This magazine is optimized for display on tablets and smartphones using the Flipboard app, which is available from the Apple App Store and the Google Play store.





Tuesday, 28 October 2014

Part 4: DBAs guide to managing sandboxes

This is the next part in my on-going series of posts on the topic of how to successfully manage sandboxes within an Oracle data warehouse environment. In Part 1 I provided an overview of sandboxing (key characteristics, deployment models) and introduced the concept of a lifecycle called BOX’D (Build, Observe, X-Charge and Drop). In Part 2 I briefly explored the key differences between data marts and sandboxes. Part 3 explored the Build-phase of our lifecycle.

Now, in this post I am going to focus on the Observe-phase. At this stage in the lifecycle we are concerned with managing our sandboxes. Most modern data warehouse environments will be running hundreds of data discovery projects, so it is vital that the DBA can monitor and control the resources that each sandbox consumes, establishing rules both in general terms and specifically for each project.

In most cases, DBAs will set up a sandbox with dedicated resources. However, this approach does not create an efficient use of resources since sharing of unused resources across other projects is just not possible. The key advantage of Oracle Multitenant is its unique approach to resource management. The only realistic way to support thousands of sandboxes, which in today’s analytics-driven environments is entirely possible if not inevitable, is to allocate one chunk of memory and one set of background processes for each container database. This provides much greater utilisation of existing IT resources and greater scalability as multiple pluggable sandboxes are consolidated into the multitenant container database.

Resources

 

Using multitenant we can now expand and reduce our resources as required to match our workloads. In the example below we are running an Oracle RAC environment, with two nodes in the cluster. You can see that only certain PDBs are open on certain nodes of the cluster and this is achieved by opening the corresponding services on these nodes as appropriate. In this way we are partitioning the SGA across the various nodes of the RAC cluster. This allows us to achieve the scalability we need for managing lots of sandboxes. At this stage we have a lot of project teams running large, sophisticated workloads which is causing the system to run close to capacity as represented by the little resource meters.

 

Expand 1
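As a side note, controlling which nodes a pluggable sandbox runs on is straightforward. The sketch below assumes a made-up PDB called sales_pdb and RAC instances named RAC1 and RAC2; from the root container the DBA can choose where each sandbox is opened:

ALTER PLUGGABLE DATABASE sales_pdb OPEN INSTANCES = ('RAC1');

-- later, if the sandbox workload grows, open it on the second node as well
ALTER PLUGGABLE DATABASE sales_pdb OPEN INSTANCES = ('RAC2');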

 

It would be great if our DBA could add some additional processing power to this environment to handle this increased workload. With 12c what we can do is simply drop another node into the cluster, which allows us to spread the processing of the various sandbox workloads out across the expanded cluster.

Expand 2

Now our little resource meters are showing that the load on the system is a lot more comfortable. This shows that the new multitenant feature integrates really well with RAC. It’s a symbiotic relationship whereby Multitenant makes RAC better and RAC makes Multitenant better.

So now that we can add resources to the cluster, how do we actually manage resources across each of our sandboxes? As a DBA I am sure that you are familiar with the features in Resource Manager that allow you to control system resources: CPU, sessions, parallel execution servers, Exadata I/O. If you need a quick refresher on Resource Manager then check out this presentation by Dan Norris, “Overview of Oracle Resource Manager on Exadata”, and the chapter on resource management in the 12c DBA guide.

With 12c, Resource Manager is now multitenant-aware. Using Resource Manager we can configure policies to control how system resources are shared across the sandboxes/projects. Policies control how resources are utilised across PDBs, creating hard limits that can enforce a “get what you pay for” model, which is an important point when we move forward to the next phase of the lifecycle: X-Charge. Within Resource Manager we have adopted an “industry standard” approach to controlling resources based on two notions:

  1. a number of shares is allocated to each PDB
  2. a maximum utilization limit may be applied to each PDB

To help DBAs quickly deploy PDBs with a pre-defined set of shares and utilisation limits there is a “Default” configuration that works, even as PDBs are added or removed. How would this work in practice? Using a simple example this is how we could specify resource plans for the allocation of CPU between three PDBs:

RM 1

 

As you can see, there are four shares in total: two for the data warehouse and one each for our two sandboxes. This means that our data warehouse is guaranteed 50% of the CPU whatever else is going on in the other sandboxes (PDBs). Similarly, each of our sandbox projects is guaranteed at least 25%. However, in this case we did not specify settings for maximum utilisation. Therefore, our marketing sandbox could use 100% of the CPU if both the data warehouse and the sales sandbox were idle.
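For those who prefer the command line to Enterprise Manager, here is a minimal sketch of how a plan like this could be created with the DBMS_RESOURCE_MANAGER package (the plan and PDB names are made up; see the 12c documentation for the full syntax):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

  -- a CDB-level plan covering the data warehouse and the two sandboxes
  DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN(
    plan    => 'sandbox_plan',
    comment => 'CPU shares for the DW and two sandbox PDBs');

  -- two shares for the data warehouse, one each for the sandboxes
  DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN_DIRECTIVE(
    plan => 'sandbox_plan', pluggable_database => 'dw_pdb', shares => 2);
  DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN_DIRECTIVE(
    plan => 'sandbox_plan', pluggable_database => 'sales_pdb', shares => 1);
  DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN_DIRECTIVE(
    plan => 'sandbox_plan', pluggable_database => 'marketing_pdb', shares => 1);

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

-- activate the plan from the root container
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'sandbox_plan';

Note that no utilization_limit is specified in the directives above, which matches the scenario described here where an otherwise idle system lets a single sandbox use all of the CPU.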

By using the “Default” profile we can simplify the whole process of adding and removing sandboxes/PDBs. As we add and remove sandboxes, the system resources are correctly rebalanced, using the settings in the default profile, across all the plugged-in sandboxes/PDBs as shown below.

RM 2

 

Summary

In this latest post on sandboxing I have examined the “Observe” phase of our BOX’D sandbox lifecycle. With the new  multitenant-aware Resource Manager we can configure policies to control how system resources are shared across sandboxes. Using Resource Manager it is possible to configure a policy so that the first tenant in a large, powerful server experiences a realistic share of the resources that will eventually be shared as other tenants are plugged in.

In the next post I will explore the next phase of our sandbox lifecycle, X-charge, which will cover the metering and chargeback services for pluggable sandboxes. 

 


Friday, 10 October 2014

Review of Data Warehousing and Big Data at #OOW14

Data Warehousing and Big Data were at the heart of this year’s OpenWorld conference, appearing in a number of keynotes and a huge number of general sessions. Our hands-on labs were all completely full as people got valuable hands-on time with our most important new features. The key areas at this year’s conference were:

  • Big Data SQL - One Fast SQL Query for All Your Data
  • Database In-Memory - Powering the Real-Time Enterprise
  • Multitenant - Plug Your Data Warehouse into the Cloud
 

All these topics appeared in the main keynote sessions, including live on-stage demonstrations of how each feature can be used to increase the performance and analytical capability of your data warehouse.

If you want to revisit the most important sessions, or if you simply missed this year’s conference and want to catch up on all the most important topics, then I have put together a book of the highlights from this year’s conference. The booklet is divided into the following sections:

  • Key Messages
  • Overview of Oracle Database 12c
  • Session Catalogue
  • Your Oracle Presenters
  • Links
  • OpenWorld 2015
 

PDF-iBook

You can download my review in PDF format by clicking here. Hope this proves useful and if I missed anything then let me know. 
