Fewer Emails!

By default, a zztat installation relies on the ability to send out emails. Until now, this meant that each target database managed by the framework sent out emails on its own.

This led to the additional effort of configuring every database server to be able to send emails, which is both labor-intensive and a potential security concern in some environments.

Introducing a new reaction: SEND_EMAIL_REPO. This reaction makes use of the framework’s core strengths and allows the repository to send out emails on behalf of the target databases.

How does it work?

Well, it couldn’t be simpler. For any gauge that currently uses the SEND_EMAIL reaction, you can simply swap it for the new SEND_EMAIL_REPO reaction. And that’s that.

Email send requests are queued up locally on the target databases and are transmitted to the repository, where the emails are then processed and sent out. And it all happens within seconds. The delay introduced by going through the repository versus the target sending it on its own is minimal.
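To picture the flow, here is a minimal Python sketch of the queue-and-forward idea. It is not zztat’s implementation: the names EmailRequest, TargetQueue and Repository are hypothetical, and real mail delivery is replaced by a simple list on the repository side.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class EmailRequest:
    """One queued email request (hypothetical structure)."""
    target_db: str
    subject: str
    body: str


class TargetQueue:
    """Local queue on a target database; it never sends mail itself."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()

    def send_email_repo(self, subject, body):
        # The reaction only enqueues the request locally.
        self.pending.append(EmailRequest(self.name, subject, body))

    def transmit(self, repository):
        # Hand every queued request to the repository, oldest first.
        sent = 0
        while self.pending:
            repository.receive(self.pending.popleft())
            sent += 1
        return sent


class Repository:
    """Central repository: the only place that needs mail configuration."""

    def __init__(self):
        self.outbox = []

    def receive(self, request):
        # A real system would hand the message to its mail gateway here.
        self.outbox.append(f"[{request.target_db}] {request.subject}")


repo = Repository()
target = TargetQueue("O12102")
target.send_email_repo("Tablespace USERS is 95% full", "details ...")
target.transmit(repo)
print(repo.outbox)
```

The point of the design is that only the repository needs a working mail setup; the targets just need their existing connection to the repository.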

This new reaction will greatly reduce the effort needed to deploy zztat in larger environments.

With January approaching fast, so is zztat’s production release. We are going to launch no later than January 15th, and will be launching with a global scope right off the bat. Partnership contracts are signed, distribution chains are being established, and the infrastructure needed is being built as I am writing this.

Are you interested in participating? Ping me at stefan (at) zztat.net and I can put you in touch with the right folks to get you on board.

Happy New Year to you all, and may you rest well during your nights in 2018, thanks to zztat’s proactive power!

Cheers!

Stefan

Patch me up, Scotty!

Patches are a necessary evil whenever you’re developing software. Since zztat ships as 100% SQL and PL/SQL, we have the luxury of making patching far more user-friendly than it is in some other software.

The zztat framework now comes with a full patching mechanism. And it’s pretty powerful!

Let’s assume that you have a zztat repository, and 20-some target databases which are monitored and reporting in to that repository. What if you need to apply a zztat patch?

Well, you’ll be glad to know that all you gotta do is load the patch into the repository. The framework can then distribute the patch to all the target databases and apply it there automatically!

The mechanisms used for patching make full use of the core framework components, which enables us to:

  • Deliver patch sets (a.k.a. bundle patches)
  • Deliver one-off patches or hot-fixes

… and allow you to apply them with a single command. On the entire environment.

The patching component is smart enough to understand:

  • Online vs Offline Patches

Some patches will require the framework to be stopped to apply the patch, and restarted thereafter. This will happen automatically.

  • Patch Dependencies

Some patches may require other patches to install correctly. An example would be a hot-fix produced for a patch-set. We want to take as many of the simple tasks off your hands as possible, so if a prerequisite patch is needed and available, we’ll apply it automatically. The same goes for patches that must be rolled back before a new patch can be applied.

  • Obsolete Patches

Patches may render previous patches obsolete. This can happen when a new patch is released which combines two or more previous patches, or when a patch-set includes hot-fixes previously released. The patching component can handle that, too.

  • Automated vs Manual Patching

If, for some reason or another, you wish to apply the patch manually on a few select databases instead of all of them, there is also a manual mode which gives you full control over the patching process.
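The dependency and obsolescence handling described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not zztat’s actual logic: the patch names (PS-1, PS-2, HF-101, HF-102) and the two lookup tables are made up, and the ordering relies on the standard library’s graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each patch lists the patches it requires.
PREREQUISITES = {
    "PS-1": [],
    "HF-101": ["PS-1"],
    "HF-102": ["PS-1", "HF-101"],
}

# Hypothetical: patch set PS-2 bundles HF-101, making HF-101 obsolete.
SUPERSEDED_BY = {"HF-101": "PS-2"}


def is_satisfied(patch, applied):
    """A patch counts as present if it, or a patch that bundles it, is applied."""
    return patch in applied or SUPERSEDED_BY.get(patch) in applied


def plan_patches(wanted, applied):
    """Return the patches to apply for `wanted`, prerequisites first."""
    needed = {}
    stack = [wanted]
    while stack:
        patch = stack.pop()
        if patch in needed or is_satisfied(patch, applied):
            continue
        missing = [p for p in PREREQUISITES.get(patch, [])
                   if not is_satisfied(p, applied)]
        needed[patch] = missing
        stack.extend(missing)
    # TopologicalSorter yields every patch after the patches it depends on.
    return list(TopologicalSorter(needed).static_order())


print(plan_patches("HF-102", applied=set()))
print(plan_patches("HF-102", applied={"PS-1"}))
print(plan_patches("HF-102", applied={"PS-1", "PS-2"}))
```

In the last call, HF-101 is skipped because the (hypothetical) patch set PS-2 already contains it, which mirrors the obsolete-patch case.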

The repository, of course, contains all the information about which patch is ready to apply, applied or rolled back on which target database. And, naturally, you control all of these things from one single place: the repository database.

So how does it all work?

Say you have a patch that you need to apply to your zztat environment. You’d have to do the following:

  1. Download the patch onto the zztat repository server and extract the archive
  2. Start the patch install script

What the framework then will do is the following:

  1. Locate the patch inventory XML file, which contains all the steps the patching process needs to perform (as well as the rollback steps).
  2. Load the patch data files into the zztat repository database.
  3. Patch the repository database.
  4. Create tasks for the target databases to pick up, telling them to download the patch from the repository.
  5. Once a target database has completed the download, it will inform the repository.
  6. The repository will create a new task for the patch to be applied.
  7. The target database will then apply the patch and once complete, inform the repository.
  8. Once all targets have checked in to the repository, the patch is marked as fully applied.
  9. If, at a later time, a database comes out of blackout or is restarted and zztat finds that an automatic patch install has been done in the meantime, it will automatically pick it up and apply the patch as well.
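The hand-shake between repository and targets in the steps above can be sketched as a small state tracker. This is a simplified illustration, not the framework’s code: the state names and the PatchRollout class are assumptions, and the patch and database names are only examples.

```python
from enum import Enum


class PatchState(Enum):
    READY = "READY"            # task created, target has not downloaded yet
    DOWNLOADED = "DOWNLOADED"  # target reported a finished download
    APPLIED = "APPLIED"        # target reported a successful apply


class PatchRollout:
    """Tracks, on the repository side, how far each target has come."""

    def __init__(self, patch_id, targets):
        self.patch_id = patch_id
        self.states = {target: PatchState.READY for target in targets}

    def _advance(self, target, expected, new):
        # Refuse out-of-order reports, e.g. "applied" before "downloaded".
        if self.states[target] is not expected:
            raise ValueError(f"{target} is {self.states[target].value}, "
                             f"expected {expected.value}")
        self.states[target] = new

    def report_downloaded(self, target):
        self._advance(target, PatchState.READY, PatchState.DOWNLOADED)

    def report_applied(self, target):
        self._advance(target, PatchState.DOWNLOADED, PatchState.APPLIED)

    def fully_applied(self):
        # The patch counts as fully applied once every target has checked in.
        return all(s is PatchState.APPLIED for s in self.states.values())


rollout = PatchRollout("HF-101", ["O12102", "O12201"])
rollout.report_downloaded("O12102")
rollout.report_applied("O12102")
print(rollout.fully_applied())   # one target is still outstanding
rollout.report_downloaded("O12201")
rollout.report_applied("O12201")
print(rollout.fully_applied())
```

A database that was in blackout would simply stay in its earlier state until it checks in again, at which point it goes through the same two reports.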

This is yet another awesome feature that lets zztat stand out against existing monitoring software. And there will be more to come!

In closing, we’d like to hear what you think:

Should zztat automatically download (but not apply) available patches from zztat’s servers? Would you want that feature?

Let us know your opinion in the comments!

Have a great day!

Stefan


BETA Progress Update

Hi everyone!

The zztat beta is going strong, with lots of bug fixes and feature enhancements going in daily.

The third beta release will be the metric release. It is expected to go out at the end of November or, at the latest, by the end of next week, in the first couple of days of December.

Since the initial beta1, much has been enhanced and added. A short highlight reel is here:

  • All metadata is now refreshed automatically on all target databases whenever a change is made on the repository. This makes zztat fully centralized.
  • Metric data can now be automatically purged, with a configurable retention.
  • Copying gauges to create database-specific checks has been overhauled and is now more intuitive.
  • Metrics fired as a reaction (such as high-speed sampling) now automatically update the alert to point to the snapshot data. This enables various reports to easily access the high-speed sampling data.
  • Oracle options can now be monitored by zztat to catch potential license issues with Oracle. Tables, indexes, LOBs, Flashback Archives, and even RMAN configurations can now be checked.
  • Greatly enhanced memory usage monitoring that goes as deep as showing you which Oracle kernel function has allocated the memory. Comes with non-intrusive but less detailed variations as well as fully-detailed variations which probe the process in question. I’ll be posting more details about this in the near future!
  • dbms_system has been eliminated and its functionality is now integrated in zztat’s own sys_helper package.

There is one more feature that we’d like to highlight specifically, because it requires a more elaborate explanation: Automatic Error Reports.

First of all, the feature is disabled by default and must be explicitly enabled in your environment. Once enabled, whenever an error is seen, zztat will send an email to the developer (error_reports@zztat.net) with the error details.

It will look something like this:

The email has been specifically designed to:

  • Not include any personally identifiable information whatsoever
  • Not include any IP addresses, host names, etc.
  • Give us a clear presentation of what happened that led to the error
  • Give us the full zztat error stack to see exactly what happened
  • Enable us to proactively correct issues found
  • Always send you an exact copy of the email we’ve received. It will be sent to the email address defined for “CRITICAL” alerts.

Our privacy policy has also been updated and can as always be found at zztat.net/privacy.

We’re still well on track for the production release in January and look forward to seeing zztat make your DBA lives so much easier!

Have a great week!

Stefan


zztat: Beta Release Announcement

Hi all!

We’re excited to announce the availability of the zztat Beta-1 for every backer, before the end of this week!

What Will The First Beta Include?

  • 5 Default Metrics
  • 8 Internal Metrics
  • Non-intrusive generic Reactions
  • 3 Advanced Reactions, which are disabled by default
  • The zztat UI

What Do I Need To Get Started?

  • An Oracle database for the repository. This can be a small new instance, or a dedicated schema in an existing instance (2GB SGA + 10GB disk is plenty).
  • The repository is ideally 12.1.0.2 with XMLDB and APEX 5.1 installed (but 11.2.0.4 and 12.2 are supported as well).
  • A few GB of space for a tablespace on the repository and on each target
  • At least one target database to be monitored by zztat. Your beta trial license has no limit on the number of databases you can monitor. Supported versions include 11.2.0.4, 12.1.0.1, 12.1.0.2 and 12.2.0.1.

We will send you the software package pre-configured, with the setup configuration file already prepared. It will be configured to install with the following options:

  1. XML DB is assumed to be present, and sending emails will be enabled.
  2. The ZZ$SYS_HELPER package will be installed in the SYS schema. This is the only object created outside of the zztat schemas (apart from the global application contexts, which are also stored in SYS automatically regardless of who creates them).
  3. The installer will create two users on the repository: ZZ$REPO (repository schema owner) and ZZ$LINK (the database link user that the targets connect to).
  4. The installer will create two users on the target databases: ZZ$USER (zztat monitoring user) and ZZ$LINK (private database link owner who connects to the repository).
  5. All default passwords will be set to “Change.123Me”.

You are of course free to change those settings in setup.sql before running the installer. And yes, you can even choose different names for the users; usernames are not hard-coded anywhere.

More Details

The following default metrics will be enabled out of the box in this beta release:

  • ASM Diskgroup monitoring (every 5 minutes)
  • Tablespace monitoring (every 5 minutes)
  • ASM Disk monitoring (every 5 minutes for offline / unavailable disks)
  • Session wait monitoring (every 5 minutes)
  • Top SQL statements (every 5 minutes)

Each of those metrics comes with default gauges, which are also enabled by default. They all default to non-intrusive reactions, such as sending emails or writing to logs. You can view and change the thresholds for those gauges in the zztat UI.

Internal metrics that zztat uses and that are enabled by default:

  • Extents (collected every 4 hours)
  • Audit actions (collected once)
  • Event names (collected once)
  • Latch names (collected once)
  • Metric names (collected once)
  • SQL commands (collected once)
  • Stat names (collected once)
  • Wait classes (collected once)

Reactions supplied with this release include:

  • Sending emails (requires XMLDB)
  • Writing to the database alert log
  • Adding datafiles automatically (disabled by default but can be enabled easily)
  • Hi-speed latch sampling (disabled by default but can be enabled easily)
  • Hi-speed mutex sampling (disabled by default but can be enabled easily)

The zztat graphical user interface, with the following functionality:

  • A draft overview Dashboard showing environment health and activity
  • Managing metric queries and schedules
  • Managing gauge queries and schedules
  • Managing gauge filter columns, adding new filter columns
  • Managing gauge ignore values
  • Overriding gauge filter columns
  • Creating and editing reaction chains
  • Managing reaction throttling
  • Configuration screen for many framework parameters
  • Built-in help and tips for every function

Thank you once again for all your support!

zztat UI: New Updates!

The UI is coming along well and includes lots of new functionality. It looks like we will be able to include it with basic functionality well ahead of schedule with the first beta release so that you all can check it out live in action!

The UI is designed with usability in mind and includes loads of tool-tips and help texts to guide the user through the application. Every form field has a help text, and every form has explanations added to it.

Here’s the database configuration screen, which controls the core behavior of the framework:

Like any other zztat entity, configurations follow the same model: there is a “Default” configuration, and you have the option of overriding the default for a specific target database, as seen in the screenshot for the database O12102.


Help texts such as this one can be opened by clicking on the little question mark icon next to the form fields:

And here’s the notification settings screen of the database configuration:

The metric editor has also been added and initially will allow you to customize the frequency at which the metrics are executed, and change the metric query. For the final release, the editor will be further enhanced to allow even greater customization.

Those new modal dialogues added in APEX 5.1 really make the application flow feel a lot more natural. We’re making heavy use of them in the zztat UI. Here’s another quick screenie showing off the new gauge editor:

And finally, what most users will be needing is the gauge column editor, which allows you to customize the thresholds that will cause the alerts to trigger:

And all of this is, of course, configured in one central place, the zztat UI, and is automatically applied wherever it is needed within the entire zztat environment!

Want to change the tablespace full threshold for a specific database only? No need to log on to that database server and fiddle with a configuration file.

Want to temporarily ignore a tablespace from monitoring and alerting? No need to log on, either. Just do it right there in the UI!

Oh, and one more thing needs to be pointed out. If there are any issues with the form data you entered, zztat will raise descriptive error messages telling you what the problem is:

Usability is the first priority. Naturally, that also includes having user-friendly error messages, and not some cryptic ORA-00001: unique constraint (SYS_C000241) violated.

Stay tuned for more updates to come!

Stefan

 

zztat: The UI is coming! And an announcement too!

Hi all

It’s been a busy, busy week. Many bugs have been squashed, troubles have been shot and many a line of code has been written. The framework now sports just under 25,000 lines of code, by the way, with the largest chunk being the internal job & processes package, which comes in at just under 4,500 lines.

The big news is that we have decided to make some changes to the planned licensing and as a result the zztat UI will be included with both the basic and the premium packages of zztat. So you’ll always get the GUI, regardless of the package you purchase.

Development on the UI started on Friday, and it will be based on the latest version of Oracle’s Application Express (APEX), version 5.

Here’s a little sneak peek at what’s in store:

And the first draft of the metric screen:

Stay tuned for more to come!

Stefan

zztat – Status Quo?

In this episode we will talk about one of zztat’s internal processes: the STATE_CHECK.

Its role is to ensure that the framework runs smoothly and that whatever is supposed to be running runs as configured.

The individual tasks that STATE_CHECK performs are:

Preparation of the zztat environment after an installation, patch or upgrade

The zztat framework uses several on-demand metrics, which collect non-volatile but version-dependent data about your database. This includes, for example, v$latchname, which lists all the latches known to your Oracle version, and v$event_name, which lists all the events externalized by the Oracle Wait Interface for your Oracle version. This data is collected by special metrics, which STATE_CHECK triggers when it detects a need to do so. You can see this in action, for example, when installing zztat on a target database:

25-SEP-17 04.57.19.916194 INFO  STATE_CHECK   Checking database states ...
25-SEP-17 04.57.19.920753 INFO  STATE_CHECK   New database installation detected - firing metrics.
25-SEP-17 04.57.19.922626 INFO  ADD_TASK      Creating new task UPGRADE_INIT |  for metric  (gauge n/a
25-SEP-17 04.57.19.928445 INFO  STATE_CHECK   Upgrade/Init task submitted: action# 1
25-SEP-17 04.57.20.030183 INFO  STATE_CHECK   State check restarting queue ZZ$REACTION_QUEUE
25-SEP-17 04.57.20.353182 INFO  STATE_CHECK   Checking job integrity...

We can see that STATE_CHECK simply submits a task, and then goes right back to what it was doing. The task is then picked up by SYNC (more on this internal process in a future update) and we can see it executing:

25-SEP-17 04.57.22.526558 INFO  SYNC          Checking for tasks on target...
25-SEP-17 04.57.22.531966 INFO  SYNC          1: Found message 1 action: UPGRADE_INIT from sender: STATE_CHECK db: 720515581 state: NEW metric:  gauge:
25-SEP-17 04.57.22.532197 INFO  SYNC          Received upgrade / init - setting up dependent tasks...
25-SEP-17 04.57.22.578117 INFO  SET_STATE     Running on target - archiving data to zz$sync_state...
25-SEP-17 04.57.22.616052 INFO  SET_STATE     Updated 1 rows.
25-SEP-17 04.57.22.616407 INFO  SET_STATE     State for task 1 set to PROCESSING
25-SEP-17 04.57.22.649489 INFO  SYNC          Creating task to fire snapshot job for on-upgrade-metric AUDITACTIONS
25-SEP-17 04.57.22.649704 INFO  ADD_DEPENDENT Creating new dependent task for parent action 1
25-SEP-17 04.57.22.653310 INFO  SYNC          Task submitted for parent task 1 with new task # 2
25-SEP-17 04.57.22.662102 INFO  SYNC          Creating task to fire snapshot job for on-upgrade-metric EVENTNAME
25-SEP-17 04.57.22.662327 INFO  ADD_DEPENDENT Creating new dependent task for parent action 1
25-SEP-17 04.57.22.662820 INFO  SYNC          Task submitted for parent task 1 with new task # 3
25-SEP-17 04.57.22.665864 INFO  SYNC          Creating task to fire snapshot job for on-upgrade-metric LATCHNAME
25-SEP-17 04.57.22.666221 INFO  ADD_DEPENDENT Creating new dependent task for parent action 1
25-SEP-17 04.57.22.666599 INFO  SYNC          Task submitted for parent task 1 with new task # 4
25-SEP-17 04.57.22.669481 INFO  SYNC          Creating task to fire snapshot job for on-upgrade-metric METRICNAME
25-SEP-17 04.57.22.669683 INFO  ADD_DEPENDENT Creating new dependent task for parent action 1
25-SEP-17 04.57.22.670076 INFO  SYNC          Task submitted for parent task 1 with new task # 5
25-SEP-17 04.57.22.673134 INFO  SYNC          Creating task to fire snapshot job for on-upgrade-metric WAIT_CLASS
25-SEP-17 04.57.22.673333 INFO  ADD_DEPENDENT Creating new dependent task for parent action 1
25-SEP-17 04.57.22.673773 INFO  SYNC          Task submitted for parent task 1 with new task # 6
25-SEP-17 04.57.22.676781 INFO  SYNC          Creating task to fire snapshot job for on-upgrade-metric SQL_CURS_MAP
25-SEP-17 04.57.22.676964 INFO  ADD_DEPENDENT Creating new dependent task for parent action 1
25-SEP-17 04.57.22.677353 INFO  SYNC          Task submitted for parent task 1 with new task # 7
25-SEP-17 04.57.22.680221 INFO  SYNC          Creating task to fire snapshot job for on-upgrade-metric STATNAME
25-SEP-17 04.57.22.680404 INFO  ADD_DEPENDENT Creating new dependent task for parent action 1
25-SEP-17 04.57.22.680752 INFO  SYNC          Task submitted for parent task 1 with new task # 8
25-SEP-17 04.57.22.683606 INFO  SYNC          Creating task to fire snapshot job for on-upgrade-metric SYS_OPTENV
25-SEP-17 04.57.22.683801 INFO  ADD_DEPENDENT Creating new dependent task for parent action 1
25-SEP-17 04.57.22.684255 INFO  SYNC          Task submitted for parent task 1 with new task # 9
25-SEP-17 04.57.22.687125 INFO  SYNC          Creating task to fire snapshot job for on-upgrade-metric SQLCOMMAND
25-SEP-17 04.57.22.687342 INFO  ADD_DEPENDENT Creating new dependent task for parent action 1

All of this data is then available to zztat, and no further snapshots are needed since the data won’t have changed. STATE_CHECK kicks this process off again during certain maintenance operations:

  • When an Oracle version change is detected
  • When a zztat patch has been installed
  • When a new on-upgrade metric has been added

This on-demand metric data is also replicated to the zztat repository, and when the repository runs reports, it automatically uses the correct data for the target’s Oracle version.

Synchronization of the Oracle Scheduler

Another important task STATE_CHECK performs is to ensure that the Oracle Scheduler’s job calendar matches the zztat configuration. With regard to metric and gauge jobs, the zztat configuration tables are regarded as the single source of truth. Whenever the Oracle Scheduler deviates from this configuration, STATE_CHECK automatically adjusts the scheduler, as seen here, for example:

25-SEP-17 04.57.22.638507 WARN  STATE_CHECK  Job ZZ$JREP00002_ASM_DG should be enabled in scheduler but is not.. enabling...
25-SEP-17 04.57.22.646770 INFO  STATE_CHECK  Enabled job ZZ$JREP00002_ASM_DG
25-SEP-17 04.57.22.648342 INFO  STATE_CHECK  Job integrity check complete. Corrected 1 jobs.
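The “single source of truth” idea behind this check can be sketched in a few lines of Python. This is only an illustration: both dictionaries stand in for the zztat configuration tables and the Oracle Scheduler, the function name reconcile_jobs is made up, and the second job name is hypothetical.

```python
def reconcile_jobs(configured, scheduler):
    """Make the scheduler match the configuration.

    configured: job name -> True if the job should be enabled
    scheduler:  job name -> True if the job is currently enabled (updated in place)
    Returns the names of the jobs that had to be corrected.
    """
    corrected = []
    for job, should_be_enabled in sorted(configured.items()):
        if scheduler.get(job) != should_be_enabled:
            # The configuration always wins over the scheduler state.
            scheduler[job] = should_be_enabled
            corrected.append(job)
    return corrected


configured = {
    "ZZ$JREP00002_ASM_DG": True,
    "ZZ$JREP00003_EXAMPLE": True,   # hypothetical second job
}
scheduler = {
    "ZZ$JREP00002_ASM_DG": False,   # disabled in the scheduler, as in the log above
    "ZZ$JREP00003_EXAMPLE": True,
}

fixed = reconcile_jobs(configured, scheduler)
print(f"Job integrity check complete. Corrected {len(fixed)} jobs.")
```

Because the comparison always runs from the configuration towards the scheduler, a manual change made directly in the scheduler is treated as drift and reverted on the next check.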

Cleanup tasks and integrity

STATE_CHECK also performs a variety of smaller tasks. It ensures that the queue used for the event-based reaction jobs is started and functional, and automatically restarts it when it isn’t. It also checks that every target database checks in periodically, and triggers a heartbeat check if it hasn’t seen a database in a short while. We’ll hear more about heartbeats and pings later on!

The zztat BETA release 1 is coming soon! Stay tuned for more updates!

Cheers

Stefan

zztat – Diagnostics!

zztat Diagnostics has been greatly enhanced. It’s now a full package that collects and prints out various diagnostic data about your zztat environment. In the future, it will also automatically make recommendations for known issues that users are running into.

You can run zztat diagnostics by simply starting the zzdiag.sql script, or by executing zz$diag.report while connected as a zztat user. A sample output looks like this:


SQL> exec zz$diag.report;
DATABASE STATUS
===============

Database Role OS Port             Name   Version       Default Tablespace State Licensed
------------- ------------------- ------ ------------- ------------------ ----- --------
REPOSITORY    x86_64/Linux 2.4.xx FZYDEV 12.1.0.1.0 SE ZZ$DATA            NEW   FREE
TARGET        x86_64/Linux 2.4.xx O12102 12.1.0.2.0 EE ZZ$DATA            READY YES
TARGET        x86_64/Linux 2.4.xx O12201 12.2.0.1.0 EE ZZ$DATA            READY YES

DEFAULT METRICS
===============
Database Metric       Snapshot Replicate State    Autosync  On-Demand On-Upgrade Replicate-Only
-------- ------------ -------- --------- -------- --------- --------- ---------- --------------
(all)    ASM_DG       5 MINUTE *         ENABLED  AUTO-SYNC
(all)    ASM_DISK     5 MINUTE *         DISABLED AUTO-SYNC
(all)    AUDITACTIONS *        *         ENABLED            ON DEMAND ON UPGRADE
(all)    DB_OBJ_CACHE *        *         ENABLED            ON DEMAND
(all)    EVENTNAME    *        *         ENABLED            ON DEMAND ON UPGRADE
(all)    EXTENTS      4 HOUR   1 DAY     ENABLED
(all)    FILESPACE    5 MINUTE *         ENABLED  AUTO-SYNC
(all)    HS_LATCH     *        *         ENABLED            ON DEMAND
(all)    HS_MUTEX

zztat and the database DBID

When working with zztat, an important concept to understand is how zztat uses DBIDs. A DBID is a unique number assigned to each Oracle database. Every type of entity in zztat (metrics, gauges, reactions, etc.) is always identified by a combination of its name and the DBID of the database it is assigned to.

For example, if we look at the ASM diskgroup metric ‘ASM_DG’ in our lab:

SQL> select zz$db as dbid, name from zz$metric where name = 'ASM_DG';

      DBID NAME
---------- ------------
        -1 ASM_DG
 720515581 ASM_DG

We can see that there are two copies of this metric in the metric table zz$metric. One has the DBID 720515581 which corresponds to the database O12201:

SQL> select zz$db as dbid, name from zz$db;

      DBID NAME
---------- ---------
 720515581 O12201
2307737612 FZYDEV
4116932292 O12102

If we revisit the above query on zz$metric, we can see that the second copy of the ASM_DG metric is assigned to the DBID -1. The value -1 indicates that this is a default metric, which can run on any database. This means that any database that does NOT have a database-specific metric (in this case all databases other than O12201) will run the default metric, including all of its attributes, flags, schedules, etc.

To quickly and easily override the default metric for a specific database, you can use zz$manage.metric_copy:

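begin
  -- Create a database-specific copy of the default ASM_DG metric
  -- for O12102 (DBID 4116932292) and disable that copy.
  zz$manage.metric_copy(
    name     => 'ASM_DG',
    to_dbid  => 4116932292,
    flags    => zz$manage.DISABLE,
    comments => 'Disabled on O12102 //Stefan'
  );
end;
/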

And with that, the ASM_DG metric will then not run on the database O12102 (4116932292) any longer, since the copy we’ve made had the DISABLE flag specified.
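The lookup rule described above (use the database-specific copy if there is one, otherwise fall back to DBID -1) can be modelled in a few lines of Python. This is a sketch for illustration only: the dictionary stands in for the zz$metric table, the "enabled" attribute is a stand-in for the real flags, and resolve_metric and copy_metric are hypothetical helpers, not zztat functions.

```python
DEFAULT_DBID = -1  # marks the default copy of an entity

# (metric name, DBID) -> attributes; mirrors the two ASM_DG rows shown above.
metrics = {
    ("ASM_DG", DEFAULT_DBID): {"enabled": True},
    ("ASM_DG", 720515581): {"enabled": True},
}


def resolve_metric(name, dbid):
    """Return (dbid_used, attributes): the specific copy if present, else the default."""
    key = (name, dbid) if (name, dbid) in metrics else (name, DEFAULT_DBID)
    return key[1], metrics[key]


def copy_metric(name, to_dbid, enabled):
    """Create a database-specific copy of the default metric with its own flag."""
    metrics[(name, to_dbid)] = {**metrics[(name, DEFAULT_DBID)], "enabled": enabled}


print(resolve_metric("ASM_DG", 4116932292))   # falls back to the default copy
copy_metric("ASM_DG", to_dbid=4116932292, enabled=False)
print(resolve_metric("ASM_DG", 4116932292))   # now uses the disabled copy
```

The default copy itself is never modified by the override, so every other database keeps running the metric as before.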

Questions? Send us an email at info(at)zztat.net!

Stefan

zztat: Smarter Reactions!

This week we’ve been working on improvements to zztat’s reactions. Reactions will now automatically get access to the entire row that caused the gauge to trigger the reaction.

What does this mean?

Take a simple case: you want to automatically terminate a session that was flagged by a gauge, for example because it is blocking many others. Previously, you had to use the gauge column’s message attribute to pass the SID/SERIAL to the reaction as a string, and the reaction then had to parse that string. This is no longer the case.

With this improvement, the gauge will automatically create an XMLTYPE containing the entire row, as selected by the gauge. This greatly simplifies creating reactions and makes them a lot more flexible. You can now simply write a gauge like this:

select sid,serial# as serial,username from v$session where ....

Those three columns (sid, serial and username) will then be stored in an XMLTYPE along with the alert, like this:

<zztat><sid>123</sid><serial>456</serial><username>FOO</username></zztat>

And the great thing is, using XML for simple tasks like this works, even if your database does not have XML DB installed! The added benefit is that it doesn’t require use of any types, or fixed table structures to store the data. Because it’s XML, it’s fully dynamic.

Of course, this happens only for rows that actually trigger an alert – not for every row. The overhead is therefore insignificant.

The reaction can then access the data by simply reading /zztat/sid and /zztat/serial like this:

    -- l_details: the XMLTYPE row data stored with the alert
    select x.sid, x.serial
      into l_sid, l_serial
      from xmltable('/zztat' passing l_details
                    columns "SID"    number path 'sid',
                            "SERIAL" number path 'serial') x;
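For readers who want to play with the idea outside the database, the same round trip can be sketched in Python with the standard library’s xml.etree.ElementTree. This only illustrates the row-to-XML concept; the helper names row_to_xml and xml_to_row are made up and have nothing to do with zztat’s PL/SQL code.

```python
import xml.etree.ElementTree as ET


def row_to_xml(row):
    """Wrap every selected column of a row in a <zztat> document."""
    root = ET.Element("zztat")
    for column, value in row.items():
        ET.SubElement(root, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_row(details):
    """Read the columns back; every value comes back as a string."""
    root = ET.fromstring(details)
    return {child.tag: child.text for child in root}


details = row_to_xml({"sid": 123, "serial": 456, "username": "FOO"})
print(details)

row = xml_to_row(details)
print(int(row["sid"]), int(row["serial"]))
```

Since the column names travel with the data, a reaction only needs to know which columns it cares about, not the full shape of the gauge query.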

This is a great addition which significantly improves the flexibility of the framework!

Watch this space as we near the release of the first beta which is coming soon!

Stefan