Thursday 8 October 2015

Chapter 30: Maintaining Highly-Available Environments







How to Maintain Highly-Available Environments 
As an operator, it is your responsibility to monitor the environment and address 
problems that occur. In highly-available environments, scheduler and database failures 
degrade, and sometimes disable, high availability. To keep a highly-available 
environment functioning properly, resolve database and scheduler failures when they 
occur. 
In highly-available cluster environments, use the monitoring tools provided by your 
cluster management software and database vendor to monitor the state of the cluster 
and the state of the database. We recommend that you also monitor the scheduler log 
on the active node. When your cluster or clustered database is not functioning properly, 
consult the documentation for your cluster management software or your database 
vendor and follow the instructions to restore the environment. 
In high-availability and dual event server modes, monitor the scheduler logs for 
database and scheduler failures and take appropriate action to return to 
high-availability mode when failures occur. 
Note: For more information about monitoring tools provided by your cluster 
management software and database vendor, see the documentation for those products. 
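The log monitoring described above can be sketched in Python. This is a minimal illustration only, not a product feature: the function name find_failures, the list EXAMPLE_PATTERNS, and the sample messages are all hypothetical. The actual text of database and scheduler failure messages depends on your installation, so replace the patterns with the messages that appear in your own scheduler log.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical patterns (assumptions, not real product messages).
# Replace them with the failure messages your scheduler log actually contains.
EXAMPLE_PATTERNS = [
    r"database.*(unavailable|lost)",
    r"scheduler.*(failed|shutdown)",
]


def find_failures(lines: Iterable[str], patterns: List[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every log line that matches any pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(rx.search(line) for rx in compiled):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Made-up sample lines that stand in for a scheduler log.
    sample_log = [
        "startup complete\n",
        "Database connection lost\n",
        "heartbeat ok\n",
        "Scheduler failed to respond\n",
    ]
    for number, text in find_failures(sample_log, EXAMPLE_PATTERNS):
        print(f"line {number}: {text}")
```

To scan a real log, pass an open file object as the first argument. The function only reports matching lines; deciding how to restore the scheduler or recover the database remains an operator task.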
The following diagram demonstrates how you can maintain a highly-available 
environment: 
Follow these steps: 
1. Monitor the scheduler log. 
2. Restore the failed scheduler. 
■ On UNIX    
■ On Windows 
3. Recover the failed database. 
■ On UNIX   
■ On Windows   

Chapter 29: Monitoring and Reporting on Workflow






Monitoring Tools 
Monitoring workflow helps you to identify problems with the current or predicted 
workflow, so that you can resolve those problems. You can use the following CA 
Workload Automation AE tools to monitor workflow: 
Forecast Reports 
    Generate reports that display information about the predicted workflow to identify 
    problems before they occur. 
    Note: You can also use forecast reports to plan changes to your workflow in a test 
    environment. 
Monitors 
    Track events to identify problems as they occur. 
Browsers 
    Generate reports that display information about past events to identify recurring 
    problems. 

When you can identify the issue that is associated with a problem, you can solve the 
problem before it occurs or as it occurs. When you cannot determine the cause of a 
problem, notify the administrator. 
To solve a problem that you identify in real time using a monitor, correct the associated 
issue and restart the job. To address recurring problems or problems with predicted 
workflow that you identify using browsers and forecast reports, correct the associated 
issues and use monitors to track the progress of the workflow. 
Correcting issues that cause jobs to fail requires modifying workflow objects (job 
definitions, machine definitions, and calendar object definitions). You can modify 
workflow objects only when you have write access to those objects. When you cannot 
solve a problem without modifying a workflow object and you do not have write access 
to the problematic object, notify the scheduler. 
Important! Modifying workflow objects sometimes has unexpected impacts on the rest 
of the workflow. We recommend that you plan changes to the workflow in a test 
environment before you implement the changes in the live instance.