Thursday 8 October 2015

Chapter 30: Maintaining Highly-Available Environments







How to Maintain Highly-Available Environments 
As an operator, it is your responsibility to monitor the environment and address 
problems that occur. In highly-available environments, scheduler and database failures 
adversely affect, and sometimes disable, high-availability. You can ensure that you 
maintain a properly functioning highly-available environment by resolving database and 
scheduler failures. 
In highly-available cluster environments, use the monitoring tools provided by your 
cluster management software and database vendor to monitor the state of the cluster 
and the state of the database. We recommend that you also monitor the scheduler log 
on the active node. When your cluster or clustered database is not functioning properly, 
consult the documentation for your cluster management software or your database 
vendor and follow the instructions to restore the environment. 
In high-availability and dual event server modes, monitor the scheduler logs for 
database and scheduler failures and take appropriate action to return to 
high-availability mode when failures occur. 
Note: For more information about monitoring tools provided by your cluster 
management software and database vendor, see the documentation for those products. 
The following diagram demonstrates how you can maintain a highly-available 
environment: 
Follow these steps: 
1. Monitor the scheduler log.
2. Restore the failed scheduler. 
■ On UNIX    
■ On Windows 
3. Recover the failed database. 
■ On UNIX   
■ On Windows   

Monitor the Scheduler Log 
The scheduler log displays information about alarms and related error messages so that 
you know when you need to take action to resolve problems with your CA Workload 
Automation AE environment. For example, the scheduler log displays alarm information 
when the scheduler issues a database rollover alarm. 
■ If you are operating in a highly-available cluster environment, monitor the 
scheduler log on the active node. The scheduler log does not display all of the 
information that you need to monitor the environment. Ensure that you also use 
the monitoring tools that are provided by your cluster manager to monitor the state 
of the cluster. 
■ If you are operating in high-availability mode, monitor the state of the dormant 
scheduler. 
The primary scheduler is typically the active scheduler. In this case, the shadow 
scheduler begins processing events and issues an alarm when the primary scheduler 
fails (EP_ROLLOVER). 
If you restore the primary scheduler when the primary failback mode is set to 1, the 
shadow scheduler remains active and the primary scheduler runs dormant. In this 
case, the primary scheduler resumes processing events and issues an alarm when 
the shadow scheduler fails (EP_ROLLOVER). 
■ If you are operating in dual event server mode, monitor the active scheduler log. 
When a database rollover occurs, CA Workload Automation AE begins operating in 
single event server mode and the active scheduler issues a database rollover alarm 
(DB_ROLLOVER). 
■ If you are operating with a clustered database, use the monitoring tools that are 
provided by your cluster manager and database vendor to monitor the state of the 
database. 
■ In all CA Workload Automation AE environments, monitor the scheduler log for 
alarm information and error messages so that you can take appropriate action to 
resolve problems with the environment as they occur. 
Note: For information about the monitoring tools that are provided by your cluster 
manager and database vendor, see the documentation for those products. 
Follow these steps: 
1. Log in as the root user on the machine or node where the scheduler is installed. 
2. Open the operating system or instance command prompt as follows: 
■ (UNIX) Run the shell that is sourced to use CA Workload Automation AE. 
The operating system command prompt appears. 
■ (Windows) Click Start, Programs, CA, Workload Automation AE, Command 
Prompt (instance_name). 
The instance command prompt appears. 
3. Enter the following command: 
autosyslog -e 
The scheduler log appears. When CA Workload Automation AE encounters a 
problem, the scheduler issues an alarm. The log displays the alarm information and 
related error messages so that you can resolve the problem. 
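The alarm check can also be scripted. The following is a minimal sketch, assuming that you have captured the scheduler log output as text. Only the alarm names EP_ROLLOVER and DB_ROLLOVER come from this chapter; the log line layout in the sample and the helper name find_rollover_alarms are hypothetical.

```python
from typing import List, Tuple

# Alarm names mentioned in this chapter.
ROLLOVER_ALARMS = ("EP_ROLLOVER", "DB_ROLLOVER")


def find_rollover_alarms(log_text: str) -> List[Tuple[int, str, str]]:
    """Return (line_number, alarm_name, line) for each line naming a rollover alarm."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for alarm in ROLLOVER_ALARMS:
            if alarm in line:
                hits.append((number, alarm, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt; real scheduler log lines may look different.
    sample = "\n".join([
        "[10/08/2015 09:00:01] scheduler started",
        "[10/08/2015 09:15:42] ALARM: DB_ROLLOVER",
        "[10/08/2015 09:20:10] ALARM: EP_ROLLOVER",
    ])
    for number, alarm, line in find_rollover_alarms(sample):
        print(f"line {number}: {alarm}: {line}")
```

A plain substring match is used on purpose, because the exact log format is not described here. A scheduler rollover (EP_ROLLOVER) points to the scheduler restore procedure, and a database rollover (DB_ROLLOVER) points to the database recovery procedure.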
Restore the Failed Scheduler on UNIX 
If you are operating CA Workload Automation AE in high-availability mode and the 
active scheduler fails, high-availability is disabled. If you are operating in a 
highly-available cluster environment, the action that you need to take to return to a 
highly-available environment depends on your cluster management software. 
In highly-available cluster environments, manual intervention following a failover is 
usually not required. The cluster manager may still issue informational messages. When 
the cluster manager issues messages indicating that manual intervention is required, 
take one of the following actions: 
■ Review messages and follow the instructions that they provide. 
■ Consult the documentation for your cluster management software and the 
instructions for manually recovering the failed cluster node. 
■ Contact your network administrator. 
If you are operating in high-availability mode and high-availability is disabled because 
the active scheduler fails, return to high-availability mode by restoring the failed 
scheduler. 
Follow these steps: 
1. If the primary failback mode is set to 0 and you are restoring the primary scheduler, 
stop the shadow scheduler. 
Note: By default, the primary failback mode is set to 0 and this step is required. 
When the primary failback mode is set to 1 or 2, restart the primary scheduler 
without stopping the shadow scheduler. 
a. Log in to a machine in the instance with a client installation as the root user. 
You can issue commands that execute client utilities. 
b. Run the shell that is sourced to use CA Workload Automation AE. 
The operating system command prompt appears. 
c. Enter the following command: 
sendevent -E STOP_DEMON -v ROLE=S
2. If high-availability mode was disabled because a failover occurred, restore the 
primary scheduler. 
a. Log in to the machine where the primary scheduler is installed and run the shell 
that is sourced to use CA Workload Automation AE. 
The operating system command prompt appears. 
b. Enter the following command: 
eventor 
The primary scheduler is restored. 
The primary scheduler either runs dormant or resumes processing events, 
depending on the primary failback mode. CA Workload Automation AE returns to 
high-availability mode. 
Important! If the primary failback mode is set to 0, the primary scheduler resumes 
processing events as soon as it is restored but CA Workload Automation AE does 
not return to high-availability mode. To return to high-availability mode, restore the 
shadow scheduler. 
If the primary failback mode is set to 1 or 2, CA Workload Automation AE returns to 
high-availability mode. 
3. If high-availability mode was disabled because of a failback that was neither 
automatic nor manual (that is, a failback caused by a shadow scheduler failure), use 
the eventor command to restore the shadow scheduler. 
Notes: 
■ Automatic failbacks occur when primary failback mode is set to 2 and you 
restore the primary scheduler. You can control when a failback occurs by 
setting primary failback mode to 1 and initiating a manual failback only when 
you want the primary scheduler to resume processing events. 
■ The shadow scheduler process restarts itself in a dormant state when an 
automatic failback occurs or when you initiate a manual failback. The shadow 
scheduler does not stop running when you initiate a manual failback. In both 
cases, no action is required to return to high-availability mode. 
■ Failbacks also occur when the shadow scheduler fails. In this case, 
high-availability mode is disabled until you restore the shadow scheduler. 
The primary scheduler continues processing events, the shadow scheduler is restored 
but runs dormant, and CA Workload Automation AE returns to high-availability mode. 
Note: For more information about setting the Primary Failback Mode, see the UNIX 
Implementation Guide. 
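The way the primary failback mode changes the procedure can be summarized in a small decision helper. This is a sketch only, assuming the numeric UNIX modes described above (0, 1, and 2). The two command strings are the ones shown in the steps; the function name primary_restore_steps is hypothetical, and the helper only lists actions, it does not run them.

```python
from typing import List

STOP_SHADOW = "sendevent -E STOP_DEMON -v ROLE=S"
START_SCHEDULER = "eventor"


def primary_restore_steps(failback_mode: int) -> List[str]:
    """List the actions for restoring a failed primary scheduler on UNIX."""
    if failback_mode not in (0, 1, 2):
        raise ValueError("primary failback mode must be 0, 1, or 2")
    steps = []
    if failback_mode == 0:
        # Mode 0 (the default): the shadow scheduler must be stopped first.
        steps.append(f"Stop the shadow scheduler: {STOP_SHADOW}")
    steps.append(f"Restore the primary scheduler: {START_SCHEDULER}")
    if failback_mode == 0:
        # Mode 0 does not return to high-availability mode on its own.
        steps.append(f"Restore the shadow scheduler: {START_SCHEDULER}")
    return steps


if __name__ == "__main__":
    for mode in (0, 1, 2):
        print(f"primary failback mode {mode}:")
        for step in primary_restore_steps(mode):
            print(f"  - {step}")
```

Modes 1 and 2 produce the same single action here because, in both cases, the primary scheduler is restarted without stopping the shadow scheduler. They differ only in whether the failback is manual or automatic.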
Restore the Failed Scheduler on Windows 
If you are operating CA Workload Automation AE in high-availability mode and the 
active scheduler fails, high-availability is disabled. If you are operating in a 
highly-available cluster environment, the action that you need to take to return to a 
highly-available environment depends on your cluster management software. 
In highly-available cluster environments, manual intervention following a failover is 
usually not required. The cluster manager may still issue informational messages. When 
the cluster manager issues messages indicating that manual intervention is required, 
take one of the following actions: 
■ Review messages and follow the instructions that they provide. 
■ Consult the documentation for your cluster management software and the 
instructions for manually recovering the failed cluster node. 
■ Contact your network administrator. 
If you are operating in high-availability mode and high-availability is disabled because 
the active scheduler fails, return to high-availability mode by restoring the failed 
scheduler. 
Follow these steps: 
1. If the primary failback mode is set to Off and you are restoring the primary 
scheduler, stop the shadow scheduler. 
Note: By default, the primary failback mode is set to Off and this step is required. 
When the primary failback mode is set to Immediate or Dormant, restart the primary 
scheduler without stopping the shadow scheduler. 
a. Log in to a machine in the instance with a client installation as the root user. 
You can issue commands that execute client utilities. 
b. Click Start, Programs, CA, Workload Automation AE, Command Prompt 
(instance_name). 
The instance command prompt appears. 
c. Enter the following command: 
sendevent -E STOP_DEMON -v ROLE=S
2. If high-availability mode was disabled because a failover occurred, recover the 
primary scheduler. 
a. Log in to the machine where the scheduler is installed and click Start, 
Programs, CA, Workload Automation AE, Administrator. 
The Instance - CA Workload Automation AE Administrator window opens. 
b. Select an instance from the Instance drop-down list in the Settings pane, and 
then click the Services icon on the toolbar. 
The Services - CA Workload Automation AE Administrator window appears, 
displaying a list of services installed on the selected instance. 
c. Right-click the Scheduler service, and select Start. 
Important! If the primary failback mode is set to Off, the primary scheduler 
resumes processing events as soon as it is restored but CA Workload 
Automation AE does not return to high-availability mode. To return to 
high-availability mode, restore the shadow scheduler. 
If the primary failback mode is set to Dormant or Immediate, CA Workload 
Automation AE returns to high-availability mode. 
3. If high-availability mode was disabled because a failback occurred, restore the 
shadow scheduler. 
Notes: 
■ Automatic failbacks occur when primary failback mode is set to Immediate and 
you restore the primary scheduler. You can control when a failback occurs by 
setting primary failback mode to Dormant and initiating a manual failback only 
when you want the primary scheduler to resume processing events. 
■ The shadow scheduler process restarts itself in a dormant state when an 
automatic failback occurs or when you initiate a manual failback. The shadow 
scheduler does not stop running when you initiate a manual failback. In both 
cases, no action is required to return to high-availability mode. 
■ Failbacks also occur when the shadow scheduler fails. In this case, 
high-availability mode is disabled until you restore the shadow scheduler. 
The primary scheduler continues processing events, the shadow scheduler is 
restored but runs dormant, and CA Workload Automation AE returns to 
high-availability mode. 
The scheduler recovers and CA Workload Automation AE returns to high-availability 
mode. 
Note: For more information about setting the Primary Failback Mode, see the Windows 
Implementation Guide. 
Recover the Failed Database on UNIX 
If you are using dual event server mode as your database failover solution and the 
scheduler initiates a database rollover, dual event server mode is disabled. 
Highly-available cluster environments do not function properly without a 
highly-available database. A database rollover does not disable high-availability mode, 
but operating without a failover solution for your database increases the risk of 
downtime and data loss. To mitigate this risk, restore the failed database. 
If you are operating with a clustered database and a database failure occurs, some 
cluster managers require manual intervention to recover the failed database. In this 
case, take one of the following actions: 
■ Review messages issued by the cluster manager and database vendor, and follow 
the instructions that the messages provide. 
■ Consult the documentation for your cluster management software and your 
database software, and the instructions for manually recovering from a database 
failure. 
■ Contact your database or network administrator. 
Notes: 
■ You can operate with a clustered database only when you have cluster 
management software installed and are using a cluster aware database. Some 
cluster managers restore a failed database automatically, so no action is required. 
In this case, your cluster manager issues messages indicating that the database is 
restored. 
■ If you have cluster management software installed, we recommend that you set up 
a highly-available cluster environment instead of configuring high-availability mode. 
■ When a rollover occurs, CA Workload Automation AE backs up the configuration file 
before commenting out the failed event server. You can use the backup file to 
restore pre-rollover configuration settings when you reconfigure dual event servers. 
If you are using dual event server mode as your database failover solution and the 
scheduler rolls over the database, CA Workload Automation AE begins operating in 
single event server mode. To return to dual event server mode, recover from the failure 
and reconfigure dual event servers. 
Follow these steps: 
1. Review the scheduler log file to determine the problem that caused the failure. 
2. Consult your database software documentation, and follow the instructions for 
manually resolving the problem that caused the database failure. 
The failed database is recovered, and you can reconfigure dual event servers. 
3. Log in to a machine in the instance with a client installation, and run the shell 
that is sourced to use CA Workload Automation AE. 
The operating system command prompt appears. 
4. Enter the following command: 
sendevent -E STOP_DEMON -v ALL 
All server processes that are running on the instance stop. 
5. If you want to restore the pre-rollover configuration settings, delete the modified 
configuration file and rename the backup file. 
a. Delete $AUTOUSER/config.$AUTOSERV. 
b. Locate $AUTOUSER/config.$AUTOSERV.rollover and change the name to 
$AUTOUSER/config.$AUTOSERV. 
c. Repeat these actions on every server machine that is installed in the instance. 
6. If you want to specify new configuration settings when you restore the failed event 
server, modify the $AUTOUSER/config.$AUTOSERV file. 
a. Open the configuration file on the machine where the primary scheduler is 
installed and locate the following parameter: 
#EventServer_1|#EventServer_2 
b. Edit the parameter as follows: 
EventServer_1|EventServer_2 
c. Specify the primary event server and the secondary event server: 
■ To make the new database the secondary event server, add the following 
parameter: 
EventServer_2=SYBASE_SVR:SYBASE_DB,DBPORT,DBHOST | ORACLE_SVR,DBPORT,DBHOST 
 SYBASE_SVR:SYBASE_DB,DBPORT,DBHOST 
 Identifies the Sybase database for the second event server. 
 ORACLE_SVR,DBPORT,DBHOST 
 Identifies the Oracle database for the second event server. 
■ To make the restored database the secondary event server, specify the 
active database in the EventServer_1 parameter and the restored database 
in the EventServer_2 parameter. 
■ To make the new database the primary event server, specify the existing 
database that is defined in the EventServer_1 parameter as the secondary 
event server by changing it to EventServer_2, then add the following 
parameter: 
EventServer_1=SYBASE_SVR:SYBASE_DB,DBPORT,DBHOST | 
ORACLE_SVR,DBPORT,DBHOST 
■ To make the restored database the primary event server, verify that the 
restored database is specified in the EventServer_1 parameter and the 
active database is specified in the EventServer_2 parameter. 
d. Specify the database reconnect behavior for the second event server by 
modifying the following parameter in the configuration file: 
DBEventReconnect=value, value2 
value 
Identifies the database reconnect behavior for the first event server. 
Limits: 0-99 
value2 
Identifies the database reconnect behavior for the second event server. 
Limits: 0-99 
Note: During typical installation, CA Workload Automation AE sets the 
reconnect value for the single event server to 50 by default. During a custom 
installation in which you enable dual event server mode, CA Workload 
Automation AE sets the reconnect value for both event servers to 50, 5 by 
default. Ensure that you add a reconnect value for the second event server 
when you configure CA Workload Automation AE to run in dual event server 
mode after running it in single event server mode. Optionally, you can modify 
the default reconnect value for the first event server. 
The secondary event server is configured on the primary scheduler machine. 
7. Repeat the secondary event server configuration on every server machine in the 
instance. Ensure that the event server information in the configuration file is the 
same on each of these machines. 
8. Run the CA Workload Automation AE bulk copy script. The script that you run 
depends on your database vendor. 
■ Oracle 
Open the $AUTOSYS/dbobj/ORA directory and run the following script: 
perl autobcpORA.pl source_server target_server source_userid 
source_password target_userid target_password dump_file oracle_directory 
■ Sybase 
Open the $AUTOSYS/dbobj/SYB directory and run the following script: 
perl autobcpSYB.pl source_server source_db target_server target_db 
source_userid source_password target_userid target_password dump_file 
blk_size 
source_server 
Defines the name of the source Oracle System ID (for example, AEDB) or 
Sybase server name (for example, SourceServer). For Sybase, the source server 
name is defined in the interfaces file. 
source_db 
Defines the source Sybase database (for example, AEDB). 
source_userid 
Defines the user ID that is used to connect to the source Oracle System ID, or 
Sybase server. 
Note: On Oracle, use aedbadmin as the source user ID. 
source_password 
Defines the password that corresponds to the user ID that is used to connect to 
the source Oracle System ID, or Sybase server. 
target_server 
Defines the target Oracle System ID (for example, AEDB2), or Sybase server 
name (for example, DestinationServer). For Sybase, the target server name is 
defined in the interfaces file. 
Note: For Oracle, the source server must be different from the target server. 
target_db 
Defines the target Sybase database (for example, AEDB2). 
Note: The autobcpDB script deletes all of the data in the target database and 
replaces it with the data in the source database. If you want to save the data in 
the target database, archive it before you run the autobcpDB script. 
target_userid 
Defines the user ID that is used to connect to the target Oracle System ID, or 
Sybase server. 
Note: On Oracle, use aedbadmin as the target user ID. 
target_password 
Defines the password that corresponds to the user ID that is used to connect to 
the target Oracle System ID, or Sybase server. 
dump_file 
Defines the temporary file that is used in the transfer of data from one 
database to the other database. 
Note: Specify a file that is local to the computer where this script is running. 
oracle_directory 
Defines the path to the Oracle home directory. 
blk_size 
(Optional) Specifies the number of rows that can be inserted from the 
dump_file to the destination database at a time. 
Default: 5000 
Note: The Sybase script uses the default value when you run it in the interactive 
mode, or when you do not specify the blk_size value. Do not specify a large value 
because the transaction log encounters problems when it becomes too full. 
The event servers are synchronized. 
9. Restart the server processes for the instance. If you are operating in a 
highly-available cluster environment, restart the services on the active node. 
a. Open the operating system command prompt, and enter the following 
commands: 
unisrvcntr start waae_sched.$AUTOSERV 
unisrvcntr start waae_server.$AUTOSERV 
b. Repeat this action on every server machine in the instance. If you are operating 
in a highly-available cluster environment, restart the components on the active 
node only. 
Note: In highly-available cluster environments, the scheduler and application 
server actively execute work on the active node only. The cluster manager 
prompts the scheduler or application server on one of the passive nodes to 
begin executing tasks only when it detects a failure of the same component on 
the active node. 
The database is restored. 
High-availability is restored. To maintain your highly-available environment, continue 
monitoring the scheduler log and recovering from scheduler and database failures when 
they occur. 
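The configuration file changes in steps 5 and 6 of this procedure can be sketched in code. This is an illustration under stated assumptions, not a supported utility: it assumes that a disabled event server appears as a line beginning with # followed by EventServer_1= or EventServer_2=, and that DBEventReconnect holds comma-separated values. The function names, the instance name ACE, and the sample event server values are hypothetical. The second reconnect value of 5 follows the default mentioned in the note above.

```python
import os
import re
import tempfile
from pathlib import Path


def restore_pre_rollover_config(autouser: Path, autoserv: str) -> Path:
    """Replace config.<instance> with the config.<instance>.rollover backup."""
    current = autouser / f"config.{autoserv}"
    backup = autouser / f"config.{autoserv}.rollover"
    if not backup.is_file():
        raise FileNotFoundError(f"no rollover backup at {backup}")
    # os.replace overwrites the modified file, which covers delete + rename.
    os.replace(backup, current)
    return current


def enable_dual_event_servers(config_text: str, second_reconnect: str = "5") -> str:
    """Uncomment EventServer_1/EventServer_2 and add a second reconnect value."""
    lines = []
    for line in config_text.splitlines():
        # Assumption: a disabled event server is a line starting with '#'.
        line = re.sub(r"^#\s*(EventServer_[12]=)", r"\1", line)
        match = re.match(r"^DBEventReconnect=(.*)$", line)
        if match:
            values = [v.strip() for v in match.group(1).split(",") if v.strip()]
            if len(values) == 1:
                # Single event server setting: add a value for the second server.
                values.append(second_reconnect)
            line = "DBEventReconnect=" + ",".join(values)
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Throwaway directory standing in for $AUTOUSER; all values are made up.
    with tempfile.TemporaryDirectory() as tmp:
        autouser = Path(tmp)
        (autouser / "config.ACE").write_text("modified after rollover\n")
        (autouser / "config.ACE.rollover").write_text("pre-rollover settings\n")
        restored = restore_pre_rollover_config(autouser, "ACE")
        print(restored.read_text(), end="")

    sample = (
        "EventServer_1=AEDB,1521,dbhost1\n"
        "#EventServer_2=AEDB2,1521,dbhost2\n"
        "DBEventReconnect=50\n"
    )
    print(enable_dual_event_servers(sample), end="")
```

The two functions correspond to the two alternatives in the procedure: restoring the pre-rollover file as it was, or editing the current file to define the event servers again. Stop the server processes first and apply the same change on every server machine in the instance, as the steps describe.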
Recover the Failed Database on Windows 
If you are using dual event server mode as your database failover solution and the 
scheduler initiates a database rollover, dual event server mode is disabled. 
Highly-available cluster environments do not function properly without a 
highly-available database. A database rollover does not disable high-availability mode, 
but operating without a failover solution for your database increases the risk of 
downtime and data loss. To mitigate this risk, restore the failed database. 
If you are operating with a clustered database and a database failure occurs, some 
cluster managers require manual intervention to recover the failed database. In this 
case, take one of the following actions: 
■ Review messages issued by the cluster manager and database vendor, and follow 
the instructions that the messages provide. 
■ Consult the documentation for your cluster management software and your 
database software, and the instructions for manually recovering from a database 
failure. 
■ Contact your database or network administrator. 
Notes: 
■ You can operate with a clustered database only when you have cluster 
management software installed and are using a cluster aware database. Some 
cluster managers restore a failed database automatically, so no action is required. 
In this case, your cluster manager issues messages indicating that the database is 
restored. 
■ If you have cluster management software installed, we recommend that you set up 
a highly-available cluster environment instead of configuring high-availability mode. 
If you are using dual event server mode as your database failover solution and the 
scheduler rolls over the database, CA Workload Automation AE begins operating in 
single event server mode. To return to dual event server mode, recover from the failure 
and reconfigure dual event servers. 
Follow these steps: 
1. Review the scheduler log file to determine the problem that caused the failure. 
2. Consult your database software documentation, and follow the instructions for 
manually resolving the problem that caused the database failure. 
The failed database is recovered, and you can reconfigure dual event servers. 
3. Log in to a machine in the instance with a client installation, and click Start, 
Programs, CA, Workload Automation AE, Command Prompt (instance_name). 
The instance command prompt opens. 
4. Enter the following command: 
sendevent -E STOP_DEMON -v ALL 
All server processes that are running in the instance stop. 
5. Restore, enable, and reconfigure the failed database (event server). 
a. Review the scheduler log file to determine the problem that caused the failure. 
b. Consult your database software documentation, and follow the instructions for 
manually resolving the problem that caused the database failure. 
The status of the failed database changes and you can reconfigure dual event 
servers. 
c. Click the Event Server icon on the toolbar in any CA Workload Automation AE 
Administrator window. 
The Event Server - CA Workload Automation AE Administrator window appears. 
d. Clear the A Database Rollover Has Occurred check box, verify the event server 
configuration settings and make any desired modifications, and then click 
Apply. 
Important! Ensure that the configuration settings for Event Server 2 are 
identical to the configuration settings for Event Server 1. 
The failed event server is restored and the configuration settings are saved, but 
CA Workload Automation AE does not return to dual event server mode until 
you synchronize the event servers. 
6. Run the CA Workload Automation AE bulk copy script. The directory and script that 
you use to synchronize the event servers following a database failure are the same as 
the directory and script that you use to switch to dual event server mode when you did 
not enable dual event servers during installation. The script that you run depends 
on your database vendor. 
■ Oracle 
Open the %AUTOSYS%\dbobj\ORA directory and run the following script: 
perl autobcpORA.pl source_server target_server source_userid 
source_password target_userid target_password dump_file oracle_directory 
■ Sybase 
Open the %AUTOSYS%\dbobj\SYB directory and run the following script: 
perl autobcpSYB.pl source_server source_db target_server target_db 
source_userid source_password target_userid target_password dump_file 
blk_size 
■ Microsoft SQL Server 
Open the %AUTOSYS%\dbobj\MSQ directory and run the following script: 
perl autobcpMSQ.pl source_server source_db target_server target_db 
source_userid source_password target_userid target_password dump_file 
The event servers are synchronized. 
7. Restart the server processes for the instance. If you are operating in a 
highly-available cluster environment, restart the services on the active node. 
a. Click Start, Programs, CA, Workload Automation AE, Administrator. 
The Instance - CA Workload Automation AE Administrator window opens. 
b. Select an instance from the Instance drop-down list in the Settings pane. 
c. Click the Services icon on the toolbar. 
A list of the services that are running on the instance appears. 
d. Right-click the scheduler service and select Start, then right-click the 
application server service and select Start. 
e. If you are operating in a highly-available cluster environment, restart the 
components on the active node only. Otherwise, repeat these actions on every 
server machine in the instance. 
Note: In highly-available cluster environments, the scheduler and application 
server actively execute work on the active node only. The cluster manager 
prompts the scheduler or application server on one of the passive nodes to 
begin executing tasks only when it detects a failure of the same component on 
the active node. 
The database is recovered and CA Workload Automation AE returns to dual event 
server mode. 
High-availability is restored. To maintain your highly-available environment, continue 
monitoring the scheduler log and recovering from scheduler and database failures when 
they occur. 
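Because the bulk copy scripts take many positional arguments, it can help to assemble the command line from named values. The following is a minimal sketch: the script names and argument order are copied from the synopses in this chapter, while the helper name build_autobcp_command and all sample values are hypothetical. The helper only builds the argument list; it does not run the script.

```python
from typing import Dict, List, Optional

# Positional argument order for each script, as listed in this chapter.
ARG_ORDER = {
    "autobcpORA.pl": [
        "source_server", "target_server", "source_userid", "source_password",
        "target_userid", "target_password", "dump_file", "oracle_directory",
    ],
    "autobcpSYB.pl": [
        "source_server", "source_db", "target_server", "target_db",
        "source_userid", "source_password", "target_userid", "target_password",
        "dump_file",
    ],
    "autobcpMSQ.pl": [
        "source_server", "source_db", "target_server", "target_db",
        "source_userid", "source_password", "target_userid", "target_password",
        "dump_file",
    ],
}


def build_autobcp_command(script: str, values: Dict[str, str],
                          blk_size: Optional[int] = None) -> List[str]:
    """Return the argument list for a bulk copy script without running it."""
    if script not in ARG_ORDER:
        raise ValueError(f"unknown script: {script}")
    missing = [name for name in ARG_ORDER[script] if name not in values]
    if missing:
        raise ValueError(f"missing arguments: {', '.join(missing)}")
    if script == "autobcpORA.pl" and values["source_server"] == values["target_server"]:
        raise ValueError("for Oracle, the source and target servers must differ")
    command = ["perl", script] + [values[name] for name in ARG_ORDER[script]]
    if blk_size is not None:
        # blk_size is an optional trailing argument of the Sybase script only.
        if script != "autobcpSYB.pl":
            raise ValueError("blk_size applies to autobcpSYB.pl only")
        command.append(str(blk_size))
    return command


if __name__ == "__main__":
    # Sample values are placeholders, not real servers or credentials.
    oracle_values = {
        "source_server": "AEDB", "target_server": "AEDB2",
        "source_userid": "aedbadmin", "source_password": "SOURCE_PASSWORD",
        "target_userid": "aedbadmin", "target_password": "TARGET_PASSWORD",
        "dump_file": "dump.tmp", "oracle_directory": "ORACLE_HOME_PATH",
    }
    print(" ".join(build_autobcp_command("autobcpORA.pl", oracle_values)))
```

Remember that the script replaces all data in the target database with the data from the source database, so double-check which server is the source before running the generated command from the matching dbobj directory.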
