Disaster recovery plan - an overview.
A disaster recovery plan is the preparation needed to bring a system back up and running after a complete crash of its hardware or software. Recovery may not be completely automated and may require downtime and manual effort to restore the system to readiness, but that effort and downtime should be kept as small as possible. This ensures business continuity and confidence in a software system.
In the case of software we have:
1. The application, called the front-end, through which users interact with the system. This includes the UI components, server components, middleware, supporting libraries, scripts, etc.
2. The database, called the back-end, in which all the application data is kept.
In a disaster we may lose all of this hardware and software, and here we discuss a plan to bring the software system back up and running in the state it was in just before the crash.
In the case of the application, we can keep the required files on a server located in a different geographical region; usually these files change only at software upgrades. As a best practice, keep copies of all deployed code, along with the development code, in a version control tool.
In the case of the database, the data changes continuously, possibly every second or even microsecond depending on the usage and type of software. We have to restore the database to the nearest possible point just before the disaster.
A copy of a database at a particular time can be saved as a backup file. This file can be archived and restored on requirement. However, depending on the data, usage and hardware capacity, generating a backup may take anywhere from minutes to hours, and large database backups can range from hundreds of MBs to several TBs.
Backup of a database.
A backup of a database is a snapshot of that database at a point in time. All the transactions committed up to that point will be available in the backup.
In the case of PostgreSQL we can generate two types of backup: physical and logical.
Physical backup
A physical backup is a copy of the data files stored by PostgreSQL. Apart from the actual stored data, the PostgreSQL engine uses storage space to keep temporary processing data in binary format; this data does not need to be included in a physical backup. During server set-up and maintenance it is very important to keep enough free space in the storage device/location for the smooth performance of the database system.
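One common way to take a physical backup, assuming PostgreSQL 9.1 or later and a replication user that is already configured, is the pg_basebackup tool, which copies the data files over a normal database connection. The hostname, user and target folder below are placeholders:
pg_basebackup -h primaryhost -U repuser -D /backups/base -Ft -z -P
Here -Ft writes the backup as tar files, -z compresses them and -P shows progress.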
Logical backup
A logical backup is a file generated by PostgreSQL in a format that can be chosen during the backup generation process. A logical backup cannot be generated if the PostgreSQL service is not running. A logical backup on its own is not recommended for critical systems, for reasons such as the time needed to restore a large database and the fact that it does not allow point-in-time recovery by itself.
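For example, a logical backup of a single database can be taken with pg_dump and restored with pg_restore (the database name and file name below are placeholders):
pg_dump -U postgres -Fc -f mydb.dump mydb
pg_restore -U postgres -d mydb mydb.dump
The -Fc option produces a compressed, custom-format dump that pg_restore can read.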
Transaction log-backup
Apart from these two types of backup there is one more data extract, called the transaction log backup. This is a piece of data extracted from the database server that contains, in binary format, a set of transactions committed through normal database operation. A transaction log backup file is comparatively small, and it can be restored on another server to bring that server up to date.
As a DR plan, we should always be ready with an up-to-date copy of the database.
Log Shipping
Log shipping is a well-known method of keeping a copy of a database as up to date as the actual database. In log shipping terminology, the actual database is called the primary and the copy is called the secondary.
As discussed, the database may be updated at a high frequency, and it is not practical to generate a full database backup at the same frequency. However, we can generate transaction log backups from the primary database and apply them to the secondary database to keep the secondary up to date. Since a transaction log backup is an extract of the transactions committed in the database in serialised order, the logs must be restored in the same order in which they were generated.
File-based log shipping has been available since PostgreSQL 8.2. We can enable transaction log generation with a few configuration settings. These transaction logs can then be shipped to the secondary, which can be configured to restore each transaction log file.
Initial setup
Log shipping can be configured between two PostgreSQL servers of the same version. As a best practice, use the same hardware configuration for these two servers, and install the same version of the PostgreSQL server. Create a sample user database on one server, generate a base backup of the primary database and restore it on the secondary.
Continuous activities
1. Generate the transaction log backup.
2. Move it to the secondary server immediately.
3. Set the permissions for the file, if required.
4. Restore the files on the secondary one by one in serial order.
Steps for initial configuration of file based log shipping.
1. Identify two individual PostgreSQL servers which can communicate with each other. Make sure that you have taken the required backups from each of these servers, as the data on these two servers may be lost or corrupted as part of this exercise.
2. Identify the location where you want to keep the log files on the server, and ensure enough space is available. Set the environment variable $PGARCHIVE to this path. It needs to be set on both the master and standby servers; the master should be able to write to this location and the standby should be able to read from it. Use another environment variable, $STANDBYNODE, to identify the standby server from the master.
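For example (the path and hostname below are assumptions; substitute your own):
export PGARCHIVE=/var/lib/pgsql/archive
export STANDBYNODE=standby1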
3. Change the parameters in the postgresql.conf file to generate the log files.
wal_level = 'archive'
archive_mode = on
archive_command = 'scp %p $STANDBYNODE:$PGARCHIVE/%f'
The first two configuration values enable log generation, and the third saves the transaction log files to the specified location. If there is any issue with saving the log files directly on the standby server, we can save them on the local server and ship them to the standby server using another job/tool. In such cases, we can use the command below as the archive_command.
archive_command = 'cp -i %p ../standalone/archive/%f'
Here the “standalone” folder and the “data” folder are at the same level in the directory hierarchy. Permissions for the log files should be set immediately after shipping them to the destination for restoration, otherwise the PostgreSQL engine will not be able to read them and will log an error.
If you are planning to save the log files on the same server and then ship them to the secondary, you should have a scheduled script ready which ships the log files at a fixed time interval. The time interval can be finalised based on the network traffic and data sensitivity.
The scp command can be used in the script to do this, and once shipped, the log files can be removed or archived according to the plan.
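A minimal sketch of such a scheduled script is given below; the local archive path is an assumption based on the layout described above, and $STANDBYNODE and $PGARCHIVE are the variables set earlier.
#!/bin/bash
# Ship locally archived WAL files to the standby, then keep the shipped copies aside.
ARCHIVE_DIR=/var/lib/pgsql/standalone/archive   # local archive folder (assumption)
mkdir -p "$ARCHIVE_DIR/shipped"
for f in "$ARCHIVE_DIR"/0*; do
    [ -e "$f" ] || continue
    scp "$f" "$STANDBYNODE:$PGARCHIVE/" && mv "$f" "$ARCHIVE_DIR/shipped/"
done
The script can be run from cron at the chosen interval.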
4. Start the backup.
psql -c "select pg_start_backup('base backup for log shipping')"
5. Copy the “data” files, excluding the “pg_xlog” folder.
The rsync or tar command can be used for this. If your secondary server requires a security key, it should be set up before using rsync. You may face issues with rsync due to restrictions in some environments; as an alternative, we can tar the files on the primary server and send the archive to the secondary. On the secondary, replace the “data” files with the files copied from the primary server, and set the permissions for the PostgreSQL user.
Any additional folders which were created on the primary to hold the transaction backups should be maintained on the secondary server as well.
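As an illustration (the data directory path, user and hostname are assumptions), an rsync-based copy could look like this:
rsync -av --exclude=pg_xlog /var/lib/pgsql/data/ postgres@standby1:/var/lib/pgsql/data/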
6. Stop the backup.
psql -c "select pg_stop_backup(), current_timestamp"
7. Set the recovery.conf parameters on the standby server.
standby_mode = 'on'
restore_command = 'cp $PGARCHIVE/%f %p'
8. Start the secondary server.
Another approach
1. Generate the transaction-log backup.
2. Move it to a secured backup server.
3. Distribute it to the multiple database servers.
4. Restore it to the database.
5. Remove the transaction logs from the database servers.
Streaming Replication
What is streaming replication?
In log shipping, the master database's WAL files change based on the transactions, and the logs are then shipped to the secondary server serially for replay. Apart from this, PostgreSQL has another, more advanced feature, released in 9.0, where the logs can be sent directly to the secondary through the normal database communication channel. This method is more secure and minimizes the replication delay.
We can configure streaming replication along with the log-archival facility, based on our requirements. This gives additional flexibility if we are working with highly critical data.
Here we discuss the steps which enable archiving along with streaming replication.
Steps to configure.
1. Identify your master and standby servers. Ensure that these two servers can connect to each other through the PostgreSQL port.
2. Create a user (repuser) for replication on the master server.
CREATE USER repuser SUPERUSER LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'MyPassword';
3. Set proper permissions for the repuser. The following entry in the pg_hba.conf file grants replication access to the server from the standby's IP address, using encrypted (md5) password authentication; you may wish to consider more restrictive options.
host replication repuser <<ip of standby>> md5
4. Set logging options in postgresql.conf, on both the master and the standby, so that we can collect more information regarding replication connection attempts and associated failures.
log_connections = on
5. Set the parameters mentioned below in postgresql.conf.
max_wal_senders = 1
wal_keep_segments = 50
hot_standby = on
wal_level = hot_standby
archive_mode = on
archive_command = 'cd .'   # a placeholder; a script can be used here which moves the log files to an archiving location
For example, a command such as the following moves the oldest WAL file from pg_xlog to an archive folder:
ls ./pg_xlog/00000* -lt | tail -n 1 | awk '{print $NF}' | xargs -i mv {} ../archived/
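As an alternative sketch (the archive path is an assumption), archive_command itself can copy each completed WAL file into a local archive folder using the %p and %f placeholders:
archive_command = 'cp %p /var/lib/pgsql/archived/%f'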
6. Take the base backup of the primary server.
psql -c "select pg_start_backup('base backup for streaming replication')"
7. Copy the “data” folder to the secondary server.
The tar command can be used for this, e.g. tar the “data” folder and move the tar-ed file to the secondary database server.
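For instance (the data path and standby hostname are assumptions):
tar -czf data.tar.gz -C /var/lib/pgsql data
scp data.tar.gz standby1:/var/lib/pgsql/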
8. Stop the secondary database server.
service postgresql-9.1 stop
9. Restore the tar file to the data folder of the secondary database.
This can be done using the tar command.
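For example, on the secondary (the path and user are assumptions):
cd /var/lib/pgsql
tar -xzf data.tar.gz
chown -R postgres:postgres data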
10. The base backup on the primary database should be stopped using the query below.
psql -c "select pg_stop_backup(), current_timestamp"
11. Create the recovery.conf file in the “data” folder of the secondary server.
standby_mode = on
primary_conninfo = 'host=primarydbserver user=repuser password=mypassword'
trigger_file = '/tmp/postgresql.trigger.5432'
Ensure the file permissions.
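The trigger_file setting defines a path which PostgreSQL watches while in standby mode; if the standby ever needs to be promoted to act as the primary, creating that file ends recovery:
touch /tmp/postgresql.trigger.5432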
12. Start the secondary server.
Check the log.
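To verify that replication is running, you can also query the replication status on the master (available from PostgreSQL 9.1):
psql -c "select * from pg_stat_replication;"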