Wednesday 11 February 2015

Postgresql Disaster recovery plan

Disaster recovery plan - an overview.

Disaster recovery plan is the preparation to keep the system up and running after a complete crash of hardware/software system. It may not be completely automated and may need a downtime and manual effort to retrieve the readiness of the system. But the effort and downtime should be minimal as much as possible. This ensures the business continuity and confidence in a software system.
          In the case of software we have,
1.                  Application, called as front-end through which users are interacting to the system. This includes UI component, server component, middle ware, supporting libraries, scripts etc.

2.                  Database, called as the back-end in which all the application data are kept.
In the case of a disaster we may lose all these hardware and software, and here we are discussing a plan to make the software system up and running as just before to the crash.
In the case of application we can keep the required files in a server which is located in a different geographical region, and usually there is no frequent change in those files except the software upgrades. As a best practice, we need to keep the copies of all deployed code along with the development code with the support of a version control tool.
In the case of database, the data is varying and its frequency can be microsecond/second based on the usage/type of software. We have to retrieve the database to the possible nearest point just before the disaster.
Copy of a database on a particular time can be saved as a backup file. This file can be archived and restored up on requirement. But based on the data/usage/hardware capacity the time of backup-generation may be varying from minutes to hours, and heavy database backups may be hundreds of MBs to several TBs.

Back-up of database.

         
Backup of a database is the snap shot of a database at a time. All the committed transactions till that point will be available in a backup.
          In the case of postgresql we can generate two types of backup, physical and logical.

 

Physical backup


Physical backup is the copy of data stored by postgresql. Apart from the actual stored data, Postgresql engine uses storage space to keep temporary processing data in binary format. This particular data is not required to be backed-up during the physical backup. During the server set-up and maintenance it is very important to keep the required free space in the storage device/location for the smooth performance of this database system.

Logical backup


Logical backup is a file generated by by postgresql in the required format which can be opted during the backup generation process. The logical backup cannot be generated if the postgresql service is not running. The logical backup is not recommended as a standalone backup in critical cases due to various reasons.

Transaction log-backup


Apart from these two types of backup there is one more data extract, which is called transaction log backup. This is a piece of data that can be extracted from a database server which contains a set of transactions in binary format that was committed through normal database operation. The file size of transaction log backup is comparatively small. This file can be restored to another server to make it up to date.
As a “DR” plan we should be ready with a copy of database which is up to date.

Log Shipping

       

Log shipping is a well-known method to keep a copy of database up to date as the actual database. In the perspective of log shipping, the actual database is called as primary database and copy is called as secondary.

As discussed, the database may get updated at higher frequency, and it is not practical to generate the full database backup at the same frequency. But we can generate the transaction backup from the primary database and can add to the secondary database to keep the secondary database up to date. As transaction backup is the extraction of transactions committed in the database in the serialised order, restoration of transaction backup should be done in the same order in which they are generated. 


The file based log-shipping is available since PostgreSQL 8.2. We can enable the transaction log generation by few configuration settings. This transaction logs can be shipped to the secondary and can be configured to restore the transaction log file.

Initial setup


Log shipping can be configured between two ‘postgresql servers of same version. As a best practice, use same hardware configuration for these two servers, and install same version of postgresql server. Create a sample user database in one server, generate base-backup of the primary database and restore it to the secondary.

Continuous activities


1.    Generate the transaction log backup.
2.    Move it to the secondary server immediately.
3.    Set the permissions for the file, if required.
4.    Restore it to the secondary one by one in the serial order.

Steps for initial configuration of file based log shipping.


1.    Identify two individual postgresql servers which can communicate with each other. Make sure that you have taken the required backup from each of these servers. The data of these two servers may be lost/corrupt as part of this action.

2.    Identify the location where you want to keep the log files in the server, and ensure the space availability. Set environment variable $PGARCHIVE to the path. This needs to be set both on the master and standby servers. Master should be able to write to this location and standby should be able to read from it. Use another environment variable &STANDBYNODE to identify the standby server from master.

3.    Change parameters in the “postgresql.conf” file to generate the log files. 

wal_level = 'archive'
archive_mode = on
archive_command = ‹scp %p $STANDBYNODE:$PGARCHIVE/%f›

The first two configuration values will enable the log generation and the third one will  save the transaction log files to the specified location. If there is any issue in directly saving the log file to the standby server, we can save it to the local server and can ship to the standby server using another job/tool. In such cases, we can use the below command as “archive_command”.

archive_command = ‹ cp -i %p ../standalone/archive/%f ›


Here “standalone” folder and “data” folder are in the same hierarchical order.  Permission for the log files should be set immediately after shipping them to the destination  for  restoration, otherwise the “PostgreSQL” engine won’t be able to  read it, and will log an error.

If you are planning to save the log files in the same server and then shipping to the secondary, you should be ready with a scheduled script which can do the log-file shipping on a fixed time interval. The time interval can be finalised based on the network traffic and data sensitivity.

“scp” command can be used in the script to do this and once shipped, the log files can be removed or archived based on the plan.

4.    Start backup
psql -c "select pg_start_backup('base backup for log shipping')"

5.    Copy “data” files exclude “pg_xlog” folder.
“rsync” or tar command can be used for this. If your secondary server requires any security key, it should be set before using “rsync” command. Issues may be faced with the “rsync” command  due to some restrictions in environment.

As an alternative to  “rsync” command, we can save the log transaction files to the same server, “tar” them and send to the secondary.  In the secondary, replace the “data” files with the copied files from primary server. Permissions should be set for the postgreSQL user.

The additional folders which are created to hold the transaction backups in the primary should be maintained in the secondary server as well.

6.    Stop backup

psql -c "select pg_stop_backup(), current_timestamp"

7.    Set the recovery.conf parameters on the Standby server
standby_mode = 'on'
restore_command = ‹cp $PGARCHIVE/%f %p›

8.   Start the secondary server.

An another approach


1.    Generate the transaction-log backup.
2.    Move it to the secured backup server.
3.    Distribute it to the multiple database servers.
4.    Restore it to the database.
5.    Remove the transaction logs from the database servers.

This approach of keeping multiple replica of a database is applicable in the for very critical applications. The CPU/data-read load can be distributed to different servers so that the primary database can be used more for normal user transactions. 


Streaming Replication


What is streaming replication?


In Log shipping, the master database WAL files will change based on the transactions and then the logs will be shipped to the secondary server serially for replay. Apart from this, Postgresql has another advanced feature released in 9.0 where the logs can be directly send to the secondary through the normal database communication channel. This method is more secure and minimizes the replication delay.
We can configure the streaming replication along with the log-archival facility based on our requirement. This will give an additional flexibility if we are working on highly critical data.
          Here we are discussing steps which enable archiving along with streaming replication.

Steps to configure.


1.    Identify your Master and Standby servers. Ensure that these two servers can connect through the postgresql-port.

2.    Create a user(repuser) for replication in Master server.
CREATE USER repuser SUPERUSER LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'MyPassword';
3.  Set proper permission for the “repuser” . The following entry to the pg_hba.conf  file, sets access from any ip address (using encrypted password authentication)  to the server( you may wish to consider more restrictive options).
host replication repuser <<ip – of standby>> md5
4.  Set logging options in postgresql.conf, in both Master and Standby, so that we can collect more information regarding replication connection attempts and associated failures.
log_connections = on
5. Set the below mentioned parameters in postgresql.conf.
max_wal_senders = 1   
wal_keep_segments = 50
hot_standby = on
wal_level = hot_standby

archive_command = 'cd .'  # We can use a script here which can move the log files to an archiving location.
archive_mode = on
ls ./pg_xlog/00000* -lt | tail -n 1 | awk '{print $NF}' | xargs -i mv {} ../archived/
wal_keep_segments =10

6. Take the base backup of primary server.
psql -c "select pg_start_backup('base backup for streaming replication')"

7. Copy “data” folder to secondary server.
“tar” command can be used for this.
Eg: “tar” the “data” folder and move the “tar”-ed file to the secondary database server.
8. Stop the secondary database server.
Service postgresql-9.1 stop
9. Restore the “tar” file to the data folder of secondary database.
                   This can be done using the “tar” command.
10. The base backup in primary database should be stopped using the below query.
psql -c "select pg_stop_backup(), current_timestamp"
11. Create the recovery.conf file in the “data” folder of secondary server.
                   standby_mode = on
primary_conninfo = 'host=primarydbserver user=repuser  password=mypassword'
trigger_file = '/tmp/postgresql.trigger.5432'

Ensure the file permissions.

12. Start the secondary server.
                   Check the log.