If the server is unavailable, switch to a second one: make the site more resilient through redundancy and replication


A website can go down because of a server crash, a data center failure, a problem with communication channels, or for other reasons. For some projects this is a minor nuisance; for others it means financial loss or reputational damage.

At Initlab, we know how to make your website more resilient: by setting up redundancy and replication. If the server becomes unavailable, a copy of the site on another server steps in and keeps working as if nothing had happened. This article explains how database replication works and what it can be used for.

What is redundancy and replication

Redundancy means maintaining a backup server that is automatically kept up to date and takes over when the primary server has problems. Database replication is one of the main technologies used to build it.

Replication is the transfer of data from one database server to another. In MySQL terminology, the master is the server where data is written, and the slave is the server that receives a copy of that data from the master.

Let's put it simply. There is a main website and several copies of it on different servers. The main site has a database, which works like a folder where incoming orders, new products and edits are recorded. All these changes are automatically transmitted to the copies of the site, but new information is written only on the main site. It is also where all visitors go.

The main site is located on the master server. This is where all changes are made and where the slave servers pull their updates from. In other words, the sites are synchronized: the data in the databases is identical, and so are the server settings.
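The pull-based synchronization described here can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Master` and `Replica` classes and the in-memory change log are hypothetical stand-ins, not how MySQL or any particular setup implements replication.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Accepts all writes and records each one in an ordered change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (key, value); value None = delete

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.log.append((key, None))


@dataclass
class Replica:
    """Pulls the changes it has not seen yet and replays them in order."""
    data: dict = field(default_factory=dict)
    position: int = 0  # index of the next log entry to apply

    def pull(self, master):
        for key, value in master.log[self.position:]:
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
        self.position = len(master.log)


master = Master()
replica = Replica()
master.write("order:1", "2 x keyboard")
master.write("product:7", "new mouse")
master.delete("order:1")
replica.pull(master)
print(replica.data == master.data)  # True
```

Because the replica remembers its position in the log, a later `pull` applies only the changes made since the previous one.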

Cloud storage works in roughly the same way. The main difference is that in the cloud you can, for example, delete a file from a device and it still remains in the cloud. Here, on the other hand, synchronization is complete: delete a file on the main site and it is deleted everywhere; add a new product and it appears everywhere.

When the master server fails, the main site hands its role over to one of the copies. The receiving server's database now accepts writes, and its site becomes the new main site. This keeps the site accessible during a server failure and ensures that no data is lost.
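The role handover can be sketched as follows. The `Server` record, the `applied` counter and the rule "promote the healthy replica that has applied the most changes" are assumptions made for this illustration; a real setup may choose the new master differently.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    role: str      # "master" or "replica"
    applied: int   # number of changes applied so far
    healthy: bool = True


def promote_on_failure(servers):
    """If the master is down, promote the healthy replica with the most data."""
    master = next(s for s in servers if s.role == "master")
    if master.healthy:
        return master
    candidates = [s for s in servers if s.role == "replica" and s.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica to promote")
    new_master = max(candidates, key=lambda s: s.applied)
    master.role = "replica"  # the old master rejoins later as a receiver
    new_master.role = "master"
    return new_master


servers = [
    Server("dc1", "master", applied=120, healthy=False),
    Server("dc2", "replica", applied=120),
    Server("dc3", "replica", applied=118),
]
print(promote_on_failure(servers).name)  # dc2
```

Picking the replica with the highest `applied` count keeps the data gap after the switch as small as possible.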

Why you need redundancy

A server may become unavailable for a number of reasons: problems with network equipment in the data center, power or internet outages, and so on. Besides technical issues, there are cases when a client simply forgets to pay for the server. In all these situations the mechanism is the same: the site on a working server takes over from the unavailable one.

For redundancy to be effective, we recommend placing the site on servers in different data centers. This reduces the chance of downtime: if there is a fire or a power outage in one location, the other is likely to be unaffected. If both servers share the same data center, you risk the same problem hitting both at once. The number of servers used depends on the needs of the site and the client's wishes.

So far we have described the failover mechanism, where a copy takes over from an unavailable server. There is another application of this technology, in which the site runs on multiple servers in parallel.

If the load on one server is too heavy, user traffic is distributed across several servers at once. If one of them fails, users are redirected to another in the same way, while edits and new items are still written to the master.
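A minimal sketch of this routing rule, assuming a simple round-robin policy: writes always go to the master, reads rotate over the replicas, and a failed replica is skipped. The `Router` class and the server names are hypothetical; production balancers are usually separate tools with richer health checks.

```python
import itertools


class Router:
    """Sends writes to the master and spreads reads over healthy replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self.down = set()
        self._cycle = itertools.cycle(self.replicas)

    def mark_down(self, server):
        self.down.add(server)

    def route(self, is_write):
        if is_write:
            return self.master
        # Try each replica at most once per request, skipping failed ones.
        for _ in range(len(self.replicas)):
            server = next(self._cycle)
            if server not in self.down:
                return server
        return self.master  # no replica left: fall back to the master


router = Router("master", ["replica-1", "replica-2"])
print(router.route(is_write=False))  # replica-1
print(router.route(is_write=False))  # replica-2
print(router.route(is_write=True))   # master
router.mark_down("replica-1")
print(router.route(is_write=False))  # replica-2
```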

This kind of load balancing not only improves resilience but also helps the project scale. If you are preparing your online shop for a large influx of visitors, you need replication.

Switching between servers is not instantaneous. The system first needs to make sure that this is not a one-off glitch and that the site has actually been unavailable for some time. In more complex infrastructure setups, switching can be configured to happen quickly at the slightest failure, keeping downtime to a minimum.
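One common way to tell a one-off glitch from a real outage is to require several failed checks in a row before switching. The sketch below assumes that approach; the `FailoverDetector` name and the threshold of three are illustrative choices, not values taken from the setup described in this article.

```python
class FailoverDetector:
    """Signals a switch only after several consecutive failed checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, check_ok):
        """Feed one health-check result; return True when it is time to switch."""
        if check_ok:
            self.failures = 0  # a single success resets the counter
        else:
            self.failures += 1
        return self.failures >= self.threshold


detector = FailoverDetector(threshold=3)
results = [detector.record(ok) for ok in [False, True, False, False, False]]
print(results)  # [False, False, False, False, True]
```

A lower threshold switches faster but reacts to short glitches; a higher one avoids false alarms at the cost of longer downtime.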

Which sites need redundancy

There are two reasons to order this service: profit and reputation. For some companies, downtime costs more than setting up and maintaining redundancy and replication, because it means:

loss of orders from online shops;
loss of enquiries on sites that handle large orders, e.g. in the B2B segment;
loss of trust from potential clients or higher management.

Redundancy and replication are the first step towards improving the resilience of high-traffic sites and towards scaling them.

For example, one of our clients wanted users to always have access to their personal accounts and never lose information they had entered. For this, the client asked us to set up the first replication option, in which a 'spare' site is launched if the server fails.

For another client, it is very important that the site stays up under high load and through any kind of failure, so they asked us to set up load distribution between the servers.

Why replication support is needed and what it includes

Replication cannot be set up and then left unattended. It will work until the first failure, after which synchronization will start to fall apart. That is why this service always comes with administration, including automatic monitoring.

Synchronization. After a server failure, the main site hands its role over to the backup site and stops working. When access to it is restored, it lacks the data that the new main site has accepted in the meantime. The site and server settings now have to be changed, because the former sender has become the receiver, and all files must be checked to make sure they are synchronized.

Updates. New versions of software packages often close security vulnerabilities. Packages should be updated as needed; otherwise the site or server may become vulnerable.

Monitoring. An automatic check once a minute collects data on a multitude of important parameters:

replication status: up and running, no synchronization failures;
the status of the master site server;
how up to date the data on the backup server is;
the state of the sites on their servers;
the status of memory, disk and processor resources;
the status of services (running or stopped) and the dynamics of requests to the server;
server and website availability.

If replication stops for some reason and the database on one server starts lagging behind, monitoring reports it. The infrastructure should be brought back in sync as soon as possible, while the difference between the data sets is still small and resynchronization does not take long.
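A replication check of this kind might look like the sketch below. It assumes the status has already been read from MySQL into a dictionary and uses the field names reported by `SHOW SLAVE STATUS` (`Slave_IO_Running`, `Slave_SQL_Running`, `Seconds_Behind_Master`); newer MySQL versions expose the same information through `SHOW REPLICA STATUS` under renamed fields. The `replication_health` function and the 60-second limit are hypothetical.

```python
def replication_health(status, max_lag_seconds=60):
    """Classify one replica from a dict of replication status fields."""
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    lag = status.get("Seconds_Behind_Master")  # None when replication is not running
    if not (io_ok and sql_ok) or lag is None:
        return "stopped"
    if lag > max_lag_seconds:
        return "lagging"
    return "ok"


print(replication_health({
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 0,
}))  # ok
```

Running such a check every minute and alerting on anything other than "ok" would flag a stalled replica while the data gap is still small.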

Monitoring also provides information about the site itself, for example how much time is left before the domain and the SSL certificate need to be renewed. We inform clients about this as well.
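For the certificate part, the remaining time can be computed from the certificate's expiry date. The sketch below assumes the date arrives as a `notAfter` string in the format returned by Python's `ssl.SSLSocket.getpeercert()`; the `days_until_expiry` helper is hypothetical, and checking a domain's registration date would need a separate lookup that is not shown here.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter date (negative if expired)."""
    now = time.time() if now is None else now
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now) / 86400


# Fixed reference time so that the example is reproducible.
reference = ssl.cert_time_to_seconds("Jan  1 00:00:00 2030 GMT")
print(days_until_expiry("Jan 31 00:00:00 2030 GMT", now=reference))  # 30.0
```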

Any failure or problem can be logged as a ticket in our service desk. This way the client can see what problems are occurring, and after analysis we add a report to each ticket. If a particular hosting provider has regular problems and cannot provide stable operation, this becomes visible and the site can be moved to a more reliable partner.

Replication is not a one-size-fits-all solution; it is customized for each client. On average, the Initlab team needs 16 hours to set up replication for a simple site.

Your project needs server redundancy and database replication if you want to increase the resilience of your site, scale your project, or both. If site downtime means lost money for you, you should consider this service.

We set up replication as part of server administration services on any CMS.

