
201311clustering_old

Jeroen Ticheler edited this page Nov 22, 2013 · 2 revisions

Web applications serve many users at once. As the number of users grows, the application can slow down or even stop responding. By running a second, third, or further instance of the web server hosting the application and routing all traffic through a load balancer, the load on a single application can be distributed over multiple servers. This is called clustering or scaling.
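
As a rough illustration of what a load balancer does, the sketch below hands each incoming request to the next server in a fixed rotation (round robin). The class name and the backend addresses are hypothetical, and real load balancers usually add health checks and session handling that are left out here.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next backend in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        # cycle() repeats the backend list endlessly, in order.
        self._rotation = cycle(list(backends))

    def pick(self):
        """Return the backend that should serve the next request."""
        return next(self._rotation)


# Hypothetical backend addresses, for illustration only.
balancer = RoundRobinBalancer(["gn-1:8080", "gn-2:8080", "gn-3:8080"])

# Six requests are spread evenly: each backend receives two of them.
assignments = [balancer.pick() for _ in range(6)]
```

Round robin is only one possible policy; it is used here because it is the simplest way to show the load being spread over several instances.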

Right now, there is no standard way to cluster and scale GeoNetwork. There is a “readonly instance” implementation that can be used, but there is no efficient way to keep a “readonly instance” synchronized with the other GeoNetwork instances, which means that there will be inconsistencies between instances.

We have three proposals on how to improve scalability in GeoNetwork.

Master server and slaves with daily synchronization

This is the solution currently used in some portals. There is one master server and several read-only instances whose databases are restored daily from the master to keep them synchronized. Changes can take up to a day to propagate to all instances. This solution requires no development in GeoNetwork's core; instead, an external tool would be implemented to perform the synchronization automatically.
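
The propagation delay in this setup can be made concrete with a small in-memory model. This is only a sketch: the `Master` and `ReadOnlyInstance` classes are hypothetical stand-ins, plain dictionaries take the place of the real database, and `restore_from` stands for whatever the external tool would do once a day.

```python
import copy


class Master:
    """The single writable instance; a dict stands in for its database."""

    def __init__(self):
        self.records = {}

    def save(self, record_id, metadata):
        self.records[record_id] = metadata


class ReadOnlyInstance:
    """A read-only instance that only changes when it is restored."""

    def __init__(self):
        self.records = {}

    def restore_from(self, master):
        # Replace the local data wholesale with a snapshot of the master,
        # as a database restore would.
        self.records = copy.deepcopy(master.records)

    def get(self, record_id):
        return self.records.get(record_id)


master = Master()
replica = ReadOnlyInstance()

master.save("rec-1", {"title": "Rivers"})
before_sync = replica.get("rec-1")  # not visible yet on the replica

replica.restore_from(master)        # the daily synchronization run
after_sync = replica.get("rec-1")   # visible only after the restore
```

Between two restores the read-only instance keeps serving the previous snapshot, which is where the delay of up to a day comes from.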

Master server and slaves with daily harvesting

This option is similar to the previous one, but uses harvesting between the master and the slaves instead of removing and rebuilding the database and index. It is not as widely used because harvesting is sometimes slower than rebuilding the database and indexes. See the harvesting proposal.

Balanced Servers with JMS

This is an old proposal that was never merged. In this case, there are several GeoNetwork instances, all of them writable and all connected to the same database. All instances therefore have the same features and data, because they share the same storage (the database).

In the current proposal, every instance has its own Lucene index. If a Solr index is used instead, all GeoNetwork instances can share the same index too, which means that all instances will be in sync in real time.
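
The following sketch models the balanced setup with a shared database and one index per instance. It rests on an assumption that this page does not state: that the messaging layer is used to tell every instance when a record has changed, so that each one can refresh its own index. The `Topic` class is an in-process stand-in, not the JMS API, and all names are hypothetical.

```python
class Topic:
    """In-process stand-in for a message topic (not the real JMS API)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        # Deliver the message to every subscriber, including the sender.
        for callback in list(self._subscribers):
            callback(message)


class Instance:
    """A writable instance: shared database, private search index."""

    def __init__(self, name, database, topic):
        self.name = name
        self.database = database  # the same dict object for all instances
        self.index = {}           # local index: record id -> searchable text
        self.topic = topic
        topic.subscribe(self._on_record_changed)

    def save(self, record_id, title):
        self.database[record_id] = title
        # Assumed behaviour: announce the change to all instances.
        self.topic.publish(record_id)

    def _on_record_changed(self, record_id):
        # Re-read the record from the shared database and reindex it locally.
        self.index[record_id] = self.database[record_id].lower()

    def search(self, word):
        needle = word.lower()
        return sorted(rid for rid, text in self.index.items() if needle in text)


shared_db = {}
topic = Topic()
a = Instance("a", shared_db, topic)
b = Instance("b", shared_db, topic)

a.save("rec-1", "Rivers of Europe")  # written through instance a
b.save("rec-2", "European roads")    # written through instance b

hits_on_a = a.search("europe")
hits_on_b = b.search("europe")
```

Both instances return the same hits even though each write went through a different instance. With a single shared index, as in the Solr variant, the per-instance reindexing step would no longer be needed.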

comparison table

The JMS solution seems to be the most complete and correct way to cluster and scale GeoNetwork.
