Enterprise Caching Techniques – Distributed Caching

Standalone caching is good solution when requests are for least items in the database. But what if the application has workload and all data is access all the time? Or what if we have the business need to store large volume of data in the cache? In such scenarios standalone caching may not be the right solution. Further stand-alone caching may not be useful if the application is running in the multiple-servers environment. This is when the need for distributed caching arises. The distributed caching is key factor behind many successfully deployed applications and therefore it is widely used. Distributed caching is now accepted as a key component of any scalable application architecture. Memcached which is one of the leading and the most popular distributed caching framework is used as part of highly scalable architectures at Facebook, Twitter, YouTube etc.

Distributed caching is a form of caching that allows the cache to span across multiple servers so that it can grow in size and in transactional capacity. It distributes data across multiple servers while still giving you a single logical view of the cache. The architecture it employs makes distributed caching highly scalable. However like any other distributed systems, distributed caching frameworks are inherently complex due to involvement of network elements. In fact slow network could be key performance and scalability barrier for distributed caching. The distributed caching is out-of-process caching service and from application standpoint it acts as L2 cache.

One of the widely used tool for distributed architecture is Memcache. With Memcache one can configure distributed architecture where two or more node can communicate with each other for data synchronization for any node failure. This n to n network communication can be more tuned for 1 to n where 1 node behaving as master and can hold copy of all data while slave always carry unique data. Both has its own advantage and problems. Below is cluster of same machine just for example point of view where two memcache instance can be executed on two different port. Code can tie them together to setup a cluster as given below.

Memcached –L 127.0.0.1 –P 1212
Memcached –L 127.0.0.1 –P 2705

This can be used with any memcached client API. Spyememcached is in java and examples are given here.

MemcachedClient c=new MemcachedClient(AddrUtil.getAddresses("127.0.0.1:1212 127.0.0.1:2705"));

Databases or IO operations are costly when there is millions of application transactions. Distributed caching is one of the solution to reduce load on backend and make them more scalable. The main idea behind distributed caching is to provide in memory data to hold large volume of the stored data. Distributed caching techniques have advanced and matured to manage massive data completely in memory. A database server usually requires a high-end machine whereas distributed caching performs well on lower cost machines (like those used for Web servers). This allows adding more small configured machines and scale-out application easily. Such caching can be included with small server network configuration as small memory entity which is small size of available memory for application.

Leave a Reply

Your email address will not be published. Required fields are marked *