Configuring Tomcat JDBC Connection Pool for High Concurrency

There are cases when you need to tune your existing connection pool configuration to sustain more user load on an old product, or when you want to identify, for the very first time, how much concurrent user load your new product can handle.

As an architect or application designer, there is a high probability that you will target tuning of the connection pool configuration first at your end.

Below are some crucial configurations for Tomcat JDBC (and Apache DBCP) that one can use to achieve high concurrency with few to no database connection issues.

// I used this in my application to hit the database server with 20 running threads updating 14 tables (on average 10 fields per table).
// That worked out to 238 insertions of 0.5 MB of data (encrypted XML strings) per second.
// Encryption time is included in this figure, so the raw database operation count can be higher.

PoolProperties poolProperties = new PoolProperties();
poolProperties.setRemoveAbandoned(true); // must be enabled for the abandoned timeout below to take effect
poolProperties.setRemoveAbandonedTimeout(30);
poolProperties.setMinEvictableIdleTimeMillis(5000);
poolProperties.setTimeBetweenEvictionRunsMillis(1500);

RemoveAbandonedTimeout – This is a timeout value in seconds. It should cover the longest running query of your application; however, if a single connection object is used to fire multiple queries, the timeout should be the sum of all those query execution times. Keep this wide open to avoid the ‘connection is already closed’ issue.
The above does not apply if you use the ResetAbandonedTimer JDBC interceptor. In that case, set the timeout to the longest running single query.

poolProperties.setJdbcInterceptors(
	      "org.apache.tomcat.jdbc.pool.interceptor.ResetAbandonedTimer");

MinEvictableIdleTimeMillis – The minimum time a connection can stay idle before it is evicted by the eviction thread to free up resources.

TimeBetweenEvictionRunsMillis – The eviction thread kicks in every x milliseconds to evict idle or abandoned connection objects.
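To put the pieces together, here is a minimal sketch of wiring these properties into a Tomcat JDBC DataSource. The URL, driver class, credentials, and pool sizes below are illustrative placeholders, not values from the benchmark above.

import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;

public class PoolFactory {

    public static DataSource createDataSource() {
        PoolProperties poolProperties = new PoolProperties();

        // Placeholder connection details - replace with your own.
        poolProperties.setUrl("jdbc:mysql://localhost:3306/mydb");
        poolProperties.setDriverClassName("com.mysql.jdbc.Driver");
        poolProperties.setUsername("appuser");
        poolProperties.setPassword("secret");

        // Pool sizing - tune for your own workload.
        poolProperties.setInitialSize(10);
        poolProperties.setMaxActive(50);

        // Abandoned-connection and eviction settings discussed above.
        poolProperties.setRemoveAbandoned(true);
        poolProperties.setRemoveAbandonedTimeout(30);
        poolProperties.setMinEvictableIdleTimeMillis(5000);
        poolProperties.setTimeBetweenEvictionRunsMillis(1500);
        poolProperties.setJdbcInterceptors(
                "org.apache.tomcat.jdbc.pool.interceptor.ResetAbandonedTimer");

        DataSource dataSource = new DataSource();
        dataSource.setPoolProperties(poolProperties);
        return dataSource;
    }
}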

Please note: Performance is a product of tuning multiple elements. It also depends on the hardware where the database files are stored, the network adapters, and the database server configuration itself. To get maximum throughput from any application, all of these elements must be tuned/configured properly.

Enterprise Caching Techniques – Standalone Caching

Many application domains are fetch-centric, with very few store operations. In e-commerce, for example, a buyer’s search-to-purchase ratio is 9:1 or sometimes even wider. Such applications require an additional caching layer in their architecture. Caching is not something new or recently invented; it has been around since the era of hardware evolution started. The L1 and L2 CPU caches found in any hardware architecture are caching mechanisms and are still in use. L1 and L2 sit between the processor and RAM and contain critical data for processing. Fetching data from those caches is faster compared to RAM, but their size is quite small compared to main memory. This helps to classify the data and helps the CPU decide where to store it.

Caching in enterprise applications derives directly from that same concept. Here, however, the cache may live in the same machine or on different machines/nodes connected to the parent over very fast network cards. Caching in enterprise applications is therefore mainly divided into two parts: Standalone Caching and Distributed Caching.

Standalone Caching

Sometimes referred to as embedded or in-process caching, standalone caching is a single-virtual-machine technique for storing frequently requested data. It acts as an L1 cache from the application’s perspective and resides in RAM.
The main purpose of standalone caching is to improve the performance of business-critical operations. A standalone cache has limited main memory at its disposal, therefore only data that is frequently used and important for business-critical functions is cached. Standalone caching products are almost always used as a side cache for an application’s data access layer. Side cache refers to an architectural pattern in which the application manages the caching of data from a database, a filesystem, or any other source. In this scenario, the cache temporarily stores the objects: the application first checks for an existing copy of the data and returns it if present; when the data is not present, it retrieves the data from the data access layer and puts it into the cache for the next incoming request.
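As a minimal sketch, the side-cache (cache-aside) pattern looks like this; the map-backed cache and the loadFromDataSource method are illustrative stand-ins for a real caching product and data access layer.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SideCache {

    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    public String get(String key) {
        // 1. Check the cache first and return the existing copy if present.
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        // 2. On a miss, retrieve from the data access layer...
        String value = loadFromDataSource(key);
        // 3. ...and put it into the cache for the next incoming request.
        cache.put(key, value);
        return value;
    }

    private String loadFromDataSource(String key) {
        // Stand-in for a real database or filesystem read.
        return "value-for-" + key;
    }
}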
In caching, some mechanism is required to cope with invalid cached data, i.e. data that has been updated at the source but not yet refreshed in the cache. There are several techniques that can be used to deal with invalid data, or to remove unused entries to free memory for other in-demand data.

Such concerns can be handled by writing an API that takes care of invalid cache entries.
Caching products like EHCache provide basic functionality for handling invalid data. The application decides at what point cached data should be invalidated. The strategy typically employed is that whenever data is updated at the store, the application invalidates the cached copy. If it is not vital to update the cached copy on the spot, we can apply other techniques that periodically refresh the cache through some time-based configuration. We can even combine both techniques in a multi-server environment.

There are also some other ways to update and remove cached data. With TTL (time-to-live) or LRU (least recently used) configuration, we can monitor individual caches and take action on them with the help of the API.
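For instance, a minimal sketch using the Ehcache 2.x API; the cache name, sizes, and timings are illustrative:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.store.MemoryStoreEvictionPolicy;

public class EhCacheDemo {

    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();

        // At most 1000 in-heap entries; each entry expires 300 seconds
        // after creation (TTL) or after 60 seconds of idleness, and LRU
        // decides which entry to evict when the cache is full.
        CacheConfiguration config = new CacheConfiguration("products", 1000)
                .timeToLiveSeconds(300)
                .timeToIdleSeconds(60)
                .memoryStoreEvictionPolicy(MemoryStoreEvictionPolicy.LRU);

        Cache cache = new Cache(config);
        manager.addCache(cache);

        cache.put(new Element("sku-1", "blue widget"));
        Element hit = cache.get("sku-1");
        System.out.println(hit == null ? "miss" : hit.getObjectValue());

        manager.shutdown();
    }
}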

The problem with a standalone cache is that it is very limited and can only be used in a single-node/machine architecture. Hence arose the need for a distributed cache, which is next in this series.

java.util.regex.PatternSyntaxException: Dangling meta character

When you try to split a string on ? or * as in the code below:

String sqlParts[] = sql.split("?");

you will end up with an unchecked PatternSyntaxException, as given below.

java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 0

Solution:
Avoid passing bare metacharacters like ‘?’, ‘+’ and ‘*’ to split(). Instead, escape them as shown below.

String sqlParts[] = sql.split("\\?");
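Alternatively, java.util.regex.Pattern.quote can escape an arbitrary literal for you, which is safer when the delimiter is not known at compile time:

import java.util.regex.Pattern;

public class SplitDemo {

    public static void main(String[] args) {
        String sql = "SELECT * FROM users WHERE id = ? AND name = ?";

        // Pattern.quote wraps the literal in \Q...\E so no character
        // inside it is treated as a regex metacharacter.
        String[] sqlParts = sql.split(Pattern.quote("?"));

        for (String part : sqlParts) {
            System.out.println(part.trim());
        }
    }
}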

Memory Based Architecture for Enterprise Applications – Introduction

We had an architecture discussion in one of the recent technical meetings at the company, and I was assigned to share the details on Memory Based Architecture. Sharing the details from those sessions here.

Memory, the changing philosophy of enterprise applications, and Memory Based Architecture:

Main memory is a high-bandwidth, low-latency component that can match the performance of the processor. The bandwidth of main memory is a few GB per second, as opposed to disk, which is around a hundred MB per second. The latency of main memory is in the nanoseconds range, whereas that of disk is in the milliseconds range. Traditionally, main memory was considered an expensive resource and was therefore used sparingly. However, the perception that RAM is expensive is now changing due to the sharp drop in prices over the past several years. At the same time, enterprise applications require more scalable and performance-oriented use of every chunk of available physical memory. Today, enormous amounts of main memory are cheaply available, and many applications use memory in gigabytes and terabytes. Main memory empowers application architectures to achieve linear scalability and high performance, qualities that are extremely important for modern enterprise applications to deliver guaranteed performance under intensive and unpredictable workloads.

As enterprises use more memory, software vendors have flooded the market with several types of memory based products in order to seize this new business opportunity. These products target various business use cases and architectural scenarios. This series is intended to introduce the various memory based product categories along with the business uses and architectural scenarios they support.

When we think of any memory based product, high performance is the first thing that comes to mind. Yes, high performance is the primary reason memory based products are used, but it is not the only reason. Many times they are deployed to reduce IO operations over the network or to address the high latency of disk based products such as databases. Typically, in an N-tier architecture, properly designed application code can easily scale out by adding more application servers. The main scalability barrier, however, is the disk based database that is centrally accessed by all the clustered application servers. Here, memory based products are typically deployed to overcome the scalability bottleneck posed by the disk based database and make the application servers more scalable. Thus the following can be considered primary scenarios for any memory based product:

– Improve application performance
– Reduce network and disk IO operations
– Overcome scalability barriers and make application servers more scalable

Memory based products can be broadly classified as Caching (Standalone and Distributed), In-Memory Data Grid (IMDG), Main Memory Database (MMDB), and Application Platforms that enable Space Based Architecture; each is covered in detail in this series.

Double-Checked Locking with Threads

While implementing a circular non-blocking queue, we observed a pattern that can boost performance.

When you have a condition-based thread-safe block, it is always advisable to check the same condition both before and after acquiring the lock.

The check before the lock prevents threads from blocking on the lock unnecessarily, and the check after the lock ensures that a state change made inside the synchronized block by another thread does not violate the protocol.

In detail: suppose ‘n’ threads enter this piece of code and the queue (a ConcurrentLinkedQueue) is full. If the queue were not full, we should not lock the threads at all, hence the check at line 1. All threads then wait at line 2. The first thread to acquire the lock (whichever gets the CPU based on the OS priority/round-robin scheduling) enters and modifies the queue at line 4. Because of this, the queue may no longer satisfy the condition for the other waiting threads, so a thread entering second should not run the code if the condition no longer holds, hence the re-check at line 3. We could actually omit line 1, but keeping it gives a performance boost, especially when the condition usually filters threads out before they reach the synchronized block.

   if (this.linkedThreadSafeQueue.size() == this.maxLimit) {             // line 1: cheap pre-check, no lock
       synchronized (this.linkedThreadSafeQueue) {                       // line 2: acquire the lock
           if (this.linkedThreadSafeQueue.size() == this.maxLimit) {     // line 3: re-check under the lock
               // line 4: modify queue
           }
       }
   }

This helps when a developer is dealing with a singleton (condition-based) multithreaded implementation, where the same object must not be initialized twice but threads should also not be locked unnecessarily.
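For the singleton case, the same double-checked pattern looks like the sketch below (the class name is illustrative). Note the volatile keyword: on Java 5+ it is required for double-checked locking to be safe, as it prevents other threads from observing a partially constructed instance.

public class ConfigHolder {

    // volatile is essential for safe double-checked locking on Java 5+.
    private static volatile ConfigHolder instance;

    private ConfigHolder() {
    }

    public static ConfigHolder getInstance() {
        if (instance == null) {                      // first check, no lock
            synchronized (ConfigHolder.class) {
                if (instance == null) {              // second check, under the lock
                    instance = new ConfigHolder();
                }
            }
        }
        return instance;
    }
}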

Git – A distributed SCM

Git is a distributed source code management system: a very powerful, fast, and reliable way of managing source code. I started working with Git in 2010, and I found it better than any non-distributed SCM. I thought of sharing the basic Git operations one can use to become productive very quickly.

Please note: This article does not focus on Git vs. SVN or CVS. It covers the most useful Git commands to help a person quick-start with it.

Most useful commands with Git:

SSH operation (key generation): Use this to allow your repository to communicate securely with GitHub, Beanstalk, or some other hosting platform, or even with your teammate.

$ ssh-keygen -C "email-id" -t rsa

Git configuration

git config --global user.email email-id
git config --global user.name "Dharmesh Borad"

Repository operation

1> Create a Git repository: $ git init

2> Add a remote repository: $ git remote add github git@github.url.com:/platform.git

Branching and Merging operation with Git

1> Creating a branch under this repository

$ git branch branchname

2> Listing all branches under this repository

$ git branch

3> To switch between branches. The command below switches to the newly created branch, i.e. “branchname”.

$ git checkout branchname

4> To merge into the master branch (branchname -> master)

$ git checkout master

$ git merge branchname

5> To delete/remove branch

$ git branch -d branchname

6> To delete a branch on the remote machine (origin is the alias of the remote machine/repository)

$ git push origin :branchname
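On Git 1.7.0 and later, there is also a more readable equivalent:

$ git push origin --delete branchname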

Add operation

1> Adding files to the repository index for a further commit

$ git add filename

2> Adding a folder to the repository index for a further commit. This also adds the files inside that folder.

$ git add foldername/

Remove/Delete operation

1> To remove a file/folder from the .git index but keep it physically

$ git rm --cached filename

2> To remove files from the index as well as from physical storage. Use -r for a recursive remove.

$ git rm -r myfolder/

$ git rm myfile

Commit operation

1> Committing to the local repository. -a commits all modified tracked files, skipping the staging area; -m supplies the message.

$ git commit -a -m "commit message"

Remote operation

1> Pulling from repository

$ git pull remotename branchname

2> Pushing to repository

$ git push remotename branchname

Help operation

1> To get help on any command

$ git help commandname

To fetch a branch from a remote repository

$ git remote show origin

1> If the remote branch you want to check out is listed under “New remote branches” and not under “Tracked remote branches”, you need to fetch it from the remote first

$ git fetch origin

2> Now it will allow you to create a local branch out of the fetched one

$ git checkout -b local-name origin/remote-name

In case you are looking for some very specific operation, please write back to me. I will try to provide the details in my free time.