Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It didn't reach the limit of max-pool-size, but there was an maxPoolSize exhausted exception #23483

Closed
micRo-wlY opened this issue Jun 3, 2021 · 5 comments · Fixed by #24879

Comments

@micRo-wlY
Copy link

micRo-wlY commented Jun 3, 2021

Hi!

We hava an application that uses jdbc connection pooling.And uses the following settings:
max-pool-size:200
pool-resize-quantity:2
max-wait-time-in-millis:5000

It has been working fine for a long time, but recently we get the following exception while trying to concurrently access.

[java.sql.SQLException:Error in allocating a connection. Cause: In-use connections equal max-pool-size and expired max-wait-time. Cannot allocate more connections.]

The number of concurrent requests is 100.
Actually,It didn't exceed the limit of max-pool-size 200.

By analyzing the ConnectionPool code,we find a problem in ConnectionPool#resizePoolAndGetNewResource.

In order to understand the problem simply,suppose uses the following settings:
max-pool-size:10
pool-resize-quantity:1
max-wait-time-in-millis:5000

and,the problem flow is as follows:

  1. now there are five connections in the jdbc connection pool and all these connections are in-used

  2. one request comes to get connection(by Thread A)

  3. because there's still space to create a connection(current connections < max-pool-size),
    Thread A starts to execute the following resizePoolAndGetNewResource.
    In **1 method create 1 new connection, and add connection to connection pool.

  4. thread A before executing **2 method (take the connection off from connection pool)
    thread B just comes to get connection,and just takes the connection off from connection pool.

  5. when thread A executes **2 method,the connection was tooken by thread B, returns null.
    and max-wait-time-in-millis is timeout,in **3 method,the error will happen.

[java.sql.SQLException:Error in allocating a connection. Cause: In-use connections equal max-pool-size and expired max-wait-time. Cannot allocate more connections.]

Environment Details

  • GlassFish Version:5.1.0
  • JDK version:1.8.0_201
  • OS:ALL
  • Database:ALL

Questions about this case

Now,we hava one question:

When a thread execute the resizePoolAndGetNewResource, should this thread get the connection first?
If YES,the code of resizePoolAndGetNewResource has a bug.
If NO,the error message [Cause: In-use connections equal max-pool-size and expired max-wait-time.] is not right.

Related source

com.sun.enterprise.resource.pool.ConnectionPool

	private ResourceHandle resizePoolAndGetNewResource(ResourceAllocator alloc) throws PoolingException {
        //Must be called from the thread holding the lock to this pool.
        ResourceHandle result = null;
        int numOfConnsToCreate = 0;
        if (ds.getResourcesSize() < steadyPoolSize) {
            // May be all invalid resources are destroyed as
            // a result no free resource found and no. of resources is less than steady-pool-size
            numOfConnsToCreate = steadyPoolSize - ds.getResourcesSize();
        } else if (ds.getResourcesSize() + resizeQuantity <= maxPoolSize) {
            //Create and add resources of quantity "resizeQuantity"
            numOfConnsToCreate = resizeQuantity;
        } else if (ds.getResourcesSize() < maxPoolSize) {
            // This else if "test condition" is not needed. Just to be safe.
            // still few more connections (less than "resizeQuantity" and to reach the count of maxPoolSize)
            // can be added
            numOfConnsToCreate = maxPoolSize - ds.getResourcesSize();
        }
        if (numOfConnsToCreate > 0) {
             createResources(alloc, numOfConnsToCreate); **1
             result = getMatchedResourceFromPool(alloc); **2
        } else if (ds.getFreeListSize() > 0) {
            //pool cannot create more connections as it is at max-pool-size.
            //If there are free resources at max-pool-size, then none of the free resources
            //has matched this allocator's request (credential). Hence purge free resources
            //of size <=resizeQuantity
            if (purgeResources(resizeQuantity) > 0) {
                result = resizePoolAndGetNewResource(alloc);
            }
        }
        return result;
    }

    public ResourceHandle getResource(ResourceSpec spec, ResourceAllocator alloc, Transaction txn)
            throws PoolingException, RetryableUnavailableException  {
        //Note: this method should not be synchronized or the
        //      startTime would be incorrect for threads waiting to enter

       /*
        * Here are all the comments for the method put together for
        * easy reference.
        *  1.
           // - Try to get a free resource. Note: internalGetResource()
           // will create a new resource if none is free and the max has
           // not been reached.
           // - If can't get one, get on the wait queue.
           // - Repeat this until maxWaitTime expires.
           // - If maxWaitTime == 0, repeat indefinitely.

           2.
           //the doFailAllConnectionsProcessing method would already
           //have been invoked by now.
           //We simply go ahead and create a new resource here
           //from the allocator that we have and adjust the resources
           //list accordingly so as to not exceed the maxPoolSize ever
           //(i.e if steadyPoolSize == maxPoolSize )
           ///Also since we are creating the resource out of the allocator
           //that we came into this method with, we need not worry about
           //matching
        */
        ResourceHandle result = null;

        long startTime = System.currentTimeMillis();
        long elapsedWaitTime;
        long remainingWaitTime = 0;

        while (true) {
            if (gateway.allowed()) {
                //See comment #1 above
                JavaEETransaction jtx = ((JavaEETransaction) txn);
                Set resourcesSet = null;
                if(jtx != null){
	                resourcesSet = jtx.getResources(poolInfo);
		        }
                //allow when the pool is not blocked or at-least one resource is
                //already obtained in the current transaction.
                if (!blocked || (resourcesSet != null && resourcesSet.size() > 0)) {
                    try {
                        result = internalGetResource(spec, alloc, txn);
                    } finally {
                        gateway.acquiredResource();
                    }
                }
            }
            if (result != null) {
                // got one, return it
                if (poolLifeCycleListener != null) {
                    poolLifeCycleListener.connectionAcquired(result.getId());
                    elapsedWaitTime = System.currentTimeMillis() - startTime;
                    poolLifeCycleListener.connectionRequestServed(elapsedWaitTime);
                    if (_logger.isLoggable( Level.FINE) ) {
                        _logger.log(Level.FINE, "Resource Pool: elapsed time " +
                                "(ms) to get connection for [" + spec + "] : " +
                                elapsedWaitTime);
                    }
                }
                //got one - seems we are not doing validation or matching
                //return it
                break;
            } else {
                // did not get a resource.
                if (maxWaitTime > 0) {
                    elapsedWaitTime = System.currentTimeMillis() - startTime;
                    if (elapsedWaitTime < maxWaitTime) {
                        // time has not expired, determine remaining wait time.
                        remainingWaitTime = maxWaitTime - elapsedWaitTime;
                    } else {
                        if (!blocked) {
                            // wait time has expired
                            if (poolLifeCycleListener != null) {
                                poolLifeCycleListener.connectionTimedOut();
                            }
                            String msg = localStrings.getStringWithDefault(
                                    "poolmgr.no.available.resource",
                                    "No available resource. Wait-time expired.");
                            ** throw new PoolingException(msg); **3
                        }
                    }
                }
@caim2015
Copy link

caim2015 commented Jul 2, 2021

Hi,can anyone give some thoughts on this issue?

@zhanglx1987
Copy link

zhanglx1987 commented Aug 4, 2021

The jdbc connection pool still has space to create new connections, but the request failed because of this problem(failed to get connection).
Do somebody have any ideas? Suggestions and comments would be greatly appreciated.

@github-actions
Copy link

github-actions bot commented Aug 5, 2022

This issue has been marked as inactive and old and will be closed in 7 days if there is no further activity. If you want the issue to remain open please add a comment

@github-actions github-actions bot added the Stale label Aug 5, 2022
@dmatej dmatej removed the Stale label Aug 5, 2022
@github-actions
Copy link

github-actions bot commented Aug 6, 2023

This issue has been marked as inactive and old and will be closed in 7 days if there is no further activity. If you want the issue to remain open please add a comment

@github-actions github-actions bot added the Stale label Aug 6, 2023
@dmatej dmatej removed the Stale label Aug 6, 2023
escay added a commit to escay/glassfish that referenced this issue Mar 26, 2024
…ty fix by using

a lock between two methods to ensure they are not called at the same
time. Also shows in a unit test the problem of not calling
notifyWaitingThreads when connection problems occurred.
@escay
Copy link
Contributor

escay commented Mar 26, 2024

I have written a unit test to try and reproduce this behavior.

About reported item 1 and 2
In my test the exact failure does not happen in **1 versus **2, but the lines right before it:

  • Method resizePoolAndGetNewResource is called
  • Value numOfConnsToCreate is calculated, the numOfConnsToCreate is for example 1.
  • another thread returned a resource to the pool via method "freeUnenlistedResource"
  • numOfConnsToCreate is 1 and a new resource is added to the pool, while the pool already has a resource free to be used again.
  • Pool size is increased with 1 and becomes larger than the maximum allowed pool size: error, could lead to real database pool problems where not enough resources are available. Pool increase is often at least 2 bigger due to the default "pool-resize-quantity" value.

See attached test case:
com.sun.enterprise.resource.pool.ConnectionPoolTest.basicConnectionPoolMultiThreadedTest()
line 272 randomly fails if ConnectionPool code is not altered. I can reproduce this about 70% of the time with a maximum pool size of 5, a bit less when max pool size is 30, and quite often when maximum pool size is set to 1.
Code:
assertTrue(resources.size() <= maxConnectionPoolSize, "failed, size=" + resources.size());

Line 272 does not fail when I make sure that freeUnenlistedResource and getResourceFromPool are never called at the same moment. See hack in line 1106 of the attached altered ConnectionPool class and the console output. Undo the lock and the console output shows the random behaviour of adding resources to the pool, while threads should wait for one to become available in the pool because pool is maxed out.
Code:

    // TODO: Replace this dirty fix, this fix shows the relation between 'freeResource' calls and 'getResourceFromPool' calls.
    // If these two are called at the same time, the connection pool can grow larger as the maximum connection pool size.
    private Object freeResourceObject = new Object();

Note: I also tested with the old /connectors-runtime/src/main/java/com/sun/enterprise/resource/pool/datastructure/ListDataStructure.java instead of the RWLockDataStructure. Both show the same behaviour.

Example log of succesful test with max pool size 5 and resize with 1:

resizePoolAndGetNewResource dataStructureSize: 0, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 1, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 2, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 3, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 4, numOfConnsToCreate: 1, reason: 2
Adding resource: 2
Adding resource: 1
Adding resource: 5
Adding resource: 3
Adding resource: 4
Reusing resource: 5
Reusing resource: 1
Reusing resource: 2
Reusing resource: 3
Reusing resource: 4
Reusing resource: 5
Reusing resource: 1
etc.

Failing output (without a fix / using current ConnectionPool code):

resizePoolAndGetNewResource dataStructureSize: 0, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 1, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 2, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 3, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 4, numOfConnsToCreate: 1, reason: 2
Adding resource: 2
Adding resource: 1
Adding resource: 5
Adding resource: 3
Adding resource: 4
Reusing resource: 5
Reusing resource: 1
Reusing resource: 2
Reusing resource: 3
Reusing resource: 4
Reusing resource: 5
Adding resource: 6
Reusing resource: 5
resizePoolAndGetNewResource dataStructureSize: 4, numOfConnsToCreate: 1, reason: 2
Error -> pool should not have been resized. Resource 6 should not have been added.

About reported item 3
Another thing I found in case of failing connections is: connections are freed when calling resourceErrorOccurred, BUT the waiting threads are not informed that a resource is removed from the pool. So these connections time out.
See attached test case:
com.sun.enterprise.resource.pool.ConnectionPoolTest.resourceErrorOccurredMultiThreadedTest()
which now has to manually call connectionPool.notifyWaitingThreads();
This does lead to item **3 as reported in the original issue.

Branch with unit tests:
master...escay:glassfish:issue_23483_unittest

escay added a commit to escay/glassfish that referenced this issue Mar 27, 2024
…better max pool

size logic.

Add lock, add notifyWaitingThreads call for better max pool size logic.
No longer getResources while the pool is being resized. And inform
waiting threads when resources are marked as failed.
escay added a commit to escay/glassfish that referenced this issue Mar 28, 2024
…better max pool

size logic. Fix incorrect javadoc.
arjantijms added a commit that referenced this issue Mar 30, 2024
Fixes #23483 Add lock, add notifyWaitingThreads call for better max pool size logic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants