LATEST TOPICS

Sometimes things don’t work as expected in Oracle

In today’s post, I am going to discuss an issue that I had faced recently. One of my application team sent me an e-mail stating that they are not able to connect to the database and are receiving following error intermittently.

tns_err

Application team provided the information that, they are using the following connection string.

##----
##---- DB connection string used by App team 
##----
(DESCRIPTION =
	(ADDRESS = (PROTOCOL = TCP)(HOST =rac01.oraclebuffer.com)(PORT = 1521))
	(ADDRESS = (PROTOCOL = TCP)(HOST =rac02.oraclebuffer.com)(PORT = 1521))
		(CONNECT_DATA =
			(SERVER = DEDICATED)
			(SERVICE_NAME = prod_app.oraclebuffer.com)
		)
)

As a first step, I immediately checked the status of the service and could see it was up running fine on both the cluster instances.

##----
##---- Service status from clusterware
##----
oracle@rac01:/home/oracle: srvctl status service -d orpproda
Service prod_app is running on instance(s) orpproda1, orpproda2

I could also see, the service is registered with the listener and in READY status on both the cluster nodes.

##----
##---- Service registration status from listener (node 1)
##----
orpproda1@rac01:/home/oracle: lsnrctl status

LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 10-DEC-2015 03:24:13

Copyright (c) 1991, 2008, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.1.0.7.0 - Production
Start Date                16-OCT-2015 22:10:53
Uptime                    54 days 6 hr. 13 min. 20 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /app/oracle/product/11.1.0/db/network/admin/listener.ora
Listener Log File         /oraadmin/diag/tnslsnr/rac01/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.56)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.55)(PORT=1521)))
Services Summary...
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 2 handler(s) for this service...
  Instance "orpproda2", status READY, has 1 handler(s) for this service...
The command completed successfully

##----
##---- Service registration status from listener (node 2)
##----
oracle@rac02:~> lsnrctl status

LSNRCTL for Linux: Version 11.1.0.7.0 - Production on 10-DEC-2015 05:37:01

Copyright (c) 1991, 2008, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.1.0.7.0 - Production
Start Date                03-NOV-2015 12:53:45
Uptime                    36 days 16 hr. 43 min. 15 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /app/oracle/product/11.1.0/db/network/admin/listener.ora
Listener Log File         /oraadmin/diag/tnslsnr/rac02/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.58)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.57)(PORT=1521)))
Services Summary...
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 1 handler(s) for this service...
  Instance "orpproda2", status READY, has 2 handler(s) for this service...
The command completed successfully

I was not seeing any issue with the service status. As a next step, I tried to look into the listener log file and could see (from first node) the connections are indeed failing with the TNS errors.

##----
##---- Connection errors from listener log
##----
09-DEC-2015 10:50:16 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=prod_app.oraclebuffer.com)(CID=(PROGRAM=C:\oracle\product\11.2.0\client_1\sqldeveloper\sqldeveloper\bin\sqldeveloperW.exe)(HOST=DEV01-PTL-D)(USER=DEV01))(INSTANCE_NAME=orpproda1)) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.201.35.106)(PORT=54124)) * establish * prod_app.oraclebuffer.com * 12520
TNS-12520: TNS:listener could not find available handler for requested type of server

09-DEC-2015 11:27:03 * (CONNECT_DATA=(SID=orpproda1)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.171.10.56)(PORT=31280)) * establish * orpproda1 * 12519
TNS-12519: TNS:no appropriate service handler found

At that point, I had decided to check the service state from listener and following is what I had found from there

##----
##---- Service state from listener (node 1)
##----
orpproda1@rac01:/home/oracle: lsnrctl services |grep -A 4 prod_app
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:915 refused:0 state:blocked ##---> preventing incoming connections
         REMOTE SERVER
		 
##----
##---- Service state from listener (node 2)
##----		 
oracle@rac02:~> lsnrctl services |grep -A 4 prod_app
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:59 refused:0 state:ready
         REMOTE SERVER
 

As we can see from the above output, the service is showing in blocked state for first instance (orpproda1). This means even though the service is up and running and registered with listener, listener will block any incoming connection for this service as it’s in blocked state.

Now, the question comes, why the service is showing in blocked state? Well, the most possible reason could be due to database resource limitations like the number of processes are exhausted and the database is not in a state to accept any new connection. When this type of resource limitations arise, listener blocks the service to prevent any incoming connection to go the underlying database.

There are few MOS articles available which explains these behaviour too. MOS Doc ID 240710.1 is one of those articles which talks about this behaviour.

Considering this fact, I had immediately checked the database resource utilization and could find the status as follows

---//
---// resource limit per instance
---//
SQL> select inst_id,name,value from gv$parameter where name='processes';

   INST_ID NAME                 VALUE
---------- -------------------- ----------
         1 processes            1000
         2 processes            1000

---//
---// resource utilization per instance
---//
SQL>  select inst_id,count(*) from gv$session group by inst_id;

   INST_ID   COUNT(*)
---------- ----------
         1         77
         2         87

As we can see from the output, our database resource utilization is very low and we shouldn’t have received that TNS error. I was not able to determine the root cause and due to which was not able to fix the issue.

At this point I was sure, the problem in hand is not related to service status nor related to the database resource utilization as the service is running and registered with listener. Further, we have adequate resource available to accept any new database connection.

I was suspecting, the listener may not be aware of the current state of the service (database resource utilization) and that is why it is blocking any connections coming for that service. I decided to check from listener log about the last time when the service status was updated by PMON.

When, I looked out for that information. I could see the service status was not updated for the first cluster instance since a long period.

##---
##--- service_update count from listener log (for instance 1)
##---
oracle@rac01:/oraadmin/diag/tnslsnr/rac01/listener/trace: grep -i service_update listener.log | grep orpproda1  | grep 09-DEC-2015 | wc -l
0  ##-->> (PMON is not performing service update)

##---
##--- service_update count from listener log (for instance 2)
##---
oracle@rac01:/oraadmin/diag/tnslsnr/rac01/listener/trace: grep -i service_update listener.log | grep orpproda2 | grep 09-DEC-2015 | wc -l
1696		 

At this point, I was almost sure that this is an issue related to dynamic registration of the service by PMON. Looks like PMON is not at all performing dynamic service registration (update) for the first instance (orpproda1).

It is possible that the database processes were exhausted at an earlier time (as indicated by the MOS note) which caused the listener to block incoming connections. However, once the resource utilization came back to normal, listener should have unblocked the service based on the current utilization report submitted by PMON; which did not happen in this case.

I just went ahead and performed that service registration on behalf of PMON (looks like he is busy with some other work, or might have forgot about the existence of listener)

---//
---// manual service registration
---//
SQL> alter system register;

System altered.

---//
---// service state from listener 
---//
SQL> !lsnrctl services |grep -A 4 prod_app
Service "prod_app.oraclebuffer.com" has 2 instance(s).
  Instance "orpproda1", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:915 refused:0 state:ready ---> changed from blocked to ready
         REMOTE SERVER

Our problem is resolved. The moment, I registered the service, the service state changed from blocked to ready. I could also observe, PMON has recovered its memory (may be was in coma) and doing periodic dynamic service updates.

##---
##--- service_update count from listener log (for instance 1)
##---
oracle@rac01:/oraadmin/diag/tnslsnr/rac01/listener/trace: grep -i service_update listener.log | grep orpproda1  | grep 10-DEC-2015 | wc -l
89 ##-->> (PMON is now performing periodic service update)

##---
##--- service_update count from listener log (for instance 2)
##---
oracle@rac01:/oraadmin/diag/tnslsnr/rac01/listener/trace: grep -i service_update listener.log | grep orpproda2 | grep 10-DEC-2015 | wc -l
120

Although this issue is resolved, the root cause of this issue was related to excessive connections being made to the database. This was identified by reviewing the listener logs. Following is a hourly report providing the count of connections established to the database

tns_err_conn

By looking at this report, it is evident that the current setting for the processes parameter (1000 processes) is not adequate to handle the incoming database traffic. It seems that on many occasions the number of connections reached to the maximum allowed limit and caused the listener to block the service. However, what happened here is; although the connections were released and process count came down far below the allowed threshold, PMON did not perform the service_update and resulted in a permanent service blockage.

To address the root cause we need to increase the value of processes parameter as suggested by MOS notes, which would make room for new incoming connections and prevent blocking of services by listener.

I was interested in knowing as to why PMON was not performing the periodic service update. However, unfortunately I did not have adequate trace/logs handy to dig in to that route any further.

Hope, you find this post useful in resolving similar connection issue !

2 Comments
  1. vinit
%d bloggers like this:
Visit Us On LinkedinVisit Us On TwitterVisit Us On Google PlusCheck Our Feed