Recently, I came across an issue on an 11gR2 RAC cluster where the Grid Infrastructure (GI) file system holding GRID_HOME was almost entirely consumed by a single file, crfclust.bdb.
This file alone was about 30 GB in size.
[oracle@my-lab01 ~]$ du -h /app/grid/126.96.36.199/crf/db/my-lab01/crfclust.bdb
[oracle@my-lab01 ~]$ df -h /app/grid
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 50G 44G 5.1G 89% /app/grid
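When a file system fills up like this, a quick way to find the culprit is to list the largest files beneath the mount point. The function below is a minimal sketch, assuming GNU find (for -printf) on Linux; the name largest_files is only illustrative and not part of any Oracle tooling.

```shell
#!/usr/bin/env bash
# largest_files DIR [COUNT]
# Print the COUNT largest regular files under DIR as "<bytes><TAB><path>",
# biggest first. -xdev keeps find on the same file system as DIR.
# Assumes GNU find, which provides -printf.
largest_files() {
    local dir="$1" count="${2:-10}"
    find "$dir" -xdev -type f -printf '%s\t%p\n' | sort -rn | head -n "$count"
}
```

For example, largest_files /app/grid 5 would list the five biggest files under the GI mount point used in this post.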
So, what is this file about?
crfclust.bdb is a Cluster Health Monitor (CHM) file. It stores the cluster and OS statistics collected by the Cluster Health Monitor service, ora.crf.
This file has a default size of 1 GB. However, it may grow beyond the default size if a high retention period is defined. If the file keeps growing beyond the default size even with a low retention period, the cause is most likely a bug, such as Bug 10165314.
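To catch this kind of growth before the file system fills up, the file size can be compared against the 1 GB default from a cron job or monitoring script. The following is a rough sketch, assuming GNU stat; the function name chm_file_oversized and the exit-code convention are my own and purely illustrative.

```shell
#!/usr/bin/env bash
# chm_file_oversized FILE [LIMIT_BYTES]
# Exit 0 when FILE is larger than LIMIT_BYTES, 1 when it is not,
# and 2 when the file cannot be read.
# The default limit (1 GiB) mirrors the default size mentioned above.
chm_file_oversized() {
    local file="$1" limit="${2:-1073741824}" size
    size=$(stat -c %s "$file") || return 2   # GNU stat: size in bytes
    [ "$size" -gt "$limit" ]
}
```

A check such as chm_file_oversized /path/to/crfclust.bdb && echo "CHM file exceeds 1 GB" could then raise an alert; the path here is a placeholder for the actual location under GRID_HOME.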
To fix the issue permanently and prevent the CHM file from growing beyond the default size, we can apply patch 10165314.
For an immediate workaround, we can delete the CHM file (crfclust.bdb) as follows:
Step 1. Stop the Cluster Health Monitor resource ora.crf as the root user
[root@my-lab01 ~]$ /app/grid/188.8.131.52/bin/crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'my-lab01'
CRS-2677: Stop of 'ora.crf' on 'my-lab01' succeeded
Step 2. Remove the huge CHM file
The file can only be removed by the root user as it is owned by root.
[root@my-lab01 ~]$ cd /app/grid/184.108.40.206/crf/db/my-lab01
[root@my-lab01 my-lab01]$ rm crfclust.bdb
Step 3. Start the Cluster Health Monitor resource ora.crf
[root@my-lab01 ~]$ /app/grid/220.127.116.11/bin/crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'my-lab01'
CRS-2676: Start of 'ora.crf' on 'my-lab01' succeeded
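The three steps above can be wrapped in a small script so they always run in the same order. This is only a sketch under a few assumptions: it is run as root, the Grid home and node name are passed as arguments, and the helper names run and reset_chm_repository as well as the DRY_RUN variable are hypothetical. The crsctl and rm commands are the same ones used in the steps.

```shell
#!/usr/bin/env bash
# run CMD [ARGS...]
# Execute the command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# reset_chm_repository GRID_HOME NODE
# Stop ora.crf, remove crfclust.bdb for NODE, then start ora.crf again.
reset_chm_repository() {
    local grid_home="$1" node="$2" rc=0
    local bdb="$grid_home/crf/db/$node/crfclust.bdb"

    # Step 1: stop the Cluster Health Monitor resource; give up if this fails.
    run "$grid_home/bin/crsctl" stop res ora.crf -init || return 1

    # Step 2: remove the oversized CHM file (remember a failure, keep going).
    run rm "$bdb" || rc=$?

    # Step 3: restart the resource even if the removal failed.
    run "$grid_home/bin/crsctl" start res ora.crf -init || rc=$?
    return "$rc"
}
```

Calling DRY_RUN=1 reset_chm_repository with your own Grid home and node name prints the three commands without executing them, which is a reasonable first check before running it for real on each affected node.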