"Tim Bradshaw" <tfb+google@tfeb.org> wrote in message news:38e1494d-81be-4e88-bb2a-5201875522a6@j20g2000hsi.googlegroups.com...
On Dec 21, 7:15 pm, Chimpanzee <govi...@gmail.com> wrote:
James,
If this is by design, is there a workaround for it? There is a log rotation
process happening on /home that keeps the last 14 days of logs (which are
roughly 25M in size) and removes the remaining
Typically things which rotate logs need to cooperate with the
application which is writing the logs: either it needs to notice that
the logs have been rotated and close & reopen the logs, or there needs
to be some way of the log rotator notifying the application. This is
often done by sending it a signal, and log rotators can generally do
this & other things. See logadm(1M).
If you don't do this you run two risks: one is this mysterious space
usage from deleted files which are still open, but a more serious risk
is that the logs getting written to these deleted files are
inaccessible, which means you may lose critical information.
As a workaround, the OP can try copy-and-truncate log rotation (cp logfile logfile.whatever ; > logfile), but should be aware that there is an obvious risk of losing log entries written between the copy and the truncation. IIRC, logadm can do this too.
However, there may be something else going on. The OP states there is already a log rotation process. If this was designed in association with the programmers, then either there is a bug that means it does not work properly, or there is some other process holding the file open (monitoring perhaps, or one of the programmers tail-ing the log).
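The copy-and-truncate workaround mentioned above could be sketched roughly as follows. This is only an illustration: the temporary directory, the app.log name, and the date suffix are made up for the example and are not taken from the OP's setup.

```shell
#!/bin/sh
# Sketch of copy-and-truncate rotation; all paths here are hypothetical.
set -eu

workdir=$(mktemp -d)
logfile="$workdir/app.log"
rotated="$logfile.$(date +%Y%m%d)"

# Stand-in for log data written by the application.
printf 'line 1\nline 2\n' > "$logfile"

# Copy the current contents aside...
cp "$logfile" "$rotated"
# ...then truncate the original in place. The inode is unchanged, so a
# writer that still has the file open keeps writing to the same file.
# Anything it wrote between the cp and this line is lost.
: > "$logfile"

ls -l "$workdir"
```

This only behaves well if the application opened its log in append mode. A writer that keeps its own file offset will continue at the old offset after the truncation and leave a sparse file behind.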
James Carlson 3 January 2008 21:35:59
Chimpanzee <govindo@gmail.com> writes:
If this is by design, is there a workaround for it? There is a log rotation
process happening on /home that keeps the last 14 days of logs (which are
roughly 25M in size) and removes the remaining
The "work-around" is proper design in the application. If log files are being written and rotated, it's pretty common for the application to have some sort of close-and-reopen mechanism. It's usually hooked to SIGHUP, but I've seen other types. If it has no such mechanism, then you'll probably need to talk to the application author.
The way such a mechanism is used is that the log file is moved aside, a new one is created (if necessary; syslogd needs this), and then "pkill -HUP" is run on the application. At this point, the rotated log file is closed and can be compressed, archived, deleted, or whatever you need to do.
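The move-aside, recreate, and signal sequence described above can be tried out with a small self-contained sketch. The "application" here is a hypothetical stand-in: a shell loop that reopens its log by name on SIGHUP. A real application has to provide its own reopen mechanism, and the sketch uses kill with a known PID where pkill -HUP would match by process name.

```shell
#!/bin/sh
# Sketch of rotate-and-signal; the writer below is a hypothetical stand-in
# for an application that reopens its log file when it receives SIGHUP.
set -eu

workdir=$(mktemp -d)
log="$workdir/app.log"

writer() {
    trap 'exec >>"$log"' HUP      # on SIGHUP: close and reopen the log by name
    exec >>"$log"
    while true; do
        echo "entry at $(date +%s)"
        sleep 1
    done
}

writer &
app_pid=$!
sleep 2                           # let it write a few entries

mv "$log" "$log.0"                # 1. move the current log aside
: > "$log"                        # 2. create a fresh, empty log
kill -HUP "$app_pid"              # 3. tell the application to reopen it

sleep 3                           # give the writer time to switch files
kill "$app_pid"
wait "$app_pid" 2>/dev/null || true

# From here on app.log.0 is no longer open and can be compressed or deleted.
ls -l "$workdir"
```

Until the signal is handled, the writer keeps appending to the renamed file, so no entries are lost during the rotation.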
Is restarting the application the only option?
That's a crude but effective option ... you'll need to read the application documentation or get in touch with the application author to figure out better answers.
-- 
James Carlson, Solaris Networking              <james.d.carlson@sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
Thanks for your responses; however, I wasn't sure whether it was caused by log rotation. I therefore searched the internet and found a pattern similar to my problem.
a. Finding an Unlinked Open File
=================================
A pesky variant of a file that is filling a file system is an unlinked file to which some process is still writing. When a process opens a file and then unlinks it, the file's resources remain in use by the process, but the file's directory entries are removed. Hence, even when you know the directory where the file once resided, you can't detect it with ls.
This can be an administrative problem when the unlinked file is large, and the process that holds it open continues to write to it. Only when the process closes the file will its resources, particularly disk space, be released.
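The behaviour described above is easy to reproduce. The following is a minimal sketch in which the shell itself plays the role of the process holding the file open; the directory and file name are hypothetical.

```shell
#!/bin/sh
# Sketch: a file that is unlinked while a process still has it open.
set -eu

workdir=$(mktemp -d)
victim="$workdir/app.log"     # hypothetical log file

exec 3> "$victim"             # this shell plays the application: open on fd 3
echo "before unlink" >&3

rm "$victim"                  # what a naive log rotator does

echo "after unlink" >&3       # the write still succeeds
listing=$(ls -A "$workdir")   # but the directory now looks empty
echo "directory listing: '$listing'"

exec 3>&-                     # only closing the descriptor frees the blocks
```

Between the rm and the final exec, the file's blocks still count against the file system, which is why df and du can disagree in this situation.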
Lsof can help you find unlinked files on local disks. It has an option, +L, that will list the link counts of open files. That helps because an unlinked file on a local disk has a zero link count. Note: this is NOT true for NFS files, accessed from a remote server.
You could use the option to list all files and look for a zero link count in the NLINK column -- e.g.,
$ lsof +L
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NLINK  NODE NAME
...
less    25366  abe  txt   VREG    6,0    40960     1 76319 /usr/...
...
less    25366  abe   3r   VREG    6,0    17360     0 98768 / (/dev/sd0a)
Better yet, you can specify an upper bound to the +L option, and lsof will select only files that have a link count less than the upper bound. For example:
$ lsof +L1
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NLINK  NODE NAME
less    25366  abe   3r   VREG    6,0    17360     0 98768 / (/dev/sd0a)
You can use lsof's -a (AND) option to narrow the link count search to a particular file system. For example, to look for zero link counts on the /home file system, use:
$ lsof -a +L1 /home
I ran the command and a large number of files were reported.
James Carlson 7 January 2008 16:29:04
Chimpanzee <govindo@gmail.com> writes:
Thanks for your responses; however, I wasn't sure whether it was caused by
log rotation. I therefore searched the internet and found a
pattern similar to my problem.
a. Finding an Unlinked Open File
[...]
Any thoughts?
That's exactly the sort of symptom you described before, and it's the scenario that several of us have suggested as the probable cause of the problem you've seen.