The timeout feature is added to "freeze ioctl" to solve a deadlock
when the freezer accesses a frozen filesystem. And new ioctl
to reset the timeout period is added to extend the timeout period.
For example, the freezer resets the timeout period to 10 seconds every 5
seconds. In this approach, even if the freezer causes a deadlock by
accessing the frozen filesystem, it will be solved by the timeout
in 10 seconds and the freezer will be able to recognize that
at the next reset of timeout period.
o Freeze the filesystem
int ioctl(int fd, int FIFREEZE, long *timeout_sec)
fd: The file descriptor of the mountpoint
FIFREEZE: request code for the freeze
timeout_sec: the timeout period in seconds
If it's 0 or 1, the timeout isn't set.
This special case of "1" is implemented to keep
the compatibility with XFS applications.
Return value: 0 if the operation succeeds. Otherwise, -1
o Reset the timeout period
int ioctl(int fd, int FIFREEZE_RESET_TIMEOUT, long *timeout_sec)
fd:file descriptor of mountpoint
FIFREEZE_RESET_TIMEOUT: request code for reset of timeout period
timeout_sec: new timeout period in seconds
Return value: 0 if the operation succeeds. Otherwise, -1
Error number: If the filesystem has already been unfrozen,
errno is set to EINVAL.
Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: Masayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
---
drivers/md/dm.c | 2 -
fs/block_dev.c | 2 +
fs/buffer.c | 44 ++++++++++++++++++++++++---
fs/ioctl.c | 71 ++++++++++++++++++++++++++++++++++++++++++--
fs/super.c | 37 ++++++++++++++++++++++
fs/xfs/xfs_fsops.c | 2 -
include/linux/buffer_head.h | 4 +-
include/linux/fs.h | 8 ++++
8 files changed, 159 insertions(+), 11 deletions(-)
diff -uprN -X linux-2.6.27-rc5.org/Documentation/dontdiff ...And as with all previous posting I still fundamentally disagree about the need of this functionality. We don't need a timeout for freezing. --
I agree with Christoph here, I think that the timeout is unneeded. Regards, Ric --
Hi, I think that your concern is that the freezer cannot recognize the occurrence of a timeout and it continues the backup process and the backup data is corrupted finally. If the freezer can recognize it by the unfreeze ioctl's errono, will your concern be solved? If so, I will implement it. Cheers, Takashi --
I think that is certainly part a big part of my concern. Also note that the timeout seems to be quite low relative to say the standard timeout for a SCSI device (30 seconds worst case). In general, I am quite supportive of the patch series and think that this is a great addition. Thanks! Ric --
Hi Ric and Christoph, Thank you for your comments. Christoph, do you have any comments about this solution? If it's OK, I will change the freeze patch so that the unfreeze ioctl sets ETIMEDOUT to errno when the timeout occurs. Cheers, Takashi --
On Fri, 26 Sep 2008 17:52:35 +0900, Takashi Sato said: That would also address my concerns about merging patches 8 and 10 of the other patch series (because patch 10 wouldn't be needed then)...
What timeout should happen? the freeze ioctl must not return until the filesystem is a clean state and all writes are blocked. --
The suggestion was that *UN*freeze would return ETIMEDOUT if the filesystem had already unfrozen itself, I think. That way you know that the snapshot you just took is worthless, at least. I'm still not really sold on the timeout, but I think the above was the intent. -Eric --
But why would the filesystem every unfreeze itself? That defeats the whole point of freezing it. --
I agree. Was just trying to clarify the above point. But there have been what, 12 submissions now, with the unfreeze timeout in place so it's a persistent theme ;) Perhaps a demonstration of just how easy (or not easy) it is to deadlock a filesystem by freezing the root might be in order, at least. And even if it is relatively easy, I still maintain that it is the administrator's role to not inflict damage on the machine being administered. There are a lot of potentially dangerous tools at root's disposal; why this particular one needs a nanny I'm still not quite sure. -Eric --
Eric Sandeen wrote: > Christoph Hellwig wrote: >> But why would the filesystem every unfreeze itself? That defeats the >> whole point of freezing it. > > I agree. Was just trying to clarify the above point. > > But there have been what, 12 submissions now, with the unfreeze timeout > in place so it's a persistent theme ;) > > Perhaps a demonstration of just how easy (or not easy) it is to deadlock > a filesystem by freezing the root might be in order, at least. > > And even if it is relatively easy, I still maintain that it is the > administrator's role to not inflict damage on the machine being > administered. There are a lot of potentially dangerous tools at root's > disposal; why this particular one needs a nanny I'm still not quite sure. Since this patch hit fsdev, there have been an equal number of supporters and opponents of the timeout. I'm not opposed to the timeout on the API, but I don't think it is needed if we have a system configurable timeout (default is no timeout) that can be changed by an admin. My experience is that a timeout is not needed protect against a stupid admin or against software bugs. The justification for a timeout as far as I am concerned is so the admin can log in and reset hung hardware. If we think there is no chance of forcing the external device that went to sleep to respond so the system can continue to be used, then I don't think a timeout has any valid use. My timeout desire is based on some past SAN behavior and I'm OK if people argue those devices should just be fixed. But we argued the same thing and were ignored because bad device behavior did not stop people from buying them. jim --
Can you docuument what administrator must not do for freezing to be safe? What if so much dirty data accumulates on frozen filesystem that there's not enough memory for the unfreeze tool? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html --
Hi, I think we need the timeout for the case someone dirties so much data with mmap, hence the freeze process is swapped out and cannot unfreeze. Cheers, Takashi --
That is not supposed to happen. That's why write blocks early on a frozen filesystem (the shared mmap write path is currently missing the check, but that's a rather small patch) --
