File System Watcher And Large File Volumes
FileSystemWatcher listens to file system change notifications and raises events when a directory, or a file in a directory, changes. It is a very useful tool because it reports a file or directory being created, changed or deleted as soon as the change happens, without having to poll the directory. It can watch subdirectories too.
If that wasn’t enough, it also provides filters. For instance, if you only want notifications about text files, you can set the filter to “*.txt”, and events will then be raised only for text files.
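A minimal sketch of such a watcher follows. The folder is whatever existing path you pass in, and the console output stands in for real work; the class and method names are illustrative, not taken from the article's sample.

```csharp
using System;
using System.IO;

// Illustrative sketch: watch a folder and its subfolders for .txt files only.
public static class TxtWatcherDemo
{
    public static FileSystemWatcher CreateTxtWatcher(string folder)
    {
        var watcher = new FileSystemWatcher(folder)
        {
            Filter = "*.txt",             // events are raised only for .txt files
            IncludeSubdirectories = true, // also watch subdirectories
            NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
        };

        // No polling: these handlers run when the OS reports a change.
        watcher.Created += (sender, e) => Console.WriteLine($"Created: {e.FullPath}");
        watcher.Changed += (sender, e) => Console.WriteLine($"Changed: {e.FullPath}");
        watcher.Deleted += (sender, e) => Console.WriteLine($"Deleted: {e.FullPath}");

        watcher.EnableRaisingEvents = true; // start listening
        return watcher;
    }
}
```

Keep a reference to the returned watcher and dispose it when you are done; it raises events until EnableRaisingEvents is set back to false.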
The Problem
However, there is a downside: FileSystemWatcher is not entirely reliable when working with large volumes of files. Each watcher has a fixed-size internal buffer that stores the details of every change (such as the file path) until the event is handled. When a large number of files raise events at once, this buffer fills up and the notifications that do not fit are lost.
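When the buffer does overflow, FileSystemWatcher raises its Error event with an InternalBufferOverflowException, which at least lets you detect that notifications were dropped. A small sketch, assuming a hypothetical `log` callback supplied by the caller:

```csharp
using System;
using System.IO;

// Sketch: report when the watcher's internal buffer has overflowed.
// The events that did not fit in the buffer are lost.
public static class OverflowDetection
{
    public static bool IsOverflow(ErrorEventArgs e) =>
        e.GetException() is InternalBufferOverflowException;

    // 'log' is a hypothetical callback supplied by the caller.
    public static void AttachOverflowHandler(FileSystemWatcher watcher, Action<string> log)
    {
        watcher.Error += (sender, e) =>
        {
            if (IsOverflow(e))
                log("Buffer overflow: some change notifications were dropped.");
            else
                log("Watcher error: " + e.GetException().Message);
        };
    }
}
```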
The default size of the buffer (the InternalBufferSize property) is 8 KB, and the smallest size allowed is 4 KB, so the obvious first fix would be to increase it. But that has a downside too: the buffer is allocated from non-paged memory, which cannot be swapped out to disk, so the memory is reserved whether it is used or not. It is therefore not advisable to allocate large buffers.
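For reference, the buffer size is set through InternalBufferSize, in bytes. The documentation describes 4 KB as the minimum and 64 KB as the maximum, and recommends multiples of 4 KB. The helper below is a hypothetical sketch that only permits that range:

```csharp
using System;
using System.IO;

// Sketch: create a watcher with a moderately larger buffer.
// InternalBufferSize is in bytes; the buffer lives in non-paged memory,
// so every watcher reserves it whether or not it is busy.
public static class BufferConfig
{
    public static FileSystemWatcher CreateWatcher(string folder, int bufferKb)
    {
        // Keep within the documented 4 KB to 64 KB range.
        if (bufferKb < 4 || bufferKb > 64)
            throw new ArgumentOutOfRangeException(nameof(bufferKb), "Use a value from 4 to 64 (KB).");

        return new FileSystemWatcher(folder)
        {
            InternalBufferSize = bufferKb * 1024 // e.g. 16 KB instead of the default 8 KB
        };
    }
}
```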
The Solution
If you search for a solution to this problem, you will find a large number of articles. This article, however, takes an entirely different approach.
The buffer described above holds the information for each event until that event is handled. The idea is to handle each event as quickly as possible so that its space in the buffer is freed.
But that is easier said than done. Complex processing on a file takes time and CPU cycles, and the buffer has to hold the event data until the processing is over.
So instead of writing the processing logic in the handler, the handler simply adds the file to a work queue, which is processed by an entirely independent thread. The only work the handler does is queuing, so control leaves the handler almost immediately and the buffer space is released.
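One way to express this pattern is with BlockingCollection<T> from System.Collections.Concurrent, which takes care of the waiting and signaling. Note that this swaps in a different mechanism from the queue-plus-wait-handle FileProcessor the article describes; the class name and the `process` delegate are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Sketch: the watcher's handler only enqueues; one background thread does the work.
public sealed class QueuedWatcherWork : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Thread _worker;
    private readonly Action<string> _process; // hypothetical: the slow per-file work

    public QueuedWatcherWork(Action<string> process)
    {
        _process = process;
        _worker = new Thread(Work) { IsBackground = true };
        _worker.Start();
    }

    // Matches FileSystemEventHandler, so it can be attached to watcher.Created.
    // It returns immediately, which frees the watcher's buffer quickly.
    public void OnCreated(object sender, FileSystemEventArgs e) => _queue.Add(e.FullPath);

    public void Enqueue(string path) => _queue.Add(path);

    private void Work()
    {
        // Blocks while the queue is empty; ends after CompleteAdding() once
        // every queued path has been taken.
        foreach (var path in _queue.GetConsumingEnumerable())
        {
            try { _process(path); }
            catch (Exception ex) { Console.Error.WriteLine($"{path}: {ex.Message}"); }
        }
    }

    // Stop accepting work, let the worker drain the queue, then wait for it.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}
```

Hooking it up would look like `watcher.Created += work.OnCreated;`. Set `EnableRaisingEvents = false` on the watcher before disposing this object; otherwise a late event would call Add after CompleteAdding, which throws InvalidOperationException.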
Declaring the Watcher
FileSystemWatcher is available as a component in the Toolbox for Windows Forms and Windows service projects, so a new watcher can be dragged from the Toolbox and configured in the designer. Here I have created a FileSystemWatcher in a console application and attached a handler to its Created event.
The event handler is supposed to perform some operation on the file. I chose a simple file encryption using File.Encrypt(), which takes a considerable amount of time and resources. If we performed this operation in the handler itself, it would work fine for a few files, but as the volume increases we would start running into all sorts of problems (try it out). Instead, the handler adds the file to a FileProcessor queue.
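The per-file work itself might look like the sketch below, called from the worker thread rather than from the event handler. It assumes the caller only needs to know whether encryption succeeded. File.Encrypt is only supported on Windows with NTFS and throws PlatformNotSupportedException or NotSupportedException otherwise. The `EncryptStep` and `TryEncrypt` names are hypothetical.

```csharp
using System;
using System.IO;

// Sketch of the slow per-file step, run on the worker thread.
public static class EncryptStep
{
    // Returns null on success, otherwise the reason encryption failed.
    public static string TryEncrypt(string path)
    {
        try
        {
            File.Encrypt(path); // Windows + NTFS only
            return null;
        }
        catch (Exception ex) when (ex is IOException
                                   || ex is UnauthorizedAccessException
                                   || ex is NotSupportedException) // includes PlatformNotSupportedException
        {
            return ex.Message;
        }
    }
}
```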
The CreateTestFiles() method simply generates 10000 empty files. The watcher's buffer does not store the contents of the files, so the test files do not need to contain any data.
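The article's CreateTestFiles() is only described, not shown, so here is a sketch of what such a generator could look like. The `folder` and `count` parameters and the file naming pattern are assumptions.

```csharp
using System.IO;

// Sketch of a test-file generator: many empty files, since the watcher
// never buffers file contents.
public static class TestFiles
{
    public static void CreateTestFiles(string folder, int count = 10000)
    {
        Directory.CreateDirectory(folder);
        for (int i = 0; i < count; i++)
        {
            // File.Create returns an open FileStream; dispose it so the file is not left locked.
            File.Create(Path.Combine(folder, $"test{i:D5}.txt")).Dispose();
        }
    }
}
```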
The File Processor Class
FileProcessor is the class which maintains a queue of files to be processed and controls a worker thread which processes each file in the queue.
We need to ensure that only one instance of this class is created, because there should be a single queue of files to be processed. In this example we create one object in the Main method and use it to queue the files. Where the class will be accessed from multiple places, a singleton implementation is advisable.
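If a singleton is needed, Lazy<T> is one idiomatic way to get thread-safe, on-demand creation. This is a sketch: the ConcurrentQueue field and the PendingCount property are placeholders for whatever the real FileProcessor holds.

```csharp
using System;
using System.Collections.Concurrent;

// Sketch: a single shared FileProcessor, so every caller uses one queue.
public sealed class FileProcessor
{
    // Lazy<T> creates the instance on first access, safely across threads.
    private static readonly Lazy<FileProcessor> _instance =
        new Lazy<FileProcessor>(() => new FileProcessor());

    public static FileProcessor Instance => _instance.Value;

    private readonly ConcurrentQueue<string> _workQueue = new ConcurrentQueue<string>();

    private FileProcessor() { } // no public constructor: use Instance

    public void QueueFile(string path) => _workQueue.Enqueue(path);

    public int PendingCount => _workQueue.Count;
}
```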
Multiple threads, or a thread pool, can be used if that improves performance. However, that is outside the scope of this article, so to keep the example simple a single thread processes the files.
When the QueueFile() method is called for the first time, it queues the file, creates a thread that runs the Work() method, and starts it. The Work() method takes the next file from the queue and processes it; once all files have been processed, the thread goes into wait mode.
On each subsequent call, QueueFile() queues the file and checks the thread state: if the thread is active, control simply exits the method; if the thread is waiting, it is signaled to resume and process the new file.
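The steps above can be sketched as follows. This is not the article's downloadable code: the constructor's `processFile` delegate is a hypothetical hook for the real work (such as File.Encrypt), the queue is guarded by a lock, and the wait handle is an AutoResetEvent that QueueFile() signals on every call instead of inspecting ThreadState. Because an AutoResetEvent stays signaled until a waiting thread consumes it, a file queued just as the worker is about to wait is not left stranded.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Sketch: one queue, one worker thread, a wait handle to park the worker when idle.
public sealed class FileProcessor
{
    private readonly Queue<string> _workQueue = new Queue<string>();
    private readonly object _sync = new object();
    private readonly AutoResetEvent _waitHandle = new AutoResetEvent(false);
    private readonly Action<string> _processFile; // hypothetical hook for the real work
    private Thread _workerThread;

    public FileProcessor(Action<string> processFile) => _processFile = processFile;

    public void QueueFile(string path)
    {
        lock (_sync)
        {
            _workQueue.Enqueue(path);

            // Initialize and start the worker when the first file is queued.
            if (_workerThread == null)
            {
                _workerThread = new Thread(Work) { IsBackground = true };
                _workerThread.Start();
            }
        }

        // Wake the worker if it is waiting; if it is busy, the signal is kept
        // and its next WaitOne() returns immediately.
        _waitHandle.Set();
    }

    private void Work()
    {
        while (true)
        {
            string path = null;
            lock (_sync)
            {
                if (_workQueue.Count > 0)
                    path = _workQueue.Dequeue();
            }

            if (path == null)
            {
                _waitHandle.WaitOne(); // queue empty: sleep until QueueFile() signals
                continue;
            }

            try
            {
                _processFile(path); // the slow part, e.g. File.Encrypt(path)
            }
            catch (Exception ex)
            {
                // An unhandled exception would end this thread and, with it, the process.
                Console.Error.WriteLine($"Failed to process {path}: {ex.Message}");
            }
        }
    }
}
```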
In Windows Explorer, encrypted files are shown in green (with default folder settings). This is a quick way to verify that all 10000 files were encrypted.
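Programmatically, the same check can be done with File.GetAttributes and the FileAttributes.Encrypted flag. A small sketch; the class and method names are hypothetical:

```csharp
using System.IO;
using System.Linq;

// Sketch: count files in a folder whose attributes include FileAttributes.Encrypted.
public static class EncryptionCheck
{
    private static bool IsEncrypted(string file) =>
        (File.GetAttributes(file) & FileAttributes.Encrypted) != 0;

    public static int CountEncrypted(string folder) =>
        Directory.EnumerateFiles(folder).Count(IsEncrypted);

    public static bool AllEncrypted(string folder) =>
        Directory.EnumerateFiles(folder).All(IsEncrypted);
}
```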
Disclaimer
This is not a foolproof method, but it considerably increases the number of files that can be processed. I tested it with 10000 files, which is a huge number for a buffer of just a few kilobytes, and it worked fine. The result depends entirely on the hardware, and 10000 is by no means the limit.
Download Sample
FileSystemWatcherSample.zip
Thanks,
It’s a great idea to hand time-consuming processing off to another thread, so the FileSystemWatcher does not run into the buffer limitation.
Thanks for the post.
Very helpful and useful
cheers
Thanx!!!!
This is very useful for me. I searched a lot of sites looking for Queue/Dequeue, FIFO, etc.
Then I found this post and it was the answer I was looking for.
I wanted to do some file handling on files dropped into a “HotFolder”, and now I can.
Thanks for the great explanation and sample code too!
I would like to build on this and remove duplicate file names, as I want to copy each file from a source folder to a destination folder (I don’t care about deletions).
What is the best way to remove duplicate file names or events? Examine and modify the queue, or filter them before they reach the queue? Would love to see a code example.
thanks.
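One possible approach to the question above, offered as a sketch rather than the author's answer: keep a HashSet of paths that are already waiting, and skip an event whose path is already queued. The DedupQueue type is hypothetical, and the case-insensitive comparer assumes Windows-style paths.

```csharp
using System;
using System.Collections.Generic;

// Sketch: a thread-safe FIFO queue that ignores a path while it is already waiting.
public sealed class DedupQueue
{
    private readonly Queue<string> _queue = new Queue<string>();
    // Paths currently waiting; OrdinalIgnoreCase assumes case-insensitive (Windows) paths.
    private readonly HashSet<string> _pending = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly object _sync = new object();

    // Returns false, and queues nothing, if the path is already waiting.
    public bool Enqueue(string path)
    {
        lock (_sync)
        {
            if (!_pending.Add(path))
                return false;
            _queue.Enqueue(path);
            return true;
        }
    }

    // Once a path is dequeued, a later event may queue it again.
    public bool TryDequeue(out string path)
    {
        lock (_sync)
        {
            if (_queue.Count == 0)
            {
                path = null;
                return false;
            }
            path = _queue.Dequeue();
            _pending.Remove(path);
            return true;
        }
    }

    public int Count
    {
        get { lock (_sync) return _queue.Count; }
    }
}
```

The event handler would call `Enqueue(e.FullPath)` and the worker `TryDequeue`. Because a path leaves the set when it is dequeued, a file that changes again after processing is queued again.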
Great post, saved me a lot of time. I ran into some problems, which I solved by modifying your code.
For really long-running processes with files dropped quickly into the folder, the worker can go into wait mode while some files are still in the queue. My change to Work():
if (filepath != null)
    ProcessFile(filepath);
else if (workQueue.Count == 0)
    // If no files are left to process, wait until QueueFile() signals the handle
    waitHandle.WaitOne();
Thanks again
Hi there, I used this code and it works great.
I modified it so instead of passing a string it passes a custom class to the queue.
I used this code in a Windows service. How do I (or should I) unload the worker queue and threads in the Stop event, and what should I include in the Start event then? Thanks.
This is the code I had to use to get the worker thread to stop automatically when the main service quit/stopped.
I had to make an adjustment to QueueInput():
I am not 100% sure about the or statement but it seems to work…
public void QueueInput(HotSynchUnit.RcdFSWFile rcd)
{
    workQueue.Enqueue(rcd);

    // Initialize and start the thread when the first file is added
    if (workerThread == null)
    {
        workerThread = new Thread(new ThreadStart(Work));
        // A background thread terminates automatically when the process does,
        // but stopping it explicitly is cleaner.
        workerThread.IsBackground = true;
        workerThread.Start();
    }
    // ThreadState is a flags enum: test the WaitSleepJoin bit rather than comparing
    // against one exact combination such as (ThreadState.WaitSleepJoin | ThreadState.Background).
    else if ((workerThread.ThreadState & ThreadState.WaitSleepJoin) != 0)
    {
        waitHandle.Set(); // If the thread is waiting, wake it up
    }
}
Tried using this with a large file (~50 MB).
It did not work!
When execution reached the method that encrypts the file, it threw an exception saying the file was still being written.
How can I handle this error?
Hi chuck,
I think you misunderstood the term “large volume”. What I am emphasizing in this post is how to handle many files arriving in bulk, not a single file that is large in size.
Anyway, to explain the problem you are facing: the event is raised as soon as the file is created in the file system. Because your file is large, it takes time to transfer, so the encrypt method is invoked before the file has been completely transferred. Try opening the file stream only once the transfer is complete, by checking whether the file is still locked by another process.
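The “check whether it is still locked” suggestion could be sketched like this: try to open the file with FileShare.None and retry until that succeeds or a timeout expires. The method name, timeout and retry delay are assumptions, and the check relies on Windows file-sharing rules; other platforms may not report a lock held by another process in the same way.

```csharp
using System;
using System.IO;
using System.Threading;

// Sketch: wait until a file can be opened exclusively, or give up after a timeout.
public static class FileReady
{
    public static bool WaitUntilUnlocked(string path, TimeSpan timeout, TimeSpan retryDelay)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            try
            {
                // FileShare.None fails with IOException while another process
                // (e.g. a copy still in progress) has the file open.
                using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
                    return true;
            }
            catch (IOException) when (DateTime.UtcNow < deadline)
            {
                Thread.Sleep(retryDelay); // still locked (or not there yet): try again
            }
            catch (IOException)
            {
                return false; // timed out
            }
        }
    }
}
```

The worker would call it just before File.Encrypt(path) and skip or re-queue the file if it returns false. A missing file is treated the same as a locked one, since FileNotFoundException is also an IOException.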
Hi Rohan,
I have a folder with about 2,000 files in it. These files range from about 50 KB to 150 KB, so they aren’t very big.
I move all the files into the folder being watched. Even with these small files I get the “file is locked” IOException in VS. Am I doing something wrong?
ah! now I see! thanks!
What do you think is the best way to handle a folder that already has files in it, which therefore don’t trigger the FileSystemWatcher?
For example, I have a folder with 10 files and then start the application watching that folder; the 10 files are there, but they were created before the application started.
Large files…
Well, for large files I used timers. I did not use your code; instead I used pyinotify on Ubuntu. For each file put into the queue, I started a timer (10 files, 10 timers), and every time an operation was done on the same file (copying a large file triggers many file operations on that file) I reset its timer. Once a timer expires (meaning there has been no operation on the file for at least 10 seconds, the timer value), the file is added to another queue, which is processed.
I guess you can understand what I am saying.
Any replies are appreciated.
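For readers on Windows, the timer idea described above translates to C# roughly as follows. This is a sketch, not the commenter's pyinotify code: each Touch() call (from Created or Changed events) replaces the path's one-shot System.Threading.Timer, and only when a path has been quiet for the whole period is it passed to the `onSettled` callback, which could then queue it for processing. The class and member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Sketch: report a path only after it has been quiet for a whole period.
public sealed class QuietPeriodDebouncer : IDisposable
{
    private sealed class Pending
    {
        public Pending(string path) => Path = path;
        public string Path { get; }
        public Timer Timer { get; set; }
    }

    private readonly Dictionary<string, Pending> _pending = new Dictionary<string, Pending>();
    private readonly object _sync = new object();
    private readonly TimeSpan _quiet;
    private readonly Action<string> _onSettled;

    public QuietPeriodDebouncer(TimeSpan quietPeriod, Action<string> onSettled)
    {
        _quiet = quietPeriod;
        _onSettled = onSettled;
    }

    // Call for every event on the path; each call restarts its countdown.
    public void Touch(string path)
    {
        lock (_sync)
        {
            if (_pending.TryGetValue(path, out var previous))
                previous.Timer.Dispose(); // cancel the old countdown

            var entry = new Pending(path);
            // Fire() takes the same lock, so it cannot see the entry before Timer is set.
            entry.Timer = new Timer(Fire, entry, _quiet, Timeout.InfiniteTimeSpan);
            _pending[path] = entry;
        }
    }

    private void Fire(object state)
    {
        var entry = (Pending)state;
        lock (_sync)
        {
            // A later Touch() replaced this entry, so this callback is stale.
            if (!_pending.TryGetValue(entry.Path, out var current) || !ReferenceEquals(current, entry))
                return;
            _pending.Remove(entry.Path);
            entry.Timer.Dispose();
        }
        _onSettled(entry.Path); // called outside the lock
    }

    public void Dispose()
    {
        lock (_sync)
        {
            foreach (var entry in _pending.Values)
                entry.Timer.Dispose();
            _pending.Clear();
        }
    }
}
```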
Hi,
I’m looking for the best way to stop the app when I run this code as a service. Right now, I stop the watcher from putting more files into the queue, then wait until all the files have been processed. Do I need to kill the worker thread once the queue is empty?
I would like to thank you for the idea – it helped me a lot in my project!
A very cool article that helped me understand FileSystemWatcher much better. I have already implemented it in a similar way. My service works fine, but sometimes I get an “object reference not found” error. Once I get this error, the service no longer works properly. After restarting the service, it works fine again. Please suggest!
Thanks in Advance.
Mahendra