Rohan’s Blog – C# .NET Programming

File System Watcher And Large File Volumes

Rohan Warang | February 23, 2009

FileSystemWatcher listens for file system change notifications and raises events when a directory, or a file in a directory, changes. It is a very useful tool because it notifies you at the moment a file or directory is created, changed or deleted, without having to poll the directory. It can watch subdirectories too.

If that wasn’t enough, it also provides filters. For instance, if you only want notifications about text files, you can set the filter to “*.txt” and events will be raised only for text files.
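As a rough illustration of the filter and subdirectory options described above, here is a minimal sketch. The class and method names (WatcherSetup, CreateTextFileWatcher) are made up for this example; the properties used are the standard FileSystemWatcher ones.

```csharp
using System;
using System.IO;

static class WatcherSetup
{
    // Hypothetical helper: builds a watcher for *.txt files under `path`,
    // including subdirectories. The directory must already exist.
    public static FileSystemWatcher CreateTextFileWatcher(string path)
    {
        var watcher = new FileSystemWatcher(path, "*.txt");
        watcher.IncludeSubdirectories = true;

        // Only report file name changes (create/rename/delete) and writes
        watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;

        watcher.Created += (sender, e) => Console.WriteLine("Created: " + e.FullPath);
        watcher.Changed += (sender, e) => Console.WriteLine("Changed: " + e.FullPath);

        // The caller sets EnableRaisingEvents = true to start receiving events
        return watcher;
    }
}
```

Nothing is raised until EnableRaisingEvents is set to true, so the caller decides when watching actually starts.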

The Problem

However, there is a downside. The file system watcher is not entirely reliable when working with large volumes of files. The reason is that a fixed-size buffer is allocated to each file system watcher, and it stores details such as the file location for each file that raises an event. When a large number of files raise events in a short time, this buffer fills up and notifications are lost.

The default size of the buffer is 8 KB (the InternalBufferSize property; the minimum is 4 KB and the maximum is 64 KB), so the obvious immediate solution is to increase it. But again there is a downside: the buffer is located in non-paged memory, which cannot be swapped out to disk, so the memory stays allocated whether it is used or not. It is therefore not advisable to allocate a large buffer.
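For reference, enlarging the buffer and detecting an overflow might look like the sketch below. The 32 KB value and the names WatcherBuffer and CreateWithLargerBuffer are arbitrary choices for illustration; InternalBufferSize, the Error event and InternalBufferOverflowException are part of the framework.

```csharp
using System;
using System.IO;

static class WatcherBuffer
{
    // Hypothetical helper: creates a watcher with a larger internal buffer
    // and an Error handler that reports buffer overflows.
    public static FileSystemWatcher CreateWithLargerBuffer(string path)
    {
        var watcher = new FileSystemWatcher(path);

        // 32 KB is an arbitrary example value; keep it as small as practical
        // because the buffer lives in non-paged memory
        watcher.InternalBufferSize = 32 * 1024;

        watcher.Error += OnError;
        return watcher;
    }

    static void OnError(object sender, ErrorEventArgs e)
    {
        Exception ex = e.GetException();
        if (ex is InternalBufferOverflowException)
            Console.WriteLine("Buffer overflowed, some events were lost.");
        else
            Console.WriteLine("Watcher error: " + ex.Message);
    }
}
```

Handling the Error event at least tells you that notifications were dropped, so the folder can be rescanned if needed.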

The Solution

If you Google for the solution to this problem you will find a large number of articles. In this article however we take an entirely different approach.

Now the buffer discussed above holds the information for each event until the event is handled. The idea is to handle the event as quickly as possible so that the buffer is cleared.

But that’s easier said than done. To do some complex processing on the file would take some time, some CPU cycles, and the buffer will have to hold the data till the processing is over.

So we do not write the processing logic in the handler. Instead, the handler simply adds the file to a work queue, and this queue is processed by an entirely independent thread. The only work done by the handler is the queuing, after which control leaves the handler and the buffer is released.


Declaring the Watcher

FileSystemWatcher is a component available for WinForms applications and Windows services, so a new watcher can be dragged from the toolbox and configured in the designer. Here I have defined the watcher in code for a console application and declared a handler for its file Created event.

using System;
using System.IO;

class Program
{
    static string folder = @"C:\TestFolder";
    static FileProcessor processor;
    static FileSystemWatcher watcher;

    static void Main(string[] args)
    {
        processor = new FileProcessor();
        InitializeWatcher();
        CreateTestFiles();

        Console.WriteLine(String.Format("Watching {0} ...", folder));
        Console.ReadKey();
    }

    static void InitializeWatcher()
    {
        watcher = new FileSystemWatcher();
        watcher.Path = folder;
        watcher.Filter = "*.txt";

        watcher.Created += new FileSystemEventHandler(watcher_Created);
        // Enable events last, once the watcher is fully configured
        watcher.EnableRaisingEvents = true;
    }

    // Generates 10000 empty .txt files in the watched folder
    static void CreateTestFiles()
    {
        int count = 0;
        while (count < 10000)
        {
            count++;
            using (FileStream fs = new FileStream(String.Format(
                @"{0}\{1}.txt", folder, count), FileMode.Create)) { }
        }
    }

    static void watcher_Created(object sender, FileSystemEventArgs e)
    {
        // Original Operation
        // File.Encrypt(e.FullPath);

        // Only queue the file here; the slow work happens on another thread
        processor.QueueInput(e.FullPath);
    }
}

The event handler is supposed to perform some operation on the file. I chose a simple file encryption operation using File.Encrypt(), which takes a considerable amount of time and resources. If we performed this operation in the handler itself, it would work fine for a few files, but as the volume increases we would start encountering all sorts of unwanted problems (try it out). Instead, the file is added to the FileProcessor queue.

The CreateTestFiles() method simply generates 10000 empty files. The watcher's buffer does not store the content of the file, so the test files need not contain any data.

The File Processor Class

FileProcessor is the class which maintains a queue of files to be processed and controls a worker thread which processes each file in the queue.

We need to ensure that only one instance of this class is created, because there should be a single queue of files to be processed. In this example we create an object in the Main method and use that to queue the files. If an instance of this class will be accessed from multiple locations, a singleton implementation of the class is advisable.
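One possible shape for such a singleton is sketched below. It is not part of the sample download: the class name SharedFileQueue is hypothetical, only the queue itself is shown, and Lazy<T> assumes .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical singleton wrapper around a single shared work queue
sealed class SharedFileQueue
{
    // Lazy<T> creates the instance once, on first use, in a thread-safe way
    private static readonly Lazy<SharedFileQueue> instance =
        new Lazy<SharedFileQueue>(() => new SharedFileQueue());

    public static SharedFileQueue Instance
    {
        get { return instance.Value; }
    }

    private readonly Queue<string> queue = new Queue<string>();
    private readonly object sync = new object();

    // Private constructor prevents other code from creating a second queue
    private SharedFileQueue() { }

    public void Enqueue(string filepath)
    {
        lock (sync) { queue.Enqueue(filepath); }
    }

    public int Count
    {
        get { lock (sync) { return queue.Count; } }
    }
}
```

Any handler can then call SharedFileQueue.Instance.Enqueue(path) and be sure it is adding to the same queue as every other caller.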

Multiple threads or a thread pool can be used if that improves your performance. However, that is outside the scope of this article, so to keep the example simple a single thread is defined for processing the files.
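For readers who do want several workers, a minimal sketch follows. It is an alternative to the article's FileProcessor, not the same code: the class name MultiWorkerProcessor and the processFile callback are invented for the example, and BlockingCollection<T> assumes .NET 4 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical multi-threaded variant: several workers drain one shared queue
class MultiWorkerProcessor : IDisposable
{
    private readonly BlockingCollection<string> queue = new BlockingCollection<string>();
    private readonly Thread[] workers;
    private readonly Action<string> processFile;

    public MultiWorkerProcessor(int workerCount, Action<string> processFile)
    {
        this.processFile = processFile;
        workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = new Thread(Work) { IsBackground = true };
            workers[i].Start();
        }
    }

    // Called from the watcher's event handler; returns almost immediately
    public void QueueInput(string filepath)
    {
        queue.Add(filepath);
    }

    private void Work()
    {
        // Blocks while the queue is empty; the loop ends once CompleteAdding
        // has been called and the remaining items have been taken
        foreach (string filepath in queue.GetConsumingEnumerable())
            processFile(filepath);
    }

    // Stops accepting files and waits for the workers to drain the queue
    public void Dispose()
    {
        queue.CompleteAdding();
        foreach (Thread worker in workers)
            worker.Join();
        queue.Dispose();
    }
}
```

Note that an exception thrown by processFile would end that worker thread, so real code would probably wrap the call in a try/catch. Whether more workers help at all depends on the operation: disk-bound work may not get faster with more threads.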

When the QueueInput() method is called for the first time, it queues the file, initializes the thread to run the Work() method and starts the thread. In the Work() method the first file is retrieved from the queue and processed. Once all files have been processed, the thread goes into wait mode.

On each subsequent call to the QueueInput() method, the file is queued and, if the worker thread is in wait mode, it is signalled to start processing the new file.

In Windows Explorer, the name of an encrypted file is shown in green (using default folder settings). This helps in verifying that all 10000 files have been encrypted.

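using System.Collections.Generic;
using System.IO;
using System.Threading;

class FileProcessor
{
    private readonly Queue<string> workQueue;
    private readonly object queueLock = new object();
    private readonly EventWaitHandle waitHandle;
    private Thread workerThread;

    public FileProcessor()
    {
        workQueue = new Queue<string>();
        // Start unsignalled: the worker blocks until a file is queued
        waitHandle = new AutoResetEvent(false);
    }

    public void QueueInput(string filepath)
    {
        // Queue<T> is not thread-safe and the watcher raises its events on
        // thread pool threads, so all access to the queue is locked
        lock (queueLock)
        {
            workQueue.Enqueue(filepath);

            // Initialize and start thread when first file is added
            if (workerThread == null)
            {
                workerThread = new Thread(new ThreadStart(Work));
                // Background thread: does not keep the process alive on exit
                workerThread.IsBackground = true;
                workerThread.Start();
            }
        }

        // Wake the worker if it is waiting. Signalling every time avoids the
        // race in checking ThreadState, where the worker could go into wait
        // just after a file was queued
        waitHandle.Set();
    }

    private void Work()
    {
        while (true)
        {
            string filepath = RetrieveFile();

            if (filepath != null)
                ProcessFile(filepath);
            else
                // If no files left to process then wait
                waitHandle.WaitOne();
        }
    }

    private string RetrieveFile()
    {
        lock (queueLock)
        {
            if (workQueue.Count > 0)
                return workQueue.Dequeue();
            else
                return null;
        }
    }

    private void ProcessFile(string filepath)
    {
        // Some processing done on the file
        File.Encrypt(filepath);
    }
}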
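Instead of checking the green file names by eye, the result can also be verified in code. The helper below is only a sketch (the names EncryptionCheck and CountUnencrypted are made up); it counts the .txt files in a folder that do not carry the Encrypted attribute, which is meaningful on NTFS volumes where File.Encrypt() is supported.

```csharp
using System.IO;
using System.Linq;

static class EncryptionCheck
{
    // Hypothetical helper: returns how many *.txt files in `folder`
    // do not have the Encrypted attribute set
    public static int CountUnencrypted(string folder)
    {
        return Directory.GetFiles(folder, "*.txt")
            .Count(f => (File.GetAttributes(f) & FileAttributes.Encrypted) == 0);
    }
}
```

A result of 0 after the queue has been drained suggests that every test file was processed.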

Disclaimer

This is not a foolproof method, but it does considerably increase the number of files that can be processed. I tested this method with 10000 files, which is a huge number for the default buffer size, and it works fine. The result depends entirely on the hardware, and 10000 is by no means the limit.

Download Sample
FileSystemWatcherSample.zip

8 Responses to “File System Watcher And Large File Volumes”

  2. Ahmadreza says:
    May 21, 2009 at 12:59 pm

    Thanks,
    It’s a great idea to postpone time consuming processes to be done later in another thread. So FileSystemWatcher does not face any problem because of buffer limitation.

  3. Kalyan says:
    July 23, 2009 at 3:47 pm

    Thanks for the post.
    Very helpful and usefull

    cheers

  4. Jos says:
    August 17, 2009 at 6:52 pm

    Thanx!!!!

    This is very usefull for me. I search a lot of site and was looking for Qeueu/Deque FIFO etc….

    Then i found this post and it was the answer i was looking for.

    I used wanted to to some filehandling on files dropped in a “HotFolder” and now i can ;-)

  5. dv says:
    September 29, 2009 at 8:35 pm

    Thanks for the great explanation and sample code too!
    I would like to build on this and remove duplicate file names, as I want to copy each file name from source to a dest folder. ( I don’t care about deletions).

    What is the best way to remove duplicate file names or events? Examine and modify the queue or do it before in the queue? Would love to see a code example.
    thanks.

  6. vwemil says:
    October 22, 2009 at 9:33 pm

    Great post , saved me a lot of time. I ran into some problems which I solved by modifying your code.
    For really long running processes and files quickly dropped into folder there is a possibility to go in wait when some file are still in the queue

    if (filepath != null)
        ProcessFile(filepath);
    else if (workQueue.Count == 0)
        // If no files left to process then wait
        waitHandle.WaitOne();
    Thanks again

  7. dv says:
    November 17, 2009 at 2:32 am

    Hi there, I used this code and it works great.
    I modified it so instead of passing a string it passes a custom class to the queue.

    I used this code in a windows service. How do I / Should I unload the worker queue / threads in the stop event and what should I include in the start event then? thanks.

  8. dv says:
    November 19, 2009 at 10:31 pm

    This is the code I had to use to get the worker thread to stop automatically when the main service quit/stopped.

    I had to make an adjustment to the QueueInput()
    I am not 100% sure about the or statement but it seems to work…

    public void QueueInput(HotSynchUnit.RcdFSWFile rcd)
    {
        workQueue.Enqueue(rcd);

        // Initialize and start thread when first file is added
        if (workerThread == null)
        {
            workerThread = new Thread(new ThreadStart(Work));
            // IF IsBackground = True THEN the thread will auto terminate
            // when the process does but a manual call is better.
            workerThread.IsBackground = true;
            workerThread.Start();
        }
        // If thread is waiting then start it
        else if (workerThread.ThreadState == (ThreadState.WaitSleepJoin | ThreadState.Background))
        {
            waitHandle.Set();
        }
    }
