File Name Case Sensitivity

Posted by Max at 14-01-2015

One of the annoyances when writing software for multiple platforms is the difference in case sensitivity between file systems. If you’re not familiar with the term, “case sensitive” simply means that the capitalization of a file name matters. On Windows the capitalization is not important, but on Linux it is, and on a Mac it depends on how the volume is formatted.

When we ported Natural Selection 2 to Linux we ran into the case sensitivity issue. Knowing this would be a problem, we were reasonably careful during development to keep the case of file names consistent, but there were a few instances where assets used the wrong case to refer to another file. We decided the best option was to make our Linux version behave the same as Windows, since there were already a number of popular mods and we didn’t want to break them by enforcing case sensitivity on any platform.

We made the Linux version case insensitive with a fallback: if we tried to open a file and it failed, we scanned the directory for a matching file name, ignoring case. We added warnings when this happened so that the offending references could be fixed, but everything worked as expected. Since Future Perfect is a new project without the same legacy, we decided a better route was to make our file operations on Windows case sensitive.

To make Windows case sensitive, we check if the capitalization in the supplied file name matches the actual capitalization on disk before we open the file. The main challenge is that Windows doesn’t provide an elegant way of recovering the proper capitalization of a file name. The most common technique I’ve heard suggested is to use the DOS-compatible “short” file name: first the test file name is converted to the short name, and then that is converted back to a long name:

bool GetIsCaseCorrect(const WCHAR* fileName)
{
    bool result = false;
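    // Recover the on-disk case by converting to the short path and back to the long path.
    // Both calls return 0 on failure and the required size if the buffer is too small.
    WCHAR shortFileName[_MAX_PATH];
    DWORD shortLength = GetShortPathName(fileName, shortFileName, _MAX_PATH);
    if (shortLength != 0 && shortLength < _MAX_PATH)
    {
        WCHAR correctFileName[_MAX_PATH];
        DWORD longLength = GetLongPathName(shortFileName, correctFileName, _MAX_PATH);
        if (longLength != 0 && longLength < _MAX_PATH)
        {
            // The case is correct only if the round-tripped name matches exactly.
            result = wcscmp(fileName, correctFileName) == 0;
        }
    }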
    return result;
}

I implemented this method and found that it’s not very fast. Each call (which happens every time a file is opened) took 0.21 ms on my computer, which has an SSD. I found that just calling GetLongPathName on the original file name also gave me the correct result, but it only reduced the time to 0.13 ms. I’m guessing there are some special cases that necessitate the conversion to the short file name first, but I’m not sure of the exact reason.

This set me on a path to find a faster way and I tried a few different methods.
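The post doesn’t say how the per-call timings were taken. One simple way to get comparable numbers for each method is to average over many calls, as in this sketch. AverageMillisecondsPerCall is a hypothetical helper, and the figures it reports will depend heavily on whether the file system metadata is already cached.

```cpp
#include <chrono>

// Hypothetical helper: average wall-clock milliseconds per call of `operation`.
template <typename Operation>
double AverageMillisecondsPerCall(Operation operation, int iterations)
{
    using Clock = std::chrono::steady_clock;

    const Clock::time_point begin = Clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        operation();
    }
    // duration<double, std::milli> converts the elapsed ticks to fractional milliseconds.
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - begin;
    return elapsed.count() / iterations;
}
```

It could be used along the lines of AverageMillisecondsPerCall([&] { GetIsCaseCorrect(fileName); }, 10000), ideally cycling through a set of real asset paths rather than repeating a single one.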

Windows Vista introduced the GetFinalPathNameByHandle function which you can use after opening a file to recover the file name. Using this took 0.18 ms per call which isn’t any better than the long file name version.

The FindFirstFileEx function has a parameter for case sensitivity, but the flag is only respected if the setting is already enabled for the drive. It does however return the proper case for the file name part of the path and is much faster than the other methods I tried (0.022 ms). But, without being able to check the directory part of the path it’s an incomplete solution.

I considered calling FindFirstFileEx on all of the sub-directories in the path, but before I got around to that I found a better method.

Windows contains a number of “undocumented” functions which, despite the name, are well known and widely used in applications. NtQueryInformationFile is one of those functions; it can be used to get various pieces of information about a file on disk, and one of those pieces of information is the file name! The one problem is that it doesn’t give you the drive letter (e.g. “C:”). There are additional function calls you can make to recover that, but they add extra cost and aren’t really necessary for my purposes. The NtQueryInformationFile implementation took 0.028 ms per call and looks like this:

bool GetIsCaseCorrect(const WCHAR* fileName)
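{
    // Mirrors the layout of the native FILE_NAME_INFORMATION structure,
    // with a fixed buffer large enough for a _MAX_PATH name plus terminator.
    struct FILE_NAME_INFORMATION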
    {
        ULONG FileNameLength;
        WCHAR FileName[_MAX_PATH + 1];
    };
    typedef NTSTATUS (NTAPI *_NtQueryInformationFile)(HANDLE,
        PIO_STATUS_BLOCK, PVOID, ULONG, FILE_INFORMATION_CLASS);
    
    HANDLE hFile = CreateFile(fileName, GENERIC_READ,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        nullptr,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        nullptr);

    bool result = false;
    if (hFile != INVALID_HANDLE_VALUE)
    {
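        static _NtQueryInformationFile NtQueryInformationFile = nullptr;
        if (NtQueryInformationFile == nullptr)
        {
            // Look up the entry point once and cache it for later calls.
            HMODULE hDll = LoadLibraryW(L"ntdll.dll");
            if (hDll != nullptr)
            {
                NtQueryInformationFile = (_NtQueryInformationFile)GetProcAddress(hDll,
                    "NtQueryInformationFile");
            }
        }

        IO_STATUS_BLOCK iosb;
        FILE_NAME_INFORMATION nameInformation;

        // Limit the reported buffer size so there is always room for the
        // terminator written below; longer names fail with a non-zero status.
        const ULONG queryLength = sizeof(ULONG) + _MAX_PATH * sizeof(WCHAR);

        // Treat a missing entry point like a failed query so the result stays false.
        NTSTATUS status = -1;
        if (NtQueryInformationFile != nullptr)
        {
            status = NtQueryInformationFile(hFile, &iosb, &nameInformation,
                queryLength, (FILE_INFORMATION_CLASS)9); // FileNameInformation
        }
        CloseHandle(hFile);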

        if (status == 0)
        {
            nameInformation.FileName[nameInformation.FileNameLength / sizeof(WCHAR)] = 0;
            // Convert the slashes to our standardized format.
            for (int i = 0; nameInformation.FileName[i] != 0; ++i)
            {
                if (nameInformation.FileName[i] == L'\\')
                {
                    nameInformation.FileName[i] = L'/';
                }
            }

            // Skip the drive letter (e.g. "C:") on our test file name.
            const WCHAR* start = wcschr(fileName, L':');
            if (start == nullptr)
            {
                start = fileName;
            }
            else
            {
                // Skip the ':'
                ++start;
            }
            result = wcscmp(start, nameInformation.FileName) == 0;
        }
    }

    return result;

}

There are a number of different failure points in this function. The biggest one is opening the file: it’s quite possible that the file is already open in another process (or even our own) with a sharing mode that blocks a second open, which causes CreateFile to fail. To make the operation more robust, I changed the function to fall back to the original short file name method if there are any errors.
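One way to structure that fallback is sketched below. It assumes each method is reworked to report “could not decide” separately from “case is wrong”, which the functions as posted don’t do, since they return false in both situations. CaseCheck and GetIsCaseCorrectWithFallback are hypothetical names.

```cpp
#include <functional>
#include <optional>
#include <string>

// A check returns true or false when it could decide, or std::nullopt when it
// hit an error unrelated to case (for example, the file could not be opened).
using CaseCheck = std::function<std::optional<bool>(const std::wstring&)>;

// Hypothetical wrapper: try the fast method and only pay for the slow one on error.
bool GetIsCaseCorrectWithFallback(const std::wstring& fileName,
    const CaseCheck& fastCheck, const CaseCheck& slowCheck)
{
    if (const std::optional<bool> fast = fastCheck(fileName))
    {
        return *fast;
    }
    // The fast path could not decide, so defer to the slower method.
    // If that fails as well, report the case as incorrect.
    return slowCheck(fileName).value_or(false);
}
```

With this arrangement the slower short file name round trip only runs in the rare cases where the fast query fails, so the typical cost stays close to that of the fast method.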