• Win32 debug heap assertion after reading a cached filter file while

    From Rob Swindell@1:103/705 to GitLab issue in main/sbbs on Wednesday, March 11, 2026 20:24:58
    open https://gitlab.synchro.net/main/sbbs/-/issues/1099

    If a new (possibly just the first) TCP client connection arrives at a server (any of Synchronet's TCP servers) while that server is already terminating and waiting for its child threads to exit (i.e. at the top of its local `cleanup()` function), a Microsoft debug heap assertion may fire upon destruction of a filter/trashcan file object.
    ```
    free_dbg_nolock(void * const block, const int block_use) (debug_heap.cpp:996)
    _free_dbg(void * block, int block_use) (debug_heap.cpp:1030)
    free(void * block) (free.cpp:35)
    strListFreeStrings(char * * list) (str_list.c:634)
    strListFree(char * * * list) (str_list.c:640)
    filterFile::~filterFile() (/filterfile.hpp:42)
    [External Code] (Unknown Source:0)
    cleanup(int code) (/mailsrvr.cpp:6113)
    mail_server(void * arg) (/mailsrvr.cpp:6632)
    [External Code] (Unknown Source:0)
    [Frames below may be incorrect and/or missing, no symbols loaded for sbbsctrl.exe] (Unknown Source:0)
    ```
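    The ordering that `cleanup()` relies on (wait until the child-thread count drops back to 1, i.e. only the server thread remains, then destroy shared objects) can be modeled in portable C++. This is a simplified sketch under stated assumptions, not Synchronet code: `SharedFilter`, `DemoResult`, `run_shutdown_demo()` and the in-memory IP list are hypothetical, and the model assumes a late client is counted in `thread_count` *before* its thread starts (whether the real servers do exactly that is not shown here). It illustrates why destroying the filter after the wait loop should be safe as long as each client's last access to it happens before its decrement.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a filter/trashcan file object shared by client threads.
struct SharedFilter {
    std::mutex mutex;
    std::vector<std::string> entries{"10.0.0.1", "192.168.1.2"};
    bool listed(const std::string& ip) {
        std::lock_guard<std::mutex> lock(mutex);
        for (const auto& e : entries)
            if (e == ip)
                return true;
        return false;
    }
};

struct DemoResult {
    unsigned blocked;
    bool filter_destroyed;
};

// Models a late connection accepted after termination began, followed by cleanup().
DemoResult run_shutdown_demo(const std::string& client_ip) {
    std::atomic<unsigned> thread_count{1}; // 1 == the server thread itself
    std::atomic<unsigned> blocked{0};
    auto filter = std::make_unique<SharedFilter>();

    // Assumption of this model: the client is counted before its thread starts.
    ++thread_count;
    std::thread client([&] {
        if (filter->listed(client_ip))
            ++blocked;
        std::this_thread::sleep_for(std::chrono::milliseconds(50)); // simulated session
        --thread_count; // the client's last touch of shared state happens before this
    });

    // Mirrors the wait loop at the top of cleanup(): poll until only the server remains.
    while (thread_count.load() > 1)
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    filter.reset(); // destroy the shared object only after all clients are done

    client.join(); // demo only, so the function returns cleanly
    assert(filter == nullptr);
    return {blocked.load(), filter == nullptr};
}
```

    Calling `run_shutdown_demo("10.0.0.1")` should report one blocked client and a destroyed filter; the sequentially consistent atomics make the client's `listed()` call happen-before `filter.reset()`.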

    The larger the configured "semaphore check frequency" and the more frequent the incoming connections, the easier it was to reproduce the issue. I could reproduce it on any/all of Synchronet's TCP servers, though I tended to use the mail server for testing experimental changes.

    The issue has never been seen/reproduced in Win32-release builds or non-Windows builds.

    Here's a fruitful Gemini discussion about the issue: https://gemini.google.com/share/aac98154728e

    Deuce had Claude take a look at the issue and provided a couple of commits that did not resolve the issue:

    - commit 2fb010d6c3f20027e8b54d2a925e837b52314ea2
    - commit 61695ba1a683ef1a7a50223c5ca72d06df50ed9e

    Although the issue looks and smells like heap corruption, running sbbs.exe or sbbsctrl.exe under Microsoft's Application Verifier didn't find any heap corruption (with heap page "full" verification enabled), even when the assertion/exception occurred.

    I failed to get sbbs.exe or sbbsctrl.exe to run successfully with AddressSanitizer enabled (out-of-memory errors when trying to realloc a 60KB buffer to load main.ini for parsing).
    --- SBBSecho 3.37-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Wednesday, March 11, 2026 20:31:12
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8555

    Here's an example mail server log (breadcrumbs) from a reproduction case:
    ```
    3/9 07:39:31p Synchronet Mail Server Version 3.21d Debug
    3/9 07:39:31p Compiled master/4fe0bb198 Mar 09 2026 17:35 with MSC 1944
    3/9 07:39:31p Initializing on Mon Mar 9 19:39:31 2026 with options: 1c665d
    3/9 07:39:31p Loading configuration files from s:\sbbs\ctrl\
    3/9 07:39:32p MQTT connecting to broker 192.168.1.2:1883
    3/9 07:39:32p SMTP Transfer Agent listening on socket 0.0.0.0 port 25
    3/9 07:39:32p SMTP Transfer Agent listening on socket :: port 25
    3/9 07:39:32p SMTP Submission Agent listening on socket 0.0.0.0 port 587
    3/9 07:39:32p SMTP Submission Agent listening on socket :: port 587
    3/9 07:39:32p POP3 Server listening on socket 0.0.0.0 port 110
    3/9 07:39:32p Mail Server thread started
    3/9 07:39:33p Mail Server terminate
    3/9 07:39:34p 0000 Waiting for 1 child threads to terminate
    3/9 07:39:34p 1292 SMTP [158.94.208.215] Connection accepted on 71.95.196.34 port 25 from port 51260
    3/9 07:39:39p 1292 SMTP [158.94.208.215] Hostname: <no name>
    3/9 07:39:39p 1292 SMTP [158.94.208.215] !CLIENT BLOCKED in s:\sbbs\text\ip.can since Mar 9 01:46:33 2026 for 3 FAILED LOGIN ATTEMPTS in 59.2m using SMTP (1 total)
    3/9 07:39:39p 0000 Done waiting for child threads to terminate
    3/9 07:39:39p 1848 SMTP Transfer Agent closing socket 0.0.0.0 port 25
    3/9 07:39:39p 1224 SMTP Transfer Agent closing socket :: port 25
    3/9 07:39:39p 1388 SMTP Submission Agent closing socket 0.0.0.0 port 587
    3/9 07:39:39p 1752 SMTP Submission Agent closing socket :: port 587
    3/9 07:39:39p 1384 POP3 Server closing socket 0.0.0.0 port 110
    3/9 07:39:39p #### Mail Server thread terminated (1 connections served, 1 ip-filtered, 0 messages received, 2 concurrent clients)

    ```
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Thursday, March 12, 2026 12:40:45
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8566

    As mentioned in commit 9168bd69, if we I a call to `strListFree(&list);` in `filterFile::reset()`, the crash scenario is possible again.

    As an experiment, I added a call to `ip_can.filter()` directly in `cleanup()`, manually forcing the sequence of the reproduction case with another client thread/connection being involved, and that effectively "fixes" the issue: it *never* happens with that change. I was kind of hoping for the opposite effect.
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Thursday, March 12, 2026 12:41:27
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8568

    i.e.
    ```
    static void cleanup(int code)
    {
        int i;

        if (protected_uint32_value(thread_count) > 1) {
            lprintf(LOG_INFO, "0000 Waiting for %d child threads to terminate", protected_uint32_value(thread_count) - 1);
            while (protected_uint32_value(thread_count) > 1) {
                mswait(100);
            }
            lprintf(LOG_INFO, "0000 Done waiting for child threads to terminate");
        }
        ip_can.listed(""); // experiment: mimic a late client's filter lookup here
        free_cfg(&scfg);
        // ... (remainder of cleanup() not shown)
    ```
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Thursday, March 12, 2026 12:44:11
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8568

    i.e.
    ```
    static void cleanup(int code)
    {
        int i;

        if (protected_uint32_value(thread_count) > 1) {
            lprintf(LOG_INFO, "0000 Waiting for %d child threads to terminate", protected_uint32_value(thread_count) - 1);
            while (protected_uint32_value(thread_count) > 1) {
                mswait(100);
            }
            lprintf(LOG_INFO, "0000 Done waiting for child threads to terminate");
        }
        ip_can.listed(""); // experiment: mimic a late client's filter lookup here
        free_cfg(&scfg);
        // ... (remainder of cleanup() not shown)
    ```
    With this change, it never crashes in ip_can.reset(), though it can/does still in the *other* filter file `.reset()` calls.
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Thursday, March 12, 2026 12:55:40
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8566

    As mentioned in commit 9168bd69, if we I a call to `strListFree(&list);` in `filterFile::reset()`, the crash scenario is possible again.

    As an experiment, I added a call to `ip_can.filter()` directly in `cleanup()`, manually forcing the sequence of the reproduction case with another client thread/connection being involved, and that effectively "fixes" the issue: it ~~*never*~~ still happens with that change, though not as deterministically as I was hoping for.
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Thursday, March 12, 2026 12:56:13
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8568

    i.e.
    ```
    static void cleanup(int code)
    {
        int i;

        if (protected_uint32_value(thread_count) > 1) {
            lprintf(LOG_INFO, "0000 Waiting for %d child threads to terminate", protected_uint32_value(thread_count) - 1);
            while (protected_uint32_value(thread_count) > 1) {
                mswait(100);
            }
            lprintf(LOG_INFO, "0000 Done waiting for child threads to terminate");
        }
        ip_can.listed(""); // experiment: mimic a late client's filter lookup here
        free_cfg(&scfg);
        // ... (remainder of cleanup() not shown)
    ```
    With this change, it ~~never~~ still crashes in `ip_can.reset()` on occasion.
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Thursday, March 12, 2026 19:09:39
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8572

    Tried to create a small reproduction case for this issue, unsuccessfully.

    Created xpdev/strlisttest.c:
    ```
    #include <stdio.h>
    #include "str_list.h"
    #include "threadwrap.h"
    #include "genwrap.h"

    str_list_t list; // shared between thread() and main()

    static char* process_findstr_item(size_t index, char *str, void* cbdata)
    {
        SKIP_WHITESPACE(str);
        truncnl(str);
        return c_unescape_str(str);
    }

    // Worker: load one filter file into the global list
    void thread(void* arg) {
        const char* fname = arg;
        FILE *fp;
        printf("Reading %s\n", fname);
        if ((fp = fopen(fname, "r")) == NULL)
            return;

        list = strListReadFile(fp, NULL, 1000);
        strListModifyEach(list, process_findstr_item, /* cbdata: */ NULL);

        fclose(fp);
        int count;
        COUNT_LIST_ITEMS(list, count);
        printf("Read %d items\n", count);
    }

    int main(int argc, char ** argv) {

        for (int i = 1; i < argc; ++i) {
            _beginthread(thread, 0, argv[i]);
            SLEEP(2000); // give the worker time to finish before its list is freed
            printf("Freeing list\n");
            strListFree(&list);
            printf("Done freeing list\n");
        }

        return 0;
    }
    ```

    Built it for Win32-debug with this command-line:
    ```
    cl /Zi /fsanitize=address -DHAS_STDINT_H strlisttest.c str_list.c genwrap.c xpprintf.c
    ```

    And ran the result like this:
    ```
    strlisttest s:\sbbs\text\ip.can s:\sbbs\text\host.can s:\sbbs\text\ip.can s:\sbbs\text\host.can s:\sbbs\text\ip-silent.can
    ```

    No errors or exceptions of any kind reported.
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Thursday, March 12, 2026 19:20:24
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8573

    Claude's 3rd analysis of the issue/cause was:
    ```
    <Deuce> ## Context
    <Deuce> Rounds 1–2 (committed as 2fb010d6c3, 61695ba1a6) fixed post-`thread_down()`
    <Deuce> access to shared resources. The crash persists: MSVC debug heap assertion fires
    <Deuce> in `strListFreeStrings` when `cleanup()` deletes `ip_can` in mailsrvr.cpp.
    <Deuce> The user's consistent observation: **"it only crashes when a new connection
    <Deuce> thread is initialized after the terminate_server flag is set."**
    <Deuce> ### Analysis
    <Deuce> The thread_count synchronization in `cleanup()` is correct — it atomically
    <Deuce> waits for all child threads to finish via `std::atomic_load` (Windows) before
    <Deuce> deleting filter objects. The actual heap corruption source is likely a buffer
    <Deuce> overflow somewhere in the SMTP/POP3 client thread code. It is only **detected**
    <Deuce> during `cleanup()` because that's when `ip_can`'s strings are freed and the
    <Deuce> debug heap guard bytes are checked.
    <Deuce> **Why only with a late connection?** In normal operation, the corruption (if
    <Deuce> present) is never detected because `ip_can` is never freed until shutdown. With
    <Deuce> a late connection during shutdown, the thread allocates memory near `ip_can`'s
    <Deuce> strings on the heap, and whatever overflow occurs corrupts the guard bytes of
    <Deuce> those adjacent allocations.
    ```
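    The mechanism described above, where an overflow goes unnoticed until the damaged block is freed, can be illustrated with a toy guarded allocator. This is a hypothetical sketch of the general guard-byte technique, not the MSVC CRT's implementation: `guarded_alloc()`, `guarded_free()`, the size header and the 4-byte `0xFD` guard are assumptions chosen for illustration. Because the guard is only checked when the block is released, the failure is reported by whoever frees the block, not by the code that overran it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical layout: [size_t size][payload][kGuardLen guard bytes]
constexpr unsigned char kGuardByte = 0xFD;
constexpr std::size_t kGuardLen = 4;

void* guarded_alloc(std::size_t size) {
    auto* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(std::size_t) + size + kGuardLen));
    if (raw == nullptr)
        return nullptr;
    std::memcpy(raw, &size, sizeof size);                // remember the payload size
    unsigned char* payload = raw + sizeof(std::size_t);
    std::memset(payload + size, kGuardByte, kGuardLen);  // fill the trailing guard
    return payload;
}

// Returns false if the trailing guard was overwritten. Note the corruption is
// only noticed here, at free time, not when the overflowing write happened.
bool guarded_free(void* p) {
    if (p == nullptr)
        return true;
    auto* payload = static_cast<unsigned char*>(p);
    unsigned char* raw = payload - sizeof(std::size_t);
    std::size_t size;
    std::memcpy(&size, raw, sizeof size);
    bool intact = true;
    for (std::size_t i = 0; i < kGuardLen; ++i)
        if (payload[size + i] != kGuardByte)
            intact = false;
    std::free(raw);
    return intact;
}
```

    With the real MSVC debug CRT, `_CrtSetDbgFlag(_CrtSetDbgFlag(_CRTDBG_REPORT_FLAG) | _CRTDBG_CHECK_ALWAYS_DF)` asks the debug heap to run `_CrtCheckMemory()` on every allocation and deallocation request, which can move the report closer to the code that actually corrupts memory, at a large performance cost.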
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Friday, March 13, 2026 18:52:45
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8576

    Tried to create a small C++ reproduction case for this issue, unsuccessfully.

    Created xpdev/strlisttest.cpp:
    ```
    #include "str_list.h"
    #include "threadwrap.h"
    #include "genwrap.h"
    #include "dirwrap.h"

    #include <atomic>
    #include <mutex>
    #include <time.h>

    static char* process_findstr_item(size_t index, char *str, void* cbdata)
    {
        SKIP_WHITESPACE(str);
        truncnl(str);
        return c_unescape_str(str);
    }

    // Load a filter file into a newly allocated string list (NULL if it can't be opened)
    str_list_t findstr_list(const char* fname)
    {
        FILE* fp;
        str_list_t list;

        if ((fp = fopen(fname, "r")) == NULL)
            return NULL;

        list = strListReadFile(fp, NULL, 1000);
        strListModifyEach(list, process_findstr_item, /* cbdata: */ NULL);
        printf("Read %s\n", fname);

        fclose(fp);

        return list;
    }

    // Simplified test version modeled on filterFile (filterfile.hpp)
    class filterFile {
    public:
        filterFile() {
            pthread_mutex_init(&mutex, nullptr);
        }
        filterFile(const char* fname) : filterFile() {
            init(fname);
        }
        void init(const char* fname) {
            strlcpy(this->fname, fname, sizeof this->fname);
        }
        filterFile(const filterFile&) = delete;
        filterFile& operator=(const filterFile&) = delete;
        ~filterFile() {
            strListFree(&list);
            pthread_mutex_destroy(&mutex);
        }
        void reset() {
            fread_count = 0;
            total_found = 0;
            timestamp = 0;
            lastftime_check = 0;
            strListFree(&list); // note: called without holding mutex
        }
        std::atomic<uint> fread_count{};
        std::atomic<uint> total_found{};
        time_t fchk_interval{1}; // seconds
        char fname[MAX_PATH + 1]{};
        bool listed(const char* str1, const char* str2 = nullptr, struct trash* details = nullptr) {
            bool result = false; // initialized so it is defined when fchk_interval == 0
            time_t now = time(nullptr);
            if (fchk_interval) {
                pthread_mutex_lock(&mutex);
                // Re-read the file at most once per fchk_interval, and only if it changed
                if ((now - lastftime_check) >= fchk_interval) {
                    lastftime_check = now;
                    time_t latest = fdate(fname);
                    if (latest > timestamp) {
                        strListFree(&list);
                        list = findstr_list(fname);
                        timestamp = latest;
                        ++fread_count;
                    }
                }
                result = false; //trash_in_list(str1, str2, list, details);
                pthread_mutex_unlock(&mutex);
            }
            if (result)
                ++total_found;
            return result;
        }
    private:
        str_list_t list{};
        pthread_mutex_t mutex{};
        time_t lastftime_check{};
        time_t timestamp{};

    };

    filterFile filter;

    void thread(void* arg) {
        filter.listed("");
    }

    int main(int argc, char ** argv) {

        for (int i = 1; i < argc; ++i) {
            filter.init(argv[i]);
            _beginthread(thread, 0, nullptr);
            SLEEP(2000);
            printf("Freeing list\n");
            filter.reset();
            printf("Done freeing list\n");
        }

        return 0;
    }
    ```
    Built it with this command line, replicating all of the sbbs.dll Win32-debug compiler options and adding AddressSanitizer:
    ```
    cl /GS /analyze- /W3 /Zc:wchar_t /Zi /Gm- /Od /Zc:inline /fp:precise /D "_DEBUG" /D "WIN32" /D "_LIB" /D "LINK_LIST_THREADSAFE" /D "WINVER=0x600" /D "_WIN32_WINNT=0x600" /D "HAS_INTTYPES_H" /D "HAS_STDINT_H" /D "XPDEV_THREAD_SAFE" /D "_CRT_SECURE_NO_DEPRECATE" /D "_CRT_NONSTDC_NO_DEPRECATE" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /errorReport:prompt /WX- /Zc:forScope /RTC1 /std:c17 /arch:IA32 /Gd /Oy- /MTd /FC /EHsc /nologo /diagnostics:column /fsanitize=address -DHAS_STDINT_H strlisttest.cpp str_list.c genwrap.c xpprintf.c dirwrap.c threadwrap.c
    ```

    The resulting executable runs just fine:
    ```
    C:\sbbs\src\xpdev>strlisttest s:\sbbs\text\ip.can s:\sbbs\text\host.can s:\sbbs\text\ip.can s:\sbbs\text\host.can s:\sbbs\text\ip-silent.can
    Read s:\sbbs\text\ip.can
    Freeing list
    Done freeing list
    Read s:\sbbs\text\host.can
    Freeing list
    Done freeing list
    Read s:\sbbs\text\ip.can
    Freeing list
    Done freeing list
    Read s:\sbbs\text\host.can
    Freeing list
    Done freeing list
    Read s:\sbbs\text\ip-silent.can
    Freeing list
    Done freeing list
    ```

    I'm out of ideas for now.
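    For comparison, here is a hedged sketch of the same cache-and-reload idea written only against the C++17 standard library, so it could also be exercised under sanitizers on non-Windows platforms. Everything in it is hypothetical: `CachedFilter` is not Synchronet code, `std::vector<std::string>` stands in for `str_list_t`, `std::filesystem::last_write_time` stands in for `fdate()`, and the `fchk_interval` throttle and whitespace/escape processing are omitted. One deliberate difference from the test class above: `reset()` here takes the same mutex as `listed()`, so clearing the list can never overlap a reload.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical portable analogue of the test's filterFile.
class CachedFilter {
public:
    explicit CachedFilter(fs::path fname) : fname_(std::move(fname)) {}

    bool listed(const std::string& str) {
        std::lock_guard<std::mutex> lock(mutex_);
        reload_if_changed();
        return std::find(list_.begin(), list_.end(), str) != list_.end();
    }

    void reset() {
        std::lock_guard<std::mutex> lock(mutex_); // unlike the test, guard the free too
        list_.clear();
        timestamp_ = fs::file_time_type::min();   // force a re-read on next listed()
    }

    unsigned fread_count() const { return fread_count_.load(); }

private:
    // Caller must hold mutex_. Re-reads the file only if its mtime moved forward.
    void reload_if_changed() {
        std::error_code ec;
        auto latest = fs::last_write_time(fname_, ec);
        if (ec || latest <= timestamp_)
            return; // missing or unchanged file: keep the cached list
        std::vector<std::string> fresh;
        std::ifstream in(fname_);
        for (std::string line; std::getline(in, line);)
            if (!line.empty())
                fresh.push_back(line);
        list_.swap(fresh);
        timestamp_ = latest;
        ++fread_count_;
    }

    fs::path fname_;
    std::mutex mutex_;
    std::vector<std::string> list_;
    fs::file_time_type timestamp_ = fs::file_time_type::min();
    std::atomic<unsigned> fread_count_{0};
};
```

    The first `listed()` call loads the file; later calls reuse the cached list until the file's timestamp changes or `reset()` clears it.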
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Friday, March 13, 2026 19:47:36
    https://gitlab.synchro.net/main/sbbs/-/issues/1099#note_8577

    I tried a dynamic allocation version of this C++ test file as well, with no reproduction:
    ...
    ```
    filterFile* filter;

    void thread(void* arg) {
        filter->listed("");
    }

    int main(int argc, char ** argv) {

        for (int i = 1; i < argc; ++i) {
            filter = new filterFile(argv[i]);
            _beginthread(thread, 0, nullptr);
            SLEEP(2000);
            printf("Freeing list\n");
            delete filter;
            printf("Done freeing list\n");
        }

        return 0;
    }
    ```