Skip to content

Refactor CH thread management#594

Open
Roytak wants to merge 9 commits intodevelfrom
ch-lock-fix
Open

Refactor CH thread management#594
Roytak wants to merge 9 commits intodevelfrom
ch-lock-fix

Conversation

@Roytak
Copy link
Copy Markdown
Collaborator

@Roytak Roytak commented Apr 13, 2026

Change CH threads from detached to joinable (there was a memory leak where there wasn't enough time to free the thread when the server is stopping). The server now joins all the CH threads before ending.

Also fix multiple instances of incorrectly checking the return value of nc_rwlock_lock().

Added a function nc_session_server_ch_client_dispatch_stop_if_dispatched() that joins these CH threads on config change or server destroy.

Added a WRN log to inform the user that even though he created new CH clients, they won't be dispatched because required CBs were not set, previously this was done silently and threads of clients that were deleted weren't being stopped because of that.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors Call Home (CH) thread lifecycle management to avoid shutdown-time leaks by switching from detached threads to joinable threads and introducing a centralized “stop if dispatched” helper used during config changes and server teardown. It also tightens lock/return-value handling in several CH-related paths and adds a warning when CH clients cannot be dispatched due to missing callbacks.

Changes:

  • Make CH client threads joinable and introduce nc_session_server_ch_client_dispatch_stop_if_dispatched() to stop+join them during config updates and nc_server_destroy().
  • Improve Call Home reconcile behavior (stop deleted clients even if dispatch callbacks are missing) and add a WRN log when new CH clients won’t be dispatched.
  • Minor session transport/logging improvements and library version bump.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/session.c Adjusts CH lock handling during session free and improves an SSH EOF timeout warning message.
src/session_server.h Restructures NC_ENABLED_SSH_TLS preprocessor guards around SSH-related declarations.
src/session_server.c Implements CH thread stop+join helper, switches CH threads to joinable, and updates server destroy teardown ordering.
src/session_p.h Adds CH thread pthread_t storage and declares the new internal stop helper.
src/server_config.c Updates CH client reconciliation to use the centralized stop helper and adds a warning when dispatch callbacks are missing.
CMakeLists.txt Bumps micro version / soversion.
Comments suppressed due to low confidence (1)

src/session_server.c:3864

  • With threads now joinable and stoppable via thread_running, there is a race where a stop can be requested before the new thread executes and sets data->thread_running = 1 (the thread currently sets it unconditionally on startup). In that case, the stop request can be overwritten and pthread_join() in the stop helper may block indefinitely. Consider initializing arg->thread_running under arg->cond_lock before pthread_create() and updating the thread start logic to not flip it back to running if a stop was already requested.
    pthread_cond_init(&arg->cond, NULL);
    pthread_mutex_init(&arg->cond_lock, NULL);

    /* CH THREADS WR LOCK */
    if (nc_rwlock_lock(&server_opts.ch_threads_lock, NC_RWLOCK_WRITE, NC_CH_THREADS_LOCK_TIMEOUT, __func__) != 1) {
        rc = -1;
        goto cleanup;
    }
    ch_threads_lock = NC_RWLOCK_WRITE;

    /* Append the thread data pointer to the global array.
     * Pointer instead of struct so that server_opts.ch_thread_lock does not have to be
     * locked everytime the arg is accessed */
    LY_ARRAY_NEW_GOTO(NULL, server_opts.ch_threads, new_item, rc, cleanup);
    *new_item = arg;

    /* create the CH thread */
    if ((r = pthread_create(&arg->tid, NULL, nc_ch_client_thread, arg))) {
        ERR(NULL, "Creating a new thread failed (%s).", strerror(r));
        /* remove from the array */
        LY_ARRAY_DECREMENT_FREE(server_opts.ch_threads);
        rc = -1;
        goto cleanup;
    }

    /* arg is now owned by the thread */
    arg = NULL;


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Roytak added 2 commits April 14, 2026 09:20
Store thread IDs in ch_thread_arg to allow proper pthread_join when
stopping threads. Add nc_session_server_ch_client_dispatch_stop_if_dispatched()
to centralize thread stopping logic, fixing several locking issues:
- Only unlock ch_lock if it was successfully acquired in nc_session_free()
- Use config_update_lock to prevent data race in nc_server_destroy()
- Properly handle config lock unlocking during thread join to avoid deadlock
Ubuntu 22.04 uses libssh-dev v0.9.6, which contains a bug where multiple
threads call strtok, which isn't present in v0.10.6 that ubuntu 24.04
uses.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors server-side Call Home (CH) thread lifecycle to avoid shutdown-time leaks by switching CH threads from detached to joinable and ensuring threads are stopped/joined on config changes and server destroy. Also tightens several lock-return checks and adds a warning when CH clients cannot be auto-dispatched due to missing callbacks.

Changes:

  • Make CH client threads joinable, store pthread_t in thread args, and add a helper to stop+join a dispatched CH thread.
  • Rework CH dispatch reconciliation to still stop deleted clients even when dispatch callbacks are unset, and warn when new clients won’t be dispatched.
  • Update shutdown/config/session logic and CI/build metadata (version bump + Ubuntu runner update).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/session_server.h Adjusts NC_ENABLED_SSH_TLS preprocessor boundaries around SSH/TLS-specific APIs.
src/session_server.c Implements joinable CH threads, adds stop+join helper, and changes server destroy cleanup flow.
src/session_p.h Extends CH thread argument with stored pthread_t and exposes internal stop+join helper.
src/session.c Fixes lock/unlock handling and improves an SSH EOF timeout warning message.
src/server_config.c Refactors CH dispatch reconcile to stop deleted threads reliably and warn about missing callbacks.
CMakeLists.txt Bumps library micro version and micro soversion.
.github/workflows/ci.yml Updates CI matrix to run on ubuntu-24.04.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Roytak added 2 commits April 14, 2026 13:29
Change return type from void to int to allow callers to handle
cleanup failures. Also update nc_server_notif_cert_expiration_thread_stop
to return error code so that nc_server_destroy can properly propagate
errors.
Move Call Home thread data from global server_opts.ch_threads array
to individual ch_client->thread pointer. This simplifies thread
management by eliminating the separate ch_threads_lock and array
manipulation. Each ch_client now directly owns its thread data,
making the lifecycle management more straightforward.
When ch sess is established, only the status of the sess was checked in
a loop, which isn't enough when we just delete a CH client from the
config without the client gracefully closing first.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors Call Home (CH) thread lifecycle management to avoid leaked detached threads by making CH threads joinable and joining/stopping them on server shutdown and config changes. It also tightens several lock-return-value checks and improves logging when CH dispatch callbacks are missing.

Changes:

  • Convert CH client threads from detached to joinable, store thread IDs, and add stop+join logic during config reconciliation and server destroy.
  • Fix/standardize error handling around nc_rwlock_lock() return values and adjust related control flow.
  • Extend CH tests to cover deleting a CH client while a session is established; update CI runner OS and bump version.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/test_ch.c Adds a new CH regression test for deleting a client while a session is RUNNING; introduces additional ports and barrier synchronization.
src/session_server.h Updates public API signatures (nc_server_destroy(), cert-exp thread stop) and adjusts NC_ENABLED_SSH_TLS header scoping.
src/session_server.c Implements joinable CH threads, stop+join helper, updates shutdown/config locking sequence, and updates cert-exp thread stop to return status.
src/session_p.h Adds pthread_t tid to CH thread args and stores CH thread pointer in ch_client config structures; updates internal prototypes.
src/session.c Tweaks CH lock handling during session free and improves an SSH EOF warning message.
src/server_config.c Refactors CH client dispatch reconciliation to use per-client stored thread pointers and adds warning when dispatch callbacks are unset.
CMakeLists.txt Bumps micro version/SOVERSION micro.
.github/workflows/ci.yml Moves CI jobs from Ubuntu 22.04 to 24.04.
Comments suppressed due to low confidence (2)

src/server_config.c:5384

  • nc_server_config_reconcile_chclients_dispatch() decides whether a CH client is “already running” by checking new_ch_client->thread, but in the nc_server_config_setup_data() path the new config is built from scratch and thread is never populated for existing clients. As a result, existing clients will be treated as new and dispatched again, while the old threads keep running (and their thread pointers are lost after the config swap), leading to duplicate CH threads and threads that can no longer be stopped via config changes. Consider matching new_cfg clients to old_cfg by name and copying/migrating old_ch_client->thread into new_ch_client->thread before PHASE 1 dispatch decisions.
    if (dispatch_new_clients) {
        /* only dispatch if all required CBs are set */
        LY_ARRAY_FOR(new_cfg->ch_clients, struct nc_ch_client, new_ch_client) {
            if (new_ch_client->thread) {
                /* already running */
                continue;
            }

            /* this is a new Call Home client, dispatch it */
            rc = _nc_connect_ch_client_dispatch(new_ch_client, server_opts.ch_dispatch_data.acquire_ctx_cb,
                    server_opts.ch_dispatch_data.release_ctx_cb, server_opts.ch_dispatch_data.ctx_cb_data,
                    server_opts.ch_dispatch_data.new_session_cb, server_opts.ch_dispatch_data.new_session_cb_data);

src/session_server.c:3890

  • The comments in nc_connect_ch_client_dispatch() still say /* CONFIG READ LOCK */ and /* CONFIG READ UNLOCK */, but the code now acquires/releases a WRITE lock. Updating these comments would avoid confusion when reasoning about lock requirements.
    /* CONFIG READ LOCK */
    if (nc_rwlock_lock(&server_opts.config_lock, NC_RWLOCK_WRITE, NC_CONFIG_LOCK_TIMEOUT, __func__) != 1) {
        return -1;
    }

    /* check ch client existence */
    ch_client = nc_server_ch_client_get(client_name);
    NC_CHECK_ERR_GOTO(!ch_client, rc = -1; ERR(NULL, "Call Home client \"%s\" not found.", client_name), cleanup);

    /* requires config wr lock */
    rc = _nc_connect_ch_client_dispatch(ch_client, acquire_ctx_cb, release_ctx_cb, ctx_cb_data,
            new_session_cb, new_session_cb_data);

cleanup:
    /* CONFIG READ UNLOCK */
    nc_rwlock_unlock(&server_opts.config_lock, __func__);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants