Skip to content

fix: login race condition and JWT FAIL_UNAUTHORIZED blocking callers#2600

Merged
abdulraqeeb33 merged 1 commit intofeat/identity_verification_5.8from
fix/login-race-and-jwt-blocking
Apr 9, 2026
Merged

fix: login race condition and JWT FAIL_UNAUTHORIZED blocking callers#2600
abdulraqeeb33 merged 1 commit intofeat/identity_verification_5.8from
fix/login-race-and-jwt-blocking

Conversation

@abdulraqeeb33
Copy link
Copy Markdown
Contributor

@abdulraqeeb33 abdulraqeeb33 commented Apr 7, 2026

Summary

  • Non-suspend login() race condition: Split LoginHelper into switchUser() (synchronous local model swap) + enqueueLogin() (async network call). The non-suspend login() now calls switchUser() synchronously on the calling thread, then fires off enqueueLogin() in the background via suspendifyOnIO (FF ON) or Thread { runBlocking } (FF OFF). This ensures tags target the correct user without blocking the main thread or causing ANRs. loginSuspend() still uses the full login() which awaits the backend operation.
  • JWT FAIL_UNAUTHORIZED / FAIL_PAUSE_OPREPO hanging callers: Wake enqueueAndWait waiters with false before re-queuing operations. Re-queued items use waiter = null so future retries (after updateUserJwt) don't reference stale waiters. This prevents loginSuspend from hanging forever when a JWT is invalid or the op repo pauses.

Files Changed

  • LoginHelper.kt — Split into switchUser() (instant, synchronous model swap under lock) + enqueueLogin() (suspend, network call) + login() (both, for loginSuspend)
  • OneSignalImp.ktlogin() calls switchUser() synchronously then dispatches enqueueLogin() respecting the FF flag; logout() unchanged
  • OperationRepo.ktFAIL_UNAUTHORIZED and FAIL_PAUSE_OPREPO branches wake waiters before re-queuing with waiter = null

Test Results

All tests run on emulator (Medium Phone AVD - API 16) with fresh app data cleared between each run.

Test code

Non-suspend (called from onCreate on main thread):

OneSignal.login("alice")            // or OneSignal.login("alice", VALID_JWT)
OneSignal.User.addTag("alice1", "alice1")

OneSignal.login("bob")              // or OneSignal.login("bob", INVALID_JWT)
OneSignal.User.addTag("bob2", "bob2")

OneSignal.login("chris")            // or OneSignal.login("chris", VALID_JWT)
OneSignal.User.addTag("chris3", "chris3")

Suspend (called from CoroutineScope(Dispatchers.IO)):

OneSignal.loginSuspend("alice")     // or OneSignal.loginSuspend("alice", VALID_JWT)
OneSignal.User.addTag("alice1", "alice1")

OneSignal.loginSuspend("bob")       // or OneSignal.loginSuspend("bob", INVALID_JWT)
OneSignal.User.addTag("bob2", "bob2")

OneSignal.loginSuspend("chris")     // or OneSignal.loginSuspend("chris", VALID_JWT)
OneSignal.User.addTag("chris3", "chris3")

Full test matrix

# API JWT FF (threading) Result
1 login() None OFF PASSalice1→alice, bob2→bob, chris3→chris
2 login() VALID / INVALID / VALID OFF PASS — tags correct, no blocking
3 loginSuspend() None OFF PASS — each login waits ~6s for backend, tags correct
4 loginSuspend() VALID / INVALID / VALID OFF PASS — bob's invalid JWT did not block chris
5 login() None ON PASSalice1→alice, bob2→bob, chris3→chris
6 login() VALID / INVALID / VALID ON PASS — tags correct, no blocking
7 loginSuspend() None ON PASS — each login waits ~6s for backend, tags correct
8 loginSuspend() VALID / INVALID / VALID ON PASS — bob's invalid JWT did not block chris

Example logcat output (non-suspend, FF OFF)

after login(alice) tags={alice1=alice1}
after login(bob) tags={bob2=bob2}
after login(chris) tags={chris3=chris3}

Example logcat output (suspend, FF ON)

after loginSuspend(alice) tags={alice1=alice1}    // ~6s after start
after loginSuspend(bob) tags={bob2=bob2}          // ~6s later
after loginSuspend(chris) tags={chris3=chris3}    // ~6s later

Notes

  • No ANR or crash in any test — switchUser() is synchronous but instant (local model swap only), network calls are dispatched to background.
  • JWT tests (2, 4, 6, 8): The flow is correct (bob's login does not block chris), but the FAIL_UNAUTHORIZED / waiter-wake path was not exercised because IV was not enforced by the backend on the test app. To fully validate the 401 waiter fix, run against an app with identity verification enabled on the dashboard.

Test plan

  • Non-suspend login() + addTag() in immediate succession — each tag lands on correct user (FF OFF)
  • Non-suspend login() + addTag() in immediate succession — each tag lands on correct user (FF ON)
  • Non-suspend login() with VALID/INVALID/VALID JWT — tags correct, no blocking (FF OFF)
  • Non-suspend login() with VALID/INVALID/VALID JWT — tags correct, no blocking (FF ON)
  • loginSuspend() + addTag() sequential — each call waits for backend, tags correct (FF OFF)
  • loginSuspend() + addTag() sequential — each call waits for backend, tags correct (FF ON)
  • loginSuspend() with VALID/INVALID/VALID JWT — bob does not block chris (FF OFF)
  • loginSuspend() with VALID/INVALID/VALID JWT — bob does not block chris (FF ON)
  • No ANR or crash when calling login() from main thread
  • FAIL_UNAUTHORIZED waiter-wake path with IV enabled on backend (401 triggers wake + re-queue)
  • updateUserJwt() causes re-queued operations to execute after JWT refresh

Fixes SDK-4338

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR makes / blocking (via ) and changes failure semantics in the operation repo — these are non-trivial concurrency changes that warrant a human look, particularly around main-thread ANR exposure for the non-suspend API.

Extended reasoning...

Overview\nThe PR touches two files: (login/logout threading model) and (waiter wake-up on FAIL_UNAUTHORIZED and FAIL_PAUSE_OPREPO). Both changes are small in line count but meaningful in behavioral impact.\n\n### Security risks\nNo security-sensitive paths (auth, crypto, permissions) are directly altered. The JWT invalidation path in OperationRepo is adjusted, but the fix only changes when waiters are notified — the actual JWT invalidation logic is untouched.\n\n### Level of scrutiny\nHigh. The / change from fire-and-forget / to converts these methods from non-blocking to fully blocking. If a developer calls on the Android main thread (which is possible with the deprecated API), will block that thread for the full duration of , which involves operation repo enqueueing and potentially network waits — a classic ANR vector. The OperationRepo changes alter the contract of : callers now receive immediately on FAIL_UNAUTHORIZED/FAIL_PAUSE_OPREPO rather than waiting for the eventual retry result. This is intentional per the PR description, but changes observable behavior for any caller that inspects the return value.\n\n### Other factors\nNo automated bug reports were filed. The PR description is clear and the intent is sound (fix a real hang/race condition), but the threading changes are in production-critical paths where subtle mistakes can cause ANRs or silent operation drops. Human review is appropriate here.

@abdulraqeeb33 abdulraqeeb33 force-pushed the fix/login-race-and-jwt-blocking branch 5 times, most recently from 4b98693 to ac70e95 Compare April 7, 2026 16:50
@abdulraqeeb33 abdulraqeeb33 changed the title Fix login race condition and JWT FAIL_UNAUTHORIZED blocking callers fix: login race condition and JWT FAIL_UNAUTHORIZED blocking callers Apr 7, 2026
@abdulraqeeb33 abdulraqeeb33 force-pushed the fix/login-race-and-jwt-blocking branch from ac70e95 to f816aa7 Compare April 7, 2026 17:27
Copy link
Copy Markdown
Contributor

@nan-li nan-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit
tested login flows with JWT and without, and login flows where JWT=OFF can fail with

OneSignal.loginSuspend("alice")
OneSignal.loginSuspend("1") // will fail
OneSignal.loginSuspend("chris")

* Full login: switches user models then waits for the backend operation.
* Used by [loginSuspend] where the caller can afford to suspend.
*/
suspend fun login(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need this method? Since in OneSignalImp.login(), we call switchUser() and enqueueLogin() manually, could loginsuspend() also call these 2 manually - as there's added confusion that LoginHelper.login()is only called byloginSuspend()but notlogin()`

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good one @nan-li - got rid of it.

1. Split LoginHelper into switchUser() + enqueueLogin() so the
   non-suspend login() can synchronously swap user models (instant,
   no network) while the backend create-user call fires in the
   background. This prevents ANRs on main thread while ensuring
   tags target the correct user after each login() call.
   loginSuspend() also calls switchUser() + enqueueLogin() directly
   (no wrapper method) for clarity.

2. Wake enqueueAndWait waiters on FAIL_UNAUTHORIZED and
   FAIL_PAUSE_OPREPO before re-queuing operations. Re-queued items
   use waiter=null so future retries don't reference stale waiters.
   This prevents loginSuspend from hanging forever when a JWT is
   invalid or the op repo pauses.

Fixes SDK-4338

Made-with: Cursor
@abdulraqeeb33 abdulraqeeb33 force-pushed the fix/login-race-and-jwt-blocking branch from f816aa7 to f47c974 Compare April 8, 2026 20:54
@abdulraqeeb33
Copy link
Copy Markdown
Contributor Author

Broken tests are unrelated. will be fixed in the next PR #2605

@abdulraqeeb33 abdulraqeeb33 merged commit a292935 into feat/identity_verification_5.8 Apr 9, 2026
4 of 5 checks passed
@abdulraqeeb33 abdulraqeeb33 deleted the fix/login-race-and-jwt-blocking branch April 9, 2026 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants