Overcoming Concurrency Limits: From OS Threads to Virtual Threads, Async/Await, and Coroutines


    12/06/2025

    Introduction

    In a previous post, Process vs Thread, we looked at how platform threads can be used to improve performance through concurrent task execution. However, as modern applications demand ever higher levels of concurrency, traditional OS threads have significant limitations that can bottleneck scalability.

    Imagine trying to serve 10,000 simultaneous users with traditional threading—you'd quickly exhaust system resources! This post explores different approaches that modern programming languages and frameworks provide to overcome these limitations: virtual threads, async/await, coroutines, and reactive programming.

    The Limits of Traditional Threading

    Think of traditional threading like a restaurant where each table (request) must have a dedicated waiter (thread) standing by, even when customers are reading the menu or talking. Most of the time, waiters are idle, yet they're still consuming the restaurant's limited space and resources. When all waiters are occupied, new customers must wait outside, even though the kitchen (CPU) isn't busy!

    Memory and Performance Issues

    OS threads are managed by the operating system and come with significant overhead:

    • Memory consumption: Each thread reserves roughly 1-2 MB of stack space
    • Context switching overhead: The OS must save and restore register and scheduler state on every switch, which is expensive
    • Limited scalability: Most systems can sustain only a few thousand OS threads before memory and scheduling costs become prohibitive
    • Resource waste: In typical business applications, threads spend the vast majority of their time (~95%) blocked on I/O operations

    In typical business applications, threads are frequently blocked waiting for:

    • Database queries
    • Network requests
    • File system operations
    • External API calls

    This creates a paradox: while your CPU remains underutilized, you cannot create more threads to handle additional requests due to memory limitations.

    [Diagram: thread-per-request model. Each request gets a dedicated thread; most threads are blocked on I/O, memory grows toward the limit while CPU usage stays low, and once the OS thread limit is reached, new requests are rejected.]

    In the thread-per-request model, each incoming request is assigned a dedicated thread. When these threads get blocked (often waiting for I/O), memory usage increases rapidly. Even though the CPU is mostly idle, the application eventually hits the OS thread/memory limit, preventing it from handling more requests.
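
    To make the ceiling concrete, here is a minimal sketch of the failure mode (assuming default stack sizes; the exact breaking point varies by OS and JVM settings). It submits tens of thousands of blocking tasks to a classic one-thread-per-task executor:

    // Sketch: exhausting OS threads with the thread-per-request model
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PlatformThreadLimit {
        public static void main(String[] args) {
            // One platform (OS) thread per task; each reserves its own stack
            ExecutorService executor = Executors.newCachedThreadPool();
            for (int i = 0; i < 50_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10_000); // simulate a blocking I/O call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            executor.shutdown();
        }
    }

    On most systems this fails with java.lang.OutOfMemoryError: unable to create new native thread long before the loop finishes, even though the CPU is nearly idle.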

    Modern Concurrency Models

    1. Virtual Threads in Java

    Virtual threads represent a paradigm shift in Java concurrency. Introduced as a preview feature in Java 19 and finalized in Java 21, they're lightweight threads managed entirely by the JVM rather than the operating system.

    How Virtual Threads Work

    Think of virtual threads like ride-sharing for OS threads. Instead of each passenger (virtual thread) needing their own car (OS thread), multiple passengers can share cars efficiently. When a passenger gets out to run an errand (I/O operation), the car becomes available for other passengers.

    Key mechanisms:

    • Continuation capture: When a virtual thread blocks, its stack is captured as a Continuation object and stored in heap memory
    • Carrier thread release: A blocked virtual thread immediately releases (unmounts from) its OS carrier thread
    • Automatic mounting: When the I/O completes, the virtual thread is remounted on any available carrier thread
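
    A quick way to observe mounting in practice: a virtual thread's toString() includes its current carrier thread, so printing the current thread before and after a blocking call can show it resuming on a different carrier. A minimal sketch (the carrier may or may not actually change on any given run):

    // Sketch: observing a virtual thread's carrier before and after blocking
    public class CarrierDemo {
        public static void main(String[] args) throws InterruptedException {
            Thread vt = Thread.startVirtualThread(() -> {
                // Prints something like: VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1
                System.out.println("Before blocking: " + Thread.currentThread());
                try {
                    Thread.sleep(100); // unmounts from the carrier while parked
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // After resuming, the carrier worker may differ
                System.out.println("After blocking:  " + Thread.currentThread());
            });
            vt.join();
        }
    }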

    Virtual Thread Lifecycle

    [Diagram: virtual thread lifecycle. Created → Mounted (Running) → Unmounted (Blocked) → Terminated, with transitions on start(), I/O block, I/O completion, and finish. Virtual threads mount and unmount from a small pool of carrier OS threads automatically, allowing millions of concurrent tasks with minimal memory overhead and efficient resource utilization.]

    Virtual Thread Examples

    // Simple Virtual Thread Creation
    import java.util.concurrent.Executors;

    public class VirtualThreadExample {
        public static void main(String[] args) throws InterruptedException {
            // Method 1: Using Thread.startVirtualThread()
            Thread vThread1 = Thread.startVirtualThread(() -> {
                System.out.println("Virtual thread 1: " + Thread.currentThread());
            });

            // Method 2: Using Thread.ofVirtual()
            Thread vThread2 = Thread.ofVirtual()
                    .name("custom-virtual-thread")
                    .start(() -> {
                        System.out.println("Virtual thread 2: " + Thread.currentThread());
                    });

            // Method 3: Using an ExecutorService that starts one virtual thread per task
            try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10000; i++) {
                    final int taskId = i;
                    executor.submit(() -> {
                        // Simulate an I/O operation
                        try {
                            Thread.sleep(1000);
                            System.out.println("Task " + taskId + " completed on " + Thread.currentThread());
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                }
            } // ExecutorService auto-closes, waiting for submitted tasks
        }
    }

    To enable virtual threads in a Spring Boot application (version 3.2 or later), add the following property to application.properties:

    spring.threads.virtual.enabled=true
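
    With this property set (and running on Java 21+), the embedded server handles each request on a virtual thread, so even plain blocking code scales without rewrites. Below is a minimal, hypothetical sketch; ReportController and its endpoint are illustrative, not from the original post:

    // Hypothetical example: a plain blocking endpoint that scales on virtual threads
    import org.springframework.web.bind.annotation.*;

    @RestController
    public class ReportController {

        @GetMapping("/reports/{id}")
        public String getReport(@PathVariable String id) throws InterruptedException {
            Thread.sleep(500); // blocking call parks only a cheap virtual thread
            return "report-" + id;
        }
    }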

    2. Async/Await (JavaScript, Python, C#)

    Async/await represents a programming model that makes asynchronous code look and behave more like synchronous code, while still maintaining non-blocking execution. Think of it as a smart waiter who takes multiple orders simultaneously—when one table's order is cooking, they serve other tables instead of standing idle.

    The Event Loop Architecture in JavaScript

    [Diagram: JavaScript event loop architecture. The main thread (JavaScript engine) executes synchronous code; I/O operations (network, file, DB) are offloaded to Web APIs or background threads; completed tasks land in the callback queue; the event loop dispatches them onto the call stack when it is free. Non-blocking: the main thread never waits for I/O, enabling high concurrency.]

    How Async/Await Works Step by Step

    1. Function encounters await: Execution pauses, but doesn't block the thread
    2. Async operation starts: The I/O operation runs in the background (handled by a separate thread or event-driven OS facilities)
    3. Thread continues: Main thread processes other tasks from the event queue
    4. Operation completes: Result is placed in callback queue
    5. Event loop checks: When main thread is free, it picks up the completed task
    6. Function resumes: Execution continues from where it left off with the result
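
    Java has no await keyword, but the same continuation idea can be sketched with CompletableFuture: register what should happen when the result arrives, then let the calling thread move on. In this minimal sketch, fetchUserProfile is a hypothetical stand-in for a real non-blocking call:

    // Sketch: async continuations in Java with CompletableFuture
    import java.util.concurrent.CompletableFuture;

    public class AsyncSketch {

        // Hypothetical async call; simulates I/O on a pool thread
        static CompletableFuture<String> fetchUserProfile(int userId) {
            return CompletableFuture.supplyAsync(() -> {
                sleep(500);
                return "profile-" + userId;
            });
        }

        public static void main(String[] args) {
            fetchUserProfile(42)
                    .thenApply(String::toUpperCase)               // runs when the result arrives
                    .thenAccept(p -> System.out.println("Got " + p));
            System.out.println("Main thread is free to do other work");
            sleep(1000); // keep the JVM alive long enough to see the callback
        }

        static void sleep(long ms) {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }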

    JavaScript (Node.js) Example

    // Advanced async/await patterns in JavaScript

    // Simulated async calls (stand-ins for real network requests)
    const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));
    const fetchUserProfile = (userId) => delay(1000, { userId, name: `User ${userId}` });
    const fetchUserPreferences = (userId) => delay(500, { theme: 'dark' });
    const fetchUserHistory = (userId) => delay(800, ['login', 'purchase']);

    // Sequential execution - takes ~2.3 seconds total (sum of all delays)
    async function fetchUserData(userId) {
        console.log(`Fetching data for user ${userId}...`);
        const profile = await fetchUserProfile(userId);
        const preferences = await fetchUserPreferences(userId);
        const history = await fetchUserHistory(userId);
        return { profile, preferences, history };
    }

    // Parallel execution - takes ~1 second total (the longest single delay)
    async function fetchUserDataParallel(userId) {
        console.log(`Fetching data for user ${userId} in parallel...`);
        const [profile, preferences, history] = await Promise.all([
            fetchUserProfile(userId),     // 1 second
            fetchUserPreferences(userId), // 0.5 seconds
            fetchUserHistory(userId)      // 0.8 seconds
        ]);
        return { profile, preferences, history };
    }

    Python (asyncio)

    import asyncio
    import time
    from typing import List

    async def fetch_data(user_id: int) -> dict:
        """Simulate async API call"""
        await asyncio.sleep(0.5)  # Simulate network delay
        return {'user_id': user_id, 'data': f'User {user_id} data'}

    async def process_users_sequential(user_ids: List[int]) -> List[dict]:
        """Process users one by one (slow)"""
        results = []
        for user_id in user_ids:
            result = await fetch_data(user_id)
            results.append(result)
        return results

    async def process_users_concurrent(user_ids: List[int]) -> List[dict]:
        """Process users concurrently (fast)"""
        tasks = [fetch_data(user_id) for user_id in user_ids]
        return await asyncio.gather(*tasks)

    async def main():
        user_ids = [1, 2, 3, 4, 5]

        # Sequential processing
        start = time.time()
        await process_users_sequential(user_ids)
        sequential_time = time.time() - start

        # Concurrent processing
        start = time.time()
        await process_users_concurrent(user_ids)
        concurrent_time = time.time() - start

        print(f"Sequential: {sequential_time:.2f}s")
        print(f"Concurrent: {concurrent_time:.2f}s")
        print(f"Speedup: {sequential_time/concurrent_time:.1f}x")

    # Run the example
    asyncio.run(main())

    C# (.NET)

    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    public class AsyncExamples
    {
        private static readonly HttpClient httpClient = new HttpClient();

        // Basic async/await pattern
        public async Task<string> GetUserAsync(int userId)
        {
            var response = await httpClient.GetAsync($"https://api.example.com/users/{userId}");
            return await response.Content.ReadAsStringAsync();
        }

        // Parallel execution
        public async Task<string[]> GetMultipleUsersAsync(int[] userIds)
        {
            var tasks = userIds.Select(id => GetUserAsync(id));
            return await Task.WhenAll(tasks);
        }

        // With timeout (Task.WaitAsync requires .NET 6+)
        public async Task<string> GetUserWithTimeoutAsync(int userId, int timeoutSeconds = 5)
        {
            using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(timeoutSeconds));
            return await GetUserAsync(userId).WaitAsync(cts.Token);
        }
    }

    3. Coroutines (Kotlin)

    Coroutines in Kotlin represent a paradigm where functions can be suspended and resumed, allowing for cooperative multitasking. Unlike preemptive threading where the OS forcibly switches between threads, coroutines voluntarily yield control at specific suspension points.

    Think of coroutines like a dance floor where dancers (coroutines) take turns using the space. When one dancer needs to step out briefly (suspension point), they politely yield the floor to others, then smoothly resume their dance when ready.

    [Diagram: Kotlin coroutines architecture. A coroutine scheduler moves coroutines between running and suspended states across a small set of threads. Key features: lightweight (100k+ coroutines), cooperative (yield voluntarily), structured concurrency, thread-safe.]

    Key characteristics of Kotlin coroutines:

    • Lightweight: Can create millions of coroutines with minimal memory overhead
    • Cooperative scheduling: Coroutines yield control at suspension points (like delay(), await())
    • Structured concurrency: Parent-child relationships ensure proper cleanup and cancellation
    • Thread-safe: Built-in mechanisms prevent common concurrency issues
    • Sequential-looking code: Write asynchronous code that reads like synchronous code

    The scheduler efficiently manages execution by moving coroutines between active and suspended states. When a coroutine hits a suspension point (like waiting for network I/O), it yields the thread to other coroutines, maximizing resource utilization.

    // Kotlin Coroutines Example
    import kotlinx.coroutines.*

    // Suspend functions can be paused and resumed
    suspend fun fetchUser(userId: Int): String {
        delay(500) // Non-blocking delay
        return "User $userId"
    }

    suspend fun fetchUserData(userId: Int): Map<String, String> {
        delay(300)
        return mapOf("email" to "user$userId@example.com")
    }

    fun main() = runBlocking {
        println("Starting coroutines")

        // Run tasks concurrently with async
        val userDeferred = async { fetchUser(123) }
        val dataDeferred = async { fetchUserData(123) }

        // Wait for both results
        val user = userDeferred.await()
        val userData = dataDeferred.await()
        println("$user: $userData")

        // Launch a separate coroutine
        val job = launch {
            println("Background work starting")
            delay(200)
            println("Background work complete")
        }
        job.join() // Wait for completion

        println("All done!")
    }

    4. Go Coroutines (Goroutines)

    Go's concurrency model is built around goroutines: lightweight threads managed by the Go runtime rather than the operating system. A goroutine is simply a function that runs concurrently with other functions. Goroutines are cheap enough that you can spawn thousands of them without significant overhead, and the runtime handles all scheduling and execution, freeing developers from low-level thread management.

    [Diagram: Go goroutine architecture. The Go scheduler is an M:N multiplexer that maps many goroutines onto a few OS threads, each paired with a processor (P); waiting goroutines queue in a global run queue, idle processors steal work, and goroutines communicate over channels. Key features: lightweight (2 KB starting stack), fast context switching, work stealing, channel communication, cooperative scheduling, M:N threading model.]

    Go's Goroutine Model:

    • M:N Threading: Maps millions of goroutines (M) to a small number of OS threads (N)
    • Work Stealing: Idle processors steal work from busy ones for load balancing
    • Cooperative: Goroutines yield at function calls, channel operations, and blocking system calls
    • Channel Communication: Type-safe message passing between goroutines
    • Lightweight: Each goroutine starts with only 2KB stack (grows as needed)
    • No Manual Thread Management: Runtime automatically handles scheduling and load distribution
    // Go Coroutines (Goroutines) Example
    package main

    import (
        "fmt"
        "time"
    )

    func process(id int) {
        fmt.Printf("Started goroutine %d\n", id)
        time.Sleep(100 * time.Millisecond) // Simulate some work
        fmt.Printf("Finished goroutine %d\n", id)
    }

    func main() {
        // Launch multiple goroutines
        for i := 1; i <= 5; i++ {
            go process(i) // Non-blocking, returns immediately
        }

        // Wait to see results (in production, use sync.WaitGroup)
        time.Sleep(200 * time.Millisecond)
        fmt.Println("All goroutines completed")
    }

    5. Reactive Programming

    Reactive programming treats data as streams of events that flow through your application. Instead of pulling data when you need it, you react to data as it arrives. Think of it like a newspaper subscription—instead of going to the store every day to check for new issues, the newspaper is delivered to you when it's available.

    Reactive frameworks like RxJava and Spring Reactor excel at handling asynchronous, event-driven workloads with built-in backpressure management.

    How Reactive Streams Work

    [Diagram: reactive stream processing. A publisher (DB, API, events) emits data through operators (filter, map, buffer, flatMap) to multiple subscribers (database, analytics, cache). A non-blocking callback chain of onNext(), onError(), and onComplete() drives the flow, while subscribers control the rate with request(n) backpressure: event-driven, non-blocking, stream-based, composable.]

    Non-Blocking Callback Mechanism

    In reactive programming, data flows through streams where each stage processes events asynchronously:

    1. Publisher emits data: Sources like databases, APIs, or user events push data into the stream
    2. Operators transform data: Each transformation (filter, map, buffer) happens without blocking
    3. Subscribers react to events: Multiple consumers can process the same stream independently
    4. Callbacks handle flow control: onNext() for new data, onError() for failures, onComplete() for stream end
    5. Backpressure prevents overflow: Subscribers signal how much data they can handle via request(n)

    The key advantage is that no thread ever blocks. When an operation needs to wait (like a database query), the thread is released to handle other work. When the result arrives, a callback is triggered to continue processing.

    This architecture enables handling millions of concurrent streams with just a handful of threads, making it ideal for high-throughput, low-latency applications like real-time analytics, IoT data processing, and high-frequency trading systems.
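
    To make the request(n) handshake concrete, here is an illustrative Project Reactor sketch (not from the original post): a custom BaseSubscriber pulls items at its own pace, so a fast publisher can never flood it.

    // Sketch: explicit backpressure with Reactor's BaseSubscriber
    import org.reactivestreams.Subscription;

    import reactor.core.publisher.BaseSubscriber;
    import reactor.core.publisher.Flux;

    public class BackpressureDemo {
        public static void main(String[] args) {
            Flux.range(1, 10)
                .map(i -> i * 2)
                .subscribe(new BaseSubscriber<Integer>() {
                    @Override
                    protected void hookOnSubscribe(Subscription subscription) {
                        request(2); // ask the publisher for only two items up front
                    }

                    @Override
                    protected void hookOnNext(Integer value) {
                        System.out.println("Received " + value);
                        request(1); // pull the next item only when ready
                    }
                });
        }
    }

    Because the subscriber only ever requests what it can process, the publisher's emission rate is bounded by the consumer, which is the essence of backpressure.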

    Key Features of Reactive Programming:

    1. Asynchronous processing: Operations don't block the execution thread
    2. Event-driven: Responds to events as they happen
    3. Backpressure: Consumers can regulate how much data producers send
    4. Functional composition: Chain operations on streams easily
    5. Non-blocking: Efficiently uses resources while waiting for I/O

    Example with Spring WebFlux/Reactor

    // Spring WebFlux Reactive REST API Example
    import java.time.Duration;

    import org.springframework.web.bind.annotation.*;
    import org.springframework.web.reactive.function.client.WebClient;

    import reactor.core.publisher.Flux;
    import reactor.core.publisher.Mono;

    @RestController
    public class UserController {

        private final UserRepository userRepository;

        public UserController(UserRepository userRepository) {
            this.userRepository = userRepository;
        }

        @GetMapping("/users")
        public Flux<User> getAllUsers() {
            return userRepository.findAll();
        }

        @GetMapping("/users/{id}")
        public Mono<User> getUserById(@PathVariable String id) {
            return userRepository.findById(id)
                    .switchIfEmpty(Mono.error(new UserNotFoundException(id)));
        }

        @PostMapping("/users")
        public Mono<User> createUser(@RequestBody User user) {
            return userRepository.save(user);
        }

        @GetMapping("/users/search")
        public Flux<User> searchUsers(@RequestParam String query) {
            return userRepository.findByNameContaining(query)
                    .filter(user -> user.isActive())
                    .flatMap(user -> enrichUserData(user))
                    .timeout(Duration.ofSeconds(3));
        }

        private Mono<User> enrichUserData(User user) {
            return Mono.just(user)
                    .zipWith(fetchUserPreferences(user.getId()))
                    .map(tuple -> {
                        User u = tuple.getT1();
                        u.setPreferences(tuple.getT2());
                        return u;
                    });
        }

        private Mono<UserPreferences> fetchUserPreferences(String userId) {
            // Simulate fetching from another service
            return WebClient.create("https://preferences-service")
                    .get()
                    .uri("/preferences/{id}", userId)
                    .retrieve()
                    .bodyToMono(UserPreferences.class)
                    .onErrorResume(e -> Mono.just(new UserPreferences()));
        }
    }

    In this example, all database and network operations are non-blocking. The application can handle thousands of concurrent requests with a small number of threads. When one operation is waiting for I/O, the thread can be used to process other requests. Backpressure ensures that if a client is slow at consuming data, the server won't overwhelm it with more data than it can handle.

    Conclusion

    In this article, we explored five powerful concurrency models: Java virtual threads, async/await, Kotlin coroutines, Go goroutines, and reactive programming. Each model has its strengths and is suited to different types of applications.


    For more in-depth tutorials on Java, Spring, and modern software development, check out my content:

    🔗 Blog: https://codewiz.info
    🔗 YouTube: https://www.youtube.com/@Code.Wizzard
    🔗 LinkedIn: https://www.linkedin.com/in/code-wiz-740370302/
    🔗 Medium: https://medium.com/@code.wizzard01
    🔗 Github: https://github.com/CodeWizzard01
