Overcoming Concurrency Limits: From OS Threads to Virtual Threads, Async/Await, and Coroutines


    12/06/2025

    Introduction

    In a previous post, Process vs Thread, we looked at how platform threads can be used to improve performance through concurrent task execution. However, as modern applications demand ever higher levels of concurrency, traditional OS threads have significant limitations that can bottleneck scalability.

    Imagine trying to serve 10,000 simultaneous users with traditional threading—you'd quickly exhaust system resources! This post explores different approaches that modern programming languages and frameworks provide to overcome these limitations: virtual threads, async/await, coroutines, and reactive programming.

    The Limits of Traditional Threading

    Think of traditional threading like a restaurant where each table (request) must have a dedicated waiter (thread) standing by, even when customers are reading the menu or talking. Most of the time, waiters are idle, yet they're still consuming the restaurant's limited space and resources. When all waiters are occupied, new customers must wait outside, even though the kitchen (CPU) isn't busy!

    Memory and Performance Issues

    OS threads are managed by the operating system and come with significant overhead:

    • Memory consumption: Each thread reserves roughly 1-2 MB of stack space
    • Context switching overhead: The OS must save and restore register and scheduler state on every switch, which is expensive
    • Limited scalability: Most systems can sustain only a few thousand OS threads before memory and scheduling costs become prohibitive
    • Resource waste: In typical business applications, threads spend the vast majority of their time (~95%) blocked on I/O operations

    In typical business applications, threads are frequently blocked waiting for:

    • Database queries
    • Network requests
    • File system operations
    • External API calls

    This creates a paradox: while your CPU remains underutilized, you cannot create more threads to handle additional requests due to memory limitations.

    [Diagram: thread-per-request model. Each request gets a dedicated thread; most threads are blocked on I/O, memory grows toward the limit while CPU usage stays low, and once the OS thread limit is reached, new requests are rejected.]

    In the thread-per-request model, each incoming request is assigned a dedicated thread. When these threads get blocked (often waiting for I/O), memory usage increases rapidly. Even though the CPU is mostly idle, the application eventually hits the OS thread/memory limit, preventing it from handling more requests.
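
    To make the ceiling concrete, here is a minimal sketch of the failure mode (assuming default stack sizes; the exact breaking point varies by OS and JVM settings). It submits tens of thousands of blocking tasks to a classic one-thread-per-task executor:

    // Sketch: exhausting OS threads with the thread-per-request model
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PlatformThreadLimit {
        public static void main(String[] args) {
            // One platform (OS) thread per task; each reserves its own stack
            ExecutorService executor = Executors.newCachedThreadPool();
            for (int i = 0; i < 50_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10_000); // simulate a blocking I/O call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            executor.shutdown();
        }
    }

    On most systems this fails with java.lang.OutOfMemoryError: unable to create new native thread long before the loop finishes, even though the CPU is nearly idle.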

    Modern Concurrency Models

    1. Virtual Threads in Java

    Virtual threads represent a paradigm shift in Java concurrency. Introduced as a preview feature in Java 19 and finalized in Java 21, they're lightweight threads managed entirely by the JVM rather than the operating system.

    How Virtual Threads Work

    Think of virtual threads like ride-sharing for OS threads. Instead of each passenger (virtual thread) needing their own car (OS thread), multiple passengers can share cars efficiently. When a passenger gets out to run an errand (I/O operation), the car becomes available for other passengers.

    Key mechanisms:

    • Continuation capture: When a virtual thread blocks, its stack is captured as a Continuation object and stored in heap memory
    • Carrier thread release: A blocked virtual thread immediately releases (unmounts from) its OS carrier thread
    • Automatic mounting: When the I/O completes, the virtual thread is remounted on any available carrier thread
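
    A quick way to observe mounting in practice: a virtual thread's toString() includes its current carrier thread, so printing the current thread before and after a blocking call can show it resuming on a different carrier. A minimal sketch (the carrier may or may not actually change on any given run):

    // Sketch: observing a virtual thread's carrier before and after blocking
    public class CarrierDemo {
        public static void main(String[] args) throws InterruptedException {
            Thread vt = Thread.startVirtualThread(() -> {
                // Prints something like: VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1
                System.out.println("Before blocking: " + Thread.currentThread());
                try {
                    Thread.sleep(100); // unmounts from the carrier while parked
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // After resuming, the carrier worker may differ
                System.out.println("After blocking:  " + Thread.currentThread());
            });
            vt.join();
        }
    }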

    Virtual Thread Lifecycle

    [Diagram: virtual thread lifecycle. Created → Mounted (Running) → Unmounted (Blocked) → Terminated, with transitions on start(), I/O block, I/O completion, and finish. Virtual threads mount and unmount from a small pool of carrier OS threads automatically, allowing millions of concurrent tasks with minimal memory overhead and efficient resource utilization.]

    Virtual Thread Examples

    // Simple Virtual Thread Creation
    import java.util.concurrent.Executors;

    public class VirtualThreadExample {
        public static void main(String[] args) throws InterruptedException {
            // Method 1: Using Thread.startVirtualThread()
            Thread vThread1 = Thread.startVirtualThread(() -> {
                System.out.println("Virtual thread 1: " + Thread.currentThread());
            });

            // Method 2: Using Thread.ofVirtual()
            Thread vThread2 = Thread.ofVirtual()
                    .name("custom-virtual-thread")
                    .start(() -> {
                        System.out.println("Virtual thread 2: " + Thread.currentThread());
                    });

            // Method 3: Using an ExecutorService that starts one virtual thread per task
            try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 10000; i++) {
                    final int taskId = i;
                    executor.submit(() -> {
                        // Simulate an I/O operation
                        try {
                            Thread.sleep(1000);
                            System.out.println("Task " + taskId + " completed on " + Thread.currentThread());
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                }
            } // ExecutorService auto-closes, waiting for submitted tasks
        }
    }

    To enable virtual threads in a Spring Boot application (version 3.2 or later), add the following property to application.properties:

    spring.threads.virtual.enabled=true
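
    With this property set (and running on Java 21+), the embedded server handles each request on a virtual thread, so even plain blocking code scales without rewrites. Below is a minimal, hypothetical sketch; ReportController and its endpoint are illustrative, not from the original post:

    // Hypothetical example: a plain blocking endpoint that scales on virtual threads
    import org.springframework.web.bind.annotation.*;

    @RestController
    public class ReportController {

        @GetMapping("/reports/{id}")
        public String getReport(@PathVariable String id) throws InterruptedException {
            Thread.sleep(500); // blocking call parks only a cheap virtual thread
            return "report-" + id;
        }
    }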

    2. Async/Await (JavaScript, Python, C#)

    Async/await represents a programming model that makes asynchronous code look and behave more like synchronous code, while still maintaining non-blocking execution. Think of it as a smart waiter who takes multiple orders simultaneously—when one table's order is cooking, they serve other tables instead of standing idle.

    The Event Loop Architecture in JavaScript

    [Diagram: JavaScript event loop architecture. The main thread (JavaScript engine) executes synchronous code; I/O operations (network, file, DB) are offloaded to Web APIs or background threads; completed tasks land in the callback queue; the event loop dispatches them onto the call stack when it is free. Non-blocking: the main thread never waits for I/O, enabling high concurrency.]

    How Async/Await Works Step by Step

    1. Function encounters await: Execution pauses, but doesn't block the thread
    2. Async operation starts: The I/O operation runs in the background (handled by a separate thread or event-driven OS facilities)
    3. Thread continues: Main thread processes other tasks from the event queue
    4. Operation completes: Result is placed in callback queue
    5. Event loop checks: When main thread is free, it picks up the completed task
    6. Function resumes: Execution continues from where it left off with the result
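
    Java has no await keyword, but the same continuation idea can be sketched with CompletableFuture: register what should happen when the result arrives, then let the calling thread move on. In this minimal sketch, fetchUserProfile is a hypothetical stand-in for a real non-blocking call:

    // Sketch: async continuations in Java with CompletableFuture
    import java.util.concurrent.CompletableFuture;

    public class AsyncSketch {

        // Hypothetical async call; simulates I/O on a pool thread
        static CompletableFuture<String> fetchUserProfile(int userId) {
            return CompletableFuture.supplyAsync(() -> {
                sleep(500);
                return "profile-" + userId;
            });
        }

        public static void main(String[] args) {
            fetchUserProfile(42)
                    .thenApply(String::toUpperCase)               // runs when the result arrives
                    .thenAccept(p -> System.out.println("Got " + p));
            System.out.println("Main thread is free to do other work");
            sleep(1000); // keep the JVM alive long enough to see the callback
        }

        static void sleep(long ms) {
            try {
                Thread.sleep(ms);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }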

    JavaScript (Node.js) Example

    // Advanced async/await patterns in JavaScript

    // Simulated async calls (stand-ins for real network requests)
    const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));
    const fetchUserProfile = (userId) => delay(1000, { userId, name: `User ${userId}` });
    const fetchUserPreferences = (userId) => delay(500, { theme: 'dark' });
    const fetchUserHistory = (userId) => delay(800, ['login', 'purchase']);

    // Sequential execution - takes ~2.3 seconds total (sum of all delays)
    async function fetchUserData(userId) {
        console.log(`Fetching data for user ${userId}...`);
        const profile = await fetchUserProfile(userId);
        const preferences = await fetchUserPreferences(userId);
        const history = await fetchUserHistory(userId);
        return { profile, preferences, history };
    }

    // Parallel execution - takes ~1 second total (the longest single delay)
    async function fetchUserDataParallel(userId) {
        console.log(`Fetching data for user ${userId} in parallel...`);
        const [profile, preferences, history] = await Promise.all([
            fetchUserProfile(userId),     // 1 second
            fetchUserPreferences(userId), // 0.5 seconds
            fetchUserHistory(userId)      // 0.8 seconds
        ]);
        return { profile, preferences, history };
    }

    Python (asyncio)

    import asyncio
    import time
    from typing import List

    async def fetch_data(user_id: int) -> dict:
        """Simulate async API call"""
        await asyncio.sleep(0.5)  # Simulate network delay
        return {'user_id': user_id, 'data': f'User {user_id} data'}

    async def process_users_sequential(user_ids: List[int]) -> List[dict]:
        """Process users one by one (slow)"""
        results = []
        for user_id in user_ids:
            result = await fetch_data(user_id)
            results.append(result)
        return results

    async def process_users_concurrent(user_ids: List[int]) -> List[dict]:
        """Process users concurrently (fast)"""
        tasks = [fetch_data(user_id) for user_id in user_ids]
        return await asyncio.gather(*tasks)

    async def main():
        user_ids = [1, 2, 3, 4, 5]

        # Sequential processing
        start = time.time()
        await process_users_sequential(user_ids)
        sequential_time = time.time() - start

        # Concurrent processing
        start = time.time()
        await process_users_concurrent(user_ids)
        concurrent_time = time.time() - start

        print(f"Sequential: {sequential_time:.2f}s")
        print(f"Concurrent: {concurrent_time:.2f}s")
        print(f"Speedup: {sequential_time/concurrent_time:.1f}x")

    # Run the example
    asyncio.run(main())

    C# (.NET)

    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    public class AsyncExamples
    {
        private static readonly HttpClient httpClient = new HttpClient();

        // Basic async/await pattern
        public async Task<string> GetUserAsync(int userId)
        {
            var response = await httpClient.GetAsync($"https://api.example.com/users/{userId}");
            return await response.Content.ReadAsStringAsync();
        }

        // Parallel execution
        public async Task<string[]> GetMultipleUsersAsync(int[] userIds)
        {
            var tasks = userIds.Select(id => GetUserAsync(id));
            return await Task.WhenAll(tasks);
        }

        // With timeout (Task.WaitAsync requires .NET 6+)
        public async Task<string> GetUserWithTimeoutAsync(int userId, int timeoutSeconds = 5)
        {
            using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(timeoutSeconds));
            return await GetUserAsync(userId).WaitAsync(cts.Token);
        }
    }

    3. Coroutines (Kotlin)

    Coroutines in Kotlin represent a paradigm where functions can be suspended and resumed, allowing for cooperative multitasking. Unlike preemptive threading where the OS forcibly switches between threads, coroutines voluntarily yield control at specific suspension points.

    Think of coroutines like a dance floor where dancers (coroutines) take turns using the space. When one dancer needs to step out briefly (suspension point), they politely yield the floor to others, then smoothly resume their dance when ready.

    [Diagram: Kotlin coroutines architecture. A coroutine scheduler moves coroutines between running and suspended states across a small set of threads. Key features: lightweight (100k+ coroutines), cooperative (yield voluntarily), structured concurrency, thread-safe.]

    Key characteristics of Kotlin coroutines:

    • Lightweight: Can create millions of coroutines with minimal memory overhead
    • Cooperative scheduling: Coroutines yield control at suspension points (like delay(), await())
    • Structured concurrency: Parent-child relationships ensure proper cleanup and cancellation
    • Thread-safe: Built-in mechanisms prevent common concurrency issues
    • Sequential-looking code: Write asynchronous code that reads like synchronous code

    The scheduler efficiently manages execution by moving coroutines between active and suspended states. When a coroutine hits a suspension point (like waiting for network I/O), it yields the thread to other coroutines, maximizing resource utilization.

    // Kotlin Coroutines Example
    import kotlinx.coroutines.*

    // Suspend functions can be paused and resumed
    suspend fun fetchUser(userId: Int): String {
        delay(500) // Non-blocking delay
        return "User $userId"
    }

    suspend fun fetchUserData(userId: Int): Map<String, String> {
        delay(300)
        return mapOf("email" to "user$userId@example.com")
    }

    fun main() = runBlocking {
        println("Starting coroutines")

        // Run tasks concurrently with async
        val userDeferred = async { fetchUser(123) }
        val dataDeferred = async { fetchUserData(123) }

        // Wait for both results
        val user = userDeferred.await()
        val userData = dataDeferred.await()
        println("$user: $userData")

        // Launch a separate coroutine
        val job = launch {
            println("Background work starting")
            delay(200)
            println("Background work complete")
        }
        job.join() // Wait for completion

        println("All done!")
    }

    4. Go Coroutines (Goroutines)

    Go's concurrency model is built around goroutines: lightweight threads managed by the Go runtime rather than the operating system. A goroutine is simply a function that runs concurrently with other functions. Goroutines are cheap enough that you can spawn thousands of them without significant overhead, and the runtime handles all scheduling and execution, freeing developers from low-level thread management.

    [Diagram: Go goroutine architecture. The Go scheduler is an M:N multiplexer that maps many goroutines onto a few OS threads, each paired with a processor (P); waiting goroutines queue in a global run queue, idle processors steal work, and goroutines communicate over channels. Key features: lightweight (2 KB starting stack), fast context switching, work stealing, channel communication, cooperative scheduling, M:N threading model.]

    Go's Goroutine Model:

    • M:N Threading: Maps millions of goroutines (M) to a small number of OS threads (N)
    • Work Stealing: Idle processors steal work from busy ones for load balancing
    • Cooperative: Goroutines yield at function calls, channel operations, and blocking system calls
    • Channel Communication: Type-safe message passing between goroutines
    • Lightweight: Each goroutine starts with only 2KB stack (grows as needed)
    • No Manual Thread Management: Runtime automatically handles scheduling and load distribution
    // Go Coroutines (Goroutines) Example
    package main

    import (
        "fmt"
        "time"
    )

    func process(id int) {
        fmt.Printf("Started goroutine %d\n", id)
        time.Sleep(100 * time.Millisecond) // Simulate some work
        fmt.Printf("Finished goroutine %d\n", id)
    }

    func main() {
        // Launch multiple goroutines
        for i := 1; i <= 5; i++ {
            go process(i) // Non-blocking, returns immediately
        }

        // Wait to see results (in production, use sync.WaitGroup)
        time.Sleep(200 * time.Millisecond)
        fmt.Println("All goroutines completed")
    }

    5. Reactive Programming

    Reactive programming treats data as streams of events that flow through your application. Instead of pulling data when you need it, you react to data as it arrives. Think of it like a newspaper subscription—instead of going to the store every day to check for new issues, the newspaper is delivered to you when it's available.

    Reactive frameworks like RxJava and Spring Reactor excel at handling asynchronous, event-driven workloads with built-in backpressure management.

    How Reactive Streams Work

    [Diagram: reactive stream processing. A publisher (DB, API, events) emits data through operators (filter, map, buffer, flatMap) to multiple subscribers (database, analytics, cache). A non-blocking callback chain of onNext(), onError(), and onComplete() drives the flow, while subscribers control the rate with request(n) backpressure: event-driven, non-blocking, stream-based, composable.]

    Non-Blocking Callback Mechanism

    In reactive programming, data flows through streams where each stage processes events asynchronously:

    1. Publisher emits data: Sources like databases, APIs, or user events push data into the stream
    2. Operators transform data: Each transformation (filter, map, buffer) happens without blocking
    3. Subscribers react to events: Multiple consumers can process the same stream independently
    4. Callbacks handle flow control: onNext() for new data, onError() for failures, onComplete() for stream end
    5. Backpressure prevents overflow: Subscribers signal how much data they can handle via request(n)

    The key advantage is that no thread ever blocks. When an operation needs to wait (like a database query), the thread is released to handle other work. When the result arrives, a callback is triggered to continue processing.

    This architecture enables handling millions of concurrent streams with just a handful of threads, making it ideal for high-throughput, low-latency applications like real-time analytics, IoT data processing, and high-frequency trading systems.
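
    To make the request(n) handshake concrete, here is an illustrative Project Reactor sketch (not from the original post): a custom BaseSubscriber pulls items at its own pace, so a fast publisher can never flood it.

    // Sketch: explicit backpressure with Reactor's BaseSubscriber
    import org.reactivestreams.Subscription;

    import reactor.core.publisher.BaseSubscriber;
    import reactor.core.publisher.Flux;

    public class BackpressureDemo {
        public static void main(String[] args) {
            Flux.range(1, 10)
                .map(i -> i * 2)
                .subscribe(new BaseSubscriber<Integer>() {
                    @Override
                    protected void hookOnSubscribe(Subscription subscription) {
                        request(2); // ask the publisher for only two items up front
                    }

                    @Override
                    protected void hookOnNext(Integer value) {
                        System.out.println("Received " + value);
                        request(1); // pull the next item only when ready
                    }
                });
        }
    }

    Because the subscriber only ever requests what it can process, the publisher's emission rate is bounded by the consumer, which is the essence of backpressure.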

    Key Features of Reactive Programming:

    1. Asynchronous processing: Operations don't block the execution thread
    2. Event-driven: Responds to events as they happen
    3. Backpressure: Consumers can regulate how much data producers send
    4. Functional composition: Chain operations on streams easily
    5. Non-blocking: Efficiently uses resources while waiting for I/O

    Example with Spring WebFlux/Reactor

    // Spring WebFlux Reactive REST API Example
    import java.time.Duration;

    import org.springframework.web.bind.annotation.*;
    import org.springframework.web.reactive.function.client.WebClient;

    import reactor.core.publisher.Flux;
    import reactor.core.publisher.Mono;

    @RestController
    public class UserController {

        private final UserRepository userRepository;

        public UserController(UserRepository userRepository) {
            this.userRepository = userRepository;
        }

        @GetMapping("/users")
        public Flux<User> getAllUsers() {
            return userRepository.findAll();
        }

        @GetMapping("/users/{id}")
        public Mono<User> getUserById(@PathVariable String id) {
            return userRepository.findById(id)
                    .switchIfEmpty(Mono.error(new UserNotFoundException(id)));
        }

        @PostMapping("/users")
        public Mono<User> createUser(@RequestBody User user) {
            return userRepository.save(user);
        }

        @GetMapping("/users/search")
        public Flux<User> searchUsers(@RequestParam String query) {
            return userRepository.findByNameContaining(query)
                    .filter(user -> user.isActive())
                    .flatMap(user -> enrichUserData(user))
                    .timeout(Duration.ofSeconds(3));
        }

        private Mono<User> enrichUserData(User user) {
            return Mono.just(user)
                    .zipWith(fetchUserPreferences(user.getId()))
                    .map(tuple -> {
                        User u = tuple.getT1();
                        u.setPreferences(tuple.getT2());
                        return u;
                    });
        }

        private Mono<UserPreferences> fetchUserPreferences(String userId) {
            // Simulate fetching from another service
            return WebClient.create("https://preferences-service")
                    .get()
                    .uri("/preferences/{id}", userId)
                    .retrieve()
                    .bodyToMono(UserPreferences.class)
                    .onErrorResume(e -> Mono.just(new UserPreferences()));
        }
    }

    In this example, all database and network operations are non-blocking. The application can handle thousands of concurrent requests with a small number of threads. When one operation is waiting for I/O, the thread can be used to process other requests. Backpressure ensures that if a client is slow at consuming data, the server won't overwhelm it with more data than it can handle.

    Conclusion

    In this article, we explored five powerful concurrency models: Java virtual threads, async/await, Kotlin coroutines, Go goroutines, and reactive programming. Each model has its strengths and is suited to different types of applications.


    For more in-depth tutorials on Java, Spring, and modern software development, check out my content:

    🔗 Blog: https://codewiz.info
    🔗 YouTube: https://www.youtube.com/@Code.Wizzard
    🔗 LinkedIn: https://www.linkedin.com/in/code-wiz-740370302/
    🔗 Medium: https://medium.com/@code.wizzard01
    🔗 Github: https://github.com/CodeWizzard01
