If you are building AI-powered applications, having access to multiple Large Language Models (LLMs) can significantly enhance your application's capabilities. Each LLM has its own strengths, pricing model, and characteristics. Spring AI, Spring's official AI framework, makes it remarkably easy to integrate multiple LLM providers and switch between them.
In this blog, we'll build a Spring Boot application that integrates three LLM providers: OpenAI, Ollama (running Mistral), and Google Gemini.
Before diving into the implementation, let's go through some of the reasons why you might want to integrate multiple LLMs in your application:
- **Cost Optimization:** Different LLM providers offer varying pricing models. You might use OpenAI's GPT-4 for complex reasoning tasks while using Ollama's local models for simple queries to reduce costs.
- **Performance Characteristics:** Each model excels in different areas. For example, OpenAI's models are great for general-purpose tasks, while Ollama's models can be deployed locally for privacy-sensitive applications.
- **Reliability and Fallbacks:** Having multiple providers keeps your application functional even if one service experiences downtime or rate limiting (see the sketch after this list).
- **Specialized Use Cases:** Different models might be optimized for specific domains like code generation, creative writing, or technical documentation.
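To make the reliability point concrete, here is a minimal sketch of a fallback wrapper. The `FallbackChatService` class and its preference ordering are assumptions for illustration, not code from this project; it simply tries each configured `ChatClient` in turn and returns the first successful response.

```java
import java.util.List;

import org.springframework.ai.chat.client.ChatClient;

// Hypothetical fallback wrapper (illustration only, not part of this project):
// tries each provider's ChatClient in preference order and returns the first
// successful answer.
public class FallbackChatService {

    private final List<ChatClient> clientsInPreferenceOrder;

    public FallbackChatService(List<ChatClient> clientsInPreferenceOrder) {
        this.clientsInPreferenceOrder = clientsInPreferenceOrder;
    }

    public String chat(String message) {
        RuntimeException lastFailure = null;
        for (ChatClient client : clientsInPreferenceOrder) {
            try {
                return client.prompt().user(message).call().content();
            } catch (RuntimeException e) {
                lastFailure = e; // e.g. downtime or rate limiting; try the next provider
            }
        }
        throw new IllegalStateException("All LLM providers failed", lastFailure);
    }
}
```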
Let's start by setting up our Spring Boot project with the necessary dependencies.
First, create a new Spring Boot project and add the following dependencies to your `pom.xml`:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.5.3</version>
        <relativePath/>
    </parent>
    <groupId>com.codewiz</groupId>
    <artifactId>multillm</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>multillm</name>
    <description>Multi-LLM Integration with Spring AI</description>
    <properties>
        <java.version>24</java.version>
        <spring-ai.version>1.0.0</spring-ai.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-ollama</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-starter-model-openai</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${spring-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>
</project>
```
Next, configure the application properties in `src/main/resources/application.properties`:
```properties
spring.application.name=multillm

# OpenAI Configuration
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4.1-nano

# Ollama Configuration
spring.ai.ollama.chat.options.model=mistral:7b

# Gemini Configuration (using OpenAI-compatible endpoint)
gemini.api.key=${GEMINI_API_KEY}
gemini.api.url=https://generativelanguage.googleapis.com/v1beta/openai
gemini.api.completions.path=/chat/completions
gemini.model.name=gemini-2.5-flash

# Server Configuration
server.port=8100
```
Note: Make sure to set your environment variables:
```bash
export OPENAI_API_KEY="your-openai-api-key"
export GEMINI_API_KEY="your-gemini-api-key"
```
First, create an enum to represent the different LLM types:
```java
package com.codewiz.multillm.dto;

public enum LLMType {
    OPENAI("openai"),
    OLLAMA("ollama"),
    GEMINI("gemini");

    private final String value;

    LLMType(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }
}
```
Next, add a chat response DTO:
```java
package com.codewiz.multillm.dto;

public class ChatResponse {

    private String response;
    private String llm;
    private String originalMessage;
    private long timestamp;

    // Constructor used by the controller; stamps the response time
    public ChatResponse(String response, String llm, String originalMessage) {
        this.response = response;
        this.llm = llm;
        this.originalMessage = originalMessage;
        this.timestamp = System.currentTimeMillis();
    }

    // Getters and setters
}
```
Create a simple REST controller to handle chat requests:
```java
package com.codewiz.multillm.controller;

import com.codewiz.multillm.dto.ChatResponse;
import com.codewiz.multillm.service.ChatService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatService chatService;

    @Autowired
    public ChatController(ChatService chatService) {
        this.chatService = chatService;
    }

    @GetMapping("/chat")
    public ResponseEntity<ChatResponse> chat(
            @RequestParam String message,
            @RequestParam String llm) {
        String response = chatService.chat(llm, message);
        return ResponseEntity.ok(new ChatResponse(response, llm, message));
    }
}
```
The service layer handles the logic for routing requests to different LLM providers:
```java
package com.codewiz.multillm.service;

import com.codewiz.multillm.dto.LLMType;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service
public class ChatService {

    private final ChatClient openAIChatClient;
    private final ChatClient ollamaChatClient;
    private final ChatClient geminiChatClient;

    @Autowired
    public ChatService(OpenAiChatModel openAiChatModel,
                       OllamaChatModel ollamaChatModel,
                       @Qualifier("geminiChatClient") ChatClient geminiChatClient) {
        this.openAIChatClient = ChatClient.create(openAiChatModel);
        this.ollamaChatClient = ChatClient.create(ollamaChatModel);
        this.geminiChatClient = geminiChatClient;
    }

    public String chat(String llmName, String message) {
        var chatClient = getChatModel(LLMType.valueOf(llmName.toUpperCase()));
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }

    private ChatClient getChatModel(LLMType llmName) {
        return switch (llmName) {
            case OPENAI -> openAIChatClient;
            case OLLAMA -> ollamaChatClient;
            case GEMINI -> geminiChatClient;
        };
    }
}
```
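One caveat: `LLMType.valueOf(llmName.toUpperCase())` throws an `IllegalArgumentException` for an unrecognized `llm` parameter, which Spring would surface as a 500 error. Below is a minimal sketch of one way to turn that into a 400 response; the `ChatErrorHandler` advice is an assumption for illustration, not part of the original project.

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

// Hypothetical error handler (not in the original project): maps an unknown
// "llm" request parameter to a 400 Bad Request instead of a generic 500.
@RestControllerAdvice
class ChatErrorHandler {

    @ExceptionHandler(IllegalArgumentException.class)
    ResponseEntity<String> handleUnknownLlm(IllegalArgumentException ex) {
        return ResponseEntity.badRequest()
                .body("Unsupported llm value; expected one of: openai, ollama, gemini");
    }
}
```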
OpenAI and Ollama integration is straightforward with Spring AI's starter dependencies:
Spring AI provides native support for OpenAI through the `spring-ai-starter-model-openai` dependency. The configuration is handled automatically using the properties we defined:
```properties
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4.1-nano
```
Ollama integration is equally simple with the `spring-ai-starter-model-ollama` dependency. Ollama runs locally, so you'll need to install it and pull the Mistral model first:
```bash
ollama pull mistral:7b
```
Then reference the model in `application.properties`:

```properties
spring.ai.ollama.chat.options.model=mistral:7b
```
Once you add the dependencies and configure the properties, Spring AI automatically creates the `ChatModel` beans for both OpenAI and Ollama.
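As a quick illustration, here is a minimal sketch showing that both auto-configured models can be injected directly and wrapped in a fluent `ChatClient`. The `ModelSmokeTest` class and its prompts are illustrative assumptions, not part of the project:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

// Illustrative only: verifies at startup that the auto-configured
// OpenAiChatModel and OllamaChatModel beans are present and usable.
@Component
class ModelSmokeTest implements CommandLineRunner {

    private final OpenAiChatModel openAiChatModel;
    private final OllamaChatModel ollamaChatModel;

    ModelSmokeTest(OpenAiChatModel openAiChatModel, OllamaChatModel ollamaChatModel) {
        this.openAiChatModel = openAiChatModel;
        this.ollamaChatModel = ollamaChatModel;
    }

    @Override
    public void run(String... args) {
        // Each auto-configured model can be wrapped in a fluent ChatClient.
        String openAiReply = ChatClient.create(openAiChatModel)
                .prompt().user("Reply with one word: ready?").call().content();
        String ollamaReply = ChatClient.create(ollamaChatModel)
                .prompt().user("Reply with one word: ready?").call().content();
        System.out.println("OpenAI: " + openAiReply + " | Ollama: " + ollamaReply);
    }
}
```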
Google Gemini doesn't have a dedicated Spring AI starter, but we can leverage Gemini's OpenAI-compatible API endpoint using the existing OpenAI dependency.
Create a configuration class to set up Gemini using the OpenAI client:
```java
package com.codewiz.multillm.config;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LLMConfig {

    @Bean
    @Qualifier("geminiChatClient")
    public ChatClient geminiChatModel(
            OpenAiChatModel baseChatModel,
            @Value("${gemini.api.key}") String apiKey,
            @Value("${gemini.api.url}") String geminiUrl,
            @Value("${gemini.api.completions.path}") String completionsPath,
            @Value("${gemini.model.name}") String modelName) {

        // Point the OpenAI client at Gemini's OpenAI-compatible endpoint
        var geminiApi = OpenAiApi.builder()
                .baseUrl(geminiUrl)
                .completionsPath(completionsPath)
                .apiKey(apiKey)
                .build();

        // Clone the auto-configured model, swapping in the Gemini API and model name
        var customModel = baseChatModel.mutate()
                .openAiApi(geminiApi)
                .defaultOptions(OpenAiChatOptions.builder().model(modelName).build())
                .build();

        return ChatClient.create(customModel);
    }
}
```
This configuration:

- builds an `OpenAiApi` client pointed at Gemini's OpenAI-compatible base URL and completions path,
- mutates the auto-configured `OpenAiChatModel` to use that client with the Gemini model name, and
- exposes the result as a `ChatClient` bean qualified as `geminiChatClient`.
Now that our application is complete, let's test it using HTTPie. First, start your application:
```bash
./mvnw spring-boot:run
```
```bash
# Test OpenAI and get model information
http GET localhost:8100/chat \
  message=="What model are you? Please provide your name, version, and key capabilities." \
  llm==openai
```
Sample Response:
{ "llm": "openai", "originalMessage": "What model are you? Please provide your name, version, and key capabilities.", "response": "I am ChatGPT, based on the GPT-4 architecture developed by OpenAI. My version includes improvements in understanding and generating human-like text, enabling me to assist with a wide range of tasks such as answering questions, providing explanations, composing creative writing, and more. I can understand context, handle complex prompts, and generate coherent and relevant responses across various topics.", "timestamp": 1753268746222 }
```bash
# Test Ollama and get model information
http GET localhost:8100/chat \
  message=="What model are you? Please tell me your name, version, and what you're good at." \
  llm==ollama
```
Sample Response:
{ "llm": "ollama", "originalMessage": "What model are you? Please tell me your name, version, and what you're good at.", "response": " I am a model of the Chat Model developed by Mistral AI. My primary function is to assist with various tasks by providing information, answering questions, and engaging in conversation. I strive to provide precise, helpful, and courteous responses.\n\nWhile I don't have a personal name, you can think of me as your digital assistant designed to make your interactions more enjoyable and productive. My capabilities include but are not limited to: answering questions, providing explanations, discussing a wide range of topics, assisting with scheduling and organization, offering recommendations, and much more.\n\nIn terms of my version, I am part of the latest generation of models, continually learning and improving from the data it encounters during interactions like this one.", "timestamp": 1753268772790 }
```bash
# Test Gemini and get model information
http GET localhost:8100/chat \
  message=="Please identify yourself. What model are you, what version, and what are your strengths?" \
  llm==gemini
```
Sample Response:
{ "llm": "gemini", "originalMessage": "Please identify yourself. What model are you, what version, and what are your strengths?", "response": "I am a large language model, trained by Google.\n\n**Model & Version:**\nUnlike traditional software with specific version numbers, large language models like me are continuously updated and refined. There isn't a single, publicly accessible \"version number\" in the way you might think of software like....", "timestamp": 1753268800297 }
Integrating multiple LLMs with Spring AI is quite simple. By leveraging Spring AI's capabilities, you can easily switch between different LLM providers based on your application's needs. This flexibility allows you to optimize for cost, performance, and reliability while providing a seamless user experience.
You can find the complete source code for this project on my GitHub repository: CodeWizzard01/spring-ai-multiple-llm
For more in-depth tutorials on Java, Spring, and modern software development practices, follow me for more content:
🔗 Blog 🔗 YouTube 🔗 LinkedIn 🔗 Medium 🔗 Github
Stay tuned for more content on the latest in AI and software engineering!