Multi-Agent AI · Architecture · Temporal · Java / Spring Boot · Distributed Systems

Building Production Multi-Agent AI Systems

Architecture patterns for durable, scalable agent orchestration — Planner, Executor, Memory, and Synthesizer agents coordinated by Temporal workflows, scaled to 100M+ executions per day.

Sandeep Erelli · Staff Software Engineer · 13+ Years Experience · 12 min read

The Case for Multi-Agent Systems in Enterprise

The next generation of enterprise software isn't just AI-assisted — it's AI-orchestrated. Multi-agent systems represent a fundamental shift: instead of a single model trying to do everything, you compose a network of specialized agents, each excellent at a narrow task, coordinated by an orchestrator that routes work to the right agent at the right time.

This isn't a research paper. Everything here reflects production patterns I've applied building systems that process tens of millions of workflows per day in Java, Spring Boot, and open-source Temporal — with Cassandra, Azure CosmosDB, Redis, and OpenSearch as the data backbone.

Core insight: Agents that can fail independently are far more reliable than a single monolithic agent. This mirrors the microservices principle: keep components small, testable, and independently deployable.


What is a Multi-Agent System?

A multi-agent system (MAS) consists of multiple autonomous AI agents that perceive their environment, reason over it, and take targeted actions — either independently or in coordination with other agents. In an enterprise context, this means:


Reference Architecture

The diagram below represents the full production architecture. Every box is an independently deployable, independently scalable component. Temporal is the durable backbone that guarantees at-least-once execution of every agent step — even across pod restarts, network partitions, or agent timeouts.

[Figure: Production multi-agent architecture (Java + Spring Boot + Temporal). Components shown: User / API Request (REST, gRPC, Kafka event); Orchestrator / Planner Agent (task decomposition, strategy selection, agent routing); Memory Layer (Redis for short-term context, vector DB for long-term RAG); Temporal Workflows (durable, retryable; Cassandra, CosmosDB); Tool Registry (REST APIs, databases, search, code execution); Data Agent (ingestion, ETL); Search Agent (retrieval, RAG); Code Agent (generate, analyze); Action Agent (execute, trigger); Synthesizer Agent (aggregation, reasoning, response formatting); Final Response (structured, grounded). Legend: orchestration flow, agent output flow, memory / tool access.]
Fig. 1 — Production multi-agent architecture. Temporal provides durable orchestration across all agent executions.

Core Components

1. The Orchestrator / Planner Agent

The orchestrator is the brain of the system. It receives the raw user request and decomposes it into a structured execution plan: which agents are needed, what order they run in, and which steps can run in parallel. In Java, this maps directly to a Temporal WorkflowInterface.

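MultiAgentWorkflow.java
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

@WorkflowInterface
public interface MultiAgentWorkflow {
    @WorkflowMethod
    AgentResponse execute(AgentRequest request);
}

MultiAgentWorkflowImpl.java
import io.temporal.activity.ActivityOptions;
import io.temporal.common.RetryOptions;
import io.temporal.workflow.Async;
import io.temporal.workflow.Promise;
import io.temporal.workflow.Workflow;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Registered explicitly in TemporalWorkerConfig, so no Spring annotation is needed here.
public class MultiAgentWorkflowImpl implements MultiAgentWorkflow {

    // Activity stub: every agent call goes through Temporal with a timeout and retries
    private final AgentActivities agents = Workflow.newActivityStub(
        AgentActivities.class,
        ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofMinutes(5))
            .setRetryOptions(RetryOptions.newBuilder()
                .setMaximumAttempts(3)
                .setInitialInterval(Duration.ofSeconds(1))
                .setBackoffCoefficient(2.0)
                .build())
            .build());

    @Override
    public AgentResponse execute(AgentRequest request) {
        // Step 1: Decompose into a plan
        AgentPlan plan = agents.planTask(request);

        // Step 2: Start parallel steps; Async.function returns immediately with a Promise
        List<Promise<AgentResult>> parallel = plan.getParallelSteps().stream()
            .map(step -> Async.function(agents::executeStep, step))
            .collect(Collectors.toList());

        // Step 3: Run sequential steps in order while the parallel steps are in flight
        List<AgentResult> results = new ArrayList<>();
        plan.getSequentialSteps().forEach(step ->
            results.add(agents.executeStep(step)));

        // Wait for all parallel steps; get() rethrows if a step exhausted its retries
        parallel.forEach(p -> results.add(p.get()));

        // Step 4: Synthesize into final response
        return agents.synthesize(results, request.getContext());
    }
}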
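The workflow above depends on an AgentPlan that separates parallel steps from sequential ones, but the article does not show that type. Here is a minimal sketch of one possible shape. The AgentStep fields and the example step names are my assumptions for illustration, not the production model, and a real version would also need to be serializable by whatever data converter your Temporal workers use.

```java
import java.util.List;

public class AgentPlanSketch {

    /** One unit of work addressed to a single agent (hypothetical shape). */
    record AgentStep(String agentType, String instruction) {}

    /** Immutable plan: steps that may run concurrently plus steps that must run in order. */
    static final class AgentPlan {
        private final List<AgentStep> parallelSteps;
        private final List<AgentStep> sequentialSteps;

        AgentPlan(List<AgentStep> parallelSteps, List<AgentStep> sequentialSteps) {
            // Defensive copies so the plan cannot change after the planner returns it
            this.parallelSteps = List.copyOf(parallelSteps);
            this.sequentialSteps = List.copyOf(sequentialSteps);
        }

        List<AgentStep> getParallelSteps() { return parallelSteps; }
        List<AgentStep> getSequentialSteps() { return sequentialSteps; }
    }

    /** Example: two independent lookups, then one step whose ordering matters. */
    static AgentPlan examplePlan() {
        return new AgentPlan(
            List.of(new AgentStep("search", "find related incidents"),
                    new AgentStep("data", "load account history")),
            List.of(new AgentStep("action", "open a follow-up ticket")));
    }

    public static void main(String[] args) {
        AgentPlan plan = examplePlan();
        System.out.println("parallel=" + plan.getParallelSteps().size()
            + " sequential=" + plan.getSequentialSteps().size());
    }
}
```

Keeping the plan immutable matters in a workflow context: the plan is recorded once as an activity result, and the workflow code should only ever read it.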

2. Temporal for Durable Orchestration

The most critical architectural decision: using open-source Temporal as the workflow backbone. Without durable orchestration, multi-agent systems have a fundamental fragility problem — if an agent call times out or the worker pod restarts mid-execution, you lose the entire chain.

Temporal's core guarantee: Every activity execution — its inputs, outputs, and timing — is event-sourced into Cassandra. When a worker restarts, it replays the workflow history and continues exactly where it left off. At 100M+ executions per day, this is non-negotiable.
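To make the replay idea concrete, here is a toy model in plain Java. It is not how Temporal is implemented; the History and WorkflowRun classes are hypothetical stand-ins that only show the principle: completed activity results are recorded, and a restarted run reads them back instead of executing the activities again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ReplaySketch {

    /** Append-only log of completed activity results (stands in for persisted history). */
    static final class History {
        private final List<String> results = new ArrayList<>();
        int size() { return results.size(); }
    }

    /** One attempt at running the workflow against a shared history. */
    static final class WorkflowRun {
        private final History history;
        private int cursor = 0;      // next history position for this run
        int liveExecutions = 0;      // activities that actually executed in this run

        WorkflowRun(History history) { this.history = history; }

        String activity(Supplier<String> body) {
            if (cursor < history.results.size()) {
                return history.results.get(cursor++);   // replayed from history
            }
            String result = body.get();                 // first execution
            liveExecutions++;
            history.results.add(result);
            cursor++;
            return result;
        }
    }

    /** Three-step workflow that can simulate a worker crash after the second step. */
    static String workflow(WorkflowRun run, boolean crashAfterSecondStep) {
        String plan = run.activity(() -> "plan");
        String search = run.activity(() -> "search");
        if (crashAfterSecondStep) {
            throw new IllegalStateException("worker restarted");
        }
        return run.activity(() -> plan + "+" + search + "+synth");
    }

    public static void main(String[] args) {
        History history = new History();
        try {
            workflow(new WorkflowRun(history), true);
        } catch (IllegalStateException e) {
            // Simulated crash: two results are already in the history
        }
        WorkflowRun resumed = new WorkflowRun(history);
        String out = workflow(resumed, false);
        System.out.println(out + " live=" + resumed.liveExecutions);
        // prints "plan+search+synth live=1"
    }
}
```

The resumed run executes only the synthesis step. This is also why workflow code has to be deterministic: replay only works if the code asks for the same activities in the same order.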

3. Specialized Agents

Each agent is a Spring component implementing a simple execute(AgentStep) contract. The key principle: agents don't know about each other. They receive a structured step and return a structured result. All coordination happens at the workflow layer.

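SearchAgent.java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Duration;
import java.util.Map;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch.core.SearchResponse;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

@Component
public class SearchAgent implements SpecializedAgent {

    @Autowired private OpenSearchClient searchClient;
    @Autowired private RedisTemplate<String, AgentResult> memoryStore;

    @Override
    public AgentResult execute(AgentStep step) {
        // Check memory cache first (short-term Redis)
        String cacheKey = "search:" + step.getQueryHash();
        AgentResult cached = memoryStore.opsForValue().get(cacheKey);
        if (cached != null) return cached;

        // kNN semantic search via OpenSearch. opensearch-java models kNN as a query
        // clause; numCandidates is an Elasticsearch parameter and does not apply here.
        // Assumes step.getEmbedding() matches the vector type of your client version.
        SearchResponse<Map> resp;
        try {
            resp = searchClient.search(req -> req
                .index(step.getTargetIndex())
                .query(q -> q.knn(k -> k
                    .field("embedding")
                    .vector(step.getEmbedding())
                    .k(step.getTopK())))
                .size(step.getTopK()),
                Map.class);
        } catch (IOException e) {
            // Rethrow unchecked so the Temporal retry policy can re-run the activity
            throw new UncheckedIOException(e);
        }

        AgentResult result = AgentResult.fromHits(resp.hits());

        // Cache for 10 minutes (short-term memory)
        memoryStore.opsForValue().set(cacheKey, result,
            Duration.ofMinutes(10));

        return result;
    }
}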
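Because agents don't know about each other, something between the workflow and the agents has to pick the right agent for each step. The sketch below shows one way the executeStep activity could do that. The type() method, the AgentRouter class, and the record shapes are hypothetical additions for illustration; the article's SpecializedAgent contract only defines execute(AgentStep).

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class AgentRoutingSketch {

    record AgentStep(String agentType, String payload) {}
    record AgentResult(String agentType, String output) {}

    /** The article's contract plus a hypothetical type() used as the routing key. */
    interface SpecializedAgent {
        String type();
        AgentResult execute(AgentStep step);
    }

    /** Looks up the agent for a step; agents never reference each other. */
    static final class AgentRouter {
        private final Map<String, SpecializedAgent> byType;

        AgentRouter(List<SpecializedAgent> agents) {
            // toMap throws on duplicate keys, so two agents cannot claim one type
            this.byType = agents.stream()
                .collect(Collectors.toMap(SpecializedAgent::type, Function.identity()));
        }

        AgentResult executeStep(AgentStep step) {
            SpecializedAgent agent = byType.get(step.agentType());
            if (agent == null) {
                throw new IllegalArgumentException("No agent for type: " + step.agentType());
            }
            return agent.execute(step);
        }
    }

    /** Trivial stand-in agent that tags the payload with its own type. */
    static SpecializedAgent stubAgent(String type) {
        return new SpecializedAgent() {
            @Override public String type() { return type; }
            @Override public AgentResult execute(AgentStep step) {
                return new AgentResult(type, type + ":" + step.payload());
            }
        };
    }

    public static void main(String[] args) {
        AgentRouter router = new AgentRouter(List.of(stubAgent("search"), stubAgent("code")));
        System.out.println(router.executeStep(new AgentStep("code", "refactor")).output());
        // prints "code:refactor"
    }
}
```

In a Spring application, a constructor parameter of type List<SpecializedAgent> receives every bean implementing the interface, so adding an agent would not require touching the router.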

4. Worker Registration & Temporal Configuration

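TemporalWorkerConfig.java
import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import io.temporal.worker.WorkerOptions;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TemporalWorkerConfig {

    @Bean
    public WorkflowClient workflowClient(
            @Value("${temporal.host}") String host) {
        // newServiceStubs accepts options; newLocalServiceStubs() takes no
        // arguments and always targets a local server
        WorkflowServiceStubs stubs = WorkflowServiceStubs
            .newServiceStubs(
                WorkflowServiceStubsOptions.newBuilder()
                    .setTarget(host).build());
        return WorkflowClient.newInstance(stubs);
    }

    @Bean
    public Worker multiAgentWorker(
            WorkflowClient client,
            AgentActivitiesImpl activities,
            @Value("${temporal.task-queue}") String queue) {

        WorkerFactory factory = WorkerFactory.newInstance(client);
        // Concurrency limits apply per worker process
        Worker worker = factory.newWorker(queue,
            WorkerOptions.newBuilder()
                .setMaxConcurrentActivityExecutionSize(200)
                .setMaxConcurrentWorkflowTaskExecutionSize(100)
                .build());

        worker.registerWorkflowImplementationTypes(
            MultiAgentWorkflowImpl.class);
        worker.registerActivitiesImplementations(activities);
        // Start polling the task queue once everything is registered
        factory.start();
        return worker;
    }
}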

Scaling to 100M+ Executions / Day

Getting to 100M+ daily executions required addressing three bottlenecks: infrastructure sizing, worker pool tuning, and async dispatch.
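A quick back-of-the-envelope calculation helps put 100M+ per day in perspective. The sketch below converts the daily volume to an average rate and uses Little's law (in-flight work = arrival rate × average latency) to estimate worker count. The 3x peak factor and the 2-second average activity latency are assumptions I picked for illustration, and I am treating the daily figure as activity executions; the 200 slots per worker matches the activity concurrency limit in TemporalWorkerConfig.

```java
public class ThroughputSketch {

    static final long SECONDS_PER_DAY = 86_400L;

    /** Average executions per second for a given daily volume. */
    static double averagePerSecond(long executionsPerDay) {
        return (double) executionsPerDay / SECONDS_PER_DAY;
    }

    /** Little's law: concurrent in-flight work = arrival rate x average latency. */
    static double inFlight(double ratePerSecond, double avgLatencySeconds) {
        return ratePerSecond * avgLatencySeconds;
    }

    /** Workers needed to hold the in-flight work, rounded up. */
    static long workersNeeded(double inFlight, int slotsPerWorker) {
        return (long) Math.ceil(inFlight / slotsPerWorker);
    }

    public static void main(String[] args) {
        double average = averagePerSecond(100_000_000L);   // about 1,157 per second
        double peak = average * 3.0;                        // assumed peak-to-average ratio
        double concurrent = inFlight(peak, 2.0);            // assumed 2 s average latency
        long workers = workersNeeded(concurrent, 200);      // 200 activity slots per worker

        System.out.printf("avg=%.0f/s peak=%.0f/s inFlight=%.0f workers=%d%n",
            average, peak, concurrent, workers);
    }
}
```

Under these assumptions the estimate comes out to 35 workers. Your own numbers will differ, so treat this as a way to structure the estimate before load testing, not as a sizing recommendation.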

1. Temporal Infrastructure

2. Kafka as the Async Dispatch Layer

Don't dispatch 100M workflows per day via synchronous WorkflowClient calls. Instead, front the Temporal client with a Kafka consumer: events fan out to Temporal durably, giving you a buffer that absorbs traffic spikes without dropping work or overwhelming the Temporal frontend service.

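WorkflowDispatcher.java
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowExecutionAlreadyStarted;
import io.temporal.client.WorkflowOptions;
import java.time.Duration;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.messaging.handler.annotation.Payload;
import org.springframework.stereotype.Component;

@Component
public class WorkflowDispatcher {

    @Autowired private WorkflowClient temporalClient;

    // Same property the worker uses, so dispatcher and worker cannot drift apart
    @Value("${temporal.task-queue}") private String taskQueue;

    @KafkaListener(
        topics = "${kafka.topic.agent-requests}",
        groupId = "agent-workflow-dispatcher")
    public void dispatch(@Payload AgentRequest request) {
        MultiAgentWorkflow wf = temporalClient.newWorkflowStub(
            MultiAgentWorkflow.class,
            WorkflowOptions.newBuilder()
                .setTaskQueue(taskQueue)
                .setWorkflowId(request.getRequestId())
                .setWorkflowExecutionTimeout(
                    Duration.ofHours(2))
                .build());

        try {
            // Asynchronous start: returns once Temporal has accepted the workflow
            WorkflowClient.start(wf::execute, request);
        } catch (WorkflowExecutionAlreadyStarted e) {
            // Kafka may redeliver a message; the same request ID is already running
        }
    }
}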
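The claim that a Kafka buffer absorbs traffic spikes can be illustrated with a toy simulation. This sketch models a topic as a backlog that fills with arrivals and drains at a fixed dispatch rate per tick. The arrival numbers and drain rate are made up for illustration, and the model ignores partitions, consumer groups, and retention.

```java
public class SpikeBufferSketch {

    /** Returns the largest backlog seen when draining at a fixed rate per tick. */
    static long maxBacklog(long[] arrivalsPerTick, long drainPerTick) {
        long backlog = 0;
        long max = 0;
        for (long arrivals : arrivalsPerTick) {
            backlog += arrivals;                          // producers append to the topic
            backlog -= Math.min(backlog, drainPerTick);   // dispatcher starts workflows
            max = Math.max(max, backlog);
        }
        return max;
    }

    public static void main(String[] args) {
        // Two ticks of 5x traffic in the middle of an otherwise steady stream
        long[] arrivals = {1000, 1000, 5000, 5000, 1000, 1000, 1000};
        long max = maxBacklog(arrivals, 2000);
        System.out.println("max backlog = " + max);
        // prints "max backlog = 6000"
    }
}
```

The Temporal frontend never sees more than 2,000 starts per tick even though arrivals peak at 5,000. The spike turns into temporary consumer lag rather than rejected requests, which is the trade you want.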

Production Lessons


Conclusion

Multi-agent systems built on Temporal's durable orchestration are the right abstraction for enterprise AI at scale. The combination of specialized Java/Spring Boot agents, Redis-backed short-term memory, OpenSearch for long-term retrieval (RAG), Kafka for async dispatch, and Temporal for fault-tolerant workflow execution gives you a system you can operate in production, not just show in a demo.

The architecture scales from a single-tenant prototype to 100M+ daily executions with the same core code, simply by horizontally scaling Temporal workers and growing your Cassandra cluster to match the workflow state volume.

The biggest mistake I see engineers make with agent systems is skipping the durability layer and relying on synchronous in-process orchestration. The moment you hit a network timeout, a pod restart, or a multi-hour workflow that spans a maintenance window, that approach fails. Durable-first is the only architecture worth building on.