Building Reliable Office Automation with Temporal, Local LLMs, and Robot Framework
Most automation projects fail not because the AI is weak, but because workflow reliability is underestimated.
Emails arrive late.
Humans respond days later.
Systems crash.
SAP exposes no usable API for the screen you need.
If your automation cannot pause, retry, resume, and audit, it will eventually break.
This article explains a modern, production-ready architecture that combines:
- Temporal — durable workflow orchestration
- Local LLMs (Ollama) — intelligent routing & extraction
- Robot Framework — UI automation for systems without APIs
This stack is especially suitable for finance, ERP, procurement, and office processes.
The Problem with Typical Automation Stacks
Most teams start with:
- Cron jobs
- Message queues
- Low-code tools
- LLM agents
These work well for short tasks, but break down when workflows:
- run for hours or days
- require human approval
- involve retries and compensations
- must survive restarts and deployments
Example failures:
- Invoice posted twice after retry
- Approval lost because a worker crashed
- LLM made a confident but wrong decision
- UI automation ran out of order
What’s missing is a durable workflow engine.
The Core Idea: Separate Intelligence, Orchestration, and Execution
The key architectural principle is separation of responsibility:
| Layer | Responsibility |
|---|---|
| Temporal | Orchestration, state, retries, guarantees |
| Local LLM | Understanding & classification |
| Robot Framework | UI execution (SAP, legacy systems) |
No layer does another layer’s job.
High-Level Architecture (Text Diagram)
```
Email / API / Schedule
        |
        v
Temporal Workflow (Source of Truth)
        |
        +--> Activity: Local LLM (Ollama)
        |      - classify email
        |      - extract fields
        |      - return confidence
        |
        +--> Gate (confidence / rules)
        |
        +--> Activity: Robot Framework
        |      - SAP UI automation
        |      - legacy system interaction
        |
        +--> Signals
               - human approval
               - correction
```
Temporal owns what happens next, not the AI.
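To make the entry point concrete: whatever the trigger is (an IMAP poller, a webhook, a schedule), all it has to do is start a workflow. The snippet below is a minimal sketch using Temporal's Python SDK (temporalio); the workflow name `InvoiceWorkflow`, the task queue name, and deriving the workflow ID from the email's message ID are illustrative assumptions, not fixed parts of the architecture.

```python
from temporalio.client import Client
from temporalio.common import WorkflowIDReusePolicy


async def on_incoming_email(message_id: str, body: str) -> None:
    # Any trigger (IMAP poller, webhook, scheduler) just starts a workflow.
    client = await Client.connect("localhost:7233")
    await client.start_workflow(
        "InvoiceWorkflow",                 # illustrative workflow name
        body,
        id=f"email-{message_id}",          # deterministic ID: one email, one workflow
        task_queue="office-automation",
        id_reuse_policy=WorkflowIDReusePolicy.REJECT_DUPLICATE,
    )
```

With a deterministic workflow ID and REJECT_DUPLICATE, a redelivered email cannot spawn a second workflow for the same message.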
Why Temporal (Not Cron, Not Queues, Not n8n Alone)
Temporal is different because it provides:
- Durable execution — workflows survive crashes
- Exactly-once semantics — no duplicate invoices
- Deterministic replay — perfect audit trail
- Human-in-the-loop — wait days without hacks
- Built-in retries & timeouts (see the policy sketch below)
In short:
Temporal treats workflows like a database treats data.
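As a small illustration of the retry and timeout guarantees, here is what a per-activity retry policy might look like with the Python SDK. The intervals, attempt count, and the `DuplicateInvoiceError` name are assumptions for the sketch, not recommended values.

```python
from datetime import timedelta

from temporalio.common import RetryPolicy

# Declared once and passed to workflow.execute_activity(
#     ..., retry_policy=SAP_RETRY, start_to_close_timeout=timedelta(minutes=10)).
# Transient failures (worker crash, SAP timeout) are retried with backoff;
# errors marked non-retryable fail fast and surface in the workflow.
SAP_RETRY = RetryPolicy(
    initial_interval=timedelta(seconds=30),
    backoff_coefficient=2.0,
    maximum_interval=timedelta(minutes=10),
    maximum_attempts=5,
    non_retryable_error_types=["DuplicateInvoiceError"],
)
```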
Role of the Local LLM (Ollama)
The LLM is not the brain of the system.
It acts as an advisor.
Typical responsibilities:
- Classify incoming email (invoice, RFQ, reminder)
- Extract structured data (invoice number, amount)
- Report confidence level
- Explain uncertainty
Example LLM output:
```json
{
  "workflow_key": "invoice.receive",
  "confidence": 0.82,
  "extracted_fields": {
    "invoice_no": "INV-2025-001",
    "amount": 125000,
    "vendor": "ABC SUPPLY"
  }
}
```
Important rule:
LLMs suggest. Temporal decides.
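A classification activity along these lines might look like the following sketch, assuming the `ollama` Python client and a locally running Ollama server; the model name, prompt, and expected JSON keys are illustrative.

```python
import json

from ollama import AsyncClient  # assumes the `ollama` package and a local Ollama server
from temporalio import activity

PROMPT = (
    "Classify this email (invoice, RFQ, reminder) and extract any invoice fields. "
    "Answer only with JSON containing workflow_key, confidence and extracted_fields.\n\n"
)


@activity.defn
async def classify_email(body: str) -> dict:
    # The model returns a suggestion plus a confidence score; it never acts on its own.
    response = await AsyncClient().chat(
        model="llama3.1",                                    # any locally pulled model
        messages=[{"role": "user", "content": PROMPT + body}],
        format="json",                                       # request JSON-only output
    )
    return json.loads(response["message"]["content"])
```

Because the activity only returns data, swapping the model or the prompt never touches the workflow logic.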
Gate Design: Where Automation Becomes Safe
Temporal enforces gate logic:
```python
if decision.confidence < 0.8:
    wait_for_human()
```
Other gates may include:
- required fields present
- amount threshold
- vendor allowlist
- duplicate detection
This prevents:
- hallucinated actions
- silent failures
- irreversible mistakes
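These gates can be expressed as plain, deterministic Python evaluated inside the workflow, which also makes them easy to unit test. The thresholds, allowlist, and field names below are illustrative assumptions; the `wait_for_human()` pause itself appears in the end-to-end sketch later in the article.

```python
# Deterministic gate rules -- no LLM involved in the decision itself.
APPROVAL_THRESHOLD = 0.80
AMOUNT_LIMIT = 50_000                                   # above this, always ask a human
VENDOR_ALLOWLIST = {"ABC SUPPLY", "ACME GMBH"}
REQUIRED_FIELDS = {"invoice_no", "amount", "vendor"}


def needs_human_review(decision: dict, already_posted: set) -> bool:
    fields = decision.get("extracted_fields", {})
    return (
        decision.get("confidence", 0.0) < APPROVAL_THRESHOLD
        or not REQUIRED_FIELDS <= fields.keys()          # required fields present?
        or fields.get("amount", 0) > AMOUNT_LIMIT        # amount threshold
        or fields.get("vendor") not in VENDOR_ALLOWLIST  # vendor allowlist
        or fields.get("invoice_no") in already_posted    # duplicate detection
    )
```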
Robot Framework: Automating Systems Without APIs
Many enterprise systems (SAP GUI, legacy ERP) still rely on UI.
Robot Framework is ideal because:
- deterministic
- testable
- screenshot & log based
- widely supported
Temporal runs Robot Framework as an activity, meaning:
- retries are controlled
- failures are visible
- execution is isolated
If Robot crashes:
- Temporal retries or escalates
- workflow state remains intact
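Concretely, the activity can simply shell out to the `robot` CLI. The suite name, variable names, and the convention that the suite writes the SAP document number to a small result file are assumptions for this sketch.

```python
import subprocess
from pathlib import Path

from temporalio import activity


@activity.defn
def run_robot_suite(fields: dict) -> str:
    """Run a Robot Framework suite that posts the invoice in the SAP GUI."""
    result = subprocess.run(
        [
            "robot",
            "--variable", f"INVOICE_NO:{fields['invoice_no']}",
            "--variable", f"AMOUNT:{fields['amount']}",
            "--variable", f"VENDOR:{fields['vendor']}",
            "--outputdir", "robot_results",
            "post_invoice.robot",                # illustrative suite name
        ],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Raising lets Temporal apply the activity's retry policy or escalate.
        raise RuntimeError(f"Robot suite failed:\n{result.stdout[-2000:]}")
    # Assumes the suite writes the posted SAP document number to this file.
    return Path("robot_results/sap_document_number.txt").read_text().strip()
```

Because the activity raises on a non-zero exit code, Temporal's retry policy decides whether to rerun the suite or escalate, and the screenshots and logs in robot_results remain available for audit.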
Example Workflow: Vendor Sends Invoice by Email
1. Email arrives
2. Temporal workflow starts
3. LLM classifies email → invoice.receive
4. Confidence = 0.91 → passes gate
5. Robot Framework posts invoice in SAP UI
6. SAP document number returned
7. Workflow completes
8. Audit trail preserved forever
If confidence were low:
- workflow pauses
- human approves or corrects
- workflow resumes automatically
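Putting the pieces together, this example can be sketched as a single workflow. The module names (`activities`, `gates`) refer to the earlier sketches and are assumptions about project layout, not a prescribed structure.

```python
from datetime import timedelta
from typing import Optional

from temporalio import workflow

# classify_email, run_robot_suite and needs_human_review are the earlier sketches.
with workflow.unsafe.imports_passed_through():
    from activities import classify_email, run_robot_suite
    from gates import needs_human_review


@workflow.defn
class InvoiceWorkflow:
    def __init__(self) -> None:
        self.approval: Optional[dict] = None

    @workflow.signal
    def approve(self, corrected_fields: dict) -> None:
        # Sent by a reviewer (e.g. from an approval UI) to resume the workflow.
        self.approval = corrected_fields

    @workflow.run
    async def run(self, email_body: str) -> str:
        decision = await workflow.execute_activity(
            classify_email,
            email_body,
            start_to_close_timeout=timedelta(minutes=5),
        )
        if needs_human_review(decision, already_posted=set()):
            # Durable pause: the workflow can wait here for days and survives
            # worker restarts and deployments until a human signals approval.
            await workflow.wait_condition(lambda: self.approval is not None)
            decision["extracted_fields"] = self.approval
        sap_doc_no = await workflow.execute_activity(
            run_robot_suite,
            decision["extracted_fields"],
            start_to_close_timeout=timedelta(minutes=30),
        )
        return sap_doc_no
```

Every step of this run, including the days spent waiting for a signal, is replayable from Temporal's event history, which is where the audit trail comes from.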
Why This Stack Scales
Scaling LLM usage
- Local inference (Ollama)
- Stateless activities
- Easy model swap
Scaling automation
- Temporal workers scale horizontally
- Robot runners isolated per VM
- No shared mutable state
Scaling teams
- Developers work in Git
- Ops monitor Temporal UI
- Business sees audit logs
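Scaling horizontally then means running more copies of one small worker process. A minimal sketch, again assuming the module layout from the earlier sketches:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

from temporalio.client import Client
from temporalio.worker import Worker

from activities import classify_email, run_robot_suite
from workflows import InvoiceWorkflow


async def main() -> None:
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="office-automation",
        workflows=[InvoiceWorkflow],
        activities=[classify_email, run_robot_suite],
        # Needed because run_robot_suite is a synchronous activity.
        activity_executor=ThreadPoolExecutor(max_workers=4),
    )
    # Run as many copies of this process as needed; Temporal distributes
    # work across every worker polling the same task queue.
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())
```

One common pattern for isolating Robot runners is to give each VM its own task queue, so heavy UI automation never competes with lightweight LLM activities.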
When NOT to Use This Stack
This architecture is not for:
- simple Zapier-style automation
- chatbot-only projects
- short-lived scripts
- one-off integrations
It is for:
- finance workflows
- ERP processes
- approvals
- compliance-sensitive automation
- systems that must never double-execute
Final Takeaway
If your automation:
- touches money
- waits for humans
- integrates with fragile systems
- must survive failure
Then:
Temporal is the backbone,
LLMs are advisors,
Robot Framework is the hands.
This separation is what turns AI automation from a demo into a reliable system.