Building a SOC from Scratch: A Real-World Wazuh + IRIS-web Field Report

A three-week, commit-by-commit account of building a production Security Operations Center using Wazuh 4.x, IRIS-web, and a custom FastAPI integrator — the detection rules, the alert pipelines, the IOC enrichment, and the infrastructure bugs no one puts in their architecture diagrams.

Stack: Wazuh 4.x · IRIS-web · soc-integrator (FastAPI) · OpenSearch · Docker Compose · VirusTotal API · AbuseIPDB

flowchart TD
  subgraph "Log Sources"
    A["Windows Agent"]
    B["FortiGate Syslog"]
    C["Simulator Scripts"]
  end

  subgraph "Wazuh"
    D["Wazuh Manager"]
    E["Decoders + Rules"]
    F["OpenSearch Indexer"]
  end

  subgraph "soc-integrator (FastAPI)"
    G["Alert Poller"]
    H["Severity Filter"]
    I["IOC Enricher"]
    J["WazuhSyslogAdapter"]
    K["Webhook Receiver"]
    L["Email Notifier"]
  end

  subgraph "External Threat Intel"
    M["VirusTotal"]
    N["AbuseIPDB"]
    O["Feodo / URLhaus / ThreatFox"]
  end

  subgraph "IRIS-web"
    P["IRIS Alerts"]
    Q["Cases + Triage"]
    R["Outbound Webhook"]
  end

  A --> D
  B --> D
  C --> D
  D --> E
  E --> F
  F --> G
  G --> H
  H --> I
  I --> M
  I --> N
  O --> I
  I --> P
  P --> Q
  Q --> R
  R --> K
  K --> L
  J --> D
  I --> J

Week 1: The Blank Canvas

Every SOC project starts the same way: a blinking cursor and a list of threat scenarios that need to be detected.

Ours came as an internal specification — three appendices of use cases:

Appendix A — Windows and Active Directory abuse
Appendix B — Network and firewall events (FortiGate)
Appendix C — Identity: impossible travel, credential abuse, admin lateral movement

The first commits were unglamorous. A multipart dependency. Sample log files for the three appendices — 137 lines of real-looking firewall, authentication, and Windows event data scraped from production-adjacent systems. Then a pass.txt landed in the repo: credential notes and test outputs from early SSH sessions.

By March 16, the first real milestone was in: a test-firewall-syslog.py script that fired FortiGate-style UDP syslog packets at Wazuh’s port 514 across 10 different scenarios.

The Docker NAT Problem

The --via-docker flag was added on day two. Without it, every packet arrived at Wazuh with the Docker gateway IP as its source instead of the host's address, so rules that match on source IP could never fire. The flag forces packets to route through the host network stack so Wazuh sees the correct origin.

If you’re building Wazuh simulations inside Docker and your source-IP rules never fire, this is why.

flowchart TD
  A["test-firewall-syslog.py"]
  B{"--via-docker flag?"}
  C["Packet exits via Docker NAT"]
  D["Wazuh sees Docker gateway IP"]
  E["Source-IP rules never match"]
  F["Packet exits via host network"]
  G["Wazuh sees real host IP"]
  H["Source-IP rules match correctly"]

  A --> B
  B -- "No" --> C
  C --> D
  D --> E
  B -- "Yes" --> F
  F --> G
  G --> H
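A minimal sketch of a FortiGate-style UDP syslog sender, to make the simulation step concrete. It is not the contents of test-firewall-syslog.py: the field set, the <134> priority value, and both function names are assumptions chosen for illustration.

```python
import socket


def build_fortigate_line(action: str, logid: str, srcip: str, dstip: str) -> str:
    """Format a FortiGate-style key=value line with a syslog priority prefix."""
    # <134> = facility local0 (16) * 8 + severity informational (6)
    fields = {
        "type": "traffic",
        "action": action,
        "logid": logid,
        "srcip": srcip,
        "dstip": dstip,
    }
    body = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"<134>{body}"


def send_udp_syslog(line: str, host: str, port: int = 514) -> int:
    """Send one datagram; returns the number of bytes handed to the kernel."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(line.encode("utf-8"), (host, port))
```

The packet's source address is set by the network path, not by the payload, so the srcip field above has no influence on which sender address Wazuh records. That is why the fix had to be made at the routing level.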

Week 2: Rules, Decoders, and the OR-Trap

Detection engineering is mostly rewriting things you thought were correct.

March 17 produced a burst of commits tagged progress, progress-update, rule update. Fifty-nine simulation rules were taking shape across Appendix A, B, and C — but something was wrong. The A2 and A3 rules were firing when they shouldn’t.

Root Cause: The Multi-`<match>` OR-Trap

Wazuh’s XML rule syntax allows multiple <match> tags inside a single rule, and it is easy to assume they combine with AND logic. In our Wazuh 4.x deployment, with certain decoder chains, they behaved as OR.

A rule written to fire only when action=deny AND logid=13 were both present was occasionally firing on just one of those conditions.

<!-- WRONG: behaves as OR in some decoder chains -->
<rule id="100201" level="10">
  <match>action=deny</match>
  <match>logid=13</match>
  <description>Firewall block — specific log ID</description>
</rule>

<!-- CORRECT: single regex covering both field orders enforces AND -->
<rule id="100201" level="10">
  <regex>action=deny.*logid=13|logid=13.*action=deny</regex>
  <description>Firewall block — specific log ID</description>
</rule>

The March 22 commit describes the fix tersely:

"fix A2/A3 rule OR-trap: replace multi-<match> with single <regex> lookaheads."

That one commit hides an afternoon of wazuh-logtest sessions feeding sample events and watching rules fire on inputs they should have ignored. If your Wazuh rules fire more broadly than expected, check whether you have multiple <match> tags in the same rule.

flowchart TD
  subgraph "WRONG: multi-match behaves as OR"
    A1["Incoming log event"]
    B1["match: action=deny"]
    C1["match: logid=13"]
    D1["Rule fires if EITHER matches"]
    A1 --> B1
    A1 --> C1
    B1 --> D1
    C1 --> D1
  end

  subgraph "CORRECT: single regex enforces AND"
    A2["Incoming log event"]
    B2["regex: action=deny AND logid=13"]
    C2["Rule fires only when BOTH match"]
    A2 --> B2
    B2 --> C2
  end
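The difference between the two behaviors can be checked outside Wazuh with a few sample events. This sketch uses Python's re module, which is not the engine Wazuh uses, so it only illustrates the logic of the alternation pattern. The sample events are made up.

```python
import re

# Alternation over both field orders: each branch requires both tokens.
BOTH_FIELDS = re.compile(r"action=deny.*logid=13|logid=13.*action=deny")


def matches_either(event: str) -> bool:
    """OR semantics: the behavior the broken rule showed."""
    return "action=deny" in event or "logid=13" in event


def matches_both(event: str) -> bool:
    """AND semantics: what the rule was meant to do."""
    return BOTH_FIELDS.search(event) is not None


samples = [
    "action=deny logid=13 srcip=203.0.113.7",
    "logid=13 srcip=203.0.113.7 action=deny",
    "action=deny logid=42 srcip=203.0.113.7",
    "action=accept logid=13 srcip=203.0.113.7",
]
for event in samples:
    print(matches_either(event), matches_both(event), event)
```

Note that the pattern is a plain substring match, so logid=13 would also match logid=130. A stricter version would anchor the value with a delimiter.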

Week 3: Wiring Wazuh to IRIS

Firing a Wazuh rule and seeing it in a log file is one thing. Getting it into a case management platform where an analyst can triage it is another problem entirely.

The soc-integrator Pipeline

The answer was soc-integrator — a FastAPI service that sits between Wazuh and IRIS-web:

flowchart TD
  A["Raw Security Event"]
  B["Wazuh Manager"]
  C["Decoder Chain"]
  D["Rule Matching"]
  E["OpenSearch Indexer"]
  F["soc-integrator Poller"]
  G{"Severity >= threshold?"}
  H["Discard (noise)"]
  I["Create IRIS Alert"]
  J["IRIS-web Case Queue"]

  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F --> G
  G -- "No" --> H
  G -- "Yes" --> I
  I --> J

The integrator:

Polls Wazuh Indexer (OpenSearch) every N seconds
Forwards only alerts at or above a configurable severity threshold (default: medium — everything below is noise)
Exposes GET/PUT endpoints so an analyst can adjust the threshold at runtime without a restart
Creates structured IRIS Alerts with full event metadata
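The threshold logic above can be sketched as a small thread-safe holder that GET/PUT handlers would read and update. The severity labels, the level cut-offs, and the class name are assumptions. The real integrator's mapping may differ.

```python
from threading import Lock

# Assumed ordering of severity labels, lowest first.
SEVERITY_ORDER = ("low", "medium", "high", "critical")


def severity_from_level(level: int) -> str:
    """Map a Wazuh rule level (0-15) to a label. The cut-offs are assumptions."""
    if level >= 15:
        return "critical"
    if level >= 12:
        return "high"
    if level >= 7:
        return "medium"
    return "low"


class SeverityGate:
    """Holds the forwarding threshold so it can be changed at runtime."""

    def __init__(self, threshold: str = "medium") -> None:
        self._lock = Lock()
        self._threshold = threshold

    def get(self) -> str:
        with self._lock:
            return self._threshold

    def set(self, threshold: str) -> None:
        if threshold not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity: {threshold!r}")
        with self._lock:
            self._threshold = threshold

    def should_forward(self, level: int) -> bool:
        """True when the alert's severity is at or above the threshold."""
        rank = SEVERITY_ORDER.index(severity_from_level(level))
        return rank >= SEVERITY_ORDER.index(self.get())
```

Keeping the threshold in one object means the poller and the HTTP handlers share a single source of truth, which is what makes a restart unnecessary.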

A 7-step end-to-end test script confirmed the flow from raw event to IRIS Alert on March 23.

Three Timezone Fixes in One Day

All containers were running UTC. Analysts in Bangkok (ICT, UTC+7) were looking at timestamps seven hours off.

Standard containers: TZ=Asia/Bangkok in the Docker Compose env block
Go/scratch base images: timezone database stripped at build time — had to be mounted explicitly as a volume
Bonus fix: a dual ICT/UTC clock widget added to the IRIS navbar

Analysts noticed immediately. Small thing, real impact.

IOC Enrichment and Ditching Shuffle SOAR

Until March 24, threat intelligence was manual: an analyst would look up each suspicious IP by hand. The new IOC pipeline replaced that entirely:

Ad-hoc lookups:

VirusTotal — per IP/domain/hash reputation
AbuseIPDB — IP abuse history

Background feed ingestion:

Feodo Tracker — C2 infrastructure
URLhaus — malicious URLs
ThreatFox — IOC aggregation
MalwareBazaar — malware hashes

Wazuh CDB list files (malicious-ip, malicious-domains, malware-hashes) are regenerated and hot-reloaded via the Wazuh API. New rules 110600–110602 handle inline CDB matching.

flowchart TD
  subgraph "Ad-hoc Lookups"
    A["Suspicious IP / Domain / Hash"]
    B["VirusTotal API"]
    C["AbuseIPDB API"]
    A --> B
    A --> C
  end

  subgraph "Background Feed Ingestion"
    D["Feodo Tracker (C2 IPs)"]
    E["URLhaus (Malicious URLs)"]
    F["ThreatFox (IOC Aggregator)"]
    G["MalwareBazaar (Hashes)"]
  end

  subgraph "Wazuh CDB Hot-Reload"
    H["malicious-ip list"]
    I["malicious-domains list"]
    J["malware-hashes list"]
    K["Wazuh API reload trigger"]
    H --> K
    I --> K
    J --> K
  end

  subgraph "Detection Rules"
    L["Rule 110600: IP match"]
    M["Rule 110601: Domain match"]
    N["Rule 110602: Hash match"]
  end

  B --> H
  C --> H
  D --> H
  E --> I
  F --> H
  F --> I
  G --> J
  K --> L
  K --> M
  K --> N
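A sketch of how feed data could be merged and rendered into CDB list text before the reload. It assumes the key:value line format with quoted keys for values containing a colon. The function names and the choice to keep the first feed's label are illustrative, not the integrator's actual code.

```python
def merge_feeds(*feeds: dict[str, str]) -> dict[str, str]:
    """Combine feeds; the first feed to list an indicator keeps its label."""
    merged: dict[str, str] = {}
    for feed in feeds:
        for indicator, label in feed.items():
            merged.setdefault(indicator.strip().lower(), label)
    return merged


def render_cdb_list(indicators: dict[str, str]) -> str:
    """Render indicators as CDB list text: one key:value pair per line."""
    lines = []
    for key in sorted(indicators):
        # Keys containing a colon (for example IPv6 addresses) are quoted
        # so the separator stays unambiguous.
        rendered_key = f'"{key}"' if ":" in key else key
        lines.append(f"{rendered_key}:{indicators[key]}")
    return "\n".join(lines) + "\n"
```

Sorting the keys makes the generated file deterministic, so a diff between two regenerations shows only real indicator changes.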

Shuffle SOAR was removed entirely. Direct API calls were faster, simpler, and didn’t require maintaining a separate workflow platform.

The Private IP Leak

Early testing had the integrator submitting 192.168.x.x and 10.x.x.x addresses to VirusTotal — eating API quota and generating 429 rate-limit errors on internal scan traffic.

Fix: skip RFC 1918 private ranges and loopback addresses before any external enrichment call. Always check for private IPs before hitting external threat intel APIs.
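A minimal sketch of such a pre-check using the standard ipaddress module. The function name and the decision to reject non-IP strings here are assumptions. The skipped ranges are exactly the ones named above, so other special-purpose ranges (link-local, for example) would still pass.

```python
import ipaddress

# RFC 1918 private ranges plus IPv4 and IPv6 loopback.
SKIPPED_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "127.0.0.0/8",
        "::1/128",
    )
)


def should_enrich(value: str) -> bool:
    """Return True for valid IPs outside the private and loopback ranges."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return False  # not an IP at all; domains and hashes go elsewhere
    return not any(addr in network for network in SKIPPED_NETWORKS)
```

Running the check before the API client is called means skipped addresses never count against the quota.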

Disk Pressure, Dashboards That Show Nothing, and Sync Noise

The logall_json Disaster

The logall_json setting in Wazuh Manager had been enabled during development for debugging. In production it was writing 14 GB of archives.json per day.

Fix: disable logall_json. Apply an OpenSearch ISM policy to delete old indices after 30 days. Add log rotation at the OS level inside the container.
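The retention part of the fix can be sketched as a Python function that builds an ISM policy body, which would then be sent to the ISM policies endpoint of the OpenSearch cluster. The index pattern wazuh-alerts-*, the state names, and the priority value are assumptions. The report does not say which indices the policy targets.

```python
import json


def build_retention_policy(index_pattern: str, max_age_days: int) -> dict:
    """Build an ISM policy body that deletes indices older than max_age_days."""
    return {
        "policy": {
            "description": f"Delete {index_pattern} after {max_age_days} days",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {
                            "state_name": "delete",
                            "conditions": {"min_index_age": f"{max_age_days}d"},
                        }
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to newly created matching indices.
            "ism_template": [{"index_patterns": [index_pattern], "priority": 100}],
        }
    }


print(json.dumps(build_retention_policy("wazuh-alerts-*", 30), indent=2))
```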

The Dashboard That Showed Nothing

The Appendix C dashboard was filtering by rule.id:1005* — the simulation rule IDs from development. In production, the real detections lived in the 110xxx range. The dashboard returned nothing for real events.

Fix: switch from rule ID filtering to rule.groups:appendix_c. Group-based filtering survives rule ID changes.

Event Type Mismatch in the Sync Filter

The Wazuh→IRIS sync had been filtering by event_type text fields. Those fields don’t exist on real Windows events from Wazuh agents — they were simulation artifacts.

Fix: rebuild the filter as a frozenset of explicit rule IDs. Explicit, deterministic, easy to audit in a code review.
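A sketch of what a rule-ID allow-list might look like. The IDs are rule numbers mentioned elsewhere in this report, used here only as placeholders. The actual set in the integrator is not shown in the source.

```python
# Hypothetical allow-list; replace with the rule IDs that should reach IRIS.
SYNC_RULE_IDS = frozenset({"110502", "110600", "110601", "110602"})


def should_sync(alert: dict) -> bool:
    """Forward an alert only if its rule ID is explicitly allow-listed."""
    # Normalize to str so integer and string IDs compare the same way.
    rule_id = str(alert.get("rule", {}).get("id", ""))
    return rule_id in SYNC_RULE_IDS
```

Because the set is a literal in the source, every change to what gets synced shows up as a one-line diff in review.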

Week 4: Webhooks, Emails, and Closing the Loop

March 28 was the densest single day of the project.

The IRIS Webhook Receiver

When IRIS creates or updates an alert, how does the on-call analyst know? The answer was a webhook receiver in soc-integrator. IRIS supports outbound webhooks via its module system.

flowchart TD
  A["IRIS Alert created or updated"]
  B["IRIS outbound webhook module"]
  C["POST /iris/webhook (soc-integrator)"]
  D["Parse alert payload"]
  E["Enrich: entity name + event type"]
  F["Resolve IRIS_EXTERNAL_URL"]
  G["Format email body"]
  H["smtplib send"]
  I["On-call analyst inbox"]

  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F --> G
  G --> H
  H --> I

The email notification went through three iterations in one day:

Version	What changed
v1	Bare notification: "an event arrived"
v2	Descriptive subject line: "A1-02 Brute Force" instead of the generic "IRIS Event"
v3	Full context: Alert ID, title, linked case, direct URL
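A sketch of a v3-style message built with the standard library. The payload keys (alert_id, title, case_id) and the /alerts/<id> URL path are hypothetical stand-ins, not the IRIS webhook schema or its real routes.

```python
from email.message import EmailMessage


def build_alert_email(
    alert: dict, base_url: str, sender: str, recipient: str
) -> EmailMessage:
    """Build a notification with alert ID, title, linked case, and a direct URL."""
    alert_id = alert["alert_id"]
    title = alert.get("title", "IRIS Event")
    case_id = alert.get("case_id")

    message = EmailMessage()
    message["Subject"] = title
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"Alert ID: {alert_id}\n"
        f"Title: {title}\n"
        f"Linked case: {case_id if case_id is not None else 'none'}\n"
        # Hypothetical URL layout; adjust to the deployment's actual routes.
        f"URL: {base_url.rstrip('/')}/alerts/{alert_id}\n"
    )
    return message
```

The returned object can be passed to smtplib's SMTP.send_message. Building it separately from the send step keeps the formatting testable without a mail server.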

The `.env` Inline Comment Bug

The line IRIS_EXTERNAL_URL=http://10.0.0.5 # production host was being parsed as the full string http://10.0.0.5 # production host, comment included. Inline comments on environment variable lines were silently corrupting values.

Always strip inline comments when parsing .env files, or use a proper parser such as python-dotenv, which handles this correctly.
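For cases where a hand-rolled parser has to stay, a minimal sketch of comment-aware line parsing follows. It treats a # as a comment only when it starts the value or follows whitespace, so URL fragments survive. It does not handle escapes, multi-line values, or a quoted value followed by a comment. The function name is illustrative.

```python
from typing import Optional


def parse_env_line(line: str) -> Optional[tuple[str, str]]:
    """Parse one KEY=VALUE line, dropping inline comments on unquoted values."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#") or "=" not in stripped:
        return None
    key, _, value = stripped.partition("=")
    key, value = key.strip(), value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return key, value[1:-1]  # fully quoted: keep '#' characters verbatim
    # Unquoted: a '#' at the start or after whitespace begins a comment.
    for index, char in enumerate(value):
        if char == "#" and (index == 0 or value[index - 1].isspace()):
            value = value[:index].rstrip()
            break
    return key, value
```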

The Simulation/Production Decoder Split

Real Wazuh agent events from Windows use the windows_eventchannel decoder chain. Syslog-injected simulation events use the JSON decoder. These are mutually exclusive — a rule chaining off windows_eventchannel will never fire for a simulated event.

flowchart TD
  A["Incoming Windows Event"]
  B{"Source type?"}
  C["Real Wazuh Agent"]
  D["Syslog-injected Simulator"]
  E["windows_eventchannel decoder"]
  F["JSON decoder"]
  G["Production anchor rule"]
  H["Simulation anchor rule 100270"]
  I["16 A4 Production rules"]
  J["16 A4 Simulation rules"]

  A --> B
  B -- "Agent" --> C
  B -- "Simulator" --> D
  C --> E
  D --> F
  E --> G
  F --> H
  G --> I
  H --> J

Solution: an anchor rule (100270) for the JSON decoder path, with all 16 A4 simulation rules pointing to it instead of the production anchor.

Pretty-Printing Is Part of the Detection Pipeline

Alert descriptions for Windows events were rendering as a single line of minified JSON. Two fixes were required:

Integrator side: detect JSON strings and pretty-print with 2-space indentation before storing in IRIS
Frontend side: <span> elements collapse whitespace — switch to <pre style="white-space:pre-wrap"> so the formatted JSON actually renders as formatted

An alert description rendered as one line of minified JSON gets ignored. The same data with indentation gets read. Presentation is part of the detection pipeline.
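The integrator-side half of that fix can be sketched in a few lines. The function name is an assumption. The behavior shown is the one described above: JSON objects and arrays are re-serialized with 2-space indentation and anything else passes through unchanged.

```python
import json


def pretty_if_json(text: str, indent: int = 2) -> str:
    """Return indented JSON if text is a JSON object or array, else text unchanged."""
    candidate = text.strip()
    if not candidate or candidate[0] not in "{[":
        return text  # cheap pre-check: plain strings and numbers stay as-is
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return text
    return json.dumps(parsed, indent=indent, ensure_ascii=False)
```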

Week 5: Feedback Loops and Killing False Positives

Closing the Wazuh Feedback Loop

The C-series detections (C1 impossible travel, C2 credential abuse, C3 lateral movement) were detecting correctly but silently. The integrator would confirm a match and create an IRIS Alert — but Wazuh itself never knew.

This mattered for two reasons:

A level 15 rule in Wazuh should fire for confirmed attacks, not just raw events
If Wazuh doesn’t register the confirmed detection, the SOC dashboards won’t show it

Solution: WazuhSyslogAdapter — a small UDP sender inside the integrator. After confirming a C1 match, the integrator sends a structured syslog event back to Wazuh:

soc_event=correlation event_type=c1_impossible_travel user="..." src_ip=...

Wazuh decodes it via the soc-prod-integrator decoder, hits anchor rule 100260, then fires rule 110502 at level 15 (confirmed critical). The loop is closed. The dashboards are truthful.

flowchart TD
  A["Login event from two locations"]
  B["Wazuh raw rule fires (low level)"]
  C["OpenSearch Indexer"]
  D["soc-integrator poller"]
  E["C1 correlation logic"]
  F{"Impossible travel confirmed?"}
  G["Discard"]
  H["Create IRIS Alert"]
  I["WazuhSyslogAdapter"]
  J["UDP syslog back to Wazuh port 514"]
  K["soc-prod-integrator decoder"]
  L["Anchor rule 100260"]
  M["Rule 110502 fires (level 15 critical)"]
  N["SOC dashboards updated"]
  O["Email notification sent"]

  A --> B
  B --> C
  C --> D
  D --> E
  E --> F
  F -- "No" --> G
  F -- "Yes" --> H
  H --> I
  I --> J
  J --> K
  K --> L
  L --> M
  M --> N
  H --> O
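A sketch of the sending side of this loop. The class below is a hypothetical stand-in for WazuhSyslogAdapter, whose real code is not shown in this report. The payload layout follows the sample line above, and dropping embedded double quotes from the user name is an assumption about what the decoder can parse.

```python
import socket


def format_correlation_event(event_type: str, user: str, src_ip: str) -> str:
    """Build the key=value payload; the user is quoted because it may contain spaces."""
    # Drop embedded double quotes so the quoted field stays parseable.
    safe_user = user.replace('"', "")
    return (
        f"soc_event=correlation event_type={event_type} "
        f'user="{safe_user}" src_ip={src_ip}'
    )


class WazuhSyslogSender:
    """Hypothetical stand-in for WazuhSyslogAdapter: fire-and-forget UDP."""

    def __init__(self, host: str, port: int = 514) -> None:
        self.address = (host, port)

    def send(self, payload: str) -> int:
        """Send one datagram; returns the number of bytes handed to the kernel."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            return sock.sendto(payload.encode("utf-8"), self.address)
```

UDP gives no delivery guarantee, so a dropped datagram means the confirmed detection exists in IRIS but not in Wazuh. Whether that trade-off is acceptable depends on how much the dashboards are relied on.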

The Off-Hours Detection Timezone Error

The C2 credential abuse off-hours window was configured as 20:00–06:00 UTC. That sounds reasonable until you localize it:

20:00 UTC = 03:00 ICT (Bangkok)
Business hours in Bangkok start at 08:00 ICT = 01:00 UTC

The rule was triggering during normal morning business activity.

Corrected window: 11:00–01:00 UTC = 18:00–08:00 ICT. Always configure time-based detection rules in the analyst’s local timezone, then convert to UTC.
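One way to avoid the conversion error altogether is to express the window in local time and convert each event timestamp instead. A minimal sketch, assuming timezone-aware UTC datetimes as input. The names are illustrative, and a fixed UTC+7 offset is used because ICT has no daylight saving time.

```python
from datetime import datetime, time, timedelta, timezone

# ICT is a fixed UTC+7 offset with no daylight saving time.
ICT = timezone(timedelta(hours=7), "ICT")
OFF_HOURS_START = time(18, 0)  # local time
OFF_HOURS_END = time(8, 0)     # local time, next morning


def is_off_hours(event_utc: datetime) -> bool:
    """True if a timezone-aware timestamp falls in the 18:00-08:00 ICT window."""
    local = event_utc.astimezone(ICT).time()
    # The window wraps past midnight, so it is the union of two ranges.
    return local >= OFF_HOURS_START or local < OFF_HOURS_END
```

With this shape, the configured values read the way an analyst thinks about them (18:00 to 08:00), and the UTC arithmetic happens in exactly one place.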

C3-03: 24 False Positives → 0

Rule C3-03 detected admin lateral movement via RDP type-3 logon. It was firing 24 times per day on FPBIADFS01.

Root cause: AD FS performs constant service-to-service type-3 authentication with no source IP — exactly the pattern the rule was looking for, but from a known-safe service account.

Fix: add a single guard — ipAddress must be present and non-loopback. AD FS service auth has no source IP, so the guard drops it cleanly. Real lateral movement with a remote source still triggers the rule.

One condition. False positive rate: zero.

flowchart TD
  A["Type-3 logon event detected"]
  B{"ipAddress present and non-loopback?"}
  C["FPBIADFS01 service auth (no IP)"]
  D["Rule suppressed — false positive avoided"]
  E["Remote admin session (real IP)"]
  F["C3-03 fires — lateral movement alert"]

  A --> B
  B -- "No" --> C
  C --> D
  B -- "Yes" --> E
  E --> F
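The guard itself is small. A sketch, assuming the source address arrives as a string field and that a missing address may appear as an empty string, a dash, or be absent entirely. The function name is illustrative.

```python
import ipaddress


def has_remote_source(ip_value) -> bool:
    """True when the logon event carries a usable, non-loopback source address."""
    if not isinstance(ip_value, str):
        return False
    candidate = ip_value.strip()
    if candidate in ("", "-"):  # Windows commonly logs "-" for a missing address
        return False
    try:
        return not ipaddress.ip_address(candidate).is_loopback
    except ValueError:
        return False  # unparseable values are treated as absent
```

Treating unparseable values as absent errs toward suppression. If a malformed address should instead raise an alert, flip that last branch.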

Infrastructure Notes That Weren’t in Any Commit Message

The macOS Bind-Mount Inode Problem

Docker on macOS tracks bind-mounted files by inode. When an editor creates a new inode on save (common with tools like sed -i or certain IDEs), the container continues reading from the old inode. The symptom: you edit a Wazuh rule, reload Wazuh, test — and the old behavior persists.

Fix: always run docker compose up --force-recreate after editing bind-mounted config files on macOS. Document this in your README on day one.
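The inode behavior is easy to reproduce on the host. This sketch contrasts an in-place write with the write-then-rename pattern that many editors use. The function names are illustrative, and the sketch only demonstrates the inode change, not Docker's handling of it.

```python
import os
import tempfile


def inode_after_in_place_write(path: str, text: str) -> int:
    """Overwrite the existing file; the inode is preserved."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return os.stat(path).st_ino


def inode_after_atomic_replace(path: str, text: str) -> int:
    """Write a sibling temp file and rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(temp_path, path)  # the path now points at a new inode
    return os.stat(path).st_ino
```

Comparing os.stat(path).st_ino before and after a save is a quick way to find out which pattern a given editor uses.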

The `wazuh-logtest` stdin Hang

Piping a 20-line test file to wazuh-logtest via docker exec -i stalls after phase 2 of the last event.

Fix: write to a temp file inside the container and redirect from there, or wrap the call with a timeout.

Large File Transfers Through `docker exec`

On some hosts, pipes truncated at roughly 64 KB when large files were transferred via docker exec.

Fix: base64-encode the file on the host, pipe the encoded string into the container, decode with Python on the other side.
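A sketch of the encode and decode halves, plus a checksum that is an addition of this sketch and not part of the original fix. The checksum is there because base64 decoding alone cannot detect a truncation that happens to fall on a 4-character boundary. All function names are illustrative.

```python
import base64
import hashlib


def encode_for_transfer(data: bytes) -> str:
    """Host side: turn arbitrary bytes into a single ASCII-safe string."""
    return base64.b64encode(data).decode("ascii")


def decode_after_transfer(encoded: str) -> bytes:
    """Container side: reverse the encoding; validate=True rejects stray characters."""
    return base64.b64decode(encoded.strip(), validate=True)


def sha256_hex(data: bytes) -> str:
    """Checksum to compare on both sides so that truncation is detected."""
    return hashlib.sha256(data).hexdigest()
```

Comparing sha256_hex of the source file with sha256_hex of the decoded result confirms the transfer was complete.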

Current Status

As of April 1, 2026

Area	Status
Appendix A — Windows/AD simulation	59 rules, all fire end-to-end
Appendix B — Network/Firewall	Production rules active, FortiGate syslog ingesting
Appendix C — Identity (C1/C2/C3)	Detection + Wazuh feedback loop closed
Wazuh → IRIS sync	Running, severity-filtered, rule-ID-gated
IOC enrichment	VT + AbuseIPDB + 4 threat feeds, CDB hot-reload
Email notifications	On every IRIS webhook event, with full alert context
False positive controls	C3-03 ADFS guard, C2 off-hours timezone, logon-type filters
Log retention	3-day Wazuh logs, 30-day OpenSearch ISM policy

Key Takeaways

1. Detection engineering is iterative, not additive

The commit that adds a rule and the commit that fixes its false positives are equally important. A rule that fires on everything is worse than no rule — it trains analysts to ignore alerts, which defeats the entire purpose of a SOC.

2. The feedback loop matters

A detection that goes into IRIS but doesn’t register back in Wazuh is half a detection. The SIEM needs to know what the correlator confirmed, or the dashboards lie and analyst trust erodes.

3. Small infrastructure bugs compound

The timezone error, the .env inline comment parsing bug, the private-IP VirusTotal submission — none of these would break a demo. In production they create steady-state noise that erodes trust in the platform. Fix them early, before they become "known issues" that get ignored.

4. Always check for the OR-trap in Wazuh rules

Multiple <match> tags in a single Wazuh rule can behave as OR in certain decoder chain contexts. If your rules fire more broadly than expected, consolidate to a single <regex> with explicit AND logic.

5. Pretty-printing is not cosmetic

An alert description rendered as one line of minified JSON gets skimmed or ignored. The same data with proper indentation gets read and acted on. Presentation is part of the detection pipeline.