Atlas Hermes — Autonomous Email Intelligence
An AI agent that monitors your inbox, understands context, drafts replies, and learns from your decisions. It doesn't just filter email — it thinks about email.
What Hermes Does
- Polls Gmail every 30 minutes for new messages
- Analyzes each email with GPT-4o for intent, urgency, and context
- Scores confidence (0-100) across 6 weighted signals
- Drafts intelligent replies using Claude Sonnet or GPT-4o
- Sends notifications to Telegram for human review
- Learns from every approve/edit/ignore decision
Core Philosophy
Progressive Autonomy. Hermes starts cautious — notifying you about everything. As it learns your preferences, it handles routine emails automatically and only escalates complex ones.
Currently deployed for: operations@legacyinsuranceco.com
Technology Stack
Email Analysis Pipeline
Every incoming email passes through a multi-stage analysis pipeline before any action is taken. Here's what happens from inbox to notification.
End-to-End Flow
Stage-by-Stage Breakdown
- Gmail Polling: Connects via the Gmail API (OAuth2) and fetches unread messages from the inbox every 30 minutes. The interval is configurable in hermes.config.json.
- Deduplication: Each email's message ID is checked against SQLite, and already-processed emails are skipped. Thread context is preserved, so Hermes knows whether a message is a reply in an ongoing conversation.
- LLM Analysis (GPT-4o): The full email body plus thread context is sent to GPT-4o. It extracts intent classification, urgency level (1-5), action items, sentiment, sender relationship, and whether a reply is expected.
- Confidence Scoring: 6 weighted signals produce a composite score from 0-100. Higher scores mean Hermes is more certain about what to do. See the next slide for signal details.
- Draft Generation: If a reply is warranted, Claude Sonnet or GPT-4o generates a draft in the user's writing style. The draft includes tone matching, context awareness, and an appropriate level of detail.
- Telegram Notification: A summary plus the draft is sent to Telegram with inline action buttons. The user can approve, edit, ignore, or snooze, all from their phone.
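The stages above can be sketched as a single async function. This is a minimal illustration, not Hermes's actual code: every function on `deps` (`alreadyProcessed`, `analyze`, `score`, `draft`, `notify`, `markProcessed`) is a hypothetical stand-in for the real modules, and threshold checks are left out for brevity. Polling (the first stage) is assumed to be the caller that supplies `email`.

```javascript
// Sketch of the per-email pipeline. All `deps` functions are hypothetical
// stand-ins injected by the caller; they are not Hermes's real module API.
async function processEmail(email, deps) {
  // Deduplication: skip anything whose message ID was already handled.
  if (await deps.alreadyProcessed(email.id)) {
    return { status: 'skipped' };
  }

  // LLM analysis of the body plus thread context.
  const analysis = await deps.analyze(email);

  // Composite confidence score in the range 0-100.
  const score = await deps.score(email, analysis);

  // Draft generation only when the analysis says a reply is warranted.
  const draft = analysis.reply_warranted
    ? await deps.draft(email, analysis)
    : null;

  // Telegram notification with the summary and the draft (if any).
  await deps.notify({ email, analysis, score, draft });
  await deps.markProcessed(email.id);

  return { status: 'notified', score, draft };
}
```

Injecting the stages keeps the ordering visible in one place and lets each stage be swapped for a stub when testing.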
Analysis Output Schema
| GPT-4o Analysis Field | Description | Type |
|---|---|---|
| intent | Classification: question, request, FYI, complaint, sales, scheduling, follow-up | AI |
| urgency | 1 (low) to 5 (critical). Based on deadlines, language, sender importance. | Score |
| action_required | Boolean — does the sender expect a response or action? | AI |
| action_items | List of extracted to-dos from the email body | AI |
| sentiment | positive, neutral, negative, frustrated, urgent | AI |
| summary | One-line plain-English summary of the email | AI |
| reply_warranted | Boolean — should a draft be generated? | AI |
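Because these fields come back from an LLM, it can help to check them before they reach the scorer. The sketch below validates an object against the fields in the table. The function name `validateAnalysis` is hypothetical, and it assumes that the enum values are returned exactly as spelled in the table and that `action_items` is an array of strings.

```javascript
// Allowed values copied from the schema table above.
const INTENTS = ['question', 'request', 'FYI', 'complaint', 'sales', 'scheduling', 'follow-up'];
const SENTIMENTS = ['positive', 'neutral', 'negative', 'frustrated', 'urgent'];

// Returns the names of invalid fields; an empty array means the object is valid.
// Hypothetical helper, not part of Hermes's documented API.
function validateAnalysis(a) {
  if (a === null || typeof a !== 'object') return ['(not an object)'];
  const errors = [];
  if (!INTENTS.includes(a.intent)) errors.push('intent');
  if (!Number.isInteger(a.urgency) || a.urgency < 1 || a.urgency > 5) errors.push('urgency');
  if (typeof a.action_required !== 'boolean') errors.push('action_required');
  if (!Array.isArray(a.action_items) || !a.action_items.every((i) => typeof i === 'string')) {
    errors.push('action_items');
  }
  if (!SENTIMENTS.includes(a.sentiment)) errors.push('sentiment');
  if (typeof a.summary !== 'string' || a.summary.length === 0) errors.push('summary');
  if (typeof a.reply_warranted !== 'boolean') errors.push('reply_warranted');
  return errors;
}
```

Returning a list of field names rather than throwing makes it easy to log exactly which part of a model response was malformed.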
6-Signal Confidence Scoring
Hermes doesn't guess — it calculates. Every email receives a weighted confidence score based on 6 distinct signals. This score determines what action Hermes recommends.
The 6 Signals
| # | Signal | What It Measures | Weight |
|---|---|---|---|
| 1 | Sender History | How many previous interactions with this sender? Known contacts score higher. New/unknown senders reduce confidence. | Weight: 20% |
| 2 | Intent Clarity | How clear is the email's purpose? Direct questions score high. Ambiguous or multi-topic emails score low. | Weight: 20% |
| 3 | Pattern Match | Does this email match patterns Hermes has seen before? Similar emails that were previously approved boost confidence. | Weight: 25% |
| 4 | Reply Template Fit | Can Hermes generate a reply from learned templates/patterns? High template fit = high confidence in the draft quality. | Weight: 15% |
| 5 | Urgency Alignment | Is the urgency level consistent across signals? Mismatched urgency indicators (calm language + "ASAP" subject) reduce confidence. | Weight: 10% |
| 6 | Risk Assessment | Could a wrong reply cause damage? Financial, legal, or high-stakes emails get lower confidence regardless of other signals. | Weight: 10% |
Score Ranges & Actions
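- Lowest tier (below confidence.draft_threshold, default 40): Notify only. No draft generated. Human must handle from scratch.
- Second tier: Draft generated, sent for review. Likely needs edits before sending.
- Third tier: Draft generated, ready to send. One-tap approve in Telegram.
- Top tier (at or above confidence.auto_send_threshold, default 90): Progressive autonomy. Once enabled, Hermes sends automatically at this tier.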
Score Calculation Example
From: john@clientco.com (12 prior interactions)
Sender History: 92/100 x 0.20 = 18.4
Intent Clarity: 95/100 x 0.20 = 19.0
Pattern Match: 88/100 x 0.25 = 22.0
Template Fit: 90/100 x 0.15 = 13.5
Urgency Align: 85/100 x 0.10 = 8.5
Risk Assessment: 95/100 x 0.10 = 9.5
─────────────────────────────
Composite Score: 90.9 → Auto-Send Ready
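The arithmetic in this example is a plain weighted sum. A minimal sketch follows, using the weights from the signal table. The property names (`senderHistory`, `intentClarity`, and so on) and the function name `compositeScore` are illustrative assumptions. Weights are kept as integer percentages so the sum stays exact until the final division.

```javascript
// Signal weights in percent, taken from the signal table (they sum to 100).
// The key names are illustrative, not Hermes's real field names.
const WEIGHTS = {
  senderHistory: 20,
  intentClarity: 20,
  patternMatch: 25,
  templateFit: 15,
  urgencyAlignment: 10,
  riskAssessment: 10,
};

// Computes the weighted composite (0-100) from six 0-100 signal scores.
function compositeScore(signals) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = signals[name];
    if (!Number.isFinite(value) || value < 0 || value > 100) {
      throw new RangeError(`signal "${name}" must be a number from 0 to 100`);
    }
    total += value * weight;
  }
  return total / 100;
}

// The worked example above:
const example = compositeScore({
  senderHistory: 92,
  intentClarity: 95,
  patternMatch: 88,
  templateFit: 90,
  urgencyAlignment: 85,
  riskAssessment: 95,
});
console.log(example); // 90.9
```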
Telegram Interface
Your command center for email decisions. Hermes sends formatted notifications to Telegram with inline action buttons. Review, approve, or ignore — all from your phone.
Notification Format
Subject: Re: Thursday meeting reschedule
Intent: 📅 Scheduling request
Urgency: ★★★☆☆ (3/5)
"Hi John, 3pm works for me. I'll update the calendar invite. Talk then!"
Available Commands
| Telegram Bot Command | Description | Type |
|---|---|---|
| /approve | Send the draft reply as-is. Email is sent immediately via Gmail API. | Action |
| /edit [text] | Replace the draft with your text and send. Or just type new text after tapping Edit. | Action |
| /ignore | Mark email as handled, no reply sent. Hermes learns this type doesn't need a response. | Action |
| /later | Snooze for a configurable duration (default: 2 hours). Re-notifies when the snooze expires. | Action |
| /digest | Get a daily summary: emails processed, actions taken, pending items, confidence trends. | Info |
| /stats | View learning statistics: approval rate, edit frequency, ignored patterns. | Info |
| /pause | Temporarily stop polling. Hermes goes silent until /resume. | Control |
| /resume | Restart polling after a pause. | Control |
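As an illustration of how incoming command text might be routed, the sketch below parses a message into a command name, its type from the table, and any trailing text (used by /edit). The function name `parseCommand` and the returned object shape are assumptions. The optional `@botname` suffix is handled because Telegram appends it to commands in group chats.

```javascript
// Command names and their types, taken from the table above.
const COMMANDS = new Map([
  ['approve', 'action'],
  ['edit', 'action'],
  ['ignore', 'action'],
  ['later', 'action'],
  ['digest', 'info'],
  ['stats', 'info'],
  ['pause', 'control'],
  ['resume', 'control'],
]);

// Parses "/name[@bot] [args]" into { name, type, args }, or null if the
// text is not one of the known commands. Hypothetical helper.
function parseCommand(text) {
  const match = /^\/([a-z]+)(?:@\w+)?(?:\s+([\s\S]*))?$/.exec(text.trim());
  if (!match || !COMMANDS.has(match[1])) return null;
  return {
    name: match[1],
    type: COMMANDS.get(match[1]),
    args: (match[2] || '').trim(),
  };
}
```

For example, `parseCommand('/edit Sounds good, see you at 3pm')` yields the name `edit` with the replacement text in `args`, while ordinary chat text returns `null`.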
Self-Learning System
Every decision you make teaches Hermes. Approvals, edits, and ignores all feed back into the confidence model, making future predictions more accurate.
The Feedback Loop
What Each Action Teaches
| Action | What Hermes Learns | Effect |
|---|---|---|
| ✓ Approve | Strongest positive signal. Confirms: sender relationship, intent classification, draft tone, and reply style are all correct. Boosts confidence for similar future emails from this sender and pattern. | +Confidence |
| ✎ Edit | Moderate signal. The analysis was right (reply warranted) but the draft needed changes. Hermes diffs the original draft vs. your edit to learn: what was wrong, what tone you preferred, what details you added/removed. | Refine |
| ✗ Ignore | Negative signal for reply. This email didn't need a response. Pattern is stored — similar emails from this sender/type will get lower "reply_warranted" scores. | -Reply Need |
| 🕔 Later | Timing signal. The email matters but not right now. Hermes learns which email types tend to get deferred (e.g., newsletters, non-urgent FYIs). | Timing |
Style Learning
When you edit a draft, Hermes performs a diff analysis:
- Tone shifts (formal → casual or vice versa)
- Length preference (shorter or more detailed)
- Greeting/closing patterns
- Content you consistently add or remove
- Phrases and vocabulary you prefer
After ~20 edits, draft quality improves significantly.
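To make the idea of comparing a draft with its edited version concrete, here is a deliberately simple sketch. It is not a true line or character diff, and it is not Hermes's implementation: it only compares the sets of words in each version and the change in word count, which is enough to surface vocabulary and length preferences. The names `tokenize` and `summarizeEdit` are hypothetical.

```javascript
// Lowercases text and splits it into word tokens (letters, digits, apostrophes).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Summarizes how an edit changed a draft: which words appeared, which
// disappeared, and whether the reply got longer or shorter.
// A word-set comparison, used here in place of a real diff algorithm.
function summarizeEdit(original, edited) {
  const before = tokenize(original);
  const after = tokenize(edited);
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  return {
    addedWords: [...afterSet].filter((w) => !beforeSet.has(w)),
    removedWords: [...beforeSet].filter((w) => !afterSet.has(w)),
    lengthDelta: after.length - before.length,
  };
}
```

A consistently negative `lengthDelta` across many edits would suggest a preference for shorter replies, and words that recur in `addedWords` hint at preferred greetings or closings.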
Progressive Autonomy
As confidence in a pattern grows, Hermes reduces friction:
- Phase 1: Notify + draft for everything
- Phase 2: Auto-approve routine patterns (90+ score, 3+ prior approvals from same sender)
- Phase 3: Auto-send for trusted senders + known patterns
- Phase 4: Only notify for exceptions and unknowns
Each phase requires explicit opt-in.
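The opt-in and the "90+ score, 3+ prior approvals" condition can be expressed as a single gate. The sketch below assumes the dotted configuration keys (autonomy.enabled, autonomy.min_approvals, confidence.auto_send_threshold) map to nested objects. The function name `canAutoSend` and the `priorApprovals` input are hypothetical.

```javascript
// Returns true only when autonomy is explicitly enabled, the score clears
// the auto-send threshold, and the sender has enough prior approvals.
// Assumes a nested config object; the real file layout may differ.
function canAutoSend({ score, priorApprovals }, config) {
  const { enabled, min_approvals: minApprovals } = config.autonomy;
  const threshold = config.confidence.auto_send_threshold;
  return enabled === true && score >= threshold && priorApprovals >= minApprovals;
}
```

Checking `enabled === true` rather than truthiness keeps the default-off behavior intact if the setting is missing or malformed.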
Auto-Triage Rules
Not every email needs AI analysis. Auto-triage rules handle predictable patterns instantly — before the LLM is even invoked. Saves tokens and time.
Triage Flow
Built-In Triage Rules
| Default Rule (configurable) | Behavior | Action |
|---|---|---|
| Newsletter / Marketing | Emails from known newsletter platforms (Substack, Beehiiv, Mailchimp). Auto-archived, included in daily digest summary. | Auto-Archive |
| Automated Receipts | Purchase confirmations, shipping notifications, password resets from known services. Auto-labeled, no notification. | Auto-Label |
| Calendar Invites | Google Calendar / Outlook invites. Extracted to digest. Not analyzed for reply unless RSVP is pending. | Auto-Extract |
| Spam / Promotions | Gmail's own spam detection + Hermes pattern matching. Double-layer filtering. Silently skipped. | Auto-Skip |
| Out-of-Office Replies | OOO auto-replies detected by subject pattern + content analysis. Logged but not notified. | Auto-Log |
| VIP Sender Override | Emails from VIP list always skip triage and go straight to full pipeline + immediate notification. | Priority |
Custom Rule Format
{
  "name": "Ignore Jira Notifications",
  "match": {
    "from": "*@jira.atlassian.net",
    "subject_contains": ["assigned", "commented", "updated"]
  },
  "action": "archive",
  "notify": false,
  "include_in_digest": true
}
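One way a rule like this could be evaluated is sketched below. The matching semantics are assumptions, since the rule format does not spell them out: `*` in `from` is treated as a wildcard, `subject_contains` matches when any listed keyword appears in the subject, and both comparisons ignore case. The helper names `globToRegExp` and `matchesRule` are hypothetical, and `email.from` is assumed to be a bare address.

```javascript
// Converts a simple "*" wildcard pattern into an anchored, case-insensitive RegExp.
function globToRegExp(pattern) {
  // Escape regex metacharacters except "*", then turn "*" into ".*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`, 'i');
}

// Returns true when the email satisfies every condition present in rule.match.
// Assumed semantics: wildcard sender match AND any-keyword subject match.
function matchesRule(rule, email) {
  const { from, subject_contains: keywords } = rule.match;
  if (from && !globToRegExp(from).test(email.from)) return false;
  if (keywords && keywords.length > 0) {
    const subject = email.subject.toLowerCase();
    if (!keywords.some((k) => subject.includes(k.toLowerCase()))) return false;
  }
  return true;
}
```

Because this check is plain string matching, it can run before any LLM call, which is the point of triage.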
Configuration
Everything is configured through hermes.config.json. One file controls polling, LLM selection, confidence thresholds, triage rules, and notification preferences.
hermes.config.json — Key Sections
| Configuration Key | Description | Category |
|---|---|---|
| gmail.poll_interval | How often to check for new emails. Default: 1800000 (30 minutes in ms) | Timing |
| gmail.account | Gmail address to monitor. Requires OAuth2 credentials. | Auth |
| llm.analysis_model | Model for email analysis. Default: "gpt-4o" | AI |
| llm.draft_model | Model for reply generation. Default: "claude-sonnet-4-20250514" | AI |
| confidence.auto_send_threshold | Minimum score for auto-send (if enabled). Default: 90 | Score |
| confidence.draft_threshold | Minimum score to generate a draft. Below this, notify only. Default: 40 | Score |
| telegram.bot_token | Telegram bot API token. Create via @BotFather. | Auth |
| telegram.chat_id | Your Telegram chat ID for notifications. | Auth |
| digest.time | Daily digest delivery time. Default: "08:00" (local timezone) | Timing |
| autonomy.enabled | Master switch for progressive autonomy. Default: false | Control |
| autonomy.min_approvals | Approvals needed from a sender before auto-send kicks in. Default: 3 | Control |
| triage.rules[] | Array of auto-triage rule objects (see previous slide) | Rules |
| vip_senders[] | Email addresses that always get priority treatment | List |
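The defaults listed in the table can be applied by merging a user's file over a defaults object. The sketch below assumes the dotted keys correspond to nested sections in hermes.config.json. The names `DEFAULTS` and `withDefaults` are hypothetical, and only keys with a documented default are included.

```javascript
// Default values copied from the configuration table. Keys without a
// documented default (accounts, tokens, rules, VIP list) are left out.
const DEFAULTS = {
  gmail: { poll_interval: 1800000 }, // 30 minutes in ms
  llm: { analysis_model: 'gpt-4o', draft_model: 'claude-sonnet-4-20250514' },
  confidence: { auto_send_threshold: 90, draft_threshold: 40 },
  digest: { time: '08:00' },
  autonomy: { enabled: false, min_approvals: 3 },
};

// Returns a new config where each section is the defaults overlaid with
// the user's values. Sections without defaults are passed through unchanged.
function withDefaults(userConfig = {}) {
  const merged = { ...userConfig };
  for (const [section, defaults] of Object.entries(DEFAULTS)) {
    merged[section] = { ...defaults, ...(userConfig[section] || {}) };
  }
  return merged;
}
```

A one-level merge per section is enough here because every documented setting sits exactly one level below its section name.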
Secrets are stored in .env. The config file only references them by environment variable name.
Environment Variables (.env)
ANTHROPIC_API_KEY=sk-ant-...
GMAIL_CLIENT_ID=...apps.googleusercontent.com
GMAIL_CLIENT_SECRET=...
GMAIL_REFRESH_TOKEN=...
TELEGRAM_BOT_TOKEN=...bot-token
TELEGRAM_CHAT_ID=...chat-id
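A startup check for these variables is a cheap way to fail early with a clear message. The sketch below lists only the variable names shown above, and the function name `missingEnv` is hypothetical. It takes the environment as a parameter so it can be exercised without touching the real process environment.

```javascript
// Variable names exactly as listed in the .env example above.
const REQUIRED_ENV = [
  'ANTHROPIC_API_KEY',
  'GMAIL_CLIENT_ID',
  'GMAIL_CLIENT_SECRET',
  'GMAIL_REFRESH_TOKEN',
  'TELEGRAM_BOT_TOKEN',
  'TELEGRAM_CHAT_ID',
];

// Returns the names of required variables that are unset or blank.
function missingEnv(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name] || env[name].trim() === '');
}
```

A caller might run `missingEnv()` before starting the polling loop and exit with the returned names if the list is not empty.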
Architecture
How all the pieces fit together. Hermes is a single Node.js process with a modular internal architecture and SQLite for persistence.
System Architecture
Internal Modules
| Module Map | ||
|---|---|---|
| src/poller.js | Gmail polling loop. Fetches new messages, handles OAuth refresh, manages rate limits. | Core |
| src/analyzer.js | LLM integration for email analysis. Builds prompts with thread context, parses structured output. | AI |
| src/scorer.js | 6-signal confidence scoring engine. Reads history from SQLite, computes weighted composite. | Score |
| src/drafter.js | Reply draft generation. Style model built from approved/edited history. | AI |
| src/triage.js | Auto-triage rule engine. Pattern matching before LLM analysis. | Filter |
| src/telegram.js | Telegram Bot API integration. Message formatting, inline buttons, callback handling. | Interface |
| src/learner.js | Feedback loop processor. Diff analysis for edits, pattern storage, confidence updates. | Learning |
| src/digest.js | Daily digest generator. Aggregates stats, formats summary, scheduled delivery. | Report |
| data/hermes.db | SQLite database. Stores email history, sender profiles, patterns, feedback, and confidence data. | Storage |
SQLite Tables
- emails — Processed email metadata + analysis
- senders — Sender profiles + relationship scores
- patterns — Learned email patterns by type
- feedback — User decisions (approve/edit/ignore/later)
- drafts — Generated drafts + edit diffs
- triage_log — Auto-triaged emails for audit
- digest_history — Past digest summaries
Deployment
- Runtime: Single Node.js process (PM2 managed)
- Database: SQLite (local file, no external DB needed)
- Memory: ~80-120MB typical
- Startup: npm start or pm2 start
- Logs: Structured JSON via Winston
- Health check: Internal heartbeat + Telegram ping
Strengths
- Zero external infrastructure (SQLite, single process)
- Self-improving accuracy over time
- Mobile-first via Telegram
- Cost-efficient (triage skips 40-60% of LLM calls)
- Thread-aware analysis
- Privacy: all data stays local
Known Limitations
- Single-user per instance (not multi-tenant)
- Gmail only (no Outlook/IMAP yet)
- No web dashboard (Telegram only)
- SQLite allows only one writer at a time (no concurrent writes)
- No attachment analysis yet
- Cold start: first ~50 emails have low confidence