AI Summary Domain¶
AI/LLM layer — executive summaries and RAG Q&A grounded in DuckDB customer data.
Domain Entities¶
src.domain.ai_summary.entities ¶
AI Summary domain entities.
Defines the core data structures for the AI/LLM layer:

- SummaryContext: all facts retrieved from DuckDB that ground the LLM prompt
- GuardrailResult: outcome of the hallucination + fact-grounding validation pass
- ExecutiveSummary: the final entity returned by the use case, with full audit trail
Classes¶
SummaryContext dataclass ¶
All structured facts retrieved from DuckDB that will ground the LLM prompt.
Business Context: This is the "retrieval" step of our RAG strategy. A single B2B customer's full history fits comfortably in Llama-3's 128k context window, so we use context-stuffing rather than a vector database. The LLM is explicitly constrained to only reference facts present here.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `customer` | `Customer` | The Customer entity with profile and MRR data. | *required* |
| `prediction` | `PredictionResult` | The PredictionResult including churn probability and SHAP features. | *required* |
| `events_last_30d_by_type` | `dict[str, int]` | Count of usage events by type in the last 30 days. | *required* |
| `open_tickets` | `list[dict[str, object]]` | List of open support tickets with priority, topic, and age. | *required* |
| `gtm_opportunity` | `dict[str, object] \| None` | Active GTM opportunity dict (stage, amount) if one exists. | *required* |
| `cohort_churn_rate` | `float` | Churn rate for customers in the same tier + industry cohort. | *required* |
Source code in src/domain/ai_summary/entities.py
GuardrailResult dataclass ¶
Outcome of the GuardrailsService validation pass.
Business Context: All LLM outputs must pass a validation layer before reaching CS teams or executives. A flawed summary (wrong probability, hallucinated feature) could trigger the wrong CS action or damage trust.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `passed` | `bool` | True if all checks passed. | *required* |
| `flags` | `list[str]` | List of specific violations detected (e.g. 'probability_mismatch'). | *required* |
| `confidence_score` | `float` | 1.0 if fully clean; decreases 0.2 per flag. Minimum 0.0. | *required* |
Source code in src/domain/ai_summary/entities.py
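The confidence arithmetic described above can be sketched in a few lines (the function name is illustrative; only the 0.2-per-flag decrement and the 0.0 floor come from the docstring):

```python
def confidence_score(flags: list[str]) -> float:
    """1.0 when fully clean; minus 0.2 per detected flag; floored at 0.0."""
    return max(0.0, 1.0 - 0.2 * len(flags))
```

Under this scheme three flags already drop confidence to 0.4, below the 0.5 human-review threshold mentioned later in this section.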
ExecutiveSummary dataclass ¶
The final output entity of the AI Summary bounded context.
Contains the LLM-generated narrative, guardrail validation result, and full provenance metadata for audit and human review.
Business Context: CSMs use this for pre-meeting prep (~30 sec vs 15 min manual writing). Executives use it for portfolio risk reviews. The guardrail result and watermark ensure human-in-the-loop accountability.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `customer_id` | `str` | UUID of the customer this summary is about. | *required* |
| `audience` | `str` | Target audience — 'csm' (tactical) or 'executive' (strategic). | *required* |
| `content` | `str` | LLM-generated narrative with guardrail watermark appended. | *required* |
| `guardrail` | `GuardrailResult` | Validation result including flags and confidence score. | *required* |
| `generated_at` | `datetime` | UTC timestamp of when the summary was created. | *required* |
| `model_used` | `str` | Name of the LLM model used (e.g. 'llama-3.1-8b-instant'). | *required* |
| `llm_provider` | `str` | Inference provider — 'groq' or 'ollama'. | *required* |
Source code in src/domain/ai_summary/entities.py
Summary Port (ABC)¶
src.domain.ai_summary.summary_port ¶
SummaryPort – abstract base class for LLM backend adapters.
Both GroqSummaryService and OllamaSummaryService implement this port, making the LLM backend swappable without touching domain or application code.
Classes¶
SummaryPort ¶
Bases: ABC
Abstract port for LLM text generation.
Business Context: This port decouples the domain from any specific LLM provider. The application layer only knows about SummaryPort — swapping Groq for Ollama (or a future provider) requires only a config change, not a code change in the use case.
Implementations must be stateless and thread-safe; FastAPI may call generate() concurrently from multiple request handlers.
Source code in src/domain/ai_summary/summary_port.py
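The port pattern described above can be sketched as a small ABC plus a stub adapter (as one might use in unit tests). The property name `model_name` and the stub class are assumptions for illustration; the `generate()` method name comes from the docstring above.

```python
from abc import ABC, abstractmethod


class SummaryPort(ABC):
    """Abstract LLM port; concrete adapters (Groq, Ollama, ...) plug in behind it."""

    @property
    @abstractmethod
    def model_name(self) -> str:  # property name assumed for illustration
        ...

    @abstractmethod
    def generate(self, context: object, audience: str) -> str:
        """Return raw narrative text; guardrails are applied by the caller."""
        ...


class FakeSummaryService(SummaryPort):
    """Stub adapter returning canned text — handy for testing the use case
    without any network access."""

    @property
    def model_name(self) -> str:
        return "fake-model"

    def generate(self, context: object, audience: str) -> str:
        return f"[{audience}] canned narrative"


svc = FakeSummaryService()
```

Because the application layer depends only on `SummaryPort`, swapping `FakeSummaryService` for a real Groq or Ollama adapter is a wiring change, not a code change.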
Attributes¶
abstractmethod property ¶
Name of the underlying LLM model (e.g. 'llama-3.1-8b-instant').
abstractmethod property ¶
Name of the inference provider (e.g. 'groq' or 'ollama').
Functions¶
generate abstractmethod ¶
Generate a raw LLM narrative grounded in the provided context.
Business Context: Guardrails are applied by the caller (GuardrailsService) after this method returns. This method is responsible only for making the API call and returning the raw text.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `SummaryContext` | Structured facts from DuckDB that ground the prompt. | *required* |
| `audience` | `str` | Target audience — 'csm' (tactical) or 'executive' (strategic). | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Raw LLM-generated text string (watermark NOT yet appended). |
Source code in src/domain/ai_summary/summary_port.py
generate_from_prompt abstractmethod ¶
Generate a raw LLM response from a pre-assembled prompt string.
Business Context: Used by the expansion narrative pipeline where PromptBuilder.build_expansion_prompt() assembles the full prompt before calling the LLM. This avoids coupling the LLM backend to SummaryContext and allows expansion-specific prompt engineering.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | Fully assembled prompt string (including [CONTEXT], [INSTRUCTION], and [CONSTRAINT] blocks). | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Raw LLM-generated text string (watermark NOT yet appended). |
Source code in src/domain/ai_summary/summary_port.py
Guardrails Service¶
src.domain.ai_summary.guardrails_service ¶
GuardrailsService – validates LLM output before returning to callers.
Three-layer defence
- Feature name whitelist — reject summaries that mention made-up model features
- Probability accuracy — flag if stated probability deviates > 2pp from model output
- Watermark — always append human-in-loop annotation to every output
Business Context: In a CS context, a hallucinated summary (wrong probability, invented feature name) could trigger the wrong intervention or erode trust with CS teams. The guardrail layer ensures all LLM outputs are fact-grounded before reaching customer-facing workflows.
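The second layer (probability accuracy) boils down to comparing every percentage the LLM states against the model's output. A minimal sketch — the function name, regex, and flag string are illustrative, not the project's actual implementation:

```python
import re


def check_probability(raw_text: str, model_probability: float) -> list[str]:
    """Flag the summary if no stated percentage is within 2pp of the model output.

    model_probability is in [0, 1]; percentages in the text are parsed as e.g. '72%'.
    """
    flags: list[str] = []
    stated = [float(m) / 100 for m in re.findall(r"(\d+(?:\.\d+)?)\s*%", raw_text)]
    # A summary may legitimately cite other figures (e.g. cohort churn rate),
    # so we only flag when *none* of the stated percentages matches the model.
    if stated and all(abs(p - model_probability) > 0.02 for p in stated):
        flags.append("probability_mismatch")
    return flags
```

The "flag only when no figure matches" design choice avoids false positives on summaries that also mention the cohort churn rate; a stricter implementation could check each sentence's context instead.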
Classes¶
GuardrailsService ¶
Validates LLM output and appends the human-in-loop watermark.
Business Context: The three validation layers (feature whitelist, probability accuracy, watermark) implement the ethical guardrails described in docs/ethical-guardrails.md. A confidence_score < 0.5 should trigger human review before the summary is used.
Source code in src/domain/ai_summary/guardrails_service.py
Functions¶
Validate raw LLM output and append the required watermark.
Business Context: Called by GenerateExecutiveSummaryUseCase after every LLM call. Returns the final text (with watermark) and a GuardrailResult for audit logging and confidence scoring.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `raw_text` | `str` | The raw string returned by the LLM backend. | *required* |
| `context` | `SummaryContext` | The SummaryContext used to generate the text (for fact-checking). | *required* |

Returns:

| Type | Description |
|---|---|
| `tuple[str, GuardrailResult]` | Tuple of (final_text_with_watermark, GuardrailResult). |
Source code in src/domain/ai_summary/guardrails_service.py
Expansion Guardrails Service¶
src.domain.ai_summary.expansion_guardrails_service ¶
ExpansionGuardrailsService — three-gate LLM output validation for expansion briefs.
Three-layer defence
- Feature name whitelist (Gate 1) — flag hallucinated snake_case signals; 2+ flags → REJECTED
- Tone calibration (Gate 2) — strip urgency language when propensity < 0.50
- PII/jargon scrub (Gate 3) — remove UUIDs and ML terms from email_draft only; append watermark to ae_tactical_brief
Business Context: An AE acting on a hallucinated signal (fabricated feature name, wrong propensity tier) in an outreach email destroys trust and may misrepresent product capabilities. The three gates ensure every expansion brief delivered to Sales is factually grounded and appropriately calibrated to actual propensity.
Classes¶
ExpansionGuardrailResult dataclass ¶
Result of ExpansionGuardrailsService.validate().
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `ae_tactical_brief` | `str` | Brief with watermark appended (urgency stripped if applicable). | *required* |
| `email_draft` | `str \| None` | Scrubbed email draft, or None. | *required* |
| `guardrail_status` | `Literal['PASSED', 'FLAGGED', 'REJECTED']` | 'PASSED' / 'FLAGGED' / 'REJECTED' based on Gate 1 flags. | *required* |
| `fact_confidence` | `float` | 1.0 − (0.25 × n_flags), floored at 0.0. | *required* |
| `flags` | `list[str]` | List of Gate 1 flag strings for audit logging. | *required* |
Source code in src/domain/ai_summary/expansion_guardrails_service.py
ExpansionGuardrailsService ¶
Validates and transforms LLM output for expansion briefs.
Business Context: Mirrors GuardrailsService for the churn domain but scoped to expansion signals. Gate 1 uses the expansion_result's actual top_features as the fact whitelist, preventing the LLM from referencing signals outside the model's output.
Gate thresholds
- 0 flags → PASSED, confidence 1.0
- 1 flag → FLAGGED, confidence 0.75
- 2+ flags → REJECTED, confidence ≤ 0.50
Source code in src/domain/ai_summary/expansion_guardrails_service.py
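The Gate 1 thresholds above map directly from the flag count to a status and a confidence value. A sketch (the function name is illustrative; the thresholds and the 0.25-per-flag formula come from the docstring):

```python
def gate1_outcome(flags: list[str]) -> tuple[str, float]:
    """Map Gate 1 flag count to (guardrail_status, fact_confidence).

    0 flags -> PASSED (1.0); 1 flag -> FLAGGED (0.75); 2+ flags -> REJECTED (<= 0.50).
    """
    confidence = max(0.0, 1.0 - 0.25 * len(flags))
    if not flags:
        return "PASSED", confidence
    if len(flags) == 1:
        return "FLAGGED", confidence
    return "REJECTED", confidence
```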
Functions¶
validate(ae_tactical_brief: str, email_draft: str | None, expansion_result: object, propensity: float) -> ExpansionGuardrailResult
Run all three gates and return the validated/transformed result.
Business Context: Called by GenerateExpansionSummaryUseCase after every LLM call. Returns the final texts (with watermark and scrubbing) plus a result entity for audit logging and confidence scoring.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `ae_tactical_brief` | `str` | Raw LLM-generated AE brief. | *required* |
| `email_draft` | `str \| None` | Optional raw LLM-generated email draft (None for CSM). | *required* |
| `expansion_result` | `object` | ExpansionResult entity (provides top_features whitelist). | *required* |
| `propensity` | `float` | Calibrated upgrade propensity in [0, 1]. | *required* |

Returns:

| Type | Description |
|---|---|
| `ExpansionGuardrailResult` | ExpansionGuardrailResult with all transformations applied. |
Source code in src/domain/ai_summary/expansion_guardrails_service.py
Application Use Cases¶
src.application.use_cases.generate_executive_summary ¶
GenerateExecutiveSummaryUseCase – orchestrates the AI summary pipeline.
Fetches customer + prediction data, builds SummaryContext from DuckDB, calls the LLM via SummaryPort, and validates output with GuardrailsService.
Classes¶
GenerateSummaryRequest dataclass ¶
Input DTO for GenerateExecutiveSummaryUseCase.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `customer_id` | `str` | UUID of the active customer to summarise. | *required* |
| `audience` | `str` | 'csm' (tactical briefing) or 'executive' (revenue-focused). | `'csm'` |
Source code in src/application/use_cases/generate_executive_summary.py
GenerateExecutiveSummaryUseCase ¶
Generates an AI executive summary grounded in DuckDB customer data.
Business Context: Replaces ~15 min of manual CSM research with a 30-second API call. The output is grounded in the Phase 4 prediction pipeline (churn probability, SHAP drivers) and enriched with usage events, support tickets, and GTM signals from DuckDB.
Pipeline
- Fetch Customer entity (raises ValueError if not found / churned)
- Run PredictChurnUseCase to get calibrated probability + SHAP features
- Query DuckDB for events, tickets, GTM context
- Build SummaryContext (the RAG "retrieval" step)
- Call LLM via SummaryPort (no DB access in LLM layer)
- Validate output + append watermark via GuardrailsService
- Return ExecutiveSummary entity
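The seven pipeline steps above can be sketched as a single orchestration function. All collaborator names and shapes here are illustrative stand-ins (plain callables and dicts rather than the real repositories and entities):

```python
def generate_summary(customer_repo, predictor, usage_repo, llm, guardrails,
                     customer_id: str, audience: str) -> dict:
    """Illustrative orchestration of the pipeline: fetch, predict, retrieve,
    build context, call LLM, validate, return."""
    # 1. Fetch the customer; reject unknown or churned accounts.
    customer = customer_repo.get(customer_id)
    if customer is None or customer.get("churned"):
        raise ValueError(f"customer {customer_id} not found or churned")

    # 2-4. Calibrated prediction + DuckDB facts = the RAG "retrieval" step.
    prediction = predictor(customer_id)
    context = {
        "customer": customer,
        "prediction": prediction,
        "events": usage_repo.events_last_30d(customer_id),
    }

    # 5. The LLM layer sees only the assembled context — no DB access.
    raw = llm(context, audience)

    # 6. Validate and watermark before anything reaches a CS workflow.
    final_text, passed = guardrails(raw, context)

    # 7. Return the summary with provenance.
    return {"customer_id": customer_id, "content": final_text, "passed": passed}
```

Because every collaborator is injected, the whole pipeline can be exercised with stubs and no network or database.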
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `customer_repo` | `CustomerRepository` | Repository for fetching Customer entities. | *required* |
| `predict_use_case` | `PredictChurnUseCase` | PredictChurnUseCase for calibrated probability + SHAP. | *required* |
| `usage_repo` | `UsageRepository` | UsageRepository for event lookups. | *required* |
| `summary_service` | `SummaryPort` | SummaryPort implementation (Groq or Ollama). | *required* |
| `guardrails` | `GuardrailsService` | GuardrailsService for hallucination detection + watermark. | *required* |
Source code in src/application/use_cases/generate_executive_summary.py
Functions¶
Run the full AI summary pipeline for a single customer.
Business Context: All LLM calls are grounded in verified DuckDB data. The guardrail layer ensures hallucinated features or wrong probabilities are flagged before the summary reaches a CS workflow.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `request` | `GenerateSummaryRequest` | Contains customer_id and target audience. | *required* |

Returns:

| Type | Description |
|---|---|
| `ExecutiveSummary` | ExecutiveSummary with content, guardrail result, and provenance metadata. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the customer is not found or has already churned. |
Source code in src/application/use_cases/generate_executive_summary.py
src.application.use_cases.ask_customer_question ¶
AskCustomerQuestionUseCase – RAG chatbot for free-text questions about a customer.
Answers questions like "Why is this customer at risk?" by building a SummaryContext from DuckDB and passing it to the LLM with a strict grounding constraint.
Classes¶
AskCustomerRequest dataclass ¶
Input DTO for AskCustomerQuestionUseCase.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `customer_id` | `str` | UUID of the customer to ask about. | *required* |
| `question` | `str` | Free-text question from the CSM (5–500 characters). | *required* |
Source code in src/application/use_cases/ask_customer_question.py
AskCustomerResponse dataclass ¶
Output DTO for AskCustomerQuestionUseCase.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `customer_id` | `str` | UUID of the customer asked about. | *required* |
| `question` | `str` | The original question. | *required* |
| `answer` | `str` | LLM-generated answer with guardrail watermark. | *required* |
| `confidence_score` | `float` | 0–1; degrades with guardrail flags. | *required* |
| `guardrail_flags` | `list[str]` | List of detected violations (e.g. 'probability_mismatch'). | *required* |
| `scope_exceeded` | `bool` | True if the question couldn't be answered from available data. | *required* |
| `generated_at` | `datetime` | UTC timestamp of the response. | *required* |
| `model_used` | `str` | LLM model name for audit. | *required* |
| `llm_provider` | `str` | Provider name for audit. | *required* |
Source code in src/application/use_cases/ask_customer_question.py
AskCustomerQuestionUseCase ¶
Answers free-text questions about a customer using their DuckDB history as context.
Business Context: CSMs can ask questions like "Why is this customer at risk?" or "What support tickets are open?" and get answers grounded in real data. Questions outside available context return a 'scope_exceeded' flag rather than hallucinated answers — protecting CSM trust in the tool.
The underlying RAG strategy is context-stuffing: the customer's full history fits in Llama-3's 128k context window, so no vector database is required.
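Context-stuffing works here because the stuffed context stays far below the model's window, but a pre-flight size check is still prudent. A sketch using the common ~4-characters-per-token heuristic (the function, heuristic, and headroom figure are assumptions; a real implementation would use the model's tokenizer):

```python
def fits_context_window(context_text: str, question: str,
                        window_tokens: int = 128_000) -> bool:
    """Rough pre-flight check for the context-stuffing strategy.

    Estimates tokens at ~4 chars/token and reserves 1k tokens of headroom
    for the generated answer.
    """
    est_tokens = (len(context_text) + len(question)) // 4
    return est_tokens + 1_000 <= window_tokens
```

If this check ever fails for unusually large accounts, that would be the signal to move from context-stuffing to retrieval over a vector index.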
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `customer_repo` | `CustomerRepository` | Repository for fetching Customer entities. | *required* |
| `predict_use_case` | `PredictChurnUseCase` | PredictChurnUseCase for calibrated probability + SHAP. | *required* |
| `usage_repo` | `UsageRepository` | UsageRepository for event lookups. | *required* |
| `summary_service` | `SummaryPort` | SummaryPort with answer_question() method. | *required* |
| `guardrails` | `GuardrailsService` | GuardrailsService for hallucination detection + watermark. | *required* |
Source code in src/application/use_cases/ask_customer_question.py
Functions¶
Answer a free-text question about a customer.
Business Context: Retrieves full customer context from DuckDB, passes question + context to the LLM, validates the answer, and returns a structured response with scope_exceeded flag for out-of-context questions.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `request` | `AskCustomerRequest` | Contains customer_id and the question to answer. | *required* |

Returns:

| Type | Description |
|---|---|
| `AskCustomerResponse` | AskCustomerResponse with grounded answer and audit metadata. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the customer is not found or has already churned. |
Source code in src/application/use_cases/ask_customer_question.py
src.application.use_cases.generate_expansion_summary ¶
GenerateExpansionSummaryUseCase — orchestrates the expansion narrative pipeline.
Translates a high-propensity ExpansionResult into an AE tactical brief and optional email draft, validated by ExpansionGuardrailsService before returning.
Classes¶
PropensityTooLowError ¶
Bases: ValueError
Raised when propensity is below the API-layer minimum threshold (0.15).
Business Context: Accounts with propensity < 0.15 are not expansion candidates. Calling the LLM for these accounts wastes tokens and produces misleading briefs. The API layer maps this to HTTP 422 so callers know the account is not ready.
Source code in src/application/use_cases/generate_expansion_summary.py
GenerateExpansionSummaryRequest dataclass ¶
Input DTO for GenerateExpansionSummaryUseCase.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `customer_id` | `str` | UUID of the active customer to generate a brief for. | *required* |
| `audience` | `Literal['account_executive', 'csm']` | 'account_executive' (tactical brief + optional email) or 'csm' (nurture brief only; email_draft forced to None). | `'account_executive'` |
| `include_email_draft` | `bool` | If True and audience is 'account_executive', the response will include a 3-sentence email draft. | `False` |
Source code in src/application/use_cases/generate_expansion_summary.py
GenerateExpansionSummaryUseCase ¶
Generates a personalised AE brief grounded in the expansion propensity model.
Business Context: Reduces AE prep time from ~20 minutes to 30 seconds. Personalisation via SHAP signals drives 10–15% conversion lift vs generic outreach. The correlation_id in each result enables the data team to join brief quality (fact_confidence) to close rates in the V2 fine-tuning flywheel.
Pipeline
- Fetch Customer entity (raises ValueError if not found / churned)
- Run PredictExpansionUseCase → ExpansionResult
- Propensity < 0.15 → raise PropensityTooLowError (API → 422)
- Propensity < 0.35 → return "not ready" result without LLM call
- CSM audience override: force include_email_draft=False
- Build expansion prompt via PromptBuilder
- Call LLM via SummaryPort.generate_from_prompt()
- Validate + transform via ExpansionGuardrailsService
- Return ExpansionSummaryResult
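The two propensity gates at the top of the pipeline can be sketched as a small triage function. The 0.15 and 0.35 thresholds and the `PropensityTooLowError` name come from this section; the function name and tier labels are illustrative:

```python
class PropensityTooLowError(ValueError):
    """Propensity below the API-layer minimum (0.15) — mapped to HTTP 422."""


def triage_propensity(propensity: float) -> str:
    """Decide whether the pipeline proceeds to an LLM call."""
    if propensity < 0.15:
        # Not an expansion candidate; calling the LLM would waste tokens.
        raise PropensityTooLowError(f"propensity {propensity:.2f} is below 0.15")
    if propensity < 0.35:
        # Candidate, but not ready: return a result without any LLM call.
        return "not_ready"
    return "generate_brief"
```

Raising (rather than returning a status) for the sub-0.15 case lets the API layer translate the condition to a 422 without the use case knowing about HTTP.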
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `customer_repo` | `CustomerRepository` | Repository for fetching Customer entities. | *required* |
| `expansion_use_case` | `PredictExpansionUseCase` | PredictExpansionUseCase for propensity + SHAP. | *required* |
| `summary_service` | `SummaryPort` | SummaryPort implementation (Groq or Ollama). | *required* |
| `guardrails` | `ExpansionGuardrailsService` | ExpansionGuardrailsService for validation + watermark. | *required* |
Source code in src/application/use_cases/generate_expansion_summary.py
Functions¶
Run the full expansion narrative pipeline for a single customer.
Business Context: All LLM calls are grounded in verified model outputs (ExpansionResult SHAP features). The guardrail layer ensures hallucinated signals are caught before the brief reaches an AE's CRM.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `request` | `GenerateExpansionSummaryRequest` | Contains customer_id, audience, and email draft flag. | *required* |

Returns:

| Type | Description |
|---|---|
| `ExpansionSummaryResult` | ExpansionSummaryResult with brief, guardrail result, and provenance. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the customer is not found or has already churned. |
| `PropensityTooLowError` | If propensity < 0.15 (API maps this to 422). |
Source code in src/application/use_cases/generate_expansion_summary.py
Infrastructure — LLM Adapters¶
src.infrastructure.llm.groq_summary_service ¶
GroqSummaryService – primary LLM backend via Groq Cloud API.
Uses llama-3.1-8b-instant by default: fast inference, free tier, 128k context. Temperature is kept low (0.2) for factual grounding in production.
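A sketch of the request this adapter sends, using the settings described above (temperature 0.2, default model). The helper function and the system-prompt wording are illustrative; the `max_tokens=400` value comes from the `generate()` documentation later in this section:

```python
def build_chat_request(prompt: str, model: str = "llama-3.1-8b-instant") -> dict:
    """Chat-completion parameters for a factual, length-bounded summary."""
    return {
        "model": model,
        "messages": [
            # Low temperature + a grounding system message keep output factual.
            {"role": "system",
             "content": "Only reference facts present in the [CONTEXT] block."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
        "max_tokens": 400,  # enforces the 3-5 sentence length constraint
    }


# With the groq SDK installed and GROQ_API_KEY set, the call would look like:
#   from groq import Groq
#   client = Groq(api_key=os.environ["GROQ_API_KEY"])
#   resp = client.chat.completions.create(**build_chat_request(prompt))
#   raw_text = resp.choices[0].message.content
```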
Classes¶
GroqSummaryService ¶
Bases: SummaryPort
Implements SummaryPort using Groq Cloud inference API.
Business Context: Groq provides sub-3-second inference for Llama-3 models on free tier, making it cost-effective for per-request executive summaries. No infrastructure to manage — just an API key.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `api_key` | `str` | Groq API key (from GROQ_API_KEY environment variable). | *required* |
| `model` | `str` | Groq model ID. Defaults to 'llama-3.1-8b-instant' for speed; use 'llama-3.1-70b-versatile' for higher quality at higher cost. | `'llama-3.1-8b-instant'` |
Source code in src/infrastructure/llm/groq_summary_service.py
Attributes¶
property
¶Inference provider name reported in ExecutiveSummary provenance.
Functions¶
Call Groq API and return the raw LLM-generated narrative.
Business Context: Low temperature (0.2) keeps the output factual and consistent. max_tokens=400 enforces the 3-5 sentence length constraint for CSM briefings. Guardrails are applied by the caller.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `SummaryContext` | Structured facts from DuckDB that ground the prompt. | *required* |
| `audience` | `str` | 'csm' or 'executive' — controls tone and focus. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Raw LLM text string (no watermark; guardrails applied by caller). |
Source code in src/infrastructure/llm/groq_summary_service.py
Call Groq API with a pre-assembled prompt and return raw LLM text.
Business Context: Used by the expansion narrative pipeline where the full prompt is built by PromptBuilder.build_expansion_prompt() before the LLM call. max_tokens=600 accommodates brief + optional email draft.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | Fully assembled prompt string. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Raw LLM-generated text string (no watermark; guardrails applied by caller). |
Source code in src/infrastructure/llm/groq_summary_service.py
Answer a free-text question about a customer, constrained to DuckDB context.
Business Context: Used by the RAG chatbot endpoint. The prompt includes a strict [CONSTRAINT] block that prevents the LLM from fabricating answers to questions outside the available customer data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `SummaryContext` | All structured facts for the customer from DuckDB. | *required* |
| `question` | `str` | CSM's free-text question (5–500 chars). | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | LLM answer string, or the scope-exceeded sentinel phrase. |
Source code in src/infrastructure/llm/groq_summary_service.py
src.infrastructure.llm.ollama_summary_service ¶
OllamaSummaryService – local LLM backend via Ollama for offline/dev use.
Ollama runs as a Docker sidecar (dev profile). Zero API cost, fully offline. Latency target: < 15s p95 on CPU. Used when LLM_PROVIDER=ollama.
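The non-streaming call shape for Ollama's `/api/generate` endpoint can be sketched with the standard library alone. The payload fields (`model`, `prompt`, `stream`) and the `response` key match Ollama's documented API; the helper names are illustrative:

```python
import json
import urllib.request


def build_ollama_payload(prompt: str, model: str = "llama3.1:8b") -> dict:
    """Non-streaming generate request for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def call_ollama(prompt: str, host: str = "http://localhost:11434",
                timeout: float = 60.0) -> str:
    """Synchronous call sketch; requires a running Ollama instance."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_ollama_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # With stream=False, Ollama returns one JSON object whose
        # "response" field holds the full generated text.
        return json.loads(resp.read())["response"]
```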
Classes¶
OllamaSummaryService ¶
Bases: SummaryPort
Implements SummaryPort using a local Ollama instance.
Business Context: Ollama is the local fallback for development environments without Groq API access. It runs the same Llama-3 family model, ensuring prompt/response behaviour is consistent across environments.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `host` | `str` | Ollama API base URL. Defaults to http://localhost:11434. | `'http://localhost:11434'` |
| `model` | `str` | Ollama model tag. Defaults to 'llama3.1:8b'. | `'llama3.1:8b'` |
| `timeout` | `float` | HTTP timeout in seconds for the synchronous API call. | `60.0` |
Source code in src/infrastructure/llm/ollama_summary_service.py
Attributes¶
property
¶Inference provider name reported in ExecutiveSummary provenance.
Functions¶
Call local Ollama API and return the raw LLM-generated narrative.
Business Context: Ollama uses the /api/generate endpoint with stream=False for simplicity. The same prompt structure as Groq is used to ensure output quality is comparable across providers.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `SummaryContext` | Structured facts from DuckDB that ground the prompt. | *required* |
| `audience` | `str` | 'csm' or 'executive' — controls tone and focus. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Raw LLM text string (no watermark; guardrails applied by caller). |

Raises:

| Type | Description |
|---|---|
| `HTTPError` | If the Ollama service is unreachable. |
Source code in src/infrastructure/llm/ollama_summary_service.py
Call local Ollama API with a pre-assembled prompt and return raw LLM text.
Business Context: Used by the expansion narrative pipeline. Delegates to _call_ollama with the system prompt prepended for consistent behaviour across providers.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | Fully assembled prompt string. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Raw LLM-generated text string (no watermark; guardrails applied by caller). |
Source code in src/infrastructure/llm/ollama_summary_service.py
Answer a free-text question using local Ollama inference.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `SummaryContext` | All structured facts for the customer from DuckDB. | *required* |
| `question` | `str` | CSM's free-text question. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | LLM answer string constrained to available context. |
Source code in src/infrastructure/llm/ollama_summary_service.py
src.infrastructure.llm.prompt_builder ¶
PromptBuilder – assembles structured prompts from SummaryContext.
Every prompt contains a [CONTEXT] block with ONLY the facts from DuckDB. The system prompt explicitly constrains the LLM to reference only those facts. This is the primary hallucination-prevention mechanism at the prompt level.
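The three-block prompt structure ([CONTEXT], [INSTRUCTION], [CONSTRAINT]) can be sketched as follows. The block labels come from this section; the function name, fact formatting, and constraint wording are illustrative:

```python
def build_prompt(facts: dict[str, object], instruction: str) -> str:
    """Assemble a grounded three-block prompt from verified facts only."""
    context_lines = "\n".join(f"- {key}: {value}" for key, value in facts.items())
    return (
        "[CONTEXT]\n" + context_lines + "\n\n"
        "[INSTRUCTION]\n" + instruction + "\n\n"
        "[CONSTRAINT]\n"
        "Only reference facts listed in [CONTEXT]. "
        "Do not invent figures, feature names, or events."
    )
```

Keeping the constraint in the prompt itself is the first line of defence; the guardrail services described earlier verify the output independently, so a prompt-level failure is still caught downstream.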
Classes¶
PromptBuilder ¶
Builds grounded prompts for the LLM from structured SummaryContext data.
Business Context: Prompt structure is the first line of defence against hallucination. By providing a [CONTEXT] block with only verified facts and a [CONSTRAINT] that explicitly forbids extrapolation, we reduce the LLM's tendency to invent figures or feature names.
Source code in src/infrastructure/llm/prompt_builder.py
Functions¶
Assemble a summary generation prompt for the given audience.
Business Context: CSM prompts focus on actionable tactics; executive prompts focus on revenue impact and ROI of intervention. Both include the same [CONTEXT] block so the same facts ground both narratives.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `SummaryContext` | All verified facts from DuckDB for this customer. | *required* |
| `audience` | `str` | 'csm' for Customer Success Manager, 'executive' for VP/C-suite. | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Complete prompt string ready to send to the LLM. |
Source code in src/infrastructure/llm/prompt_builder.py
build_expansion_prompt(expansion_result: ExpansionResult, audience: str, include_email_draft: bool = False) -> str
Assemble an expansion narrative prompt grounded in ExpansionResult facts.
Business Context: Only injects facts from expansion_result.to_summary_context() and the top 3 SHAP features. The LLM cannot reference signals outside this set — primary hallucination-prevention at the prompt level.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `expansion_result` | `ExpansionResult` | ExpansionResult entity from the expansion pipeline. | *required* |
| `audience` | `str` | 'account_executive' (tactical brief + optional email) or 'csm' (nurture brief only, no email). | *required* |
| `include_email_draft` | `bool` | If True AND audience is 'account_executive', append email draft instructions. | `False` |

Returns:

| Type | Description |
|---|---|
| `str` | Complete prompt string ready to send to the LLM. |
Source code in src/infrastructure/llm/prompt_builder.py
Assemble a Q&A prompt that constrains answers to available customer data.
Business Context: The RAG chatbot answers free-text questions from CSMs about a specific customer. The [CONSTRAINT] block prevents the LLM from fabricating information not present in the customer's DuckDB history.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `context` | `SummaryContext` | All verified facts from DuckDB for this customer. | *required* |
| `question` | `str` | The CSM's free-text question (5–500 characters). | *required* |

Returns:

| Type | Description |
|---|---|
| `str` | Complete prompt string ready to send to the LLM. |