We don't just use AI — we encode our founding team's decades of industry experience into the connective tissue between foundation models, real-time market data, and proprietary trade intelligence. Here's how the stack works.
Six model providers, orchestrated into one collaborative research system. Different cognitive tasks (reasoning, visual parsing, mathematical verification, text preprocessing) are routed to the best-fit model, and their outputs converge into a single analytical pipeline.
Open-source. Serves general-purpose local inference and data sovereignty needs. A Mixture-of-Experts (MoE) architecture with long context; sensitive financial data runs on local instances, with no API calls, no data transmission, and a full audit trail. Also provides fallback inference during API outages.
Open-source. A reasoning model trained largely through reinforcement learning, reported to match o1 on math benchmarks. Deployed locally for quantitative verification: valuation auditing, hypothesis testing, and scenario probability calculations (the V4 preview extends the context window further).
Open-source. A lightweight local model for preprocessing, classification, and embedding. It handles news filtering, sentiment tagging, and metadata extraction so that the compute budget stays concentrated on the analytical core.
Proprietary. The o3 reasoning model handles multi-step math verification: option pricing, probability-weighted scenario analysis, sensitivity calculations. GPT-5.5 for multimodal parsing — earnings slides with embedded charts, scanned regulatory filings. Computation and parsing, not prose.
Proprietary. Primary reasoning engine, with extended thinking for auditable reasoning chains. Its 1M-token context window processes a full annual report plus multiple sell-side reports in a single pass. Precision at critical decision points.
Open-weight. Preferred for European compliance scenarios and strong in multilingual work. A single dense model combining reasoning, multimodal, and code abilities with a 256K-token context window, suited to medium-complexity batch analysis tasks.
Through MCP (Model Context Protocol), the models query data sources autonomously. A four-layer denoising and cleaning process ensures that data reaching the reasoning models is clean, aligned, and free of look-ahead leakage.
China market data core: A-shares, Hong Kong equities, futures, funds, and macro indicators. Open-source and API-native, queried by AI in real time via MCP.
Global equities, fixed income, derivatives, ESG data. Cross-border comparisons require normalization across different accounting standards and trading calendars.
Four-layer cleaning: random matrix theory (RMT) covariance denoising → delisting (survivorship) bias correction → outlier detection (distinguishing flash crashes from bad ticks) → cross-border temporal alignment.
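The first cleaning layer can be illustrated with eigenvalue clipping, one common RMT denoising technique. This is a minimal sketch that assumes the Marchenko–Pastur clipping variant, because the text does not say which RMT method the pipeline uses. The function names are hypothetical. Eigenvalues of the sample correlation matrix that fall below the Marchenko–Pastur upper edge are indistinguishable from noise at this sample size. They are replaced by their average, which leaves the trace unchanged.

```python
import numpy as np


def denoise_correlation(returns: np.ndarray) -> np.ndarray:
    """Eigenvalue-clipping sketch (hypothetical helper, not the production code).

    returns: T x N array of asset returns (T observations, N assets), T > N.
    """
    t, n = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)

    # Marchenko-Pastur upper edge for a pure-noise correlation matrix
    # with aspect ratio q = N / T.
    q = n / t
    lambda_max = (1 + np.sqrt(q)) ** 2

    # Replace "noise" eigenvalues by their mean; this preserves the trace (= N).
    eigvals = eigvals.copy()
    noise = eigvals < lambda_max
    if noise.any():
        eigvals[noise] = eigvals[noise].mean()

    cleaned = eigvecs @ np.diag(eigvals) @ eigvecs.T
    # Rescale so the result is a proper correlation matrix (unit diagonal).
    d = np.sqrt(np.diag(cleaned))
    return cleaned / np.outer(d, d)


def denoise_covariance(returns: np.ndarray) -> np.ndarray:
    """Recombine the cleaned correlation with the sample volatilities."""
    std = returns.std(axis=0, ddof=1)
    return denoise_correlation(returns) * np.outer(std, std)
```

Clipping keeps the large "market" and sector eigenvalues intact and flattens everything inside the noise band. Covariance estimates from short windows otherwise tend to overstate how well diversified a portfolio is.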
Strict point-in-time discipline: the system only uses information that was knowable at each historical moment. Cross-border accounting is normalized (CAS vs US GAAP vs IFRS), with ambiguous cases flagged for human review.
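Point-in-time discipline amounts to an "as-of" lookup keyed on when a number became public, not on the period it describes. Below is a minimal sketch under that assumption. The `Fact` record, its field names, and the `XYZ` ticker are hypothetical. Restatements are stored as separate records, so a backtest dated before a restatement still sees the figure as originally reported.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Fact:
    ticker: str
    field: str
    value: float
    period_end: date  # fiscal period the number describes
    published: date   # first date the market could have known the number


def as_of(facts: Iterable[Fact], ticker: str, field: str, on: date,
          period_end: Optional[date] = None) -> Optional[float]:
    """Latest value knowable on date `on`; None if nothing was public yet."""
    known = [
        f for f in facts
        if f.ticker == ticker and f.field == field and f.published <= on
        and (period_end is None or f.period_end == period_end)
    ]
    if not known:
        return None
    # Most recent period first; within a period, the latest publication
    # (e.g. a restatement) wins, but only once it has been published.
    return max(known, key=lambda f: (f.period_end, f.published)).value
```

Suppose Q4 revenue is first reported as 100 and later restated to 95. A query dated between the two publications returns 100, and only queries dated after the restatement see 95.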
Automatically collects sell-side reports, news, academic papers, and social signals, which are then parsed, deduplicated, tagged with metadata, and vectorized for semantic retrieval.
Connecting models, data sources, and tools into repeatable automated pipelines. Each workflow is a directed acyclic graph — signal detection triggers context assembly, context feeds multi-model analysis.
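A workflow of this kind maps directly onto a topological sort. The sketch below uses Python's standard-library `graphlib`. The step names and the `steps` callables are hypothetical stand-ins for the real signal detector and model calls. Steps returned together by `get_ready()` do not depend on each other, so those are the ones that could be dispatched in parallel.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any, Callable, Dict, Set

# Hypothetical workflow: each step maps to the set of steps it depends on.
WORKFLOW: Dict[str, Set[str]] = {
    "context_assembly": {"signal_detection"},
    "analysis_claude": {"context_assembly"},
    "analysis_o3": {"context_assembly"},
    "merge": {"analysis_claude", "analysis_o3"},
}


def run(workflow: Dict[str, Set[str]],
        steps: Dict[str, Callable[[Dict[str, Any]], Any]]) -> Dict[str, Any]:
    """Execute steps in dependency order, passing each its upstream outputs.

    prepare() raises CycleError if the workflow is not acyclic.
    """
    sorter = TopologicalSorter(workflow)
    sorter.prepare()
    results: Dict[str, Any] = {}
    while sorter.is_active():
        # Everything in this batch is independent and could run concurrently;
        # this sketch runs the batch sequentially for simplicity.
        for name in sorter.get_ready():
            upstream = {dep: results[dep] for dep in workflow.get(name, set())}
            results[name] = steps[name](upstream)
            sorter.done(name)
    return results
```

Declaring each workflow as data rather than code makes it easy to validate before execution, for example rejecting any graph that contains a cycle.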
Cross-border finance demands specific cloud architecture: data sovereignty compliance, cross-region latency optimization, and multi-cloud redundancy.
Global infrastructure backbone. Bedrock for managed Claude deployment, Data Exchange for institutional market feeds, cross-region data redundancy.
BigQuery for large-scale market data analysis and backtesting. Vertex AI for custom model training. Cloud Run for serverless MCP server deployment.
Edge-first deployment. Pages for global static delivery, Workers for edge compute, R2 for egress-free object storage. The public-facing research platform runs entirely on Cloudflare's edge network.
Every code change triggers: build verification, content compliance scan, visual regression testing, staged deployment. Environment configuration is version-controlled, auditable, and reproducible. Rollback to any previous state in under 60 seconds.
A single AI cannot handle the full research lifecycle. Nine production agents each operate in an isolated context, with roles that include analysis, writing, challenging, fact-checking, and compliance review.
Every research task follows a three-phase cycle. Planning iterates through multiple reviews, execution runs in parallel, verification enforces deterministic quality gates. No output reaches publication without passing all three phases.
The analyst agent accesses data but cannot publish. The compliance agent flags violations but cannot modify analysis. The editor refines language but cannot change conclusions. Separation of concerns prevents unchecked error propagation.
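One way to enforce this separation is a deny-by-default permission table that is checked before every agent action. Below is a minimal sketch. The role names, the `Action` enum, and `authorize` are hypothetical. In a real deployment the check would sit in the tool layer the agents call, not in the agents' own code.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet


class Action(Enum):
    READ_DATA = auto()
    WRITE_ANALYSIS = auto()
    CHANGE_CONCLUSION = auto()
    FLAG_VIOLATION = auto()
    EDIT_LANGUAGE = auto()
    PUBLISH = auto()


# Hypothetical table mirroring the roles above; anything not listed is denied.
# No role here holds PUBLISH: in this sketch, publication is gated elsewhere.
PERMISSIONS: Dict[str, FrozenSet[Action]] = {
    "analyst": frozenset({Action.READ_DATA, Action.WRITE_ANALYSIS,
                          Action.CHANGE_CONCLUSION}),
    "compliance": frozenset({Action.READ_DATA, Action.FLAG_VIOLATION}),
    "editor": frozenset({Action.EDIT_LANGUAGE}),
}


class PermissionDenied(Exception):
    pass


def authorize(role: str, action: Action) -> None:
    """Raise unless `role` is explicitly granted `action` (deny by default)."""
    if action not in PERMISSIONS.get(role, frozenset()):
        raise PermissionDenied(f"{role} may not {action.name}")
```

Deny-by-default means that an unknown role, or a new action nobody has thought about yet, is blocked until someone grants it explicitly.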
For every bullish thesis, a dedicated agent builds the bear case — not a token objection, but systematic dismantling. The analyst must address each adversarial point with evidence before the thesis advances.
Every mistake is recorded as a permanent rule, and the next session loads the accumulated rules before starting work. Over weeks, the error rate drops measurably: institutional memory encoded as explicit constraints.
Every thesis starts with a prior probability and a falsification condition. New evidence updates confidence through Bayesian inference. Stale theses decay automatically. We track, update, and retire systematically.
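In odds form, Bayes' rule makes each update a single multiplication: posterior odds equal prior odds times the likelihood ratio P(E | thesis) / P(E | not thesis). Below is a minimal sketch. The text does not describe how the system estimates likelihood ratios, so the numbers are illustrative and the function names are hypothetical.

```python
from typing import Iterable


def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Return P(thesis | evidence) given P(thesis) and the likelihood ratio
    P(evidence | thesis) / P(evidence | not thesis)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def update_sequence(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """Apply several pieces of evidence in turn (assumes they are
    conditionally independent given the thesis)."""
    p = prior
    for lr in likelihood_ratios:
        p = bayes_update(p, lr)
    return p
```

For example, a thesis held at 0.60 that meets evidence twice as likely under the thesis as under its negation moves to 0.75. A subsequent piece of evidence with ratio 0.5 brings it back to 0.60.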
Every new thesis enters with an explicit probability — not "bullish" or "bearish," but a calibrated confidence level. You cannot hold a view without quantifying your confidence in it.
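"Calibrated" has a measurable meaning: among theses stated at 70%, roughly 70% should turn out right. Below is a minimal sketch of how resolved theses could be scored, using the Brier score and a binned calibration table. The text does not say which scoring rule the system uses, so both choices are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Forecast = Tuple[float, int]  # (stated probability, outcome: 1 = thesis held, 0 = not)


def brier_score(forecasts: Iterable[Forecast]) -> float:
    """Mean squared error between stated probabilities and outcomes (lower is better)."""
    pairs = list(forecasts)
    if not pairs:
        raise ValueError("need at least one resolved forecast")
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def calibration_table(forecasts: Iterable[Forecast], n_bins: int = 10
                      ) -> Dict[int, Tuple[float, float, int]]:
    """Per probability bin: (mean stated probability, observed hit rate, count)."""
    bins: Dict[int, List[Forecast]] = defaultdict(list)
    for p, o in forecasts:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    table = {}
    for idx in sorted(bins):
        items = bins[idx]
        mean_p = sum(p for p, _ in items) / len(items)
        hit_rate = sum(o for _, o in items) / len(items)
        table[idx] = (mean_p, hit_rate, len(items))
    return table
```

A well-calibrated forecaster's table shows hit rates close to the mean stated probability in every bin. Bins where the two diverge show where stated confidence is systematically too high or too low.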
New data does not replace the thesis — it updates it. Each update logs the evidence, reasoning, and magnitude of adjustment. The thesis becomes a living document with complete evidential history.
Every thesis must define its own death condition before publication — pre-committed exit criteria. If you cannot articulate what would prove you wrong, your thesis is not rigorous enough to publish.
A thesis without recent evidence runs on inertia. The system automatically decreases its confidence, flagging it for review. Prevents zombie theses — views valid six months ago but unexamined since.
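One simple way to implement automatic decay is to shrink a thesis's log-odds toward zero (a 50/50 view) with a fixed half-life, and to flag anything that has gone without fresh evidence past a cutoff. Below is a minimal sketch. The 90-day half-life and 60-day review threshold are hypothetical parameters, not values from the system. Because the decay acts on conviction, a stale bearish view at 30% also drifts toward 50%.

```python
import math
from datetime import date


def decayed_confidence(p: float, last_evidence: date, today: date,
                       half_life_days: float = 90.0) -> float:
    """Shrink conviction toward 0.5 as evidence ages.

    The log-odds halve every `half_life_days` (hypothetical default), so a
    stale 80% view drifts toward 50% rather than flipping sign.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    age = (today - last_evidence).days
    if age <= 0:
        return p
    log_odds = math.log(p / (1.0 - p)) * 0.5 ** (age / half_life_days)
    return 1.0 / (1.0 + math.exp(-log_odds))


def needs_review(last_evidence: date, today: date, max_age_days: int = 60) -> bool:
    """Flag theses with no new evidence for longer than `max_age_days` (hypothetical)."""
    return (today - last_evidence).days > max_age_days
```

With these parameters, an 80% thesis that receives no new evidence for one half-life falls to about 67%, because its odds drop from 4:1 to 2:1.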
Single-model output is never the final word. Every conclusion must survive adversarial review, cross-model verification, source independence checks, and quantitative stress testing before earning publication status.
When two models analyze the same question and reach different conclusions, the system flags the disagreement for human resolution. Model disagreement is information, not noise.
Three supporting pieces of evidence may look strong — until you realize they share a common origin. The system traces provenance and flags non-independent evidence clusters.
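Provenance tracing can be sketched as following each item's "derived from" links back to its original source and grouping items that share a root. For example, a news story and a blog post might both trace back to the same management call. This minimal sketch assumes each item records a single upstream parent. Real provenance can form a graph with several parents, which would need a full graph traversal instead. The source names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def root_origin(item: str, derived_from: Dict[str, str]) -> str:
    """Follow single-parent 'derived from' links back to the original source."""
    seen = set()
    while item in derived_from:
        if item in seen:
            raise ValueError(f"provenance cycle at {item!r}")
        seen.add(item)
        item = derived_from[item]
    return item


def dependent_clusters(evidence: Iterable[str],
                       derived_from: Dict[str, str]) -> Dict[str, List[str]]:
    """Group evidence by shared root; only groups with 2+ items are returned,
    since those pieces are not independent confirmations."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for e in evidence:
        groups[root_origin(e, derived_from)].append(e)
    return {root: items for root, items in groups.items() if len(items) > 1}
```

Suppose a news item cites a broker note, the broker note and a blog post both come from one management call, and a customs dataset is standalone. The result is one flagged cluster rooted at the call and one independent piece of evidence, not three independent confirmations.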
Key assumptions are perturbed and the system reports sensitivity bands. High-sensitivity conclusions get robustness warnings. Low-sensitivity conclusions earn higher confidence.
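A sensitivity band can be computed by re-running a valuation on every combination of perturbed inputs and reporting the resulting spread. Below is a minimal sketch that uses a Gordon growth model as a stand-in for whatever valuation is being stress-tested. The input values, bump sizes, and the 30% warning threshold are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

ROBUSTNESS_THRESHOLD = 0.30  # hypothetical: band wider than 30% of base value


def gordon_value(cash_flow: float, discount_rate: float, growth: float) -> float:
    """Stand-in valuation: current cash flow grown one year, capitalized at (r - g)."""
    if discount_rate <= growth:
        raise ValueError("discount rate must exceed growth")
    return cash_flow * (1 + growth) / (discount_rate - growth)


def sensitivity_band(model: Callable[..., float], base: Dict[str, float],
                     bumps: Dict[str, List[float]]) -> Tuple[float, float, float, float]:
    """Evaluate `model` on every combination of additive bumps to the inputs.

    Returns (base value, low, high, band width relative to base value).
    """
    names = list(bumps)
    values = []
    for shifts in product(*(bumps[n] for n in names)):
        inputs = dict(base)
        for name, shift in zip(names, shifts):
            inputs[name] = base[name] + shift
        values.append(model(**inputs))
    base_value = model(**base)
    low, high = min(values), max(values)
    return base_value, low, high, (high - low) / abs(base_value)
```

With a 9% discount rate and 3% growth, bumping the rate by ±1 point and growth by ±0.5 points moves the value from about 1,367 to 2,300 around a base of about 1,717. That band is far wider than 30%, so this conclusion would carry a robustness warning.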
When our internal thesis diverges significantly from sell-side consensus, the system triggers a structured review. The best research comes from understood disagreement, not accidental disagreement.
Model-agnostic does not mean indifferent. Each cognitive mode has a computational form that fits it best. We use multiple models not for cost optimization — but because different cognitive tasks need different computational architectures.
| Cognitive Mode | Core Question | Model Assignment | Why This Model |
|---|---|---|---|
| Bayesian Inference | How much should new evidence shift our confidence? | Claude Opus 4.8 | Longest context preserves full prior-posterior chains |
| Causal Reasoning | Did A cause B, or did they just co-occur? | Claude Opus 4.8 + o3 cross-check | Claude builds causal narratives; o3 verifies counterfactual math |
| Falsification | Under what conditions does this thesis die? | Adversarial Agent (Claude Sonnet 4.6) | Dedicated adversary needs speed and volume, not maximum depth |
| Information Theory | How much genuine signal does this new data carry? | Embedding Models | Vector distance measures novelty — far = high information gain |
| Econometrics | Does this statistical relationship survive controls? | o3 / o4-mini | Pure math — regression, IV, hypothesis testing |
| Behavioral Finance | Is consensus rational or sentiment-driven? | Gemini 3.5 Flash + Claude | Gemini scans sentiment at scale; Claude judges if deviation is actionable |
| Complex Systems | How do micro behaviors emerge as macro patterns? | Claude Opus 4.8 (long context) | Must hold multiple layers simultaneously |
| Monte Carlo Simulation | What does the probability distribution look like? | o3 + Local Llama 4 / DeepSeek-R1 | o3 for probability math; Llama runs large-batch simulations locally |
| Adversarial Thinking | Where is the weakest link if someone attacks this thesis? | Two Claude instances in opposition | One builds thesis, one dismantles — neither sees the other's reasoning |
| Meta-cognition | What systematic biases exist in our analytical process? | Institutional Memory Protocol | Not model inference — system-level accumulated error patterns correct the pipeline |
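The "vector distance as novelty" heuristic in the Information Theory row can be sketched as the cosine distance from a new item's embedding to its nearest previously seen embedding. This is a proxy for information gain, not Shannon information in the strict sense. The toy two-dimensional vectors are hypothetical; in practice they would come from whichever embedding model the pipeline uses.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for identical direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / (norm_a * norm_b)


def novelty(new_vec: Sequence[float], seen: Sequence[Sequence[float]]) -> float:
    """Distance to the nearest already-seen item; high means new information.

    With nothing seen yet, return the maximum cosine distance (2.0).
    """
    if not seen:
        return 2.0
    return min(cosine_distance(new_vec, v) for v in seen)
```

Using the nearest neighbor, not the average, means a story that closely repeats any single earlier item scores as low-novelty, even if it differs from everything else in the archive.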
This is not a bibliography. It is an intellectual genome — every design decision in our system traces back to a specific intellectual tradition.
Every system component rests on an established academic or technical tradition. These are the foundational works that shaped our analytical framework.
Bayes (1763) → Black-Litterman (1992) → Tetlock (2015). From the original probability theorem, to blending investor views with market equilibrium, to calibrated forecasting: the mathematical foundation of our Thesis Tracking system.
Granger (1969) → Rubin (1974) → Pearl (2009). From time-series causality to potential outcomes to causal graphs — the full toolkit for distinguishing correlation from causation.
Popper (1934) → Lakatos (1978) → Mayo (2018). Falsifiability as demarcation, the resilience of research programmes, and severe testing: designing tests that would probably have exposed an error if the claim were false.
Knight (1921) → Markowitz (1952) → Taleb (2007, 2012). From the fundamental risk-uncertainty distinction to mean-variance to fat tails and antifragility — stress testing is non-negotiable.
Simon (1955) → Miller (1956) → Kahneman (1979, 2011). Bounded rationality, working memory limits, systematic biases — cognitive constraints shared by humans and AI that drive our architecture.
Shannon (1948) → Vaswani (2017) → Anthropic MCP (2024). From information theory through Transformers (and, along the way, RAG and Constitutional AI) to the Model Context Protocol: the roots of every stack layer.
Codd (1970) → Fama-French (1992) → Wickham (2014). The relational model; CRSP/Compustat sample-construction conventions, such as lagging accounting data until it is public; and tidy data principles: the academic skeleton of our cross-market data layer.
MemGPT (2023) → TradingAgents (2024) → FactorMAD (2025). From dual-tier memory to multi-agent trading to factor mining debate — validating AI-assisted judgment, not AI-replaced judgment.
Investment research that only covers equities misses half the picture. Our infrastructure extends to derivatives, fixed income, foreign exchange, and structured products — each with purpose-matched models.
Commodity futures, equity index futures, interest rate futures, and swaps. We monitor term structure dynamics (contango, backwardation, roll yield) as structural signals. We also track basis, warehouse receipts, and shipping data to distinguish speculation from fundamental shifts.
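Contango and backwardation come down to comparing adjacent contracts on the futures curve. Below is a minimal sketch of one common approximation: the annualized log spread between a near contract and a deferred one. Conventions differ (simple vs log returns, day counts). The prices and the 0.1% flatness tolerance are hypothetical.

```python
import math


def annualized_roll_yield(near: float, far: float, days_between: int) -> float:
    """Annualized log roll yield from rolling a long position near -> far.

    Positive in backwardation (far < near), negative in contango (far > near).
    """
    if near <= 0 or far <= 0 or days_between <= 0:
        raise ValueError("prices and day gap must be positive")
    return math.log(near / far) * 365.0 / days_between


def curve_state(near: float, far: float, tolerance: float = 0.001) -> str:
    """Classify the front of the curve, ignoring spreads inside `tolerance`."""
    spread = far / near - 1.0
    if spread > tolerance:
        return "contango"
    if spread < -tolerance:
        return "backwardation"
    return "flat"
```

For example, a front contract at 80 and a second contract 30 days later at 82 is in contango. Rolling a long position through it costs roughly 30% annualized.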
Implied volatility surfaces, skew dynamics, term structure, and vol arbitrage signals. Beyond Black-Scholes — we apply Heston stochastic volatility and jump-diffusion to capture fat-tailed risk. The vol surface itself is a data source: skew changes signal institutional hedging demand.
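Heston and jump-diffusion calibration are beyond a short example. The first step of any surface analysis, converting option prices into Black-Scholes implied volatilities, is compact, though. This minimal sketch covers European options with no dividends. It inverts the Black-Scholes price by bisection, which works because the price is strictly increasing in volatility. Skew can then be read as the gap between an out-of-the-money put's implied vol and the at-the-money level. The strikes, rates, and vols are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(spot: float, strike: float, t: float, rate: float, vol: float,
             call: bool = True) -> float:
    """Black-Scholes price of a European option on a non-dividend-paying asset."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    if call:
        return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)
    return strike * math.exp(-rate * t) * norm_cdf(-d2) - spot * norm_cdf(-d1)


def implied_vol(price: float, spot: float, strike: float, t: float, rate: float,
                call: bool = True, lo: float = 1e-4, hi: float = 5.0,
                tol: float = 1e-8) -> float:
    """Invert bs_price by bisection (price is increasing in vol)."""
    if not (bs_price(spot, strike, t, rate, lo, call) <= price
            <= bs_price(spot, strike, t, rate, hi, call)):
        raise ValueError("price outside the range spanned by [lo, hi] vol")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_price(spot, strike, t, rate, mid, call) > price:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For example, a 90-strike put priced at 32% vol inverts back to about 0.32. If the at-the-money vol is 25%, the put skew is about 7 vol points. Repeating the inversion across strikes and expiries yields the surface that the stochastic-volatility models are then fitted to.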
Government bonds, corporate bonds, commercial paper, and CDs. Yield curve modeling is the analytical backbone, decomposing the curve into level, slope, and curvature factors. Credit spread analysis layers fundamental assessment with CDS-implied default probabilities. For China, we track the PBOC's open market operations (OMOs), the Medium-term Lending Facility (MLF), and the Loan Prime Rate (LPR).
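The level/slope/curvature decomposition is commonly done with the Nelson-Siegel model, y(τ) = β0 + β1·(1 − e^(−λτ))/(λτ) + β2·[(1 − e^(−λτ))/(λτ) − e^(−λτ)]. With λ held fixed, the three loadings are known functions of maturity, so the factors can be estimated by ordinary least squares. This is a minimal sketch. λ = 0.7 (maturities in years) is an assumed value, and the example fits a synthetic curve, not market data.

```python
import numpy as np


def ns_loadings(maturities, lam: float = 0.7) -> np.ndarray:
    """Columns: level (1), slope, and curvature loadings at each maturity (years > 0)."""
    tau = np.asarray(maturities, dtype=float)
    if np.any(tau <= 0):
        raise ValueError("maturities must be positive")
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    return np.column_stack([np.ones_like(tau), slope, curvature])


def fit_nelson_siegel(maturities, yields, lam: float = 0.7) -> np.ndarray:
    """Least-squares (beta0, beta1, beta2) = (level, slope, curvature) for fixed lambda."""
    design = ns_loadings(maturities, lam)
    betas, *_ = np.linalg.lstsq(design, np.asarray(yields, dtype=float), rcond=None)
    return betas
```

β0 sets the long-end level, β1 loads mainly on the short end (so a negative β1 gives an upward-sloping curve), and β2 captures the hump in the middle maturities. Tracking the three betas over time turns a whole curve into three interpretable series.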
G10 and EM currency pairs, NDFs, and currency swaps. Three-layer framework: short-term (positioning, sentiment, flows), medium-term (rate differentials, carry, policy divergence), structural (PPP, current account, reserve currency dynamics). For USD/CNY, we monitor PBOC fixing signals.
The most valuable signals emerge between asset classes. We monitor stock-bond correlations, commodity-currency linkages, and credit-equity divergences. Regime detection identifies when correlation structures break — these transitions concentrate the highest-conviction opportunities.
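A simple stand-in for regime detection is to track a rolling correlation and flag the points where it crosses a threshold. With a threshold of zero, that is a sign flip in the stock-bond correlation. Below is a minimal sketch. The window length and threshold are hypothetical, and formal regime-switching models (for example Markov-switching) are not shown.

```python
import numpy as np


def rolling_corr(a, b, window: int) -> np.ndarray:
    """Trailing-window Pearson correlation; NaN until a full window exists."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = np.full(len(a), np.nan)
    for end in range(window, len(a) + 1):
        out[end - 1] = np.corrcoef(a[end - window:end], b[end - window:end])[0, 1]
    return out


def regime_breaks(corr, threshold: float = 0.0) -> list:
    """Indices where the rolling correlation crosses `threshold`
    (with the default, a sign flip in the correlation)."""
    breaks = []
    prev_side = None
    for i, c in enumerate(corr):
        if np.isnan(c):
            continue
        side = bool(c > threshold)
        if prev_side is not None and side != prev_side:
            breaks.append(i)
        prev_side = side
    return breaks
```

Because the window trails, a break is reported only after enough post-change observations accumulate to move the correlation across the threshold. Shorter windows react faster but flag more false breaks.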
Convertible bonds, ABS, CLOs, and linked notes. Decomposition into component parts — a convertible is simultaneously a bond, call option, and credit instrument. Cash flow waterfall modeling, prepayment risk, subordination, and trigger events.
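The "bond plus call option" view can be made concrete. At maturity a convertible pays the larger of its face value and its conversion value, which equals face value plus the conversion ratio times a call struck at the conversion price. This minimal sketch relies on strong simplifying assumptions: European exercise, no dividends, no issuer call or investor put, and no dilution. Credit risk enters only by discounting the bond floor at the risk-free rate plus a spread. All inputs are hypothetical.

```python
import math
from typing import Tuple


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, t: float, rate: float, vol: float) -> float:
    """Black-Scholes European call, no dividends."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)


def bond_floor(face: float, coupon_rate: float, years: int, yield_rate: float) -> float:
    """PV of annual coupons and principal at a credit-adjusted yield."""
    coupon = face * coupon_rate
    pv = sum(coupon / (1 + yield_rate) ** k for k in range(1, years + 1))
    return pv + face / (1 + yield_rate) ** years


def convertible_value(face: float, coupon_rate: float, years: int, risk_free: float,
                      credit_spread: float, spot: float, conversion_ratio: float,
                      vol: float) -> Tuple[float, float, float]:
    """Return (bond floor, conversion option, total) under the assumptions above."""
    floor = bond_floor(face, coupon_rate, years, risk_free + credit_spread)
    conversion_price = face / conversion_ratio
    option = conversion_ratio * bs_call(spot, conversion_price, years, risk_free, vol)
    return floor, option, floor + option
```

Splitting the value this way shows which risk dominates. When the stock trades far below the conversion price, the floor (and therefore the credit spread) drives the price. Deep in the money, the option leg does.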
Every layer above (model orchestration, MCP data access, multi-cloud deployment) is not a demo. It ships published market research every day. We're looking for infrastructure partners who can provide compute and model access: you get evaluation under real load and an on-the-record deployment; we get the resources to push the research further.
Inference calls, context length, multi-model routing, cross-region latency — all happen inside a continuously running research pipeline, not a benchmark script.
The output is published openly and on the record. The effect of your model and cloud capabilities is judged by independent readers, not internal metrics.
On resource access, model evaluation, or wiring your capabilities into this pipeline — we are open to a concrete technical conversation.