Research Infrastructure

Decades of knowledge, encoded into AI-native infrastructure

We don't just use AI — we encode our founding team's decades of industry experience into the connective tissue between foundation models, real-time market data, and proprietary trade intelligence. Here's how the stack works.

Models

Multiple models, one research mind

Six model providers, orchestrated into one collaborative research system. Reasoning, visual parsing, mathematical verification, text preprocessing: each cognitive task routes to the best-fit model, and the outputs converge into a single analytical pipeline.

Meta Llama 4 (Scout / Maverick)

Open-weight. General-purpose local inference and data sovereignty. MoE architecture with long context; sensitive financial data runs on local instances, with no API calls, no data transmission, and a full audit trail. Fallback inference during API outages.

DeepSeek-R1 / V3

Open-source. RL-trained reasoning model with performance comparable to o1 on math benchmarks. Locally deployed for quantitative verification: valuation auditing, hypothesis testing, scenario probability calculations. The V4 preview extends the context window further.

Google Gemma 4

Open-source. Lightweight local model for preprocessing, classification, and embedding. News filtering, sentiment tagging, metadata extraction — concentrating compute budget on the analytical core.

OpenAI GPT-5.5 / o3

Proprietary. The o3 reasoning model handles multi-step math verification: option pricing, probability-weighted scenario analysis, sensitivity calculations. GPT-5.5 for multimodal parsing — earnings slides with embedded charts, scanned regulatory filings. Computation and parsing, not prose.

Claude (Opus 4.8 / Sonnet 4.6)

Proprietary. Primary reasoning engine with extended thinking for auditable reasoning chains. A 1M-token context window processes full annual reports plus multiple sell-side reports in a single pass. Precision at critical decision points.

Mistral Medium 3.5

Open-weight. Preferred for European compliance scenarios with strong multilingual capabilities. A single dense model fusing reasoning, multimodal, and code abilities with a 256K context — fit for medium-complexity batch analysis tasks.
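The routing idea in this section can be sketched in a few lines. This is a minimal illustration, not the production router: the `Task` names, the `ROUTES` table, the model labels, and the rule that sensitive data or an API outage forces a local model are all assumptions drawn loosely from the descriptions above.

```python
from enum import Enum


class Task(Enum):
    REASONING = "reasoning"
    VISUAL_PARSING = "visual_parsing"
    MATH_VERIFICATION = "math_verification"
    PREPROCESSING = "preprocessing"


# Hypothetical routing table: candidates in preference order.
# Labels ending in "-local" stand for locally hosted models.
ROUTES = {
    Task.REASONING: ["claude-opus", "llama-4-local"],
    Task.VISUAL_PARSING: ["gpt-multimodal", "llama-4-local"],
    Task.MATH_VERIFICATION: ["o3", "deepseek-r1-local"],
    Task.PREPROCESSING: ["gemma-local"],
}


def route(task, sensitive=False, api_available=True):
    """Pick a model label for a task.

    Sensitive data or an API outage restricts the choice to local models.
    """
    candidates = ROUTES[task]
    if sensitive or not api_available:
        local = [m for m in candidates if m.endswith("-local")]
        if not local:
            raise LookupError(f"no local model configured for {task.value}")
        return local[0]
    return candidates[0]
```

Keeping the table as plain data means a routing change is a reviewable one-line diff rather than a code change.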

Data

AI talks directly to market data

Through MCP, AI queries data sources autonomously. Four-layer denoising and cleaning ensures data reaching the reasoning models is clean, aligned, and leakage-free.

AKShare + MCP

China market data core: A-shares, HK equities, futures, funds, macro indicators. Open-source, API-native, queried by AI in real time via MCP.

Cross-Market Data Layer

Global equities, fixed income, derivatives, ESG data. Cross-border comparisons require normalization across different accounting standards and trading calendars.

Data Denoising Pipeline

Four-layer cleaning: RMT covariance denoising → delisting bias correction → outlier detection (distinguishing flash crashes from bad ticks) → cross-border temporal alignment.
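One way to picture the outlier-detection layer is a reversion heuristic: a print that jumps and snaps straight back is probably a bad tick, while a jump that persists is a real move worth keeping. The sketch below is an assumption about how such a rule could look, not the pipeline's actual logic; the function name and both thresholds are hypothetical, and RMT denoising, delisting correction, and temporal alignment are not shown.

```python
def classify_jumps(prices, jump=0.05, revert=0.01):
    """Label large one-step price moves.

    A move of at least `jump` (as a fraction) that returns to within
    `revert` of the prior price on the next tick is labeled "bad_tick".
    A move that does not revert is labeled "sustained_move".
    Returns {index: label}.
    """
    labels = {}
    i = 1
    while i < len(prices) - 1:
        prev, cur, nxt = prices[i - 1], prices[i], prices[i + 1]
        if abs(cur / prev - 1.0) < jump:
            i += 1
            continue
        if abs(nxt / prev - 1.0) <= revert:
            labels[i] = "bad_tick"
            i += 2  # skip the reverting tick so it is not flagged as a jump
        else:
            labels[i] = "sustained_move"
            i += 1
    return labels


ticks = [100.0, 100.2, 50.0, 100.1, 100.3, 92.0, 91.5, 93.0, 93.2]
flags = classify_jumps(ticks)
```

In this toy series the isolated print at 50.0 is flagged as a bad tick, while the drop to 92.0 that holds over the following ticks is kept as a sustained move.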

Anti-Leakage & Normalization

Strict point-in-time discipline: the system only uses information knowable at each historical moment. Accounting figures are normalized across standards (CAS vs GAAP vs IFRS), and ambiguous cases are flagged for human review.
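Point-in-time discipline comes down to filtering on when a number became public, not on the period it describes. A minimal sketch, assuming each record carries a publication date; the `Fact` type, its field names, and the ticker `XYZ` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    ticker: str
    field: str
    value: float
    period_end: date  # the period the number describes
    published: date   # the day the number became public


def as_of(facts, when):
    """Return the latest value per (ticker, field, period_end) knowable on `when`."""
    visible = {}
    for fact in sorted(facts, key=lambda f: f.published):
        if fact.published <= when:
            visible[(fact.ticker, fact.field, fact.period_end)] = fact
    return visible


facts = [
    Fact("XYZ", "revenue", 100.0, date(2023, 12, 31), date(2024, 3, 15)),
    # A later restatement of the same period.
    Fact("XYZ", "revenue", 95.0, date(2023, 12, 31), date(2024, 8, 1)),
]
key = ("XYZ", "revenue", date(2023, 12, 31))
```

A backtest dated 30 June 2024 sees the original 100.0, not the restated 95.0, which is the leakage this rule exists to prevent.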

Research Ingestion

Auto-collecting sell-side reports, news, papers, social signals. Parsed, deduplicated, tagged with metadata, vectorized for semantic retrieval.

Workflow Orchestration

Connecting models, data sources, and tools into repeatable automated pipelines. Each workflow is a directed acyclic graph — signal detection triggers context assembly, context feeds multi-model analysis.
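The directed-acyclic-graph structure can be sketched with Python's standard `graphlib` module. The three step names come from the description above; the `run` helper and the toy handlers are assumptions for illustration only.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
WORKFLOW = {
    "signal_detection": set(),
    "context_assembly": {"signal_detection"},
    "multi_model_analysis": {"context_assembly"},
}


def run(workflow, handlers):
    """Run every step after its dependencies, passing their results in."""
    results = {}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for step in TopologicalSorter(workflow).static_order():
        inputs = {dep: results[dep] for dep in workflow[step]}
        results[step] = handlers[step](inputs)
    return results


# Toy handlers standing in for real pipeline stages.
handlers = {
    "signal_detection": lambda inputs: "earnings surprise",
    "context_assembly": lambda inputs: [inputs["signal_detection"], "filings"],
    "multi_model_analysis": lambda inputs: len(inputs["context_assembly"]),
}
results = run(WORKFLOW, handlers)
order = list(TopologicalSorter(WORKFLOW).static_order())
```

Declaring dependencies as data keeps a workflow repeatable: the same graph always yields a valid execution order, and a cycle fails loudly before anything runs.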

Cloud & Compute

Where the research runs: multi-cloud

Cross-border finance demands specific cloud architecture: data sovereignty compliance, cross-region latency optimization, and multi-cloud redundancy.

Amazon Web Services

Global infrastructure backbone. Bedrock for managed Claude deployment, Data Exchange for institutional market feeds, cross-region data redundancy.

Google Cloud Platform

BigQuery for large-scale market data analysis and backtesting. Vertex AI for custom model training. Cloud Run for serverless MCP server deployment.

Cloudflare

Edge-first deployment. Pages for global static delivery, Workers for edge compute, R2 for egress-free object storage. The public-facing research platform runs entirely on Cloudflare's edge network.

CI/CD & Infrastructure as Code

Every code change triggers: build verification, content compliance scan, visual regression testing, staged deployment. Environment configuration is version-controlled, auditable, and reproducible. Rollback to any previous state in under 60 seconds.

Agents

Specialized roles that challenge each other

A single AI cannot handle the full research lifecycle. Nine production agents cover analysis, writing, challenging, fact-checking, and compliance review, and each operates in an isolated context.

Plan-Execute-Verify Loop

Every research task follows a three-phase cycle. Planning iterates through multiple reviews, execution runs in parallel, verification enforces deterministic quality gates. No output reaches publication without passing all three phases.

Role-Specialized Agents

The analyst agent accesses data but cannot publish. The compliance agent flags violations but cannot modify analysis. The editor agent refines language but cannot change conclusions. Separation of concerns prevents unchecked error propagation.
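The separation of concerns above amounts to an allow-list per role. A minimal sketch, assuming hypothetical action names; the real agents and their permission sets are not specified here.

```python
# Hypothetical allow-list: each role may perform only the listed actions.
PERMISSIONS = {
    "analyst": {"read_data", "write_analysis"},
    "compliance": {"read_analysis", "flag_violation"},
    "editor": {"read_analysis", "edit_language"},
}


def authorize(role, action):
    """Raise PermissionError unless `role` is explicitly allowed `action`."""
    if action not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role} may not {action}")
    return True


def is_allowed(role, action):
    """Boolean form of authorize(), convenient for checks and tests."""
    try:
        return authorize(role, action)
    except PermissionError:
        return False
```

Denying by default means a newly added action is unavailable to every role until someone grants it on purpose.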

Adversarial Review

For every bullish thesis, a dedicated agent builds the bear case — not a token objection, but systematic dismantling. The analyst must address each adversarial point with evidence before the thesis advances.

Self-Improving Error Log

Every mistake is recorded as a permanent rule. The next session loads the accumulated rules before starting work. Over weeks, the error rate drops measurably: institutional memory encoded as explicit constraints.

Thesis Tracking

Opinions that evolve with evidence

Every thesis starts with a prior probability and a falsification condition. New evidence updates confidence through Bayesian inference. Stale theses decay automatically. We track, update, and retire systematically.

Prior Assignment

Every new thesis enters with an explicit probability — not "bullish" or "bearish," but a calibrated confidence level. You cannot hold a view without quantifying your confidence in it.

Evidence Accumulation

New data does not replace the thesis — it updates it. Each update logs the evidence, reasoning, and magnitude of adjustment. The thesis becomes a living document with complete evidential history.
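A Bayesian update with a logged history can be sketched in odds form: posterior odds equal prior odds times the likelihood ratio of the evidence. The `Thesis` class, its field names, and the use of a single likelihood ratio per piece of evidence are assumptions for illustration.

```python
from dataclasses import dataclass, field


def bayes_update(prior, likelihood_ratio):
    """Posterior probability from a prior and a likelihood ratio.

    likelihood_ratio = P(evidence | thesis true) / P(evidence | thesis false).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


@dataclass
class Thesis:
    statement: str
    confidence: float
    history: list = field(default_factory=list)

    def add_evidence(self, evidence, reasoning, likelihood_ratio):
        """Update confidence and log what changed and why."""
        before = self.confidence
        self.confidence = bayes_update(before, likelihood_ratio)
        self.history.append({
            "evidence": evidence,
            "reasoning": reasoning,
            "before": before,
            "after": self.confidence,
            "change": self.confidence - before,
        })
        return self.confidence
```

For example, a thesis held at 0.50 that receives evidence three times as likely if the thesis is true moves to 0.75, and the log records the evidence, the reasoning, and the 0.25 adjustment.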

Falsification Trigger

Every thesis must define its own death condition before publication — pre-committed exit criteria. If you cannot articulate what would prove you wrong, your thesis is not rigorous enough to publish.

Confidence Decay

A thesis without recent evidence runs on inertia. The system automatically decreases its confidence and flags it for review. This prevents zombie theses: views that were valid six months ago but have gone unexamined since.
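One simple way to model decay is an exponential pull toward a neutral level. The sketch below assumes a 90-day half-life, a neutral level of 0.5, and a 180-day review threshold; all three parameters and both function names are hypothetical, and the decay rule actually used may differ.

```python
def decayed_confidence(confidence, days_since_evidence,
                       half_life_days=90.0, neutral=0.5):
    """Shrink the gap between confidence and `neutral` by half every half-life."""
    if days_since_evidence < 0:
        raise ValueError("days_since_evidence must be non-negative")
    weight = 0.5 ** (days_since_evidence / half_life_days)
    return neutral + (confidence - neutral) * weight


def needs_review(days_since_evidence, max_age_days=180):
    """Flag a thesis that has gone too long without new evidence."""
    return days_since_evidence >= max_age_days
```

Under these assumptions a thesis held at 0.90 falls to 0.70 after 90 days without evidence and to 0.60 after 180 days, at which point it is also flagged for review.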

Validation

Trust nothing until verified

Single-model output is never the final word. Every conclusion must survive adversarial review, cross-model verification, source independence checks, and quantitative stress testing before earning publication status.

Model Disagreement Detection

When two models analyze the same question and reach different conclusions, the system flags the disagreement for human resolution. Model disagreement is information, not noise.
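Disagreement detection can be as simple as comparing each pair of model estimates against a tolerance. A minimal sketch, assuming every model returns a probability for the same question; the model labels and the 0.15 tolerance are hypothetical.

```python
def detect_disagreement(estimates, tolerance=0.15):
    """Return (model_a, model_b, gap) for every pair further apart than `tolerance`.

    `estimates` maps a model label to its probability estimate.
    """
    names = sorted(estimates)
    flags = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            gap = abs(estimates[first] - estimates[second])
            if gap > tolerance:
                flags.append((first, second, round(gap, 4)))
    return flags


flags = detect_disagreement({"model_a": 0.70, "model_b": 0.40, "model_c": 0.65})
```

Here `model_b` disagrees with both of the others, so two pairs are escalated for human resolution, while `model_a` and `model_c` sit within tolerance of each other.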

Source Independence Check

Three supporting pieces of evidence may look strong — until you realize they share a common origin. The system traces provenance and flags non-independent evidence clusters.
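Provenance tracing can be sketched as following each citation back to its root and grouping evidence by that root. The `cites` mapping (item to the source it relies on) and the item names are hypothetical, and the sketch assumes each item has at most one upstream source.

```python
def root_of(item, cites):
    """Follow citations upstream until an item with no recorded source is reached."""
    seen = set()
    while item in cites and item not in seen:
        seen.add(item)  # guards against citation loops
        item = cites[item]
    return item


def independent_clusters(evidence, cites):
    """Group evidence items by the root source they ultimately depend on."""
    clusters = {}
    for item in evidence:
        clusters.setdefault(root_of(item, cites), []).append(item)
    return clusters


cites = {
    "blog_post": "news_article",
    "news_article": "broker_note",
    "broker_note": "company_filing",
}
clusters = independent_clusters(
    ["news_article", "blog_post", "exchange_data"], cites
)
```

Three pieces of evidence collapse into two independent clusters, because the article and the blog post both trace back to the same filing.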

Quantitative Stress Test

Key assumptions are perturbed and the system reports sensitivity bands. High-sensitivity conclusions get robustness warnings. Low-sensitivity conclusions earn higher confidence.
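A stress test of this kind can be sketched by shocking one assumption at a time and recording how far the output moves. The example uses a Gordon growth valuation purely as a stand-in model; the 10% shock size and the function names are assumptions for illustration.

```python
def gordon_value(dividend, discount_rate, growth):
    """Gordon growth value: next-period dividend / (discount rate - growth)."""
    if discount_rate <= growth:
        raise ValueError("discount_rate must exceed growth")
    return dividend / (discount_rate - growth)


def sensitivity_bands(model, base, shock=0.10):
    """Shock each input by +/- `shock` and report the relative change in output.

    Returns (base_value, {input_name: (lowest_change, highest_change)}).
    """
    base_value = model(**base)
    bands = {}
    for name, value in base.items():
        down = model(**{**base, name: value * (1.0 - shock)})
        up = model(**{**base, name: value * (1.0 + shock)})
        bands[name] = (min(down, up) / base_value - 1.0,
                       max(down, up) / base_value - 1.0)
    return base_value, bands


base = {"dividend": 2.0, "discount_rate": 0.08, "growth": 0.03}
base_value, bands = sensitivity_bands(gordon_value, base)
widths = {name: high - low for name, (low, high) in bands.items()}
```

With these inputs the base value is 40, and the discount rate produces the widest band, so a conclusion resting on it would be the first to earn a robustness warning.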

Consensus vs. Edge Mapping

When our internal thesis diverges significantly from sell-side consensus, the system triggers a structured review. The best research comes from understood disagreement, not accidental disagreement.

Cognitive Architecture

Mind × Model Mapping

Model-agnostic does not mean indifferent. Each cognitive mode has a computational form that fits it best. We use multiple models not to optimize cost but because different cognitive tasks need different computational architectures.

Cognitive Mode | Core Question | Model Assignment | Why This Model
Bayesian Inference | How much should new evidence shift our confidence? | Claude Opus 4.8 | Longest context preserves full prior-posterior chains
Causal Reasoning | Did A cause B, or did they just co-occur? | Claude Opus 4.8 + o3 cross-check | Claude builds causal narratives; o3 verifies counterfactual math
Falsification | Under what conditions does this thesis die? | Adversarial Agent (Claude Sonnet 4.6) | Dedicated adversary needs speed and volume, not maximum depth
Information Theory | How much genuine signal does this new data carry? | Embedding Models | Vector distance measures novelty: far = high information gain
Econometrics | Does this statistical relationship survive controls? | o3 / o4-mini | Pure math: regression, IV, hypothesis testing
Behavioral Finance | Is consensus rational or sentiment-driven? | Gemini 3.5 Flash + Claude | Gemini scans sentiment at scale; Claude judges if deviation is actionable
Complex Systems | How do micro behaviors emerge as macro patterns? | Claude Opus 4.8 (long context) | Must hold multiple layers simultaneously
Monte Carlo Simulation | What does the probability distribution look like? | o3 + Local Llama 4 / DeepSeek-R1 | o3 for probability math; Llama runs large-batch simulations locally
Adversarial Thinking | Where is the weakest link if someone attacks this thesis? | Two Claude instances in opposition | One builds the thesis, one dismantles it; neither sees the other's reasoning
Meta-cognition | What systematic biases exist in our analytical process? | Institutional Memory Protocol | Not model inference: system-level accumulated error patterns correct the pipeline

Intellectual Lineage

Every design decision traces to an intellectual tradition

This is not a bibliography. It is an intellectual genome: each tradition below maps to a specific component of our system.

Source | Core Idea | System Component | Guiding Principle
Popper (1934) | Falsification | Popperian Exit Protocol | Every thesis must define its own death condition
Bayes (1763) → Black-Litterman (1992) | Bayesian updating | Bayesian Belief Network | Views are not right or wrong; they are probabilities
Pearl (2009) | Causal graphs | Causal Chain Visualizer | Correlation is not causation
Tetlock (2015) | Calibrated forecasting | Prior Calibration + Evidence Accumulation | Update incrementally, never all-in
Taleb (2007) | Fat tails | Sensitivity & Regime Analysis | Can your model survive a black swan?
Kahneman (1979) | Cognitive bias | Red Team Analysis + Contrarian Alpha | Are you sure it's not confirmation bias?
Shannon (1948) | Information theory | Signal-to-Noise Filtering | How much new information does this data actually carry?
Miller (1956) | Working memory limits | Cognitive Load Optimization | Both humans and AI have working memory ceilings
Black & Scholes (1973) → Heston (1993) | Derivatives pricing | Multi-Model Pricing Engine | Every instrument decomposes into risk factors; every risk factor has a model that fits it best
Brown / OpenAI (2024) | Inference-time compute scaling | Extended Thinking for Research | More compute at reasoning time beats more compute at training time
Ng (2024) | Agentic design patterns | Plan-Execute-Verify Architecture | Reflection, Tool Use, Planning, Multi-Agent Collaboration
TradingAgents / FactorMAD (2024-2025) | Multi-agent debate | Adversarial Review Pipeline | Bull vs Bear debate, not single-model consensus
Olah et al. / Anthropic (2024) | Mechanistic interpretability | Model Trust Infrastructure | Millions of interpretable features extracted via sparse autoencoders
Amodei (2025) | Interpretability urgency | Extended Thinking Audit Trail | If we can't see how the model thinks, we can't trust its conclusions

Foundational Literature

Academic Foundations

Every system component rests on a peer-reviewed academic tradition. These are the foundational works that shaped our analytical framework.

Bayesian Tradition

Bayes (1763) → Black-Litterman (1992) → Tetlock (2015). From the original probability theorem to portfolio view combination to calibrated forecasting — the mathematical foundation of our Thesis Tracking system.

Causal Inference

Granger (1969) → Rubin (1974) → Pearl (2009). From time-series causality to potential outcomes to causal graphs — the full toolkit for distinguishing correlation from causation.

Falsification & Scientific Method

Popper (1934) → Lakatos (1978) → Mayo (2018). Falsifiability as demarcation, research programme resilience, and how to design tests with genuine statistical power.

Risk & Uncertainty

Knight (1921) → Markowitz (1952) → Taleb (2007, 2012). From the fundamental risk-uncertainty distinction to mean-variance to fat tails and antifragility — stress testing is non-negotiable.

Cognition & Behavior

Simon (1955) → Miller (1956) → Kahneman (1979, 2011). Bounded rationality, working memory limits, systematic biases — cognitive constraints shared by humans and AI that drive our architecture.

AI & Computation

Shannon (1948) → Vaswani (2017) → Anthropic MCP (2024). From information theory to Transformers to RAG to Constitutional AI to Model Context Protocol — academic roots of every stack layer.

Data Engineering

Codd (1970) → Fama-French (1992) → Wickham (2014). Relational model, financial data cleaning standards, tidy data principles — the academic skeleton of our cross-market data layer.

Agentic AI & Financial Agents

MemGPT (2023) → TradingAgents (2024) → FactorMAD (2025). From dual-tier memory to multi-agent trading to factor mining debate — validating AI-assisted judgment, not AI-replaced judgment.

Asset Class Coverage

Beyond Equities — Full Spectrum Coverage

Investment research that only covers equities misses half the picture. Our infrastructure extends to derivatives, fixed income, foreign exchange, and structured products — each with purpose-matched models.

Futures & Derivatives

Commodity futures, equity index futures, interest rate futures, and swaps. We monitor term structure dynamics (contango, backwardation, roll yield) as structural signals. We also track basis, warehouse receipts, and shipping data to distinguish speculation from fundamental shifts.
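The term-structure signals above reduce to a comparison between two contract prices. A minimal sketch, assuming one common convention in which the annualized roll yield is the log ratio of the near price to the far price scaled by the days between expiries; other conventions exist, and the function names are hypothetical.

```python
from math import log


def term_structure(near_price, far_price):
    """Classify the curve between two contracts on the same underlying."""
    if far_price > near_price:
        return "contango"
    if far_price < near_price:
        return "backwardation"
    return "flat"


def annualized_roll_yield(near_price, far_price, days_between):
    """Log roll yield per year; negative in contango, positive in backwardation."""
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    return log(near_price / far_price) * 365.0 / days_between
```

For a near contract at 80 and a contract 30 days further out at 82, the curve is in contango and the roll yield is negative: a long position loses value as it rolls, all else equal.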

Options Analytics

Implied volatility surfaces, skew dynamics, term structure, and vol arbitrage signals. Beyond Black-Scholes — we apply Heston stochastic volatility and jump-diffusion to capture fat-tailed risk. The vol surface itself is a data source: skew changes signal institutional hedging demand.
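As a reference point for the baseline named above, the sketch below prices a European call under Black-Scholes and backs out implied volatility by bisection, which is the basic building block of a vol surface. It assumes no dividends and a constant rate; the Heston and jump-diffusion models are not shown, and the function names are illustrative.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot, strike, rate, vol, t):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)


def implied_vol(price, spot, strike, rate, t, lo=1e-4, hi=5.0, tol=1e-8):
    """Solve bs_call(vol) = price by bisection; the call price rises with vol."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, rate, mid, t) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


price = bs_call(100.0, 100.0, 0.05, 0.20, 1.0)
vol = implied_vol(price, 100.0, 100.0, 0.05, 1.0)
```

Repeating the inversion across strikes and expiries yields the surface; skew appears as implied volatility that varies with strike, which flat-volatility Black-Scholes cannot reproduce.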

Fixed Income & Bills

Government bonds, corporate bonds, commercial paper, CDs. Yield curve modeling is the analytical backbone — decomposing into level, slope, and curvature factors. Credit spread analysis layers fundamental assessment with CDS-implied default probabilities. For China, we track PBOC OMOs and MLF/LPR.
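The level, slope, and curvature factors can be approximated from three points on the curve. The sketch below uses simple empirical proxies on 2-, 5-, and 10-year yields; this is an assumption for illustration, and a production decomposition would more likely use principal components or a fitted curve model.

```python
def curve_factors(y2, y5, y10):
    """Proxy yield-curve factors from 2y, 5y, and 10y yields (as decimals).

    level:     average yield across the three tenors
    slope:     long end minus short end
    curvature: how far the middle sits above the average of the two ends, doubled
    """
    return {
        "level": (y2 + y5 + y10) / 3.0,
        "slope": y10 - y2,
        "curvature": 2.0 * y5 - y2 - y10,
    }


factors = curve_factors(0.040, 0.042, 0.045)
```

For yields of 4.0%, 4.2%, and 4.5%, the slope is 50 basis points and the curvature is slightly negative, meaning the 5-year point sits just below the straight line between the two ends.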

Foreign Exchange

G10 and EM currency pairs, NDFs, and currency swaps. Three-layer framework: short-term (positioning, sentiment, flows), medium-term (rate differentials, carry, policy divergence), structural (PPP, current account, reserve currency dynamics). For USD/CNY, we monitor PBOC fixing signals.

Cross-Asset Correlation

The most valuable signals emerge between asset classes. We monitor stock-bond correlations, commodity-currency linkages, and credit-equity divergences. Regime detection identifies when correlation structures break — these transitions concentrate the highest-conviction opportunities.

Structured Products

Convertible bonds, ABS, CLOs, and linked notes. We decompose each product into its component parts: a convertible is simultaneously a bond, a call option, and a credit instrument. We model cash flow waterfalls, prepayment risk, subordination, and trigger events.
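A cash flow waterfall can be sketched as a sequential allocation: each tranche is paid what it is due, in priority order, until the cash runs out. The tranche names and amounts are hypothetical, and real structures add trigger events, interest and principal splits, and reserve accounts that are not modeled here.

```python
def waterfall(cash, tranches):
    """Allocate `cash` to tranches in priority order.

    `tranches` is a list of (name, amount_due), most senior first.
    Returns {name: amount_paid} plus a "residual" entry for leftover cash.
    """
    paid = {}
    for name, due in tranches:
        payment = min(cash, due)
        paid[name] = payment
        cash -= payment
    paid["residual"] = cash
    return paid


payments = waterfall(100, [("senior", 70), ("mezzanine", 40), ("junior", 20)])
```

With 100 of cash against 130 due, the senior tranche is paid in full, the mezzanine tranche takes a shortfall of 10, and the junior tranche receives nothing, which is subordination in its simplest form.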

For infrastructure partners

This research stack puts your capabilities under real production load

None of the layers above (model orchestration, MCP data access, multi-cloud deployment) is a demo. The stack ships published market research every day. We're looking for infrastructure partners who can provide compute and model access: you get evaluation under real load and an on-the-record deployment, and we get the resources to push the research further.

01

A real, observable workload

Inference calls, context length, multi-model routing, cross-region latency — all happen inside a continuously running research pipeline, not a benchmark script.

02

A window to outside readers

The output is published openly and on the record. The effect of your model and cloud capabilities is judged by independent readers, not internal metrics.

03

Concrete integration and feedback

We are open to a concrete technical conversation about resource access, model evaluation, or wiring your capabilities into this pipeline.