Research Infrastructure

Models

Multiple models, one research mind

Six model providers, orchestrated into one collaborative research system. Reasoning, visual parsing, mathematical verification, text preprocessing: each cognitive task routes to the best-fit model, and the outputs converge into a single analytical pipeline.

Meta Llama 4 (Scout / Maverick)

Open-source. General-purpose local inference and data sovereignty. MoE architecture with long context; sensitive financial data runs on local instances — no API calls, no data transmission, full audit trail. Fallback inference during API outages.

DeepSeek-R1 / V3

Open-source. R1 is a reasoning model trained primarily with reinforcement learning and reported to be comparable to OpenAI o1 on math benchmarks. Locally deployed for quantitative verification: valuation auditing, hypothesis testing, scenario probability calculations (the V4 preview extends the context window further).

Google Gemma 4

Open-source. Lightweight local model for preprocessing, classification, and embedding. News filtering, sentiment tagging, metadata extraction — concentrating compute budget on the analytical core.

OpenAI GPT-5.5 / o3

Proprietary. The o3 reasoning model handles multi-step math verification: option pricing, probability-weighted scenario analysis, sensitivity calculations. GPT-5.5 handles multimodal parsing: earnings slides with embedded charts, scanned regulatory filings. Both are used for computation and parsing, not prose.

Claude (Opus 4.8 / Sonnet 4.6)

Proprietary. Primary reasoning engine with extended thinking for auditable reasoning chains. A 1M-token context window processes full annual reports plus multiple sell-side reports in a single pass. Used for precision at critical decision points.

Mistral Medium 3.5

Open-weight. Preferred for European compliance scenarios with strong multilingual capabilities. A single dense model fusing reasoning, multimodal, and code abilities with a 256K context — fit for medium-complexity batch analysis tasks.

Data

AI talks directly to market data

Through MCP, AI queries data sources autonomously. Four-layer denoising and cleaning ensures data reaching the reasoning models is clean, aligned, and leakage-free.

AKShare + MCP

China market data core: A-shares, HK equities, futures, funds, macro indicators. Open-source, API-native, queried by AI in real time via MCP.

Cross-Market Data Layer

Global equities, fixed income, derivatives, ESG data. Cross-border comparisons require normalization across different accounting standards and trading calendars.

Data Denoising Pipeline

Four-layer cleaning: RMT covariance denoising → delisting bias correction → outlier detection (distinguishing flash crashes from bad ticks) → cross-border temporal alignment.
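The first layer can be illustrated with eigenvalue clipping, one common form of RMT denoising. This is a minimal sketch, not the pipeline's actual code: it assumes the eigenvalues of a correlation matrix have already been computed elsewhere, and the function names are hypothetical. Eigenvalues at or below the Marchenko-Pastur upper edge are treated as noise and replaced by their average, which preserves the trace.

```python
import math


def marchenko_pastur_upper_edge(n_assets: int, n_obs: int, variance: float = 1.0) -> float:
    """Largest eigenvalue expected from pure noise for N assets and T observations."""
    q = n_assets / n_obs
    return variance * (1.0 + math.sqrt(q)) ** 2


def clip_noise_eigenvalues(eigenvalues, n_assets: int, n_obs: int):
    """Replace eigenvalues inside the noise band with their mean (trace-preserving)."""
    edge = marchenko_pastur_upper_edge(n_assets, n_obs)
    noise = [ev for ev in eigenvalues if ev <= edge]
    if not noise:
        return list(eigenvalues)
    noise_mean = sum(noise) / len(noise)
    return [ev if ev > edge else noise_mean for ev in eigenvalues]


# Hypothetical example: 4 assets, 16 observations, so the noise edge is 2.25.
cleaned = clip_noise_eigenvalues([2.5, 0.75, 0.5, 0.25], n_assets=4, n_obs=16)
```

Only the eigenvalue above the edge keeps its original value; the denoised matrix would then be rebuilt from the original eigenvectors and the cleaned eigenvalues.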

Anti-Leakage & Normalization

Strict point-in-time discipline: the system only uses information knowable at each historical moment. Cross-border accounting normalization (CAS vs GAAP vs IFRS) is applied, with ambiguous cases flagged for human review.
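Point-in-time discipline comes down to an "as of" lookup keyed on publication date rather than reporting period. The sketch below is illustrative only; the record layout and the `as_of_value` name are assumptions, not the system's actual interface.

```python
from bisect import bisect_right
from datetime import date


def as_of_value(records, as_of: date):
    """Return the latest value published on or before `as_of`.

    `records` is a list of (publication_date, value) tuples sorted by
    publication_date. Returns None when nothing was knowable yet.
    """
    publication_dates = [published for published, _ in records]
    idx = bisect_right(publication_dates, as_of)
    return records[idx - 1][1] if idx else None


# Hypothetical earnings-per-share figures keyed by the date they became public.
eps_history = [(date(2024, 4, 30), 1.10), (date(2024, 8, 29), 1.25)]
knowable_in_june = as_of_value(eps_history, date(2024, 6, 1))
```

A backtest dated 1 June 2024 sees only the April figure, even though the August figure describes an earlier reporting period.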

Research Ingestion

Auto-collecting sell-side reports, news, papers, social signals. Parsed, deduplicated, tagged with metadata, vectorized for semantic retrieval.
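The deduplication step can be sketched with a content fingerprint. This is a simplified illustration that catches only exact duplicates after whitespace and case normalization; the function names are hypothetical, and near-duplicate detection would need a different technique.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Keep the first occurrence of each distinct fingerprint, in order."""
    seen = set()
    unique = []
    for doc in documents:
        fingerprint = content_fingerprint(doc)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(doc)
    return unique


unique_docs = deduplicate(["Rate cut  expected", "rate cut expected", "Earnings beat"])
```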

Workflow Orchestration

Connecting models, data sources, and tools into repeatable automated pipelines. Each workflow is a directed acyclic graph — signal detection triggers context assembly, context feeds multi-model analysis.
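The DAG structure can be sketched with Python's standard `graphlib` module. The stage names mirror the example above; the `WORKFLOW` dictionary and `execution_order` helper are hypothetical and stand in for whatever orchestrator is actually used.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a stage; each value is the set of stages it depends on.
WORKFLOW = {
    "signal_detection": set(),
    "context_assembly": {"signal_detection"},
    "multi_model_analysis": {"context_assembly"},
}


def execution_order(graph):
    """Return stages in an order that respects every dependency.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


order = execution_order(WORKFLOW)
```

Rejecting cycles at load time is what makes a pipeline repeatable: every stage has a well-defined set of inputs that are complete before it runs.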

Cloud & Compute

Where the research runs: multi-cloud

Cross-border finance demands specific cloud architecture: data sovereignty compliance, cross-region latency optimization, and multi-cloud redundancy.

Amazon Web Services

Global infrastructure backbone. Bedrock for managed Claude deployment, Data Exchange for institutional market feeds, cross-region data redundancy.

Google Cloud Platform

BigQuery for large-scale market data analysis and backtesting. Vertex AI for custom model training. Cloud Run for serverless MCP server deployment.

Cloudflare

Edge-first deployment. Pages for global static delivery, Workers for edge compute, R2 for egress-free object storage. The public-facing research platform runs entirely on Cloudflare's edge network.

CI/CD & Infrastructure as Code

Every code change triggers: build verification, content compliance scan, visual regression testing, staged deployment. Environment configuration is version-controlled, auditable, and reproducible. Rollback to any previous state in under 60 seconds.

Deep dives

→ Cloud Architecture for Cross-Border Finance: Compliance, Latency, and Cost

Agents

Specialized roles that challenge each other

A single AI cannot handle the full research lifecycle. Nine production agents cover analysis, writing, challenging, fact-checking, and compliance review; each operates in an isolated context.

Plan-Execute-Verify Loop

Every research task follows a three-phase cycle. Planning iterates through multiple reviews, execution runs in parallel, verification enforces deterministic quality gates. No output reaches publication without passing all three phases.

Role-Specialized Agents

The analyst agent accesses data but cannot publish. The compliance agent flags violations but cannot modify analysis. The editor refines language but cannot change conclusions. Separation of concerns prevents unchecked error propagation.
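One way to enforce this separation is an explicit allow-list per role, checked before any action runs. The sketch below is an assumption about how such a check might look; the action names (`read_data`, `write_analysis`, and so on) are hypothetical labels, not the system's real vocabulary.

```python
ROLE_PERMISSIONS = {
    "analyst": frozenset({"read_data", "write_analysis"}),
    "compliance": frozenset({"flag_violation"}),
    "editor": frozenset({"edit_language"}),
}


def authorize(role: str, action: str) -> None:
    """Raise PermissionError unless the role is explicitly allowed the action."""
    allowed = ROLE_PERMISSIONS.get(role, frozenset())
    if action not in allowed:
        raise PermissionError(f"{role!r} may not perform {action!r}")


def is_allowed(role: str, action: str) -> bool:
    """Boolean convenience wrapper around authorize()."""
    try:
        authorize(role, action)
    except PermissionError:
        return False
    return True
```

Because the default is deny, an unknown role or a newly added action is blocked until someone grants it deliberately.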

Adversarial Review

For every bullish thesis, a dedicated agent builds the bear case — not a token objection, but systematic dismantling. The analyst must address each adversarial point with evidence before the thesis advances.

Self-Improving Error Log

Every mistake is recorded as a permanent rule. The next session loads the accumulated rules before starting work. Over weeks, the error rate drops measurably: institutional memory encoded as explicit constraints.

Thesis Tracking

Opinions that evolve with evidence

Every thesis starts with a prior probability and a falsification condition. New evidence updates confidence through Bayesian inference. Stale theses decay automatically. We track, update, and retire systematically.

Prior Assignment

Every new thesis enters with an explicit probability — not "bullish" or "bearish," but a calibrated confidence level. You cannot hold a view without quantifying your confidence in it.

Evidence Accumulation

New data does not replace the thesis — it updates it. Each update logs the evidence, reasoning, and magnitude of adjustment. The thesis becomes a living document with complete evidential history.
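A logged Bayesian update can be sketched in odds form, where posterior odds equal prior odds times a likelihood ratio. The `Thesis` class and its field names are hypothetical, and expressing evidence strength as a single likelihood ratio is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Thesis:
    statement: str
    probability: float  # current confidence, strictly between 0 and 1
    history: list = field(default_factory=list)

    def update(self, evidence: str, likelihood_ratio: float, reasoning: str = "") -> float:
        """Apply Bayes' rule in odds form and log the adjustment."""
        if not 0.0 < self.probability < 1.0:
            raise ValueError("probability must be strictly between 0 and 1")
        prior = self.probability
        posterior_odds = prior / (1.0 - prior) * likelihood_ratio
        self.probability = posterior_odds / (1.0 + posterior_odds)
        self.history.append({
            "evidence": evidence,
            "reasoning": reasoning,
            "prior": prior,
            "posterior": self.probability,
            "adjustment": self.probability - prior,
        })
        return self.probability


thesis = Thesis("Margins expand next fiscal year", probability=0.5)
thesis.update("Input costs fell in the latest filing", likelihood_ratio=3.0)
```

A likelihood ratio of 3 means the evidence is three times as likely if the thesis is true as if it is false; starting from 0.5, that moves confidence to 0.75, and the history entry records how and why.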

Falsification Trigger

Every thesis must define its own death condition before publication — pre-committed exit criteria. If you cannot articulate what would prove you wrong, your thesis is not rigorous enough to publish.

Confidence Decay

A thesis without recent evidence runs on inertia. The system automatically decreases its confidence and flags it for review. This prevents zombie theses: views that were valid six months ago but have gone unexamined since.
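One plausible decay rule pulls the probability toward a neutral 0.5 with a fixed half-life, so a stale view loses conviction whether it was bullish or bearish. The exponential form, the 90-day half-life, and the 180-day review threshold are all assumptions chosen for illustration.

```python
def decayed_confidence(probability: float, days_since_evidence: float,
                       half_life_days: float = 90.0) -> float:
    """Shrink the distance from 0.5 by half for every half-life elapsed."""
    decay = 0.5 ** (days_since_evidence / half_life_days)
    return 0.5 + (probability - 0.5) * decay


def needs_review(days_since_evidence: float, review_after_days: float = 180.0) -> bool:
    """Flag a thesis once it has gone too long without new evidence."""
    return days_since_evidence >= review_after_days


stale = decayed_confidence(0.9, days_since_evidence=90)
```

With these assumed parameters, a 0.9 conviction with no new evidence for 90 days decays to 0.7.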

Validation

Trust nothing until verified

Single-model output is never the final word. Every conclusion must survive adversarial review, cross-model verification, source independence checks, and quantitative stress testing before earning publication status.

Model Disagreement Detection

When two models analyze the same question and reach different conclusions, the system flags the disagreement for human resolution. Model disagreement is information, not noise.
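Disagreement detection can be sketched as a spread check over per-model estimates. Representing each model's conclusion as a single probability, and the 0.15 tolerance, are assumptions made for this illustration.

```python
def detect_disagreement(estimates: dict, tolerance: float = 0.15) -> dict:
    """Flag the question for human review when model estimates diverge.

    `estimates` maps a model name to its probability for the same question.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two model estimates to compare")
    lowest = min(estimates, key=estimates.get)
    highest = max(estimates, key=estimates.get)
    spread = estimates[highest] - estimates[lowest]
    return {
        "flagged": spread > tolerance,
        "spread": spread,
        "lowest": lowest,
        "highest": highest,
    }


report = detect_disagreement({"model_a": 0.70, "model_b": 0.40})
```

The report names the two models that differ most, so the reviewer knows where to start reading.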

Source Independence Check

Three supporting pieces of evidence may look strong — until you realize they share a common origin. The system traces provenance and flags non-independent evidence clusters.
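Provenance tracing can be sketched by following each item's citation chain back to its root and grouping evidence by that root. The `cites` mapping (item to the single source it derives from) and the example names are hypothetical; real provenance graphs can have several parents per item.

```python
def root_origin(source: str, cites: dict) -> str:
    """Follow the citation chain until reaching a source that cites nothing."""
    seen = set()
    while source in cites and source not in seen:
        seen.add(source)  # guards against circular citations
        source = cites[source]
    return source


def independent_clusters(evidence, cites: dict) -> dict:
    """Group evidence items by their root origin."""
    clusters = {}
    for item in evidence:
        clusters.setdefault(root_origin(item, cites), []).append(item)
    return clusters


cites = {
    "blog_post": "broker_note",
    "broker_note": "newswire",
    "newswire": "company_filing",
}
clusters = independent_clusters(["blog_post", "broker_note", "newswire"], cites)
```

Here three supporting items collapse into one cluster rooted at the company filing, so they count as a single independent source.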

Quantitative Stress Test

Key assumptions are perturbed and the system reports sensitivity bands. High-sensitivity conclusions get robustness warnings. Low-sensitivity conclusions earn higher confidence.
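A one-at-a-time perturbation makes the idea concrete: shock each assumption up and down, re-run the model, and report the output range. The 10% shock and the toy `revenue_model` are assumptions for illustration; a production stress test would likely perturb inputs jointly as well.

```python
def sensitivity_bands(model, base: dict, shock: float = 0.10) -> dict:
    """Perturb each assumption by +/- shock and report the output band.

    Returns {assumption: (low, high, relative_width)} where relative_width
    is the band width divided by the absolute base-case output.
    """
    base_value = model(**base)
    bands = {}
    for name, value in base.items():
        outputs = [model(**{**base, name: value * (1.0 + s)}) for s in (-shock, shock)]
        low, high = min(outputs), max(outputs)
        bands[name] = (low, high, (high - low) / abs(base_value))
    return bands


def revenue_model(price: float, volume: float) -> float:
    """Toy model used only to exercise the function."""
    return price * volume


bands = sensitivity_bands(revenue_model, {"price": 10.0, "volume": 100.0})
```

Assumptions with a wide relative band are the ones that would earn a robustness warning.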

Consensus vs. Edge Mapping

When our internal thesis diverges significantly from sell-side consensus, the system triggers a structured review. The best research comes from understood disagreement, not accidental disagreement.

Cognitive Architecture

Mind × Model Mapping

Model-agnostic does not mean indifferent. Each cognitive mode has a computational form that fits it best. We use multiple models not for cost optimization but because different cognitive tasks need different computational architectures.

Cognitive Mode	Core Question	Model Assignment	Why This Model
Bayesian Inference	How much should new evidence shift our confidence?	Claude Opus 4.8	Longest context preserves full prior-posterior chains
Causal Reasoning	Did A cause B, or did they just co-occur?	Claude Opus 4.8 + o3 cross-check	Claude builds causal narratives; o3 verifies counterfactual math
Falsification	Under what conditions does this thesis die?	Adversarial Agent (Claude Sonnet 4.6)	Dedicated adversary needs speed and volume, not maximum depth
Information Theory	How much genuine signal does this new data carry?	Embedding Models	Vector distance measures novelty — far = high information gain
Econometrics	Does this statistical relationship survive controls?	o3 / o4-mini	Pure math — regression, IV, hypothesis testing
Behavioral Finance	Is consensus rational or sentiment-driven?	Gemma 4 + Claude	Gemma 4 scans sentiment at scale; Claude judges if deviation is actionable
Complex Systems	How do micro behaviors emerge as macro patterns?	Claude Opus 4.8 (long context)	Must hold multiple layers simultaneously
Monte Carlo Simulation	What does the probability distribution look like?	o3 + Local Llama 4 / DeepSeek-R1	o3 for probability math; Llama runs large-batch simulations locally
Adversarial Thinking	Where is the weakest link if someone attacks this thesis?	Two Claude instances in opposition	One builds thesis, one dismantles — neither sees the other's reasoning
Meta-cognition	What systematic biases exist in our analytical process?	Institutional Memory Protocol	Not model inference — system-level accumulated error patterns correct the pipeline

Intellectual Lineage

Every design decision traces to an intellectual tradition

This is not a bibliography. It is an intellectual genome — every design decision in our system traces back to a specific intellectual tradition.

Popper (1934) → Falsification → Popperian Exit Protocol

Every thesis must define its own death condition

Bayes (1763) → Black-Litterman (1992) → Bayesian updating → Bayesian Belief Network

Views are not right or wrong; they are probabilities

Pearl (2009) → Causal graphs → Causal Chain Visualizer

Correlation is not causation

Tetlock (2015) → Calibrated forecasting → Prior Calibration + Evidence Accumulation

Update incrementally, never all-in

Taleb (2007) → Fat tails → Sensitivity & Regime Analysis

Can your model survive a black swan?

Kahneman (1979) → Cognitive bias → Red Team Analysis + Contrarian Alpha

Are you sure it's not confirmation bias?

Shannon (1948) → Information theory → Signal-to-Noise Filtering

How much new information does this data actually carry?

Miller (1956) → Working memory limits → Cognitive Load Optimization

Both humans and AI have working memory ceilings

Black & Scholes (1973) → Heston (1993) → Derivatives pricing → Multi-Model Pricing Engine

Every instrument decomposes into risk factors; every risk factor has a model that fits it best

Brown / OpenAI (2024) → Inference-time compute scaling → Extended Thinking for Research

Additional compute at inference time can improve reasoning beyond what training-time scaling alone delivers

Ng (2024) → Agentic design patterns → Plan-Execute-Verify Architecture

Reflection, Tool Use, Planning, Multi-Agent Collaboration

TradingAgents / FactorMAD (2024-2025) → Multi-agent debate → Adversarial Review Pipeline

Bull vs Bear debate, not single-model consensus

Olah et al. / Anthropic (2024) → Mechanistic interpretability → Model Trust Infrastructure

Millions of interpretable features extracted via sparse autoencoders

Amodei (2025) → Interpretability urgency → Extended Thinking Audit Trail

If we can't see how the model thinks, we can't trust its conclusions

Foundational Literature

Academic Foundations

Every system component rests on a peer-reviewed academic tradition. These are the foundational works that shaped our analytical framework.

Bayesian Tradition

Bayes (1763) → Black-Litterman (1992) → Tetlock (2015). From the original probability theorem to portfolio view combination to calibrated forecasting — the mathematical foundation of our Thesis Tracking system.

Causal Inference

Granger (1969) → Rubin (1974) → Pearl (2009). From time-series causality to potential outcomes to causal graphs — the full toolkit for distinguishing correlation from causation.

Falsification & Scientific Method

Popper (1934) → Lakatos (1978) → Mayo (2018). Falsifiability as demarcation, research programme resilience, and how to design tests with genuine statistical power.

Risk & Uncertainty

Knight (1921) → Markowitz (1952) → Taleb (2007, 2012). From the fundamental risk-uncertainty distinction to mean-variance to fat tails and antifragility — stress testing is non-negotiable.

Cognition & Behavior

Simon (1955) → Miller (1956) → Kahneman (1979, 2011). Bounded rationality, working memory limits, systematic biases — cognitive constraints shared by humans and AI that drive our architecture.

AI & Computation

Shannon (1948) → Vaswani (2017) → Anthropic MCP (2024). From information theory to Transformers to RAG to Constitutional AI to Model Context Protocol — academic roots of every stack layer.

Data Engineering

Codd (1970) → Fama-French (1992) → Wickham (2014). Relational model, financial data cleaning standards, tidy data principles — the academic skeleton of our cross-market data layer.

Agentic AI & Financial Agents

MemGPT (2023) → TradingAgents (2024) → FactorMAD (2025). From dual-tier memory to multi-agent trading to factor mining debate — validating AI-assisted judgment, not AI-replaced judgment.

View complete bibliography →

Asset Class Coverage

Beyond Equities — Full Spectrum Coverage

Investment research that only covers equities misses half the picture. Our infrastructure extends to derivatives, fixed income, foreign exchange, and structured products — each with purpose-matched models.

Futures & Derivatives

Commodity futures, equity index futures, interest rate futures, and swaps. We monitor term structure dynamics (contango, backwardation, roll yield) as structural signals. We track basis, warehouse receipts, and shipping data to distinguish speculation from fundamental shifts.
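The term structure signals can be sketched from two contract prices. Roll yield conventions vary; the simple annualized form below, and the function names, are assumptions for illustration.

```python
def term_structure_state(near_price: float, far_price: float) -> str:
    """Classify the curve between two contract maturities."""
    if far_price > near_price:
        return "contango"
    if far_price < near_price:
        return "backwardation"
    return "flat"


def annualized_roll_yield(near_price: float, far_price: float, days_between: float) -> float:
    """Approximate yield from rolling a long position from near to far contract.

    Negative in contango (rolling into a more expensive contract),
    positive in backwardation.
    """
    return (near_price - far_price) / near_price * 365.0 / days_between


state = term_structure_state(100.0, 102.0)
roll = annualized_roll_yield(100.0, 102.0, days_between=30)
```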

Options Analytics

Implied volatility surfaces, skew dynamics, term structure, and vol arbitrage signals. Beyond Black-Scholes — we apply Heston stochastic volatility and jump-diffusion to capture fat-tailed risk. The vol surface itself is a data source: skew changes signal institutional hedging demand.
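For reference, the Black-Scholes baseline that stochastic volatility and jump-diffusion models extend can be written in a few lines. This sketch prices a European call on a non-dividend-paying asset; it is the textbook formula, not the pricing engine described here.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, rate: float,
                       volatility: float, maturity: float) -> float:
    """European call price under Black-Scholes (no dividends).

    `rate` and `volatility` are annualized; `maturity` is in years.
    """
    vol_sqrt_t = volatility * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * normal_cdf(d1) - strike * math.exp(-rate * maturity) * normal_cdf(d2)


call = black_scholes_call(spot=100.0, strike=100.0, rate=0.05, volatility=0.2, maturity=1.0)
```

Black-Scholes assumes one constant volatility for all strikes and maturities, which is exactly what an observed skew contradicts and why the richer models are needed.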

Fixed Income & Bills

Government bonds, corporate bonds, commercial paper, CDs. Yield curve modeling is the analytical backbone, decomposing the curve into level, slope, and curvature factors. Credit spread analysis layers fundamental assessment with CDS-implied default probabilities. For China, we track PBOC open market operations (OMOs) and the MLF and LPR rates.
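Level, slope, and curvature are usually estimated with principal components or a Nelson-Siegel fit; the sketch below uses simple empirical proxies from three tenors instead. The choice of 2-, 5-, and 10-year points and the proxy definitions are assumptions for illustration.

```python
def curve_factors(y2: float, y5: float, y10: float) -> dict:
    """Proxy yield curve factors from 2y, 5y, and 10y yields (in percent)."""
    return {
        "level": (y2 + y5 + y10) / 3.0,     # average height of the curve
        "slope": y10 - y2,                  # long end minus short end
        "curvature": 2.0 * y5 - y2 - y10,   # belly relative to the wings
    }


factors = curve_factors(y2=4.0, y5=4.25, y10=4.5)
```

A straight upward-sloping curve like this one has positive slope and zero curvature; a humped curve would show positive curvature.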

Foreign Exchange

G10 and EM currency pairs, NDFs, and currency swaps. Three-layer framework: short-term (positioning, sentiment, flows), medium-term (rate differentials, carry, policy divergence), structural (PPP, current account, reserve currency dynamics). For USD/CNY, we monitor PBOC fixing signals.

Cross-Asset Correlation

The most valuable signals emerge between asset classes. We monitor stock-bond correlations, commodity-currency linkages, and credit-equity divergences. Regime detection identifies when correlation structures break — these transitions concentrate the highest-conviction opportunities.

Structured Products

Convertible bonds, ABS, CLOs, and linked notes. Decomposition into component parts — a convertible is simultaneously a bond, call option, and credit instrument. Cash flow waterfall modeling, prepayment risk, subordination, and trigger events.
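The simplest cash flow waterfall is sequential pay: each tranche is paid in order of seniority until the cash runs out. The sketch below ignores interest, fees, and trigger events, and the tranche names and amounts are hypothetical.

```python
def sequential_waterfall(available_cash: float, tranches):
    """Distribute cash to tranches in seniority order.

    `tranches` is a list of (name, amount_due) tuples, most senior first.
    Returns ({name: amount_paid}, leftover_cash).
    """
    payments = {}
    remaining = available_cash
    for name, amount_due in tranches:
        paid = min(remaining, amount_due)
        payments[name] = paid
        remaining -= paid
    return payments, remaining


payments, leftover = sequential_waterfall(
    120.0, [("senior", 100.0), ("mezzanine", 30.0), ("equity", 50.0)]
)
```

With 120 of cash against 180 due, the senior tranche is paid in full, the mezzanine absorbs a shortfall, and the equity tranche receives nothing, which is subordination in miniature.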

For infrastructure partners

This research stack puts your capabilities under real production load

None of the layers above (model orchestration, MCP data access, multi-cloud deployment) is a demo. The stack ships published market research every day. We're looking for infrastructure partners who can provide compute and model access: you get evaluation under real load and an on-the-record deployment, and we get the resources to push the research further.

A real, observable workload

Inference calls, context length, multi-model routing, cross-region latency — all happen inside a continuously running research pipeline, not a benchmark script.

A window facing outside readers

The output is published openly and on the record. The effect of your model and cloud capabilities is judged by independent readers, not internal metrics.

Concrete integration and feedback

On resource access, model evaluation, or wiring your capabilities into this pipeline — we are open to a concrete technical conversation.

Get in touch → [email protected]

Decades of knowledge, encoded into AI-native infrastructure
