Building a Cross-Border Data Layer with AKShare and a Global Data Provider

The Cross-Border Data Problem

Most research infrastructure is built for a single region. Traditional global terminals have limited China-data depth; domestic services like Wind or Choice are deep on China but shallow globally. Every cross-border researcher knows this fragmentation — they keep three terminals open and spend half their day moving data between them.

KSINQ’s data layer was built to solve this specific problem: how do you create a single analytical environment where Chinese and global market data are equally accessible, equally queryable, and equally integrated into the reasoning process?

The Architecture: Two Pillars

AKShare: The China Pillar. AKShare is an open-source Python financial data library that covers A-shares, Hong Kong equities, mainland futures, fund data (including QDII/LOF NAV and premium data), and Chinese macroeconomic indicators from the National Bureau of Statistics, the People’s Bank of China (PBOC), and the State Administration of Foreign Exchange (SAFE). We chose AKShare over commercial alternatives (Wind, Choice) for three reasons specific to our AI-native workflow.

First, API-first design. AKShare was built as a Python library, not a GUI terminal. It integrates naturally into programmatic workflows — a critical requirement when the “user” is an AI model making real-time data requests through MCP, not a human clicking through a desktop application. Commercial terminals have APIs, but they were designed as afterthoughts to terminal products. AKShare’s API is the product.

Second, open-source transparency. When our model queries AKShare for a company’s financial data, we can inspect the exact data source, the parsing logic, and the transformation pipeline. With proprietary terminals, the data is a black box — you get a number, but you cannot verify the chain from raw filing to displayed value. For a research process built on falsifiability, this transparency is not optional.

Third, cost structure. AI-native research workflows make orders of magnitude more data requests than human analysts. A workflow that queries 50 companies across 10 financial metrics for a sector screen generates 500 API calls in seconds. Commercial terminal licensing is priced for human usage patterns, not AI-scale throughput. AKShare’s open-source model eliminates this constraint.
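
The fan-out in that sector screen is a plain cross product of companies and metrics. A minimal sketch of the arithmetic follows; the ticker and metric names are placeholders, not real identifiers or AKShare parameters.

```python
from itertools import product

# Placeholder identifiers for illustration only.
tickers = [f"TICKER_{i:02d}" for i in range(50)]
metrics = [f"metric_{j}" for j in range(10)]

# One request per (company, metric) pair.
calls = list(product(tickers, metrics))
print(len(calls))  # 500
```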

Global Financial Data: The Global Pillar. The global pillar is accessed through a licensed cross-border financial data provider covering US and international equities, fixed income, derivatives, macroeconomic indicators, credit ratings, and ESG data. Through MCP integration, Claude can query this feed directly, pulling US peer financials, global sector benchmarks, and macro indicators in the same analytical pass that queries AKShare for Chinese data.

The combination is greater than the sum of its parts. A single research conversation can start with AKShare data showing that a Chinese chemical company’s gross margins have expanded for three consecutive quarters, then pivot to global data showing that US peers’ margins are compressing over the same period, then ask: “What explains this divergence, and is it sustainable?” The model reasons across both datasets without the researcher switching tools, exporting files, or manually aligning data formats.

Solving the Hard Problems

Data normalization. Chinese companies report under Chinese Accounting Standards (CAS). CAS has largely converged with IFRS, but residual differences from IFRS, and larger ones from US GAAP, remain in areas such as revenue recognition, lease accounting, and government subsidies. Our data layer includes a normalization module that adjusts for these differences when performing cross-border comparisons. The adjustments are context-dependent and sometimes require judgment calls that the model must flag for human review.

Temporal alignment. Mainland-listed companies publish annual and semi-annual reports plus shorter Q1 and Q3 reports, while US companies file quarterly. Fiscal year-ends can also differ: mainland companies use the calendar year, whereas US companies may choose a different year-end. Our data layer handles temporal alignment by standardizing to trailing-twelve-month (TTM) metrics for comparability, and explicitly flagging when temporal misalignment exceeds one quarter.
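
The TTM standardization and the one-quarter misalignment flag can be sketched as below. This is an illustration under stated assumptions, not our production module. It assumes flow figures on the China side arrive as cumulative year-to-date values and that the US side arrives as discrete quarters. All function names are hypothetical.

```python
from datetime import date


def ttm_from_ytd(ytd_current: float, prior_full_year: float,
                 ytd_prior_same_period: float) -> float:
    """TTM from cumulative figures (assumed input shape):
    current YTD + prior full year - prior-year YTD for the same period."""
    return ytd_current + prior_full_year - ytd_prior_same_period


def ttm_from_quarters(quarters: list[float]) -> float:
    """TTM from discrete quarterly figures: sum of the latest four."""
    if len(quarters) < 4:
        raise ValueError("need at least four quarters")
    return sum(quarters[-4:])


def months_apart(a: date, b: date) -> int:
    """Whole-month distance between two period-end dates."""
    return abs((a.year - b.year) * 12 + (a.month - b.month))


def misaligned(period_end_a: date, period_end_b: date,
               max_months: int = 3) -> bool:
    """True when the two period ends differ by more than one quarter."""
    return months_apart(period_end_a, period_end_b) > max_months
```

For example, a company with 60 of revenue through June this year, 100 for last full year, and 50 through last June has a TTM of 60 + 100 - 50 = 110. The three-month threshold is an assumed reading of "one quarter".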

QDII/LOF premium data. This is KSINQ’s most differentiated data capability. AKShare provides the latest published and historical NAV data for QDII and LOF funds, which we combine with market price data to calculate premium/discount rates as (market price − NAV) / NAV. This feeds directly into our QDII Premium Monitor tool and our cross-border research analysis.
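
The premium/discount calculation itself is a single ratio. A minimal sketch, with a hypothetical function name and made-up numbers in the usage note:

```python
def premium_rate(market_price: float, nav: float) -> float:
    """Premium (positive) or discount (negative) of market price over NAV."""
    if nav <= 0:
        raise ValueError("NAV must be positive")
    return (market_price - nav) / nav
```

A fund trading at 1.10 against a NAV of 1.00 shows a premium of about 10%; one trading at 0.95 shows a discount of about 5%.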

Above the Data Layer

Clean, normalized data reaches Claude’s reasoning environment through the Model Context Protocol (MCP). How MCP works and why we chose it over a custom API is covered in MCP for Research.

Qualitative information — sell-side reports, news, academic literature — is ingested through a separate path. How Readwise plugs into the research flow via MCP is also in MCP for Research. Workflow orchestration (Dify pipelines, morning briefings) is in Research Workflow.

Limitations and Open Questions

This data layer is not perfect. A few known issues:

The judgment boundary of CAS-GAAP adjustments. The normalization module handles mechanical differences — lease capitalization, depreciation methods. But adjustments involving government subsidy classification or related-party transaction pricing require business judgment. The module flags these for human review. We have not found a way to fully automate this, and we are not looking for one in the near term — these judgment calls are part of where research value comes from.
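
One way to picture the mechanical/judgment split is a module that applies only the adjustments it recognizes as mechanical and returns everything else for review. The sketch below is illustrative: the category names, the `Adjustment` type, and the additive treatment of adjustments are assumptions, not the actual normalization module.

```python
from dataclasses import dataclass

# Adjustment kinds treated as mechanical (assumed labels for illustration).
MECHANICAL = {"lease_capitalization", "depreciation_method"}


@dataclass
class Adjustment:
    kind: str
    amount: float
    needs_review: bool


def classify(kind: str, amount: float) -> Adjustment:
    """Mechanical kinds are auto-applied; judgment items and anything
    unrecognized are flagged for a human reviewer."""
    return Adjustment(kind, amount, needs_review=kind not in MECHANICAL)


def apply_adjustments(reported: float, adjustments: list[Adjustment]
                      ) -> tuple[float, list[Adjustment]]:
    """Apply only the mechanical adjustments; return the rest as pending."""
    adjusted = reported + sum(a.amount for a in adjustments if not a.needs_review)
    pending = [a for a in adjustments if a.needs_review]
    return adjusted, pending
```

Defaulting unknown kinds to review keeps the failure mode conservative: a new adjustment type is never silently applied.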

AKShare’s uneven data quality. A-share main board data coverage and timeliness are solid, but ChiNext and Beijing Stock Exchange (BSE) data occasionally has delays or gaps. Futures historical depth is also shallower than on commercial terminals. Our approach: cross-validate critical data points against a second source. We do not treat AKShare as the single source of truth.
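
A cross-validation check for a single data point can be as simple as a relative-difference test between two sources. The sketch below is an assumption about how such a check might look. The 1% tolerance and the status labels are placeholders, and the function name is hypothetical.

```python
from typing import Optional


def cross_validate(primary: Optional[float], secondary: Optional[float],
                   rel_tol: float = 0.01) -> dict:
    """Compare the same data point from two sources.

    Returns a status of "ok", "mismatch", or "missing" (a gap in either
    source), plus the relative difference when both values are present.
    """
    if primary is None or secondary is None:
        return {"status": "missing", "rel_diff": None}
    scale = max(abs(primary), abs(secondary))
    rel_diff = 0.0 if scale == 0 else abs(primary - secondary) / scale
    status = "ok" if rel_diff <= rel_tol else "mismatch"
    return {"status": status, "rel_diff": rel_diff}
```

Anything other than "ok" would be surfaced to the researcher rather than resolved automatically.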

QDII premium latency. Premium/discount calculations depend on the time gap between NAV and market price. QDII fund NAVs are typically published T+1 or T+2, while market prices are real-time. The “real-time premium” we calculate is actually a lagging indicator. We label NAV timestamps explicitly in output so users can judge whether the lag affects their conclusions.
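
Labeling the NAV timestamp amounts to carrying both dates alongside the premium and printing the gap. A minimal sketch, assuming a hypothetical `PremiumReading` record and counting the lag in calendar days rather than trading days:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PremiumReading:
    market_price: float
    price_date: date
    nav: float
    nav_date: date

    @property
    def premium(self) -> float:
        """(market price - NAV) / NAV, using the latest published NAV."""
        return (self.market_price - self.nav) / self.nav

    @property
    def nav_lag_days(self) -> int:
        """Calendar days between the NAV date and the price date."""
        return (self.price_date - self.nav_date).days

    def label(self) -> str:
        """Human-readable output that exposes how stale the NAV is."""
        return (f"premium {self.premium:+.2%} "
                f"(price {self.price_date.isoformat()}, "
                f"NAV {self.nav_date.isoformat()}, "
                f"NAV lag {self.nav_lag_days}d)")
```

With a price of 1.05 on 2024-06-05 and a NAV of 1.00 dated 2024-06-03, the label reads "premium +5.00% (price 2024-06-05, NAV 2024-06-03, NAV lag 2d)".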