
/ case study · 2021–now

PortfolioPilot

First engineer on an SEC-registered AI financial advisor. Built a hedge-fund-grade quantitative engine ground-up. The AI's job is making it legible.

Founding / First Engineer · Global Predictions, Inc.

First engineer on the team
SEC-registered investment adviser
Hedge-fund-grade quant engine, built ground-up

The setup

In June 2021, I joined Global Predictions as the first engineer on the team. The product hypothesis was sharp, and it was not “an AI chatbot for finance.” It was: build a real fiduciary financial advisor for retail investors, with hedge-fund-grade quantitative substance behind every recommendation. Macro views, multi-model risk and return forecasting, the kind of analysis a private wealth manager pays for. The AI’s role was the surface, not the brain: translate dense statistical output into something a normal investor could read, understand, and act on.

PortfolioPilot would be a registered investment adviser under the SEC. A real fiduciary, not a chatbot pretending to give financial advice. The team was a handful of finance and ML PhDs. There was no codebase yet. There was no product.

Five years later, PortfolioPilot.com is SEC-registered and advises hundreds of thousands of investors on tens of billions of dollars of household wealth. Underneath: a multi-model quantitative engine that has been built and refined for half a decade (the specifics are under NDA; the surface area is broad). Above it: the AI layer that turns the engine’s output into language a user can act on. Mobile apps, an iOS-first investment-management product, a public API, GPTs, a daily AI advisor. I have shipped or co-shipped basically every layer of both halves.

I’m still here.

/ what i owned

  • Bootstrapped the entire codebase: Python services, a TypeScript monorepo, the data ingestion DAGs, the deployment pipeline.
  • Designed and built the financial-data ingestion layer: dozens of vendor and broker integrations, normalized into one canonical model.
  • Built large parts of the quantitative engine architecture (specifics under NDA): macro signal pipelines, multi-model orchestration, the plumbing that runs hedge-fund-grade math at retail scale.
  • Built the recommendation engine that turns engine output into the personalized actions users see daily: tax-aware, account-aware, holdings-aware.
  • Shipped the AI translation layer: one of the first wave of ChatGPT plugins (early 2023), the GPTs migration, then the custom agent runtime we run today. The AI's job is making the math legible, not making the math.
  • Real-time advice streaming UI: tool-calling agents, stepwise reasoning visible, citations to the source data the engine ran on.
  • Production infra: AWS, Kubernetes, Postgres, Celery, Redis, observability, costs.
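The “idempotent” note on the ingestion DAGs is the load-bearing word: a vendor sync can be replayed end-to-end without creating duplicates. A minimal sketch of that pattern, with the Celery task machinery omitted and all names (`ingest_position`, `POSITIONS`, the vendor row shape) purely illustrative:

```python
import hashlib
import json

POSITIONS: dict[str, dict] = {}  # stands in for a Postgres table with a unique key


def dedupe_key(vendor: str, account_id: str, as_of: str, symbol: str) -> str:
    """Deterministic key: replaying a vendor sync upserts, never duplicates."""
    raw = json.dumps([vendor, account_id, as_of, symbol], sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()


def ingest_position(vendor: str, account_id: str, as_of: str, row: dict) -> str:
    """Normalize one vendor row into the canonical model, keyed idempotently."""
    canonical = {
        "account_id": account_id,
        "as_of": as_of,
        "symbol": row["symbol"].upper(),
        "quantity": float(row["qty"]),
        "source": vendor,
    }
    key = dedupe_key(vendor, account_id, as_of, canonical["symbol"])
    POSITIONS[key] = canonical  # upsert: safe to replay the whole DAG
    return key


# Replaying the same vendor row twice leaves exactly one canonical record.
k1 = ingest_position("plaid", "acct-1", "2024-01-02", {"symbol": "aapl", "qty": "10"})
k2 = ingest_position("plaid", "acct-1", "2024-01-02", {"symbol": "aapl", "qty": "10"})
```

In production the unique key lives in the database, not the application, so concurrent workers get the same guarantee.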

/ architecture

PortfolioPilot stack: quant engine + AI surface

  • User · Web / iOS / GPT (EDGE) ★ built: react · swift · gpt store
  • Public API (APP) ★ built: fastapi
  • AI Agent Runtime (AI) ★ built: tool-calling · streaming
  • AI Translation Layer (AI) ★ built: engine output → plain language
  • Recommendation Engine (APP) ★ built: tax · accounts · holdings
  • Quant Engine (NDA) (AI) ★ built: macro · multi-model · orchestration
  • Data Ingestion DAGs (DATA) ★ built: celery · idempotent
  • Statistical / ML Models (AI) ◇ shaped: team-built; I ship them
  • Postgres + Warehouse (DATA) ◆ owned: schema · partitions
  • Brokers · Market Data (DATA) ◇ shaped: vendor SDKs
★ built from scratch ◆ owned ◇ shaped / refactored ○ inherited / pre-existing

The early ChatGPT plugins moment

In early 2023, OpenAI announced ChatGPT plugins: a way for ChatGPT to call your API and answer questions over your data. We shipped the PortfolioPilot plugin in the first wave. Users could open ChatGPT and ask “how is my portfolio doing?” and get a real personalized answer, charts and all, with the quant engine underneath.

When OpenAI deprecated plugins in favor of GPTs, we migrated. When GPTs gave way to the broader agent ecosystem, we built our own runtime: same tool definitions, same translation contracts, but on infra we control. The prompts have evolved through three model generations now. The tools haven’t, much. I designed them to be agent-runtime-agnostic from day one because I expected the substrate to keep shifting. It did.
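“Agent-runtime-agnostic” here means the tool is declared once and rendered into whatever schema the current substrate expects. A sketch of that shape, under stated assumptions: the `Tool` dataclass, the `get_portfolio_summary` handler, and the renderer are hypothetical stand-ins, not the production definitions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """One runtime-neutral tool definition; renderers adapt it per substrate."""
    name: str
    description: str
    parameters: dict[str, str]  # param name -> JSON-schema type
    handler: Callable[..., Any]


def get_portfolio_summary(user_id: str) -> dict:
    # Hypothetical handler; the real one calls the quant engine.
    return {"user_id": user_id, "risk_score": 0.42}


SUMMARY_TOOL = Tool(
    name="get_portfolio_summary",
    description="Return the engine's current summary for a user's portfolio.",
    parameters={"user_id": "string"},
    handler=get_portfolio_summary,
)


def to_openai_schema(tool: Tool) -> dict:
    """Render the same definition for an OpenAI-style function-calling runtime."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": {
                "type": "object",
                "properties": {k: {"type": v} for k, v in tool.parameters.items()},
                "required": list(tool.parameters),
            },
        },
    }


schema = to_openai_schema(SUMMARY_TOOL)
```

Swapping substrates then means writing one new renderer, not touching any tool or handler.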

/ a hard problem, solved

How do you make hedge-fund-grade quantitative analysis legible to a retail investor without dumbing it down?

The substance of PortfolioPilot is genuinely complex. Macro views fold into multi-model risk forecasts. Risk forecasts fold into portfolio-level recommendations. Recommendations fold into user-specific actions, filtered by tax constraints, account types, and holdings. A typical recommendation a user sees is the output of a chain of statistical computations that, written out, would fill a page of a finance textbook.

The legibility problem is not “translation” in the easy sense. It is: how do you preserve enough nuance that the advice is correct, while compressing enough that the user acts on it? Lose nuance and you give bad advice. Lose compression and the user closes the tab.

The answer, after several iterations, is that the chain has to be coherent end-to-end. Every layer annotates why. The quant engine emits not just a recommendation but the signals that drove it. The recommendation engine forwards those signals into the personalization step. The AI layer reads the annotated chain and writes a few paragraphs of plain English with the right hedges in the right places. None of those steps independently is the hard part. The hard part is making sure the macro view, the model output, the recommendation, and the AI translation all say the same thing about the same portfolio at the same moment. That coherence is the product.
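The annotated chain above can be sketched in miniature. This is illustrative only: the signal names are invented, and the "AI layer" is a deterministic template rather than an LLM call, but the shape is the point: every layer forwards the signals that drove it, so the final text can cite the evidence the engine actually used.

```python
from dataclasses import dataclass, field


@dataclass
class Signal:
    name: str
    value: float
    rationale: str  # the "why" that travels with the number


@dataclass
class Recommendation:
    action: str
    signals: list[Signal] = field(default_factory=list)


def quant_engine() -> Recommendation:
    """Engine emits not just a recommendation but the signals that drove it."""
    return Recommendation(
        action="reduce_semiconductor_exposure",
        signals=[Signal("sector_concentration", 0.34,
                        "34% of equity value in one sector")],
    )


def personalize(rec: Recommendation, taxable: bool) -> Recommendation:
    """Personalization annotates rather than replaces: original signals travel along."""
    if taxable:
        rec.signals.append(Signal("tax_constraint", 1.0,
                                  "sale may realize short-term gains"))
    return rec


def translate(rec: Recommendation) -> str:
    """AI-layer stand-in: reads the annotated chain, writes plain English."""
    reasons = "; ".join(s.rationale for s in rec.signals)
    return f"Consider: {rec.action.replace('_', ' ')} ({reasons})."


advice = translate(personalize(quant_engine(), taxable=True))
```

Because nothing is dropped between layers, the macro view, the model output, and the final sentence are describing the same evidence by construction.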

That coherence is also the part that does not show up in any one piece of code. It is enforced across every pull request, by people who care, and by tooling that catches when the chain drifts. (Yes, this includes a serious test bench for the AI side. Evals matter. They are one of the things that keep advice correct, alongside instrumented engine output, type-safe contracts between layers, and a small army of synthetic portfolios we replay every release. I don’t lead with evals because they’re the visible piece. The invisible coherence is what actually ships.)
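A toy version of the drift check that kind of tooling runs: for each synthetic portfolio, every signal the engine emitted must surface in the AI's final text, or the chain has drifted. The function name and replay cases are made up for illustration.

```python
def chain_is_coherent(engine_signals: list[str], advice: str) -> bool:
    """Drift check: each engine signal must surface in the advice text."""
    return all(s.replace("_", " ") in advice.lower() for s in engine_signals)


# Synthetic replay cases: (signals the engine emitted, text the AI produced).
REPLAY_CASES = [
    (["sector_concentration"],
     "Your sector concentration in semiconductors is high."),
    (["tax_loss_harvest"],
     "You could tax loss harvest this position."),
]

failures = [(signals, advice) for signals, advice in REPLAY_CASES
            if not chain_is_coherent(signals, advice)]
```

The real bench replays whole portfolios through the full chain; the principle is the same: coherence is asserted on every release, not assumed.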

Outcome

PortfolioPilot today is the surface through which millions of investors first encountered AI financial advice. It’s been featured in major financial press, has live integrations with most of the brokerages a US investor would use, and in 2025 became the first AI advisor the big banks treat as a real competitor rather than a curiosity. The quant engine behind it has been refined across half a decade and is, by any honest measure I have access to, the most ambitious retail-facing financial analysis system running at retail-scale costs.

I’m still here, still shipping.

What I’d do differently

I’d treat the AI’s vocabulary as a contract, not an output.

We built the AI as a translation layer over the quant engine, which was the right call architecturally. What I underweighted for too long is that the words the AI is allowed to say are themselves a contract on the engine. If the AI tells a user “consider tax-loss harvesting,” then tax-loss harvesting has to be something the engine can identify, quantify, justify, and audit. If the AI says “your portfolio is over-concentrated in semiconductors,” the engine has to be able to defend that classification on demand, in a way a regulator would accept. The AI’s vocabulary is, in effect, a hidden API on the underlying system.

For a long time that contract lived in nobody’s repo. The quant team would ship a new model output and the AI team would invent words for it; once a quarter we’d discover the AI had been recommending something the engine couldn’t fully verify, or the engine had been computing something the AI never surfaced. Both directions of that asymmetry are bad in different ways: one creates compliance risk, the other wastes the engine’s best work. The fix, when we wrote it down, was a single document both teams co-owned: every concept the AI is permitted to use, what engine signal it maps to, what disclaimers it travels with, what regulatory constraints attach to it. The document should have existed on day one, before the first prompt.
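The co-owned document can be thought of as a typed table, which is roughly how I'd encode it now. A hedged sketch, with invented entries and field names, not the production vocabulary:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabEntry:
    """One concept the AI is permitted to use, and what it commits the engine to."""
    concept: str                   # words the AI may say
    engine_signal: str             # what the engine must compute, justify, audit
    disclaimers: tuple[str, ...]   # language that must travel with the concept


VOCABULARY = {
    "tax-loss harvesting": VocabEntry(
        concept="tax-loss harvesting",
        engine_signal="unrealized_loss_lots",
        disclaimers=("not individualized tax advice",),
    ),
    "over-concentrated": VocabEntry(
        concept="over-concentrated",
        engine_signal="sector_concentration",
        disclaimers=("classification by sector taxonomy",),
    ),
}


def check_utterance(text: str) -> list[str]:
    """Return the engine signals a draft commits the system to defending."""
    return [e.engine_signal for phrase, e in VOCABULARY.items() if phrase in text]


signals = check_utterance("Your portfolio looks over-concentrated in semiconductors.")
```

Once the table exists in code, both failure directions become lintable: an AI phrase with no engine signal fails review, and an engine signal no phrase maps to shows up as dead vocabulary.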

The generalizable lesson, which I now hunt for in any AI product: when two teams compose into one user-visible artifact, the words of that artifact are an API between them. The team that ships the surface and the team that ships the substance need to agree on the vocabulary before either of them writes a line of code. In a regulated domain, especially before either of them writes a line of code.
