Tokyo meets San Francisco as Joe, Chief AI Officer for LegalOn Japan, and Gabor, Vice President of AI for LegalOn U.S., exchange notes on frontier-scale models, benchmark battles, and why accuracy rules in legal tech. Their vision: an autonomous, memory‑rich assistant that lets lawyers work at the speed of thought. Together, they reveal how Japan’s leading contract platform is evolving into an AI-driven leader in legal technology for the global stage.
Gabor Melli – Vice President of AI, LegalOn Technologies US Gabor is a veteran AI leader with 20+ years in machine learning, data science, and product engineering. He has built consumer- and enterprise-scale AI at companies such as Sony PlayStation (game & content-recommendation engines), VigLink (intelligent in-text ad placement), and OpenGov. Alongside his industry work, Gabor has authored numerous academic papers on semi-supervised learning and natural-language processing. Gabor joined LegalOn Technologies in August 2023 and was appointed to his current position at the same time.
Yiqing Zhu (Joe) – Chief AI Officer, LegalOn Technologies JP Joe holds a B.S. in Software Engineering from Zhejiang University and an M.S. in Computer Science from University of Tsukuba. He began his career as a software engineer in Rakuten’s Search Group, then spent seven years at Indeed as a full-stack engineer, tech lead, and engineering manager, honing deep expertise in NLP and machine learning. Joe joined LegalOn Technologies in September 2023 and was appointed Chief AI Officer in April 2024.
Left: Gabor | Right: Joe “We use Gemini to record,” Joe smiles. Gabor laughs: “An artificial-intelligence instrument—very modern!”
Why This Matters Now
Recorded during Gabor's visit to Tokyo (JP), April 2025.
Frontier models are moving fast, but legal AI demands benchmark‑proven accuracy, self‑improving prompts, and privacy‑aware memory—making this moment pivotal.
Joe: Let’s start with the big picture—why is now such a pivotal moment for legal AI?
Gabor: Because we’re riding a wave of frontier models. Our focus is really on the frontier, because one of our main concerns in legal is accuracy. Lawyers expect factual answers, not marketing demos.
Joe: Accuracy, yes—but candidates keep asking about hallucinations too. I don’t see a benchmark that truly captures legal yet. Most public scores focus on math or coding; they miss the edge cases that attorneys care about. That gap is why we built “LegalRikai”, the first open benchmark dataset that checks whether large language models can actually process legal tasks under Japanese laws and regulations.
Gabor: Exactly. Every three months, we run our own contract‑centric benchmark. The results come in waves—slow gains, then a big jump when a new family of models drops. Each jump forces us to re‑evaluate prompts, latency, and cost.
Joe: So it’s not just which model, it’s how we drive it. Recently, we’ve been letting the LLM rewrite its own prompt—recursive prompting. That self‑improvement loop means fewer manual tweaks as models get smarter.
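Conceptually, that self-improvement loop is simple: ask the model to rewrite its own prompt, re-score the rewrite on a benchmark, and keep it only if it wins. A minimal sketch in Python, where `call_llm` and `score_prompt` are illustrative stubs rather than LegalOn's production code:

```python
# Sketch of a recursive-prompting loop. call_llm() and score_prompt() are
# toy stand-ins for a real model API and benchmark harness.

def call_llm(prompt: str) -> str:
    """Stub LLM call: pretend the model rewrites its prompt more precisely."""
    return prompt + " Cite the governing clause verbatim."

def score_prompt(prompt: str) -> float:
    """Stub benchmark: here, more specific (longer) prompts score higher."""
    return min(1.0, len(prompt) / 200)

def self_improve(prompt: str, rounds: int = 3) -> tuple[str, float]:
    """Ask the model to rewrite its own prompt; keep only rewrites that
    beat the current benchmark score."""
    best, best_score = prompt, score_prompt(prompt)
    for _ in range(rounds):
        candidate = call_llm(best)             # model proposes a rewrite
        candidate_score = score_prompt(candidate)
        if candidate_score > best_score:       # accept only improvements
            best, best_score = candidate, candidate_score
    return best, best_score
```

The essential piece is the accept-only-if-better gate: as models improve, the loop replaces manual prompt tweaks with measured, benchmark-guarded rewrites.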
Gabor: And yet, even the most innovative model can’t read a lawyer’s mind. Self-service users still struggle to craft an effective prompt. That’s why memory matters: we’re now building a memory of our users—what they said two weeks ago, their focus on construction‑law clauses. If we remember that, we can bias the answer toward relevance without extra work on their side.
Joe: Which brings security to the front. We store preferences, not proprietary data; users will be able to flip the right‑to‑be‑forgotten switch at any time. No hidden fine‑print.
Gabor: Frontier‑grade accuracy, domain‑specific benchmarks, self‑improving prompts, and privacy‑respectful memory. That mix is why this moment matters—and why building here feels like working three years in one.
Frontier AI Landscape
What does “the frontier” look like when you’re living on it?
Labs release new model families every few months, agent-focused startups flourish, and legal’s accuracy-latency-cost trade-off opens up unique opportunities.
Joe: We’ve been meeting with all the big labs lately—OpenAI, Anthropic, Google and more. What’s your read on where each is heading?
Gabor: They’re sprinting in parallel. Whoever offers the cleanest factual output wins our business.
Joe: DeepMind just dropped Gemini‑Diffusion—a text‑diffusion model instead of the usual autoregressive stack. I’ve never seen that before, but it makes sense; it could be a new wave.
Gabor: True. Every six months, we see a step‑change family release—GPT‑4o, Claude 3, Gemini 1.5 Flash. Each jump resets the playing field for cost, latency, and accuracy.
Joe: And the startup vibe in San Francisco? You said even the coffee‑shop chatter is full of the word “agent.”
Gabor: Definitely. Step into any downtown cafe and you’ll hear people swapping tips on “agent stacks” and “autonomous workflows.” Founders pitch tools that can plan, reason, and act for hours—or days—without a human touch. That autonomy is priceless—provided you can prove it’s safe.
Joe: That fixation on autonomy ties into another debate I keep hearing in ML circles: classic supervised fine-tuning versus reinforcement fine-tuning (RFT). Classic fine-tuning nudges a model with labeled examples, while RFT adds a second pass where the model gets rewarded for answers judged better by a custom scorer. Some friends swear RFT is more accurate, yet they still choose classic fine‑tuning because it’s cheaper.
Gabor: In legal AI, cost never trumps accuracy. Today’s RFT reward signals come from math and coding tasks, so they don’t capture legal nuance. Until we can reward “cites the right statute” or “matches contract-law precedent,” classic fine-tuning plus careful prompting wins on both accuracy and budget.
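To make that gap concrete, here is a hedged sketch of what a legal-domain reward function for RFT could look like, rewarding an answer for citing the expected statute and overlapping with reference terminology. The function name and weights are illustrative assumptions, not an existing scorer:

```python
# Hypothetical reward scorer for legal RFT: weight "cites the right
# statute" most heavily, then overlap with reference terminology.
# The 0.6 / 0.4 weights are illustrative, not tuned values.

def legal_reward(answer: str, expected_statute: str,
                 reference_terms: list[str]) -> float:
    """Return a reward in [0, 1] for a model answer."""
    score = 0.0
    if expected_statute.lower() in answer.lower():
        score += 0.6                                   # correct citation
    if reference_terms:
        hits = sum(1 for t in reference_terms if t.lower() in answer.lower())
        score += 0.4 * hits / len(reference_terms)     # precedent overlap
    return round(score, 2)
```

Until reward signals like this exist at scale for legal text, the cheaper classic fine-tuning path keeps its edge.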
Joe: Until then, the big labs will pour talent into easy‑reward areas, leaving niches like statute interpretation wide open for us. That could be our opportunity.
Gabor: Exactly. The giants race on general reasoning; we specialise where “it depends” is the default answer. That’s a moat (a defensible advantage).
Joe: Frontier AI isn’t a single model—it’s a moving stack of models, tuning methods, and agent frameworks. You’ll see the whole stack here, not just the glossy demo.
Gabor: And you’ll ship it to production, with real lawyers validating every release.
LegalOn’s Technical North Star
From benchmarks to memory: how do we decide what to build next?
Accuracy sits at the top; we push it via quarterly LegalRikai benchmarks, self‑tuning prompts, secure preference memory, and specialised legal tools.
Joe: Let’s pin down our North Star. What’s the one metric we refuse to compromise on?
Gabor: Accuracy—full stop. Our customers set a high bar for factual answers. In-house, we call it “contract‑safe output.”
Joe: And “LegalRikai” gives us the scoreboard. We open‑sourced it so anyone can see if an LLM really understands Japanese statutes.
Gabor: We re‑run that suite every quarter on each frontier model, plus our own pipelines. When we see a jump, we migrate fast.
Joe: But the metric alone isn’t enough; we need levers to move it. First lever: self‑improving prompts.
Gabor: Right. We let the LLM critique and rewrite its own prompt—recursive prompting. That loop alone delivered a 10‑point gain on confidentiality‑clause accuracy.
Joe: Second lever: domain memory. We separate knowledge (public law) from preference (a client’s playbook: “30‑day payment terms only”).
Gabor: Preferences live in a private vector store under the customer’s tenancy. If they request a wipe, we can erase it in seconds.
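A minimal sketch of that tenant-scoped preference store, with the right-to-be-forgotten wipe as a first-class operation. A production system would sit on an encrypted vector database; the class below is purely an illustration, not LegalOn's actual schema:

```python
from collections import defaultdict

# Toy per-tenant preference store with a right-to-be-forgotten wipe.
# All names here are illustrative stand-ins for a real vector store.

class PreferenceStore:
    def __init__(self) -> None:
        # tenant_id -> list of (preference_text, embedding) pairs
        self._store: dict[str, list[tuple[str, list[float]]]] = defaultdict(list)

    def add(self, tenant_id: str, preference: str,
            embedding: list[float]) -> None:
        """Record one preference under a single customer's tenancy."""
        self._store[tenant_id].append((preference, embedding))

    def recall(self, tenant_id: str) -> list[str]:
        """Return that tenant's stored preferences (and only theirs)."""
        return [text for text, _ in self._store.get(tenant_id, [])]

    def wipe(self, tenant_id: str) -> int:
        """Erase everything for one tenant; returns how many entries went."""
        return len(self._store.pop(tenant_id, []))
```

Because each tenant's data lives under its own key, the wipe is a single deletion rather than a scan, which is what makes "erase it in seconds" plausible.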
Joe: Third lever: specialised tools—redlining, clause extraction, playbook lookup. Each gets its own unit test inside LegalRikai‑Plus so a regression shows up before a lawyer ever sees it.
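The regression gate behind that idea fits in a few lines: compare each tool's current benchmark score against a recorded baseline and flag the release when a drop exceeds tolerance. The baselines and threshold below are made-up numbers for illustration:

```python
# Sketch of a per-tool regression gate in the spirit of a benchmark suite:
# any score drop beyond `tolerance` is flagged before release.
# Baseline values are illustrative, not real LegalRikai-Plus numbers.

BASELINES = {"redlining": 0.91, "clause_extraction": 0.88, "playbook_lookup": 0.93}

def check_regressions(current_scores: dict[str, float],
                      tolerance: float = 0.01) -> list[str]:
    """Return the tools whose score dropped more than `tolerance`."""
    return [
        tool for tool, baseline in BASELINES.items()
        if current_scores.get(tool, 0.0) < baseline - tolerance
    ]
```

Wiring a check like this into CI is what lets a regression surface "before a lawyer ever sees it."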
Gabor: Stack those levers and the pyramid looks like this: Benchmark → Prompt Self‑Tuning → Memory → Tools → Security.
Joe: Or, in one sentence: Ship the most accurate answer possible, at human speed, without leaking a byte.
From Tools to Agents
Turning point: when single‑function tools learn to think, remember, and act.
By adding perception, memory, planning, and autonomy to single‑turn tools, we’re evolving them into multi‑step legal agents.
Joe: Last week I asked: what even counts as an agent? I sketched a framework with five dimensions—horizon length, context understanding, tool use, memory, and autonomy.
Gabor: I like that, though I slice it slightly differently: conversation, perception, action, planning, and memory. The more boxes you tick, the more agentic the system feels.
Joe: Today our redlining tool is single‑turn—“do A, then B.” But the moment it can plan a 20‑step review, that’s a long‑horizon agent.
Gabor: Planning needs language. “Do A and B unless C; escalate if D.” The agent must parse that and build a plan.
Joe: And to execute, it needs perception—events like “new file uploaded” or “deadline hits.” That’s coming with the “Matter Management” module.
Gabor: Memory is the linchpin. Lawyers expect the assistant to recall what they decided two days ago, which takes short‑term session memory plus long‑term playbook memories.
Joe: Tool use is where legal gets tricky. Playbooks and Templates—both now available in LegalOn Cloud—act as specialised tools: review pulls Playbooks, drafting taps Templates, and negotiation may require calling external search APIs. The agent must select the right asset at each step.
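Step-wise tool selection can be sketched as a simple router that maps each plan step to the asset the conversation describes (review to Playbooks, drafting to Templates, negotiation to external search). The keyword matching below is deliberately naive and purely illustrative; a real agent would let the model choose from tool descriptions:

```python
# Toy router from agent plan steps to tools. Keyword rules are a stand-in
# for real LLM-driven tool selection.

def select_tool(step: str) -> str:
    step_lower = step.lower()
    if "draft" in step_lower:
        return "templates"        # drafting taps Templates
    if "review" in step_lower or "redline" in step_lower:
        return "playbooks"        # review pulls Playbooks
    if "negotiat" in step_lower:
        return "external_search"  # negotiation may need outside search
    return "ask_human"            # fall back to the lawyer in the loop

plan = ["Draft the NDA", "Review counterparty edits", "Negotiate payment terms"]
tools = [select_tool(step) for step in plan]
```

The fallback branch is the important design choice: when no tool clearly fits, the agent defers to the human rather than guessing.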
Gabor: And autonomy lives on a slider: the less human in the loop, the higher the stakes. For contract drafting, we’ll start with human-approved, then graduate to auto‑draft simple NDAs.
Joe: So, where are we on that slider?
Gabor: We already have the conversational interface running in production. The next milestones are event awareness—so the assistant notices new files or looming deadlines—and a smarter memory that recalls past sessions and playbooks automatically. Deep planning and fully automated actions will follow, one careful step at a time.
Joe: In other words, from helpful tools to trusted agents—still lawyer‑in‑the‑loop, but doing 80 % of the grunt work.
Gabor: And every new dimension we unlock becomes a moat: few vendors can combine legal accuracy and multi-step agency without incurring significant risk.
Joe: That’s the fun part—we get to define what “legal agent” means before the industry catches up.
Beyond the Horizon
A candid glance twelve months out.
Within a year, longer-running agents and AI-accelerated coding will enable small teams to execute at an enterprise scale, while lawyers focus on edge cases.
Joe: Let’s jump ahead a year—how different will our tech stack feel?
Gabor: Reasoning windows will explode. One lab has just shown agents thinking for hours; some teams are already discussing research runs that last days or even weeks.
Joe: Longer runs demand more autonomy. If an assistant can keep that much context, it could shepherd a contract almost end‑to‑end before a lawyer ever steps in.
Gabor: But that autonomy only matters if we can ship faster. The frontier labs are already using their own models to write code; we have to do the same. We’re not short of work; we’re short of time. Agents that can draft code, generate tests, and triage bugs will let a ten‑person team move like fifty.
Joe: And that keeps our lead in the legal domain, where “it depends” is the default answer.
Gabor: Twelve months from now, I expect our assistant to plan, draft, redline, and file simple agreements with almost no hand‑holding, while lawyers focus on the edge cases.
Joe: A good year to be building—on both sides of the Pacific.
Note — Speed Beyond Code
Speed matters across the entire delivery chain — from idea to launch.
LLMs already trim coding and unit-test time, but coding accounts for only ~20 % of a typical release. Requirements, design, peer review, documentation, help pages, and launch assets still consume most of the calendar. Even if AI made coding three times as fast (cutting it to a third of its original time), the whole timeline shrinks by barely ~13 %. That’s why our AI-powered Development Center of Excellence (AID CoE) is widening its scope — piloting copilots for user stories, test cases, release notes, and more. The goal: end-to-end AI support that lets product-centric teams cycle faster with fewer context switches.
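The arithmetic behind that estimate is Amdahl's-law style: the ~13 % figure corresponds to coding becoming three times faster, i.e. taking a third of its original time.

```python
# Amdahl's-law arithmetic behind the note: coding is ~20 % of a release,
# so even a 3x coding speed-up shrinks the whole timeline by only ~13 %.

coding_share = 0.20                 # fraction of the release spent coding
speedup = 3.0                       # coding becomes 3x faster
new_total = (1 - coding_share) + coding_share / speedup
shrink = 1 - new_total              # fraction of the calendar saved
print(f"timeline shrinks by {shrink:.1%}")   # 13.3%
```

Even an infinite coding speed-up could never save more than the coding share itself (~20 %), which is exactly why the rest of the delivery chain needs AI support too.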
Joe: Hard to believe our whole conversation took only thirty minutes.
Gabor: Frontier years feel like dog years—so much shifts in a quarter.
Joe: One constant: lawyers need answers they can trust.
Gabor: And we’re determined to give them an assistant that’s both fearlessly fast and painstakingly accurate.
LegalOn’s north star is clear: pair frontier‑grade AI reasoning with tight, lawyer‑in‑the‑loop safeguards, so every contract is drafted, reviewed, and filed faster and more accurately than ever before. Stay tuned to LegalOn Now for deeper dives and the next wave of product launches.
We are hiring!
LegalOn's development team is actively recruiting engineers! If you are interested, please feel free to contact our recruitment team or apply via the link below!