A Deep Dive into the Hidden Systems Behind Google Search

Uncovering live experiments, entity-based infrastructure, AI agents, and more.

Over the past few months, we’ve conducted a wide-ranging investigation into the inner workings of Google.
It led to major discoveries — some of which we reveal here.

While we can’t disclose everything, the information below provides a clearer understanding of how Google generates and ranks its results.


What ~1,200 Experiments Reveal About Google’s Inner Workings

We obtained a list of nearly 1,200 Google experiments, more than 800 of which were active as of June 2025.

This dataset confirms that many components referenced in the 2024 leaks — Mustang, Twiddlers, QRewrite, Tangram, QUS, and others — remain central to the system.

Intriguing new codenames also surfaced: Harmony, Thor, Whisper, Moonstone, Solar, and more.

Particularly noteworthy are DeepNow (a successor to Google Now) along with NowBoost, and SuperGlue, which may replace Glue, NavBoost’s counterpart for universal search.


Word Cloud: 1,200 Experiments

Unlike most websites, which receive a redesign every 3–5 years, Google evolves continuously.
There is no big “new release” — only a stream of microchanges moving from experiment to launch to full integration.

This explains the layered nature of the experiment list: older tests sit next to new ones, some already on their 15th iteration (e.g., MagiCotRev15Launch).

This incremental approach reduces risk — failed tests affect only a small portion of users — and enables innovation at a pace traditional redesigns cannot match.

Covered Areas Include:

  • AI: numerous Magi and AIM (AI Mode) variants
  • Shopping: over 50 dedicated experiments
  • Verticals: sports, finance, weather, travel, and more

Each vertical is assigned its own domain, such as:

  • ShoppingOverlappingDomain
  • TravelOverlappingDomain
  • SportsOverlappingDomain

This suggests an architecture in which each team operates in its own testing environment, enabling parallel experimentation without conflicts.
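The codenames alone don’t show how these domains are wired. As an illustration only, here is a minimal Python sketch of per-domain experiment isolation, loosely modeled on the layered design Google described publicly in its 2010 paper on overlapping experiment infrastructure. Whether the *OverlappingDomain entries actually work this way is an assumption; the bucket count, traffic shares, and experiment names (ShoppingTestA and so on) are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

NUM_BUCKETS = 1000  # assumed granularity of the traffic split

# Hypothetical experiments per domain: (experiment_name, number_of_buckets).
# 10 buckets out of 1,000 means roughly 1% of users see that experiment.
DOMAINS: Dict[str, List[Tuple[str, int]]] = {
    "ShoppingOverlappingDomain": [("ShoppingTestA", 10), ("ShoppingTestB", 10)],
    "TravelOverlappingDomain": [("TravelTestA", 20)],
    "SportsOverlappingDomain": [("SportsTestA", 5)],
}


def bucket(user_id: str, domain: str) -> int:
    """Map a user to a stable bucket, hashed independently per domain."""
    digest = hashlib.sha256(f"{domain}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def assign(user_id: str, domains: Dict[str, List[Tuple[str, int]]]) -> Dict[str, Optional[str]]:
    """Return at most one experiment per domain (None means control traffic)."""
    result: Dict[str, Optional[str]] = {}
    for domain, experiments in domains.items():
        b = bucket(user_id, domain)
        start = 0
        result[domain] = None
        for name, size in experiments:
            if start <= b < start + size:
                result[domain] = name
                break
            start += size
    return result


print(assign("user-42", DOMAINS))
```

Because each domain hashes the user ID together with its own name, a user’s shopping assignment is effectively independent of their travel assignment. Teams can therefore test in parallel while each experiment touches only a small slice of traffic.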



Entities Everywhere

Entities play a central role throughout Google’s ecosystem, a topic explored in depth in the “Entities Everywhere” talk by Damien Andell and Sylvain Deauré of 1492.vision earlier this year in Marseille.

The Knowledge Graph — Google’s Central Nervous System

Our research shows that the Knowledge Graph is far more than a sidebar widget.
It functions as the nervous system of Google: powering Search, Discover, YouTube, Maps, Assistant, Gemini, and AI Overviews.

At the core lies Livegraph, which assigns confidence weights to data triples before integrating them into the graph.

Namespace Hierarchy:

  • kc: Highly verified data (e.g., official records)
  • ss: Web-extracted “webfacts”
  • hw: Manually curated information

These labels aren’t cosmetic — they directly affect the confidence of facts and their use across Google services.
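The source does not say how Livegraph computes its weights. The sketch below assumes that each namespace supplies a prior confidence, that every independent corroborating source adds evidence (combined with a noisy-OR rule), and that a triple is admitted only above a threshold. The prior values, per-source weight, threshold, and example triples are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Illustrative priors per namespace; the labels come from the text, the numbers do not.
NAMESPACE_PRIOR: Dict[str, float] = {"kc": 0.95, "hw": 0.90, "ss": 0.60}
PER_SOURCE_EVIDENCE = 0.3  # assumed weight of each corroborating source
ACCEPT_THRESHOLD = 0.7     # assumed cut-off for entering the graph


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    namespace: str


def confidence(triple: Triple, corroborating_sources: int) -> float:
    """Noisy-OR: each source independently shrinks the remaining doubt."""
    prior = NAMESPACE_PRIOR.get(triple.namespace, 0.5)
    doubt = (1 - prior) * (1 - PER_SOURCE_EVIDENCE) ** corroborating_sources
    return 1 - doubt


def integrate(candidates: List[Tuple[Triple, int]]) -> List[Tuple[Triple, float]]:
    """Keep only triples whose weighted confidence clears the threshold."""
    scored = [(t, confidence(t, n)) for t, n in candidates]
    return [(t, c) for t, c in scored if c >= ACCEPT_THRESHOLD]


official = Triple("Example Co", "founded", "1999", "kc")
webfact = Triple("Example Co", "founded", "2001", "ss")
print(integrate([(official, 0), (webfact, 0)]))
```

Under these assumptions an unverified web fact (ss) starts below the threshold and needs at least one corroborating source to get in, while an official record (kc) is admitted on its own.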


Ghost Entities and Real-Time Adaptation

One of the most fascinating discoveries: ghost entities — temporary data structures that lack stable IDs.

They allow Google to:

  • Dynamically generate new entities
  • Validate them progressively
  • Surface them in real-time results

Supporting this are SAFT and WebRef, systems revealed in the 2024 leaks that extract, classify, and link entities to form a semantic map of the web.
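A minimal sketch of the ghost-entity lifecycle, assuming that “validation” means collecting enough independent sources. The ghost:/entity: ID formats and the three-source threshold are hypothetical; the point is only that an entity can surface under a temporary ID before it earns a stable one.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional, Set

PROMOTION_THRESHOLD = 3  # assumed number of independent sources needed

_temp_ids = itertools.count(1)
_stable_ids = itertools.count(1)


@dataclass
class GhostEntity:
    name: str
    # Temporary ID assigned at creation (hypothetical "ghost:" format).
    temp_id: str = field(default_factory=lambda: f"ghost:{next(_temp_ids)}")
    sources: Set[str] = field(default_factory=set)
    stable_id: Optional[str] = None

    def add_evidence(self, source: str) -> None:
        """Record a source (duplicates don't count); promote once enough agree."""
        self.sources.add(source)
        if self.stable_id is None and len(self.sources) >= PROMOTION_THRESHOLD:
            self.stable_id = f"entity:{next(_stable_ids)}"

    @property
    def current_id(self) -> str:
        """The ID a result would use right now: stable if validated, else temporary."""
        return self.stable_id or self.temp_id


event = GhostEntity("Example Festival 2025")
for src in ["news-site-a", "news-site-a", "blog-b", "forum-c"]:
    event.add_evidence(src)
    print(src, "->", event.current_id)
```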


SEO and Entity Validation

Key takeaway for SEOs: your brand must exist as a validated entity in Google’s expanding Knowledge Graph.

The 2024 leaks revealed how Google converts entire sites into vectors and calculates thematic metrics such as siteFocusScore and NSR, which penalize scattered content.
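The leaks name these metrics but not their formulas. As a hedged illustration of how a focus score computed over site vectors could penalize scattered content, the sketch below averages normalized page embeddings into a site centroid and reports the mean cosine similarity of the pages to it. The function name site_focus and the formula are our assumptions, not the actual definition of siteFocusScore.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def site_focus(page_embeddings: List[Vector]) -> float:
    """Mean cosine similarity between each page and the site centroid.

    Close to 1.0 when all pages point the same way (a focused site);
    lower when pages are spread across unrelated topics.
    """
    pages = [_normalize(p) for p in page_embeddings]
    dim = len(pages[0])
    centroid = _normalize([sum(p[i] for p in pages) / len(pages) for i in range(dim)])
    sims = [sum(a * b for a, b in zip(p, centroid)) for p in pages]
    return sum(sims) / len(sims)


# Toy 2-D "embeddings": a single-topic site vs. a site about everything.
focused = site_focus([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05]])
scattered = site_focus([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.2]])
print(round(focused, 3), round(scattered, 3))
```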

Chrome browsing data also plays a role:

  • Updating trust signals
  • Tracking emerging trends
  • Identifying visited entities

In this world, content volume matters less than whether your site is a well-connected entity in a thematic graph.

More in “Entities Everywhere: The Knowledge Graph, the Invisible Architecture of the Google Empire” by 1492.vision.


Google AI Mode: 90 Projects and a Constellation of Agents

Recent discoveries revealed an internal Google debug menu (accessible only from Google’s corporate network or via VPN).
The May 28, 2025 version lists nearly 90 projects — up by 40 from earlier iterations.

A Constellation of Specialized Agents

Google is not building one universal assistant but many highly focused agents:

  • MedExplainer — health
  • Travel Agent, Flight Deals — travel
  • Neural Chef, Food Analyzer, Smart Recipe — cooking
  • News Digest, Daily Brief — news
  • Shopping AI Studio — commerce

Project Magi: The Backbone of AI Mode

More than 50 experiments belong to Project Magi:

  • MagiModelLayerDomain: core infrastructure
  • MagitV2p5Launch: aligned with Gemini 2.5
  • SuperglueMagiAlignment: tracks user interaction patterns

The most advanced is MagitCotRev15Launch, now in its 15th iteration.
It uses chain-of-thought reasoning:

Reflect → Research → Read → Synthesize → Polish
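To make the five-stage sequence concrete, here is a hypothetical Python sketch of a sequential reasoning pipeline in which each stage reads and extends a shared trace. The stage bodies are placeholders; nothing here reflects how Magi actually implements each step, only the order listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trace:
    query: str
    steps: List[str] = field(default_factory=list)


def make_stage(name: str, action: str) -> Callable[[Trace], Trace]:
    """Build a placeholder stage that only logs what it would do."""
    def stage(trace: Trace) -> Trace:
        trace.steps.append(f"{name}: {action}")
        return trace
    return stage


PIPELINE = [
    make_stage("reflect", "decide what the query is really asking"),
    make_stage("research", "issue sub-queries to gather sources"),
    make_stage("read", "extract relevant passages from the sources"),
    make_stage("synthesize", "combine passages into a draft answer"),
    make_stage("polish", "edit the draft for clarity and format"),
]


def run(query: str) -> Trace:
    """Pass the trace through every stage in order."""
    trace = Trace(query)
    for stage in PIPELINE:
        trace = stage(trace)
    return trace


for line in run("best time to ride the Tour de France route").steps:
    print(line)
```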


AIM (AI Mode) and the New UI

The AIM project focuses on user interfaces:

  • AimLhsOverlay: AI-powered sidebar
  • SbnAimEntrypoints: converts the “I’m Feeling Lucky” button into an AI Mode entry point
  • Google’s logo itself becomes interactive

Also emerging: StatefulJourney and ContextBridge — signaling a shift from isolated queries to full conversational sessions.



SEO Takeaways from AI Mode

  • Hyper-specialization: content must match the expertise of specific agents
  • Multimodality: text, images, video, structured data all feed into AI agents
  • Deep personalization: user context across sessions trumps single-query optimization

See “AI at the Heart of Google’s Strategy” by RESONEO for more.


Profiling Engine: Smile, You’re Being Embedded!

Our investigation revealed Google’s profiling system: every digital interaction becomes a vector — a mathematical embedding of your identity.

At its core: Nephesh — Google’s universal user-embedding system.

It generates behavior-based vectors across all Google products.

According to the 2024 leaks, Nephesh:

  • Scores whether a user is “typical” or “atypical”
  • Predicts engagement by matching user and content vectors
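How Nephesh computes these scores is not public. The sketch below uses two simple stand-ins: typicality as the cosine similarity between a user’s vector and the centroid of all users, and engagement as a sigmoid over the dot product of user and content vectors. The 0.5 threshold and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def is_atypical(user: Vector, population_centroid: Vector, threshold: float = 0.5) -> bool:
    """Flag users whose behavior vector points away from the average user."""
    return cosine(user, population_centroid) < threshold


def predicted_engagement(user: Vector, content: Vector) -> float:
    """Squash user/content affinity into a 0..1 engagement estimate."""
    affinity = sum(x * y for x, y in zip(user, content))
    return 1 / (1 + math.exp(-affinity))


centroid = [1.0, 0.2]
print(is_atypical([0.9, 0.3], centroid), is_atypical([-0.5, 1.0], centroid))
print(predicted_engagement([1.0, 0.0], [2.0, 0.0]), predicted_engagement([1.0, 0.0], [0.0, 2.0]))
```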

Picasso and VanGogh — Dual Embedding for Discover

  • Picasso: persistent interest memory, combining STAT (recent interests) and LTAT (long-term passions)
  • VanGogh: runs on-device, capturing real-time signals (scrolling, queries, device state)

Together, they balance immediate needs and persistent interests.
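One simple way to balance the two signals is a weighted blend of a long-term interest vector (the Picasso side) and a real-time session vector (the VanGogh side), followed by ranking candidate items against the result. The linear blend, the weight alpha, and the toy items are our assumptions; the source does not describe how the two embeddings are combined.

```python
from typing import Dict, List

Vector = List[float]


def blend(long_term: Vector, realtime: Vector, alpha: float) -> Vector:
    """alpha=0 uses only long-term interests; alpha=1 only the live session."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * r + (1 - alpha) * l for l, r in zip(long_term, realtime)]


def rank(items: Dict[str, Vector], profile: Vector) -> List[str]:
    """Order items by dot-product affinity with the blended profile."""
    def score(name: str) -> float:
        return sum(a * b for a, b in zip(items[name], profile))
    return sorted(items, key=score, reverse=True)


items = {"cycling_news": [1.0, 0.0], "recipes": [0.0, 1.0]}
long_term = [0.2, 0.9]  # a long-standing interest in cooking
realtime = [0.9, 0.1]   # the current session is all about cycling
for alpha in (0.0, 0.5, 1.0):
    print(alpha, rank(items, blend(long_term, realtime, alpha)))
```

Shifting alpha toward 1 lets a burst of in-session activity override long-held interests, which is one way to read “balancing immediate needs and persistent interests”.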


A Constellation of Embeddings

  • Vertical: podcasts, video, travel, shopping
  • Temporal: real-time, short-term, long-term
  • Contextual: adapting to environment and situation

Google’s HULK system pushes behavioral analysis further:

It detects whether you are:

  • IN_VEHICLE
  • ON_BICYCLE
  • ON_STAIRS
  • IN_ELEVATOR
  • SLEEPING

It tracks places like SEMANTIC_HOME and SEMANTIC_WORK to personalize results based on predicted destinations.

More in “Smile, You’re Being Embedded!” by 1492.vision.


Query Understanding: Expansion and Real-Time Scoring

We’ve uncovered how Google expands and evaluates queries in real time.

Example: cycling tour france
→ merges “cycling tour” into “cyclingtour” and expands to bike, bicycle, and trips

Special markers appear:

  • iv;p: in-verbatim matches
  • iv;d: linguistic derivations
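The cycling tour france example can be reproduced with a toy expander. The hand-written compound and synonym tables are hypothetical (Google’s are presumably learned), and tagging every non-verbatim variant as iv;d is our simplification of the two markers described above.

```python
from typing import Dict, List, Tuple

# Hypothetical expansion tables covering just this one example.
COMPOUNDS: Dict[Tuple[str, str], str] = {("cycling", "tour"): "cyclingtour"}
SYNONYMS: Dict[str, List[str]] = {"cycling": ["bike", "bicycle"], "tour": ["trips"]}


def expand(query: str) -> List[Tuple[str, str]]:
    """Return (term, marker) pairs: iv;p for verbatim terms, iv;d for variants."""
    terms = query.lower().split()
    expanded = [(t, "iv;p") for t in terms]
    # Merge adjacent terms that form a known compound.
    for left, right in zip(terms, terms[1:]):
        compound = COMPOUNDS.get((left, right))
        if compound:
            expanded.append((compound, "iv;d"))
    # Add synonyms for each original term.
    for t in terms:
        expanded.extend((s, "iv;d") for s in SYNONYMS.get(t, []))
    return expanded


print(expand("cycling tour france"))
```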

Geographic Intelligence

Query: nail salon fort lauderdale 17th street

  • Category: geo:ypcat:manicuring
  • Zone code: geo;88d850000000000
  • Address variation expansion
  • On-the-fly translation based on local intent

Confirmed: the architecture revealed in the 2024 leaks is still active:

GWS → Superroot → QUS → QBST


Real-Time Term Scoring

Each term receives 0–10 points per URL:

  • Stopwords ignored
  • Title terms get bonuses
  • Named entities often score highest

Scores are context-dependent — the same term may score differently depending on the query.

This aligns with the Salient Terms process, which relies on signals such as virtualTf, idf, and salience.

Although these scores are not used for final ranking (where NavBoost and freshness dominate), they shape how the query is interpreted in real time.
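Here is a rough sketch of per-URL term scoring that follows the rules listed earlier: stopwords are skipped, title terms and named entities get bonuses, and scores are clamped to 0–10. The tf-idf base, the bonus sizes, and the stopword list are assumptions; how virtualTf, idf, and salience are actually combined is not known.

```python
import math
from typing import Dict, List, Set

STOPWORDS = {"the", "a", "an", "in", "of", "on"}  # illustrative, not Google's list
TITLE_BONUS = 2.0   # assumed
ENTITY_BONUS = 3.0  # assumed


def term_scores(
    query_terms: List[str],
    doc_terms: List[str],
    title_terms: Set[str],
    entities: Set[str],
    doc_freq: Dict[str, int],
    num_docs: int,
) -> Dict[str, int]:
    """Score each non-stopword query term for one URL on a 0-10 scale."""
    scores: Dict[str, int] = {}
    for term in query_terms:
        if term in STOPWORDS:
            continue
        tf = doc_terms.count(term)
        # Smoothed idf: terms that are rarer across the corpus weigh more.
        idf = math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))
        raw = math.log1p(tf) * idf
        if term in title_terms:
            raw += TITLE_BONUS
        if term in entities:
            raw += ENTITY_BONUS
        scores[term] = max(0, min(10, round(raw)))
    return scores


scores = term_scores(
    ["the", "nail", "salon", "lauderdale"],
    ["nail", "salon", "in", "lauderdale", "nail", "care"],
    title_terms={"nail", "salon"},
    entities={"lauderdale"},
    doc_freq={"nail": 50, "salon": 80, "lauderdale": 5},
    num_docs=1000,
)
print(scores)
```

Because idf and the bonuses depend on the corpus statistics and the page being scored, the same term can land on a different score for a different query or URL, in line with the context dependence noted above.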

More in “Uncovering Google’s Query Expansion System” by RESONEO.


All information is derived from publicly available sources without bypassing access controls. It is published solely for informational purposes.

This article is based on material published by Search Engine Land.