A Deep Dive into the Hidden Systems Behind Google Search
Uncovering live experiments, entity-based infrastructure, AI agents, and more.
Over the past few months, we’ve conducted a wide-ranging investigation into the inner workings of Google.
It led to major discoveries — some of which we reveal here.
While we can’t disclose everything, the information below provides a clearer understanding of how Google generates and ranks its results.
What ~1,200 Experiments Reveal About Google’s Inner Workings
We obtained a list of nearly 1,200 Google experiments, more than 800 of which were active as of June 2025.
This dataset confirms that many components referenced in the 2024 leaks — Mustang, Twiddlers, QRewrite, Tangram, QUS, and others — remain central to the system.
Intriguing new codenames also surfaced: Harmony, Thor, Whisper, Moonstone, Solar, and more.
Particularly noteworthy are DeepNow (a successor to Google Now) with NowBoost, and SuperGlue, which may replace Glue, NavBoost’s equivalent for universal search.
Figure: word cloud of the ~1,200 experiment names.
Unlike most websites, which receive a redesign every 3–5 years, Google evolves continuously.
There is no big “new release” — only a stream of microchanges moving from experiment to launch to full integration.
This explains the layered nature of the experiment list: older tests sit next to new ones, some already on their 15th iteration (e.g., MagiCotRev15Launch).
This incremental approach reduces risk — failed tests affect only a small portion of users — and enables innovation at a pace traditional redesigns cannot match.
Covered Areas Include:
- AI: numerous Magi and AIM (AI Mode) variants
- Shopping: over 50 dedicated experiments
- Verticals: sports, finance, weather, travel, and more
Each vertical is assigned its own domain, such as:
- ShoppingOverlappingDomain
- TravelOverlappingDomain
- SportsOverlappingDomain
This points to an architecture where each team operates in its own testing environment, enabling parallel experimentation without conflicts.
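To make the idea of conflict-free parallel testing concrete, here is a minimal sketch of per-domain traffic assignment. Only the domain names come from the experiment list. The hashing scheme, the bucket count, and the function names are assumptions chosen for illustration, not Google's actual mechanism.

```python
import hashlib

# Domain names come from the experiment list. Everything else here
# (bucket count, hashing scheme, function names) is a hypothetical illustration.
DOMAINS = [
    "ShoppingOverlappingDomain",
    "TravelOverlappingDomain",
    "SportsOverlappingDomain",
]
NUM_BUCKETS = 1000


def bucket_for(user_id: str, domain: str) -> int:
    """Map a user to a stable bucket that is salted separately for each domain."""
    digest = hashlib.sha256(f"{domain}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def in_experiment(user_id: str, domain: str, exposure: float) -> bool:
    """Return True if the user falls in the exposed share (0.01 = 1%) of a domain."""
    return bucket_for(user_id, domain) < int(exposure * NUM_BUCKETS)


if __name__ == "__main__":
    for domain in DOMAINS:
        print(domain, bucket_for("user-42", domain), in_experiment("user-42", domain, 0.01))
```

Because the domain name is part of the hash input, a user's bucket in one domain says nothing about their bucket in another, so a shopping test and a travel test can run on the same user without coordinating.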
Entities Everywhere
Entities play a central role throughout Google’s ecosystem, a topic explored in depth in the “Entities Everywhere” talk given by Damien Andell and Sylvain Deauré of 1492.vision earlier this year in Marseille.
The Knowledge Graph — Google’s Central Nervous System
Our research indicates that the Knowledge Graph is far more than a sidebar widget.
It functions as the nervous system of Google: powering Search, Discover, YouTube, Maps, Assistant, Gemini, and AI Overviews.
At the core lies Livegraph, which assigns confidence weights to data triples before integrating them into the graph.
Namespace Hierarchy:
- kc: Highly verified data (e.g., official records)
- ss: Web-extracted “webfacts”
- hw: Manually curated information
These labels aren’t cosmetic — they directly affect the confidence of facts and their use across Google services.
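As a rough illustration of how a namespace label could feed into a fact's confidence, consider the sketch below. The kc, ss, and hw labels come from the findings above. The numeric weights, the threshold, and the Triple structure are invented for the example and should not be read as Livegraph's real values or schema.

```python
from dataclasses import dataclass

# The namespace labels (kc, ss, hw) come from the article. The numeric weights
# and the acceptance threshold are invented for illustration only.
NAMESPACE_WEIGHT = {"kc": 0.95, "hw": 0.85, "ss": 0.50}


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    namespace: str
    extraction_confidence: float  # 0.0 to 1.0, hypothetical per-fact score


def weighted_confidence(triple: Triple) -> float:
    """Scale a fact's own confidence by the weight of its namespace."""
    return NAMESPACE_WEIGHT.get(triple.namespace, 0.0) * triple.extraction_confidence


def accept(triples: list[Triple], threshold: float = 0.6) -> list[Triple]:
    """Keep only the triples confident enough to enter the graph."""
    return [t for t in triples if weighted_confidence(t) >= threshold]
```

With these made-up weights, the same extraction confidence of 0.9 passes the threshold for a kc fact (0.855) but not for an ss webfact (0.45), which is the kind of effect the labels are described as having.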
Ghost Entities and Real-Time Adaptation
One of the most fascinating discoveries: ghost entities — temporary data structures that lack stable IDs.
They allow Google to:
- Dynamically generate new entities
- Validate them progressively
- Surface them in real-time results
Supporting this are SAFT and WebRef — systems (revealed in 2024 leaks) that extract, classify, and link entities to form a semantic map of the web.
SEO and Entity Validation
Key takeaway for SEOs: your brand must exist as a validated entity in Google’s expanding Knowledge Graph.
The 2024 leaks revealed how Google represents entire sites as vectors, calculating thematic metrics like siteFocusScore and NSR that penalize scattered content.
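The exact formula behind siteFocusScore is not public. One plausible way to think about a focus metric, assuming each page already has an embedding, is the average similarity between every page and the site's centroid. The function names and the toy vectors below are illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of the page vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def focus_score(page_vectors: list[list[float]]) -> float:
    """Mean cosine similarity between each page and the site centroid.

    This is a stand-in for a focus metric, not Google's siteFocusScore formula.
    """
    center = centroid(page_vectors)
    return sum(cosine(v, center) for v in page_vectors) / len(page_vectors)


# Toy page embeddings: one tightly themed site, one scattered site.
focused = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
scattered = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(focus_score(focused) > focus_score(scattered))
```

Under this stand-in metric, pages that point in the same thematic direction push the score toward 1, while pages covering unrelated topics pull it down.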
Chrome browsing data also plays a role:
- Updating trust signals
- Tracking emerging trends
- Identifying visited entities
In this world, content volume matters less than whether your site is a well-connected entity in a thematic graph.
More in “Entities Everywhere: The Knowledge Graph, the Invisible Architecture of the Google Empire” by 1492.vision.
Google AI Mode: 90 Projects and a Constellation of Agents
Recent discoveries revealed an internal Google debug menu (visible only via corporate network or VPN).
The May 28, 2025 version lists nearly 90 projects — up by 40 from earlier iterations.
A Constellation of Specialized Agents
Google is not building one universal assistant but many highly focused agents:
- MedExplainer — health
- Travel Agent, Flight Deals — travel
- Neural Chef, Food Analyzer, Smart Recipe — cooking
- News Digest, Daily Brief — news
- Shopping AI Studio — commerce
Project Magi: The Backbone of AI Mode
More than 50 experiments belong to Project Magi:
- MagiModelLayerDomain: core infrastructure
- MagitV2p5Launch: aligned with Gemini 2.5
- SuperglueMagiAlignment: tracks user interaction patterns
The most advanced is MagitCotRev15Launch, the 15th iteration.
It uses Chain-of-Thought reasoning:
Reflect → Research → Read → Synthesize → Polish
AIM (AI Mode) and the New UI
The AIM project focuses on user interfaces:
- AimLhsOverlay: AI-powered sidebar
- SbnAimEntrypoints: converts the “I’m Feeling Lucky” button into an AI entry point
- Google’s logo itself becomes interactive
Also emerging are StatefulJourney and ContextBridge, signaling a shift from isolated queries to full conversational sessions.
SEO Takeaways from AI Mode
- Hyper-specialization: content must match the expertise of specific agents
- Multimodality: text, images, video, structured data all feed into AI agents
- Deep personalization: user context across sessions trumps single-query optimization
See “AI at the Heart of Google’s Strategy” by RESONEO for more.
Profiling Engine: Smile, You’re Being Embedded!
Our investigation revealed Google’s profiling system: every digital interaction becomes a vector — a mathematical embedding of your identity.
At its core: Nephesh — Google’s universal user-embedding system.
It generates behavior-based vectors across all Google products.
According to the 2024 leaks, it:
- Scores whether a user is “typical” or “atypical”
- Predicts engagement by matching user and content vectors
Picasso and VanGogh — Dual Embedding for Discover
- Picasso: long-term memory (STAT for recent interest, LTAT for long-term passions)
- VanGogh: runs on-device, capturing real-time signals (scrolling, queries, device state)
Together, they balance immediate needs and persistent interests.
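How the two embeddings are actually combined is not documented. A simple way to picture the balance, assuming both signals live in the same vector space, is a weighted blend followed by a dot-product match against content vectors. The weights, dimensions, and item names below are made up for the example.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else v


def blend(short_term: list[float], long_term: list[float], recency_weight: float) -> list[float]:
    """Weighted mix of two interest vectors, re-normalized to unit length."""
    mixed = [recency_weight * st + (1.0 - recency_weight) * lt
             for st, lt in zip(short_term, long_term)]
    return normalize(mixed)


def rank(user: list[float], items: dict[str, list[float]]) -> list[str]:
    """Order item ids by dot product with the user vector, best first."""
    def score(vec: list[float]) -> float:
        return sum(u * x for u, x in zip(user, normalize(vec)))
    return sorted(items, key=lambda item_id: score(items[item_id]), reverse=True)


# Toy 2-D space: axis 0 = "cycling", axis 1 = "cooking" (made-up dimensions).
long_term = [1.0, 0.0]   # persistent passion: cycling
short_term = [0.0, 1.0]  # recent burst of interest: cooking
items = {"bike-review": [1.0, 0.1], "pasta-recipe": [0.1, 1.0]}

print(rank(blend(short_term, long_term, 0.2), items))  # long-term interest dominates
print(rank(blend(short_term, long_term, 0.8), items))  # recent interest dominates
```

Shifting the single recency weight is enough to flip which item surfaces first, which is the trade-off between immediate needs and persistent interests in miniature.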
A Constellation of Embeddings
- Vertical: podcasts, video, travel, shopping
- Temporal: real-time, short-term, long-term
- Contextual: adapting to environment and situation
Google’s HULK system pushes behavioral analysis further. It detects whether you are:
- IN_VEHICLE
- ON_BICYCLE
- ON_STAIRS
- IN_ELEVATOR
- SLEEPING
It tracks places like SEMANTIC_HOME and SEMANTIC_WORK to personalize results based on predicted destinations.
More in “Smile, You’re Being Embedded!” by 1492.vision.
Query Understanding: Expansion and Real-Time Scoring
We’ve uncovered how Google expands and evaluates queries in real time.
Example: cycling tour france
- Combines cycling tour into cyclingtour
- Expands to bike, bicycle, trips
Special markers appear:
- iv;p: in-verbatim matches
- iv;d: linguistic derivations
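The expansion described above can be sketched in a few lines. The markers iv;p and iv;d are taken from the observed output, but the mapping used here (typed terms tagged iv;p, generated variants tagged iv;d) is our reading of them, and the synonym and compound tables are hypothetical stand-ins for whatever sources Google really uses.

```python
# Hypothetical lookup tables; the real expansion sources are not public.
SYNONYMS = {"cycling": ["bike", "bicycle"], "tour": ["trips"]}
COMPOUNDS = {("cycling", "tour")}


def expand(query: str) -> list[tuple[str, str]]:
    """Return (term, marker) pairs: typed terms first, then derived variants."""
    terms = query.lower().split()
    # Terms exactly as typed are tagged as verbatim matches.
    expanded = [(t, "iv;p") for t in terms]
    # Join known adjacent pairs into a compound, e.g. "cycling tour" -> "cyclingtour".
    for left, right in zip(terms, terms[1:]):
        if (left, right) in COMPOUNDS:
            expanded.append((left + right, "iv;d"))
    # Add synonym variants for each typed term.
    for t in terms:
        for variant in SYNONYMS.get(t, []):
            expanded.append((variant, "iv;d"))
    return expanded


for term, marker in expand("cycling tour france"):
    print(f"{marker}\t{term}")
```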
Geographic Intelligence
Query: nail salon fort lauderdale 17th street
- Category: geo:ypcat:manicuring
- Zone code: geo;88d850000000000
- Address variation expansion
- On-the-fly translation based on local intent
This confirms that the architecture exposed in the 2024 leaks is still active:
GWS → Superroot → QUS → QBST
Real-Time Term Scoring
Each term receives 0–10 points per URL:
- Stopwords ignored
- Title terms get bonuses
- Named entities often score highest
Scores are context-dependent — the same term may score differently depending on the query.
This aligns with the Salient Terms Process using virtualTf, idf, salience, and more.
While not used for final ranking (NavBoost and freshness dominate), these scores shape real-time interpretation.
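To show how the scoring rules above could fit together, here is a toy scorer for a single URL. It is not the Salient Terms formula: the tf and idf computation is the textbook version, and the scaling constant, the title bonus, and the stopword list are arbitrary values picked so the output lands on a 0 to 10 scale.

```python
import math

STOPWORDS = {"the", "a", "of", "in", "for"}  # tiny illustrative list


def term_scores(query: str, title: str, body: str,
                doc_freq: dict[str, int], num_docs: int) -> dict[str, float]:
    """Score each query term for one URL on a 0-10 scale (hypothetical formula)."""
    body_terms = body.lower().split()
    title_terms = set(title.lower().split())
    scores: dict[str, float] = {}
    for term in query.lower().split():
        if term in STOPWORDS:
            continue  # stopwords are ignored entirely
        tf = body_terms.count(term) / max(len(body_terms), 1)
        idf = max(0.0, math.log((1 + num_docs) / (1 + doc_freq.get(term, 0))))
        raw = tf * idf * 5.0  # arbitrary scaling constant
        if term in title_terms:
            raw += 3.0  # arbitrary title bonus
        scores[term] = round(min(raw, 10.0), 2)
    return scores


body = "best nail salon for manicure in town nail art"
print(term_scores("the nail salon", "Nail Salon Fort Lauderdale", body,
                  {"nail": 10, "salon": 50}, 1000))
```

A scorer like this depends only on the term and the page. The context dependence noted above, where the same term scores differently from one query to the next, would need an extra query-level signal that this sketch leaves out.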
More in “Uncovering Google’s Query Expansion System” by RESONEO.
All information is derived from publicly available sources without bypassing access controls. It is published solely for informational purposes.
This content was prepared based on content from Search Engine Land.