The Massive 2024 Google Algorithm Leak

Towards the end of May 2024, the SEO world was hit by a leak of extraordinary magnitude: an anonymous source had obtained and published more than 2,500 pages of Google’s internal search algorithm documentation, covering a repository of 14,000+ potential SEO ranking factors.

It was quickly labeled “the biggest Google Search ranking factor and algorithm leak in history”.

Rand Fishkin, CEO of SparkToro and previously a cofounder of Moz, and Mike King, CEO of iPullRank, shared the original leak. They spent weeks verifying the leaked data and released top-level overviews with some rather startling indications of how Google’s search algorithm really works behind the scenes.

This leaked information could shift how digital marketers and publishers approach SEO. In this post, we’ll dive into the key insights and takeaways from the leak and explore how it may impact SEO strategies moving forward.

Leak Confirms Importance of User Signals

One of the most significant takeaways from the leak was how heavily Google’s algorithm relies on user signals, including click data and dwell time, to rank content.

There is also an algorithm component known as NavBoost, which draws heavily on data from Google Chrome, including:

  • Most Clicked Search Listings
  • Highest View Time Pages
  • Last visit and “dwell-time” of users on a website

NavBoost is said to be a “powerful ranking signal” for the search engine, based on evidence that surfaced in the Department of Justice antitrust case against Google.
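The documentation doesn’t spell out NavBoost’s actual formula, but the general idea of re-ranking results using aggregated click and dwell-time data can be illustrated with a small, purely hypothetical sketch (the field names, weights, and the two-minute dwell cap below are our assumptions, not Google’s):

```python
from dataclasses import dataclass

@dataclass
class UserSignals:
    """Hypothetical per-result user-interaction aggregates (illustrative only)."""
    clicks: int               # how often searchers clicked this result
    impressions: int          # how often it was shown
    avg_dwell_seconds: float  # average time spent before returning to the SERP

def navboost_like_score(base_relevance: float, signals: UserSignals) -> float:
    """Toy re-ranking score: blend a base relevance score with click-through
    rate and dwell time. Weights are invented for illustration."""
    ctr = signals.clicks / max(signals.impressions, 1)
    dwell_factor = min(signals.avg_dwell_seconds / 120.0, 1.0)  # cap at 2 minutes
    return base_relevance * (1.0 + 0.5 * ctr + 0.3 * dwell_factor)

# Example: a result with strong engagement overtakes a slightly more
# "relevant" but less engaging one.
print(navboost_like_score(0.80, UserSignals(clicks=900, impressions=1000, avg_dwell_seconds=95)))
print(navboost_like_score(0.85, UserSignals(clicks=50, impressions=1000, avg_dwell_seconds=12)))
```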

The Role of Google Quality Raters

The leak revealed a deep level of integration, via API, between Google’s search quality raters, who read and rate search results, and the search algorithm itself. These raters are contractors hired by Google to assess the quality of search results for specific queries.

While the precise impact of this quality rater data is unclear, the fact that it’s baked into the algorithm confirms how influential the subjective judgment of these raters can be in determining rankings.

Potential “Whitelists” for Sensitive Topics

For some categories, such as travel, COVID-19, and political topics, Google appears to keep “whitelists” of approved sites, per the leak. This means only certain pre-vetted sources might be eligible to rank when searches are made on these sensitive topics.

If whitelisted sources also feed the one-click answers positioned above the organic results, as we see for the majority of these sorts of queries, it would help explain why such a small subset of authority sources so consistently wins the organic portion of the game for this type of query.

Signs of a “Toxic Backlink” Penalty

One of the more eye-opening findings points to the existence of a “toxic backlink” classifier that can demote sites based on the quality of their inbound link profile. This validates long-held suspicions among SEOs that Google does indeed have a way to devalue or discount links from low-quality, spammy sources.
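We don’t know how that classifier actually works, but the basic concept of demoting a site once too much of its link profile looks spammy can be sketched with a toy heuristic (the quality flag and the thresholds are invented for illustration):

```python
def link_quality_penalty(inbound_links: list[dict]) -> float:
    """Toy 'toxic backlink' heuristic: the share of inbound links coming from
    low-quality sources determines a demotion multiplier. The source_quality
    field and the 0.25/0.5 thresholds are assumptions, not Google's values."""
    if not inbound_links:
        return 1.0
    spammy = sum(1 for link in inbound_links if link.get("source_quality", 1.0) < 0.2)
    spam_ratio = spammy / len(inbound_links)
    if spam_ratio > 0.5:
        return 0.6   # heavy demotion
    if spam_ratio > 0.25:
        return 0.85  # mild demotion
    return 1.0       # no penalty

links = [{"source_quality": 0.9}, {"source_quality": 0.1}, {"source_quality": 0.05}]
print(link_quality_penalty(links))  # 2 of 3 links look spammy -> 0.6
```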

Limits on Result Composition

The data suggests Google’s algorithm may actively limit the number of results from certain categories like blogs, small personal sites, commercial travel sites, and local businesses that can be displayed for any particular query.

This could make it increasingly difficult for non-authority sites in saturated verticals to crack the top results, even for niche topics they cover exceptionally well.
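Conceptually, this kind of composition limit behaves like a diversity cap applied while assembling the final results page. Here is a minimal sketch of that idea; the category labels and the per-category quota are assumptions rather than anything stated in the leak:

```python
from collections import Counter

def cap_results_by_category(ranked_results: list[dict], max_per_category: int = 3) -> list[dict]:
    """Toy diversity filter: walk the ranked list and drop results once a
    category (e.g. 'blog', 'local business') has hit its quota."""
    seen = Counter()
    final = []
    for result in ranked_results:
        category = result["category"]
        if seen[category] < max_per_category:
            final.append(result)
            seen[category] += 1
    return final

results = [{"url": f"blog-{i}", "category": "blog"} for i in range(5)] + \
          [{"url": "gov-1", "category": "institution"}]
print([r["url"] for r in cap_results_by_category(results, max_per_category=2)])
# ['blog-0', 'blog-1', 'gov-1'] -- later blog results are squeezed out
```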

Heavy Focus on Entities and Mentions

Multiple references are made to scoring systems that analyze domain and page-level “entity” association, as well as tracking overall mentions of entities across the web.

This implies that Google is placing a lot of emphasis on understanding the core conceptual topics a piece of content covers (via entities), who created it (via author entities), and how frequently and authoritatively a site and its authors are mentioned across other high-quality web sources.
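As a rough mental model, you can think of this as tallying entity mentions across the web and weighting them by the authority of the source doing the mentioning. The sketch below is purely illustrative; the authority field and weighting scheme are assumptions:

```python
from collections import defaultdict

def aggregate_entity_mentions(documents: list[dict]) -> dict:
    """Toy mention tally: count how often each entity (author, brand, topic)
    is referenced across a set of pages, weighting each mention by an assumed
    per-source authority score."""
    mention_weight = defaultdict(float)
    for doc in documents:
        for entity in doc["entities"]:
            mention_weight[entity] += doc.get("source_authority", 0.5)
    return dict(mention_weight)

docs = [
    {"entities": ["Jane Doe", "rock climbing"], "source_authority": 0.9},
    {"entities": ["Jane Doe"], "source_authority": 0.4},
]
print(aggregate_entity_mentions(docs))  # {'Jane Doe': 1.3, 'rock climbing': 0.9}
```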

Site-wide Scoring for Some Factors

Interestingly, the data reveals that some scoring relates to entire websites, not just individual pages. Factors like page titles, keyword relevance, and vectorization (how concepts are mapped) all seem to have components that look at the entire site.

This underscores the importance for publishers to have a cohesive, conceptually tight focus across all content on their sites – rather than operating as a disjointed collection of separate pages.
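One simple way to picture site-wide scoring is as an aggregate of page-level scores, so a handful of off-topic pages dilutes the whole site’s topical signal. The averaging approach below is an assumption used only to illustrate the effect:

```python
from statistics import mean

def site_level_topic_score(page_scores: list[float]) -> float:
    """Toy site-wide aggregation: the site's topical score is the average of
    its pages' topical-relevance scores, so off-topic pages drag the whole
    site down. Averaging is an illustrative choice, not Google's method."""
    return mean(page_scores) if page_scores else 0.0

focused_site = [0.9, 0.85, 0.88, 0.92]    # consistently on-topic
scattered_site = [0.9, 0.2, 0.88, 0.15]   # strong pages mixed with off-topic ones
print(site_level_topic_score(focused_site))    # ~0.89
print(site_level_topic_score(scattered_site))  # ~0.53
```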

Outlinking Alone Not a Positive Factor

Another nugget that surprised some SEOs – the leak implies that outlinking alone is not seen as a positive ranking signal by Google’s algorithm. 

Outlinking may only be scored as a potential indicator of spam or low quality if overdone.

Recency and Updating Content

According to the data, Google stores a version history of at least the last 20 revisions of web pages. Combined with other references indicating content freshness as an input, this reinforces the need for publishers to consistently update and refresh their content over time.
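The leak mentions a roughly 20-revision history, but how Google turns that history into a freshness signal is not disclosed. The sketch below keeps only the most recent 20 versions of a page and applies an invented one-year decay purely for illustration:

```python
from collections import deque
from datetime import datetime, timezone

class PageHistory:
    """Toy revision store: keep only the most recent 20 versions of a page,
    mirroring the '20 revisions' detail from the leak. The freshness formula
    itself is an assumption."""
    MAX_REVISIONS = 20

    def __init__(self):
        self.revisions = deque(maxlen=self.MAX_REVISIONS)

    def add_revision(self, content: str, timestamp: datetime) -> None:
        self.revisions.append((timestamp, content))

    def freshness_score(self, now: datetime) -> float:
        """Decay from 1.0 toward 0.0 as the latest revision ages (1-year scale)."""
        if not self.revisions:
            return 0.0
        last_update, _ = self.revisions[-1]
        age_days = (now - last_update).days
        return max(0.0, 1.0 - age_days / 365.0)

history = PageHistory()
history.add_revision("v1", datetime(2023, 6, 1, tzinfo=timezone.utc))
history.add_revision("v2", datetime(2024, 5, 1, tzinfo=timezone.utc))
print(history.freshness_score(datetime(2024, 6, 1, tzinfo=timezone.utc)))  # ~0.92
```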

Technical Factors Like Font Size and Video Count

On a more granular level, Google’s algorithm seems to track and score specific technical factors like:

  • Font size and text formatting within pages
  • The presence and proportion of video content
  • Exact-match domain naming conventions
  • Quality of site navigation and internal linking

While the precise impact of each of these technical elements is still unknown, it’s clear that Google leaves no stone unturned in its quest to comprehensively score and categorize websites.
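For a sense of what tracking these granular elements might involve, here is a toy feature extractor that pulls a few of the mentioned signals out of raw HTML. The regexes and the choice of features are simplifications for illustration, not a description of Google’s parsers:

```python
import re

def extract_technical_features(html: str) -> dict:
    """Toy on-page feature extraction: count video tags, capture font-size
    declarations, and tally internal links from a raw HTML string."""
    return {
        "video_count": len(re.findall(r"<video\b", html, flags=re.IGNORECASE)),
        "font_size_rules": re.findall(r"font-size:\s*([\d.]+)(px|em|rem)", html),
        "internal_links": len(re.findall(r'href="/[^"]*"', html)),
    }

sample = '<p style="font-size: 18px">Intro</p><video src="/clip.mp4"></video><a href="/guide">Guide</a>'
print(extract_technical_features(sample))
# {'video_count': 1, 'font_size_rules': [('18', 'px')], 'internal_links': 1}
```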

EEAT and Original Content Generation

The phrase “E-E-A-T” itself, which stands for experience, expertise, authoritativeness, and trust, is a term from Google’s quality rater guidelines rather than the algorithm, and it is not mentioned explicitly in the leaked documentation, contrary to some assumptions.

That said, the leak gives us some context: the documents refer to scoring and classification systems that try to determine whether content is synthetic or human-written.

So while the “EEAT” label itself may not appear in the code, the idea that Google continues to reward great, unique content from real, live human authors and publishers still holds.

No Quick “Hacks” or “Get Rich Quick” Tactics

But the leak, while revealing, is no “cheat code” that will immediately enable any website to rank. As Rand Fishkin and other SEOs have preached time and time again:

Google stopped rewarding scrappy, clever, SEO-savvy operators who have all the right tricks. They’re rewarding brand strength, search-measurable forms of popularity, and searchers seeing a domain they already know and trust.

These leaks remind us that there are no real shortcuts in SEO these days: lasting success comes from building a broad, deep web presence of high-quality content that earns engagement from users in many ways over time.

So What Should SEOs Do?

Given the sheer scope and complexity of the leaked information, it’s clear that there are no simple answers or cut-and-dried takeaways. Still, here are some of the main points of emphasis and changes in strategy that digital marketers might want to think about as a result of the leaks:

User Experience Above All
The primacy of metrics like dwell time and click data reinforces that optimizing for an exceptional user experience has to be the top priority. Creating easily accessible, highly engaging content that answers queries comprehensively and keeps users on-site should be goal #1.

Cultivate an Authoritative Online Presence and Brand
With mentions, entities, and brand association being tracked, publishers need to invest in building a comprehensive, authoritative digital footprint with robust profiles, citations, quality backlinks, and content that firmly cements their brand as a pre-eminent voice in their core subject areas.

Audit and Improve Technical Website Health
While user experience and authority are paramount, the granular level of technical factors scored by Google means that basic website housekeeping like optimizing navigation, internal linking, tagging, use of video/images, formatting consistency, and other elements can no longer be overlooked.

Consider a Unified, Topically Cohesive Content Strategy
The apparent importance of site-wide conceptual relevance may necessitate a pivot toward more unified, taxonomically organized content clusters – rather than a disjointed collection of individual, topically siloed pieces.

Re-evaluate Link Building Approaches
With links from low-quality or low-traffic sources potentially being discounted, a more judicious, tightly targeted approach to link building may be advisable. Prioritize mentions and links from high-authority, thematically relevant sites over indiscriminately chasing volume alone.

Get Authoritative Human Authors Involved
Since human authorship carries importance, look to get credentialed, authoritative writers and subject matter experts as integrated into your content creation process as possible. Having recognized entities attached to your content may pay dividends.

Don’t Neglect Video Content
While not as mission-critical as textual content, the apparent scoring and categorization of pages/sites based on video content usage suggest that publishers should ensure multimedia integration isn’t an afterthought. A steady pipeline of premium video may be a differentiator.

Implement Structured, Ongoing Content Maintenance
With content freshness being a clear input, having defined processes for routinely auditing, updating, expanding, refreshing, and re-optimizing your site’s content assets should be non-negotiable. Letting pages grow stale is a potential red flag. A minimal sketch of such an audit follows below.
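One simple way to operationalize this is a periodic audit that flags pages whose last update has passed a chosen staleness threshold. The sketch below assumes a 180-day threshold, which is an arbitrary example rather than anything Google has specified:

```python
from datetime import date

def pages_needing_refresh(pages: list[dict], max_age_days: int = 180) -> list[str]:
    """Toy content-maintenance audit: flag URLs whose last update is older
    than the chosen threshold. 180 days is an illustrative default; pick a
    cadence that fits your own publishing schedule."""
    today = date.today()
    return [
        page["url"]
        for page in pages
        if (today - page["last_updated"]).days > max_age_days
    ]

inventory = [
    {"url": "/guides/seo-basics", "last_updated": date(2023, 1, 10)},
    {"url": "/news/latest-update", "last_updated": date.today()},
]
print(pages_needing_refresh(inventory))  # ['/guides/seo-basics']
```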

Conclusion

The 2024 Google Search algorithm leak shifts how digital marketers and publishers should think about SEO and organic search strategy, at least until Google makes further changes.

Although we still lack the full context of how individual inputs are weighted and acted upon, the leak has exposed several features and machine-learned models that Google employs to power its search rankings.

The data clearly indicates the profound complexity of Google’s algorithm and the multitude of variables it covers: essentially anything about a website that may directly (or indirectly) influence how relevant it is for any given search query.

From user signals like click data and dwell time, to assessments of webpage media and technical elements, entity and brand analysis, human authorship detection, content freshness, and much more – each component represents an individual dimension that Google’s machine learning systems are constantly analyzing and adjusting to produce the best possible ranked results.

The key takeaways are clear:

  1. User experience is paramount – Google rewards engaging content that comprehensively meets users’ needs.
  2. Brand authority and establishing trust are critical – Strong brands with demonstrable expertise earn visibility.
  3. Technical optimization matters – Basic website housekeeping and structured data can differentiate results.
  4. Original, human-authored content is prized – AI-generated and low-quality derivative content will continue losing ground.
  5. Content maintenance is non-negotiable – Stale, neglected pages will inevitably decay in the rankings over time.

The reality is that SEO success comes from commitment to a long-term strategy: methodically building and expanding the strongest possible web presence, one that does everything it can to satisfy Google’s view of quality, expertise, authority, trust, relevance, and user experience.

As much as the leaked data gives almost unprecedented visibility into the machinations of the algorithm, it is of course not a golden key to a set of growth hacks. Proper optimization will always involve getting down and dirty, both in creating a great product and in promoting it through earned attention and authoritative, audience-attracting channels.

The existing bar was already hard enough to reach, and as Google makes more machine learning breakthroughs and trains its algorithm to take into account more and more factors, that bar will just keep getting higher. Survival of the fittest means only those who can constantly change, learn, adapt, and optimize will manage to stay ahead of the curve.
