Crawl Budget Boosting
Will Google index and rank a new page you upload to your website? Not always! Statistics show that Google misses around half of the pages on very large websites. Google must crawl a page before it can appear in search results and bring traffic to your site. “Crawling is the entrance point for sites into Google’s search results,” Google says.
Because Google does not have unlimited time or resources to crawl every page on the internet all the time, not every page will get crawled. SEOs refer to this limit as the crawl budget, and optimising it can be crucial to the success of your company website.
What is the crawl budget?
The crawl budget is the maximum number of pages on a website that a search engine can and wants to crawl. It is determined by two factors: the crawl rate limit and crawl demand.
Crawl rate limit: Your crawl rate limit is affected by the speed of your pages, crawl errors, and the crawl limit set in Google Search Console (website owners have the option of decreasing Googlebot’s crawl of their site).
Crawl demand: your crawl demand is influenced by the popularity of your pages as well as how fresh or stale they are.
Should I worry about the crawl budget?
If you’re optimizing smaller websites, you may not have to worry about the crawl budget. Google says that crawl budget is not something most publishers have to worry about: most of the time, a site with fewer than a few thousand URLs will be crawled efficiently.
If you operate on a large website, especially one that generates pages based on URL parameters, you might want to prioritize actions that help Google figure out what to crawl and when.
How do I determine my crawl budget?
Whether you operate a site with 1,000 or one million URLs, don’t just take Google’s word for it: check for yourself whether you have a crawl budget issue.
Comparing the total number of pages in your site architecture with the number of pages Googlebot crawls is the easiest way to verify your crawl budget and see whether Google is missing any of your pages. This requires both a site crawler and a log file analyzer.
Use log analysis with URL segmentation
You can check how many URLs Google crawls on your site each month by looking at your log files. This is your Google crawl budget. To figure out how your crawl budget is being spent, combine your log files with a comprehensive site crawl. Separate the data by pagetype to see which parts of your site search engines are crawling, and how often.
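As a rough sketch of what a log file analyzer does under the hood, the snippet below counts Googlebot hits per pagetype from a server access log in the common “combined” format. The sample log lines, segment prefixes, and pagetype names are all illustrative; in practice you would read your real access log and define segments that match your own site architecture.

```python
import re
from collections import Counter

# Hypothetical sample lines in combined log format; in practice you would
# read your real server access log instead.
SAMPLE_LOG = """\
66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /clothing/men/jeans HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [10/Oct/2023:13:55:40 +0000] "GET /blog/fit-guide HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.7 - - [10/Oct/2023:13:56:01 +0000] "GET /clothing/men/jeans HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
"""

# Map path prefixes to pagetypes; these segments are illustrative.
SEGMENTS = {"/clothing/": "product-category", "/blog/": "editorial"}

LINE_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" \d+ \d+ "[^"]*" "(?P<ua>[^"]*)"')

def googlebot_hits_by_segment(log_text):
    """Count crawl hits per pagetype, keeping only Googlebot requests."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue  # skip unparseable lines and non-Googlebot traffic
        path = m.group("path")
        for prefix, segment in SEGMENTS.items():
            if path.startswith(prefix):
                counts[segment] += 1
                break
        else:
            counts["other"] += 1  # crawl activity outside the known segments
    return counts

print(googlebot_hits_by_segment(SAMPLE_LOG))
```

Note that matching on the user-agent string alone can be spoofed; a production analyzer would also verify Googlebot via reverse DNS.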
How are your site’s most important pages crawled?
Use the Crawls Venn Diagram
The Crawls Venn Diagram is one of the finest ways to see, at a high level, the ratio of pages Googlebot is crawling vs. not crawling.
The two circles represent pages in your site architecture and pages crawled by Google. Pages crawled by Google that sit outside your site architecture (crawled by Google only) are known as “orphan pages”.
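The same three regions of the Venn diagram can be computed directly with set operations, given the URL list from a site crawler and the URL list from your logs. The URLs below are purely illustrative:

```python
# Illustrative URL sets. In practice, the first comes from a site crawler
# and the second from your server log files (Googlebot hits).
site_architecture = {"/", "/clothing/men/jeans", "/blog/fit-guide", "/about"}
crawled_by_google = {"/", "/clothing/men/jeans", "/old-promo"}

# The three regions of the Crawls Venn Diagram:
crawled_and_in_architecture = site_architecture & crawled_by_google
orphan_pages = crawled_by_google - site_architecture   # crawled by Google only
not_crawled = site_architecture - crawled_by_google    # crawl budget left on the table

print(sorted(orphan_pages))   # ['/old-promo']
print(sorted(not_crawled))    # ['/about', '/blog/fit-guide']
```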
When it comes to your crawl budget, pages crawled only by Google represent potential room for improvement. If those pages aren’t linked anywhere on your site but Google still finds and crawls them, you may be wasting some of your crawl budget.
Each site’s crawl ratio varies significantly. Only 40% of unoptimized sites’ strategic URLs are indexed by Google each month on average. That’s 60% of a site’s pages that aren’t crawled regularly and hence aren’t indexed or served to searchers. This makes a compelling case for tracking and adjusting your crawl budget.
How can I optimize my crawl budget?
Optimising your crawl budget can mean increasing it (getting Google to spend more time on your site), getting Google to spend the time it already gives your site more wisely, or both. This can include things like:
- Prevent your non-canonical URLs from being crawled by Google
Canonical tags inform Google about the preferred, primary version of a page.
For example, suppose you have a product category page for “men’s jeans” at /clothing/men/jeans, and visitors can filter by price from low to high (i.e. faceted navigation).
This could result in the URL being changed to /clothing/men/jeans?sortBy=PriceLow. Because changing the order of the jeans on the page did not affect the content, you wouldn’t want both /clothing/men/jeans?sortBy=PriceLow and /clothing/men/jeans to be indexed.
On /clothing/men/jeans?sortBy=PriceLow, you’d probably include a canonical tag to indicate that /clothing/men/jeans is the primary version of that page and the other version is a duplicate. The same is true for session identifiers attached to URL parameters.
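Concretely, the canonical tag is a `<link>` element in the page’s `<head>`. On the filtered URL from the example, it might look like this (the domain is illustrative):

```html
<!-- In the <head> of /clothing/men/jeans?sortBy=PriceLow -->
<link rel="canonical" href="https://www.example.com/clothing/men/jeans" />
```

Using an absolute URL here, rather than a relative path, helps search engines resolve the canonical unambiguously.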
With tools that surface non-indexable pages, you can easily discover when Google is spending time crawling non-canonical URLs. Consider this e-commerce site, which has a large number of crawlable non-canonical URLs: non-canonical URLs accounted for 97 per cent of the one million pages Google crawled.
This is bad because the site could have had a near-100 per cent crawl ratio, increasing the chances of more pages generating visits. Had Google not spent crawls on this large number of non-canonical URLs, the site’s valuable pages would have been crawled more frequently, and we know that more frequently crawled pages generate more visits. Even though Google identified this kind of crawl budget waste years ago, it continues to be a serious SEO issue.
What is the solution? Tell search engines what not to crawl with your robots.txt file.
Spending server resources on these types of pages diverts crawl activity away from pages with actual value, potentially preventing or delaying Google’s discovery of your excellent content. You can tell search engine bots what to crawl and what to ignore by using the robots.txt file on your site.
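For the faceted-navigation example above, a robots.txt sketch might look like the following. The parameter names are illustrative, and note that the `*` wildcard in paths is supported by Googlebot (and standardised in RFC 9309) but not by every crawler:

```
User-agent: *
# Keep bots out of sorted/session duplicates (parameter names are illustrative)
Disallow: /*?sortBy=
Disallow: /*?sessionid=
```

Be careful: blocking a URL in robots.txt prevents Google from crawling it, which also means Google can no longer see a canonical tag on that page, so pick one mechanism deliberately per URL pattern.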
- Minimizing crawl errors & non-200 status codes
If Googlebot encounters a lot of issues while crawling your site, such as 500 server errors, your crawl rate limit and, as a result, your crawl budget may be reduced. If you’re getting a lot of 5xx errors, you might consider upgrading your server’s capabilities.
Non-200 status codes, on the other hand, can be a waste of time. Why waste Google’s time crawling pages you’ve deleted and/or redirected when you could focus its efforts solely on your current, active URLs?
What is the solution? Make sure your internal linking is in order and that your XML sitemap is up to date.
In addition to preventing search engine bots from crawling problematic URLs, it’s a good idea to avoid linking to pages with non-200 status codes.
Make sure you’re linking to the live, preferred version of your URLs throughout your content to avoid squandering your crawl budget. As a general rule, if a URL isn’t the final destination for your content, avoid linking to it. It’s vital to link only to live, preferred URLs and to make sure you’re not missing any important pages that you want search engines to crawl and index.
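Auditing this is straightforward once you have a crawl export that pairs each internal link with the status code of its target. The sketch below flags links to fix; the URLs and status codes are hypothetical, standing in for data a site crawler would give you:

```python
# Hypothetical crawl export: each internally linked URL and the status
# code its target returned. In practice this comes from a site crawler.
internal_links = {
    "/clothing/men/jeans": 200,
    "/old-sale": 301,           # redirected: link to the destination instead
    "/discontinued-item": 404,  # deleted: remove or update the link
    "/blog/fit-guide": 200,
}

def links_to_fix(links):
    """Return the internal links whose targets do not answer 200 OK."""
    return {url: status for url, status in links.items() if status != 200}

print(links_to_fix(internal_links))
```

Running a check like this after each major release keeps internal links pointed at live, final URLs rather than redirect hops or dead pages.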
Improving crawl can improve your revenue
Applying these optimizations to a site with millions of pages can open up a world of possibilities for your crawl budget, as well as your site’s traffic and revenue.
This is due to the SEO funnel, which holds that improvements in the crawl phase have downstream effects on the ranking, traffic, and revenue stages, which will make your stakeholders very happy.
The crawl budget is more than just a technical consideration. It’s all about the money. Bring the bots, and the visitors, to the good stuff!