When was the last time you looked at your XML sitemap? What?!? You don’t REMEMBER?!? Omg, GO LOOK AT IT NOW!!
Your XML sitemap is arguably one of the most important things to optimize for SEO. It tells search engines which URLs on your site matter, how often they change, and which ones you want crawled and indexed.
Unfortunately, it is also one of the most common places where websites quietly sabotage their own SEO by pushing too much content, or the wrong content, to Google.
We recently did an audit of a large e-commerce site and found tens of thousands of URLs with caching parameters in the XML sitemap! Since caches expire, they were nearly all broken links. That poor site was having its precious crawl budget wasted on tens of thousands of URLs that didn’t even load correctly, instead of having Googlebot find its new product and category URLs. It took them about three weeks to clean it up, but the impact on their crawl budget was massive and nearly immediate! Googlebot went from hitting several THOUSAND 404 errors per day on their site, to a negligible number, nearly overnight!
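If you want to check your own sitemap for this kind of problem, a few lines of Python will do it. This is only a minimal sketch, not the tooling we used in that audit: it assumes you have already downloaded the sitemap XML as a string, and the function name `urls_with_query_params` and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_with_query_params(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL in a sitemap that carries a query string."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if urlsplit(url).query:
            flagged.append(url)
    return flagged


# Hypothetical sample: one clean product URL, one URL with a cache parameter.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/blue-widget/</loc></url>
  <url><loc>https://example.com/product/blue-widget/?cache=8f3a1c</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in urls_with_query_params(SAMPLE):
        print(url)
```

Not every URL with a query string is a mistake, so treat the output as a list to review rather than a list to delete.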
Don’t Feed Google EVERYTHING in Your XML Sitemaps
I talked about this in my interview with Barry Schwartz of Search Engine Roundtable.
One of the core points I made in the interview was simple: do not feed Google everything. Your XML sitemap should not be a dumping ground for every URL your CMS can generate, yet all too often, that’s what happens.
It SHOULD BE a curated list of the pages you actually want Google to crawl, index, and rank. I talked with Barry about how sitemaps are not a set-it-and-forget-it task. They need to be managed over time as your content, features, and post types change, and they need to be audited on a regular basis.
I also pointed out that CMS platforms like WordPress will happily automate sitemap generation for you, but that automation often results in a large number of URLs being submitted to Google that you probably do not want submitted at all. The lesson was consistent throughout the conversation: make sure the content you push is valuable, make sure the content is actually there, and prune everything else.
Since WordPress is the most popular CMS we encounter, let’s talk about optimizing XML sitemaps in it. For a long time, a plugin was required in order to add an XML sitemap to WordPress, but as of version 5.5, it FINALLY has a built-in XML sitemap. However, I still recommend using Yoast instead, and here’s why.
What’s the Difference Between the Core XML Sitemap in WordPress and the Yoast XML Sitemap?
The WordPress core XML sitemap is very basic, especially when compared to Yoast XML sitemaps. Yoast XML sitemaps give you the option to choose what goes into your sitemap, including the ability to exclude any pages or posts you noindex. Yoast sitemaps also include added properties to help search engines and crawlers easily identify new content, such as the last modified date, and they include images in the sitemaps. Yoast SEO will also break up larger sitemaps into several smaller ones to prevent your website from slowing down.
WordPress XML sitemaps, on the other hand, perform only the basic functions of a sitemap: they help search engines and crawlers discover your content, including updated content. The core sitemap supports only a small set of content types, and there is no setting in the WordPress admin that lets you control what goes into it or what stays out of it. Changing that takes custom code or a plugin. That is a significant limitation, and it is exactly the kind of thing that leads to the bloated, noisy sitemaps I warned about in my interview with Barry.
How to Generate an XML Sitemap in WordPress
WordPress 5.5 and later will automatically generate WordPress sitemaps for all public and publicly queryable post types and taxonomies, as well as for author archives and the homepage of the site. The robots.txt file exposed by WordPress will reference the sitemap index so that it can be easily discovered by search engines.
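To confirm which sitemap index your robots.txt is actually advertising, you can pull the Sitemap lines out of it. The sketch below assumes the robots.txt body has already been fetched as text. The helper name `sitemap_urls_from_robots` and the example.com content are hypothetical. (WordPress core serves its sitemap index at /wp-sitemap.xml.)

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Collect the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")     # split at the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt body for illustration.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/wp-sitemap.xml
"""

if __name__ == "__main__":
    print(sitemap_urls_from_robots(SAMPLE_ROBOTS))
```

If this returns more than one index (for example, one from core and one from a plugin), that is worth investigating before you go any further.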
To be able to generate a sitemap in WordPress on the frontend of your website, you will need to install the SimpleXML PHP extension. If the extension is not available, an error message will display in place of the sitemap, and the HTTP status will be changed to code 501 (“Not implemented”).
If you already have a sitemap in place for your website, you might want to turn off the generated WordPress sitemap since you don’t want to confuse search engines and crawlers about which sitemap they should crawl for your website or potentially cause an indexing issue. (But if you’re using Yoast, the Yoast plugin will automatically deactivate the native WordPress XML sitemap for you.)
Why Cleaning Up Your Sub-Sitemaps Is So Important
Generating a sitemap is only the first step. The part that most site owners skip, and the part that matters most, is auditing the sub-sitemaps and cleaning them up. A modern XML sitemap is not a single file. It is an index that points to several sub-sitemaps, each one covering a specific type of content.
In a typical WordPress site, you will see separate sub-sitemaps for pages, posts, products, categories, tags, authors, attachments, and more. Any custom post types you may have will show up there as well. Every one of those sub-sitemaps is a signal to Google about what you consider worth crawling.
The easiest way to start optimizing is to open your sitemap index and actually read through each sub-sitemap. Look at the URLs inside and ask a direct question for each one: does this page deserve to be indexed by Google? If the answer is no, it should not be in the sitemap.
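If your index is long, listing the sub-sitemaps programmatically gives you a checklist to work through. Here is a rough sketch, assuming the sitemap index XML is already saved as a string. The function name `list_sub_sitemaps` and the file names in the sample are illustrative assumptions, so expect your own index to look different.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sub_sitemaps(index_xml: str) -> list[tuple[str, Optional[str]]]:
    """Return (URL, lastmod) for each sub-sitemap listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    entries = []
    for node in root.findall("sm:sitemap", SITEMAP_NS):
        loc = node.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = node.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


# Hypothetical index with two sub-sitemaps; the second has no lastmod.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
    <lastmod>2026-01-08T09:00:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/post_tag-sitemap.xml</loc>
  </sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    for loc, lastmod in list_sub_sitemaps(SAMPLE_INDEX):
        print(loc, lastmod or "(no lastmod)")
```

The script only builds the list. Deciding whether each sub-sitemap deserves to stay is still a human judgment call.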
Here are the sub-sitemaps that most often need to be cleaned up on a WordPress site:
- Attachment sub-sitemaps. WordPress creates a dedicated URL for every image you upload. These attachment pages are almost always thin, duplicative, and provide no value to a searcher. They should rarely, if ever, be in your sitemap.
- Tag and category sub-sitemaps. Many WordPress sites have hundreds of tags that were created once, applied to a single post, and then forgotten. These thin archive pages can water down your topical authority. Prune aggressively and only include the tags and categories you have deliberately optimized as landing pages.
- Author sub-sitemaps. If your site has guest contributors, former employees, or an author archive for someone who wrote one post five years ago, those author pages are prime candidates for removal.
- Custom post type sub-sitemaps. Plugins often register their own post types, such as testimonials, portfolio items, popups, or form entries. These get swept into your sitemap automatically. Review every custom post type and decide whether each one belongs.
- Product variation and filter URLs. On WooCommerce sites, faceted navigation and product variations can explode the URL count. Submit the canonical product URLs only.
- Thank-you pages, confirmation pages, and utility pages. If a page exists only to be shown after a form submission or as a system page, it should not be in your sitemap.
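A first pass over the list above can be automated with simple URL pattern checks. The sketch below is illustrative only: the rules assume default-style WordPress paths such as /tag/ and /author/, and the rule labels, the `flag_low_value` name, and the sample URLs are all hypothetical. Adjust the patterns to your own URL structure before trusting the results.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rules: each pairs a label with a check on the parsed URL.
LOW_VALUE_RULES = [
    ("query string (filter or variation)", lambda parts: bool(parts.query)),
    ("tag archive", lambda parts: parts.path.startswith("/tag/")),
    ("author archive", lambda parts: parts.path.startswith("/author/")),
    (
        "thank-you or confirmation page",
        lambda parts: re.search(r"/(thank-you|thanks|confirmation)(/|$)", parts.path)
        is not None,
    ),
]


def flag_low_value(urls: list[str]) -> dict[str, list[str]]:
    """Map each suspicious URL to the list of rules it matched."""
    flagged = {}
    for url in urls:
        parts = urlsplit(url)
        reasons = [label for label, check in LOW_VALUE_RULES if check(parts)]
        if reasons:
            flagged[url] = reasons
    return flagged


if __name__ == "__main__":
    sample_urls = [
        "https://example.com/product/blue-widget/",
        "https://example.com/shop/?filter_color=blue",
        "https://example.com/tag/misc/",
        "https://example.com/thank-you/",
    ]
    for url, reasons in flag_low_value(sample_urls).items():
        print(url, "->", ", ".join(reasons))
```

A flagged URL is a candidate for review, not an automatic removal. A tag page you have deliberately built out as a landing page would match the tag rule and still belong in the sitemap.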
Cleaning up sub-sitemaps has real, measurable SEO impact. Google allocates a finite crawl budget to your site. Every thin, duplicative, or low-value URL you feed it through your sitemap is a URL Google is crawling instead of your actual money pages. A sitemap full of low-quality URLs sends Google a signal that this is the kind of content you want to be known for. The cleaner and more intentional your sitemaps are, the more clearly you are telling Google what your site is about and which pages deserve to rank.
We often see a conflict between a client’s desire to showcase every piece of content they’ve ever created and the practical reality of crawl budget. They might insist on including every individual testimonial post in the sitemap, believing it’s all valuable. However, you can noindex the individual testimonial posts and drop their sub-sitemap, yet still leave the archive page, which lists all the testimonials in one place, indexable.
I used to firmly believe that more indexable pages always meant better SEO. But after seeing Google continually update its algorithm to be more and more strict about “thin content” and watching Google penalize sites for it, my entire approach to sitemaps and content indexing had to change. It has now become my new rule of thumb that short individual posts should be shown on an indexable archive page, but not be indexable individually. This means removing that sub-sitemap from the XML sitemap index.
Don’t Forget About Search Console!
After you have cleaned up each sub-sitemap, resubmit your sitemap index in Google Search Console and monitor the Coverage report (called Page indexing in the current Search Console). Watch for “Submitted URL marked as noindex,” “Submitted URL not found (404),” and “Crawled - currently not indexed” statuses. These are Google’s way of flagging the exact kind of sitemap bloat we are talking about, and they should drop as your cleanup work takes effect.
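You do not have to wait for Search Console to report these problems. The two most common ones, 404s and noindexed pages, can be spotted from a page's own response. The sketch below assumes you have already fetched each sitemap URL and have its status code, HTML, and X-Robots-Tag header value in hand. The name `sitemap_problem` and the returned labels are my own, not Search Console's wording.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            self.directives.append(attr_map.get("content", "").lower())


def sitemap_problem(status_code, html="", x_robots_tag=""):
    """Return why a URL should not be in the sitemap, or None if it looks fine."""
    if status_code == 404:
        return "not found (404)"
    if status_code != 200:
        return f"unexpected status {status_code}"
    if "noindex" in x_robots_tag.lower():
        return "noindex (X-Robots-Tag header)"
    parser = _RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in directive for directive in parser.directives):
        return "noindex (meta robots)"
    return None


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(sitemap_problem(200, page))
```

This only looks at the generic robots meta tag and header. It will not tell you why Google chose to crawl a page without indexing it, so Search Console is still the source of truth for that status.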
This is also not a one-time project. Every time you launch a new section of the site, add a new plugin that registers a post type, or run a content pruning pass, your sitemaps need to be reviewed again. The sites that rank well treat sitemap hygiene as an ongoing discipline, not a launch checklist item.
Looking for Help with XML Sitemap Optimization?
We know how important an SEO-friendly website is to the overall success of your business, and your XML sitemap sits at the center of that foundation. At Pam Ann Marketing, we believe business owners have the right to be aware of exactly what their SEO company is doing to their site. That is why we are always 100% transparent with everything we do for your business, and we will even teach you how to do what we do.
Our SEO services are “white hat,” meaning completely in line with Google’s Webmaster Guidelines, and our SEO Technical Audits are consistently referred to by web developers as “the most comprehensive and detailed” audits they have ever seen. A full sitemap and sub-sitemap cleanup is a standard part of that work. Contact us today to discuss your specific SEO needs.



