Doing “subtraction” on a site means deleting the 30%-50% of pages that are low-quality or get zero traffic (duplicates, AI-padded content, outdated material), merging pages on similar topics, and raising the share of original content to over 70%. It also means cutting keyword stuffing (keep density at or below 2%), trimming ads and redirects, and fixing content that has no author or source by adding author credentials and data citations. Together these strengthen E-E-A-T and lengthen user dwell time.

Content Cleanup
Four Types of Cleanup Targets
Open Google Search Console and set the date range to the past 16 months. Export the clicks and impressions for every page into a spreadsheet. Find URLs that have been live for over six months, have fewer than 9 clicks, and haven't even reached 250 impressions. Separate out the article pages under the /blog/ path for individual review.
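The filter in this step can be scripted once the export is saved. A minimal sketch, assuming the Search Console "Pages" export was saved as CSV with the columns `Top pages`, `Clicks` and `Impressions` (check your own file's header row). The export carries no publish dates, so the "live for over six months" check still has to come from your CMS.

```python
import csv
import io


def find_weak_pages(csv_text, max_clicks=9, max_impressions=250):
    """Split low-performing pages into (blog_pages, other_pages).

    A page qualifies when clicks < max_clicks and impressions < max_impressions.
    Column names follow an assumed Search Console "Pages" export header.
    """
    blog, other = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        clicks = int(row["Clicks"].replace(",", ""))
        impressions = int(row["Impressions"].replace(",", ""))
        if clicks < max_clicks and impressions < max_impressions:
            url = row["Top pages"]
            # Article pages under the /blog/ path get their own review pile
            (blog if "/blog/" in url else other).append(url)
    return blog, other


sample = (
    "Top pages,Clicks,Impressions\n"
    "https://example.com/blog/a/,3,120\n"
    "https://example.com/blog/b/,40,5000\n"
    "https://example.com/about/,1,80\n"
)
print(find_weak_pages(sample))
```

Here only `/blog/a/` and `/about/` fall under both thresholds, and they land in separate piles.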
Run all 3,000 URLs through a standard site crawler and check each page's word count. Identify pages with 1,500+ words that haven't received a single visitor from Google in the past three months. Some target keywords get 500 searches per month, yet your article sits at position 12, and only 3 out of every 1,000 searchers click through.
Switch to your analytics dashboard to see how long visitors stay. An article sprawls over 2,000+ words, yet readers spend an average of just 12 seconds on it. A typical adult reads about 300-400 words per minute, so 12 seconds is barely enough to scan the headline and the first two lines, let alone solve the problem the visitor came to look up.
Visitors glance at the page, don't scroll any further, then close it and leave. Analytics records this as an unengaged visit. Pull out the URLs with engagement rates below 15% and save them to a list.
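The reading-time arithmetic above can be made explicit. A minimal sketch, assuming a reading speed of 300 words per minute (the low end of the range quoted) and an engagement rate expressed as a fraction, as GA4 reports it; the 15% cutoff is the one used in this step.

```python
def reading_check(word_count, avg_dwell_seconds, engagement_rate, wpm=300):
    """Compare average dwell time with the time needed to read the page."""
    needed_seconds = word_count / wpm * 60       # time to read the whole page
    words_reached = avg_dwell_seconds / 60 * wpm  # words a visitor could read in the dwell time
    return {
        "needed_seconds": round(needed_seconds),
        "words_reached": round(words_reached),
        # Pages under 15% engagement go on the cleanup list
        "flag": engagement_rate < 0.15,
    }


print(reading_check(2000, 12, 0.11))
# {'needed_seconds': 400, 'words_reached': 60, 'flag': True}
```

A 2,000-word article needs close to 7 minutes of reading, while a 12-second visit covers roughly the first 60 words.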
Combine data from both sources in a spreadsheet to find patterns:
- Text-only articles with under 50 impressions after 180+ days online
- 1,000-word long-form articles with average dwell time under 10 seconds
- Outdated news compilations with year numbers from several years ago in titles
- 3 similar URLs competing for the same keyword
- Archive pages that Search Console reports as “Discovered - currently not indexed”
Request the past 30 days of access logs from your hosting provider. Scan through with a text viewer. Check how often Google’s bots visit your site. Some pages haven’t been crawled for a full 90 days.
The search bot has effectively shelved those URLs. The daily crawl budget is limited to begin with, and it is all being wasted on empty category pages. Check the tag count for the /tag/ directory in the backend: 800 tags, each of which opens to reveal just one lonely article.
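Finding the URLs Googlebot has stopped visiting can be automated. A minimal sketch, assuming the logs use the Apache/Nginx "combined" format (adjust the pattern to your server). Two caveats: a 90-day check needs logs covering the full 90 days, and the user-agent string can be faked, so Google recommends verifying real Googlebot traffic by reverse DNS.

```python
import re
from datetime import datetime, timedelta

# Assumed "combined" log format: IP, identity, user, [time], "request", status, size, "referer", "agent"
LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def last_googlebot_visit(log_lines):
    """Map each path to the most recent time a Googlebot user agent requested it."""
    last_seen = {}
    for line in log_lines:
        m = LINE.match(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        path = m.group("path")
        if path not in last_seen or when > last_seen[path]:
            last_seen[path] = when
    return last_seen


def stale_urls(all_paths, last_seen, now, days=90):
    """Paths Googlebot has not requested in the last `days` days, or never."""
    cutoff = now - timedelta(days=days)
    return sorted(p for p in all_paths if p not in last_seen or last_seen[p] < cutoff)
```

`all_paths` is your full URL list (for example from the crawler export), so pages that never appear in the log are reported too.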
Right-click and view the page source. The developer crammed in a 400KB effects file, while the actual readable text on the page is a bare 150 words. The text doesn't even make up 10% of the page's code.
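The text-to-code share can be measured rather than eyeballed. A minimal sketch using Python's built-in `html.parser`; it measures characters of visible text against characters of HTML source, which is one common way to define the ratio (crawler tools may define it slightly differently).

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def text_ratio(html):
    """Share of the HTML source, in characters, that is visible text."""
    if not html:
        return 0.0
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())  # collapse whitespace
    return len(text) / len(html)
```

Pages scoring below 0.10 are the ones that belong on the cleanup list in this step.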
Compile the cleanup list based on code and logs:
- Showcase pages where under 10% is text, all flash and effects
- Deep URLs not revisited by bots for over 90 days
- Articles with 5+ broken outgoing links to old content
- Misaligned pages where headlines and content don’t match
Use a rank-checking tool. Enter your domain and check its data for Google positions 11 to 50. One article inexplicably ranks for 40 irrelevant long-tail keywords, leftover mess from years of careless copying and pasting.
Search Google for site:yourdomain plus your product name, and 15 pages containing that term appear. Click through the top 5, and a shocking 70% of their content is identical.
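One common way to put a number on "70% identical" is to compare overlapping word sequences (shingles). A minimal sketch with 5-word shingles and Jaccard similarity; the shingle size and the 0.7 threshold are assumptions. Pairwise comparison is fine for a few dozen pages; thousands of pages need a technique such as MinHash.

```python
import re


def shingles(text, k=5):
    """Set of overlapping k-word sequences ("shingles") from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=5):
    """Jaccard similarity of two texts' shingle sets: 0.0 (nothing shared) to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def near_duplicates(pages, threshold=0.7, k=5):
    """All URL pairs whose similarity is at or above the threshold."""
    urls = sorted(pages)
    pairs = []
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            score = similarity(pages[u], pages[v], k)
            if score >= threshold:
                pairs.append((u, v, round(score, 2)))
    return pairs
```

`pages` maps each URL to its body text, for instance the text pulled out by your crawler.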
Check the product catalog on your sales site. A shoes category displays just two pairs of shoes, then crudely generates 800 words of machine-written descriptions below. Shoppers only want to see two clear images, the $50 price, and shoe sizes. Wall-to-wall useless text dominates the most visible screen real estate.
Flip through the blog's comment records. Articles posted more than two years ago have zero real human comments, only spam gibberish with gambling URLs. Then set up a session recording tool to capture user behavior.
The recordings show that of 500 visitors, only 10 scrolled to the bottom, and 400 clicked the “Back to Homepage” button at the top. Everyone is rushing to escape the page. A font size below 12px makes reading on mobile a real strain.
Review customer emails from the past three months. Someone complained that steps in a tutorial simply don’t work. Check the page publish date—it stopped in April 2019. The software’s interface has already been redesigned 4 times since.
Then go back to how people actually browse, and flag the bad articles:
- Traffic pages with 300px-high ads filling the entire screen upon load
- Text blocks with tiny fonts and lines squeezed together on mobile
- Old pages with video players that no longer function
- Tutorial articles untouched for two years with all images broken
Categorized Handling
Collect all 500 low-quality URLs in a shared online spreadsheet and write a handling note and a tag for each row. Whatever you do, don't hit delete and wipe everything out in a fit of frustration. There are four options: a 301 redirect, a 404 error, a content refresh, or blocking with a noindex tag.
Say you have 5 old articles, all about “coffee bean moisture prevention.” Article A pulls 120 clicks a month and steadily holds Google position 6. Articles B, C, D, and E show miserable data: just 4 clicks combined over the entire past year.
Copy the detailed 300-word section on using moisture-proof bags from Article B and paste it verbatim into Article A's third paragraph. Save the 2 photos of sealed containers from Article C and upload them to Article A's media library.
Open your site backend and find the redirect plugin. Point all 4 old URLs from B, C, D, and E to Article A. The 15 external backlinks that forum posts have pointed at those old URLs over the years will then pass their traffic to Article A through the redirects.
Hard criteria for merging old articles:
- Search terms in titles overlap by more than 80%
- Daily organic clicks under 5
- Carrying 2 external backlinks from other websites
- Published over 20 months ago
Review articles stuck at Google positions 11-20. The overall structure is fine, but the information inside feels dated. Open the editor, change the year in the title from 2022 to the current year, and delete the 50 words of pleasantries at the opening.
Remove the 3 old 800×600 screenshots that are too blurry to read. Take 4 new, well-lit 1080p photos of the actual items with your phone and upload them. Last week your inbox received 20 customer emails; pick the most frequently asked question and write a 150-word FAQ answer at the end of the article.
Run a broken link checker on the article; it finds 12 dead links. Replace those abandoned links, which now return 404 errors, with Wikipedia entries updated as recently as yesterday. Click the publish button in the upper right corner to update the page timestamp to today.
| Changes Made | Backend Data Changes | Results Achieved |
|---|---|---|
| Replaced 4 HD images | Dwell time increased by 20 seconds | Reduced page bounce rate |
| Fixed 3 dead links | Crawler visits increased by 5 | Improved overall site health |
| Added 150-word Q&A | Added 2 long-tail keywords | Increased search visibility |
| Refreshed date timestamp | CTR increased by 2.5% | Encouraged visitor clicks |
Next, look at the 200 junk short articles that have had zero visitors all year: 150-word snippets assembled by paid content software back in 2018, with 16 consecutive months of zero clicks in the backend statistics. Select all 200 in the backend list and move them to the trash.
The server will comply. When the search bot crawls the 200 old URLs, the server returns a 404 Not Found or 410 Gone status code. Googlebot sees the error on its next visits and starts removing the pages from its index.
Suppose the bot crawls about 1,000 pages a day on your site. Clear out 200 worthless pages and it will crawl your newly written quality articles more often; in this case, the time for a new page to get indexed dropped from 5 days to just 12 hours.
Now look at the 15 return/exchange policy pages and shopping-cart checkout URLs. A clothing shopper may spend a full 3 minutes carefully reading 3,000 words of refund terms, but people searching for information don't want an empty shopping cart in their search results.
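Open the code editor and locate the header.php file. Inside the head section, add <meta name="robots" content="noindex">, wrapped in a condition so it is output only on these pages; placed unconditionally in header.php, it would noindex every page on the site. The pages stay on your site, and buyers can still reach them through the menu. Don't also block these URLs in robots.txt: a bot has to crawl a page to see its noindex tag. Once search bots recrawl the pages and read the tag, they drop them from search listings, often within about 7 days.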
Pages that must have blocking code added:
- Downloadable instruction manuals over 5MB (a PDF can't carry a meta tag, so send an X-Robots-Tag: noindex HTTP header instead)
- The staff login backend under the /wp-admin/ path
- Search results lists generated by visitors using the in-site search box
- Category directories containing only 1 lonely article
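The "only on these pages" condition can be written as one small decision function. A minimal sketch: the path prefixes are hypothetical placeholders for your own URLs, the article count for archive pages would come from your CMS, and `?s=` is WordPress's default in-site search parameter. It treats every PDF as a downloadable manual.

```python
from urllib.parse import parse_qs, urlparse

META_NOINDEX = '<meta name="robots" content="noindex">'
HEADER_NOINDEX = "X-Robots-Tag: noindex"
# Hypothetical path prefixes; replace with your own site's URLs
NOINDEX_PREFIXES = ("/cart/", "/checkout/", "/wp-admin/")


def robots_directive(url, articles_in_archive=None):
    """Return the noindex directive a URL should get, or None to leave it indexable.

    articles_in_archive: article count for a category/tag page, supplied by
    your CMS (None for ordinary pages).
    """
    parts = urlparse(url)
    path = parts.path
    if path.lower().endswith(".pdf"):
        # A PDF has no <head>, so the directive travels as an HTTP response header
        return HEADER_NOINDEX
    if path.startswith(NOINDEX_PREFIXES):
        return META_NOINDEX
    if "s" in parse_qs(parts.query):
        # WordPress puts in-site search terms in the ?s= parameter
        return META_NOINDEX
    if articles_in_archive is not None and articles_in_archive <= 1:
        return META_NOINDEX
    return None
```

Ordinary articles return `None`, so the template prints nothing for them.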
After marking up all 500 URLs, hand the spreadsheet to the server admin to carry out the checklist. Wait 14 days, then check analytics. Of the 100 quality articles retained, 30 moved up 5 positions, and daily organic search visitors rose steadily by 40.
Deduplication and Consolidation
Why Do This
Last September's search algorithm update reshuffled the fate of millions of web pages. A 5-year-old gardening blog that once enjoyed 8,000 daily organic visitors plummeted to under 1,500 within weeks. Its statistics showed that 20 near-identical articles targeting “tomato watering guide” had all been dropped from the index.
Today's crawler bots strongly dislike the same information published over and over under 2-3 different titles. When low-quality duplicate pages pass a warning threshold of roughly 30% of a site, visibility for the whole site's content can suffer. On that blog, 30 articles that each took 20 hours of video shooting and editing lost their place on the first three pages of search results.
Consolidating scattered short snippets can change reading metrics quickly. Five URLs of about 600 words each were combined into a single 3,000-word long-form page with 7 HD images. The page's heatmap plugin recorded average scroll and dwell time rising from 35 seconds to 2 minutes 10 seconds.
Identifying redundant URLs for deletion and merging requires checking just a few basic data metrics:
- Old posts over 24 months with zero visitor comments
- Pages with 5,000 impressions but fewer than 10 clicks from search
- Articles with 60%+ content overlap using synonyms
- Off-topic miscellaneous posts with zero links from peer sites
Merging pages is like joining three thin pipes into one thick conduit. Old page A has 3 links from external websites, old page B has 5, and old page C has 2. Once 301 redirects are set up for the old URLs, the new consolidated article inherits all 10 of those votes of trust, which count as 10 independent domains as long as no site linked to more than one of the old pages.
Merging and restructuring ends the internal competition where your own pages fight each other for traffic. Search for “affordable cat food recommendations” and the second results page shows 4 different links from the same pet site. The 300 daily organic clicks are split across 4 pages, so no single page accumulates enough clicks to break into the top 3 positions on the first page.
Keeping thousands of near-identical old files wastes the bot's daily crawl budget. Suppose Googlebot crawls roughly 200 pages a day on a typical personal blog. If 150 of those slots go to unreadable junk from years ago, the blogger's two freshly updated long-form articles, featuring 3 original videos, can wait 10 full days without being indexed.
When merging similar documents, screen each paragraph strictly for what to keep:
- Delete outdated product pricing numbers that expired 3 years ago
- Keep the 4 real photos with the blogger personally appearing in reviews
- Consolidate scattered parameters from five articles into one 20-row data table
- Preserve the genuine, detailed reader comments (some hundreds of words long) at the bottom of the old pages
Finding “Cloned Pages”
Manually checking 5,000-6,000 articles one by one is excruciating work. Export every page URL from your site backend into a single Excel spreadsheet with dozens of columns, and the whole investigation becomes much easier.
Open the Search Console performance report and pull the historical records. Extend the date filter to the past 16 months and click the export button in the upper right corner to get a raw CSV file of search queries. One packed with 60,000 queries has to come from the Search Console API or bulk data export, because the web interface caps each exported table at 1,000 rows.
Sort the “Impressions” column in descending order in your spreadsheet. Focus on rows where impressions exceed 8,000 but clicks stay under 20. A CTR below 0.25% is a strong sign that multiple URLs are splitting the same traffic.
A search for “how to choose used DSLR lenses” alone shows three pages from the same site stuck at positions 14, 17, and 19, together pulling in a miserable 3 visitors a day.
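Spotting queries where several of your own URLs compete needs data broken down by both query and page. A minimal sketch, assuming rows like those returned by the Search Console API's Search Analytics query with the `query` and `page` dimensions, already converted to dicts with the field names shown.

```python
from collections import defaultdict


def cannibalized_queries(rows, min_pages=2):
    """Queries for which at least `min_pages` of your own URLs collect impressions.

    rows: dicts with "query", "page", "clicks", "impressions", "position"
    (assumed field names).
    """
    by_query = defaultdict(list)
    for r in rows:
        by_query[r["query"]].append(r)
    result = {}
    for query, pages in by_query.items():
        if len(pages) >= min_pages:
            clicks = sum(p["clicks"] for p in pages)
            impressions = sum(p["impressions"] for p in pages)
            result[query] = {
                # Competing URLs, best-ranked first
                "pages": sorted((p["position"], p["page"]) for p in pages),
                "ctr": clicks / impressions if impressions else 0.0,
            }
    return result
```

A query that returns three of your pages at positions 14, 17 and 19 with a combined CTR far below 0.25% is a prime merge candidate.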
A search operator typed into Google reveals your site's true state. Type site:yourdomain.com mechanical keyboard red switch, press Enter, and look at the indexed results on screen.
Eight consecutive pages of results all come from your own domain. The titles of the top 5 links are a staggering 85% identical. Identical filler content like this wastes server storage.
Use a lightweight crawler to scan all 800 short URLs on the site. Set it to extract only each page's main H1 headline and generate a plain TXT file recording titles and word counts.
- Circle outdated URLs whose titles still carry expired years, such as “2018's most comprehensive” or “2019 latest”
- Select dry news snippets with under 400 characters of body text
- Extract old posts with messy formatting and only 1 blurry low-resolution image throughout
- Filter out the first 50 batch-copied product pages that share identical opening paragraphs
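Three of these checks are mechanical once the crawler output is loaded. A minimal sketch, assuming a dict of URL to title and body with paragraphs separated by blank lines (a hypothetical shape for your crawler export). Blurry images still need a human eye.

```python
import re
from datetime import date

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def audit_pages(pages, current_year=None, min_chars=400):
    """Flag stale years in titles, very short bodies and shared opening paragraphs.

    pages: dict of URL -> {"title": str, "body": str}.
    Returns URL -> list of reasons, for flagged URLs only.
    """
    current_year = current_year or date.today().year
    openings = {}
    flags = {}
    for url, page in pages.items():
        reasons = []
        years = [int(m.group(0)) for m in YEAR.finditer(page["title"])]
        if any(y < current_year for y in years):
            reasons.append("stale year in title")
        if len(page["body"]) < min_chars:
            reasons.append("short body")
        first = page["body"].strip().split("\n\n")[0].strip().lower()
        if first:
            openings.setdefault(first, []).append(url)
        flags[url] = reasons
    # Any opening paragraph shared by several URLs marks all of them
    for urls in openings.values():
        if len(urls) > 1:
            for url in urls:
                flags[url].append("duplicate opening")
    return {u: r for u, r in flags.items() if r}
```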
Pull up the backend retention chart recording visitor browsing habits. Among 3 review articles about the same sunscreen, two show per-person average dwell time of just 18 seconds. Bounce rate floats perpetually at a high 92%.
Check whether the pages have links from other websites. Run the five similar URLs through a backlink checker: URL A carries 15 high-authority external links, while the other four show flat zeros.
Readers swipe up twice on their phones, see nothing but the same empty pleasantries they read two days ago, and press the back button without hesitation at the 12-second mark.
The string of letters in a URL often hides telltale signs. The URLs /camera-lens-guide/ and /buy-lens-tips/ were published just 3 days apart in October 2021; click through and their content turns out to be about 90% the same.
Sort the 1,200 blog posts by word count from low to high. Pages below the 500-word threshold are mostly fragments forcibly split off from 3,000-word major articles.
A search for “baking cookies in a new oven” turned up 18 related pages. Comparing them shows that 14 recycle the same filler, droning on about preheating for 15 minutes and the difference between bread flour and cake flour.
- Check whether each short URL's average daily organic visits over the past 30 days exceeded 15
- Check whether the bottom of the page has reader comment threads sharing genuine experience in more than 80 characters
- Confirm whether the text-only content includes a 3-minute exclusive first-person hands-on video
- Compare whether the product comparison table attached in the body was newly added within the past two months
How to Consolidate
Lay out 5 articles teaching “how to make handmade soap” on your monitor and pull a 90-day visitor report from your analytics software. The top-ranked URL reliably draws 350 visitors a month and carries 8 external links, so it becomes the main page to keep.
The remaining four pages can't even muster 10 visitors a month and will be dismantled and merged. Whatever you do, don't simply delete them: that leaves 4 URLs returning “404 Not Found,” passing search bots land on dead pages that signal a poorly maintained site, and whatever links those URLs had earned are thrown away.
Create a new plain-text document and copy in all the text from those 4 old URLs. Using your mouse like a pen, carve the genuinely useful fragments out of thousands of words of rambling:
- 3 silicone mold release photos attached to a 2021 old post
- The extremely clear saponification exothermic reaction principle explanation in the old article’s third paragraph
- Reader-shared rosemary essential oil formula from the webpage bottom comment section
- Cold process air-dry time of 72 hours test numbers recorded in a short article
With these four pieces in hand, go back to the editor for the main page you chose earlier. Insert the 3 mold release photos above the “mold preparation” section, which originally had just two lines of text. Give each image alt text containing “handmade soap mold release technique,” and compress each image to under 100KB.
Work the few hundred words on the reaction principle into the opening background section. Like a sponge absorbing water, the main article swells from a bare 800 words to a content-rich 2,200. Use the editor's accordion block to turn the reader-contributed essential oil formulas into a Q&A dropdown that takes up only half a screen.
After assembling the content, replace every stale figure with fresh numbers. Change “2022 version” in the title to the current year, and update the outdated $45 coconut oil price in the text to the $38 most recently listed on the shopping site.
Once the layout is finalized, deal with the 4 emptied old URLs. Log into your server control panel and find the configuration file named .htaccess (this applies to Apache-style servers; Nginx has no .htaccess). At the very bottom, add a few lines of 301 redirect rules pointing all 4 old URL paths to the newly finished 2,200-word article.
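Writing the redirect lines by hand invites typos, so they can be generated from the list of merged URLs. A minimal sketch that emits Apache mod_alias `Redirect 301` lines; the URLs are placeholders. Note that `Redirect` treats the old path as a prefix, so requests for deeper paths under it are redirected too; check that no live URL sits underneath an old path.

```python
from urllib.parse import urlparse


def htaccess_redirects(old_urls, target_url):
    """Build mod_alias `Redirect 301` lines sending each old path to one target URL."""
    target_path = urlparse(target_url).path
    lines = ["# Consolidated into " + target_url]
    for old in old_urls:
        path = urlparse(old).path or "/"
        if path == target_path:
            continue  # never redirect the target page to itself
        lines.append(f"Redirect 301 {path} {target_url}")
    return "\n".join(lines) + "\n"


print(htaccess_redirects(
    ["https://example.com/soap-a/", "https://example.com/soap-b/"],
    "https://example.com/handmade-soap-guide/",
))
```

Paste the output at the bottom of .htaccess, then test each old URL in a browser.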
Writing permanent redirect rules is like filing a change-of-address notice with the local post office. Visitors clicking 3-year-old bookmarks are smoothly taken to the well-formatted, image-rich new URL within 0.8 seconds, without a hint of lag.
Links from peer blogs that pointed to those 4 old pages pass their value to the new page along the newly laid redirect path. Wait three days and check the external links report in Search Console: the main article's linking count climbs steadily from the original 8 to 14.
When piecing together article content, there are several absolute red lines you must never cross:
- Never forcefully merge pages serving different visitor search intents
- Don’t redirect a shopping cart sales page to a pure diary post
- Merge fewer than 5 old URLs into any single page
- The receiving target URL must 100% match the old article’s topic
After you modify the server's core configuration file, hundreds of old blog posts across the site still link to the old URLs. Install a small plugin that detects outdated links and let it sweep through every historical post.
After running flat out for about 20 minutes, the plugin flagged 58 old posts across the site that contain old links. Open each one in the editor, delete the underlined old URLs, and replace them with the new main page's URL.
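If the posts can be exported as HTML, the same replacement can be done in one pass. A minimal regex-based sketch; it assumes links appear as absolute URLs exactly matching the keys of the `redirects` mapping (relative or differently formatted links would need normalizing first), and it only touches `href` attributes.

```python
import re

HREF = re.compile(r'(href=["\'])([^"\']+)(["\'])')


def replace_links(html, redirects):
    """Point href attributes at old URLs to their new targets; return (html, count)."""
    count = 0

    def swap(match):
        nonlocal count
        url = match.group(2)
        if url in redirects:
            count += 1
            return f"{match.group(1)}{redirects[url]}{match.group(3)}"
        return match.group(0)  # leave every other link untouched

    return HREF.sub(swap, html), count
```

The returned count makes it easy to check the total against what the link-checking plugin reported.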
Open the XML sitemap file in the site's root directory. The 4 eliminated URLs are still sitting in there, taking up the bot's daily crawl quota. Delete those 4 rows by hand, then save and exit.
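For a static sitemap.xml, the removal can be scripted with Python's built-in ElementTree. A minimal sketch, assuming a standard sitemap using the sitemaps.org namespace; if a plugin generates the sitemap, it will usually rebuild it on its own and manual edits may be overwritten.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def remove_from_sitemap(xml_text, removed_urls):
    """Drop <url> entries whose <loc> is in removed_urls; return the new XML string."""
    ET.register_namespace("", NS)  # keep the default namespace when serializing
    root = ET.fromstring(xml_text)
    for url in list(root.findall(f"{{{NS}}}url")):
        loc = url.find(f"{{{NS}}}loc")
        if loc is not None and loc.text and loc.text.strip() in removed_urls:
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```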
Resubmit the cleaned sitemap in Search Console. After about 48 hours, searching for the old page's long URL returns the brand-new 2,200-word article.
Quality Improvement
Removing Padding Pages
Open Google Search Console and set the date range to the past 16 months. Tick the Total clicks and Average CTR boxes. Scroll down to the Pages table, click the filter icon in its upper right, and enter 10 to show only pages with fewer than 10 clicks.
Most of the URLs that appear are old articles written 2-3 years ago. Back in the CMS, their word counts run roughly 250-400 words. Each has a single 600×400 pixel free stock landscape photo, and the text is chopped into 7-8 very short one-sentence paragraphs.
A search crawler takes about 15-30 milliseconds to fetch a standard HTML page, so thin articles burn through crawl budget quickly. Browse the server log file: it is densely packed with 200 status codes, returned for URLs that nobody clicks.
Before cleanup, establish criteria for which URLs to discard:
- Zero clicks for 12 months straight
- Pages closed in under 15 seconds
- Content deviates 80% from the site’s main business
- Bounce rate perpetually stuck around 92%
Save the qualifying bad URLs as a CSV file. Open Excel and use the VLOOKUP function to cross-reference them with your visitor funnel data. Identify the long-tail links with zero real visitors, mark them, and go to the backend to change them in bulk.
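The VLOOKUP step is a left join, which is easy to reproduce outside Excel. A minimal sketch; the column names `URL` and `Users` are assumptions, and both files must use the same URL form (analytics tools often report paths while Search Console reports full URLs).

```python
import csv
import io


def zero_visitor_urls(bad_csv, analytics_csv):
    """Left-join the bad-URL list against analytics rows, like VLOOKUP,
    and keep URLs that have no matching row or zero users."""
    users = {}
    for row in csv.DictReader(io.StringIO(analytics_csv)):
        users[row["URL"].strip()] = int(row["Users"] or 0)
    result = []
    for row in csv.DictReader(io.StringIO(bad_csv)):
        url = row["URL"].strip()
        # A URL missing from analytics (VLOOKUP's #N/A) counts as zero visitors
        if users.get(url, 0) == 0:
            result.append(url)
    return result
```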
Change the status code of these useless pages from 200 to 410 Gone. A 410 tells crawlers the page has been removed permanently, and bots that receive it typically drop the URL from the index within 1-2 weeks. After the change, resubmit the XML sitemap with these URLs taken out.
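On a real site the 410 is usually set in a CMS plugin or the server configuration; this sketch only shows what the server must send back. It is a minimal WSGI app using hypothetical paths for the removed pages.

```python
# Hypothetical list of removed paths; in practice load it from your spreadsheet
GONE = {"/old-snippet-1/", "/old-snippet-2/"}


def app(environ, start_response):
    """Minimal WSGI app: 410 Gone for removed paths, 200 for everything else."""
    path = environ.get("PATH_INFO", "/")
    if path in GONE:
        start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"This page has been permanently removed."]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"OK"]
```

To try it locally: `from wsgiref.simple_server import make_server; make_server("", 8000, app).serve_forever()`, then request `/old-snippet-1/` and check the status code.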
The spreadsheet will contain a few short URLs that look salvageable. Clicking through reveals 3-4 articles of about 500 words each, all on the same topic, with most visitors leaving at the 40% scroll mark. Select all the text from these articles, then copy and paste it into a new blank editor.
Delete all the greeting pleasantries from the article openings. Stitch the text together in chronological order, fill in real data from testing, and add 3 hand-drawn data charts to create a long article of 2,500+ words. After publishing it, deal with the old URLs.
Log into your WordPress dashboard and open the Redirection plugin. Add 301 permanent redirects pointing the old URLs to the new 2,500-word article, so visitors who occasionally land on the old pages are taken to the new link.
Wait 21-45 days and watch the impressions curve in Search Console. The new article's impressions can climb to 1.5-3x the combined total of the three previous short articles, and its dwell time statistics mostly show 3 minutes 30 seconds or more.
Install a crawler on your desktop for routine quality screening:
- Use the free version of Screaming Frog, which crawls up to 500 URLs
- Set the software to filter out pages with too few words
- Identify unhealthy pages where text makes up under 10% of the HTML
- Check for duplicate H1 headings across short pages
A text share under 10% means the page carries too much bloated, useless code: a few hundred words wrapped in thousands of lines of fancy CSS. Open the page on your phone over 4G and you wait a full 4.5 seconds with still no images or text. Readers leave before the layout even finishes loading.
Open-source CMSs automatically generate a pile of tag pages and category directories. Tag and category pages holding only 1-2 articles are empty shells in a crawler's eyes. Check the site-wide indexing report: URLs under the tag path have extremely high error rates, and too many empty shells drag down the quality score of the whole domain.
Enter backend settings and add Noindex tags to all tag pages. Put navigation-only list pages with no readable content on the blacklist. Manually add 150-word introduction text to category directory pages. Explain in plain language which 5 specific topics are contained under this section.
Establish a routine—check four backend data changes monthly:
- Watch the increase/decrease in discovered but not indexed URLs in Console
- Review how often 404 errors appear in the server's daily log
- Calculate what percentage of articles are under 800 words
- Measure whether Time to First Byte (TTFB) on mobile exceeds 800ms
Removing “AI Flavor”
Open your site's editor and pull up 10 random old drafts published half a year ago. Scan the first paragraph: the screen is full of pleasantries like “as the pace of the era accelerates.” The greeting alone takes up 180 characters without delivering a shred of useful information. On the data dashboard, most readers click the X and leave at the 12-second mark.
AI publishing tools usually run on a fixed API configuration. Press Enter, and a single article quietly consumes about 800 tokens, costs 0.02 cents, and spits out a pile of plausible but empty talking points. To restore a genuinely human reading experience, you have to break through that cold machine shell.
Before screening page text, set several deletion red lines for yourself:
- 50-character-long sentences with strong lecturing tone
- A 1,000-word article with zero first-person “I” in hands-on records
- The main search keyword crammed 3 times within just 50 characters
- Expressions using absolute language with 100% certainty
- Stiff historical materials copied from encyclopedia pages since 1990
Select the 150-character block of pleasantries at the top of the screen and delete it. Replace it with the real popup error you hit while testing software last Tuesday at 9 AM, typing that red 403 error code on the first line of the opening paragraph. Readers who click through can tell the author actually spent 2 hours testing the tool.
Now look at the text blocks in the middle. Machine-generated drafts favor extremely rigid formatting: every paragraph is eerily similar in length, all falling in the 85-90 character range. Scan 5 consecutive paragraphs and they look as uniform as if stamped from a mold, putting readers to sleep after two glances.
Break up and reorganize the rigid formatting. Cut some paragraphs down to single 15-character sentences and keep others as 110-character detailed instructions; the varied spacing lets readers' eyes relax. Insert a photo taken with your phone's rear camera between paragraphs 3 and 4, compressed to under 120KB.
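The "eerily consistent" paragraph lengths can be given a number, which helps when checking a draft before and after the rewrite. A minimal sketch computing the coefficient of variation (standard deviation divided by mean) of paragraph lengths; treating a low value as "machine-like" is an assumed heuristic, not a published threshold.

```python
from statistics import mean, pstdev


def paragraph_uniformity(text):
    """Coefficient of variation (std dev / mean) of paragraph lengths in characters.

    Paragraphs are assumed to be separated by blank lines. Values near 0 mean
    every paragraph is about the same length; returns None for fewer than 2.
    """
    lengths = [len(p.strip()) for p in text.split("\n\n") if p.strip()]
    if len(lengths) < 2:
        return None
    return pstdev(lengths) / mean(lengths)
```

Five paragraphs of 86-90 characters score about 0.02; a mix of 15-, 40- and 110-character paragraphs scores above 0.7.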
Some old articles cater so heavily to search engines that they've picked up bad machine-writer habits, trying to fill every 5mm gap in each paragraph with search terms. Press Ctrl+F and search for the main keyword: an 800-word article lights up with 25 glaring highlights. Manually replace 18 of them with natural everyday terms.
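The arithmetic behind this fix: for a single-word keyword, 25 hits in 800 words is a density of about 3.1%, and cutting 18 leaves 7, about 0.9%, comfortably under the 2% ceiling. A minimal sketch of a density counter; density formulas differ between tools, and this one counts each full phrase match once against total words.

```python
import re


def keyword_density(text, keyword):
    """Count keyword occurrences and return (hits, hits / total words).

    Multi-word keywords are matched as consecutive words.
    """
    words = re.findall(r"\w+", text.lower())
    kw = re.findall(r"\w+", keyword.lower())
    if not words or not kw:
        return 0, 0.0
    n = len(kw)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == kw)
    return hits, hits / len(words)


print(keyword_density("red switch keyboard red switch", "red switch"))  # (2, 0.4)
```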
Picking out machine traces by eye alone is exhausting. A few small plugins can scan for them and save you 30 minutes:
- Run the text through a detector to find areas where similarity exceeds 70%
- Use Hemingway Editor to flag complex sentences of 20+ words
- Find paragraphs where adverbs exceed 5% of the total vocabulary
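Two of these checks can be approximated without any plugin. A minimal sketch that flags sentences of 20+ words and estimates the adverb share by counting words ending in "-ly". This is not Hemingway Editor's algorithm, and the "-ly" test is a crude stand-in for real part-of-speech tagging.

```python
import re


def style_report(text, max_words=20, adverb_limit=0.05):
    """Flag sentences of max_words or more words and estimate the share of -ly adverbs.

    Rough heuristic: the -ly test also catches non-adverbs such as "family" or "only".
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    long_sentences = [s for s in sentences if len(s.split()) >= max_words]
    words = re.findall(r"[A-Za-z']+", text)
    ly_words = [w for w in words if len(w) > 3 and w.lower().endswith("ly")]
    ratio = len(ly_words) / len(words) if words else 0.0
    return {
        "long_sentences": long_sentences,
        "adverb_ratio": round(ratio, 3),
        "too_many_adverbs": ratio > adverb_limit,
    }
```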
The 6 yellow-highlighted sentences the plugins flag must each be rewritten. Replace the machine favorite “maximize this technology's effectiveness” with the plain “just use this straightforward method.” Change the empty-sounding “response speed has been significantly improved” to “startup time reduced by 3.2 seconds.” Real test data beats 100 flowery phrases.
Long passages of empty grand pronouncements must also be cut. Machine writers habitually tack a 200-word high-level reflection onto the end of an article. Check the reading heatmap: not even 8% of visitors scroll that far. Delete all the useless sentimental text, leaving only 3 simple, clear-cut tips for avoiding mistakes.
After completing all text revisions, read the entire piece aloud. If you feel out of breath at the 450-character mark, add a comma to break up a 30-character run-on sentence.



