The steps are quite simple:
- Add canonical tag: Add <link rel="canonical" href="Main Page URL"> to duplicate pages
- Set up 301 redirect: Redirect similar or old URLs to the main page (server or .htaccess configuration)
- Block parameter pages from indexing: Add <meta name="robots" content="noindex,follow"> to duplicate parameter pages
- Unify internal links: Have all internal links point to the main URL only
This can generally reduce duplicate indexing by 30%-50%.

Setting Up Canonical Tags (Preferred Method)
How to Set It Up
The most basic step is placing the tag inside the <head> section, near the top of the page. Put it right below <title> and keep it within the first 15 lines of code, so that crawlers downloading the first 20KB of the page spot it immediately. A canonical tag placed in the <body> main content area is ignored by crawlers, and the hundreds of crawler visits the server handles each day are wasted.
Online store URLs often have long strings of letters and numbers after them for tracking and accounting purposes. Servers spend an additional 150 milliseconds reacting to each such long URL. When backend developers write this standard code, they must completely remove everything after the question mark, leaving only the clean version of the URL in the href field.
Common tracking suffixes you might encounter:
- sessionid=12345 (tracks visitor browsing)
- utm_source=google (marks advertising source)
- sort=price_asc (sorts by price low to high)
- category=shoes (filters specific shoe category)
- page=2 (goes to page 2)
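The "remove everything after the question mark" rule can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the function names clean_canonical and canonical_tag are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_canonical(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop everything after the question mark.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element that points at the cleaned URL."""
    return f'<link rel="canonical" href="{clean_canonical(url)}">'


if __name__ == "__main__":
    print(canonical_tag("https://site.com/shoes?utm_source=google&sessionid=12345"))
```

The sketch drops every parameter, as the paragraph above describes. Parameters that actually change the content of the page, such as a page number, may need to be kept, which this version does not attempt.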
For product manual PDF files that take up 2MB of space, you can’t embed frontend code like regular web pages. Once indexed, PDF files compete with web page versions for ranking. You’ll need to modify the underlying Nginx or Apache server configuration to give these non-web formats a special pass.
On Apache, the method is to edit the .htaccess file in the site root and add a directive that sends the response header Link: <https://site.com/product-page>; rel="canonical". Within the first 50 milliseconds of serving the PDF, the server sends this signal to search engines. When the signal arrives complete with HTTP status code 200, the authority transfers smoothly.
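If the PDF is served by application code rather than a static server rule, the same header can be built there. The Python sketch below only constructs the header name and value; the helper name canonical_link_header is hypothetical, and how the header is attached to the response depends on the framework in use.

```python
from urllib.parse import urlsplit


def canonical_link_header(target_url: str) -> tuple[str, str]:
    """Return the (name, value) pair for an HTTP canonical Link header."""
    parts = urlsplit(target_url)
    # The header only makes sense with a full absolute URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"canonical target must be absolute: {target_url!r}")
    return ("Link", f'<{target_url}>; rel="canonical"')


if __name__ == "__main__":
    name, value = canonical_link_header("https://site.com/product-page")
    print(f"{name}: {value}")
```

Rejecting relative targets up front keeps a half-written URL from ever reaching the response.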
When you post your hard-written blog content on Medium or other major forums, your original site’s search traffic gets pulled away significantly. Cross-domain canonical tags handle the handoff between two completely different URLs. In those external forum publishing backends, fill in the full absolute URL of your original article including https, and your home site easily recovers nearly 90% of the initial authority.
For platforms like Shopify, the underlying /collections/all directory often generates thousands of duplicate product pages. Front-end developers need to edit the theme.liquid theme file and find the section from line 25 to 40. Add a line of rendering code containing {{ canonical_url }}, and the system can clarify ownership of tens of thousands of duplicate pages across the entire site in 0.2 seconds.
Where to enter URLs in major content management systems:
- Yoast SEO plugin: Scroll to the bottom of article editor, find the “Advanced” menu.
- Rank Math tool: Click the gear icon on the right sidebar for “Advanced” tab.
- Magento 2 system: Go through Store → Configuration → Catalog.
- Wix builder: “Advanced” markup section at the bottom of standalone page settings.
Within 48 hours after the code goes live, log into Google Search Console. Enter the URL in the inspection box at the top and press Enter. In the page indexing section of the result, find the line “Google-selected canonical.” Carefully verify that the URL chosen by the system matches the URL you declared, character by character.
The URLs submitted in the sitemap.xml file must be identical to the main URLs you set. Being off by even a single character is unacceptable. If the sitemap sends crawlers URLs with long tracking suffixes while the page points to clean URLs, crawlers will chase conflicting instructions thousands of times daily. Writing a cleaning script to remove irrelevant URLs from the sitemap saves 30% of the site’s daily crawl budget.
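A cleaning script of the kind described here could look roughly like the Python sketch below. It assumes the sitemap uses the standard sitemaps.org namespace and that the clean URL is simply the URL without its query string; the function name clean_sitemap_urls is made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def clean_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return the sitemap's URLs without query strings, deduplicated in order."""
    root = ET.fromstring(sitemap_xml)
    seen: set[str] = set()
    cleaned: list[str] = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        parts = urlsplit((loc.text or "").strip())
        url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        # Skip empty entries and URLs that collapse to one already seen.
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned
```

The returned list can then be compared against the canonical URLs declared on the pages before the sitemap is regenerated.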
Steps to check your new page with Chrome browser:
- Press F12 to open the developer panel.
- Click the Elements tab.
- Press Ctrl+F and search for rel="canonical".
- Carefully check if https:// is missing in the href field.
- Search through the entire page source to ensure this line appears only once.
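The same manual check can be automated for many pages. The Python sketch below uses only the standard library's html.parser; the class and function names are hypothetical, and it checks just the two things listed above: that exactly one canonical tag exists and that its href starts with https://.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.hrefs.append(attr.get("href") or "")


def check_canonical(html: str) -> list[str]:
    """Return a list of problems; an empty list means the page passes."""
    collector = CanonicalCollector()
    collector.feed(html)
    problems = []
    if len(collector.hrefs) != 1:
        problems.append(f"expected 1 canonical tag, found {len(collector.hrefs)}")
    for href in collector.hrefs:
        if not href.startswith("https://"):
            problems.append(f"href is not an absolute https URL: {href!r}")
    return problems
```

Feeding it the page source saved from the browser gives the same answer as the Ctrl+F count, without relying on the eye.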
For long articles split across multiple pages, old practices often pointed all pages from 2 to 10 back to page 1. Crawlers have updated their rules now. /blog/page/2 (page 2’s URL) must honestly point to itself, filling in href="https://site.com/blog/page/2". Redirecting everything to page 1 means the 20 articles starting from page 2 will be treated as non-existent.
Sites still using the old m.site.com mobile subdomain need to reference each other’s addresses on both desktop and mobile pages. The mobile page’s canonical tag must point precisely to the desktop URL. The desktop page should add a rel="alternate" link to the mobile URL with a media attribute such as only screen and (max-width: 640px), which helps crawlers match both versions’ content within 0.1 seconds.
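The two-way annotation for separate desktop and mobile URLs can be generated from one pair of addresses, which avoids pointing the arrows the wrong way. This Python sketch is illustrative only; the function name is made up, and the 640px breakpoint is simply the value mentioned in the text.

```python
def separate_url_annotations(desktop_url: str, mobile_url: str) -> dict[str, str]:
    """Return the tag each version of the page should carry in its <head>."""
    return {
        # Desktop page: announce the mobile alternate for small screens.
        "desktop": (
            '<link rel="alternate" media="only screen and (max-width: 640px)" '
            f'href="{mobile_url}">'
        ),
        # Mobile page: declare the desktop URL as the canonical version.
        "mobile": f'<link rel="canonical" href="{desktop_url}">',
    }


if __name__ == "__main__":
    tags = separate_url_annotations("https://site.com/page", "https://m.site.com/page")
    print(tags["desktop"])
    print(tags["mobile"])
```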
For large multilingual sites using hreflang language tags, set canonical tags with extreme caution. The French version under fr/ must never point its canonical to the English version under en/. Each language version points to itself. Consistency checks on large sites report failure rates of up to 45% for this kind of matching, and a slight misdirection can destroy the entire multilingual index.
Three Professional Bottom Lines
Pointing a canonical tag at a page that carries a no-index tag is a minefield beginners often step on. A crawler that has just been told by page A to read page B arrives at page B only to find code explicitly saying “don’t index.” The two directives conflict, and the crawler cannot tell which page should stay in the index. All historical authority accumulated by pages A and B over three years is completely reset to zero.
Casually filling in old links during a site redesign easily leads into the 301 permanent redirect dead end. Crawlers follow the address only to find they need three consecutive redirects before seeing complete content. Once the chain exceeds 5 network nodes, the machine program forcibly terminates the current crawl task. The site loses nearly 600 precious crawl quotas daily.
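Before writing an old URL into a canonical tag, it helps to resolve where that URL finally lands. The Python sketch below follows a redirect map held in a dictionary instead of making network requests; the function names are hypothetical, and the limit of 5 hops is taken from the figure in the paragraph above.

```python
def redirect_chain(start: str, redirects: dict[str, str], max_hops: int = 5) -> list[str]:
    """Follow `start` through a redirect map and return every URL visited."""
    chain = [start]
    while chain[-1] in redirects:
        nxt = redirects[chain[-1]]
        if nxt in chain:
            raise ValueError(f"redirect loop at {nxt}")
        chain.append(nxt)
        # len(chain) - 1 is the number of hops taken so far.
        if len(chain) - 1 > max_hops:
            raise ValueError(f"more than {max_hops} hops from {start}")
    return chain


def final_target(start: str, redirects: dict[str, str]) -> str:
    """Return the URL a canonical tag should point at directly."""
    return redirect_chain(start, redirects)[-1]
```

Pointing the canonical tag at the value of final_target, rather than at the first link in the chain, removes the intermediate redirects from the crawler's path.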
Missing a single letter ‘s’ at the beginning of a URL causes catastrophic traffic loss, because it assigns a page protected by an SSL certificate to an unprotected, unencrypted legacy page. When Google’s security review algorithm detects the protocol downgrade, it removes that URL’s security display badge within 24 hours. Search impressions instantly drop by over 60%.
A slip of the hand when typing URLs easily creates a bunch of troublesome invalid directives:
- Missing trailing slash at the end of the URL causes the system to treat it as two completely different addresses.
- Mixing uppercase and lowercase letters—Store encountering store triggers path recognition errors.
- Test-only dummy domain names copied unchanged into production source code.
- Relative paths containing dots (such as ../page) cannot be resolved reliably into absolute addresses.
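These typing slips are easy to catch with a small lint step before deployment. The Python sketch below is one possible check, not a complete validator; the function name, the production_host parameter and the expect_trailing_slash flag are assumptions introduced for the example, since the right slash convention differs from site to site.

```python
from urllib.parse import urlsplit


def lint_canonical_href(href: str, production_host: str, expect_trailing_slash: bool) -> list[str]:
    """Return a list of problems found in a canonical href (empty if none)."""
    parts = urlsplit(href)
    if not parts.scheme or not parts.netloc:
        # Covers "../page", "/page" and "site.com/page" alike.
        return ["relative URL: write the full absolute address"]
    issues = []
    if parts.scheme != "https":
        issues.append("scheme is not https")
    if parts.netloc.lower() != production_host.lower():
        issues.append(f"host {parts.netloc} is not the production host")
    if parts.path != parts.path.lower():
        issues.append("path mixes uppercase and lowercase letters")
    if parts.path.endswith("/") != expect_trailing_slash:
        issues.append("trailing slash does not match the site convention")
    return issues
```

Run against every template's output, a check like this flags a leftover test domain or a stray capital letter before crawlers ever see it.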
Accidentally enabling two different SEO plugins in the site backend will inevitably produce two conflicting canonical tags in the page head. When crawlers read the first 15KB of an HTML file and encounter two competing main URLs, the algorithm ignores both tags on the spot. Thousands of similar pages start competing against each other for ranking again.
Adjusting the underlying code of category listing pages often points all pages from the second onwards to the first page. When crawler machines follow the first page’s instructions to go further, they discover all 49 following pages’ code is pointing back. The 1000+ old articles hidden behind page 2 completely lose their tickets into the search index.
URLs containing question marks and dynamic session IDs absolutely cannot be written into attribute fields. Every time a visitor clicks, the backend database randomly generates a new string. Within just one day, the system can artificially create 30,000 useless fake unique URLs. Setting URLs with garbled parameters as the main page causes server memory load to surge 300% within a week.
When checking website code health, experienced professionals follow a standard screening procedure:
- Run Screaming Frog software for deep scans of all 50,000 pages on the site.
- Remove rows with status codes other than 200 OK from the table.
- Export an Excel error warning list of pages missing canonical tags.
- Check the “Index Coverage” section in the console for red-line errors.
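The filtering part of this procedure can be scripted once the crawl is exported. The Python sketch below assumes a CSV with the columns url, status_code and canonical; those column names are hypothetical and will not match any particular crawler's export without renaming.

```python
import csv
import io


def pages_missing_canonical(crawl_csv: str) -> list[str]:
    """Return URLs that answered 200 OK but declare no canonical tag."""
    reader = csv.DictReader(io.StringIO(crawl_csv))
    missing = []
    for row in reader:
        # Drop rows whose status code is anything other than 200.
        if row["status_code"].strip() != "200":
            continue
        if not row["canonical"].strip():
            missing.append(row["url"])
    return missing
```

The resulting list is the warning list of pages to fix first, since those pages are live but give crawlers no ownership signal.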
For multilingual websites doing international business, language tags must be tightly bound with canonical codes. Japanese version directory code absolutely cannot be assigned across the ocean to English version pages. The algorithm takes 0.3 seconds to compare language character differences between the two pages and will detect the mismatch. The multilingual site architecture costing hundreds of thousands faces an 80% demotion penalty.
Redirecting all out-of-stock discontinued product pages entirely to the site homepage is extremely dangerous. Webmasters think they’re protecting the old pages’ 10 years of accumulated external links, but crawlers compare them against the homepage full of promotional banners and find it has nothing to do with the original shoe-selling page. The algorithm labels this forced pairing as a soft 404. After 15 days, all violating pages are thrown out of the index.
Sites still using the old practice of separate desktop and mobile URLs often reverse the arrows on both sides: the mobile URL doesn’t point to the desktop main site, and the desktop page is missing the screen size recognition code. Getting the bidirectional matching arrows wrong means 70% of mobile users will land on severely broken desktop-layout pages.
Right-click and open the browser’s “View Page Source” interface, press Ctrl+F and input the canonical code to check the exact quantity. The number “1/1” in the top right of the screen means you’re safe. If it shows “1/3” or higher, quickly go to the backend to disable the conflicting plugin. This brute-force method often works best in daily troubleshooting.
When articles are batch-collected by content farms using scrapers, the canonical code on the original author’s site acts as an anti-theft lock. Pirated scrapers copy the HTML source code along with everything, and that line with the absolute URL appears on the pirate site too. Search engines compare ownership declarations on both sides within 2 hours and accurately return 95% of search traffic to the original first-post URL.
Code placement and format have iron-clad requirements—there is no room for error:
- Code must be placed in the <head> region at the very top of the HTML document.
- Absolutely do not stuff this tag into the body text block of the page content.
- The filled-in URL must be the decoded, clean URL in its natural form.
- For PDF files, write the canonical header response in the server root directory.
Content Differentiation Rewrite
Split Search Intent
When a website has two articles about the same topic, visitors click to read and leave within seconds—the page bounce rate often stays above 85%. Rewrite one of the articles with a different approach, specifically written for busy commuters. Replace the first 500 characters of explanation entirely with images and text, telling them how to get a hot latte in 3 minutes on a morning commute.
Experts searching for the same machine want to know if the pump pressure is stable. Rewrite the second article as a test report, adding screenshots from 15 Bar pump pressure tests. Place a chart showing the 92°C temperature control curve in the center of the page. Include a 12-step real-shot video of brewing with the 58mm portafilter—both articles look completely different now.
Content that regular people understand should avoid too many technical terms. Adjust the article’s reading difficulty level. Remove all complex machine terminology, and make short sentences account for over 75% of the full text.
- 1.5L water tank makes 5 cups
- Steam wand with 45-degree tilt angle
- Plastic housing withstands 120°C heat
- Package includes 24-page manual
Professionals spend longer reading articles—even if the page has 2500 words, they can read it with interest. Adding several sets of hardware parameter comparisons can keep average time on page steady at 4 minutes 30 seconds.
- Temperature control system with ±1°C fine adjustment
- Two boilers totaling 1500W
- Brass fittings estimated lifespan 10 years
- Pressure gauge needle response lag 300ms
Submit both rewritten articles to search engines, and the titles visitors see will automatically separate. Search “beginner coffee machine” and get the image-rich guide with a $200 price tag. Search “single-group espresso machine review” and get a tech-heavy long-form article with test tables. Two weeks later, the original 150 daily clicks fighting each other have found their own territories.
Writing machine reviews requires browsing real buyer reviews on e-commerce sites. Find 120 historical comments with 4 stars or below on Amazon and count how many mention the water tray leaking. Insert a short video showing the machine running at 65 decibels into the third paragraph. With real video and audio, the share of visitors willing to scroll down to the second screen rises significantly, to over half.
Bold the 7-day no-questions-asked return policy and place it below the red shopping cart button. People making purchases definitely look at the length of the warranty period before ordering. Adding this promise text quietly extends page dwell time by about 40 seconds.
- Safety certification number from the laboratory
- Promise of less than 3% repair rate within 3 years
- 48-hour guaranteed customer service response
Include specific dollar amounts in the page subtitle. Stating “under $500 budget” keeps casual browsers away. People with purchase intent who click in view an average of 2.5 additional pages on the site. No one closes the page immediately after one glance—the penalty for being too similar fades away.
Expand dry product descriptions into illustrated shopping guides, inserting 10 Q&A text boxes. Of the 200 daily visitors, 15 will click to read the boxes carefully. The article’s word count and content density should be just right—remove all flowery padding. Add 5 real photos with measuring tape showing whether the 28cm-wide body fits in a kitchen.
Add a few collapsible panels in the page that visitors can click to expand, containing long factory reports. Only serious researchers click to open the 2MB PDF file. In the backend code, change the original H1 tag and connect the two articles with hyperlinks. People who finished reading the beginner’s image guide naturally click a link to the new product showcase area.
Shift Perspective
An electric standing desk 120cm long and 60cm wide can be written about in two completely different ways. The first article adopts a freelancer’s tone, frequently using “I” to chat casually. Talk about spending 8 hours typing at the computer daily, experiencing such bad lower back pain in the fifth lumbar vertebra that you had to visit an orthopedist.
Turn the weight capacity numbers into scenes from daily life at home. The desktop holds two 27-inch monitors, a 5-pound chubby cat, and a 400ml hot Americano. Press the plastic button on the desk corner and the motor lifts the thick wood panel upward. The coffee liquid surface in the glass wobbles less than 2 millimeters.
The entire article’s tone should be like chatting over beers at a street-side stall. Use plenty of short sentences starting with “we,” “hey,” “see.” Write about your clumsy self spending 45 minutes tightening the 16 long screws at the base, ending up exhausted and plopping down on the hardwood floor gasping for breath.
While mentioning the scene of the whole family sleeping soundly at 11 PM, press the height button and measure with a decibel meter close to the desk legs—the motor noise is only 45 decibels, about the same as an old electric fan on low speed in summer. This sound won’t wake the 3-month-old baby in the next bedroom.
Turn the pen around and write the second article, with the subject entirely switched to “the company procurement guy.” The tone should be like a meticulous warehouse manager with a calculator. Draw a 150 square meter open-plan office layout on paper, calculate that after fitting 20 desks there’s still 80cm of aisle space remaining.
Remove all the fat cat and coffee cups from the first article, replace them with several thick safety inspection documents. Emphasize that the desktop passed BIFMA formaldehyde emission testing. In a sealed 20 square meter room without windows for a full 7 days, the instrument measured the odor index firmly at 0.03mg in the air.
| People viewing the desk | Home personal use | Company bulk purchase |
|---|---|---|
| Load test | Two 27″ monitors plus clutter, 35 lbs total | Withstands 150 lb industrial sandbag |
| Motor lifespan | 4 lifts daily, estimated 3 years | 10,000 continuous lifts, still cool to touch |
| Repair process | Contact customer service for spare power plug | Contract includes full dual motor unit replacement |
The boss buying the desk doesn’t care how tired you got tightening screws—they’re watching the installer work speed. Write that buying 10+ desks gets 3 uniformed technicians from the manufacturer to install. Armed with 3 powerful electric screwdrivers, they finish assembling all scattered steel tubes and test the powered-up desk in just 2 hours.
Content for company buyers needs more payment policy details. Include a screenshot of the tiered pricing: 15% off for orders over $5,000. Write clearly about the 15-character limit for invoice header, and mention the 3-5 business day financial review time for bank transfers.
For content aimed at families, sentences are short:
- My back hurt so badly yesterday I couldn’t bend over
- Button press feedback feels a bit crisp
- Dropped some red curry sauce from takeout, wiped it off easily with a paper towel
For content aimed at procurement agents, everything is hard-nosed terminology:
- Bulk orders include 12 pages of environmental review documents
- Steel frame exterior coated with 2mm anti-rust paint
- Comes with 5-year enterprise-level on-site warranty card
In the personal-use article, the desktop color is called “cherry natural wood style” and pairs beautifully with the cream-colored linen curtains. In the company-oriented content, that same color must be renamed to “scratch-resistant laminate.” Write that scratching the wood with a metal key produces a 15cm white mark, which wipes clean with a wet cloth.
The preset height buttons need different descriptions too. In the personal version, press and hold the “1” button for 3 seconds and the desk stops at 102cm—perfect for someone 175cm tall. Press “2” to lower to 75cm, fitting that $200 second-hand black leather office chair you found.
In content for company procurement, this button function becomes “suitable for multi-person rotation.” Write about the morning shift employee 160cm tall and the evening shift male programmer 185cm tall. They share the same desk, and during shift change, pressing one button in 8 seconds switches to their comfortable viewing height.
Photos on the personal page are close-ups of warm yellow desk lamp light on wood grain texture. A dog-eared novel opened to page 30 casually placed beside it. Photos on the company procurement page show 20 empty desks neatly arranged in two rows under fluorescent lighting. Clean plastic cable trays on the floor, not a single extra black wire visible outside.
Add Exclusive Value
Other sites have over 300 articles with identical punctuation and wording. I spent 899 yuan buying the same coffee machine, grabbed a screwdriver and removed the bottom shell. Pressed my macro lens close to the heating mainboard—only two fingers wide—clicked the shutter and shot over 20 photos. Chose the clearest enlarged photo and placed it at the very top of my article.
Normally people only see the manufacturer’s retouched beautiful photos online, rarely any real ones with coffee grounds. I posted a photo of the portafilter covered in brown coffee residue. The photo has a timestamp watermark showing 3 PM that afternoon. Visitors who clicked through stopped scrolling immediately.
Went to the coffee shop owner at the corner downstairs who’s been running it for 5 years. Took out a recorder and captured his 3-minute rambling about the machine’s continuous shot capacity. Went home, listened to the recording and typed about the speed at which water temperature drops.
- 6th consecutive shot, water temp drops to 85°C
- Steam wand’s small holes one-third clogged from long use
- Original portafilter can’t fit 18g of dark roast beans
- Top hot cup warmer holds only 2 cups maximum
The shop owner’s authentic plain language mixed into the article makes plagiarism detection software unable to scan out duplicate paragraphs. The machine sat on the bar counter for 3 months—black dirt accumulated in all the plastic shell seams. Used an old toothbrush dipped in water, scrubbed back and forth along edges and corners for 12 full minutes, typing all those dirty hands-on details into the page.
Bought a small kitchen digital scale that measures to 0.1g. For 30 consecutive days, weighed the coffee beans consumed every morning and recorded on paper. Filled those 30 numbers into a spreadsheet, drew a line chart with ups and downs, and placed it steadily in the center of the page.
Added the specific date when the original rubber seal ring broke in the middle section of the article. The first ring lasted until day 105 when a small crack appeared on the edge. Went to the hardware store on the corner, spent 2 yuan on a same-sized silicone ring. Wrote 150+ characters about the hands-on experience saving dozens of yuan.
The water tank lid always falls off—buyers often complain in the reviews. Found a small square magnet 3cm by 3cm, applied glue and pressed it on the back of the lid. Took 4 step-by-step photos of the modification with a phone and arranged them in order below the text.
Heated up the machine first thing in the morning, held the phone close to the shell and recorded a short video. No background music throughout—just the hum of the water pump. The video is exactly 45 seconds, uploaded into the page’s video player for people to click and watch.
- Machine warm-up takes 42 seconds
- Liquid flows after 6 seconds once switched on
- Steam tube heating requires another 15 seconds
- Waste grounds tray full capacity exactly 300ml
Buyers want to hear how noisy the machine is in a kitchen. Throughout the day, backend data showed the video progress bar was dragged back and forth over 200 times. Added a screenshot of the 72-decibel noise reading measured with the meter close to the machine.
Also calculated the ongoing parts consumption costs after purchase. Need to replace the soft water filter 4 times a year, each original filter costs 80 yuan. Annual filter costs total 320 yuan—these clear calculations occupy over 100 characters in the page.
Bought a 199-yuan manual coffee press to use as a control. Opened the same bag of Ethiopian coffee beans and weighed out 15g for each side. The manual press requires about 15kg of force, finishing one cup with sweat dripping down—measured both sides and displayed the numbers side by side.
The espresso from the machine has a crema layer about 4mm thick. The manual press crema is a thin layer only 1mm, completely dispersed within 2 minutes. Measured both layers with a ruler, took a comparison photo showing the scale clearly visible, and uploaded it to the page.
Ordered a cheap 9.9-yuan plastic cleaning brush and paired it with a 50-yuan small can of cleaning powder. The shopping receipt screenshot with thick blur applied is placed near the bottom of the page. People reading the article stare at the long list of expenses and start calculating their pocket money.
Found a service center list that nobody normally bothers to organize. Called service hotlines for the top 10 cities nationwide, recorded all 10 street addresses and placed them at the article’s end. Next to each, included operating hours: open at 9 AM, close at 6 PM.
Taught hands-on how to remove stubborn old coffee stains from the portafilter. Boiled a pot of 100-degree water on the stove, removed the metal filter basket and soaked it in a basin for 20 minutes. Wrote about using a toothpick to pick out the 5 black hard particles from the mesh holes, giving the whole page a lived-in, down-to-earth feel.
- Soak in boiling water 20 min to soften old grease
- Scrub 30 times back and forth with hard brush on both sides
- Air dry on windowsill takes 4 hours
Using Noindex Tags
Best Use Cases
An online store with 500 short-sleeve items has color and size options in the sidebar. Visitors freely combine choices and the backend generates 250,000 URLs with filter parameters. Search robots have a daily quota of only 100,000 visits—exhausted entirely by identical product listings. Adding no-index code to pages with over 3 combined options is standard practice.
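The "more than 3 combined options" rule can be applied in the page template. The Python sketch below counts filter selections in the query string and returns the no-index tag only past that threshold; the parameter names color and size come from the sidebar options mentioned above, while the function name and constants are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Query parameters treated as filter options (assumed names).
FILTER_PARAMS = {"color", "size"}
# Pages with more selections than this get the no-index tag.
MAX_FILTERS = 3


def robots_meta_for(url: str) -> str:
    """Return the robots meta tag for a listing URL, or "" if it may be indexed."""
    query = urlsplit(url).query
    # Each key=value pair counts as one selected option; repeats count too.
    active = [key for key, _ in parse_qsl(query) if key in FILTER_PARAMS]
    if len(active) > MAX_FILTERS:
        return '<meta name="robots" content="noindex,follow">'
    return ""
```

Keeping follow in the tag lets crawlers still pass through the filtered page to the product pages it links to.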
The original product page retains sole listing authority. The technical team’s weekly report shows the useless page error rate dropped below the 5% safety threshold. In another case, programmers tested 4 new page layouts on a test subdomain but forgot to add one directive, and 4,500 pages with garbled code went live in public search within two weeks.
Test-phase site headers must have no-index directives forcefully written in. Use server-side files to hard-code this rule, providing 100% blocking guarantee. WordPress-built sites automatically generate archive pages sorted by date and author. An 800-character diary gets copied unchanged to 5 different URLs.
Install a small plugin in the backend to set all archive directories without standalone articles to no-index. The main category directory keeps indexing authority and search traffic decline risk drops 30%.
The 2022 Singles Day promotional page is full of expired 50-yuan coupons. The page’s click rate is below 0.01%, and keeping it on the main site drags down the overall 15-point quality rating. Adding no-index tags to expired campaign pages is cost-effective. Returning customers can still find past promotional details through their bookmarks, while search programs delete the page from the index within 72 hours.
When visitors type a string in the site search box, the system generates a dynamic results page. Crawlers run wild through search boxes and create 15,000 messy, useless pages overnight. Adding a no-index declaration at the very top of the search template file solves this. To clean up the 12,000 search URLs already incorrectly indexed, submit a 6-month temporary removal request through the Search Console backend.
Here’s how to handle different types of pages:
| Page Content Characteristic | Special Characters in URL | Handling Method | Estimated Cleanup Time |
|---|---|---|---|
| Login form page | Contains login | Add code in page header | 48 to 72 hours |
| Test backup pages | Contains variant | Server directive | 7 to 14 days |
| Employee internal directory | Contains staff | Plugin backend global setting | 3 to 5 days |
| E-book file download packages | Is pdf suffix | Config file block rule | Over 15 days |
For whitepaper downloads in PDF or Word format that run to dozens of pages, you can’t write conventional webpage tags inside them. In the server’s config file, add an X-Robots-Tag response header directive to block the 3 specific file suffix types.
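For files served through application code, the same block rule can be expressed as a lookup on the file suffix. This Python sketch returns an X-Robots-Tag: noindex header for matching files; the three suffixes listed are an assumption based on the PDF and Word formats mentioned, and the function name is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed set of blocked suffixes: PDF plus the two common Word formats.
BLOCKED_SUFFIXES = {".pdf", ".doc", ".docx"}


def extra_headers_for(url: str) -> dict[str, str]:
    """Return the response headers to add for a requested file URL."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    if suffix in BLOCKED_SUFFIXES:
        # Tells search engines not to index this non-HTML file.
        return {"X-Robots-Tag": "noindex"}
    return {}
```

Matching on the lowercased suffix of the path, not the whole URL, means download links with query strings or uppercase extensions are still caught.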
A newly launched forum has 100,000 registered users, most of whom never posted after signing up. 80,000 blank personal profiles with only names and no posts severely drag down the site’s trust score.



