Server-Side SEO

 

Status codes

The server builds the HTML page and sends it to the browser. Things can go wrong, which is why a status code is sent along with the response. Status codes are grouped in blocks of 100.

The 200 series indicates that things went well.

The 300 series is about redirection: two or more URLs relate to each other, typically because a page has moved to a new URL. If the web server can propose a suitable page instead, you get a 301 or 302 in return.

The 400 series are errors where the “client” is likely the source of the problem, whereas the 500 series are errors where the server is more likely at fault. This split is not always clear-cut, though.

 

Some examples of codes:

  • 2xx: Success (page found)
      • 200: OK
      • 204: No content (no info to send back at this time)
  • 3xx: Redirects
      • 301: Moved permanently (permanent redirect)
      • 302: Found (temporary redirect)
  • 4xx: Errors caused by the client
      • 400: Bad request
      • 401: Unauthorized
      • 403: Forbidden
      • 404: Page not found
  • 5xx: Server errors
      • 500: Internal server error
      • 501: Not implemented
      • 502: Bad gateway
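
For illustration, this is what the first lines of a raw HTTP response look like when a page has moved permanently (the URL is hypothetical):

HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/new-page

The browser or search engine bot reads the status code from the first line and, for a 3xx response, follows the URL given in the Location header.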

 

Redirects

Content sometimes moves to another URL, e.g. when a product disappears from your product range.

Google has a preference for older pages. When a page disappears, you want to preserve the history of its URL by redirecting it to another URL, so you don’t lose the visitor.

Make sure your web server performs a redirect using status code 301, so the search engine realizes this page will not come back and updates its database.

You will often see code 302 used instead. This is wrong: it instructs the search engine that the redirect is temporary, and that it should stick to the old, disappeared URL.
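
As a sketch, such a permanent redirect could be configured like this on an Apache web server (assuming mod_alias is enabled; the paths and domain are hypothetical):

# .htaccess: send visitors of the removed product page to its replacement
Redirect 301 /products/old-product https://www.example.com/products/new-product

A search engine that follows this redirect receives the 301 status code and updates its database accordingly.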
 

In a nutshell:

  • Products disappear, websites are updated, the URL becomes invalid…
  • Status code:
      • 301 Moved permanently: the search engine understands this is a permanent move and will update its database
      • 302 Moved temporarily: the search engine will not take the trouble to update its database

 

Duplicate content

Duplicate content occurs when two or more URLs serve the same content. Some typical causes:

  • A display version and a print version of the same page on your website
  • The same product appearing in more than one category, each with a different URL

The consequence of not taking action on this is a drop in ranking (PageRank).

You can prevent this drop by telling the search engine, using a tag, which version is the canonical (“golden”) version.

This prevents a duplicate content penalty.
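
The tag in question is the canonical link element, placed in the <head> of each duplicate version and pointing to the golden version (the URL below is hypothetical):

<!-- in the <head> of both the display and the print version -->
<link rel="canonical" href="https://www.example.com/products/blue-widget" />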

 

Registering your website with the search engine

Search engines find a website and start indexing it:

  • Because they find a link to your website on another website, and follow that link
  • Because you explicitly told the search engine about your website (submitted the URL)

You should use both methods: submit your URL yourself to speed up the process of becoming known to the search engine, and gather incoming links from other websites to give your page a better ranking.
 

Submit your URL to Google (and others) for indexing

 

Robots

Robots.txt is a text file with a predefined syntax that you place at the root of your website. It instructs search engines on what to index and what not to index.

Some examples of syntax:

This one instructs all search engines (the “*”) to index everything:

User-agent: *
Disallow:

This one instructs all search engines not to index the website at all:

User-agent: *
Disallow: /

This one instructs all search engine bots to index everything, except for three directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

If you are not the administrator of the website, you might not have the authority to place this file at the root of the web server. In that case, use HTML META tags to keep robots out of your documents: those you can always use, because they are part of the page’s HTML.
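
For example, this META tag in the <head> of a page asks robots not to index the page and not to follow its links:

<meta name="robots" content="noindex, nofollow" />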

 

XML Sitemap

  • There are 2 kinds of sitemaps:
      1. The one on the website itself, meant for end users
      2. The XML sitemap, which tells search engines the URLs of your website
  • An XML sitemap helps the search engine find pages that are not reachable through the website hierarchy (though such unlinked pages tend to rank poorly)
  • It is important for quickly changing websites (newspapers, job sites, …)
  • The format is defined at www.sitemaps.org
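
A minimal XML sitemap following the sitemaps.org protocol looks like this (the URL and date are hypothetical):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/jobs/latest</loc>
    <lastmod>2024-05-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

One <url> entry is listed per page; only <loc> is mandatory, the other elements are optional hints for the search engine.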