With the enormous success of content management systems (WordPress, Joomla, Drupal, etc.), anyone can easily create a website and start communicating online. A few plugins for security, loading time and SEO, and presto! You're online. However, these tools give you little advice on optimizing your website for Google. Emerging companies and low-budget advertisers do not always think of turning to web marketing agencies, which are often considered too expensive, to handle their ranking on Google.
You will see in this article that the points to consider to ensure proper indexing on Google can be delicate if you do not have a technical background. We therefore advise you to always make a backup of your site before touching sensitive files.
1 / How does Google see your website when it visits it?
It is important to check what Google sees in order to identify factors that may prevent it from properly analyzing your site. To do this, you can use software such as Lynx, a text-based browser: that is to say, it displays your site much the way search engine robots see it.
2 / Do not track Google robots
We strongly advise against using session IDs to track the path taken by robots on your site. You may be using this type of process (even without knowing it) to analyze user behaviour on your site. Note that the crawling behaviour of Google's bots is very different from that of human visitors, and that session IDs are not always indexed correctly by search engines.
To put it simply (or at least to try), a session ID is a unique identifier, often inserted in a cookie or URL, used to trace a visitor's activity on your site. Here is an example of a URL carrying a session ID: www.mylittlebigweb.com/index.php?sid=6543dfICujmeud83ebe894e5
Session IDs change each time a visitor comes to your site. Robots analyzing your site will find that your URLs change systematically while your content remains the same. Google may therefore interpret this data as duplicate content and this will hamper your SEO in search results.
The best solution to this problem is to remove these session IDs by adding a few lines of configuration to your site. For example, owners of PHP sites can add the following lines to their .htaccess file:
php_value session.use_only_cookies 1
php_value session.use_trans_sid 0
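If you cannot edit the server configuration, the same idea can be applied at the application level by normalizing URLs before they are exposed in links. Here is a minimal Python sketch (the `sid` parameter name is taken from the example above; the helper function itself is illustrative, not part of any framework):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_session_id(url, param="sid"):
    """Remove the session-ID query parameter so every visit
    exposes the same canonical URL to crawlers."""
    parts = urlsplit(url)
    # Keep every query parameter except the session ID.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))

print(strip_session_id(
    "http://www.mylittlebigweb.com/index.php?sid=6543dfICujmeud83ebe894e5&page=2"))
# prints http://www.mylittlebigweb.com/index.php?page=2
```

Every visitor still gets a unique session (stored in a cookie), but the links crawlers follow all point to the same stable URL, avoiding the duplicate-content problem described above.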
3 / Let Google know about changes made to your site
There is no need to consume bandwidth if no changes have been made to your site since the last crawl of the search engines. Conversely, it is advisable to send a signal to Google if you have modified your content since their last visit. To do this, verify that your web server supports the If-Modified-Since HTTP header.
Before displaying a web page, your browser sends an HTTP request to the server. The If-Modified-Since header adds a condition to that request, asking the server to compare the date of the robots' last visit with the date the page was last modified. If the page was last modified before the robots' last visit, the server returns a 304 (Not Modified) status code, indicating that the file can be retrieved from the cache, which reduces your page's loading time. If the page has been modified since, the server sends a 200 status code and the content is crawled again.
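The decision the server makes can be sketched in a few lines of Python. This is an illustrative model of the logic, not production server code; the dates are arbitrary examples:

```python
from email.utils import parsedate_to_datetime

def conditional_status(if_modified_since, last_modified):
    """Return the HTTP status a server should send, given the
    If-Modified-Since request header and the page's actual
    Last-Modified date (both as HTTP-date strings)."""
    if if_modified_since is None:
        return 200  # no condition: send the full page
    since = parsedate_to_datetime(if_modified_since)
    modified = parsedate_to_datetime(last_modified)
    if modified <= since:
        return 304  # unchanged since the last visit: serve from cache
    return 200      # changed: send the new content to be re-crawled

# The crawler last fetched the page on 1 June; it was edited on 15 June.
print(conditional_status("Sat, 01 Jun 2024 00:00:00 GMT",
                         "Sat, 15 Jun 2024 10:00:00 GMT"))  # prints 200
```

You can check your own server's behaviour the same way: request a page with an If-Modified-Since date later than its last edit and verify that you get a 304 back.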
4 / Let robots know which content to index or not
It is possible that your site contains pages that you do not want to appear in search results. You therefore need to tell Google which pages should be indexed and which should not. To do this, use a robots.txt file at the root of your web server, making sure that the file suits your site and does not block robots from content you want indexed.
To find out whether you are using your robots.txt file correctly, log in to Google Search Console (formerly Webmaster Tools). Select your site from the home page and open the robots.txt testing tool in the crawl section. Copy the contents of your robots.txt file and paste it into the first field, specify the URL to test, and select the user-agents of your choice from the "User-agents" list.
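For reference, a minimal robots.txt might look like this (the /admin/ and /tmp/ directories and the sitemap URL are placeholders for illustration):

```
# Apply to all crawlers
User-agent: *
# Keep these directories out of search results
Disallow: /admin/
Disallow: /tmp/

# Optionally point crawlers to your sitemap
Sitemap: http://www.example.com/sitemap.xml
```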
If you do not have access to the root of your site, you can place the following Meta tag in the <head> section of your page:
<meta name="robots" content="noindex">
If you only want to deny access to Google’s robots but want other search engines to browse your pages, you can insert the following tag:
<meta name="googlebot" content="noindex">
5 / Make sure your site is correctly displayed on all web browsers
You may be surprised to see display differences between web browsers (Google Chrome, Mozilla Firefox, Safari, etc.). Even two versions of the same browser can render a page differently, altering the user experience. Your HTML and CSS code must therefore be written with care to ensure that your pages display consistently across browsers and their future versions. Using CSS lets you separate your site's content from its presentation, improving both the final rendering and your page's loading speed.
You can use validation tools to ensure that your HTML and CSS code is correctly written. If errors are reported, tools such as HTML Tidy can help you fix your code quickly. While we advise you to take care when writing your HTML and CSS, be aware that code validity does not directly influence how search engine robots crawl and index your site.
These technical tips will help you make sure that search engines have access to your site and that no element reduces your performance. Although these points can be analyzed using free tools, it is your responsibility to take the necessary precautions before performing any type of manipulation.
Even if the explanations you find on the Internet can be very clear, things do not always go as described in the articles. Do not blindly follow the explanations given by unknown Internet users whose level of skill is also unknown.