On its Webmaster blog, Google revisits good practices for writing URLs and focuses on the mistakes to avoid in order to make the job of search engine crawlers, and Googlebot in particular, easier…
The web holds an enormous amount of content. With limited resources, search engines can index only part of it, and they rely on links and URLs to do this work. As a general rule, avoid needlessly complex URLs and redundant URLs, which lead to duplicate content.
Several practices should be avoided, which is why Google has put together a presentation on the URL and link practices to steer clear of. Here is a summary:
- Avoid session IDs and other user information: Google recommends removing session IDs and other parameters that have no effect on how the page is displayed from the URL, and storing them in a cookie instead. The page behaves the same way and the URLs become shorter and cleaner.
- Remove endless links: calendars are the classic example. On some sites, a calendar paginates every year, month and day for years ahead, creating a multitude of URLs even when those dates contain no data. Simply block access to these pages in the robots.txt file.
- Only get relevant pages indexed: some pages, such as forms and contact pages, cannot be judged relevant or irrelevant by a search engine, so Google advises blocking them via robots.txt. On this point I have a reservation: why not add relevant content to these pages instead (an introductory paragraph, an explanation of the form…) and thus make them more meaningful in the eyes of search engines?
- One URL per page: a URL should point to a single piece of content, and each piece of content should be reachable through a single URL. If your CMS generates several URLs for the same page, consider using the canonical link element (`<link rel="canonical" href="…">`).
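To illustrate the first and last points, here is a minimal Python sketch that strips session-style parameters from a URL and builds a canonical link tag from the result. The parameter names in `SESSION_PARAMS` are hypothetical examples, and sorting the remaining parameters is an assumption (it assumes their order never changes the page). This is not Google's method. Moving the session ID into a cookie happens on the server side and is not shown here.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters that carry per-user state
# and do not change what the page displays.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def canonicalize_url(url: str) -> str:
    """Drop session-style parameters and the fragment, sort the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Sorting makes ?a=1&b=2 and ?b=2&a=1 map to the same URL.
    query = urlencode(sorted(kept))
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", query, ""))


def canonical_link_tag(url: str) -> str:
    """Build a <link rel="canonical"> tag pointing at the cleaned-up URL."""
    return f'<link rel="canonical" href="{escape(canonicalize_url(url), quote=True)}">'


if __name__ == "__main__":
    print(canonicalize_url("https://Example.com/products?sid=abc123&color=red&page=2"))
    print(canonical_link_tag("https://Example.com/products?sid=abc123&color=red&page=2"))
```

With the sample URL above, both variants (with or without `sid`) reduce to `https://example.com/products?color=red&page=2`, so every duplicate can declare the same canonical address.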
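The robots.txt advice for calendars and form pages can be checked locally before deploying. The sketch below feeds a hypothetical robots.txt (the `/calendar/`, `/contact` and `/search` paths are illustrative only) to Python's standard `urllib.robotparser` and asks whether a crawler identifying as Googlebot may fetch a few sample URLs. Note that this parser only does simple prefix matching and does not implement Google's wildcard extensions, so treat it as a rough sanity check rather than a faithful Googlebot simulation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths are illustrative, not from a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Disallow: /contact
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in (
        "https://example.com/calendar/2031/05/17",  # endless calendar page
        "https://example.com/contact",  # form page
        "https://example.com/blog/url-best-practices",  # regular article
    ):
        verdict = "allowed" if rules.can_fetch("Googlebot", url) else "blocked"
        print(url, "->", verdict)
```

Only `Disallow` lines are used here on purpose: Python's parser applies the first matching rule, whereas Google applies the most specific one, so mixing `Allow` and `Disallow` could give different answers.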
More details are given in the presentation…
Source: Google Webmaster Blog