This is a hard thing to debug from outside the Googleplex, but you are currently...

mmavnn · on Jan 12, 2015

Interesting; that's autogenerated by Octopress (or Jekyll). I'd never noticed it's missing the '/'.

mmavnn · on Jan 12, 2015

Known Octopress issue, apparently: http://hackingoff.com/blog/octopress-default-seo-flaws/

Thanks, patio11

ovi256 · on Jan 12, 2015

Is it irony, or something else, that a system of supposedly vast computing power and learning (and certain real-world power via its distribution of search riches) is broken by something as tiny as a missing slash?

thaumaturgy · on Jan 12, 2015

URLs are really, really, really, really hard to get right on a large scale. For a side project I've written my own crawler/indexer and I try to do deduplication where possible, and the reality is that:

    domain.com/this-page-here

can serve entirely different content from

    domain.com/this-page-here/

depending on the server (and application) configuration.

Pretty much the only way to 100% reliably deduplicate URLs is to look at their content, and somehow magically compare content that can change from page load to page load -- which is a whole other problem.
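To make the point concrete, here is a minimal sketch of the URL-only normalization that is generally safe, assuming plain http/https URLs. The function name `normalize_safely` and the `DEFAULT_PORTS` table are made up for illustration. It deliberately leaves the trailing slash and the path case alone, because the server may treat those as different pages.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical helper table: the default port for each scheme we handle.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_safely(url: str) -> str:
    """Apply only rewrites that should not change which resource is served.

    Assumes http/https URLs without userinfo; anything else needs more care.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # host names are case-insensitive
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{port}"  # keep only non-default ports
    path = parts.path or "/"  # an empty path and "/" are equivalent
    # Drop the fragment (never sent to the server); keep path and query as-is.
    return urlunsplit((scheme, host, path, parts.query, ""))


a = normalize_safely("HTTP://Domain.com:80/this-page-here#top")
b = normalize_safely("http://domain.com/this-page-here/")
print(a)
print(b)
print(a == b)  # still distinct: only the content can say if they match
```

Everything past this point (trailing slashes, case in the path, query parameters) is guesswork unless you fetch both URLs and compare what comes back.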

jfoster · on Jan 12, 2015

Exactly. It's so difficult to get URLs "right", and that's quite non-obvious until you do something like writing a crawler.

Another example is whether foo.com/bar is the same as foo.com/BAR. Usually yes, but it's entirely possible that they will serve different content.

Also, which URL parameters should be disregarded, and which should be considered important? A crawler must do quite a bit of nontrivial page introspection in order to figure out the answer to that all on its own.
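A rough sketch of the parameter problem, assuming you already have a list of parameters you believe are irrelevant. The `IGNORED_PARAMS` set and the `strip_ignored_params` name are hypothetical; in practice the list would have to be learned per site, which is the hard part.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical ignore list; a real crawler cannot know this in advance.
IGNORED_PARAMS = frozenset({"utm_source", "utm_medium", "utm_campaign", "sessionid"})


def strip_ignored_params(url: str, ignored=IGNORED_PARAMS) -> str:
    """Remove query parameters assumed not to affect the page content."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    # Parameter order is preserved, since some applications depend on it.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_ignored_params("http://foo.com/bar?utm_source=hn&id=7"))
```

The same parameter name can be noise on one site and the primary key on another, so a static list like this is only a starting point.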

Often pages that are essentially the same will differ slightly. Timestamps and time-sensitive data (e.g. listings on a marketplace) will trip you up here.
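One common way to cope with pages that are almost the same is to compare overlapping word sequences (shingles) instead of exact bytes. The sketch below is only an illustration of that idea, not what any particular search engine does; the function names and the shingle size `k=3` are assumptions, and it works on already-extracted text rather than raw HTML.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences in the text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(text_a), shingles(text_b)
    if not set_a and not set_b:
        return 1.0  # two empty pages are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


page_a = "Widget for sale in great condition listed at 10:01"
page_b = "Widget for sale in great condition listed at 10:02"
print(jaccard(page_a, page_b))  # high, but below 1.0 because of the timestamp
```

You still have to pick a similarity threshold, and any threshold will merge some pages that are different and split some that are the same.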

AznHisoka · on Jan 12, 2015

I wouldn't say the crawler is broken at all. It's picky, as it should be. A URL that ends with a / can be an entirely different web page than a URL that doesn't end with a /.

Also, you might ask why Google won't ignore the canonical URL if it's an invalid URL. Well, that's what you get with the canonical URL: you're explicitly telling Google this is the "real" URL of the web page. You can't have it both ways and then complain that Google is ignoring your canonical tag.

troels · on Jan 12, 2015

Well, if you give out a canonical tag, you're really taking on the responsibility of resolving different URLs, so you should make sure you do it right.
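If you do emit a canonical tag, a small check on your own generated pages can catch a malformed one before a crawler does. This is a sketch using only the standard library; the class and function names are invented here, the example URLs are hypothetical, and the same-host rule is an assumption that fits a single-domain blog (cross-domain canonicals are legitimate elsewhere).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> encountered."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.href = attributes["href"]


def canonical_url(page_url: str, html: str):
    """Return the absolute canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    return urljoin(page_url, finder.href)


def canonical_looks_sane(page_url: str, html: str) -> bool:
    """Assumed rule for this sketch: http(s) and same host as the page."""
    canonical = canonical_url(page_url, html)
    if canonical is None:
        return True  # nothing declared, nothing to get wrong
    parts = urlsplit(canonical)
    return (
        parts.scheme in ("http", "https")
        and parts.hostname == urlsplit(page_url).hostname
    )


# Hypothetical page whose generated canonical lost a slash after the host.
bad_html = '<head><link rel="canonical" href="http://example.comblog/post/"></head>'
print(canonical_looks_sane("http://example.com/blog/post/", bad_html))
```

Running something like this over the generated site as part of the build is cheap compared with finding out from your search traffic.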