Not so long ago I wrote about Google Intent and how Google now tries to guess what you actually had in mind when you searched, even if you typed a misleading keyphrase into the search field. This time it is something different, related to the search core itself; at least, I suppose this is how Google will try to define search signals on the web. Until now we have heard and read a lot about how the subject of a page can be extracted by doing some simple statistics on keywords. The more often a certain keyword occurred, the better we hoped the ranking would be.
This is still true to a certain extent, but the whole search algorithm has to evolve into a much more content-aware state, where the search engine not only sums up keyword counts but actually starts to understand what you are writing about on your page. Some sites have dubbed this Google Entity, but it has not seen much daylight yet, because it is a very difficult area to understand. How could Google understand the notion of, say, Plasma TVs, and then start deriving their every aspect: their usage, their manufacturing process, their history, everything that comes to the mind of someone who knows Plasma TVs well? I don’t have the answer to that, but there definitely is a paradigm shift in search engine algorithms, because SEOs no longer know what drives Google’s understanding of pages.
It is probably at a very early stage, but Google does know a lot about the subject you are writing about, possibly even more than you wrote. That raises the question: is that bad? Does it attract penalties? I’d think not directly, but you would lose out in comparison with other sites that do have all the facts. In the near future I can also imagine Google comparing your facts with its own, rather than only with your competitors’. If you write about a Plasma TV and add one photo plus a line of simple specs, you might end up at the bottom of the search results, since others have described that Plasma TV and its every aspect better. Google already knows that they have covered each ‘chapter’ of the Plasma TV ‘book’, while you have cut some corners in your haste.
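To make the contrast concrete, here is a rough sketch of the two ideas: plain keyword counting versus checking how many ‘chapters’ of a subject a page touches. This is only my illustration, not how Google actually scores pages. The aspect list, the trigger terms and both sample pages are made up, and the substring matching is deliberately naive.

```python
import re
from collections import Counter


def keyword_count(text, keyword):
    """Old-school signal: how many times does the keyword occur as a word?"""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(words)[keyword.lower()]


def aspect_coverage(text, aspects):
    """Toy 'entity' signal: which aspects of the subject does the text mention?

    `aspects` maps an aspect name to a list of trigger terms (hypothetical data).
    Returns the covered fraction and the names of the covered aspects.
    """
    lowered = text.lower()
    covered = [
        name
        for name, terms in aspects.items()
        if any(term in lowered for term in terms)
    ]
    return len(covered) / len(aspects), covered


# Hypothetical 'chapters' of the Plasma TV 'book'; invented for this example.
PLASMA_TV_ASPECTS = {
    "usage": ["watch", "viewing", "home cinema"],
    "manufacturing": ["manufactur", "factory", "panel production"],
    "history": ["history", "invented", "first introduced"],
    "specs": ["resolution", "contrast", "refresh rate"],
}

thin_page = "Plasma plasma plasma TV for sale. Resolution 1080p."
rich_page = (
    "The plasma TV was invented in the 1960s. Manufacturers build each panel "
    "in a factory. People watch films on it, and a typical resolution is 1080p."
)

for label, page in (("thin", thin_page), ("rich", rich_page)):
    count = keyword_count(page, "plasma")
    score, covered = aspect_coverage(page, PLASMA_TV_ASPECTS)
    print(label, "keyword count:", count, "coverage:", score, covered)
```

The thin page wins on keyword count but covers only one aspect, while the rich page mentions the keyword once and covers all four. A real system would need far more than substring matching, but the shift in what gets measured is the point.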
Understanding entities would require a certain degree of AI and, I would presume, a vast knowledge base, because how else would Google know such facts? Maybe Google is no longer only about self-defined score components represented on a numerical scale. I’d say this is a thing of the future, but watch out for a search engine that understands the entities of life and can tell whether you have covered the subject thoroughly or just hastily thrown together whatever information you could find.
UPDATE: I’ve just checked, and I seem to have been right about the knowledge base: Google has acquired Metaweb, and its www.freebase.com along with it. Checking out freebase.com quickly led me to understand how the automated understanding of entities must work. Each entity, such as a person, a place or a product, has a vast amount of information on its Freebase page / in its database table. Each small piece of information on such a page is also a link to another entity page. This is how the interlinked entities form a web of connections. It is mathematically possible, and pretty easy, to extract the needed information about an entity (which in our case is a search phrase), and it is also possible to tell which other search phrases it is linked to and what information exists about them. Freebase has extensive info on many things, but I expect Google to mainly use the technology and theory behind it to gather a huge amount of information for its own knowledge base.
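Here is a minimal sketch of that web of connections, assuming each entity is simply a record whose property values point at other entities. The graph, the property names (`is_a`, `made_by`, `based_in`) and the function are invented for illustration; this is not Freebase’s real schema or API.

```python
from collections import deque

# Toy entity graph: entity -> {property: [linked entities]}. Hypothetical data.
ENTITY_GRAPH = {
    "Plasma TV": {"is_a": ["Television"], "made_by": ["Panasonic"]},
    "Television": {"is_a": ["Consumer electronics"]},
    "Panasonic": {"is_a": ["Company"], "based_in": ["Japan"]},
    "Japan": {"is_a": ["Country"]},
}


def related_entities(graph, start, max_hops=2):
    """Breadth-first walk over entity links, up to `max_hops` away.

    Returns a dict mapping each reachable entity to its distance in hops.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        if distances[entity] == max_hops:
            continue  # do not expand beyond the hop limit
        for targets in graph.get(entity, {}).values():
            for target in targets:
                if target not in distances:
                    distances[target] = distances[entity] + 1
                    queue.append(target)
    del distances[start]  # the start entity is not 'related' to itself
    return distances


print(related_entities(ENTITY_GRAPH, "Plasma TV"))
```

Starting from "Plasma TV", two hops are enough to reach the manufacturer and the country it is based in, which is roughly what I mean by telling which other search phrases an entity is linked to.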