Google is cracking down on gibberish. Not the fake language you used with your friends in middle school, but the nonsensical writing you find all over the web as people try to pad their pages with lots and lots of content. Of course, experts will tell you that you need quality content for SEO (and they’re right), but the motto of the bad digital marketer has always seemed to be, “Why do things right when you can do them wrong but faster?”
To crack down on these people, Google filed for, and has now been granted, the following patent:
Identifying gibberish content in resources
Invented by Shashidhar A. Thakur, Sushrut Karanjkar, Pavel Levin, and Thorsten Brants
Assigned to Google
US Patent 8,554,769
Granted October 8, 2013
Filed: June 17, 2009
This specification describes technologies relating to providing search results.
One aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving a network resource, the network resource including text content; generating a language model score for the resource including applying a language model to the text content of the resource; generating a query stuffing score for the reference, the query stuffing score being a function of term frequency in the resource content and a query index; calculating a gibberish score for the resource using the language model score and the query stuffing score; and using the calculated gibberish score to determine whether to modify a ranking score of the resource.
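To make that abstract a little more concrete, here is a rough Python sketch of the steps it lists. This is only an illustration under our own assumptions, not Google's implementation: the patent abstract does not say which language model, which formula, or which threshold is used. The bigram model with add-one smoothing, the `lm_weight` blend, the `threshold` and `penalty` values, and every function name below are hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_bigram_model(reference_text):
    """Count unigrams and bigrams in a reference corpus of natural writing."""
    tokens = tokenize(reference_text)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams) + 1  # +1 leaves room for unseen words
    return unigrams, bigrams, vocab_size


def language_model_score(text, model):
    """Average log-probability per bigram; lower means less natural text."""
    unigrams, bigrams, vocab_size = model
    tokens = tokenize(text)
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    total = 0.0
    for prev, word in pairs:
        # Add-one smoothing so unseen word pairs get a small, nonzero probability.
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    return total / len(pairs)


def query_stuffing_score(text, query_index):
    """Fraction of the page's tokens that are terms from the query index."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in query_index)
    return hits / len(tokens)


def gibberish_score(text, model, query_index, lm_weight=0.5):
    """Blend both signals into a 0..1 score (the blend is an assumption)."""
    lm = language_model_score(text, model)
    # exp(average log-prob) is a geometric-mean probability in (0, 1].
    unnaturalness = 1.0 - math.exp(lm)
    stuffing = query_stuffing_score(text, query_index)
    return lm_weight * unnaturalness + (1.0 - lm_weight) * stuffing


def adjust_ranking(ranking_score, gibberish, threshold=0.8, penalty=0.5):
    """Demote a page only when its gibberish score crosses the threshold."""
    if gibberish > threshold:
        return ranking_score * penalty
    return ranking_score


# Tiny made-up data, just to exercise the functions.
reference = "we sell running shoes for every kind of runner and we ship the shoes for free"
query_index = {"cheap", "shoes", "buy", "discount"}
model = train_bigram_model(reference)

natural_page = "we sell running shoes for every kind of runner"
stuffed_page = "cheap shoes buy shoes discount shoes cheap buy"

natural_score = gibberish_score(natural_page, model, query_index)
stuffed_score = gibberish_score(stuffed_page, model, query_index)
```

With this toy data the keyword-stuffed page scores higher than the natural sentence, because every one of its words is a query term and none of its word pairs appear in the reference text. A real system would need a far larger reference corpus and a real index of search queries before the numbers meant anything.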
We know that Google has used something like this for a long time. They’ve been pretty good at picking up on bad writing, but this patent suggests they are taking it to the next level: it describes scoring a page for gibberish and using that score to decide whether to modify the page’s ranking. We suggest you go through all of your pages and really review them for gibberish writing. If you’ve used Mechanical Turk, translated material, or any other kind of cheat, ditch it now or pay the price in big penalties later.