Quote:
Originally Posted by Tent Pitcher
The problem isn't really in interpreting a TLD as adult-oriented or not, it is that you can't base a site's categorization on the domain name - and Google doesn't and won't. The domain name is a very small (mostly non-existent) part of the overall relevancy equation that is applied to sites as they are indexed.
That's a very well-thought-out reply. However, I still think I may be right in some important respects. Emphasis on "may be".
Google is moving toward semantic search, i.e. interpreting and anticipating the intent of the person entering a query. Google already groups particular query phrases and keywords into topic-centered "broad" search sets. Many AdWords advertisers already see their ads appear for queries that have no literal overlap with the keywords or phrases they entered. That means Google already extrapolates from keywords when matching ads to queries. And if that works for ad matching, it can work in the other direction too, which suggests that Google is probably at least exploring ways to interpret a search query as if it corresponds to, or stands in for, other literal phrases.
Given the explosion of new gTLDs expected over the next few years, I find it hard to imagine that Google isn't looking at the TLD as an increasingly important search factor. Let's assume, just for the sake of argument, that .hotel becomes successful. If I search for "chicago places to stay", wouldn't Google consider showing Chicago.hotel on the strength of the domain name? Let's also assume that the keywords "places to stay" and "hotel" do not appear anywhere in the on-site content for Chicago.hotel. (Perhaps the copywriter was asleep at the wheel and only used phrases such as "visiting Chicago", "rooms" and "motels".) Nevertheless, Google would ideally know that "places to stay" is semantically similar to "hotel", the word the TLD itself supplies. Then, after checking other relevance signals in the site content and authority signals from inbound links, Google would presumably return Chicago.hotel even for a query that shares no keywords with the site.
Semantic search has to map literal search queries onto vocabulary sets associated with a topic of interest. One piece of information a search engine can use to make these logical leaps is the domain name itself. In theory, .XXX could stand for the entire "adult" bundle of keywords and phrases, as a sort of wildcard (*). Naturally, on-site factors would have to corroborate this before Google ranked a website for those terms.
Let me give an example of what I'm thinking about. As you mentioned, if Lesbians.xxx is actually about something else (used tractors, for example), then Google should and will throw out the adult associations gleaned from the .XXX TLD. But what if the website built on Lesbians.xxx contains phrases like "girls having sex with girls" but NO instances of the word "lesbians"? Now suppose that a site promoting conservative values contains the phrase "girls having sex with each other". Finally, suppose I search for [girls sex each other porn]. That's not an exact match for any of the content on either website. But if Google were a human being, that person would refer me to Lesbians.xxx on the basis of two inferences: .XXX = porn, and "girls sex each other" = "girls having sex with girls". And that same person would not refer me to the conservative-values site, despite the resemblance between the phrases. Why the difference? Because the domain name itself conveys important information.
If somebody came up to us and said in broken English, "girls sex each other porn", we would know which of the two websites, Lesbians.xxx or ConservativeValues.org, to recommend. And we know that we would base part of our recommendation on the domain name itself.
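To make the thought experiment concrete, here is a toy scorer in Python. It is purely hypothetical: the TLD topic bundles, the TLD_WEIGHT value, and the rule that a TLD only counts when the on-site content corroborates it are my own illustrative assumptions, not anything Google has described. It just shows how a corroborated TLD signal could lift Lesbians.xxx above ConservativeValues.org even though the latter shares more literal words with the query, and how the "used tractors" case would lose the bonus.

```python
# Toy illustration of the thought experiment -- NOT a description of how
# Google ranks pages. Every name, topic bundle and weight here is an invented
# assumption chosen only to make the example concrete.

# Hypothetical mapping from a TLD to the bundle of topic terms it "stands for".
TLD_TOPICS = {
    "xxx": {"porn", "adult", "sex"},
    "hotel": {"hotel", "hotels", "places", "stay", "rooms", "motels"},
}

TLD_WEIGHT = 2.0  # arbitrary: how much a corroborated TLD signal counts


def tokenize(text: str) -> set[str]:
    """Lower-case, whitespace-split bag of words (deliberately naive)."""
    return set(text.lower().split())


def score(query: str, domain: str, content: str) -> float:
    q = tokenize(query)
    page = tokenize(content)

    # 1. Literal overlap between the query and the on-site content.
    literal = len(q & page)

    # 2. Topic bundle inferred from the TLD alone (empty for unknown TLDs).
    tld = domain.rsplit(".", 1)[-1].lower()
    topic = TLD_TOPICS.get(tld, set())

    # 3. Only trust the TLD if the content corroborates it: a .xxx site about
    #    used tractors shares no terms with the bundle and gets no bonus.
    corroborated = bool(page & topic)
    tld_signal = len(q & topic) if corroborated else 0

    return literal + TLD_WEIGHT * tld_signal


if __name__ == "__main__":
    query = "girls sex each other porn"
    print(score(query, "Lesbians.xxx", "girls having sex with girls"))
    print(score(query, "ConservativeValues.org", "girls having sex with each other"))
    print(score(query, "Lesbians.xxx", "used tractors for sale"))
```

With these made-up numbers the first site scores 6.0 (two literal matches plus a corroborated .xxx bonus), the conservative site 4.0 (four literal matches, no TLD signal), and the tractor site 0.0. The same sketch gives "chicago places to stay" against Chicago.hotel with "visiting chicago rooms motels" a score of 5.0, even though "places to stay" never appears on the page.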
I'm not saying that Google does this. I'm saying that you or I would do this in person. And what we would ideally do as human guides is what Google tries to emulate with its algorithm.