GoFuckYourself.com - Adult Webmaster Forum (https://gfy.com/index.php)
-   Fucking Around & Business Discussion (https://gfy.com/forumdisplay.php?f=26)
-   -   How smart is Google(bot) when it comes to text (https://gfy.com/showthread.php?t=686144)

Dirty F 12-12-2006 02:28 PM

How smart is Google(bot) when it comes to text
 
Let's take this text for example. It's from a news site.

England Prostitutes in the red light district of Ipswich, England, are being warned by police to stay off the streets, after the bodies of two more women were discovered today.

If I copied that and put it on my blog, Googlebot would find out and kick my ass for it, right?

What if I changed it like this:

England Prostitutes in the red light district of Ipswich, England, are being warned by the police to stay off the streets. This happened after the bodies of two more women were discovered today.

Is this enough to create a unique text in Google's eyes?

who 12-12-2006 02:32 PM

No, it's still 90% identical, if that's what you're asking.

HomeFry 12-12-2006 02:34 PM

Smarter than you!

Arab_Sex 12-12-2006 02:35 PM

I have no idea where you're going with this.

Dirty F 12-12-2006 02:36 PM

Quote:

Originally Posted by Arab_Sex (Post 11514685)
I have no idea where you're going with this.

Yeah, I know, it's really hard.

Dirty F 12-12-2006 02:37 PM

Quote:

Originally Posted by who (Post 11514671)
No, it's still 90% identical, if that's what you're asking.

Is there a certain percentage for how different it should be?

scottybuzz 12-12-2006 02:38 PM

If you copied that from somewhere else, popped it on your blog, and changed those two things, I think Googlebot would know and give you a lower SERP than the original piece of text.

I think this:

"England Prostitutes in the red light district of Ipswich, England, are being warned by " is long enough for it to spot a cut and paste.

scottybuzz 12-12-2006 02:39 PM

btw franck, i think ur cool

who 12-12-2006 02:39 PM

It's debatable, but below 60% seems to be where most people say it becomes 'unique' in Google's eyes.
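
For what it's worth, a quick way to put a rough percentage on how similar two passages are is Python's difflib; this compares the original sentence with the reworded one from the first post. It's only a character-level ratio - nobody here knows what metric, if any, Google actually uses.

```python
from difflib import SequenceMatcher

original = ("England Prostitutes in the red light district of Ipswich, England, "
            "are being warned by police to stay off the streets, after the "
            "bodies of two more women were discovered today.")

edited = ("England Prostitutes in the red light district of Ipswich, England, "
          "are being warned by the police to stay off the streets. This "
          "happened after the bodies of two more women were discovered today.")

# ratio() returns a value in [0, 1]; 1.0 means the strings are identical.
similarity = SequenceMatcher(None, original, edited).ratio()
print(f"{similarity:.0%} similar")
```

By this crude measure the reworded version is still well above the 60% figure quoted above.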

Dirty F 12-12-2006 02:39 PM

Quote:

Originally Posted by scottybuzz (Post 11514711)
btw franck, i think ur cool

That's fucking great. I think you're a dickwad.

Dirty F 12-12-2006 02:40 PM

Quote:

Originally Posted by who (Post 11514714)
It's debatable, but below 60% seems to be where most people say it becomes 'unique' in Google's eyes.

I see. Thanks.

Agent 488 12-12-2006 02:45 PM

http://en.wikipedia.org/wiki/W-shingling
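
The Wikipedia link above describes the standard technique for this: break each document into overlapping runs of w words ("shingles") and compare the sets. A minimal sketch, illustrative only - no one outside Google knows what their pipeline actually does:

```python
def shingles(text, w=4):
    """Return the set of contiguous w-word shingles in text."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

doc1 = ("England Prostitutes in the red light district of Ipswich England "
        "are being warned by police to stay off the streets")
doc2 = ("England Prostitutes in the red light district of Ipswich England "
        "are being warned by the police to stay off the streets")

sim = jaccard(shingles(doc1), shingles(doc2))
print(f"shingle similarity: {sim:.2f}")
```

Adding a single word knocks out every shingle that spans the change, so small edits still leave a high score - which matches the "90% identical" verdict above.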

micker 12-12-2006 03:14 PM

Quote:

Originally Posted by markov
are being warned by police to stay off the bodies of two more women were discovered today. England Prostitutes in the bodies of two more women were discovered today.

Do what Markov says...

JD 12-12-2006 03:17 PM

I'd say about half needs to be unique.

scottybuzz 12-12-2006 03:30 PM

Quote:

Originally Posted by Franck (Post 11514716)
That's fucking great. I think you're a dickwad.

:1orglaugh

may i stick my wad up your dick?

borked 12-12-2006 03:33 PM

Quote:

Originally Posted by Franck (Post 11514653)
Let's take this text for example. It's from a news site.

England Prostitutes in the red light district of Ipswich, England, are being warned by police to stay off the streets, after the bodies of two more women were discovered today.

If I copied that and put it on my blog, Googlebot would find out and kick my ass for it, right?

What if I changed it like this:

England Prostitutes in the red light district of Ipswich, England, are being warned by the police to stay off the streets. This happened after the bodies of two more women were discovered today.

Is this enough to create a unique text in Google's eyes?


I reckon Googlebot is smarter than your signature, for one.

As for your example, then yes, the particular example you gave would pass, as you split a compound sentence into two. Googlebot works off summarising content.

As it's a bot, it doesn't know what the fuck the content is, just how it's formed. As such, it can only summarise sentences, and if the summary comes out the same then the sentences are considered the same. As yours is now two sentences, it passes.

e.g. a summary at 1% of the initial phrase is identical - the sentence is a perfectly formed compound sentence which cannot be said more simply in one sentence.

Your new example can be simply summarised as "England Prostitutes in the red light district of Ipswich, England, are being warned by the police to stay off the streets." since the fact that follows is irrelevant.

Maybe GB even strips all adjectives from a sentence before analysing it; in that case your first new sentence would have problems....


If this shit is important to you, you should really buy a Mac - it's built in....
or write your own content...
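
Nobody here actually knows whether Googlebot summarises anything; but the kind of summariser being described (and the old Mac OS X "Summarize" service alluded to above) is roughly a frequency-scored extractive one. A toy sketch, purely to illustrate the idea:

```python
from collections import Counter

def summarise(text, keep=1):
    """Naive extractive summary: score each sentence by summed word
    frequency, keep the top-scoring ones in their original order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(w.strip(".,").lower() for w in text.split())
    def score(s):
        return sum(freq[w.strip(".,").lower()] for w in s.split())
    top = set(sorted(sentences, key=score, reverse=True)[:keep])
    return ". ".join(s for s in sentences if s in top) + "."

text = ("England Prostitutes in the red light district of Ipswich, England, "
        "are being warned by the police to stay off the streets. This "
        "happened after the bodies of two more women were discovered today.")
summary = summarise(text)
print(summary)
```

On the two-sentence rewrite, this keeps the first sentence and drops the trailing fact - the same outcome described above.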

borked 12-12-2006 03:40 PM

Even summarising my post at a lax setting wouldn't pass:

summarised as:

As for your example, then yes, the particular example you gave would pass, as you split a compound sentence into two.... As such, it can only summarise sentences, and if the summary comes out the same then the sentence is considered the same.... e.g. a summary at 1% of the initial phrase is identical - the sentence is a perfectly formed compound sentence which cannot be said more simply in one sentence. Your new example can be simply summarised as "England Prostitutes in the red light district of Ipswich, England, are being warned by the police to stay off the streets."... Maybe Googlebot even strips all adjectives from a sentence before analysing it; in that case your first new sentence would have problems.

woj 12-12-2006 03:41 PM

Like others have said, you need to change way more than just adding two extra words...

edgeprod 12-12-2006 03:47 PM

It's amazing to think that Google can compare EVERY sentence versus every OTHER sentence it has indexed, and then decide who the "legitimate" owners are.

Most of what is "common knowledge" of what Google "does" is likely just FUD: fear, uncertainty, and doubt.

Look at Google ITSELF, for example. They run http://news.google.com, right? Isn't that just RSS feeds? Isn't it duplicate content?

Look at Autoblogger sites as another example. Clearly, customers rank highly in SERPs, clearly Bliggo betas have kicked ass and taken names, and clearly, the majority of content from THOSE sites is pulled from feeds.

It just doesn't add up, in my opinion. Your results may vary, but mine have been very good using what others complain may be "duplicate content."

Hope that helps.

borked 12-12-2006 03:56 PM

Quote:

Originally Posted by edgeprod (Post 11515119)
It's amazing to think that Google can compare EVERY sentence versus every OTHER sentence it has indexed, and then decide who the "legitimate" owners are.

Most of what is "common knowledge" of what Google "does" is likely just FUD: fear, uncertainty, and doubt.

Which is what I was pointing to - I don't care how big a dairy farm Google is running - it's absolutely NOT possible to run sentence checks against an index before a site gets entered into the index. Far, far too much computing time.

Summarise, then check and flag up for further analysis later if need be, yes - but not a sentence-by-sentence cross-check.

loreen 12-12-2006 04:02 PM

Great information here.
Thanks :)

edgeprod 12-12-2006 04:13 PM

Quote:

Originally Posted by borked (Post 11515184)
Which is what I was pointing to - I don't care how big a dairy farm Google is running - it's absolutely NOT possible to run sentence checks against an index before a site gets entered into the index. Far, far too much computing time.

Summarise, then check and flag up for further analysis later if need be, yes - but not a sentence-by-sentence cross-check.

For sure. Google puts out a lot of information (and disinformation) that people latch on to and parrot over and over.

If they scare you into thinking they're doing something, you probably aren't going to do it. This allows them to get the "effect" without having to code the "cause."

I think they DO check for duplicates on some level, probably on the same IP or block of IPs. It wouldn't, for example, be advantageous to create a bunch of clones of the same site on your server. This, also, is speculation, based on nothing more than what Google has said, true or not.

fallenmuffin 12-12-2006 04:21 PM

There is a percentage, but I'm not sure what it is; it's probably on a sliding scale. If Google hurt sites that carried news feeds, that would just be in poor taste. However, most bloggers/writers, when you ask to copy their work, require a 15% change in the text.

Lazonby 12-12-2006 04:35 PM

Quote:

Originally Posted by Franck (Post 11514653)
Let's take this text for example. It's from a news site.

England Prostitutes in the red light district of Ipswich, England, are being warned by police to stay off the streets, after the bodies of two more women were discovered today.

If I copied that and put it on my blog, Googlebot would find out and kick my ass for it, right?

What if I changed it like this:

England Prostitutes in the red light district of Ipswich, England, are being warned by the police to stay off the streets. This happened after the bodies of two more women were discovered today.

Is this enough to create a unique text in Google's eyes?

I think Google is the least of your worries in this case.

Take a look at the small print of the site you are taking the text from and you'll see that it's a copyright issue to alter the text in any way, shape, or form. Keep altering the text of a BBC/AP/[insert wire service here] report and you'll find your host pulls your site pretty quickly rather than face a suit.

Don't worry about altering the text - as far as Google is concerned, your SERP will be based on RELEVANCY rather than originality. Copy the text word for word, but don't quote the entire article. Instead, copy part of it and link to the rest in a new window.

Dirty F 12-12-2006 04:48 PM

Great thread threadstarter!

StarkReality 12-12-2006 05:05 PM

Quote:

Originally Posted by scottybuzz (Post 11514705)
If you copied that from somewhere else, popped it on your blog, and changed those two things, I think Googlebot would know and give you a lower SERP than the original piece of text.

I think this:

"England Prostitutes in the red light district of Ipswich, England, are being warned by " is long enough for it to spot a cut and paste.

Google doesn't mind identical sentences, especially when they're on different domains; even identical paragraphs of text aren't a problem, from my observation.
The only way for a bot to detect duplicate content is to create some sort of hash(es) for a page; everything else would be way too complicated. So pages done by a doorway generator, or blogs fed only by one sponsor feed, will certainly suffer from a DC penalty, but in general DC seems to be overhyped, and everybody blames DC for worse rankings.

Links to the wrong sites, massive direct recip linking, too much crosslinking of your own sites, and huge increases in backlinks in a short time hurt a lot more than any DC ever will!
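
The hashing idea above can be sketched like this: normalise the page text, then fingerprint it, so exact (or trivially reformatted) copies collapse to the same hash. Purely illustrative - and the catch is that changing a single word changes the whole hash, which is why near-duplicate schemes like shingling exist.

```python
import hashlib
import re

def fingerprint(text):
    """Normalise case, punctuation and whitespace, then hash the result."""
    normalized = re.sub(r"[^a-z0-9 ]", "", text.lower())
    normalized = " ".join(normalized.split())
    return hashlib.md5(normalized.encode()).hexdigest()

a = "England Prostitutes in the red light district of Ipswich, England."
b = "england  prostitutes in the RED LIGHT district of ipswich england"

print(fingerprint(a) == fingerprint(b))  # True: cosmetic changes collapse
```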

StarkReality 12-12-2006 05:10 PM

I might add: always check for NPH proxies targeting you; they can shoot you out of Google very fast... my worst concern about DC.

edgeprod 12-12-2006 05:17 PM

Quote:

Originally Posted by StarkReality (Post 11515586)
in general DC seems to be a real hype and everybody is blaming DC for worse rankings

Seems to be, but I'm keeping an open mind either way.

ROBO2017 12-12-2006 07:03 PM

This conversation is terrible. The original question is nonsense. You’re worried about two sentences.

ROBO

Dirty F 12-12-2006 08:15 PM

Quote:

Originally Posted by ROBO2017 (Post 11516277)
This conversation is terrible. The original question is nonsense. You're worried about two sentences.

ROBO

BS. Those two sentences were just examples. I could've written 50 sentences and it would still be the same. If you don't understand that, then there's something wrong with you, not with this thread.

chupachups 12-12-2006 08:29 PM

Markov Chains
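
A word-level Markov chain "spinner" of the kind being referenced is only a few lines; the garbled quote earlier in the thread reads exactly like the output of one. A toy sketch, not any particular tool:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the source."""
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def generate(chain, length=15, seed=None):
    """Random-walk the chain to emit plausible-looking pseudo-text."""
    rng = random.Random(seed)
    out = [rng.choice(list(chain))]
    for _ in range(length):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

source = ("England Prostitutes in the red light district of Ipswich England "
          "are being warned by police to stay off the streets after the "
          "bodies of two more women were discovered today")
chain = build_chain(source)
print(generate(chain, seed=1))
```

Whether spun text dodges duplicate detection is a separate question; to a human reader it's usually gibberish.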

edgeprod 12-12-2006 11:06 PM

Quote:

Originally Posted by chupachups (Post 11516626)
Markov Chains

Yeah, the beta testers for the next version of Autoblogger Pro Stand Alone are playing with the Markov Engine we wrote into the software.

It's interesting stuff for sure.


All times are GMT -7. The time now is 02:05 AM.

Powered by vBulletin® Version 3.8.8
Copyright ©2000 - 2025, vBulletin Solutions, Inc.
©2000- AI Media Network Inc.