GoFuckYourself.com - Adult Webmaster Forum


mightyjoe 02-03-2018 10:58 PM

Help! My site is dying because of this Content Scrapers!
 
They are the ones ranking on Google for my keywords now. It really makes me cry seeing my post titles, tags and RSS feeds linking to different sites (.pl, .tk, .ru, .cz).

Please help me block these fuckers. Some of them are using Cloudflare, so it's impossible to get their real IPs.

Can I block them by their domain names? How? Please help...

Spunky 02-04-2018 12:46 AM

Quote:

Originally Posted by mightyjoe (Post 22192323)
It really makes me cry seeing my post titles, tags and RSS feeds linking to different sites (.pl, .tk, .ru, .cz).

Please help me block these fuckers. Some of them are using Cloudflare, so it's impossible to get their real IPs.

Can I block them by their domain names? How? Please help...

Stop crying and no it will never work

mightyjoe 02-04-2018 12:48 AM

Quote:

Originally Posted by Spunky (Post 22192351)
Stop crying and no it will never work

Sorry dude, can't stop. It's the only way I make a living and now I'm broke as hell :(:(:(

Can't my managed hosting do something?

bigPunk 02-04-2018 12:50 AM

I had and still have these issues with some of my sites... there are ZERO things we can do about that.

Konda 02-04-2018 02:26 AM

https://en.wikipedia.org/wiki/Canonical_link_element

Paul Markham 02-04-2018 04:06 AM

Unless your content is really different and amazing, stopping them from stealing from your site won't help much. Customers prefer free.

Klen 02-04-2018 04:30 AM

Pro tip for everyone - block access to your site from all countries that are not tier 1 countries, and that should drastically reduce the chance of being hacked/scraped/whatever. If you want to target a specific non-tier-1 market, it's better to make a separate site in the local language.

And to answer OP - the best approach would be to identify which IPs or geos the scrapers are coming from, or to identify the patterns they use to get your content. I can go into more detail if needed.

Klen 02-04-2018 04:31 AM

Quote:

Originally Posted by bigPunk (Post 22192359)
I had and still have these issues with some of my sites... there are ZERO things we can do about that.

Are you saying ... how that is ... IMPOSSIBLE ? :1orglaugh

JohnnyNight 02-04-2018 04:36 AM

Quote:

Originally Posted by KlenTelaris (Post 22192475)
Pro tip for everyone - block access to your site from all countries that are not tier 1 countries, and that should drastically reduce the chance of being hacked/scraped/whatever. If you want to target a specific non-tier-1 market, it's better to make a separate site in the local language.

And to answer OP - the best approach would be to identify which IPs or geos the scrapers are coming from, or to identify the patterns they use to get your content. I can go into more detail if needed.

A smart man...

Thank you..

mightyjoe 02-04-2018 06:27 AM

Quote:

Originally Posted by KlenTelaris (Post 22192475)

And to answer OP - the best approach would be to identify which IPs or geos the scrapers are coming from, or to identify the patterns they use to get your content. I can go into more detail if needed.

I can identify them by their domain names, but some are using fucking Cloudflare.

They get my content through tags and feeds. But most of them link directly to my domain name.

mightyjoe 02-04-2018 06:28 AM

I can easily block them by using "deny from" in .htaccess, but some are using damn Cloudflare

AdultKing 02-04-2018 06:34 AM

Quote:

Originally Posted by mightyjoe (Post 22192553)
I can easily block them by using "deny from" in .htaccess, but some are using damn Cloudflare

They are not going through Cloudflare to scrape your site, only to host your scraped content.

k0nr4d 02-04-2018 06:36 AM

Quote:

Originally Posted by mightyjoe (Post 22192323)
They are the ones ranking on Google for my keywords now. It really makes me cry seeing my post titles, tags and RSS feeds linking to different sites (.pl, .tk, .ru, .cz).

Please help me block these fuckers. Some of them are using Cloudflare, so it's impossible to get their real IPs.

Can I block them by their domain names? How? Please help...

I made a thread about this issue here with a solution https://gfy.com/fucking-around-and-pr...via-proxy.html
It's actually INCREDIBLY easy to do. I can clone your website with a few lines of code.

mightyjoe 02-04-2018 07:44 AM

Quote:

Originally Posted by k0nr4d (Post 22192557)
I made a thread about this issue here with a solution https://gfy.com/fucking-around-and-pr...via-proxy.html
It's actually INCREDIBLY easy to do. I can clone your website with a few lines of code.

Thank you. So the best solution is blocking all Cloudflare IPs?

JuicyBunny 02-04-2018 07:54 AM

Quote:

Originally Posted by mightyjoe (Post 22192609)
Thank you. So the best solution is blocking all Cloudflare IPs?

NO, you can easily find the real IPs from server logs or from software like wordfence if you run blogs.

Konrad's thread was such a big help to us.
Thanks again, K!

Klen 02-04-2018 07:54 AM

Quote:

Originally Posted by mightyjoe (Post 22192609)
Thank you. So the best solution is blocking all Cloudflare IPs?

As AdultKing pointed out, I'm not sure that will help anything, as Cloudflare is just a proxy host; the scraping requests are probably coming from the actual server IPs.

sarettah 02-04-2018 07:58 AM

Quote:

Originally Posted by mightyjoe (Post 22192609)
Thank you. So the best solution is blocking all Cloudflare IPs?

That is not what it says in the thread that K0nr4d posted. He does say you could do that but points out you would also block yourself if you use cloudflare.

He posted a javascript solution in the first post:

Quote:

The only way I can think of bypassing this, is by doing javascript like this:

Code:
<script>
// Redirect to the real site if this page is being served from another domain.
if (window.location.href.indexOf("yourd" + "omain.com") < 0) {
  window.location.href = "http://yourd" + "omain.com";
}
</script>
And variations of that were discussed further in the thread.

.
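
One possible variation on the snippet above, as a rough sketch only: compare the hostname instead of searching the whole URL, since a URL on another domain can still contain your domain string somewhere in its path or query. The helper name `isOwnHost` and the domain `yourdomain.com` are placeholders for illustration, not something taken from the linked thread.

```javascript
// Hypothetical helper: true when `hostname` is the real domain or one of its subdomains.
// The domain is passed in two pieces, as in the snippet above, and joined here.
function isOwnHost(hostname, firstPart, secondPart) {
  var domain = (firstPart + secondPart).toLowerCase();
  var host = String(hostname).toLowerCase();
  return host === domain || host.endsWith("." + domain);
}

// In a browser, redirect when the page is served from any other host.
// The typeof check only keeps the sketch from failing outside a browser.
if (typeof window !== "undefined" &&
    !isOwnHost(window.location.hostname, "yourd", "omain.com")) {
  window.location.href = "http://yourd" + "omain.com";
}
```

Like the original, this only helps if the copied page still runs your script unchanged.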

thommy 02-04-2018 09:58 AM

Quote:

Originally Posted by k0nr4d (Post 22192557)
I made a thread about this issue here with a solution https://gfy.com/fucking-around-and-pr...via-proxy.html
It's actually INCREDIBLY easy to do. I can clone your website with a few lines of code.

but there is a way to prevent that - it just will not work with embedded content.
but with your own content - yes.

tfto 02-04-2018 10:09 AM

Quote:

Originally Posted by mightyjoe (Post 22192609)
Thank you. So the best solution is blocking all Cloudflare IPs?

Digital Ocean kept going through all my mainstream sites. I finally got tired of the shit and blocked the whole IP addy's at the server level.

thommy 02-04-2018 10:18 AM

Quote:

Originally Posted by Konda (Post 22192403)

you have the right idea, but a canonical can also be copied and changed automatically.
but it can be done this way when a site streams from its own location.

thommy 02-04-2018 10:25 AM

Quote:

Originally Posted by tfto (Post 22192721)
Digital Ocean kept going through all my mainstream sites. I finally got tired of the shit and blocked the whole IP addy's at the server level.

seems you don't like mobile users

tfto 02-04-2018 11:07 AM

Quote:

Originally Posted by thommy (Post 22192733)
seems you don't like mobile users

This is definitely not mobile users. This scraper goes through every single page. The hits are within 3-5 seconds of each other, for about 15 mins a night. No mobile device can hit a page every 3-5 seconds for about 15 mins... at 4am. We are talking over 150 pages on three of my sites.
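
For anyone wanting to spot that pattern in their own logs, here is a rough sketch: group hits by IP and flag any IP with a long run of requests spaced only a few seconds apart. The input shape (objects with `ip` and `time` in seconds), the function name `findRapidCrawlers`, and the thresholds are all assumptions for illustration; real server or Wordfence logs would have to be parsed into this shape first.

```javascript
// Hypothetical sketch: flag IPs that request many pages in quick succession.
// hits: array of { ip: string, time: number }, with time in seconds.
// maxGapSeconds: largest gap still counted as part of the same run.
// minRunLength: how many closely spaced hits in a row make an IP suspicious.
function findRapidCrawlers(hits, maxGapSeconds, minRunLength) {
  var byIp = Object.create(null);
  hits.forEach(function (hit) {
    (byIp[hit.ip] = byIp[hit.ip] || []).push(hit.time);
  });

  var suspects = [];
  Object.keys(byIp).forEach(function (ip) {
    var times = byIp[ip].slice().sort(function (a, b) { return a - b; });
    var run = 1;      // length of the current run of closely spaced hits
    var longest = 1;  // longest run seen for this IP
    for (var i = 1; i < times.length; i++) {
      run = (times[i] - times[i - 1] <= maxGapSeconds) ? run + 1 : 1;
      if (run > longest) longest = run;
    }
    if (longest >= minRunLength) suspects.push(ip);
  });
  return suspects;
}
```

Something like `findRapidCrawlers(hits, 5, 150)` would correspond to the "every 3-5 seconds, over 150 pages" pattern described above.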

mightyjoe 02-04-2018 05:07 PM

Quote:

Originally Posted by sarettah (Post 22192629)
That is not what it says in the thread that K0nr4d posted. He does say you could do that but points out you would also block yourself if you use cloudflare.

He posted a javascript solution in the first post:



And variations of that were discussed further in the thread.

.

Code:
<script>
// Redirect to the real site if this page is being served from another domain.
if (window.location.href.indexOf("yourd" + "omain.com") < 0) {
  window.location.href = "http://yourd" + "omain.com";
}
</script>


Can somebody help me with where to put this script? Does it go in the header before the </head>?

What exactly are "yourd" and "omain.com"?

faperoni 02-04-2018 05:51 PM

Joe, replace "yourd" and "omain.com" with your domain, but break it up with a + like in the example.

A good way to detect a scraper IP is to echo the user's IP into each page (for example as an HTML comment so it won't show up in your design). This way, if you find scraped content, you can inspect it and find and ban the IP. Or even better, serve up bullshit content just for that IP... this will waste their time and hopefully they will leave you alone after a few tries.
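
A rough sketch of that tagging idea, written as a plain JavaScript function for a Node-based site (a PHP theme would do the same thing in its own template). The function name `tagWithVisitorIp` and the comment format are made up for illustration.

```javascript
// Hypothetical sketch: append the visitor's IP to a page as an HTML comment,
// so a scraped copy of the page reveals which IP fetched it.
function tagWithVisitorIp(html, visitorIp) {
  // Keep only characters that can appear in an IPv4/IPv6 address, so the
  // value cannot close the comment early or inject markup.
  var safeIp = String(visitorIp).replace(/[^0-9a-fA-F:.]/g, "");
  return html + "\n<!-- served to " + safeIp + " -->";
}
```

If the copied page shows up on another site with the comment intact, the IP in it is the one to ban.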

Good luck!

JuicyBunny 02-04-2018 05:58 PM

Quote:

Originally Posted by tfto (Post 22192779)
This is definitely not mobile users. This scraper goes through every single page. The hits are within 3-5 seconds of each other, for about 15 mins a night. No mobile device can hit a page every 3-5 seconds for about 15 mins... at 4am. We are talking over 150 pages on three of my sites.

Same here. I wonder who is doing that. Way too fast for mobile users. They were hitting 200+ every couple of minutes
We blocked at site level with wordfence and through Cloudflare.

tfto 02-04-2018 08:59 PM

Quote:

Originally Posted by JuicyBunny (Post 22193059)
Same here. I wonder who is doing that. Way too fast for mobile users. They were hitting 200+ every couple of minutes
We blocked at site level with wordfence and through Cloudflare.

Digital Ocean are the ones hitting my sites. I see it in the Wordfence logs and the server logs. I made a bogus page and buried it in two of my sites. Then I blacklisted that page in Wordfence. When the scraper would hit that bogus page, it would automatically block DO's scrapers. Works great.
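
The trap-page idea can be sketched roughly like this. This is not how Wordfence implements it; it is a bare-bones illustration with made-up names (`createTrap`, `/do-not-open/`) showing the logic: any IP that requests the buried page goes on a blocklist and is refused from then on.

```javascript
// Hypothetical sketch of a trap page: real visitors never see the link,
// so anything that requests it is treated as a scraper.
function createTrap(trapPath) {
  var blocked = new Set(); // IPs that have hit the trap page

  return {
    // Returns true if the request should be served, false if it should be refused.
    allow: function (ip, path) {
      if (path === trapPath) {
        blocked.add(ip);
      }
      return !blocked.has(ip);
    },
    // Lists the IPs caught so far.
    blockedIps: function () {
      return Array.from(blocked);
    }
  };
}

// Example use with a made-up trap path.
var trap = createTrap("/do-not-open/");
trap.allow("203.0.113.7", "/do-not-open/"); // refused, and the IP is now blocked
```

It is probably worth disallowing the trap path in robots.txt so that well-behaved crawlers that honor it do not get caught.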

JuicyBunny 02-04-2018 09:24 PM

Quote:

Originally Posted by tfto (Post 22193171)
Digital Ocean are the ones hitting my sites. I see it in the Wordfence logs and the server logs. I made a bogus page and buried it in two of my sites. Then I blacklisted that page in Wordfence. When the scraper would hit that bogus page, it would automatically block DO's scrapers. Works great.

That's a smart solution. I'm going to look into adding that. We're also having issues with some assholes over at OVH. Looks like hacked servers/sites, so who knows the origin, but we're still blocking them as they come through. What a pain in the dick.

tfto 02-04-2018 09:39 PM

Quote:

Originally Posted by JuicyBunny (Post 22193191)
That's a smart solution. I'm going to look into adding that. We're also having issues with some assholes over at OVH. Looks like hacked servers/sites, so who knows the origin, but we're still blocking them as they come through. What a pain in the dick.

I wish I could find a way to block those fuckin Chinese Bot Nets. Those mother fuckers are relentless and rarely use the same IP.

Bladewire 02-04-2018 09:45 PM


INever 02-04-2018 09:47 PM

Just block China. They NEVER pay/buy.

tfto 02-04-2018 09:50 PM

Quote:

Originally Posted by Bladewire (Post 22193221)


That is the funniest fucking thing I have read on here ever. God Damn you are a fuckin moron!. And BTW you fuckin moron. We posted 3 hours apart. Your whole theory is blown to shit. Like your brains.

onwebcam 02-04-2018 10:03 PM

Quote:

Originally Posted by Bladewire (Post 22193221)

Looks like you finally got'em Blade :1orglaugh:1orglaugh:1orglaugh:1orglaugh Congrats on turning an actual business thread into an idiot savant thread.

tfto 02-04-2018 10:09 PM

Quote:

Originally Posted by onwebcam (Post 22193235)
Looks like you finally got'em Blade :1orglaugh:1orglaugh:1orglaugh:1orglaugh Congrats on turning an actual business thread into an idiot savant thread.

Bladelier is such a fuckin idiot. He has sucked so much dick that he is brain damaged.

onwebcam 02-04-2018 10:11 PM

Quote:

Originally Posted by tfto (Post 22193239)
Bladelier is such a fuckin idiot. He has sucked so much dick that he is brain damaged.

:1orglaugh:1orglaugh:1orglaugh You see this is why I come here. Best entertainment on the net. :1orglaugh:1orglaugh

tfto 02-04-2018 10:15 PM

Quote:

Originally Posted by onwebcam (Post 22193241)
:1orglaugh:1orglaugh:1orglaugh You see this is why I come here. Best entertainment on the net. :1orglaugh:1orglaugh

The stupidity of a few... Bladelier makes someone special seem normal. At some point, he is going to claim that I am YOUR fake nic...

onwebcam 02-04-2018 10:18 PM

Quote:

Originally Posted by tfto (Post 22193247)
The stupidity of a few... Bladelier makes someone special seem normal. At some point, he is going to claim that I am YOUR fake nic...

Pretty sure it's already been done.. BTW I saw your pre-edited post we will just keep that our little secret :1orglaugh

tfto 02-04-2018 10:20 PM

Quote:

Originally Posted by onwebcam (Post 22193251)
pretty sure it's already been done.. Btw i saw your pre-edited post we will just keep that our little secret :1orglaugh

oh. Ok...

onwebcam 02-04-2018 11:08 PM

Quote:

Originally Posted by Bladewire (Post 22193221)

FIXED :1orglaugh:thumbsup:1orglaugh

Paul Markham 02-05-2018 12:47 AM

In 2018 there is so much porn on Tubes that have been put up legitimately, competing with it is tough. Most men jerk off for 20 minutes and that's all they need. Tubes satisfy that need.

Setting a download limit is the only option. I can go to so many sites and pay $30 for all their content. Why bother being a member for more than a month? This was inevitable when B/W got so cheap and accessible. Net Neutrality will change that.

mightyjoe 02-05-2018 01:25 AM

Quote:

Originally Posted by faperoni (Post 22193055)
Joe, replace "yourd" and "omain.com" with your domain, but break it up with a + like in the example.

A good way to detect a scraper IP is to echo the user's IP into each page (for example as an HTML comment so it won't show up in your design). This way, if you find scraped content, you can inspect it and find and ban the IP. Or even better, serve up bullshit content just for that IP... this will waste their time and hopefully they will leave you alone after a few tries.

Good luck!

I put in the exact code, only changed the domain thing, and put it in my header.php. Hope it works.


All times are GMT -7.