Website cloning via proxy
The topic of website cloning has been coming up lately, and I've figured out how people are doing it. This is in regard to complete clone domains, not sites that have your site mirrored in a directory somewhere. There might be more ways to do this, but this is what I've encountered.
There are two variants. The first goes through Cloudflare:
1) Someone adds your website to Cloudflare, presumably as a CNAME, or possibly by CNAMEing a domain that isn't on Cloudflare and has your domain as a CNAME.
2) They add the "Add Content" or "Add HTML" app within Cloudflare, which lets them append content to your HTML on every page.
3) There is presumably another app somewhere in Cloudflare that lets you replace words. In this case they were replacing clientsdomain.com with clonedomain.com. This really did my head in: $_SERVER['HTTP_HOST'] appeared to be returning clonedomain.com. In reality, the script was returning clientsdomain.com, and the word "clientsdomain.com" was then being replaced with "clonedomain.com". Literally, you could put "clientsdomain.com" into a text file, request it via the other domain, and it came out as "clonedomain.com".
4) The domain replacement I cannot figure out yet. I don't know how it's done, and I can't find any Cloudflare app that lets me do it, but I suspect it's done with JS somehow in that Add HTML app.

The second variant goes through nginx/Varnish and THEN Cloudflare. In that case they do the find/replace and content injection on their own server and then pass the result along to Cloudflare. Here it may be possible to honeypot the proxy server: place a new file, request it via the proxy domain, and see which IP shows up in your server logs.

The only way I can think of to get around this is with JavaScript like this:
Code:
<script>

Cloudflare's IPv4 ranges:
103.21.244.0/22
103.22.200.0/22
103.31.4.0/22
104.16.0.0/12
108.162.192.0/18
131.0.72.0/22
141.101.64.0/18
162.158.0.0/15
172.64.0.0/13
173.245.48.0/20
188.114.96.0/20
190.93.240.0/20
197.234.240.0/22
198.41.128.0/17

Hope this helps someone :thumbsup
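If you want to act on that range list on your own server, here's a minimal JavaScript (Node) sketch of the CIDR check, assuming IPv4 only. The function names are made up for illustration, and the list is copied from the post, so refresh it from Cloudflare's published IP list before relying on it (Cloudflare also publishes IPv6 ranges, which this ignores).

```javascript
// IPv4 ranges as listed in the post; refresh from Cloudflare's published list.
const CLOUDFLARE_V4 = [
  "103.21.244.0/22", "103.22.200.0/22", "103.31.4.0/22", "104.16.0.0/12",
  "108.162.192.0/18", "131.0.72.0/22", "141.101.64.0/18", "162.158.0.0/15",
  "172.64.0.0/13", "173.245.48.0/20", "188.114.96.0/20", "190.93.240.0/20",
  "197.234.240.0/22", "198.41.128.0/17",
];

// Convert dotted-quad IPv4 text to an unsigned integer, or null if invalid.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // plain arithmetic avoids 32-bit sign issues
  }
  return value;
}

// True if `ip` is inside `cidr`, e.g. inCidr("104.16.1.1", "104.16.0.0/12").
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  const blockSize = 2 ** (32 - bits); // number of addresses in the block
  // Same block if the network prefixes (integer division by block size) match.
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

function isCloudflareIp(ip) {
  return CLOUDFLARE_V4.some((cidr) => inCidr(ip, cidr));
}
```

In a Node `http` server you would feed it `req.socket.remoteAddress`, stripping a `::ffff:` prefix if the server is listening dual-stack. Don't apply it to a site that is itself behind Cloudflare, since every request to such a site arrives from these ranges.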
Very nice explanation, and I'm most impressed by the level of knowledge it took to figure this out! Thanks for posting this, Konrad! :thumbsup
Good post, K0nr4d, excellent breakdown of what it is.
Thanks.
cool, thanx :thumbsup
That's what I was thinking their method was 'like'.
Blocking Cloudflare IPs on servers EXCEPT where specifically needed is a good idea. SAMEORIGIN would not help in this scenario, since the clone proxies your pages rather than framing them. Cloudflare needs to police their clients better; copyright infringement and fraud are most likely Cloudflare TOS violations. Thanks for looking into this, Konrad. BTW, a 302/301 domain redirection will work, maybe even a page redirection. It's working for lifeselector, planned or not.
Try that lifeselector link on that page and look at the headers. Maybe it's a fluke, but it goes to a password-protected page.
The clone renders a blank page for affiliates.lifeselector, and the password page is on another subdomain named 'assist'. Maybe the wrong CNAME? The problem is scumbags using Cloudflare for bad purposes. If Cloudflare does not clean up their act, the USDOJ eventually will. I am not asserting any complicity on Cloudflare's part, but if they abandoned their free service and made it a 30-day trial with a low cost for small users, the scumbags would not use it, because payment data is traceable and can be subpoenaed.
Quote:
"d"+"omain", "do"+"main", "dom"+"ain", "dom"+"a"+"in" and so forth, to prevent someone from doing a find/replace on the JS itself too.
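To see why splitting the string helps, here's a small JavaScript sketch that simulates a proxy doing a literal find/replace on the page body. The domain names are placeholders. The split only raises the bar: anyone who reads the script can rewrite the fragments too, which is why varying the split points matters.

```javascript
// What a text-substitution proxy does to every page it relays.
function proxyRewrite(body, realDomain, cloneDomain) {
  return body.split(realDomain).join(cloneDomain);
}

// The inline script as it appears in the HTML source: the full domain never
// occurs as one contiguous string, so a literal search for it finds nothing.
const inlineScript = 'var home = "yourd" + "omain.com";';
const page =
  '<a href="https://yourdomain.com/">Home</a><script>' + inlineScript + "</script>";

const relayed = proxyRewrite(page, "yourdomain.com", "clonedomain.com");

// The link was rewritten, but the script text came through unchanged...
const linkRewritten = relayed.includes('href="https://clonedomain.com/"');
const scriptIntact = relayed.includes(inlineScript);

// ...and when the browser evaluates the fragments they still form the real domain.
const home = "yourd" + "omain.com";
```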
In this case the framing is not malicious, notwithstanding the SEO implications.
So use an obfuscated rel=canonical tag to identify where the REAL site/domain is. Every site stealing SERPs that I checked for xlovecam was using Cloudflare to hide its host server. Every 'claimed' free password, free tokens or free credits site was trying to hide behind Cloudflare. If it were too big a problem, and worth the expense, we could sue the domain owners and subpoena Cloudflare. I have absolutely no problem with servers using Cloudflare for its intended purpose: mitigating DDoS attacks and security filtering.
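A rough JavaScript sketch of the obfuscated-canonical idea, assuming a placeholder domain and hypothetical function names. It either fixes up an existing canonical tag (which a find/replace proxy may have rewritten to the clone domain) or creates one. Whether search engines honour a canonical that is set by script is not guaranteed, so treat this as a supplement to the static tag.

```javascript
function canonicalHref(pathname) {
  // Assemble the real domain at run time so the literal string never appears in the source.
  const host = "yourd" + "omain" + ".com";
  return "https://" + host + pathname;
}

function injectCanonical(doc, loc) {
  // Reuse an existing canonical tag (possibly rewritten by the proxy) or create one.
  let link = doc.querySelector('link[rel="canonical"]');
  if (!link) {
    link = doc.createElement("link");
    link.setAttribute("rel", "canonical");
    doc.head.appendChild(link);
  }
  link.setAttribute("href", canonicalHref(loc.pathname));
  return link;
}

// Only runs in a browser; elsewhere this file just defines the functions.
if (typeof document !== "undefined") injectCanonical(document, window.location);
```

Passing `document` and `location` in as parameters is just a choice made so the logic can be exercised outside a browser.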
Thanks, Konrad, for sharing your information.
It's probably one of the biggest problems at the moment. I disavow them, but they obviously still benefit from Google and/or harm the original site.
Pretty much everyone agrees now that search engines CAN read JavaScript.
Code:
<script language="javascript">

echo or cat (<input>) | tr -d '\n' (sed works line by line, so 's/\n//g' never sees the newlines), then ....
Code:
echo '<script>if(window.location.href.indexOf("yourd"+"omain.com") < 0) { window.location.href = "http://yourd"+"omain.com"; }</script>'

Sure as fuck won't hurt to try rel=canonical. :2 cents:
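A substring test with indexOf() can be fooled, for example by a clone host named yourdomain.com.clonedomain.com or a URL whose query string contains the real domain. Here's a slightly stricter JavaScript sketch that compares the hostname exactly and sends the visitor to the same page on the real site. The domain is a placeholder and the function names are hypothetical.

```javascript
// Placeholder domain, assembled from fragments so a text find/replace misses it.
const HOME = "yourd" + "omain.com";

// Exact match on the hostname (with or without "www."), not a substring search.
function isOwnHost(hostname) {
  const host = hostname.toLowerCase();
  return host === HOME || host === "www." + HOME;
}

// Null if already on the real site, otherwise the same path and query on the real domain.
function redirectTarget(loc) {
  if (isOwnHost(loc.hostname)) return null;
  return "https://" + HOME + loc.pathname + loc.search;
}

// In a browser, replace() navigates without adding the clone URL to the history.
if (typeof window !== "undefined") {
  const target = redirectTarget(window.location);
  if (target) window.location.replace(target);
}
```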
Excellent info, thanks!
Nice info for sure...
On the defensive side of your site: you can do a lot with .htaccess. For instance, I use .htaccess to stop hotlinking and put a redirect to a goatse in it...

Barry, as promised on ICQ, to "help" with script kiddies and their scanning, I came across this: you can ZIP bomb a vulnerability scanner! ZIP compression is really good with repetitive data, so a really huge file consisting of nothing but zeroes compresses extremely well. The well-known 42.zip squeezes 4.5 petabytes (4,500,000 gigabytes) into 42 kilobytes, although it gets there by nesting archives inside archives; a single gzip layer tops out at roughly 1000:1. When a browser or scanner extracts or decompresses the content, it will most likely run out of disk space or RAM.

So first, create a GZIP file from 10 gigabytes (or more) of zeroes, which compresses to roughly 10 megabytes. Second, write a PHP script that delivers it to the client.
Code:
<?php
Code:
root@ds12-ams-2gb:~# whois odir.us

Complain to the registry to yank his ticket: usTLD Nexus Requirements Policy for Registrants (About.US).
All hail k0nr4d :bowdown ...
Does anyone know how to check whether your site has been cloned?
Nice info k0nr4d!
Bump for a great thread ...
I have a very similar issue.
This is the offender's site (Google cache version): https://webcache.googleusercontent.c...&ct=clnk&gl=uk
This is mine: https://www.projectvoyeur.com
He has cloned over 100K pages of my site, and counting. I contacted Cloudflare, who responded with their party line about not being the host, so I was dead in the water. Previously, when I contacted Google (DMCA), they took action on sites like this. This time, however, they did pretty much nothing. The fact that this person is cloning my site and then serving cloaked pages (which is why I showed the cached version above) didn't seem to bother the folks at Google at all. So I kinda figured I was stuck with this.
Google is too dumb to understand their cloaking. DMCA sometimes works, but not always.
Can you DMCA an entire domain? Does anybody have a good standardized message to send to Google that works? Has anybody come up with a rock-solid way to stop them? I've already implemented banning all Cloudflare IPs. One thing that does work: if you DMCA Cloudflare, they will cough up the origin host. Then if you DMCA the origin host, it frequently gets shut down. It would be great if there were a proactive way to just prevent it, though. The JS canonical seems interesting. In addition to the clones, I also see people scraping everyone's titles and throwing them up on a BS site that always redirects visitors from Google SERPs to a random tube or advertiser.
Nice info!
Bump for business
I think there are like 3 possible ways to fight them. Notify about what they are doing:
Go to the registry and hit them with the sledgehammer.
If you contact the domain registrar, does that actually do anything? I've never tried DMCA'ing the registrar.
The REGISTRY, not the registrar: shit flows downhill. Have you never worked in the corporate world?
You go right to the COB's or the CEO's office; shit happens fast when you pull a tiger's tail. Pay an attorney to write the letter to the COB and send it by certified mail, or by DHL if overseas, and the shit will hit the fan.
This works:
At the top of your .htaccess file, add "Header always append X-Frame-Options SAMEORIGIN".
Or via a custom scraper written in PHP that periodically picks random proxies from a rotating pool of client proxies.
Craft made a point:
On the server you can set the X-Frame-Options header, which tells the web browser how to treat the page when it is framed. Setting this header to DENY blocks all loading of the page in frames. Setting it to SAMEORIGIN relaxes the restriction and only allows framing by pages on the same domain. On the Apache web server the directive is set like so (on Debian/Ubuntu servers, in /etc/apache2/apache2.conf):
Code:
Header always append X-Frame-Options SAMEORIGIN

On nginx:
Code:
add_header X-Frame-Options SAMEORIGIN;

Unfortunately this header is only supported by more recent browsers, so for legacy browsers you will need to fall back to JavaScript framebusting code. It goes without saying, however, that this can be circumvented by an attacker through techniques such as double framing and exploiting the cross-site scripting filters in some browsers.
Code:
if (top != self) { top.location = self.location; }

Code:
<style id="antiClickjack">body{display:none !important;}</style>

This should work... Still, shit falls down quick; do as Barry said. Enough of this shit.
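The antiClickjack style above is only half of the legacy technique: it hides the page by default, and a companion script reveals it only when the page is not framed. Here's a JavaScript sketch of that script, modelled on the pattern in OWASP's clickjacking cheat sheet; passing the window and document in as parameters is a choice made here so the logic can be exercised outside a browser.

```javascript
function antiClickjack(win, doc) {
  if (win.self === win.top) {
    // Not framed: remove the hiding style so the page becomes visible.
    const style = doc.getElementById("antiClickjack");
    if (style && style.parentNode) style.parentNode.removeChild(style);
    return "visible";
  }
  // Framed: try to break out of the frame.
  try {
    win.top.location = win.self.location;
  } catch (e) {
    // Navigation was blocked (e.g. by a sandboxed iframe); the body stays hidden.
  }
  return "hidden";
}

// Run it in the browser; in Node this file only defines the function.
if (typeof window !== "undefined") antiClickjack(window, document);
```

Because the body stays hidden unless the script positively confirms the page is top-level, this also covers a framer using `<iframe sandbox>` without `allow-scripts`: the script never runs, so the framed copy shows a blank page. X-Frame-Options remains the primary defence; this is the fallback.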
Quote:
You can use .htaccess like this:
Code:
RewriteEngine On

I think when his page visitors see that, they won't be coming back. :2 cents:
A JS script is ugly and sometimes hard to deploy (you have to put the JS on every page); it's much better to do a 301 redirect or serve a special page to the scammers' IPs.
Cloudflare IPs never access your site directly, so redirecting based on Cloudflare IPs is useless: they use an additional proxy, which then sends traffic on to Cloudflare. So just grab the proxy's IP by putting a PHP script on your page that displays the client IP, and configure your web server to serve a special page to that particular IP.
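A minimal JavaScript (Node) sketch that combines the honeypot idea from the opening post with serving a special page to the proxy's IP: scan the access log for hits on a never-linked file that you requested once through the clone domain, collect those IPs, and route them to the special page. It assumes the common Apache/nginx "combined" log format, and the honeypot filename is hypothetical. If the IP you find falls inside Cloudflare's ranges, you are probably looking at the Cloudflare-only variant rather than a separate proxy.

```javascript
const HONEYPOT_PATH = "/hp-7f3a9c.txt"; // hypothetical, never linked from anywhere

// ip, ident, user, [time], "METHOD PATH PROTOCOL" (combined log format prefix).
const LOG_LINE = /^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"/;

// Return the unique client IPs that requested the honeypot path.
function honeypotIps(logText, honeypotPath) {
  const ips = new Set();
  for (const line of logText.split("\n")) {
    const m = LOG_LINE.exec(line);
    if (!m) continue; // skip blank or malformed lines
    const [, ip, , path] = m;
    // Ignore query strings so "/hp-7f3a9c.txt?x=1" still counts.
    if (path.split("?")[0] === honeypotPath) ips.add(ip);
  }
  return [...ips];
}

// Special page for flagged proxy IPs, normal content for everyone else.
function pageFor(clientIp, flaggedIps) {
  return flaggedIps.has(clientIp) ? "special" : "normal";
}
```

In practice you would export the flagged IPs into an Apache or nginx rule rather than checking them in application code, and the operator can rotate proxies, so expect to repeat the exercise.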
Quote:
Can you or someone PM me the information regarding the registry? I'm not sure what you mean other than the registrar. We have sent DMCAs to Google, the host and the registrar. We need to file a complaint against the site cloning ours. Thanks :thumbsup We have tried K0nr4d's suggestions, which were great, but the criminals seem to have found a workaround that even MojoHost cannot fix.
I got it. Thanks! :thumbsup
What a great thread and a good read. Excellent digging Konrad! Great additions from others, too. GFY could be greater than MAGA with more useful threads like this.
Brad
So many opinions in one thread, so what is the best solution?
You cannot stop crime or scammers; that has been proven over and over historically. Every registry (where the registrar buys the names they sell you) has an AUP and TOS and will (or should) accept certified mail or a FedEx/DHL overnight letter with a complaint at their PHYSICAL ADDRESS. ICANN says they need a physical address to accept mail at, so do the legwork yourself or hire a lawyer (or other qualified person) to do it for you.
Quote:
We plan on using this method A LOT now because, as you say, scammers and criminals will not go away, and it works if you file the right complaint.
What about putting some super-unique piece of text on your site and using the Google search API to check whether it shows up on any other domain? That would tell you if anyone has done this to your site.
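A sketch of that canary-text idea in JavaScript (Node 18+ for the global `fetch`): search for a unique phrase you have planted on your pages and list any other hosts that have it. It assumes Google's Custom Search JSON API with your own API key and search engine ID (placeholders here) and an engine configured to cover the wider web; the phrase is made up. Cloaked clones that show search engines different content may not turn up this way.

```javascript
const CANARY = "zq7 velvet lighthouse 4412"; // hypothetical unique phrase

// Exact-phrase query URL for the Custom Search JSON API.
function buildSearchUrl(apiKey, engineId, phrase) {
  const params = new URLSearchParams({ key: apiKey, cx: engineId, q: `"${phrase}"` });
  return "https://www.googleapis.com/customsearch/v1?" + params.toString();
}

// From the API's JSON response, list result hosts other than your own.
function foreignHosts(response, ownHost) {
  const hosts = new Set();
  for (const item of response.items || []) {
    const host = new URL(item.link).hostname.replace(/^www\./, "");
    if (host !== ownHost) hosts.add(host);
  }
  return [...hosts];
}

async function checkForClones(apiKey, engineId, ownHost) {
  const res = await fetch(buildSearchUrl(apiKey, engineId, CANARY));
  if (!res.ok) throw new Error("search failed: " + res.status);
  return foreignHosts(await res.json(), ownHost);
}
```

Something like `checkForClones(process.env.API_KEY, process.env.ENGINE_ID, "yourdomain.com")` run on a schedule would flag new clones once they get indexed.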
I've had my sites heavily affected by this shit over the years.
You'll probably call me crazy, but see http://www.clone-site.com. I wrote all that in a day. I will expand on it over the following days, and I'm actually going to clone a few sites on subdomains, just to demonstrate it all. Silly as it may be, my goal is to get this shit recognized as an issue and to get Google to provide some kind of "report tool" for affected sites. Yeah, I know, but I'm an optimist. :2 cents:
I'm not technical like the rest of our team here, so hopefully I don't mangle this:
The feedback I'm hearing internally is that JavaScript code is the best way to address it. While it may need to be handled on a case-by-case basis, writing a piece of code like that, or code to break the iframe, has been the solution we have used for the more difficult ones. A clever hacker can work around JavaScript obfuscation, too, though. One of our techs says: "I've broken down Sucuri's JavaScript obfuscation with a simple PHP script and a system call to the `node` JavaScript interpreter." Sincerely, Brad
Quote:
I learned a long time ago that shit flows downhill FAST.
Quote:
But one site of mine did not react to the JavaScript. They had learnt to use the sandbox attribute:
Code:
<iframe sandbox ...>
Quote:
You and Barry should hook up and create a solution to kill these fuckers. It's obvious Google is less than receptive to helping.
Quote:
Still, we need a solution for people who are scraping and downloading our content to their sites. So far there's no cure for that except complaining to Cloudflare and the registries. I'd like to block these guys before they create issues.
Powered by vBulletin® Version 3.8.8
Copyright ©2000 - 2025, vBulletin Solutions, Inc.
©2000-, AI Media Network Inc