From: Makarius <makarius@sketis.net>
In the past few days we've had more attacks on our servers, presumably by dumb
AI-bots that cannot read robots.txt
This happens mainly in the late evening (Royal Bavarian Time).
Notable services that are compromised:
https://isabelle.sketis.net/components
https://isabelle.sketis.net/repos/isabelle
https://isabelle.sketis.net/repos/isabelle-release
I am presently busy to sort out genuine problems on Isabelle2025-1 release
candidates, and don't feel like participating in this non-sense out there. It
requires more research what can be done about it, to return to sanity.
Makarius
From: Makarius <makarius@sketis.net>
On 17/11/2025 00:04, Makarius wrote:
In the past few days we've had more attacks on our servers, presumably by dumb
AI-bots that cannot read robots.txtThis happens mainly in the late evening (Royal Bavarian Time).
In the past few days the situation had improved in the morning, but not today.
Notable services that are compromised:
https://isabelle.sketis.net/components
https://isabelle.sketis.net/repos/isabelle
https://isabelle.sketis.net/repos/isabelle-release
The problem are our hgweb servers (via Apache and its builtin Python engine),
this also applies to https://isabelle.in.tum.de/repos which is presently unusable.
We need to find a proper solution for hgweb + Apache specifically.
For now I have disabled https://isabelle.sketis.net/repos altogether. The main
Isabelle repositories are still available via https://isabelle-dev.sketis.net
--- but that is for members only: I had to close it some months ago due to
dumb AI-bots (Phorge is even more vunerable due to its use of PHP + MySQL).
It should be possible to work with isabelle-dev.sketis.net under program
control using SSH URLs, after suitable SSH public keys have been uploaded. I
will see how to change the important isabelle_cronjob (nightly builds)
accordingly.
In the mid 1990ies the Open Internet has emerged, where everybody could
participate easily with self-hosted servers. The AI War that started in
Jan-2025 has destroyed that --- but I am not going to participate in any war
of any kind.
Makarius
From: "Achim D. Brucker" <adbrucker@0x5f.org>
https://isabelle.sketis.net/components
https://isabelle.sketis.net/repos/isabelle
https://isabelle.sketis.net/repos/isabelle-release
The problem are our hgweb servers (via Apache and its builtin Python
engine), this also applies to https://isabelle.in.tum.de/repos which
is presently unusable.
We need to find a proper solution for hgweb + Apache specifically.
I would not call it a "proper" solution, but I am currently using Anubis
(https://anubis.techaro.lol/) with quite some success. Of course, it's
an arms race - computing the challenges set out by it are not that
expensive. Hence, when enough website use it, the crawlers will
implement the challenge solving part ...
Achim
From: Makarius <makarius@sketis.net>
On 17/11/2025 11:28, Achim D. Brucker wrote:
https://isabelle.sketis.net/components
https://isabelle.sketis.net/repos/isabelle
https://isabelle.sketis.net/repos/isabelle-release
The problem are our hgweb servers (via Apache and its builtin Python
engine), this also applies to https://isabelle.in.tum.de/repos which is
presently unusable.
We need to find a proper solution for hgweb + Apache specifically.I would not call it a "proper" solution, but I am currently using Anubis
(https://anubis.techaro.lol/) with quite some success. Of course, it's an arms
race - computing the challenges set out by it are not that expensive. Hence,
when enough website use it, the crawlers will implement the challenge solving
part ...
Anubis emerged early 2025 as a counter-attack, and I don't like it. An "arms
race" is war against war, and ultimately won't work.
There must be a proper solution. For us it means that our own programs (or
"daemons") can access the repository servers, without too much additional
complication.
I am presently thinking of SSH and maybe RSYNC, as well-known non-HTTP
protocols. There is also an rsync server that hardly anybody remembers now (we
actually have one to mirror the Isabelle website).
Makarius
From: "Achim D. Brucker" <adbrucker@0x5f.org>
On 17/11/2025 10:41, Makarius wrote:
On 17/11/2025 11:28, Achim D. Brucker wrote:
I would not call it a "proper" solution, but I am currently using
Anubis (https://anubis.techaro.lol/) with quite some success. Of
course, it's an arms race - computing the challenges set out by it
are not that expensive. Hence, when enough website use it, the
crawlers will implement the challenge solving part ...Anubis emerged early 2025 as a counter-attack, and I don't like it. An
"arms race" is war against war, and ultimately won't work.
I agree - but it does work at the moment as a relatively easy to install
workaround lowering the load on attacked servers a lot. And it allows to
provide web access for humans, for which I have not seen much better
solutions out there yet. Needing a solution quickly, I either had to
swallow the "Anubis" pill, allow access only to authorised users, or
shut it down completely. Anubis was (and likely still is) the better of
the three evils.
There must be a proper solution. For us it means that our own programs
(or "daemons") can access the repository servers, without too much
additional complication.I am presently thinking of SSH and maybe RSYNC, as well-known non-HTTP
protocols. There is also an rsync server that hardly anybody remembers
now (we actually have one to mirror the Isabelle website).
For downloading Isabelle components, rsync seems to be a viable option -
on my servers, I have not seen high crawler scraping loads for
"data/software" archives (tgz/zip, etc) nor for REST APIs. I am using
Anubis purely for the web interfaces designed for human consumption.
Again, this will likely change earlier than we all fear.
For pure programmatic access, any form of weak authentication should
work at the moment.
These AI crawlers are really a threat to the open web ...
Achim
From: Makarius <makarius@sketis.net>
On 17/11/2025 11:20, Makarius wrote:
We need to find a proper solution for hgweb + Apache specifically.
The Mercurial mailing list has this thread "HELP: Fighting AI scrapers":
https://lists.mercurial-scm.org/pipermail/mercurial/2025-September/106698.html
It is interesting to read, but also converges to Anubis, without any brilliant
ideas. Brilliant ideas is what we need, though. I probably need to study the
hgweb implementation.
Makarius
From: Makarius <makarius@sketis.net>
On 17/11/2025 19:32, Makarius wrote:
On 17/11/2025 11:20, Makarius wrote:
We need to find a proper solution for hgweb + Apache specifically.
The Mercurial mailing list has this thread "HELP: Fighting AI scrapers":
https://lists.mercurial-scm.org/pipermail/mercurial/2025-September/106698.htmlIt is interesting to read, but also converges to Anubis, without any brilliant
ideas. Brilliant ideas is what we need, though. I probably need to study the
hgweb implementation.
I've now invested 1h to study the hgweb implementation of
isabelle.sketis.net/repos, notably mercurial-6.1.4/mercurial/hgweb/hgweb_mod.py
The answer is rather plain and simple:
changeset: 83654:52cd371a36dd
tag: tip
user: wenzelm
date: Wed Nov 26 20:00:47 2025 +0100
files: Admin/Mercurial/mercurial-6.1.4-hgweb.patch
description:
adhoc patch for hgweb.wsgi: provide "hg clone" via HTTP without suffering from
Denial-of-Service attacks on website content (e.g. by Non-Intelligent Agents);
diff -r f152543f9e16 -r 52cd371a36dd Admin/Mercurial/mercurial-6.1.4-hgweb.patch
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Admin/Mercurial/mercurial-6.1.4-hgweb.patch Wed Nov 26 20:00:47 2025 +0100
@@ -0,0 +1,16 @@
+diff -ru mercurial-6.1.4/mercurial/hgweb/hgweb_mod.py
mercurial-6.1.4-patched/mercurial/hgweb/hgweb_mod.py
+--- mercurial-6.1.4/mercurial/hgweb/hgweb_mod.py 2022-06-16
15:09:43.000000000 +0200
++++ mercurial-6.1.4-patched/mercurial/hgweb/hgweb_mod.py 2025-11-26
19:40:54.320858407 +0100
+@@ -371,6 +371,12 @@
return res.sendresponse()
++ else:
++ rctx.tmpl = rctx.templater(req)
++ res.status = b'500 Internal Server Error'
++ res.headers[b'Content-Type'] = rctx.tmpl.render(b'mimetype',
{b'encoding': encoding.encoding})
++ return rctx.sendtemplate(b'error', error=b'Usage: hg clone URL DIR')
++
+
In other words: there is no longer a website to browse (nor to attack), only
the "wireprotocol" of the hg client via HTTP.
Thus all-important automatic jobs with "hg clone
https://isabelle.sketis.net/repos/isabelle" should work again --- and perform
better than before, because we disregard the AI non-sense altogether.
The error page could be a bit nicer, saying more clearly what is wrong and
what needs to be done instead, but the main focus is still the Isabelle2025-1
release.
Makarius
From: Makarius <makarius@sketis.net>
On 26/11/2025 20:14, Makarius wrote:
The answer is rather plain and simple:
In other words: there is no longer a website to browse (nor to attack), only
the "wireprotocol" of the hg client via HTTP.
I should cite one of my sources for this approach:
https://www.fsf.org/bulletin/2025/spring/defending-savannah-from-ddos-attacks
"""
To all of the companies crawling the Internet: there is a better way! Do not
scan code repositories over the web: clone them using version control tools
such as git, cvs, svn, Mercurial, or bzr. Follow the rules set forth in the
robots.txt files.
"""
Such a friendly suggestion is not going to work, though. Instead, the policy
needs to be enforced, by not offering an open webspace in the first place.
The rules in robots.txt used to be the foundation of the Free Internet. It was
a nice time, but it is over now.
Instead we can eventually provide proper repository management and browsing as
part of the regular Prover IDE. Professional development works better on the
good old desktop, not on a web interface.
Makarius
Last updated: Dec 10 2025 at 12:50 UTC