Stream: Mirror: Isabelle Development Mailing List

Topic: isabelle-dev.sketis.net and isabelle.sketis.net down for ...


view this post on Zulip Email Gateway (Mar 26 2025 at 18:55):

From: Makarius <makarius@sketis.net>
My Websites with Isabelle development resources are presently down for
maintenance. I need to figure out how to get rid of dumb "AI" bots --- This is
becoming a pest.

The Caddy webserver has some add-ons specifically for that. Lets see if/how
this works ...

Makarius

view this post on Zulip Email Gateway (Mar 26 2025 at 20:56):

From: Makarius <makarius@sketis.net>
On 26/03/2025 19:55, Makarius wrote:

My Websites with Isabelle development resources are presently down for
maintenance. I need to figure out how to get rid of dumb "AI" bots --- This is
becoming a pest.

The Caddy webserver has some add-ons specifically for that. Lets see if/how
this works ...

The websites are back for now. The bombardment is down to approx. 20% CPU load.

I've had initial success with
https://github.com/JasonLovesDoggo/caddy-defender/tree/ranges-file and its
bundled address ranges
https://github.com/JasonLovesDoggo/caddy-defender/blob/ranges-file/ranges/data/generated.go
--- after some tinkering with the /etc/caddy/Caddyfile.

It needs more research into fine points. E.g. I've had problems including
"aliyun" address space for Alibaba Cloud.

Conclusions so far:

* The Caddy webserver is very easy to extend and rebuild into a single
executable, using the "xcaddy" tool. The magic command line is:

xcaddy build --with github.com/jasonlovesdoggo/caddy-defender --with
github.com/mholt/caddy-ratelimit

* This is my second encounter to the Go ecosystem, and it looks fairly nice
so far.

* There is a major war out there: dumb machines against smart humans.

Makarius

view this post on Zulip Email Gateway (Mar 26 2025 at 21:38):

From: Makarius <makarius@sketis.net>
On 26/03/2025 21:56, Makarius wrote:

* There is a major war out there: dumb machines against smart humans.

That is a war, but more war does not present a solution. Here is an article
including "nuclear responses":
https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies

Right now the very simple measures of caddy-defender are sufficient to keep
the "AI-noise" the webserver < 10-20% CPU time.

Makarius

view this post on Zulip Email Gateway (Mar 26 2025 at 21:58):

From: Makarius <makarius@sketis.net>
Here is another guy under fierce attack the "global AI gang":

https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html

"""
Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any
of this garbage. I am begging you to stop using them, stop talking about them,
stop making new ones, just stop. If blasting CO2 into the air and ruining all
of our freshwater and traumatizing cheap laborers and making every sysadmin
you know miserable and ripping off code and books and art at scale and ruining
our fucking democracy isn’t enough for you to leave this shit alone, what is?

If you personally work on developing LLMs et al, know this: I will never work
with you again, and I will remember which side you picked when the bubble bursts.
"""

This guy is running https://sourcehut.org

Speaking of "the bubble bursts": Some days ago, I've had a discussion with a
non-tech person about the "AI hype". When she said "that AI is here to stay
for decades" I spontaneously answered "it is so hot already, that it might
have burnt down within 2 years".

Anyway, the isabelle-dev website looks good for the moment ...

Makarius

view this post on Zulip Email Gateway (Mar 27 2025 at 13:36):

From: Makarius <makarius@sketis.net>
There have been a couple of private notes, hints, and recommendations. This is
my current understanding of possible approaches:

(1) Block particular User-agent names from the other side, e.g. this list
https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt --- not
via robots.txt (which is often ignored), but hardwired into the Webserver.

Weakness: AI bots often don't identify themselves properly.

(2) Block particular address ranges, e.g. according to this list
https://github.com/JasonLovesDoggo/caddy-defender/blob/main/ranges/data/generated.go

Weakness: this is neither correct nor complete, i.e. legitimate users could
use a cloud address, bad addresses could be missing. Maybe it is better not to
block completely, but to reduce the bandwidth for such addresses.

(3) Aggressive counterstrikes, e.g. via proof-of-work imposed on the other
side, e.g. see https://anubis.techaro.lol

Weakness: answering war by more war is not a winning strategy.

Makarius

view this post on Zulip Email Gateway (Mar 27 2025 at 13:40):

From: Tobias Nipkow <nipkow@in.tum.de>
Thank you for bringing all this up. This must also be the reason why some of our
servers (AFP, ...) are sometimes unreachable. I knew that it has to do with
crawlers, but I was unaware of the global nature of the problem and that AI bots
are causing it.

Tobias

On 27/03/2025 14:34, Makarius wrote:

There have been a couple of private notes, hints, and recommendations. This is
my current understanding of possible approaches:

(1) Block particular User-agent names from the other side, e.g. this list
https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt --- not via
robots.txt (which is often ignored), but hardwired into the Webserver.

Weakness: AI bots often don't identify themselves properly.

(2) Block particular address ranges, e.g. according to this list https://
github.com/JasonLovesDoggo/caddy-defender/blob/main/ranges/data/generated.go

Weakness: this is neither correct nor complete, i.e. legitimate users could
use a cloud address, bad addresses could be missing. Maybe it is better not to
block completely, but to reduce the bandwidth for such addresses.

(3) Aggressive counterstrikes, e.g. via proof-of-work imposed on the other
side, e.g. see https://anubis.techaro.lol

Weakness: answering war by more war is not a winning strategy.

Makarius

smime.p7s

view this post on Zulip Email Gateway (Mar 27 2025 at 23:33):

From: 伊藤洋介 <glacier345@gmail.com>
Thanks, Makarius.
I didn't know that AI causes such serious problems...

--
Yosuke ITO
伊藤 洋介
glacier345@gmail.com

2025年3月27日(木) 23:14 Makarius <makarius@sketis.net>:

There have been a couple of private notes, hints, and recommendations.
This is
my current understanding of possible approaches:

(1) Block particular User-agent names from the other side, e.g. this
list
https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt ---
not
via robots.txt (which is often ignored), but hardwired into the Webserver.

Weakness: AI bots often don't identify themselves properly.

(2) Block particular address ranges, e.g. according to this list

https://github.com/JasonLovesDoggo/caddy-defender/blob/main/ranges/data/generated.go

Weakness: this is neither correct nor complete, i.e. legitimate users
could
use a cloud address, bad addresses could be missing. Maybe it is better
not to
block completely, but to reduce the bandwidth for such addresses.

(3) Aggressive counterstrikes, e.g. via proof-of-work imposed on the
other
side, e.g. see https://anubis.techaro.lol

Weakness: answering war by more war is not a winning strategy.

Makarius


Last updated: Apr 19 2025 at 01:36 UTC