punrca@piefed.world to Selfhosted@lemmy.world · Based on this graph, and this graph alone, guess at what time I completely blocked OpenAI crawlers · English · 15 days ago

It’s best to use either Cloudflare (best IMO) or Anubis.
If you don’t want any AI bots, you can set up Anubis (open source; requires JavaScript to be enabled by the end user): https://github.com/TecharoHQ/anubis
Cloudflare automatically sets up a robots.txt file that blocks “AI crawlers” (you can still allow “AI search” for better SEO). E.g.: https://blog.cloudflare.com/control-content-use-for-ai-training/#putting-up-a-guardrail-with-cloudflares-managed-robots-txt
Cloudflare also has an “AI Labyrinth” option that serves a maze of fake data to AI bots that don’t respect the robots.txt file.
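If you'd rather write the robots.txt yourself than rely on a managed one, here is a rough sketch in Python. The crawler names in `AI_CRAWLERS` are only an illustrative list I picked (it is not Cloudflare's actual managed list), and `build_robots_txt` / `is_allowed` are hypothetical helper names. The check uses the standard library's `urllib.robotparser`, so it only tells you what a well-behaved bot would do.

```python
from urllib.robotparser import RobotFileParser

# Illustrative user-agent tokens only (assumption); adjust to the bots you care about.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows every path for the given agents."""
    lines = []
    for agent in blocked_agents:
        lines.append(f"User-agent: {agent}")
        lines.append("Disallow: /")
        lines.append("")  # a blank line separates groups
    # Everyone else may crawl everything.
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url="https://example.com/"):
    """Check whether a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    txt = build_robots_txt(AI_CRAWLERS)
    print(txt)
    for agent in ["GPTBot", "Mozilla/5.0"]:
        print(agent, "allowed:", is_allowed(txt, agent))
```

Keep in mind robots.txt is advisory: it does nothing against bots that ignore it, which is where something like Anubis or the AI Labyrinth option comes in.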
punrca@piefed.world to Web Development@programming.dev · Introducing pay per crawl: Enabling content owners to charge AI crawlers for access (blog.cloudflare.com) · English · 19 days ago