Yes, this would essentially be a detection mechanism for local instances. However, a network trained on all available federated data could still yield good results. You might end up not needing IP addresses or emails at all. Even just the upvotes and downvotes across a set of existing comments would help.
The important point is figuring out all the data you can extract and feeding it to an “ML” black box. The black box can sort out the rest by itself.
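To make that a bit more concrete, here is a rough sketch of what "extract features from votes and hand them to a black box" could look like. Everything in it is an assumption for illustration: the `upvotes`/`downvotes` field names, the choice of features, the made-up training data, and the tiny logistic regression standing in for whatever model you would really use.

```python
import math


def extract_features(comment):
    """Turn a comment's vote record into a numeric feature vector.

    `comment` is a hypothetical dict with 'upvotes' and 'downvotes' counts.
    """
    up = comment["upvotes"]
    down = comment["downvotes"]
    total = up + down
    ratio = up / total if total else 0.5  # neutral when there are no votes
    return [1.0, ratio, math.log1p(total)]  # leading 1.0 is the bias term


def predict(weights, features):
    """Logistic model: estimated probability that the comment is spam."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=1000, lr=0.1):
    """Fit weights with plain stochastic gradient descent on log loss."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict(weights, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


# Made-up training data: 1 = spam, 0 = legitimate.
comments = [
    {"upvotes": 0, "downvotes": 12},
    {"upvotes": 1, "downvotes": 9},
    {"upvotes": 15, "downvotes": 1},
    {"upvotes": 8, "downvotes": 0},
]
labels = [1, 1, 0, 0]

weights = train([extract_features(c) for c in comments], labels)
spam_score = predict(weights, extract_features({"upvotes": 0, "downvotes": 10}))
ham_score = predict(weights, extract_features({"upvotes": 10, "downvotes": 0}))
```

In practice you would swap the hand-rolled model for a proper library and keep adding columns to `extract_features` as you find more signals. The model decides how much weight each one deserves.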