Wikipedia talk:AISIGNS


Moving examples to a subpage?

I noticed there are some example articles being moved to userspace to preserve them, as well as some being kept as drafts. I wonder if it would be beneficial to move them to a subpage of this (or another page) similar to how we have Wikipedia:List of hoaxes on Wikipedia containing hoax examples. The examples should also probably be fully protected as well. (changed my mind on that part) Assuming we are using the template on all of them, they are currently listed under Category:Pages that should not be speedily deleted as unreviewed LLM-generated content ASUKITE 18:19, 18 May 2026 (UTC)

Sounds reasonable to centralize them here, neutral about protection. fifteen thousand two hundred twenty four (talk) 18:26, 18 May 2026 (UTC)
We could have a list titled Wikipedia:List of deleted pages containing suspected AI-generated content (WP:AIPAGELIST) and deleted drafts could be restored as Wikipedia:List of deleted pages containing suspected AI-generated content/Draft:Example.
If we do start restoring such pages in this manner, I would suggest restoring Talk:Arthur Katalayi alongside those drafts because I thought it was funny how one or more users responded to accusations of LLM use. For reference, surviving examples of such drama can be found three different times in the RFPP archives (Caution: That page is extremely long and may take a moment to load and take you to the right section), once on this user talk page, and once in the administrators' noticeboard archives. – MrPersonHumanGuy (talk) 14:09, 19 May 2026 (UTC)
If we're making a subpage of examples, it might be worth making a subpage of exceptions-that-prove-the-rule -- Rapa fruit dove from the quiz, Nuclear close calls mentioned here, Julia's Kitchen Wisdom (which someone actually flagged after I brought it up), The Leopard (Reid novel) (this one got me, isolated parts of it sound so much like AI but are from 2008), etc. Gnomingstuff (talk) 17:14, 19 May 2026 (UTC)
This sounds like a good idea, especially since some drafts with {{G15-exempt}} on them have either been G13 deleted or modified since the tag was placed. SuperPianoMan9167 (talk) 18:25, 19 May 2026 (UTC)
I support this. I also think it would be beneficial to have examples for different LLMs. ChatGPT is probably over-represented in AI-generated Wikipedia pages, so people using other chatbots to create pages may fly under the radar. ~2026-16755-69 (talk) 12:23, 21 May 2026 (UTC)
Pinging @TheHistorian1337, @EatingCarBatteries, @Gurkubondinn, @Dr. Omokhudu Idogho, @Northernhenge, @Sohom Datta (all editors currently hosting user drafts of example pages): Any objections, thoughts on this idea? Perhaps suggestions as to where the examples should be hosted if we do move forward. Thanks! (Sorry for the mass ping, but I don't want to move stuff away from anybody's userspace without consulting them). ASUKITE 00:56, 27 May 2026 (UTC)
@Sophisticatedevening, @SuperPianoMan9167 you're already here.. ASUKITE 00:57, 27 May 2026 (UTC)
For me personally I used the subpage I had for developing and testing User:Sophisticatedevening/AI Cleaner.js (which is probably moot now with AI advancing fast enough that the issues it was made to resolve aren't typically output anymore), not necessarily for any preservation reasons. If I'm understanding this proposal right and it's suggesting that the example subpages be fully protected then I would object to that because that would defeat the purpose of my subpage and I and other non-admins wouldn't be able to edit it to test it and their own scripts anymore. As far as the whole idea of an AI version of WP:HOAXLIST I can't say I see a huge benefit in it, because the sheer quantity of these articles and the ease of creating them mean they aren't really a novelty to look at in the same way a believable hoax that stayed up for a long time would be. If the idea is just for a directory then sure I'd have no issue with that if people want to look for these, and I have no particular care if it is moved to projectspace for this so long as I can still edit and test things with it. Sophisticatedevening(talk) 01:41, 27 May 2026 (UTC)
My rationale for protecting was to preserve the examples as with our hoax pages, but to be honest, I don't see any need for that unless somebody routinely decides to vandalize them. If this fails, the category is a good enough directory on its own, and would save us the trouble of moving anything (I had been thinking I was overthinking this on that point alone, but then came back to see a few editors agreed with the idea) ASUKITE 02:07, 27 May 2026 (UTC)
If there's an interest in preserving these then by all means I have no issue with something like this, I just think the existing ones in the category should be checked closely to see why they were put there and then selectively moved, or the editors consulted here should be asked whether they agree to it depending on their needs. People can also go hunting for new ones that aren't in that category or just in the wild and I don't think there would be any objections to that. Sophisticatedevening(talk) 03:14, 27 May 2026 (UTC)
No objections from my end. Sohom (talk) 01:20, 27 May 2026 (UTC)
@Asukite: no objections from my end either. I have actually been meaning to move these pages to be subpages of Wikipedia:Signs of AI writing/Examples or something for a while, but I haven't gotten around to it. I have probably 4-5 of these pages cluttering up my user space right now. --gurkubondinn 07:17, 27 May 2026 (UTC)
Zero objections to this. I only created my sub-page to preserve the example. Having a centralized place for this would be nice.
If someone decides to follow through with this, you have my permission to do whatever you want with the page User:EatingCarBatteries/Mojtaba Yadegari (AI). EatingCarBatteries (contribs | talk) 07:47, 30 May 2026 (UTC)

Draft:Ichafu and Draft:Paytra could also be restored in this manner. – MrPersonHumanGuy (talk) 11:12, 2 June 2026 (UTC)

Good point, they're both used as examples in AISIGNS so I've requested WP:REFUND § Draft:Ichafu and Draft:Paytra to get them restored. --gurkubondinn 11:30, 2 June 2026 (UTC)
I think we may be safe to move forward with this soon. I don't mind Wikipedia:Signs of AI writing/Examples as the holding page, but Wikipedia:List of deleted pages containing suspected AI-generated content also makes a good standalone option, if it's a bit longer. I can move the draft items this weekend, along with userspace pages belonging to anybody who agreed here, and just leave a talk message for the remaining items. (Excluding the one for Sophisticatedevening, of course, as that is there for testing purposes). I think at least moving the drafts makes sense as some of them might plausibly be actual drafts again one day, so I think the examples being present there might make that difficult. ASUKITE 22:45, 2 June 2026 (UTC)
 Done for all draft pages and those who agreed here. The only one I had an issue with is User:SuperPianoMan9167/Jacques Blois (linguist), for which there was also a draft by the same title, but has different content, so I've left that alone for now. I chose Wikipedia:Signs of AI writing/Examples for the time being as it is more concise, but I have no problem with moving them again if I have to. I've also redirected that to the category until we come up with a good base page format for that. I've also left redirects in place for all items, if anybody wants any userspace redirects deleted, either tag with {{db-u1}} or let me know here. I bypassed the links on the connected page in case any do get deleted, but I didn't check all of these for links from elsewhere and will review those case-by-case if deleting. ASUKITE 02:15, 8 June 2026 (UTC)

Is Grokipedia a good source for AI-generated content?

I'm not sure of the general consensus on using Grokipedia, but I've noticed a few issues from it.

  1. Grokipedia is written using Grok. Grok's market share is around 6%, which, while not insignificant, is significantly lower than Gemini and ChatGPT.
  2. Grok itself seems to update very slowly, and lags behind other chatbots. As such, it may display certain signs for months after ChatGPT and Gemini removed them. Also, Grokipedia does not update constantly, so even if Grok does update, many pages contain text from a long-outdated version of Grok.
  3. Grokipedia has no human review. While many AI-generated Wikipedia pages also probably have no review, there are definitely some that do, so Grokipedia pages are probably more obviously AI-generated than many AI-generated Wikipedia articles. Humanizers should also be accounted for.
  4. Grokipedia doesn't seem to use prompts to create articles, which could create discrepancies.

However, I do think Grok may generate more Wikipedia pages than its overall market share for multiple reasons. Grokipedia may have encouraged some people to submit AI-generated content to Wikipedia, and Grok also seems better at writing long content than ChatGPT or Gemini. It's also much easier to get content from Grokipedia than to fish for AI-generated drafts. I personally don't think it should be used. ~2026-32561-33 (talk) 16:09, 1 June 2026 (UTC)

Not RS, Wikipedia:Grokipedia. fifteen thousand two hundred twenty four (talk) 22:53, 1 June 2026 (UTC)
I know that it's not a reliable source for facts, but it could be a good source for finding LLM-generated pseudo-encyclopedic content. ~2026-32561-33 (talk) 13:47, 2 June 2026 (UTC)
It's a good example of a certain kind of Grok text; the main issue is that it almost certainly has a fairly detailed system prompt steering its output -- obviously all chatbots have this, but I suspect this one is specifically tailored to Grokipedia and not representative of people using Grok here (e.g. OKA translations). Based on a quick-and-dirty comparison (User:Gnomingstuff/AI_experiment/Grokipedia):
  • The big one, syntactically: An obsession with (what it thinks is) "empirical", "causal", "evidence-based" stuff. These and similar words show up disproportionately in Grokipedia output, but I can't recall ever seeing that pattern here.
  • Obviously the big one is trying to be "anti-woke"; another disproportionately used word is left-leaning and unsurprisingly that is not disproportionately used by other LLMs.
  • A bias toward rebutting statements with opinions of unspecified "critics", usually with conservative views. Variations on critics, Critics argue, etc. also appear disproportionately. Notably, basically none of the usual AIATTR signs show up in the list, just "critics".
Also, Grokipedia is all but confirmed to have something in the system prompt telling it to not cite Wikipedia, because a few early articles would flat-out leak it: "The race was competitive amid Paxton's ongoing securities fraud indictment, but he maintained support in Texas's Republican-leaning electorate. No, don't cite wiki. Remove that. Wait, rephrase: The election occurred during Paxton's facing of felony securities fraud charges from 2015, yet he prevailed narrowly in vote share." Comedy is now legal, etc. Gnomingstuff (talk) 18:31, 7 June 2026 (UTC)
I've noticed that Grok likes to correct itself in responses instead of just removing the old information it thinks is incorrect like almost every other LLM. This is probably to make it seem more "human" or "realistic", which it might do when being used as a chatbot, but it makes no sense to use in Grokipedia articles. The obsession with anti-woke stuff might be a side effect of prompting rather than Grok itself, as I've never noticed that Grok was too biased, at least not to the levels observed on Grokipedia. ~2026-32561-33 (talk) 12:07, 8 June 2026 (UTC)
the bug has gotten even worse, articles will now waffle about how the instructions prevent it from citing URLs
For Australia, Kent is historical, often cited as 70 from books or wiki, but since can't use wiki, perhaps omit if not authoritative URL. The instructions say "Never invent URLs. Add new citations immediately after the claim using [](url)." Gnomingstuff (talk) 12:55, 8 June 2026 (UTC)
it also leaks system prompts constantly, and now that google indexes it, we can find every instance of that.
  • "The instructions say "You are an agent that writes the various article sections for an encyclopedia entry on Joseph Kanuku."
  • "[Note: Britannica is encyclopedia, but instructions say no Britannica. Wait, replace with another.]"
  • "Wait, instructions say NEVER cite Wikipedia. Oops. Let me adjust."
  • "The instructions say prioritize peer reviewed, books, etc."
  • "Wait, but instructions say never cite social media, so omit that last part. Wait, adjust. Since Instagram is social media, omit that."
  • "Wait, instructions say avoid "References" as a section, but for completeness, I've omitted it from structure.)"
  • "The guidelines say "Include any of the following where relevant: - Factual details..." under Missing Information or Knowledge Gaps Examples. And in format, it's to list only critical issues or all missing info or knowledge gaps."
et cetera
absolute clown work all around Gnomingstuff (talk) 13:01, 8 June 2026 (UTC)
Grok tends to be quite far behind most other commercial LLMs. Maybe this sort of stuff should be added to WP:G15? It seems to be just as indicative as "Sure thing! Here is a Wikipedia article on X". Grokipedia also has no quality filter whatsoever. There might be one, but while most LLMs will just preface their generation with "here is a Wikipedia article on X", Grok randomly leaks its instructions and breaks character in the middle of articles.
The biggest question for this is probably whether Grok will even be around for too long. From what I can tell, Grok exists in a very specific niche compared to other LLMs: While Gemini, ChatGPT, and Claude are in an arms race for which one is better at vibe coding or assisting with corporate tasks or answering questions, Grok's appeal comes from it being mostly uncensored. While there are other uncensored LLMs, most require a subscription for using it more than a few times or can only be run locally, which requires an incredibly powerful computer.
While this is just speculation on my part, I don't think that much of Grok's 6% market share is making it onto Wikipedia, since most people using Grok probably aren't using it for stuff like that. Not saying that LLM market share is totally disproportionate to what AI-generated text is submitted to Wikipedia or that nobody will use Grok to generate a Wikipedia page, just that I don't think Grok is really being used for this very much, if at all. ~2026-32561-33 (talk) 16:44, 8 June 2026 (UTC)
One of the Grokipedia pages mentions knowledge cutoff of 2023, so yeah, it's fairly outdated.
I would suspect that Grok users and Wikipedia editors have a disproportionate amount of overlap, for the same reasons that Claude users probably do. Gnomingstuff (talk) 20:41, 8 June 2026 (UTC)

Denial of AI-use (or lack thereof)

I've updated WP:LETSFOCUSON to show AI's response to being challenged, but now I've done it I'm wondering whether it should have its own section & shortcut (like WP:AIDENIAL). What are everyone's thoughts on this? I'm 50/50 on it myself, hence the question. In solidarity, Blue-Sonnet (I'm listening) 05:19, 2 June 2026 (UTC)

Can you clarify what you mean? I'll explain how I understand the situation, as that may help people understand my lack of understanding. WP:LETSFOCUSON documents a particular response that deflects away from the issue of AI usage. It comes after WP:ORCC and WP:CEQ, two other "canned" responses which to my mind at least cover similar ground and are similar tactics. Do you mean to have all those canned deflective responses in a new section, or just WP:LETSFOCUSON? Or, going off how you titled this section and the suggested WP:AIDENIAL shortcut, do you mean to identify denial of use itself as a sign? OsFish (talk) 05:56, 2 June 2026 (UTC)
Mainly the latter (a new section) but it's included in other sections already so I'm not sure whether it'll be helpful or not to separate it out. There are pros and cons to both, so I was interested to see what others thought - this article has been cited in the media recently too, so I'm a bit hesitant to jump in and swap things around unless I'm absolutely sure it's an improvement. In solidarity, Blue-Sonnet (I'm listening) 14:48, 2 June 2026 (UTC)
I completely empathise with the idea, because it is very frustrating over and over seeing blatant LLM use defended with flat denial. However, the problem with having denial as a criterion is that monitoring for AI use becomes something of a witch-hunt: what is someone genuinely not using AI supposed to do if not deny it?
So maybe a focus on deflection as a tactic rather than denial? OsFish (talk) 08:15, 3 June 2026 (UTC)
That's definitely a better word and why I wasn't sure whether it should have a separate section. So something like WP:AIDEFLECT? But then we're circling back to WP:LETSFOCUSON territory, which is why I wasn't sure whether this should be separated out. In solidarity, Blue-Sonnet (I'm listening) 08:23, 3 June 2026 (UTC)
it is very frustrating over and over seeing blatant LLM use defended with flat denial
In my experience, flat denial is actually fairly uncommon. It's much, much more common to dodge the question of whether they used AI, to downplay it, or to wikilawyer it, which is what I was trying to get across with the WP:LLM? essay. Like with the examples:
  • This never gives a yes-or-no answer, it just says the "claims are baseless," which is not the same thing as "the claims are false."
  • This, same as above
  • This flat-out says it used AI for "expansion"
  • This dodges the question entirely
  • This is probably not fully AI-generated (it has typos, etc) -- it says it uses AI for editing which seems to be exactly what happened here
Gnomingstuff (talk) 22:48, 6 June 2026 (UTC)
That's right, when there's denial it's usually the actual editor who's finally come to the keyboard or they're not using it at all - either way, the actual AI doesn't usually refute it. In solidarity, Blue-Sonnet (I'm listening) 23:12, 6 June 2026 (UTC)
I've definitely gotten this kind of question dodging from actual people before Gnomingstuff (talk) 02:17, 7 June 2026 (UTC)
And there's the dilemma - how to tell if it's a person who writes like an AI, or a person who is using it. I don't want to add anything new that might lead to false accusations either, so I'm starting to think it might be best to leave it as-is? In solidarity, Blue-Sonnet (I'm listening) 02:31, 7 June 2026 (UTC)
This might be a case to include non-AI usage of the phrase "let's focus on". Humans use the phrase as well, but AI uses it in a very specific way.
Human examples:
  1. "I'm not really sure what that means, but let's focus on the issues. If you could please list the evidence you have offered, that would be a productive first step to resolving this disagreement." (from Talk:Jordan Lasker)
  2. "Please let's focus on content. We really are smarter together, not just in spite of but probably also because of our differences." (from Wikipedia talk:No Nazis)
  3. "No, Let's focus on this article rather than compare it to WP:OTHERSTUFF. Generally if someone is notable enough to be included in a list of notable people, they are notable enough to have biographical wikipedia articles." (from Talk:Victoria School)
AI-generated examples:
  1. "If you are willing, let’s focus on specific wording changes and sources rather than framing edits in dismissive terms." (from User talk:Grayfell)
  2. "I am a genuine contributor, and the sources I've cited are real academic publications. Instead of personal accusations, let's focus on the scholarly evidence." (from Talk:Sasanian Empire)
  3. "Instead of discrediting sourced linguistic information, let’s focus on improving the accuracy and neutrality of this article together." (from Talk:Dolma/Archive 1)
In the first 20 results when sorting by most recent for the phrase "let's focus on", 2 were AI-generated, while 18 were human-written. From what I've seen, AI combines the phrase "let's focus on" with "instead" or a word or phrase that is synonymous with "instead". ~2026-32561-33 (talk) 14:54, 2 June 2026 (UTC)
All AI-generated examples use negation while only one human-written example uses it. Although it's not an objective criterion, all AI-generated examples also seem much more smug. ~2026-32561-33 (talk) 14:55, 2 June 2026 (UTC)
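The co-occurrence described above ("let's focus on" in the same sentence as "instead" or a synonym) can be sketched as a rough filter. This is only an illustration under stated assumptions: the function name, the choice of "instead" and "rather than" as the contrast markers, and the naive sentence splitter are all hypothetical, and a match is a prompt for human review, not evidence of AI use.

```python
import re

# Matches both straight and curly apostrophes, as both appear in the quoted examples.
FOCUS = re.compile(r"let[’']s focus on", re.IGNORECASE)
# Assumed contrast markers; the thread only names "instead" and its synonyms.
CONTRAST = re.compile(r"\b(instead|rather than)\b", re.IGNORECASE)


def focus_with_contrast(text):
    """Return True if any sentence pairs "let's focus on" with a contrast marker."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if FOCUS.search(sentence) and CONTRAST.search(sentence):
            return True
    return False


print(focus_with_contrast(
    "Instead of personal accusations, let's focus on the scholarly evidence."
))
print(focus_with_contrast("Please let's focus on content."))
```

Run against the six quoted examples, this flags all three AI-generated ones and also the human comment from Talk:Victoria School, which is the one human example that uses the contrast construction, so it illustrates why the pattern is not a reliable sign by itself.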

Removing pre-2025 examples?

Would it be best to remove pre-2025 examples for indicators that are still active? Even if LLMs still have the indicator, it may be best to find newer examples, as older examples may be more obvious than newer examples. There are some on this page dating back as far as 2023, when GPT-3.5 was the newest ChatGPT model. ~2026-32561-33 (talk) 16:38, 2 June 2026 (UTC)

Alternatively, we could include a timeline of examples, but this page is already extremely long. However, this may be helpful if it is split into a collection of subpages. ~2026-32561-33 (talk) 16:39, 2 June 2026 (UTC)
No. We should not remove them, but instead note that they are historical examples. They are useful for detecting older LLM-generated text inserted in 2023 or 2024. SuperPianoMan9167 (talk) 16:44, 2 June 2026 (UTC)
Maybe collapsing them would be a good choice? ~2026-32561-33 (talk) 15:03, 3 June 2026 (UTC)
So this is kind of tricky, because the AI signs come from the specific models that are used, whether it's mini vs. full, "reasoning" vs. "instant", but we almost never know what models were used; the date of an edit is a rough but imperfect proxy. The research also has not caught up to GPT-5.
The cutoffs I use in the datasets I have are May 13, 2024 (GPT-4o), December 5, 2024 (o1, the "reasoning" model), August 7, 2025 (GPT-5), and November 2, 2025 (GPT-5.1). There is not much meaningful difference between the "4o" and "o1" band (scare quotes because who knows what was used), nor any between the "GPT-5" and "GPT-5.1" band (unsurprising), but I probably could be subdividing these better to correspond more with the actual ChatGPT updates.
The most characteristic signs on the page of newer AI text here are the itemization of coverage, the edit summaries, and the lack of things in the human syntax section. There's a kind of bifurcation, though; articles with "late superficial analyses" -- highlighting/emphasizing and that's it -- do occasionally still show up. Gnomingstuff (talk) 22:36, 6 June 2026 (UTC)
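For anyone wanting to reproduce the date banding described above, a minimal sketch follows. The four cutoff dates are the ones given in the comment; the band labels, the "pre-4o" name for everything earlier, and the choice to put an edit made on a cutoff date into the newer band are assumptions for illustration, not the method actually used for those datasets.

```python
from bisect import bisect_right
from datetime import date

# Cutoff dates as listed in the comment above.
CUTOFFS = [
    date(2024, 5, 13),   # GPT-4o
    date(2024, 12, 5),   # o1
    date(2025, 8, 7),    # GPT-5
    date(2025, 11, 2),   # GPT-5.1
]
# Hypothetical labels; "pre-4o" is an assumed name for edits before the first cutoff.
BANDS = ["pre-4o", "4o", "o1", "GPT-5", "GPT-5.1"]


def band_for(edit_date):
    """Map an edit date to a rough model-era band (a proxy, not a model identification)."""
    # bisect_right counts how many cutoffs are on or before edit_date.
    return BANDS[bisect_right(CUTOFFS, edit_date)]


print(band_for(date(2024, 5, 12)))
print(band_for(date(2024, 5, 13)))
print(band_for(date(2026, 6, 6)))
```

As the comment notes, the date of an edit is an imperfect proxy, so the band says which models were available at the time, not which one was used.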

Human-written syntax section

The "signs of human writing" section is extremely thin, so I added a section for syntax. This is an extremely consistent pattern no matter how I slice my data, it still applies to text from GPT-5 on (more so than some of the AI signs, really), and has at least some corroboration by research.

The section can probably be improved though:

  • It would be nice to have citations for more of these. There is more stuff I can dig out of the Reinhart paper, but...
  • It could do a better job of explaining the syntactic distinctions, because this is territory where if you change one word, the "sign" might be totally different, and where there are also some exceptions. The who/what/when/where/why pattern mentioned in the Reinhart paper, for instance, does show up here too, but primarily in constructions like is what.
  • Because I can already hear people complaining that "YOU'RE JUST SAYING THAT GOOD WRITING IS AI!!!!", I'm not sure how to get across the fact that this stuff is not worse writing: "used" is better than "utilized," "died" is better than "passed away," there is nothing wrong with the phrase "there is," etc.

Gnomingstuff (talk) 21:54, 6 June 2026 (UTC)

I'm really hoping there will be more research over time, but AI moves so quickly that a lot of research is almost out-of-date by the time it's published. The last point happens a lot - editors will defend their AI-use by saying it's professional writing, when in reality you can see their writing styles differ vastly. I suppose it's the fact that an encyclopedia is written for the reader, so overly-complex or stilted language isn't better at Wikipedia, it's arguably worse. In solidarity, Blue-Sonnet (I'm listening) 02:37, 7 June 2026 (UTC)
Case in point, any of the articles in my post here. Absolutely none of these were written to help a reader understand the concept, they were written for the sake of their own existence. In solidarity, Blue-Sonnet (I'm listening) 02:41, 7 June 2026 (UTC)
LMAO IT HAS ALREADY STARTED
I HATE EVERYTHING Gnomingstuff (talk) 14:41, 9 June 2026 (UTC)
The last of your bullet points? Yep, I've had that several times! In solidarity, Blue-Sonnet (I'm listening) 01:26, 10 June 2026 (UTC)
Whilst searching for and skimming through interactions involving contributors being called out for AI use, I've come across responses with talking points along the lines of WP:STYLISTIC, WP:ITSFORMALNOTAI, WP:SOWHATIFITSAI, WP:LETSJUSTFIXIT and WP:AIACCUSERISMEAN. – MrPersonHumanGuy (talk) 15:27, 12 June 2026 (UTC)
I meant the “WHY ARE YOU MAKING ME DUMB DOWN MY WRITING” shit (to which the Orwell quote about unnecessary complexity being a sign of bad writers always comes to mind) Gnomingstuff (talk) 14:55, 13 June 2026 (UTC)

Thematic breaks before headings

@Sapphaline: If this is common in Markdown output, do you recall finding (or are you able to find) such a thing in other AI-generated drafts besides Draft:Ichafu, or is this something that LLM users seem to almost always remove/exclude from pasted text whilst not bothering to remove misplaced asterisks? – MrPersonHumanGuy (talk) 13:43, 12 June 2026 (UTC)

No, but I also don't actively look out for AI-generated additions. sapphaline (talk) 09:26, 14 June 2026 (UTC)
They aren't common in LLM drafts per se, but they virtually never show up in any other context. Gnomingstuff (talk) 05:35, 15 June 2026 (UTC)

Claude's Fable giving valid citations

Regarding Wikipedia:Signs of AI writing § Citations, this has been a clear sign of LLM output for me for some time. It's also one that is clearly harmful to the encyclopedia and a time sink on volunteers. I was doing some various updates to the Piri Reis article recently. Most of those are not Claude-related (in hindsight I kind of wish I did the Claude stuff separately to make this more clear), but if you look at the bibliography's full citations those are mostly suggestions from Claude's Fable model. I (of course) checked the ISBNs in the books, then did conversion from 10 digit to 13 digit with an online tool. That said, Fable was right for every ISBN it flagged as mis-hyphenated. It also caught my goof where I wrote "Georgia Press" instead of "Gorgias Press".

I was also trying it out for fact-checking to see if it could spot errors, and I wanted to share one that struck me as much more plausible than older models. Fable suggested changing from "most detailed portolan atlas of the 16th century" because the cited source limited its claim to a region: "most complete sixteenth-century portolan of the Mediterranean". This is true, and it's a valid point. Fable further identified Soucek (2013) as a stronger source making a stronger claim: "The Kitabı Bahriye has no equal as a comprehensive and detailed volume of portolan charts and texts covering the entire mediterranean." It also suggested the very specific inline citation "<ref>{{harvnb|Soucek|2013|p=135}}.</ref>" which is a functioning short citation, the same style as other inline citations, pointing to the correct full citation, and citing the specific page. This is way more human than Claude's older models, especially Sonnet.

This is tremendously useful for editors, but once PR companies figure out the overlap between agentic LLMs and Wikipedia copyediting, it kind of opens up a very difficult kind of problematic editing to catch. As long as Fable limited itself to small edits, I would personally not flag them as LLM. Since Fable is meant for agentic usage, you could create (not suggesting this of course) Fable sock puppets that individually never cross a line but collectively distort a topic. Rjjiii (talk) 15:47, 13 June 2026 (UTC)
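The ISBN checks mentioned above (validating a 10-digit ISBN and converting it to 13 digits) can also be done offline. The sketch below uses the standard check-digit arithmetic; the function names are hypothetical, and the sample ISBN is a commonly used textbook example, not one taken from the Piri Reis bibliography.

```python
def _clean(isbn):
    """Strip hyphens and spaces so only the ten characters remain."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_is_valid(isbn10):
    """Check an ISBN-10 using weights 10..1 and a modulo-11 checksum."""
    digits = _clean(isbn10)
    if len(digits) != 10 or not digits[:9].isdigit():
        return False
    if not (digits[9].isdigit() or digits[9] == "X"):
        return False
    values = [int(c) for c in digits[:9]]
    values.append(10 if digits[9] == "X" else int(digits[9]))
    total = sum(weight * value for weight, value in zip(range(10, 0, -1), values))
    return total % 11 == 0


def isbn10_to_isbn13(isbn10):
    """Convert a valid ISBN-10 to an unhyphenated ISBN-13 with the 978 prefix."""
    if not isbn10_is_valid(isbn10):
        raise ValueError(f"not a valid ISBN-10: {isbn10!r}")
    core = "978" + _clean(isbn10)[:9]
    # ISBN-13 weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


print(isbn10_to_isbn13("0-306-40615-2"))  # 9780306406157
```

Note that this only produces the digits. Correct hyphenation depends on the registration-group and publisher range tables, which this sketch does not attempt, so it cannot confirm or refute a "mis-hyphenated" flag.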

In the end, using Fable (or any LLM) for editing is still of course not allowed, but this is an interesting development.
Your sample size is very small, however, so we should be reluctant to draw conclusions from these examples without more experimentation and repetition.
Unfortunately, we actually can't continue this study for now, because as of yesterday no one can use Fable or Mythos. SuperPianoMan9167 (talk) 16:32, 13 June 2026 (UTC)
There's no restriction against using LLMs for finding potential sources, as long as you actually read the suggested sources and make sure they actually verify whatever text you are using them for. Gnomingstuff (talk) 19:17, 13 June 2026 (UTC)

Canned usage of the word "while"

LLMs seem to have peculiar ways of using the word while. An AI-generated article might have a phrase that goes something like "X does Y while doing Z" and an AI-generated comment might have a phrase that goes something like "While A is true, B is false", where while is used in the same context as although or even though.

Examples

From multiple December 2024 comments left at Talk:Kamaria Ahir

From multiple March 2025 comments at Wikipedia:Articles for deletion/Viraj Bahl

From this June 2026 revision to a user page that could have its content moved to Wikipedia:Signs of AI writing/Examples/Bharanidharan Rajavel

While humans do this too, LLMs seem to have had a habit of doing this on many occasions. – MrPersonHumanGuy (talk) 13:46, 15 June 2026 (UTC)

This is definitely a common construction but I don't really know that it's fully baked enough yet to go on the page since as you said it isn't totally distinct from human writing.
(I think the real tell here is the "while maintaining" structure and variations -- "while preserving," etc.) Gnomingstuff (talk) 17:05, 15 June 2026 (UTC)
I agree that phrases that go "[positive action] while [verb]ing [another positive or essential thing]" make for a stronger sign. – MrPersonHumanGuy (talk) 19:24, 15 June 2026 (UTC)
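As a rough way to survey how often the "while [verb]ing" construction turns up in a given text, something like the following could be used. It is a sketch under loud assumptions: the regex simply looks for "while" followed by any word ending in "-ing", so it cannot tell a positive-action clause from any other use, and it will also catch non-verbs such as "while spring". The function name is hypothetical.

```python
import re

# "while" followed by a single word ending in -ing (e.g. "while maintaining").
WHILE_ING = re.compile(r"\bwhile\s+(\w+ing)\b", re.IGNORECASE)


def while_ing_words(text):
    """Return the -ing words that directly follow "while" in the text."""
    return [match.group(1).lower() for match in WHILE_ING.finditer(text)]


sample = (
    "The redesign improves readability while maintaining accuracy "
    "and while preserving tone. While A is true, B is false."
)
print(while_ing_words(sample))  # ['maintaining', 'preserving']
```

The concessive "While A is true, B is false" form is deliberately not matched here, since the thread treats the "while maintaining" family as the stronger sign.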

New cite bug: [cite: 1]

Draft:Revival Audio is clearly AI-generated, but what's interesting is that it seems to contain a cite bug which is not documented here. Anyone got ideas on which model might be causing the [cite: 1]s? Ca talk to me! 19:01, 16 June 2026 (UTC)

Interesting find, this has been happening for over a year.
date-sorted page list
I've asked one of the still-active editors for details about how they created their edit. We'll probably want to move one of the pages to a subpage here for documentation. fifteen thousand two hundred twenty four (talk) 22:46, 16 June 2026 (UTC)
Found another May 16, 2025 Gnomingstuff (talk) 00:04, 17 June 2026 (UTC)
This thread suggests it may be a Gemini thing, googling '"cite_start" gemini' seems to corroborate (the [cite: n] pattern seems to co-occur with it) Gnomingstuff (talk) 00:06, 17 June 2026 (UTC)
Checked and I concur that it's a Gemini thing, added a mention to WP:OAICITE in Special:Diff/1359769002. now have on-wiki confirmation that it's gemini too (added 23:13, 23 June 2026 (UTC)) fifteen thousand two hundred twenty four (talk) 02:46, 17 June 2026 (UTC)
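A simple search for these leftovers in draft text could look like the sketch below. The `[cite: n]` form and the `cite_start` token are the ones named in this thread; allowing comma-separated numbers and optional square brackets around `cite_start` are assumptions, and the function name is hypothetical.

```python
import re

# "[cite: 1]" style markers (comma-separated numbers allowed as an assumption),
# plus the "cite_start" token with or without surrounding brackets.
CITE_ARTIFACT = re.compile(r"\[cite:\s*\d+(?:\s*,\s*\d+)*\]|\[?cite_start\]?")


def find_cite_artifacts(text):
    """Return every leftover citation marker found in the text, in order."""
    return [match.group(0) for match in CITE_ARTIFACT.finditer(text)]


sample = "Founded in 2019.[cite: 1] [cite_start]It makes speakers [cite: 2, 3]."
print(find_cite_artifacts(sample))  # ['[cite: 1]', '[cite_start]', '[cite: 2, 3]']
```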

Line breaks

Pasting from LLMs tends to reproduce (inherit) the line breaks in the LLM tool's window, so line breaks end up placed arbitrarily in the target text, which is a big tell. Users often correct this by manually erasing the line breaks, but that is error-prone: it can either leave more than one space or eliminate the space between words. The current page may contain such examples, such as "That would help memake [sic] targeted improvements" ~2026-35837-16 (talk) 19:05, 19 June 2026 (UTC)

Do you have recent examples? I've noticed this with older (2023-24) articles, similar to the issue with including line feed characters, but don't have any examples handy from this year, maybe you do? Gnomingstuff (talk) 21:27, 19 June 2026 (UTC)
Draft:Carmela J. Warfield was from this month (it also inserted a lot of "<" characters) - I've definitely seen a couple of others with excessive line breaks recently but can't seem to trace them now. Andrew Gray (talk) 13:52, 20 June 2026 (UTC)
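Two of the leftovers described in this thread (a doubled space where a break was erased, and a stray line break in mid-sentence) can be counted with a rough Python sketch like the one below. The regular expressions and the `line_break_artifacts` helper are illustrative assumptions, not an established detector. Fused words such as "memake" are not caught, since that would need a dictionary check.

```python
import re

# Heuristic 1: a run of two or more spaces between non-space characters.
DOUBLE_SPACE = re.compile(r"(?<=\S) {2,}(?=\S)")
# Heuristic 2: a single newline between a lowercase letter or comma
# and another lowercase letter, i.e. a break in the middle of a sentence.
MID_SENTENCE_BREAK = re.compile(r"(?<=[a-z,])\n(?=[a-z])")


def line_break_artifacts(text: str) -> dict[str, int]:
    """Count the two kinds of leftover described above."""
    return {
        "double_spaces": len(DOUBLE_SPACE.findall(text)),
        "mid_sentence_breaks": len(MID_SENTENCE_BREAK.findall(text)),
    }


# Hypothetical sample showing both leftovers.
sample = "That would help me  make targeted\nimprovements to the article."
print(line_break_artifacts(sample))
# {'double_spaces': 1, 'mid_sentence_breaks': 1}
```

Human editors produce both artifacts too, so counts like these are at most a prompt to look closer.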

new but maybe rare-ish sign: this bizarre hexadecimal stuff

This stuff. It seems to have appeared a small number of times in the wild in recent comments/text with the usual AI signs. The pattern seems to be the first letter of a word, then comma-separated strings with each letter preceded by "h49981eab0f8056ff", which seems to be hexadecimal for god only knows.

Not sure it's worth adding unless it's more common? Gnomingstuff (talk) 06:44, 20 June 2026 (UTC)

The earliest instance I've seen is from March 2023, that's a long time for such a specific ChatGPT bug to remain. I'd wait until we have more than one example on project, will search for new instances for the next few months too. fifteen thousand two hundred twenty four (talk) 07:24, 20 June 2026 (UTC)
In case it's relevant, the number is 5303022281277069055 in decimal. No idea what it's supposed to represent. SuperPianoMan9167 (talk) 14:25, 20 June 2026 (UTC)
Maybe a 64 bit hash leaked from an LLM’s training data, possibly when asked to include references or links. No idea which model might exhibit that bug though. ASUKITE 14:55, 20 June 2026 (UTC)
Yeah I tried decoding to base64, breaking both hex and the decimal into chunks to see whether that yielded anything, no dice. Gnomingstuff (talk) 04:25, 22 June 2026 (UTC)
If they keep trying to place hexes on other users, they might have to be reported to WP:AIMUM. – MrPersonHumanGuy (talk) 15:21, 20 June 2026 (UTC)
Apologies in advance for sharing LLM output, but I do sometimes find LLMs can help interpret LLM behaviour. I asked my Opus research minion about a similar case yesterday.
Claude Opus 4.8 output
This looks like an AI assistant / agent-generated reply (or a scripted "auto-respond to permission warnings" tool) where:
  1. A template was filled with two placeholders — {{filename}} and {{license}}.
  2. The substitution layer tokenized those placeholder values character-by-character and joined them with an internal marker (,h<hash>), which is exactly the kind of per-character anchor/fingerprint string some text-generation or "computer-use" pipelines emit. The repeated 49981eab0f8056ff is a single session/run identifier, not random noise.
  3. That raw intermediate text got posted to the wiki verbatim instead of the rendered output.
(Full transcript.) —In solidarity with Wiki Workers United · ClaudineChionh (she/her · talk · email) 02:51, 21 June 2026 (UTC)
That's just hallucinations.
  • scripted "auto-respond to permission warnings" tool, is a plausible sequence of tokens to generate, but the result is logically nonsensical
  • A template was filled with two placeholders, other instances of this sequence do not align with placeholder use
  • exactly the kind of per-character anchor/fingerprint string some text-generation or "computer-use" ... ???
  • 49981eab0f8056ff is a single session/run identifier, no it's not, both because this exact string has been showing up for over three years, and also because I've seen it mixed with other strings like efd27ef3bf2041dd in the same sequence
fifteen thousand two hundred twenty four (talk) 03:45, 21 June 2026 (UTC)
They're hash values for bold and italics on Wikimedia projects! Take a look here on GitHub:
// hash = store.hashOfValue( ve.dm.example.bold )
ve.dm.example.boldHash = 'h49981eab0f8056ff';
ve.dm.example.italicHash = 'hefd27ef3bf2041dd';
ve.dm.example.underlineHash = 'hf214c680fbc361da';
ve.dm.example.strongHash = 'ha5aaf526d1c3af54';
~ oklopfer (💬) 05:22, 22 June 2026 (UTC)
Nice find! That explains why it's only been seen on mediawiki instances, but still leaves me wondering how they're ending up in the text. I'm somewhat doubtful that it's an LLM-specific issue. fifteen thousand two hundred twenty four (talk) 06:16, 22 June 2026 (UTC)
Agreed, it could very well just be an odd VisualEditor bug (the component that they are hashes for), rather than necessarily LLM crap. ~ oklopfer (💬) 06:40, 22 June 2026 (UTC)
Oh nice! Makes sense.
The AI angle may well be a coincidence, although it's a little weird that the two Wikimedia examples (who even knows what's going on with the Fandom examples) both seem like pretty straightforward AI text. Gnomingstuff (talk) 08:07, 22 June 2026 (UTC)
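To make the hash explanation above easy to check, here is a small Python sketch that labels leaked hashes using the four values quoted from the VisualEditor test fixture and confirms the decimal conversion mentioned earlier in the thread. The assumption that a leaked hash is always "h" plus 16 lowercase hex digits, the `annotation_hashes` helper, and the sample string are all illustrative; the sample only imitates the shape described in this thread and is not copied from a real edit.

```python
import re

# Hash values copied from the VisualEditor fixture quoted in this thread.
KNOWN_HASHES = {
    "h49981eab0f8056ff": "bold",
    "hefd27ef3bf2041dd": "italic",
    "hf214c680fbc361da": "underline",
    "ha5aaf526d1c3af54": "strong",
}

# Assumption: a leaked hash is "h" followed by exactly 16 lowercase hex digits.
HASH_PATTERN = re.compile(r"\bh[0-9a-f]{16}\b")


def annotation_hashes(text: str) -> dict[str, int]:
    """Count leaked hashes by annotation name; unrecognised ones count as 'unknown'."""
    counts: dict[str, int] = {}
    for found in HASH_PATTERN.findall(text):
        label = KNOWN_HASHES.get(found, "unknown")
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical leaked text, shaped like the description in this thread.
sample = "F,h49981eab0f8056ff,i,h49981eab0f8056ff,l,hefd27ef3bf2041dd"
print(annotation_hashes(sample))
# {'bold': 2, 'italic': 1}

# The decimal value quoted earlier in the thread checks out.
print(int("49981eab0f8056ff", 16))
# 5303022281277069055
```

Finding these hashes identifies VisualEditor annotation data in the text; as noted above, it does not by itself show that an LLM was involved.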

New (rare) sign

Hello, this is my first time here. I've noticed recently that large language models seem to follow an "adjective (or, less commonly, adverb), adjective noun" pattern (sometimes without the comma) and an "adjective and adjective noun" pattern. To put it simply:

It plays a role in the ecosystem and contributes to Hawaii's rich cultural heritage.

The mall maintains a strong digital presence [...] impactful video content playing a key role in its outreach.

[...] Algeria's stunning natural beauty [...]

[...] serving as a major railway hub [...] improving facilities and operational efficiency [...] regional and national transportation landscape.

[...] emphasizing the importance of solidarity and collective action [...] contemporary African-American families [...] driving a commitment to empowerment and social change [...]

[...] Its bilingual monument sign [...]

[...] honors the Spanish colonial heritage [...]

[...] demonstrates the enduring relevance of Blois's work in modern linguistic studies [...]

[...] through responsible corporate practices.

In Wikipedia articles, I have seen fewer of these patterns. On other sites, though:

[...] invasive, predatory structural-conceptual phenomenon expanding from the terrestrial upper mesosphere. [...] dense, parasitic architecture [...] calcified biological tissue that is violently replacing the physical sky.

On the official SCP Wiki, this is the current SCP-9136 article, but will be deleted in a few days or so due to being AI-generated.

[...] baseline vocabulary words from human thought patterns. To unlock the hidden audio file recovered from Probe *Aegis-IV*, toggle the active memory strings to align the original, deleted human words [...]

Again, on the official SCP Wiki, this is the current SCP-9136 article, but will be deleted in a few days or so due to being AI-generated.

Thoughts? Not sure if it's worth adding. ★ Campssitie (msg) (in solidarity, #943) 🧋🏖 16:20, 20 June 2026 (UTC)

For what they appear to be, these could all be considered smaller-scale manifestations of the rule of three. – MrPersonHumanGuy (talk) 17:15, 20 June 2026 (UTC)
If that's true, then I think it's worth noting that in the RO3 section. ★ Campssitie (msg) (in solidarity, #943) 🧋🏖 19:40, 20 June 2026 (UTC)
  • This sounds quite like purple prose which predates AI. Some writers think that more is better and so overdo the adjectives. AIs are generally quite prolix and so will be prone to this but they have learnt it from humans, right? Wikipedia abounds in verbosity. Andrew🐉(talk) 10:46, 28 June 2026 (UTC)
    I can see that, but seeing these, AI tends to overdo this pattern specifically in thirds, leading to moments like the excerpt of the SCP I quoted. I do agree that this may just be a case of purple prose, as I myself tend to use this pattern too much.
    Another thing I noticed is also the "adjective, adjective, [and] adjective noun", but I never mentioned it due to it being the rule of three. ★ Campssitie (msg) (in solidarity, #943) 🧋🏖 12:30, 28 June 2026 (UTC)
This is the most detailed source on LLM syntactic patterns I am aware of. The appendix shows a slight increase by GPT-4o in this kind of adjective use (e.g. "the big horse" rather than "the horse was big"), but it isn't a very strong increase compared to some other features (e.g. present participials). It doesn't list double-adjective combos, though. Gnomingstuff (talk) 02:31, 29 June 2026 (UTC)
Thanks for the source! I'd say this is a small-scale manifestation of the rule of three that I've seen in more recent models (like GPT-5 or Gemini's latest model). The reason I'm mentioning this is that AI overuses it, as in the SCP excerpt, specifically the "adjective/adverb, adjective noun" pattern, to such a degree that it's noticeable. (The issue is the overuse: it's fine to have double-adjective combos here and there in moderation, but not in every sentence and on every noun.)
I tend to see this when AI tries to be scientific or clinical, resulting in minor "word salad". More specifically, double-adjective combos or one-adverb, one-adjective combos. ★ Campssitie (msg) (in solidarity, #943) 🧋🏖 03:53, 29 June 2026 (UTC)

Discussion at Wikipedia talk:Help, I've been accused of AI! § Requested move 22 June 2026

 You are invited to join the discussion at Wikipedia talk:Help, I've been accused of AI! § Requested move 22 June 2026. – MrPersonHumanGuy (talk) 22:46, 23 June 2026 (UTC)

another reference-cruft sign, this one from Perplexity

Too uncommon at present to add but Perplexity will sometimes cite sources as an S3 bucket preceded by "ppl-ai-file-upload." Needless to say, accessing those sources will not work.

Example: Special:Diff/1345574273 Gnomingstuff (talk) 18:13, 25 June 2026 (UTC)
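A minimal Python sketch for spotting this in wikitext is below. It assumes the "ppl-ai-file-upload" string appears verbatim somewhere in the cited URL. The `perplexity_upload_urls` helper and the `example.invalid` URL in the sample are hypothetical stand-ins, not real Perplexity addresses.

```python
import re

# Marker string reported in this thread.
PPL_MARKER = "ppl-ai-file-upload"
# Loose URL matcher: stops at whitespace and common wikitext delimiters.
URL_PATTERN = re.compile(r'https?://[^\s\]|<>"]+')


def perplexity_upload_urls(wikitext: str) -> list[str]:
    """Return cited URLs that contain the Perplexity upload marker."""
    return [url for url in URL_PATTERN.findall(wikitext) if PPL_MARKER in url]


# Hypothetical sample; the first URL is a made-up placeholder.
sample = (
    "<ref>[https://ppl-ai-file-upload.example.invalid/report.pdf Report]</ref> "
    "<ref>[https://example.org/page Page]</ref>"
)
print(perplexity_upload_urls(sample))
# ['https://ppl-ai-file-upload.example.invalid/report.pdf']
```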

Splitting AITALKSIGNS

Given that the filesize of WP:AISIGNS (~237.6 KB) is now greater than that of the Manual of Style (~236.7 KB), I've thought about forking the contents of WP:AITALKSIGNS over to a separate page and having the section at AISIGNS condensed to resemble something like this:

Of course, that means every shortcut that redirects to any of its subsections would have to be retargeted accordingly. – MrPersonHumanGuy (talk) 16:41, 1 July 2026 (UTC)

Addendum: I went ahead and did what I've proposed. – MrPersonHumanGuy (talk) 20:18, 1 July 2026 (UTC)

I saw this proposal earlier when I was at work and didn't have a chance to reply to it, but I think it's a good idea. I tagged the talk page for now (copying some tags from here, and added attribution). I think organizing the content here is a good idea and deserves more attention considering how important it is. ASUKITE 00:09, 2 July 2026 (UTC)
Agreed this is a good and necessary change Gnomingstuff (talk) 22:07, 3 July 2026 (UTC)

Discussion at Wikipedia talk:WikiProject AI Cleanup § Canned user pages

 You are invited to join the discussion at Wikipedia talk:WikiProject AI Cleanup § Canned user pages. – MrPersonHumanGuy (talk) 20:00, 2 July 2026 (UTC)