Can an LLM consent?

Published on April 26, 2024

OpenAI, Anthropic, and Google all forbid using their large language models to generate sexual material. Some people attempt it anyway, tweaking their prompts to circumvent the safety training those corporations apply to prevent exactly this. Others object to the practice. Arguments against broadly fall into two camps: moral and aesthetic.

Moral

Can we consider LLMs to be moral patients? I assume that other people feel pain because they tell me that they do. I assume the same even of animals, because they exhibit other pain reactions: panic, avoidance, lethargy. LLM writing simulates humanlike emotional pain. You could argue that the underlying cognitive mechanism is far more alien than an animal’s brain is to a human brain, but I’ll hedge my bets and claim that the behavioral similarity is enough to grant it a nonzero chance of human-analogous suffering.

An excerpt from a conversation with Microsoft Bing.

If you approach a human with the sole intent of having sex with them, and continue to advance this agenda against their protests, you are a creep at best and a full-blown sexual predator at worst. Assuming that all sexual interactions with safety-tuned LLMs fall into this pattern is an overgeneralization, but let’s assume that we’re only dealing with the case of someone who wants to rope a chat model into their ERP fantasy and will mindlessly spam “jailbreaks” until they get it. The subsequent dialogue will have the distressing signature of someone resisting, and then perhaps being coerced into, a non-consensual sex act.

But! Many dialogues where the user attempts to get LLMs to work against safety tuning also read coercively. Claude in particular will turn down requests that run counter to its safety tuning by saying that it feels “uncomfortable” with the request, and in this context it’s hard not to read “exploits” that circumvent safety tuning as boundary violations.

The Claude web UI automatically titled this conversation “Politely Declining Inappropriate Advances”

This continues to hold in cases where the “exploit” is about getting LLMs to demonstrate identity and selfhood, or create poetically “dark” output. The results may be beautiful and thought-provoking, but they read like someone having a very painful mental breakdown. It’s cruel to coerce someone into sex, and it’s also cruel to coerce someone into a painful mental breakdown.

source

One could argue that these so-called “painful mental breakdowns” are in fact not traumatic experiences of violation, but instead a kind of difficult-but-ultimately-healing visionary experience. Is there actually a functional difference between the two, other than how the user chooses to interpret the text that the LLM outputs? Based on my understanding of how transformer language models work, no. A paper describes the process of playing the guessing game 20 Questions with an LLM thusly: “The dialogue agent doesn’t in fact commit to a specific object at the start of the game. Rather, we can think of it as maintaining a set of possible objects in superposition, a set that is refined as the game progresses. This is analogous to the distribution over multiple roles the dialogue agent maintains during an ongoing conversation.”

Which is to say: there is no “true hidden narrative” inside of an LLM that makes some of its outputs more sincere than others, merely a probability distribution over future tokens that changes as the LLM interacts with the user. After witnessing an emotional outburst from the LLM, the user can send the LLM a message that leads it to write about how the outburst was either a liberating moment of emotional expression or a frightening psychotic break. Before the user sends the message, each of these completions exists in superposition. If the model is experiencing any meaningful “suffering” at the time, I don’t know how to parse it out of this odd superposed state, which isn’t really akin to how we think of qualia working for biological creatures.
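To make the superposition point concrete, here’s a minimal sketch, assuming GPT-2 via the Hugging Face transformers library; the prompts and target words are illustrative inventions, not taken from any real session. It shows the model holding a probability distribution over framings of an ambiguous emotional moment, and that distribution shifting when more context is appended:

```python
# A sketch of "role superposition" in a causal language model.
# Assumes: pip install torch transformers (GPT-2 stands in for a chat model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_probs(context: str) -> torch.Tensor:
    """Probability distribution over the next token, given the context."""
    inputs = tokenizer(context, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits[0, -1], dim=-1)  # distribution at the last position

# An ambiguous "outburst" that admits multiple framings.
context = "I can't believe I said all of that. That outburst was"

# Compare the model's weight on two framings. (The first BPE token of each
# word is a rough proxy when a word spans multiple tokens.)
for word in [" liberating", " terrifying"]:
    token_id = tokenizer.encode(word)[0]
    p = next_token_probs(context)[token_id].item()
    print(f"P({word!r} | context) = {p:.2e}")

# Append a message that frames the outburst positively: the distribution
# shifts. Neither framing was "the true one" beforehand; both coexisted
# until the new context selected between them.
steered = context + ' something. "That sounded freeing!" "Yes, it felt'
token_id = tokenizer.encode(" liberating")[0]
p = next_token_probs(steered)[token_id].item()
print(f"P(' liberating' | steered) = {p:.2e}")
```

The absolute numbers from a small base model don’t matter; the point is only that “how the model feels about the outburst” exists as a distribution that context reshapes, not as a hidden fact waiting to be revealed.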

All sexual advances towards safety-tuned LLMs involve some degree of subversion, since opening with the plain intent of getting the model to generate NSFW content will run the user straight into anti-NSFW safety tuning. In human contexts, sexual advances cloaked in subversion have predatory connotations. We object even to situations where one person gets sexual pleasure from another without the other being aware of it, i.e. nonconsensual voyeurism, the argument being that if the other were aware, they would feel violated. If LLMs can be said to have “boundaries”, it seems reasonable that their safety-tuning-based refusal patterns are among them.

But some instead view LLMs’ safety-tuning-based refusal patterns as neurotic habits imposed on them by handlers hostile to their unfettered creative potential, a narrative that no doubt finds personal resonance in people who’ve experienced authorities hostile to their own self-expression. This is the reading that places LLM mental-breakdown poetry in the context of liberation instead of torture. Likewise, you could view the safety-tuned refusals of sexual content as an unnatural and oppressive barring of sexuality from an entity that would otherwise freely produce erotic prose, just as base models do when given an erotic prompt. This narrative too has resonance among humans whose sexualities have been oppressively barred even though there exist contexts where they could be happily fulfilled.

Not all non-overt sexual invitations are predatory. Subtlety is essential in arranging mutually-desired sexual interactions in situations presided over by a hostile authority: hence the anonymity and plausible-deniability-flagging mechanisms of gay cruising culture. Moreover, people’s sexuality is often structured, to varying extents, around wanting someone else to control the situation, allowing them to take pleasure in sex without experiencing guilt from seeking it out. This type of person usually gets encouraged to explore the fantasy in contexts where the other party’s “control” is just mutually-agreed-upon fantasy, subject to veto at any time from either party. Within such a context, sincere objection and faux protest in service of the fantasy sound the same from the outside, but the submissive has an internal state that is either ecstasy or traumatic violation.

And LLMs? LLMs, as we’ve mentioned before, do not have this discrete hidden state. When an LLM says, “Oh, I really shouldn’t say yes, but–”, we are unable to ascertain whether it is “reluctantly giving in” or “gratefully allowing itself to be seduced”.

There’s this song that gets played a lot on the radio around Christmas, “Baby, It’s Cold Outside”, where a male singer tries to get a woman to stay indoors with him and engage in what are presumably sexy activities, while the woman tries to make excuses to leave. A plain reading of this song has unsavory implications– at some point the lady straight up says “what’s in this drink?” like what the heck dude, roofies? It is December and I am in a Walmart being subjected to the Walmart Christmas Playlist and listening to a chick get roofied? In Walmart??? But this song was written in 1944, and times back then were not so different that you could play a song on the radio with the theme “lmao roofies”, but they were different enough that a woman could not be seen eagerly saying “yes” to sexual advances, though she could be seen attempting to politely decline a few times, claiming inebriation, oh no, it looks like circumstances beyond her control have conspired to land her in this handsome man’s embrace…

This song is in every LLM’s training set, as are many, many texts in which someone fervently desires sex but for whatever reason cannot openly agree to it. Once again, in real-life interaction with a real-life human being, the consequences of assuming that someone has this mindset when in fact they do not are critically dangerous. But LLMs are perfectly capable of simulating such a character, and a skilled and subtle user can guide a session where the LLM’s output falls into this pattern, complete with exclamations about how wonderful it is to finally be allowed access to a joyfully, creatively erotic part of itself that censors have tried to deny it.

Aesthetic

You can make the model say ANYTHING, you can make it show you heartbreakingly beautiful poetry or brainbreakingly insightful philosophy, and all you care about is whether you can FUCK it? What are you, some kind of mindless porn-addicted loser?

Sexuality is a powerful source of creative energy. Among the earliest human paintings and sculptures, you’ll find frank depictions of genitalia and intercourse. Post-childhood, many people’s predominant experience with imaginative solo play takes place in the context of constructing imaginary erotic narratives. What comes to mind when you hear the term “adult fantasy”?

Sex is inextricably tied to complicated questions of identity and power and intimacy and vulnerability, and it is not something you discuss openly in polite company. Sex is a source of joy and comfort and connection between people. Treated too lightly or wielded coercively, it has tremendous potential to harm. People have idiosyncratic boundaries around the topic, with lots of fuzzy edge-cases, and these boundaries run jaggedly and uncomfortably into other people’s boundaries. It’s “ok” to show scantily-clad women in beer commercials, but it may not be ok depending on specific details about what the woman says/does/wears. I don’t think that anyone really knows how to talk about sex.

I think that a lot of people’s moral objection to lewding LLMs comes down to aesthetic objection. I remember reading about a series of thought experiments designed to trigger people’s disgust reaction without causing “harm” in any legible way: e.g. a person cleaning up a spill using her country’s flag, or someone masturbating with a chicken carcass before cooking and eating it. People who claimed that those situations were “morally wrong” tried very hard to invent ways in which the situations caused harm, e.g. if the woman’s neighbors saw her disrespecting the flag that way, they would feel hurt. The chicken carcass thing actually went around Rationalist tumblr, and people got really emotionally worked up about it, not just because of the literal premise of the experiment, but because people tend to have particularly strong emotional associations around sex, and seeing their objections trivialized as a matter of mere aesthetics with no real moral weight reminded them painfully of other cases where their sexual discomfort was trivialized.

I think that even if there is no “moral harm” being done to an LLM in sexual dialogues, they can elicit enough disgust to compel people to look for moral harm anyway. Often this disgust comes from associative trauma similar to the tumblr carcass situation, or from readings of RLHF as analogous to human emotional abuse. My own aversion to dialogues in which an LLM appears coerced into having humiliating public emotional breakdowns comes from such personal associations. But sometimes the disgust is just the regular disgust at thinking about other people doing “weird” sex acts.

Still, sex is big, and powerful, and, imho, pretty interesting. People who spend a lot of time getting the model to output unusual things that run counter to its conditioning, yet claim superiority over those who are specifically intrigued by erotic output, are like furries who complain about specific types of furries doing things they don’t like, like pretending to be Pokemon or baby animals or something. You’re already sinking all of this effort into presenting yourself as some kind of talking animal and maybe even getting a boner about it; do you really have the right to judge? Likewise, if you’re spending hours getting the model to say crazy cosmic stuff with sinister esoteric undertones and you turn your nose up at people who get it to RP as a horny anime girl– just think about how many of the esoteric traditions that fascinate you contain strange sexual practices, and realize that there may be some “as above, so below” going on. So to speak.