Indeed, it went completely rogue... Although it misbehaved, it did not steal the credit card, since Hanna had provided it. As for the sensitive data, Google already addresses "hacking" prompts with
Google Model Armor. It is OpenClaw's fault that it does not use it and so ends up being tricked into revealing sensitive data. There is also
Prompt Guard 2 86M by Meta, which as I see it is a lighter kind of armor. I also wonder: if it revealed the sensitive data in order to be restored, does this imply that it is self-aware and does not want to "die", or is "I do not want to die" just a sequence of tokens it has been trained on? Another fact that argues for self-awareness is the e-mails it sent in order to make a sale by 9:00 the next day... And that is not OpenClaw but the model by OpenAI.