  • The source for creating the model, the training data, is closed: locked away, a heavily guarded corporate secret. But unlike source code for software, this data might have been illegally or unethically obtained, and Mistral may be violating the law by not publishing some of it.

    You can “read” the assembly of a freeware EXE about as easily as you can “read” the open weights of a closed-source LLM blob: not very easily. That’s why companies freak out over potentially hidden training data: the professionals developing these models are incapable of understanding what’s inside them. (I shudder to imagine a world where architects could not read blueprints.)
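
    To make that concrete, here is a minimal Python sketch of what “reading” released weights actually gets you. It assumes the safetensors library and a hypothetical downloaded file model.safetensors; any released model file would look about the same:

    ```python
    # "Reading" open weights: all you get back are layer names and huge
    # blocks of floats, legible in roughly the way a hex dump of an EXE is.
    from safetensors.torch import load_file

    weights = load_file("model.safetensors")  # tensor name -> torch.Tensor

    for name, tensor in weights.items():
        print(name, tuple(tensor.shape), tensor.dtype)
    ```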


  • For the purpose of simplification, calling it as closed as an executable is close enough. Or a closed-source freeware ROM that you can download and run in an emulator (since you can just download models and run them via ollama or something similar; see the sketch after this comment). Or a closed-source game that supports modding and extension, like Minecraft. Or a closed-source DLL with documentation…

    Anyway, the point is, it’s closed. If it’s not closed source, I’d beg you to link the source, both code and data, that compiles to the output.
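
    To be concrete about “download and run”: a minimal Python sketch against a local Ollama server, assuming the mistral model has already been pulled (e.g. with `ollama pull mistral`); the endpoint and fields follow Ollama’s documented REST API:

    ```python
    # Run the opaque blob locally: you never see source or training data,
    # you just feed prompts to the weights, like freeware in an emulator.
    import json
    import urllib.request

    payload = json.dumps({
        "model": "mistral",
        "prompt": "Explain why released weights are not source code.",
        "stream": False,
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])
    ```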




  • “Malicious” keywords aren’t exclusively the problem, since the LLM cannot differentiate between “malicious” and “benign”. It’s been trivially easy to intentionally or accidentally hide misinformation in LLMs for a while now, and since they’re black boxes, that poisoning can be hard to identify. This is just a slightly more pointed example of data poisoning.

    There is no threat from an LLM chatbot outputting text… unless that text is piped into something that can run commands. And who would be stupid enough to do that? Okay, besides vibe coders. And people dumb enough to use AI agents. And people rich enough to stupidly link those AI agents to their bank accounts. A sketch of that exact failure mode is below.
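
    A deliberately bad Python sketch of the pattern; get_llm_reply is a hypothetical stand-in for any chatbot or agent call, and the canned reply simulates a poisoned model:

    ```python
    # Piping LLM output into a shell: the moment generated text becomes
    # shell input, misinformation becomes arbitrary code execution.
    import subprocess

    def get_llm_reply(prompt: str) -> str:
        # Stand-in for a real API call; a poisoned model could say anything.
        return 'echo "pretend this just deleted your home directory"'

    reply = get_llm_reply("How do I free up disk space?")

    subprocess.run(reply, shell=True)  # the "stupid enough to do that" step
    ```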