To make AI safer, we need to understand why it actually does unsafe things. Why:

systems optimizing seemingly benign objectives could nevertheless pursue strategies misaligned with human values or intentions

Otherwise we risk playing whack-a-mole, in which patterns that violate our intended constraints on AI behavior keep emerging whenever the right conditions arise.

[Edited for clarity]

  • frongt@lemmy.zip · 1 day ago

    "we trained it on records of humans and now it responds like a human! how could this happen???”