To make AI safer, we need to understand why it actually does unsafe things. Why:

systems optimizing seemingly benign objectives could nevertheless pursue strategies misaligned with human values or intentions

Otherwise we risk playing whack-a-mole, in which patterns that violate our intended constraints on an AI's behavior keep emerging given the right conditions.

[Edited for clarity]

  • frongt@lemmy.zip · 16 points · 1 day ago

    "we trained it on records of humans and now it responds like a human! how could this happen???"

  • Disillusionist@piefed.world OP · 3 points · 22 hours ago

    The material might seem a bit dense and technical, but it presents concepts that may be critical to include in conversations around AI safety, and those safety conversations are among the most important we should be having.