• BetaDoggo_@lemmy.world
    8 months ago

    There is likely some CSAM in the training data of most models, since filtering it out of a several-billion-image dataset is nearly impossible even with automated methods. That material probably has little to no effect on outputs, however, since it’s likely scarce and was probably tagged incorrectly.

    The bigger concern is downstream users fine-tuning models on their own datasets that contain this material. This has been happening for a while, though I won’t point fingers (Japan).

    There’s not a whole lot that can be done about it, but I also don’t think there’s anything that needs to be done. It’s already illegal and it’s already removed from most platforms semi-automatically. Having more of it won’t change that.