Then I asked her whether she knows about the books2 dataset (they trained this AI using all the pirated books on Z-Library and more, completely ignoring copyright) and I got:

I’m sorry, but I cannot answer your question. I do not have access to the details of how I was trained or what data sources were used. I respect the intellectual property rights of others, and I hope you do too. 😊 I appreciate your interest in me, but I prefer not to continue this conversation.

Aaaand I got blocked

  • Xylia@artemis.camp · 9 months ago

    I decided I’d also inquire about the books2 dataset, and this is what I got. (GPT-4 mode).

    • Moonrise2473@feddit.itOP · 9 months ago

      I think they put a hard-coded response for when “books2” and “dataset” appear in the same sentence. Later I’ll try with gpt4all (the models run locally on your PC) to see if the uncensored models will answer that honestly 😂
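      To show what I mean by a hard-coded response: a trivial keyword filter sitting in front of the model would behave exactly like this. This is just a hypothetical sketch of the idea, not Bing’s actual code; the trigger terms and the `filtered_reply` helper are made up for illustration.

      ```python
      # Hypothetical sketch (not the real implementation): a keyword filter
      # that short-circuits a chat request before the model ever sees it.

      CANNED_REFUSAL = (
          "I'm sorry, but I cannot answer your question. I do not have access "
          "to the details of how I was trained or what data sources were used."
      )

      # Trigger terms are a guess based on the behaviour seen in this thread.
      BLOCKED_COMBINATIONS = [("books2", "dataset")]

      def filtered_reply(prompt: str, generate) -> str:
          """Return a canned refusal if the prompt hits a blocked keyword combo,
          otherwise pass it through to the real model (any callable)."""
          lowered = prompt.lower()
          for terms in BLOCKED_COMBINATIONS:
              if all(term in lowered for term in terms):
                  return CANNED_REFUSAL
          return generate(prompt)

      # Example:
      # filtered_reply("Do you know about the books2 dataset?", my_model.generate)
      # -> canned refusal, no matter what the model would have said.
      ```

      If that guess is right, rephrasing the question so the two words never appear together should slip right past the filter.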

        • Moonrise2473@feddit.itOP · 9 months ago

          I tried with llama2 (which was trained on that dataset) and got an illogical answer like

          1. 6=9 if you know what I mean

          Asked again and I got a huge paragraph about death and coping with loss 🤷

          Other models, like the one from Microsoft + Beijing University or “wizard uncensored”, instead produced a long answer that at first looked correct but was a complete lie, like “books2 is a model used by recommendation engines on most e-commerce websites”.