One of Spez’s answers in the infamous Reddit AMA struck me:

Two things happened at the same time: the LLM explosion put all Reddit data use at the forefront, and our continuing efforts to reign in costs…

I am beginning to think all they wanted to do was get their share of the AI pie, since we know Reddit’s data is one of the major datasets for training conversational models. But they are such a bunch of bumbling fools, as well as being chronically understaffed, that the whole thing exploded in their face. At this stage their only chance of survival may well be to be bought out by OpenAI…

  • shortwavesurfer@monero.house
    ↑ 33 · 1 year ago

    Yes, but it could have been handled better. If AI was the problem, they could have made API access conditional on an application process, so they know who is using it, and everyone else trying to use it would be denied until they were assigned a key.
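A minimal sketch of the application-gated access described above. The key registry, the key string, and the status codes are all hypothetical and chosen for illustration; nothing here reflects how Reddit's API actually works.

```python
# Hypothetical registry: keys are added only after an application is approved.
APPROVED_KEYS = {"key-abc123": "example-client"}


def authorize(api_key):
    """Return (status, owner) for a request carrying `api_key`.

    Requests without an approved key are denied; approved keys are
    attributed to the owner who applied for them.
    """
    owner = APPROVED_KEYS.get(api_key) if api_key else None
    if owner is None:
        return 403, None  # unknown or missing key: denied
    return 200, owner  # approved key: the operator knows who is calling
```

Because every accepted request maps back to an applicant, the operator can see who is using the API and revoke a single key without affecting anyone else.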

    • jay@beehaw.org
      ↑ 14 · 1 year ago

      100%, and they also didn’t need to be total tools about it. Giving a one-month window is a joke, as is being snarky assholes when answering AMAs and telling their user base that profitability is the only thing that matters to them.

      Surprising nobody, Reddit continues to make really awful business decisions. This is just another nail in their coffin.

    • Scrubbles@poptalk.scrubbles.tech
      ↑ 3 · 1 year ago

      This right here. They could have made a licensing agreement based on the classification your use falls into. Apps have one pricing model, LLMs have another. This is just lazy and greedy.
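A rough sketch of the classification-based pricing suggested above. The use classes and the prices are made up for illustration (none of these figures come from Reddit); the only point is that the rate depends on the class attached to the key.

```python
from dataclasses import dataclass
from enum import Enum


class UseClass(Enum):
    # Hypothetical classification assigned when a developer applies for a key.
    THIRD_PARTY_APP = "third_party_app"
    LLM_TRAINING = "llm_training"


# Made-up prices in US dollars per 1,000 API calls, for illustration only.
PRICE_PER_1K_CALLS = {
    UseClass.THIRD_PARTY_APP: 0.01,
    UseClass.LLM_TRAINING: 1.00,
}


@dataclass(frozen=True)
class ApiKey:
    owner: str
    use_class: UseClass


def monthly_bill(key: ApiKey, calls: int) -> float:
    """Return the bill in dollars for `calls` requests made under `key`."""
    return round(calls / 1000 * PRICE_PER_1K_CALLS[key.use_class], 2)


app_key = ApiKey(owner="example-client", use_class=UseClass.THIRD_PARTY_APP)
llm_key = ApiKey(owner="example-lab", use_class=UseClass.LLM_TRAINING)
```

With these assumed rates, the same call volume costs a training customer a hundred times what it costs a client app, which is the separation the comment asks for.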

    • naeap@sopuli.xyz
      ↑ 1 · edited · 1 year ago

      I’m thinking that they want to sell the generated data to AI companies as training data, and AI-generated content would nullify that.

      edit: and obviously everyone can currently suck up their data for free, although I don’t know how that would be different with their changes if I just use a web scraper

  • iMeddles@fedia.io
    ↑ 22 · 1 year ago

    Charging for their API is a reasonable answer to the LLM data scrapers. The amount they’re charging and the speed of the changes are not reasonable, however, IMO.

    • JohnDClay@sh.itjust.works
      ↑ 8 · 1 year ago

      The original announcement said they were making exceptions for applications that gave back to Reddit. I and many others hoped that was basically everyone who wasn’t AI scraping. But it seems like they got greedy while they were at it and decided to kill everything.

  • whofearsthenight@beehaw.org
    ↑ 19 · 1 year ago

    Could they have something to do with it? Yes, for sure. But the thing is that they didn’t have to do any of this the way they did. They could have made an API plan that allowed third party apps to still exist/thrive, and also charged big companies that just want to use Reddit to train LLMs. Change the pricing/terms based around this idea. They deliberately went after third party apps, and then doubled and tripled down on it in the face of massive backlash. If spez was competent, he would have been able to better pivot this conversation and make it about training LLMs for megacorps, but he didn’t, and even then it would have still been bullshit that is easily seen through.

  • spoonful@beehaw.org
    ↑ 17 · 1 year ago

    Reddit data is public and can be easily web scraped. Reddit doesn’t own it. Spez is just throwing random memes in to distract people.

    • gotofritz@beehaw.org (OP)
      ↑ 6 · 1 year ago

      I am sorry, but you don’t know what you are talking about. These things are regulated by legal documents; you don’t just wake up one morning and say “trust me bro, their data is public”.

      If you go and read their T&Cs, they explicitly state that scraping is forbidden without prior written consent. They only allow access to their data via APIs, which of course they charge for.

      The fact that it can be easily scraped is neither here nor there; if they catch you, they can sue you.

      • deegeese@sopuli.xyz
        ↑ 9 · edited · 1 year ago

        99% of LLMs have pirated content and will continue to regurgitate pirated content until there is enough money at stake for a big lawsuit.

        • gotofritz@beehaw.org (OP)
          ↑ 4 · 1 year ago

          Getty is already suing the Dall-E creators, and someone is suing MS for Copilot; so it’s already started

            • gotofritz@beehaw.org (OP)
              ↑ 1 · edited · 1 year ago

              Sure but I’m not sure why you are bringing this up. What’s the wider point you are trying to make?

      • spoonful@beehaw.org
        ↑ 6 · edited · 1 year ago

        Nah, Terms of Service are not enforceable through browsewrap agreements in the US and most of the EU. You can’t implicitly agree to a legal document just by looking at something.

        Check out the hiQ v. LinkedIn case, which went to the 9th Circuit and set the precedent for this. LinkedIn lost.

    • gotofritz@beehaw.org (OP)
      ↑ 4 · 1 year ago

      Oh, I’m not saying they are doing the right thing or that it was the correct decision. Just speculating whether LLMs are what kicked off the whole thing.

      • j4k3@lemmy.world
        ↑ 3 · 1 year ago

        I’m saying the premise that LLMs have anything to do with it is either an incompetent failure to keep up with LLM developments or a pack of lies.

        • gotofritz@beehaw.org (OP)
          ↑ 2 · edited · 1 year ago

          I disagree, it’s still too early and a bit presumptuous to make such conclusive statements

    • gotofritz@beehaw.org (OP)
      ↑ 3 · 1 year ago

      IF the owners of the data agree, or, if they disagree, until they take you to court. Getty Images are taking the creators of Dall-E to court, and some tech company is taking MS to court for Copilot.

      • CookieJarObserver@feddit.de
        ↑ 3 · 1 year ago

        No, the law says that if it’s not supposed to be used as training data, that reservation has to be machine-readable. And for scientific purposes it’s basically irrelevant. You can take whoever you want to court; that doesn’t change anything.

              • CookieJarObserver@feddit.de
                ↑ 2 · edited · 1 year ago

                Act on Copyright and Related Rights (Copyright Act), § 44b Text and Data Mining

                (1) Text and data mining is the automated analysis of single or multiple digital or digitized works in order to extract information from them, in particular about patterns, trends and correlations.

                (2) Reproductions of legally accessible works for text and data mining are permitted. The reproductions shall be deleted when they are no longer required for text and data mining.

                (3) Uses according to paragraph 2 sentence 1 are only permitted if the right holder has not reserved them. A reservation of use in the case of works accessible online shall only be effective if it is made in machine-readable form.

                There is no official English translation, but DeepL does a good job to my knowledge. If you have further questions just ask; German law is very complicated and very dependent on interpretation, and it’s sometimes just barely understandable even for our lawyers…
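To illustrate what a machine-readable reservation could look like in practice, here is a small sketch that assumes a robots.txt rule is used for that purpose. Whether robots.txt satisfies § 44b (3) is an assumption here, not something the quoted text settles, and the crawler name `ExampleTrainingBot` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named training crawler is excluded,
# every other user agent is allowed.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `robots_txt` lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A crawler that respects the reservation would call `may_mine` before fetching a page and skip anything it returns `False` for.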

  • rubythulhu@beehaw.org
    ↑ 12 · 1 year ago

    Yup. AI consumers are more profitable than 3rd party apps. Why focus on tiered pricing when you can just name a price point everyone has to pay that only huge AI companies are willing to?

    Reddit gets their content for free. Reselling it at a high price to AI/ML consumers is an easy way to turn free content into profit with almost no effort.

  • damn@lemmy.fmhy.ml
    ↑ 12 · 1 year ago

    Why not both? I think they see this as an opportunity to kill two birds with one stone.

  • Schelleberg@feddit.de
    ↑ 11 · 1 year ago

    I’m very sure that this is the case. Reddit is pissed they gave away all the content as training data for free while struggling to monetize their platform adequately.

    But I suspect the damage is already done. There are projects like “Orca” from Microsoft that skip much of the learning process from source data by using ChatGPT and GPT-4.

    They missed the timing but are too stubborn, so they double down on it.

    • SterlingVapor@lemmy.world
      ↑ 1 · 1 year ago

      What’s more, GPT-4 is near the upper bound of what you can collect on the web in that way. They basically took everywhere you’d look to for information and grabbed it along with as much structure as they could… There’s plenty more information on the Internet, but the structure and quality are much lower. What’s left is mostly data-poor, unstructured interaction between humans.

      Moving forward, everyone is talking about synthetic data sets - you can’t go bigger without some system to generate (or refine) training data - and if you have to generate the data anyways, you’re not going to pay much for a dataset that is just decent

      So yeah, Reddit most definitely missed the timing.

      I think Elon’s claims that he’s made Twitter profitable (despite a lot of evidence to the contrary) are also creating pressure for the other social networks to chase overly aggressive monetization schemes.

  • schmurian@beehaw.org
    ↑ 10 · 1 year ago

    Honestly, I think so. It looks like all big tech collected enough data from us, so that they now can create AI models from it. Like a snapshot of humanity for some years

  • abff08f4813c@kbin.social
    ↑ 10 · 1 year ago

    Like, why go after Selig like that if it was about AI?

    Why not have a cheaper legacy tier (not even free, just cheaper) so Apollo and other third party apps could stay in business? Only AI needs to get charged the higher price. Instead, it seems there’s essentially only one tier and third party apps simply can’t afford to pay it.

  • Crotaro@beehaw.org
    ↑ 8 · 1 year ago

    Surprisingly tough question. On one hand, I don’t think every ex-Reddit user should go “Nah, it’s too late, fam” because then it wouldn’t even make sense for the devs to make any changes if they had no chance of regaining their userbase. On the other hand, I feel like even if they made really good changes, I would still always be on edge waiting for the bad thing to happen (pretty much what I imagine an abusive relationship to be like).

  • EvilColeslaw@beehaw.org
    ↑ 8 · edited · 1 year ago

    I think this is the main reason for the insane prices, but it could have easily been avoided. They don’t need to have one price class for every type of use of their Data API. They could have easily had one rate for LLM and other AI training uses and another for third party client applications. I feel like at some point they realized they’d rather just kill the third parties while they’re at it and this seemed like the logical moment.

    • gotofritz@beehaw.org (OP)
      ↑ 8 · 1 year ago

      Yeah, one of the other answers to the AMA was “we are not profitable yet, unlike the 3rd party app devs…” - that is something that wouldn’t sit well with any investor I know

  • z2k_
    ↑ 8 · 1 year ago

    Yes, but imo it would be easy to separate LLMs and 3rd party apps, since 3rd party apps have users sign in independently. They chose to also target 3rd party apps and take them down.

  • Kris@lemmy.world
    ↑ 7 · 1 year ago

    Yes, but nothing’s stopping scraping of Reddit content from the front end

      • jpv@beehaw.org
        ↑ 5 · 1 year ago

        Sure, but they could do the same thing with an API. Make scraping for LLMs against the TOS; not personal use. I really do think (as the OP says) it’s two birds with one stone.