• DaveA
    link
    fedilink
    arrow-up
    2
    ·
    edit-2
    1 month ago

    Fascinating, I didn’t realise the latency down there was that bad. How hard was it to get the process working across two distant servers like that?

    Lemmy servers don’t send the next activity until the first is received. From memory it was something like 150-200ms for the round trip to Finland and back. That means a maximum of about 5 or 6 activities per second at the best of times. However, when Lemmy receives say a new comment, it then sends a request to retrieve the user details from the user’s instance, and the whole pipeline is held up. The worst I saw was occasional activities taking 8 seconds to complete (I guess whatever data was being fetched was on a slow instance).

    At one point, kbin.Social hammered Lemmy.world with duplicate requests which then tried to federate out, and that was when the problem was noticed (though Lemmy.world does average more than 5 a second so even after kbin issues stopped we couldn’t recover). A guy on matrix Nothing4You (I’m not sure of Lemmy username ) built a pre-fetcher to trigger Lemmy to retrieve details of posts before Lemmy.world tried to federate them out, thus helping those situations where it was taking multiple seconds to retrieve all details. It helped but was not enough to turn the tide, and we were still getting further and further behind. Nothing4You was meanwhile building a complete batching solution, which you can see on github.

    So for me? It was easy, I just signed up for a server and ran an ansible playbook to set it up, then added a docker container to the Lemmy stack, all the while getting personalised help 🙂. I’m not sure how hard it was to conceptualise a solution, build it, test it, and make sure it was fault tolerant, because I didn’t have to!