Last week’s thread here

Welcome to this week’s casual kōrero thread!

This post will be pinned in this community so you can always find it, and will stay for about a week until replaced by the next one.

It’s for talking about anything that doesn’t justify a full post. For example:

  • Something interesting that happened to you
  • Something humorous that happened to you
  • Something frustrating that happened to you
  • A quick question
  • A request for recommendations
  • Pictures of your pet
  • A picture of a cloud that kind of looks like a hippo
  • Anything else, there are no rules (except the rule)

So how’s it going?

  • @DaveOPMA
    5
    24 days ago

    Down time

    I’ve been fighting with lemmy a bit recently, it’s going down randomly and I’m not sure why. I have started disabling things to see if I can isolate it. The main one you might notice missing is the automod that handles spam, so if you see spam please report it and I’ll handle it manually.

    • @BalpeenHammer
      3
      24 days ago

      It keeps timing out on me. It’s sporadic but more frequent than I would expect.

      • @DaveOPMA
        2
        24 days ago

        Yeah, I have a tracker (Uptime Kuma) that’s testing every minute. There have been failures or timeouts every day for the last week or so, but it was much better before that. I’m not sure what’s changed to cause the issue.

        There have been 6 separate outages today, and it takes two failed checks in a row to count as an outage, though most have lasted only a few minutes.

        If anyone has suggestions on how to troubleshoot, I’m listening!
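
For anyone wanting to replay the "two failed checks in a row" rule against their own check history, here is a minimal sketch. It is not how Uptime Kuma implements it; the function name `count_outages` and the list-of-booleans input are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def count_outages(checks: Iterable[bool], threshold: int = 2) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of failed checks
    that is at least `threshold` checks long.

    `checks` is a sequence of results, True for a passing check and
    False for a failed one, in the order they were taken.
    """
    outages = []
    run_start = None  # index where the current run of failures began
    run_len = 0       # number of consecutive failures so far
    for i, ok in enumerate(checks):
        if not ok:
            if run_start is None:
                run_start = i
            run_len += 1
        else:
            # A passing check ends the run; keep it only if long enough.
            if run_start is not None and run_len >= threshold:
                outages.append((run_start, run_len))
            run_start, run_len = None, 0
    # Handle a run of failures that is still open at the end of the data.
    if run_start is not None and run_len >= threshold:
        outages.append((run_start, run_len))
    return outages
```

With one check per minute, each run length is roughly the outage duration in minutes, and a single isolated failure is ignored.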

        • @BalpeenHammer
          3
          24 days ago

          What are you tracking? Where are you hosting? What kind of monitoring do you have in place?

          • @DaveOPMA
            2
            24 days ago

            Just tracking HTTP status of lemmy.nz. Host is in an Auckland datacenter, but I don’t control it; it’s hosted by the guys at fediservices.nz. Monitoring is minimal: other than the up/down check, I’ve gotta dive into logs.
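
A bare up/down check can be made a bit more informative by recording why a check failed, since a timeout and an error response often point at different causes. Below is a rough sketch using only Python's standard library. The function name `check`, the result labels, and the injectable `opener` parameter are assumptions for illustration, not part of any existing monitoring setup.

```python
import socket
import urllib.error
import urllib.request


def check(url, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch `url` once and classify the result as a short label.

    `opener` defaults to urllib.request.urlopen and can be swapped
    for a stand-in when testing without network access.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return "up" if 200 <= resp.status < 400 else f"http {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return f"http {err.code}"
    except (TimeoutError, socket.timeout):
        # The connection was made but the response took too long.
        return "timeout"
    except urllib.error.URLError as err:
        # Connection-level failure; a timeout can also arrive wrapped here.
        if isinstance(err.reason, (TimeoutError, socket.timeout)):
            return "timeout"
        return "unreachable"
```

Logging the label alongside a timestamp, for example from `check("https://lemmy.nz")`, makes it easier to see whether the bad periods are mostly timeouts, error responses, or connection failures.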

            • @BalpeenHammer
              3
              23 days ago

              I would start by setting up some sort of monitoring system that tracks your memory usage, disk I/O, CPU etc. There are many packages out there for that. I don’t know what lemmy is written in, but chances are there is also application monitoring you can wire in, like Sentry https://sentry.io/welcome/

              Of course I don’t know how your app is being hosted by fediservices. I don’t know if you have shell access or can install any apps or whatever.
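
If shell access is available, a very small resource sampler can be put together without installing anything. The sketch below assumes a Linux system with `/proc/meminfo` and covers only memory and load average, not disk I/O. The function names are made up for illustration, and a proper monitoring package would do far more.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_used_percent(meminfo: dict) -> float:
    """Percentage of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def sample() -> dict:
    """Take one sample of memory use and 1-minute load average (Linux only)."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    load1, _load5, _load15 = os.getloadavg()
    return {"mem_used_pct": round(memory_used_percent(mem), 1), "load1": load1}
```

Calling `sample()` on a schedule and appending the result to a file with a timestamp gives a history that can be lined up against the times the site went down.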

              • @DaveOPMA
                2
                23 days ago (edited)

                Oh sorry, should have mentioned they have it hosted on proxmox and I have access to view the dashboard. I can see the resource usage you mentioned including history.

                I have access to and full control over the proxmox container, but don’t have any specific monitoring outside of the logging.

                Unfortunately neither the resource usage nor the logs have given away anything. Resource usage is often all over the place. CPU spikes are common and always have been, and whenever there is downtime it’s followed by a resource spike as federation catches up. Plus, federation is pretty random, especially when kbin fires a bunch of stuff at lemmy.world and messes everyone up.

                Over the course of today I’ve done a lot of log reading, and I have identified one possible problem and made a tweak tonight. Time will tell if it helps.

                Today was also particularly rocky as the host had various spots of downtime, mixed in with lemmy being down at times. I’ll keep monitoring tomorrow and see if it’s better.
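
Reading logs around each outage by hand gets tedious, so one option is to filter the log down to lines close to the recorded outage times. This is only a sketch under a big assumption: that each log line starts with an ISO 8601 timestamp followed by a space, which may not match the real log format. The function name `lines_near_outages` is hypothetical.

```python
from datetime import datetime, timedelta


def lines_near_outages(log_lines, outage_starts, window=timedelta(minutes=2)):
    """Return log lines stamped within `window` of any outage start.

    Assumes the first space-separated token of each line is an ISO 8601
    timestamp; lines that do not parse are skipped. Timestamps and outage
    starts must either all carry a timezone or all be naive.
    """
    matches = []
    for line in log_lines:
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # not a timestamped line, e.g. a wrapped stack trace
        if any(abs(when - start) <= window for start in outage_starts):
            matches.append(line)
    return matches
```

Feeding in the outage start times from the uptime tracker narrows a day of logs down to the few minutes either side of each failure.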

                • @BalpeenHammer
                  3
                  23 days ago

                  If the federation is done by a helper app then it may be possible to throttle it. At least it wouldn’t choke out the machine and slow down other running processes.

                  • @DaveOPMA
                    2
                    22 days ago

                    The federation doesn’t generally seem to be a problem, but many of the large instances do run inbound federation in a separate container.

                    My problem has been that I haven’t managed to narrow it down to a component, so splitting out the containers may not help me troubleshoot. It’s definitely on my list of things to try though, if I don’t manage to narrow it down.

                    Currently we have had a run of 5 hours with no outages! So we are doing much better than yesterday. But I suspect that’s probably just the host solving their issues.

    • @DaveOPMA
      2
      23 days ago

      Sorry guys, today we have the double whammy of the host having issues as well as lemmy itself having issues, so it’s all a bit flaky today.