I made a comment about how the wealth of knowledge available on Reddit is what makes it so useful. Whilst there are cached pages on Google and the Wayback Machine (though I’ve found the latter doesn’t have a copy of a lot of pages I want to view), I have some fear of these disappearing eventually, along with people going back and scrubbing their old comments and posts in an effort to remove their content from Reddit and, I suppose, devalue the platform, since the information stored there is pretty useful.

I came across this dump of Reddit submissions and comments from 2005-2022 for the top 20K subs: https://academictorrents.com/details/c398a571976c78d346c325bd75c47b82edf6124e

It says it’s about 1.66TB. I haven’t downloaded it to have a look because I have no space (lol), but I plan to, so I can hopefully preserve and make use of it. When I have time I might write something to index the data so I can search it for what I need.
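
For what it’s worth, here’s a rough sketch of the kind of indexer I have in mind. I haven’t looked at the dump yet, so the newline-delimited JSON layout and the `subreddit` and `body` field names are guesses on my part, and `build_index` and `search` are just names I made up. It builds a simple token lookup table in SQLite so the index can live on disk rather than in memory.

```python
import json
import re
import sqlite3

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_index(lines, db_path=":memory:"):
    """Index an iterable of JSON lines into SQLite and return the connection.

    Assumes (unverified) one JSON object per line with "subreddit" and "body" keys.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs "
        "(id INTEGER PRIMARY KEY, subreddit TEXT, body TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS postings (token TEXT, doc_id INTEGER)")
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        body = record.get("body") or ""
        cur = conn.execute(
            "INSERT INTO docs (subreddit, body) VALUES (?, ?)",
            (record.get("subreddit"), body),
        )
        doc_id = cur.lastrowid
        # Store each distinct token once per document.
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?)",
            [(token, doc_id) for token in set(tokenize(body))],
        )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_token ON postings (token)")
    conn.commit()
    return conn


def search(conn, query):
    """Return (subreddit, body) rows whose body contains every query token."""
    tokens = set(tokenize(query))
    if not tokens:
        return []
    placeholders = ",".join("?" * len(tokens))
    sql = f"""
        SELECT d.subreddit, d.body
        FROM docs d
        JOIN postings p ON p.doc_id = d.id
        WHERE p.token IN ({placeholders})
        GROUP BY d.id
        HAVING COUNT(DISTINCT p.token) = ?
        ORDER BY d.id
    """
    return conn.execute(sql, (*tokens, len(tokens))).fetchall()


# Made-up sample records, only to show the shape of the calls.
sample = [
    '{"subreddit": "3Dprinting", "body": "Levelling the bed fixed my first layer."}',
    '{"subreddit": "DataHoarder", "body": "Bought another drive for the archive."}',
]
conn = build_index(sample)
print(search(conn, "bed layer"))
```

For the real thing I’d pass a file path as `db_path` and feed it the dump a line at a time. Whether this approach holds up at 1.66TB is something I’d have to test.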

Just thought I’d share the dump anyway for anyone with similar concerns.

  • Rangelus
    1 year ago

    The stuff I really want preserved is the niche subreddits. For example, there were some great, very helpful posts about one specific brand of 3D printer, but those certainly wouldn’t be included in this dump.