08-02-2024, 01:06 PM
xxxtrade
Registered User
 
Join Date: Sep 2022
Posts: 10
Pornhub converted to .json format

I am building a couple of mega tube sites and wanted to generate the 4 million or so blog posts programmatically, so I downloaded the complete Pornhub video list. It has about 5.3 million videos. Honestly, I found the file a mess: the delimiters were a combination of pipes (|) and plain semicolons (;). After much work I finally figured out a process to split the file into individual files, parse them, and then dump out a JSON structure in separate files.
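For anyone curious what that kind of process looks like, here is a minimal sketch in Python. It is not my actual converter. It assumes one record per line, fields separated by |, and multi-value fields separated by ; and the column names in FIELDS and MULTI_VALUE are hypothetical placeholders, not the real layout of the export.

```python
import json
from pathlib import Path

# Hypothetical column order; the real export layout is not given in this post.
FIELDS = ["video_id", "title", "tags", "categories"]
# Hypothetical: which fields hold several values separated by ';'.
MULTI_VALUE = {"tags", "categories"}


def parse_line(line: str) -> dict:
    """Split one '|'-delimited record and expand ';'-delimited list fields."""
    parts = line.rstrip("\n").split("|")
    record = {}
    for name, value in zip(FIELDS, parts):
        if name in MULTI_VALUE:
            record[name] = [v.strip() for v in value.split(";") if v.strip()]
        else:
            record[name] = value.strip()
    return record


def dump_records(lines, out_dir: str) -> int:
    """Write one JSON file per record, named after its video id."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_line(line)
        path = out / f"{record['video_id']}.json"
        # ensure_ascii=False keeps international characters readable.
        path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
        count += 1
    return count
```

Calling dump_records(open("export.txt", encoding="utf-8"), "out") would write one file per line, where export.txt is again a made-up filename. A plain split like this breaks if a title itself contains | or ; which is one reason a real file of this kind takes more work.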

Because honestly my programming ability is my edge, I do not want too many people to have this data. But if anyone would like an archive with about 5 million separate files in .json format, which you can then load into an H2 database (as I did; note that the whole thing is about 1.3 TB with all of the data I extracted), then let me know.

It is not cheap, though I guess it is a lot less than hiring someone. I want payment in something like Dash crypto, and I am thinking about $750 USD to cover my many hours of work on this. I am not sure how many orders I will take. Let me know if you are interested; if there are no takers, I am happy with that as well.

Here is one such conversion. There might be some errors in it, since the data contains international characters from other countries that are hard to parse, and I kept running into issues. It might not be a 100% conversion, but it is the closest I can get.
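To give an idea of the character problem, here is a small sketch of one way to handle it in Python. It assumes the source is meant to be UTF-8, which I am not claiming is true for the whole file. Lines that fail to decode are kept with a replacement character and flagged, so you can count how many records are not a clean conversion.

```python
def decode_line(raw: bytes) -> tuple[str, bool]:
    """Decode as UTF-8; return the text and whether any bytes were replaced."""
    try:
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Undecodable bytes become U+FFFD instead of aborting the whole run.
        return raw.decode("utf-8", errors="replace"), True


def count_damaged(raw_lines) -> int:
    """Count how many lines needed replacement characters."""
    return sum(1 for raw in raw_lines if decode_line(raw)[1])
```

Reading the file in binary mode and passing each line through decode_line keeps the run going on bad bytes, and the damaged count gives a rough measure of how far from 100% the conversion is.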

If you are interested, hit me up and I will send over a sample JSON package so you can see what it looks like.

The offering is a tar.gz with 5,084,132 JSON files; each filename is the video ID on Pornhub.

5084132 ../pornhubjson/filelist.txt
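If you want to check a file count like that against an archive yourself, a sketch along these lines should work in Python. The function name and archive path are made up for illustration; it only reads the tar headers and does not extract anything.

```python
import tarfile


def count_json_members(archive_path: str) -> int:
    """Count regular files ending in .json without extracting anything."""
    count = 0
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".json"):
                count += 1
    return count
```

On an archive this size it still has to stream through the whole compressed file, so expect it to take a while.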