Alternative drop in replacement YOLO model? #40

@pogue

TL;DR: Here are a bunch of pretrained YOLO models and other software that might help expand the CAPTCHA recognition capabilities of CaptchaSolver.

Hi!

I came across your project while looking for alternative CAPTCHA solvers, as I was getting sick of all the JDownloader popups, and this was exactly what I was looking for! However, I looked over some of the closed threads and noticed some people mentioning that it wasn't great at solving some of the newer CAPTCHAs because of the older YOLOv2 model it uses.

I don't know how to train a model, nor do I have anywhere near adequate hardware to do so. But I thought surely there are some newer ready-made models for OCR and CAPTCHA solving, and I found a few. I have no idea how to test these out in the real world, and since I'm on Windows, running Python scripts and other tooling is a bit overly complicated.

So, here are some pre-trained models for OCR/CAPTCHA solving, but I don't know if it would be possible to just drop these into your project and start using them or test them out. But, I wanted to share what I found in hopes it might help.

  • AI-CAPTCHA-Solver - This project claims it will solve up to 90% of CAPTCHAs, but since it says it's for "security testing" (of course) it doesn't mention which ones it's good at. But, it includes a pre-trained YOLO dataset in the files. It's been updated recently (7 months ago) so it could be promising.
  • Captcha recognition with Deep learning - This is an older model from around 3 years ago, but still could be another helpful dataset to use. It says it's based on yolo9000.
  • YOLO CAPTCHA - A 5 year old project claiming 89% accuracy
  • KCAPTCHA-Solver - This model uses YOLOv8 (nice) to solve a Russian based CAPTCHA called KCAPTCHA. I don't know if any of the common file hosting sites use this Captcha, but it might be worth just glancing over to see if it might have some use. The open YOLOv8 model they use is only 4 months old and hosted on HuggingFace.
  • Softy-lines/Captcha-Digit-Detector - This is another YOLOv8 open model that describes itself as a Model Card for Pixelated Captcha Digit Detection. The user also includes their datasets used to train the models, and it looks pretty decent from just glancing at it. Softy-lines/Raw-Pixel-Digit-Captcha & Softy-lines/Pixel-Digit-Captcha-Data
  • secemp9/cloudflare_captcha - Cloudflare CAPTCHAs are particularly annoying, so anything that can bypass these would be great. So many file hosting sites are putting their websites behind Cloudflare, which makes JDownloader fail the download. This one is based on YOLOv4.
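
Whichever of these detectors gets tried, the glue step for text CAPTCHAs is probably similar: a YOLO-style model returns one box per character, and the answer is read off by ordering those boxes left to right. A minimal sketch of that step, assuming a hypothetical `Detection` tuple and a made-up confidence threshold (neither is taken from CaptchaSolver or from any of the projects listed above):

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """One detected character. Hypothetical structure, not CaptchaSolver's real format."""
    label: str         # predicted character class, e.g. "7" or "k"
    confidence: float  # detector score in [0, 1]
    x_center: float    # horizontal centre of the bounding box, in pixels


def detections_to_text(detections, min_confidence=0.5):
    """Drop low-confidence boxes, then read the remaining characters left to right."""
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.label for d in kept)


if __name__ == "__main__":
    sample = [
        Detection("b", 0.91, 80.0),
        Detection("a", 0.88, 20.0),
        Detection("x", 0.20, 50.0),  # below the threshold, so it is dropped
        Detection("7", 0.75, 140.0),
    ]
    print(detections_to_text(sample))
```

If a newer model can be made to emit detections in whatever shape the existing code already expects, swapping it in might only mean changing the inference call, though that would need checking against the actual project.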

There are also some non-YOLO-based OCR/CAPTCHA models, but I don't know how easy it would be to adapt these into CaptchaSolver.

  • keras-io/ocr-for-captcha - Wikipedia says Keras is "an open-source library that provides a Python interface for artificial neural networks." They have an OCR model trained to recognize CAPTCHAs, and it could be promising. From reading it, it sounds like it uses the same technology as YOLO, but I don't know enough about it to know for sure. Documentation: OCR model for reading Captchas: https://keras.io/examples/vision/captcha_ocr/ - They also have an example usage of this model on their GitHub, written in Python: https://github.com/keras-team/keras-io/blob/master/examples/vision/captcha_ocr.py
  • Captcha-Recognizer - This project is for automatically solving those CAPTCHAs where you have to slide a puzzle piece around. These are some of the easiest to do, but obviously not having to look at JDownloader while it's downloading in the background would be helpful (if any file hosting sites even use these types of CAPTCHAs).
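
One difference worth knowing before trying to adapt the Keras model: as far as I can tell, that example is a CNN plus recurrent network trained with a CTC loss, so instead of per-character boxes it outputs one class prediction per horizontal slice of the image, and those predictions have to be decoded into a string. A rough sketch of greedy CTC decoding in plain Python, where the alphabet and the blank index are assumptions chosen for illustration and not values from the Keras example:

```python
def ctc_greedy_decode(frame_predictions, alphabet, blank_index):
    """Collapse consecutive repeats, then drop blanks, to get the final string.

    frame_predictions: the most likely class index for each time step.
    alphabet: string where alphabet[i] is the character for class i.
    blank_index: class index reserved for the CTC "blank" symbol.
    """
    chars = []
    previous = None
    for index in frame_predictions:
        # Emit a character only when the class changes and is not the blank.
        if index != previous and index != blank_index:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


if __name__ == "__main__":
    # Classes 0..2 are "a", "b", "c"; class 3 is the blank (an assumption).
    frames = [0, 0, 3, 0, 1, 1, 3, 2]
    print(ctc_greedy_decode(frames, "abc", blank_index=3))
```

The blank between the two runs of class 0 is what lets a doubled letter survive the collapse step.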

There are also a lot of general OCR projects out there just for reading text on the screen. Tesseract is a well-known project that is used for transcribing historical documents, and on Windows PCs it can be used through plugins for software like IrfanView to scan images and quickly output the text. How well it would work on CAPTCHAs, I don't know. But it's free, open source, and maintained regularly by large universities and corporations (Wikipedia article on Tesseract). There are other models, like Mistral OCR and Qwen, that excel at OCR. I don't know how well it would work to sign up for Mistral (or use their API or Hugging Face), take a snapshot of the CAPTCHA, send it to them over HTTP, and wait for a response. Could be an interesting experiment.
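
For the "send a snapshot and wait for a response" idea, the client side could look roughly like the sketch below. Everything provider-specific here is a placeholder: the endpoint URL, the `image_base64` field name, and the bearer-token header are assumptions made for illustration and are not Mistral's or anyone else's real API, so the provider's documentation would have to supply the actual values.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: replace with the real URL from the provider's documentation.
OCR_ENDPOINT = "https://ocr.example.invalid/v1/read"


def build_ocr_request(image_bytes, api_key, endpoint=OCR_ENDPOINT):
    """Wrap a CAPTCHA snapshot in a JSON POST request (placeholder payload layout)."""
    payload = {"image_base64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_ocr_request(b"fake image bytes", api_key="YOUR_KEY")
    print(request.get_method(), request.full_url)
    # Actually sending it would be: urllib.request.urlopen(request, timeout=30)
```

The round trip to a remote service adds latency, so whether the answer comes back before the CAPTCHA expires is part of what the experiment would have to measure.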

OCRBench is a comprehensive evaluation benchmark designed to assess the OCR capabilities of large multimodal models.

One alternative method I've been using lately is a browser extension called Buster, a solver for reCAPTCHA. It works by switching to the audio method of solving the CAPTCHA. It sends the audio to wit.ai, a platform from Meta for audio recognition, then fills out the form with the transcription and submits it. All you do is check the first box on the reCAPTCHA (I don't know why the extension can't do this part, tbh), click the Buster icon, and let go of your mouse; it listens, types what it hears, and hits submit. It really works great! I don't know if any of the CAPTCHAs on file hosting sites offer this audio method for visually impaired users.

Anyway, if you search Hugging Face for "captcha" there are loads of results, but very few include documentation or offer models for different CAPTCHA systems. GitHub has a lot of different projects, but unfortunately they're very old.

So I hope @cracker0dks can take a look and see if any of this would be useful for expanding or enhancing the project. Keep up the great work! I hope some of this is helpful. Thanks,
pogue
