Friday 20 December 2013

PowerShell: Threading

Having recently got my head around basic PowerShell, I wanted to share some of the lessons I'd learnt. In this first post I'll talk about PowerShell threading and performing tasks on an enterprise scale.


Getting Started

I'm not going to cover how to install and set up PowerShell/WinRM. There are a lot of good posts out there already, for example:

http://blog.powershell.no/2010/03/04/enable-and-configure-windows-powershell-remoting-using-group-policy/

In a nutshell, you need to create a GPO to enable the WinRM service or, if you want to test locally, just start the WinRM (Windows Remote Management) service. To run commands you'll need to have PowerShell installed (PowerShell 2.0 is included with Win7 by default); I'd recommend updating to PowerShell 3.


Scaling up your PowerShell

When running a script across thousands of machines I found three factors significantly affected the total run time. The first was how well I could get the tasks to run in parallel, the second was the efficiency of my code and the third was how much work I could offload to remote machines.

Efficiently running tasks in parallel was the biggest issue so in this post I'll cover the main parallel/off-loading techniques:
  • Invoke-Command
  • Jobs
  • Runspace Pooling
But first I wanted to briefly mention efficient coding.


Creating efficient scripts

In any language, efficient coding can significantly affect the speed of your scripts and the resources they use. This is particularly important when scaling a script across many machines, as any inefficiencies are magnified.

I found the following helped improve my PowerShell code performance:
  • Using PowerShell built-in functions (cmdlets) wherever possible
  • Using piping as much as possible
  • Minimizing creation/usage of new variables/files/objects
  • Avoiding searching or iterating over large data sets
  • Performing operations in parallel
For example, to count the occurrences of a string in a 2MB text file, you don't want to do something crazy like this:

$i=0;Get-Content c:\temp\test.txt|ForEach {if($_ -eq 123){$i++}};Write-host $i

During testing the above took 8 seconds. The following code is cleaner but still took 7 seconds:

(Get-Content c:\temp\test.txt |Where {$_ -match "123"}| Measure-Object -Line).Lines

In this instance, the fastest way to search is by using Select-String, which in testing took less than a second! Using a single cmdlet and its built-in parameters made this the most efficient option.

@(Select-String -Path c:\temp\test.txt -pattern 123).count
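If you want to check timings like these on your own machine, a rough sketch using Measure-Command is below. The generated sample file is an assumption standing in for my 2MB test file, and the loop version uses -match here so that all three approaches count the same lines. Your numbers will differ from mine depending on the file and the PowerShell version.

```powershell
# Assumption: a throwaway sample file stands in for the 2MB test file from the post.
$path = [System.IO.Path]::GetTempFileName()
1..50000 | ForEach-Object { "line $_" } | Set-Content -Path $path

# The three approaches, keyed by a label. [ordered] needs PowerShell 3 or later.
$tests = [ordered]@{
    'ForEach loop'  = { $i = 0; Get-Content $path | ForEach-Object { if ($_ -match '123') { $i++ } }; $i }
    'Where-Object'  = { (Get-Content $path | Where-Object { $_ -match '123' } | Measure-Object -Line).Lines }
    'Select-String' = { @(Select-String -Path $path -Pattern '123').Count }
}

# Measure-Command discards the output and returns only the elapsed time.
$timings = foreach ($name in $tests.Keys) {
    $elapsed = Measure-Command -Expression $tests[$name]
    [pscustomobject]@{ Approach = $name; Milliseconds = [math]::Round($elapsed.TotalMilliseconds) }
}
$timings | Format-Table -AutoSize

Remove-Item -Path $path
```

Run each test a few times before trusting the numbers, as the first read of a file is often slower than later ones.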

So how can we execute this in parallel across thousands of machines?


Invoke-Command

Invoke-Command offers a simple way to execute commands on multiple remote machines in parallel.

http://technet.microsoft.com/en-us/library/hh849719.aspx

The SANS blog and TechNet links below give some great explanations of why it's useful:

http://computer-forensics.sans.org/blog/2013/09/03/the-power-of-powershell-remoting

http://blogs.technet.com/b/heyscriptingguy/archive/2012/07/23/an-introduction-to-powershell-remoting-part-one.aspx

I found Invoke-Command an OK option for simple one-off commands but unsuitable for more complex interactive scripts. By design, you provide code to be executed on the remote machine and, once it completes, you are given the result. I couldn't find a way to perform interactive actions or easily get/push data (see the double-hop problem). I also couldn't find a clean way to handle errors for inaccessible/offline machines (suggestions are welcome!).

As an example here's a script to check the status of a service on multiple machines:

Invoke-Command -ComputerName (Get-Content "C:\Temp\computers.txt") -ScriptBlock {Get-Service -Name WPCSvc} -ThrottleLimit 50 -ErrorAction continue
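One possible way to separate reachable machines from unreachable ones is the common -ErrorVariable parameter. The sketch below is only an illustration: the function name Get-RemoteServiceStatus is made up, and it assumes that for connection failures the error record's TargetObject holds the computer name, which is worth verifying in your own environment. Errors raised by the remote command itself (for example, the service not existing) land in the same variable and will not follow that pattern.

```powershell
# Hypothetical helper: query a service on many machines and report failures as rows.
function Get-RemoteServiceStatus {
    param(
        [Parameter(Mandatory = $true)]
        [string[]]$ComputerName,
        [string]$ServiceName = 'WPCSvc'
    )

    # Suppress error display but collect every error record in $failed.
    $results = Invoke-Command -ComputerName $ComputerName -ThrottleLimit 50 `
        -ScriptBlock { param($name) Get-Service -Name $name } -ArgumentList $ServiceName `
        -ErrorAction SilentlyContinue -ErrorVariable failed

    # Successful results carry the source machine in PSComputerName.
    $results | Select-Object PSComputerName, Name, Status

    # Assumption: TargetObject is the computer name for connection failures.
    foreach ($err in $failed) {
        New-Object PSObject -Property @{
            PSComputerName = $err.TargetObject
            Name           = $ServiceName
            Status         = 'Not accessible'
        }
    }
}
```

It could then be called with something like Get-RemoteServiceStatus -ComputerName (Get-Content "C:\Temp\computers.txt"), giving one row per machine whether or not the connection worked.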

Unsatisfied with Invoke-Command I decided to look for alternative techniques.


PowerShell Jobs

When you Google PowerShell and threading, everyone tells you to use Jobs as they are PowerShell's answer to threads. So that's what I did. Using the script at the link below, I implemented a job-creating function and a job management function, and set my script going.

http://webcache.googleusercontent.com/search?q=cache:yFjkpkw8lT4J:www.get-blog.com/%3Fp%3D22

Something like this:

$MaxThreads = 20
$SleepTimer = 1000
$Computers = Get-Content "C:\Temp\computers.txt"

ForEach ($Computer in $Computers){
    While (@(Get-Job -state running).count -ge $MaxThreads){
        Start-Sleep -Milliseconds $SleepTimer
    }

    Start-Job -scriptblock {
        if(Test-Path \\$($args[0])\C$ -ErrorAction silentlycontinue){
            #Do something
        } else{
            "Machine not accessible"
        }
    } -ArgumentList $Computer -Name "$($Computer)job" | Out-Null
}

While (@(Get-Job -State Running).count -gt 0){
    Start-Sleep -Milliseconds $SleepTimer
}

ForEach($Job in Get-Job){
    Receive-Job -Job $Job
    Remove-job -Force $Job
}

The positive with this script is that you can use -ComputerName remote execution and Get-WmiObject instead of Invoke-Command to run commands interactively and in parallel. The negative is that it took a very long time to run. I'm not sure whether PowerShell has a cap on the number of concurrent jobs or whether jobs are just inefficient. Either way, jobs were slow, so I went back to Google.
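One likely reason for the slowness is that each background job started with Start-Job runs in its own PowerShell process. A small sketch to see this for yourself is below: it starts five trivial jobs that just return their process ID and times the whole round trip. The job count of five is arbitrary.

```powershell
# Time how long it takes to start five trivial jobs and collect their output.
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$jobs = 1..5 | ForEach-Object { Start-Job -ScriptBlock { $PID } }
$jobPids = @($jobs | Wait-Job | Receive-Job)
$jobs | Remove-Job
$sw.Stop()

# Each job reports a process ID different from the parent session's.
"Parent PID: $PID"
"Job PIDs:   $($jobPids -join ', ')"
"Elapsed:    $([math]::Round($sw.Elapsed.TotalMilliseconds)) ms for 5 trivial jobs"
```

The work inside each job is close to nothing, so most of the elapsed time is the cost of starting and tearing down the extra processes.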


Runspace pooling to the rescue!

After wading through all of the Jobs posts I finally got to a technique called runspace pooling. By calling CreateRunspacePool it is possible to create a pool of runspaces, each of which effectively works as a thread.

http://msdn.microsoft.com/en-us/library/system.management.automation.runspaces.runspacefactory.createrunspacepool(v=vs.85).aspx

With one runspace per host and many runspaces operating simultaneously, the script was very fast. It's possible to include your own runspace code within your script, but I found the simpler solution was to use Tome Tanasovski's threading function. I'd recommend checking out his post here for the full story and script:

http://powertoe.wordpress.com/2012/05/03/foreach-parallel/

Using Tome's runspace pooling code I found it easy to create scripts that ran quickly, didn't encounter double-hop problems and handled errors as expected.
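For anyone curious what is happening underneath, here is a minimal sketch of a runspace pool without any helper function. It is not Tome's code, just a cut-down illustration: the pool size of 4, the 8 work items and the doubling script are all arbitrary choices for the example.

```powershell
# Create a pool that runs at most 4 runspaces at once.
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

# Queue 8 small pieces of work; BeginInvoke returns immediately with a handle.
$work = foreach ($n in 1..8) {
    $ps = [powershell]::Create().AddScript('param($x) Start-Sleep -Milliseconds 200; $x * 2').AddArgument($n)
    $ps.RunspacePool = $pool
    New-Object PSObject -Property @{ Instance = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke blocks until that item has finished, then returns its output.
$doubled = foreach ($item in $work) {
    $item.Instance.EndInvoke($item.Handle)
    $item.Instance.Dispose()
}
$pool.Close()

$doubled -join ', '
```

Because the results are collected in the order the work was queued, the output order matches the input order even though the items ran in parallel.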


Putting PowerShell Threading to Work

To give a quick example, the script below checks the status of a service and, if it's stopped, sets it to start automatically and starts it:

$computer = "testhost"
$myservice = Get-Service -computerName $computer -Name "WPCSvc"| Where-Object {$_.status -eq "Stopped"}
if($myservice){
 "Service is stopped. Starting now!"
 Set-Service WPCSvc -startuptype "Automatic" -computerName $computer 
 Start-Service -inputobject $myservice
 "Service started!"
}

I've used the Get-Service cmdlet to first get the status of a specific service. If the service was stopped I'd set the start type to automatic and start the service.

So that's for one machine. If we want to run this on thousands of machines, we just need to combine the code with Tome's runspace script:

function ForEach-Parallel {
    param(
        [Parameter(Mandatory=$true,position=0)]
        [System.Management.Automation.ScriptBlock] $ScriptBlock,
        [Parameter(Mandatory=$true,ValueFromPipeline=$true)]
        [PSObject]$InputObject,
        [Parameter(Mandatory=$false)]
        [int]$MaxThreads=5
    )
    BEGIN {
        $iss = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
        $pool = [Runspacefactory]::CreateRunspacePool(1, $maxthreads, $iss, $host)
        $pool.open()
        $threads = @()
        $ScriptBlock = $ExecutionContext.InvokeCommand.NewScriptBlock("param(`$_)`r`n" + $Scriptblock.ToString())
    }
    PROCESS {
        $powershell = [powershell]::Create().addscript($scriptblock).addargument($InputObject)
        $powershell.runspacepool=$pool
        $threads+= @{
            instance = $powershell
            handle = $powershell.begininvoke()
        }
    }
    END {
        $notdone = $true
        while ($notdone) {
            $notdone = $false
            for ($i=0; $i -lt $threads.count; $i++) {
                $thread = $threads[$i]
                if ($thread) {
                    if ($thread.handle.iscompleted) {
                        $thread.instance.endinvoke($thread.handle)
                        $thread.instance.dispose()
                        $threads[$i] = $null
                    }
                    else {
                        $notdone = $true
                    }
                }
            }
        }
    }
}

$ErrorActionPreference = "Stop";

$ComputerList = $(Read-Host "Enter the Location of the computerlist")
$Computers = Get-Content $ComputerList
$Computers |ForEach-Parallel -MaxThreads 100{
 try{
  if(Test-Path \\$_\C$ -ErrorAction silentlycontinue){
   $test = Get-Service -computerName $_ -Name "WPCSvc"| Where-Object {$_.status -eq "Stopped"}
   if($test){
    "Service stopped on: " + $_
    Set-Service WPCSvc -startuptype "Automatic" -computerName $_ 
    Start-Service -inputobject $test
    "Service started on: " + $_
   }
  } else{
   $_ + " - Machine not accessible"
  }
 }
 Catch{
   "Caught an exception!"
 }
}

At first glance it might seem confusing but it's dead simple. The whole first section is Tome's script and will handle the threading. My code starts with reading in a list of computers from a text file. Then we pipe those machines to Tome's function "ForEach-Parallel". In the following section I've put the code that will execute in each runspace.

I've used the same Get-Service, Set-Service and Start-Service cmdlets as in my first example. This time round, though, I added a check to see if the machine was accessible using Test-Path, plus error handling with a try/catch.


Final Thoughts

Threading is one of the most important features of any language and I was surprised how poorly it was implemented in PowerShell. Being new to PowerShell definitely didn't help, but the information online was also pretty poor. No one really covered all of the techniques in one place or explained which was better.

Personally I found runspace pooling the fastest, cleanest and easiest to use. Although Tome's script is a little bulky, it makes threading simple. I would love to hear other people's opinions on what worked best for them.

I'll hopefully be adding some more PowerShell blog posts in the future. For more security-orientated PowerShell, definitely check out PoshSec and Posh-SecMod.

Pwndizzle out.

19 comments:

  1. I would recommend to take a look at the PowerShell module SplitPipeline: https://github.com/nightroman/SplitPipeline

    The cmdlet Split-Pipeline is specifically designed for tasks like this. The extra scripting required ranges from minimal to none (compared with doing the same processing with a simple ForEach pipeline).

  2. Is there any way to pass more than the server name to the ForEach-Parallel function?
    I need to pass multiple parameters here:
    $Computers |ForEach-Parallel -MaxThreads 100{

    When I pass it as InputObject it throws an error. Is there any way around this?
    Thanks

  3. Roman - Thanks for the tip!

    Jay - ForEach-Parallel should be able to handle objects. The example below worked fine for me:

    $props = @{
    Property1 = 'one'
    Property2 = 'two'
    }
    $object = New-Object psobject -Property $props

    $object |ForEach-Parallel -MaxThreads 100{
    $_.Property1
    $_.Property2
    }

    Could you provide a simple example of what you're trying to do?

  4. I started multithreading using jobs and wrote one of the blog articles you might have mentioned in this post. PowerShell jobs are great because they are easy to understand, but they are very memory intensive and slow to start (because they are actually spawning a new instance of PowerShell). I found that multithreading with jobs for very quick tasks, like a Get-Process, actually ended up taking longer than running it Async!

    So I have written a new script that uses Runspaces like you have stated here that can read any of your existing scripts and run them multithreaded. I've put a lot of thought into it, and I really think that it's what PowerShell is missing. I am really hoping that in the next version there is a built in function close to what I have written.

    Find my script here: http://www.get-blog.com/?p=189

  5. Hey Ryan, I definitely agree with you, fingers crossed Microsoft take note.

    Also thanks for the post on jobs, although jobs didn't seem that fast, your script was by far the best jobs script I came across!

  7. I would like to point out that it would actually be faster to NOT use the pipeline whenever possible. The pipeline is terribly slow. A simple test to find even numbers:

    $sb = { $sw = [system.diagnostics.stopwatch]::StartNew(); (1..100000 | ?{$_ % 2 -eq 0} | Measure-Object).Count; $sw.Stop(); $sw.Elapsed }

    $sb2 = { $sw = [system.diagnostics.stopwatch]::StartNew(); $ct=0; for($i=1;$i -le 100000;$i++) { if($i % 2 -eq 0) { $ct++ } }; $ct; $sw.Stop(); $sw.Elapsed }

    the first takes 3033 milliseconds
    the second takes 58 milliseconds

    I also have an example of a file read improvement on the readme here:
    https://github.com/ghostsquad/GpClass

    which avoids get-content, etc, for native .NET classes. It's very fast.
