Binary Search and Powershell
Time for another Powershell Post.
So, what is a binary search and why should you care about it?
Well, it provides us a mechanism to very quickly search through an ordered list of “things” (like event log entries). It is especially useful when looking through LARGE amounts of data.
I know, I know.. There are all sorts of ways to look through/filter/search event logs and everyone has probably written about this already, but I guess I feel compelled to put in my 2 cents.
We ran into the need to be able to pull event log entries from a specific period of time on several servers.
There aren’t really any “native” Powershell methods to query Eventlogs from REMOTE servers (unless you are on CTP and hitting Vista/2K8 systems).
So a little info about a binary search:
Lets say we have a list of things as follows:
|
Index |
Name |
DOB |
|
0 |
Adam | 01/20/80 |
|
1 |
Cheryl | 09/16/77 |
|
2 |
Frank | 04/01/82 |
|
3 |
Ivan | 10/26/70 |
|
4 |
Suzy | 04/10/90 |
|
5 |
Tim | 11/21/62 |
|
6 |
Wendy | 02/02/71 |
And we are wanting to find which index “Suzy” is in the list.
One way would be to start at the top of the list and see if the Name matches. On a short list this may make sense since it is really only 5 compares before we find her.
But if I was looking for “Suzy” in a list the size of a phone book this can take a long time and a lot of horsepower to find her and may take 300,000 compares or more.
A binary search algorithm helps us by breaking the list down by using something we already know about the list: that it is sorted by name.
With a single compare, though, we can cut the list of viable options in half.
All we need to do is take the index of our smallest value [0] and we add it to the index of our largest element [6] and divide by 2, we get the index of our middle element [3]. Now lets compare what we are looking for (“Suzy”) to the value of element [3].
LowerBound Index = 0
UpperBound Index = 6
Mid Index = (6+0)\2 = 3
Lets take a look at this:
|
Index |
Name |
DOB |
|
0 |
Adam | 01/20/80 |
|
1 |
Cheryl | 09/16/77 |
|
2 |
Frank | 04/01/82 |
|
3 |
Ivan | 10/26/70 |
|
4 |
Suzy | 04/10/90 |
|
5 |
Tim | 11/21/62 |
|
6 |
Wendy | 02/02/71 |
Obviously “Suzy” is greater than our value at index [3] “Ivan”. Since we know this, we can throw out all indexes that are [3] and lower.
Our list of viable options now becomes this:
|
4 |
Suzy | 04/10/90 |
|
5 |
Tim | 11/21/62 |
|
6 |
Wendy | 02/02/71 |
Let’s repeat this exercise on our new list:
Lowest index = 4
Largest index = 6
Middle index = (6+4) /2 = 5
|
4 |
Suzy | 04/10/90 |
|
5 |
Tim | 11/21/62 |
|
6 |
Wendy | 02/02/71 |
So now we compare what we are looking for “Suzy” to our element [5] (“Tim”). Well, “Suzy is less than “Tim”, so that means we can throw out indexes that are [5] and higher.
|
4 |
Suzy | 04/10/90 |
Let’s go one more time…. oh wait.. .there is only one thing left….it’s “Suzy”. We have found her.
There ARE .NET methods (of [System.Array]) that enable us to do binary searches of Arrays for Strings. The only problem is that it will be looking for EXACT matches for the string.
This poses a problem for us if we are looking for something that is a little less exact.
Mark’s Modified Binary Search
Okay, leave it to me to mess with a good thing.
My problem is that I want to grab all events from the System event log that occurred between 8pm and 9pm last night.
This is a little more “fuzzy” than exact, so let me show you what I did….heh
I know you all are getting a little antsy for some code, but we will get there soon.. Stay with me.
function Get-DatedLogEntries([string]$ServerName, [string]$EventLogName, [datetime]$OldestTime, [datetime]$NewestTime) { #Grabbing my Eventlog Entries $EventLog = New-Object System.Diagnostics.EventLog($EventlogName) $EventLog.MachineName = $ServerName $Entries = $Eventlog.Entries
There are some cmdlets designed to return eventlog entries, but they are only really effective if you are querying the local server or if you are leveraging the “new” Powershell Remoting, but you have to be running against Vista or Server 2008.
Fortunately, there is a way to grab remote event log entries via .NET (as shown in the code above)
When done, $Entries will contain all of the events in the logfile sorted by Time.
Next, we will setup our “bound”ing values for our array of entries and the times we are looking for.
#Defining my starting boundaries of my array $Ubound = $Entries.count - 1 $Lbound = 0 $Mid = 0 #Setting up my dates $StartTime = $OldestTime $EndTime = $NewestTime
Now we are ready for the real work. Because I want to grab a range of events, I will run through the binary search two times. The first time will tell me what the UpperBound of my list is, and the second will tell me what the LowerBound of my list is as array indexes.
Before we start checking anything, we can make sure our search isn’t for naught.
Many systems have event logs that can roll pretty quickly, so we can just compare the oldest event to our start time. If the oldest event is newer than my start time, then my logs have rolled and I won’t have any data.
if ($Entries[0].TimeGenerated -lt $StartTime) { while (($Ubound - $Lbound) -gt 1) { $Mid = [int] ( ($Ubound + $Lbound) / 2 ) #Calculate my midpoint #Compare my midpoint to my StartTime if ($Entries[$Mid].TimeGenerated -lt $StartTime) { #If my midpoint is less than my Start time, then throw out all events #below and including my Midpoint $LBound = $Mid + 1 } elseif ($Entries[$Mid].TimeGenerated -gt $StartTime) { #If my midpoint is greater than my Start time, then throw out all events #above and including my Midpoint $Ubound = $Mid-1 } else { #If my midpoint is equal to my Start time, then I got lucky and found my time. #I just realized that I may need to do something else with this. May tackle that # later though... $Ubound = $Mid $LBound = $Mid } } # Now I know that the array index of our oldest item is here $Oldest = $LBound
Now we will repeat the process to find out LowerBound. I won’t document it too much since it is nearly identical code to above. If you have questions, please let me know.
$Ubound = $Entries.count - 1 $Lbound = 0 $Mid = 0 while (($Ubound - $Lbound) -gt 1) { $Mid = [int] ( ($Ubound + $Lbound) / 2 ) if ($Entries[$Mid].TimeGenerated -lt $EndTime) { $LBound = $Mid + 1 } elseif ($Entries[$Mid].TimeGenerated -gt $EndTime) { $Ubound = $Mid-1 } else { $Ubound = $Mid $LBound = $Mid } } $Newest = $LBound
So now we have the LowerBound index for the range I am looking for.
Now to finish it up…
# Sometimes we can end up with endpoints that don't meet our time range # We fix that by going through each side and adjusting them until they # are correct while ($Entries[$Newest].TimeGenerated -gt $Endtime) { $Newest-- } while ($Entries[$Oldest].TimeGenerated -lt $StartTime) { $Oldest++ } if (($Entries[$Newest].TimeGenerated -lt $StartTime) -and ($Entries[$Oldest].TimeGenerated -gt $EndTime)) { #Insert Code here if you want to do something when #no events found during the period requested.. $EntriesByDate = $NULL } else { #Create a new array and assign it the $Entries indexes ranging from our 'oldest' to our 'newest' $EntriesByDate = $Entries[$Oldest..$Newest] } } else { #Insert Code here if you want to do something when # Logs have rolled... and none were in the specified range. $EntriesByDate = $NULL } return $EntriesByDate }
I hope I was able to articulate this effectively. As always, let me know if you have questions, comments, or suggestions.
Thanks for Reading and happy shelling.
So, that is basically it. Here is the code in one big block:
## Get-DatedLogEntries Function ## Written by: Mark A. Weaver ## Website: www.vmweaver.com ## Version: 1.0 ## Date: 7/23/2009 ## Purpose: This Function will get event log entries from the specified server using currently logged in ## credentials and return an array of Events that occurred between the 2 times. ## Not much error checking or validation is done, so you please edit to your liking. ## ## Input: ## -ServerName "ServerName" ## -EventLogName "EventLogName" ## -OldestTime [DateTime]OldestTime ## -NewestTime [Datetime]NewestTime ## ## Output: ## Array of Event log entries or Null if none found ############################# ## Updates: ## ## ## ###################################################################### ###################################################################### function Get-DatedLogEntries([string]$ServerName, [string]$EventLogName, [datetime]$OldestTime, [datetime]$NewestTime) { #Grabbing my Eventlog Entries $EventLog = New-Object System.Diagnostics.EventLog($EventlogName) $EventLog.MachineName = $ServerName $Entries = $Eventlog.Entries #Defining my starting boundaries of my array $Ubound = $Entries.count - 1 $Lbound = 0 $Mid = 0 #Setting up my dates $StartTime = $OldestTime $EndTime = $NewestTime if ($Entries[0].TimeGenerated -lt $StartTime) { while (($Ubound - $Lbound) -gt 1) { $Mid = [int] ( ($Ubound + $Lbound) / 2 ) #Calculate my midpoint #Compare my midpoint to my StartTime if ($Entries[$Mid].TimeGenerated -lt $StartTime) { #If my midpoint is less than my Start time, then throw out all events #below and including my Midpoint $LBound = $Mid + 1 } elseif ($Entries[$Mid].TimeGenerated -gt $StartTime) { #If my midpoint is greater than my Start time, then throw out all events #above and including my Midpoint $Ubound = $Mid-1 } else { #If my midpoint is equal to my Start time, then I got lucky and found my time. #I just realized that I may need to do something else with this. May tackle that # later though... $Ubound = $Mid $LBound = $Mid } } # Now I know that the array index of our oldest item is here $Oldest = $LBound $Ubound = $Entries.count - 1 $Lbound = 0 $Mid = 0 while (($Ubound - $Lbound) -gt 1) { $Mid = [int] ( ($Ubound + $Lbound) / 2 ) if ($Entries[$Mid].TimeGenerated -lt $EndTime) { $LBound = $Mid + 1 } elseif ($Entries[$Mid].TimeGenerated -gt $EndTime) { $Ubound = $Mid-1 } else { $Ubound = $Mid $LBound = $Mid } } $Newest = $LBound while ($Entries[$Newest].TimeGenerated -gt $Endtime) { $Newest-- } while ($Entries[$Oldest].TimeGenerated -lt $StartTime) { $Oldest++ } if (($Entries[$Newest].TimeGenerated -lt $StartTime) -and ($Entries[$Oldest].TimeGenerated -gt $EndTime)) { #No events found during the period requested.. $EntriesByDate = $NULL } else { #Create a new array and assign it the $Entries indexes ranging from our 'oldest' to our 'newest' $EntriesByDate = $Entries[$Oldest..$Newest] } } else { # Logs have rolled... and none were in the specified range. $EntriesByDate = $NULL } return $EntriesByDate }