Searching through files for matching text strings (Powershell)


To totally unlock this section you need to Log-in

Sometimes we need to search in notes (such as quick .txt files) or text-based configuration files spread over our system to find something specific, but it can be very time-consuming and tedious to sift through many files, manually. Fortunately there is scripting on our side and, specially, Powershell scripting to aid in this usually very time consuming activity.

Let's consider a directory, "C:\Temp" with many text files created. Each of the files has random text data inside. We’re looking for only the files that contain one particular text string. Additionally, since we don’t know how many matches we are going to find, we’re going to create an array to store the found matches.In order to search for strings or string patterns, we’re going to use the cmdlet Select-String.

$Path = "C:\temp"
$Text = "This is the data that I am looking for"
$PathArray = @()
$Results = "C:\temp\test.txt"

### The following code snippet gets all the files in $Path that end in ".txt".

Get-ChildItem $Path -Filter "*.txt" |
Where-Object { $_.Attributes -ne "Directory"} |
ForEach-Object {
If (Get-Content $_.FullName | Select-String -Pattern $Text) {
$PathArray += $_.FullName
$PathArray += $_.FullName
}
}
Write-Host "Contents of ArrayPath:"
$PathArray | ForEach-Object {$_}

Here’s the breakdown: this will search the directory $Path for any items that include ".txt" in their names and that are not directories.

Get-ChildItem $Path -Filter "*.txt" | 
   Where-Object { $_.Attributes -ne "Directory"}

For every match it finds, it will check the contents of the match using Get-Content and verify any matches with $Text by using Select-String. If it finds a match, it puts the full name of the match into the $PathArray array.

 ForEach-Object { 
         If (Get-Content $_.FullName | Select-String -Pattern $Text) {
            $PathArray += $_.FullName
            $PathArray += $_.FullName
         }
      }

There you have it. The following is another way about exporting results to file:

If you want to export that all to a file instead of on the screen, then simply pipe it into an Out-File cmdlet. For example:

$PathArray | % {$_} | Out-File "C:\Some Folder\Some File.txt"

Searching through all subfolders: If you want to search all subfolders for additional *.txt files as well, then add -Recurse to the original Get-ChildItem command:

Get-ChildItem $Path -Filter "*.txt" -Recurse

The -Include approach

Get-Childitem includes two additional parameters, -Include and –Exclude: their functions are pretty simple and they can be very useful when searching for some specific file types.

The -Include parameter says, "Show me only these files in the search", and -Exclude says, "Keep that stuff out of my way."

As a bonus tip, remember that, to include the search in hidden folders and files, we need to add the –Force parameter to let it examine those folders/files as well.

Get-Childitem –Path C:\ -Recurse –Force -ErrorAction SilentlyContinue

We could now use this same command fo example to show only the Word documents that we can access and all the files that include "software" word in the filename.

Get-Childitem –Path C:\ -Include *software* -Recurse -ErrorAction SilentlyContinue

The above command will pull everything with the letters, software, in it, including folder titles. We can tell it to show only files by using PowerShell. This was introduced in version 3 of PowerShell.

Get-Childitem –Path C:\ -Include *software* -File -Recurse -ErrorAction SilentlyContinue

We can also use the the -Exclude parameter to say, "Don’t show me any TMP, MP3, or JPG" files.:

Get-Childitem –Path C:\ -Include *software* -Exclude *.JPG,*.MP3,*.TMP -File -Recurse -ErrorAction SilentlyContinue

Filtering by date

Finally, in PowerShell we can also filter out files based upon date and time quite easily.

Right now, let’s get the date for which we will do the search, for example 1 November 2020.

$FindDate=Get-Date -Year 2020 -Month 11 -Day 1

With this information, we can first off target two things. First, show me all Word documents, files only, on the entire C: drive, and keep the error messages to yourself, PowerShell.

Get-ChildItem -Path C:\ -Include *.doc,*.docx -File -Recurse -ErrorAction SilentlyContinue

In the below example, we can use Where-Object to show only files that were created since the day that we stored in $FindDate. This will include everything since 12:00 AM the morning of that day. We will compare the list against the LastWriteTime property, which is the "Last Time the File was Written to".

Get-ChildItem -Path C:\ -Include *.doc,*.docx -File -Recurse -ErrorAction SilentlyContinue | Where-Object { $_.LastWriteTime -ge $FindDate }

We can also filter on a specific day with the following approach, by using the AddDays() method to our $FindDate and give it a range of 24 hours:

Get-ChildItem -Path C:\ -Include *.doc,*.docx -File -Recurse -ErrorAction SilentlyContinue | Where-Object { $_.LastWriteTime -ge $FindDate -and $_.LastWriteTime -le $Finddate.adddays(1) }

Finally we can also specify multiple paths in which we will do the search following, for example, the below command:

Get-ChildItem -Path C:\Users, D:\Example1, E:\Example2 -Include *.doc,*.docx -File -Recurse -ErrorAction SilentlyContinue | Where-Object { $_.LastWriteTime -ge $FindDate -and $_.LastWriteTime -le $Finddate.adddays(1) }

Searching content in files

A first method to search text string in files (not in filenames) we can use the Select-String cmdlet that expects the path as well as the pattern parameter to be strings, so we do not need to use quotation marks for either the pattern or the path. For example, we can use the following command to search the C:\fso folder for files that have the .txt file extension, and contain a pattern match for "success" text string:

Select-String -Path C:\fso\*.txt -pattern success

The output of this command will be zero or more lines in the format of file.txt:1:success that will show, in this example, that at Line 1 it founds the "success" text string, in the file.txt.

Select-String cmdlet also support, for -Path and -Pattern, array of strings, so we can, for example, also execute commands like the following, to search content in multiple file types or files and/or for multiple patterns:

Select-String -Path C:\fso\*.txt, C:\fso\*.log -pattern success

Select-String -Path C:\fso\*.txt, C:\fso\*.log -pattern success,failure

In addition to directly using the -Path parameter in the Select-String cmdlet, we can use the Get-Childitem cmdlet for more granular control over the files to be parsed.

In the following command, we use the dir command (an alias for the Get-ChildItem cmdlet, and the -R and -I switches are the same of the -Recurse and -Include used by Get-ChildItem) and provide the path of C:\fso, then we include only .txt and .log files and finally we will pipe the results to the Select-String cmdlet and look for the pattern fail (-Pattern is the default parameter and therefore is omitted in the command). The long version of the command is shown here:

Get-ChildItem -Path C:\fso -Include *.txt, *.log -Recurse | Select-String -Pattern success

Here is an example of the shorter form of the command:

dir C:\fso -I *.txt, *.log -R | Select-String success

Search in Office files

The last part of this article will be focused on how to search content in Microsoft Office files, because the Select-String approach will not work for these kind of files.

A COM approach (ComObject) in Powershell to get the ability to read an search in Worksheets in Excel is the following (the COM approach is usually slow and it requires to have Microsoft Office installed on the system on which the script will run, so keep all of this in mind):

Set-StrictMode -Version latest
Function Search-Excel {
    [cmdletbinding()]
    Param (
        [parameter(Mandatory)]
        [ValidateScript({
            Try {
                If (Test-Path -Path $_) {$True}
                Else {Throw "$($_) is not a valid path!"}
            }
            Catch {
                Throw $_
            }
        })]
        [string]$Source,
        [parameter(Mandatory)]
        [string]$SearchText
        #You can specify wildcard characters (*, ?)
    )
    $Excel = New-Object -ComObject Excel.Application
    $Files = Get-Childitem $Source -Include *.xlsx,*.xls -Recurse | Where-Object { !($_.psiscontainer) }
    Foreach ($File In $Files)
        {
        Try {
            $Source = Convert-Path $File
        }
        Catch {
            Write-Warning "Unable locate full path of $($Source)"
            BREAK
        }
        $Workbook = $Excel.Workbooks.Open($File)
        ForEach ($Worksheet in @($Workbook.Sheets)) {
            # Find Method https://msdn.microsoft.com/en-us/vba/excel-vba/articles/range-find-method-excel
            $Found = $WorkSheet.Cells.Find($SearchText) #What
            If ($Found) {
                # Address Method https://msdn.microsoft.com/en-us/vba/excel-vba/articles/range-address-property-excel
                $BeginAddress = $Found.Address(0,0,1,1)
                #Initial Found Cell
                [pscustomobject]@{
                    WorkSheet = $Worksheet.Name
                    Column = $Found.Column
                    Row =$Found.Row
                    Text = $Found.Text
                    Address = $BeginAddress
                    Path = $File.FullName
                }
                Do {
                    $Found = $WorkSheet.Cells.FindNext($Found)
                    $Address = $Found.Address(0,0,1,1)
                    If ($Address -eq $BeginAddress) {
                        BREAK
                    }
                    [pscustomobject]@{
                        WorkSheet = $Worksheet.Name
                        Column = $Found.Column
                        Row =$Found.Row
                        Text = $Found.Text
                        Address = $Address
                        Path = $File.FullName
                    }                 
                } Until ($False)
            }
            Else {
                Write-Warning "[$($WorkSheet.Name)] Nothing Found!"
            }
        }
        $workbook.close($false)
    }
    [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$excel)
    [gc]::Collect()
    [gc]::WaitForPendingFinalizers()
    Remove-Variable excel -ErrorAction SilentlyContinue
}

To use the above function, for .xlsx and .xls files, we can see the following example (the function will recurse in all folders and subforlders if you will give a simple path with no Excel file specified):

### It will search for the text string defined in the Excel file specified:
Search-Excel -Source "C:\Results\Try.xlsx" -SearchText Try

### It will recurse for all Excel files and will search for the text string specified:
Search-Excel -Source "C:\Results\" -SearchText Try

Here it is, instead, a way to search a specific string in multiple Word files, like Excel files above, in a function form:

Set-StrictMode -Version latest
Function Search-Word {
    [cmdletbinding()]
    Param (
        [parameter(Mandatory)]
        [ValidateScript({
            Try {
                If (Test-Path -Path $_) {$True}
                Else {Throw "$($_) is not a valid path!"}
            }
            Catch {
                Throw $_
            }
        })]
        [string]$Source,
        [parameter(Mandatory)]
        [string]$SearchText
        #You can specify wildcard characters (*, ?)
    )
    $Word = New-Object -ComObject Word.Application
    $Files = Get-Childitem $Source -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) }
    Foreach ($File In $Files) {
        Try {
            $Source = Convert-Path $File
        }
        Catch {
            Write-Warning "Unable locate full path of $($Source)"
            BREAK
        }

        $Document = $Word.Documents.Open($File.FullName,$false,$true)
        $Range = $Document.Content

            If($Range.Text -Match $SearchText){

                [pscustomobject]@{
                    File = $File.FullName
                    Match = $SearchText
                    Text = $Matches[0] 
                }
            }
            Else {
                Write-Warning "[$($File.FullName)]: Nothing Found!"
            }
        
        $Document.close($false)
    }
    [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$Word)
    [gc]::Collect()
    [gc]::WaitForPendingFinalizers()
    Remove-Variable Word -ErrorAction SilentlyContinue
}

To use the above function, for .docx and .doc files, we can see the following example (the function will recurse in all folders and subforlders if you will give a simple path with no Word file specified):

### It will search for the text string defined in the Word file specified:
Search-Word -Source "C:\Results\Try.docx" -SearchText computer

### It will recurse for all Word files and will search for the text string specified:
Search-Word -Source "C:\Results\" -SearchText computer

The DocumentFormat.OpenXml approach

Another way of searching and extracting data with Powershell will be using DocumentFormat.OpenXml assembly and importing it in Powershell, and then extracting properties and methods to manipulate Word, Excel and Powerpoint files, but this will be investigated more in a next article.

The main features of DocumentFormat.OpenXml are:

  • High-performance generation of word-processing documents, spreadsheets, and presentations.
  • Populating content in Word files from an XML data source.
  • Splitting up (shredding) a Word or PowerPoint file into multiple files, and combining multiple Word/PowerPoint files into a single file.
  • Extraction of data from Excel documents.
  • Searching and replacing content in Word/PowerPoint using regular expressions.
  • Updating cached data and embedded spreadsheets for charts in Word/PowerPoint.
  • Document modification, such as removing tracked revisions or removing unacceptable content from documents.

Summary
Article Name
Searching through files for matching text strings (Powershell)
Description
Let's see some Powershell ways to search for text or files, also in Word and Excel programs, on a Windows system (server and client). A scripting way to this common tedious activity (searching for files and text in files) would be useful to not waste time, most of the time.
Author
Publisher Name
Heelpbook.net