To totally unlock this section you need to Log-in
Sometimes we need to search in notes (such as quick .txt files) or text-based configuration files spread over our system to find something specific, but it can be very time-consuming and tedious to sift through many files, manually. Fortunately there is scripting on our side and, specially, Powershell scripting to aid in this usually very time consuming activity.
Let's consider a directory, "C:\Temp" with many text files created. Each of the files has random text data inside. We’re looking for only the files that contain one particular text string. Additionally, since we don’t know how many matches we are going to find, we’re going to create an array to store the found matches.In order to search for strings or string patterns, we’re going to use the cmdlet Select-String.
$Path = "C:\temp" $Text = "This is the data that I am looking for" $PathArray = @() $Results = "C:\temp\test.txt" ### The following code snippet gets all the files in $Path that end in ".txt". Get-ChildItem $Path -Filter "*.txt" | Where-Object { $_.Attributes -ne "Directory"} | ForEach-Object { If (Get-Content $_.FullName | Select-String -Pattern $Text) { $PathArray += $_.FullName $PathArray += $_.FullName } } Write-Host "Contents of ArrayPath:" $PathArray | ForEach-Object {$_}
Here’s the breakdown: this will search the directory $Path for any items that include ".txt" in their names and that are not directories.
Get-ChildItem $Path -Filter "*.txt" | Where-Object { $_.Attributes -ne "Directory"}
For every match it finds, it will check the contents of the match using Get-Content and verify any matches with $Text by using Select-String. If it finds a match, it puts the full name of the match into the $PathArray array.
ForEach-Object { If (Get-Content $_.FullName | Select-String -Pattern $Text) { $PathArray += $_.FullName $PathArray += $_.FullName } }
There you have it. The following is another way about exporting results to file:
If you want to export that all to a file instead of on the screen, then simply pipe it into an Out-File cmdlet. For example:
$PathArray | % {$_} | Out-File "C:\Some Folder\Some File.txt"
Searching through all subfolders: If you want to search all subfolders for additional *.txt files as well, then add -Recurse to the original Get-ChildItem command:
Get-ChildItem $Path -Filter "*.txt" -Recurse
The -Include approach
Get-Childitem includes two additional parameters, -Include and –Exclude: their functions are pretty simple and they can be very useful when searching for some specific file types.
The -Include parameter says, "Show me only these files in the search", and -Exclude says, "Keep that stuff out of my way."
As a bonus tip, remember that, to include the search in hidden folders and files, we need to add the –Force parameter to let it examine those folders/files as well.
Get-Childitem –Path C:\ -Recurse –Force -ErrorAction SilentlyContinue
We could now use this same command fo example to show only the Word documents that we can access and all the files that include "software" word in the filename.
Get-Childitem –Path C:\ -Include *software* -Recurse -ErrorAction SilentlyContinue
The above command will pull everything with the letters, software, in it, including folder titles. We can tell it to show only files by using PowerShell. This was introduced in version 3 of PowerShell.
Get-Childitem –Path C:\ -Include *software* -File -Recurse -ErrorAction SilentlyContinue
We can also use the the -Exclude parameter to say, "Don’t show me any TMP, MP3, or JPG" files.:
Get-Childitem –Path C:\ -Include *software* -Exclude *.JPG,*.MP3,*.TMP -File -Recurse -ErrorAction SilentlyContinue
Filtering by date
Finally, in PowerShell we can also filter out files based upon date and time quite easily.
Right now, let’s get the date for which we will do the search, for example 1 November 2020.
$FindDate=Get-Date -Year 2020 -Month 11 -Day 1
With this information, we can first off target two things. First, show me all Word documents, files only, on the entire C: drive, and keep the error messages to yourself, PowerShell.
Get-ChildItem -Path C:\ -Include *.doc,*.docx -File -Recurse -ErrorAction SilentlyContinue
In the below example, we can use Where-Object to show only files that were created since the day that we stored in $FindDate. This will include everything since 12:00 AM the morning of that day. We will compare the list against the LastWriteTime property, which is the "Last Time the File was Written to".
Get-ChildItem -Path C:\ -Include *.doc,*.docx -File -Recurse -ErrorAction SilentlyContinue | Where-Object { $_.LastWriteTime -ge $FindDate }
We can also filter on a specific day with the following approach, by using the AddDays() method to our $FindDate and give it a range of 24 hours:
Get-ChildItem -Path C:\ -Include *.doc,*.docx -File -Recurse -ErrorAction SilentlyContinue | Where-Object { $_.LastWriteTime -ge $FindDate -and $_.LastWriteTime -le $Finddate.adddays(1) }
Finally we can also specify multiple paths in which we will do the search following, for example, the below command:
Get-ChildItem -Path C:\Users, D:\Example1, E:\Example2 -Include *.doc,*.docx -File -Recurse -ErrorAction SilentlyContinue | Where-Object { $_.LastWriteTime -ge $FindDate -and $_.LastWriteTime -le $Finddate.adddays(1) }
Searching content in files
A first method to search text string in files (not in filenames) we can use the Select-String cmdlet that expects the path as well as the pattern parameter to be strings, so we do not need to use quotation marks for either the pattern or the path. For example, we can use the following command to search the C:\fso folder for files that have the .txt file extension, and contain a pattern match for "success" text string:
Select-String -Path C:\fso\*.txt -pattern success
The output of this command will be zero or more lines in the format of file.txt:1:success that will show, in this example, that at Line 1 it founds the "success" text string, in the file.txt.
Select-String cmdlet also support, for -Path and -Pattern, array of strings, so we can, for example, also execute commands like the following, to search content in multiple file types or files and/or for multiple patterns:
Select-String -Path C:\fso\*.txt, C:\fso\*.log -pattern success
Select-String -Path C:\fso\*.txt, C:\fso\*.log -pattern success,failure
In addition to directly using the -Path parameter in the Select-String cmdlet, we can use the Get-Childitem cmdlet for more granular control over the files to be parsed.
In the following command, we use the dir command (an alias for the Get-ChildItem cmdlet, and the -R and -I switches are the same of the -Recurse and -Include used by Get-ChildItem) and provide the path of C:\fso, then we include only .txt and .log files and finally we will pipe the results to the Select-String cmdlet and look for the pattern fail (-Pattern is the default parameter and therefore is omitted in the command). The long version of the command is shown here:
Get-ChildItem -Path C:\fso -Include *.txt, *.log -Recurse | Select-String -Pattern success
Here is an example of the shorter form of the command:
dir C:\fso -I *.txt, *.log -R | Select-String success
Search in Office files
The last part of this article will be focused on how to search content in Microsoft Office files, because the Select-String approach will not work for these kind of files.
A COM approach (ComObject) in Powershell to get the ability to read an search in Worksheets in Excel is the following (the COM approach is usually slow and it requires to have Microsoft Office installed on the system on which the script will run, so keep all of this in mind):
Set-StrictMode -Version latest Function Search-Excel { [cmdletbinding()] Param ( [parameter(Mandatory)] [ValidateScript({ Try { If (Test-Path -Path $_) {$True} Else {Throw "$($_) is not a valid path!"} } Catch { Throw $_ } })] [string]$Source, [parameter(Mandatory)] [string]$SearchText #You can specify wildcard characters (*, ?) ) $Excel = New-Object -ComObject Excel.Application $Files = Get-Childitem $Source -Include *.xlsx,*.xls -Recurse | Where-Object { !($_.psiscontainer) } Foreach ($File In $Files) { Try { $Source = Convert-Path $File } Catch { Write-Warning "Unable locate full path of $($Source)" BREAK } $Workbook = $Excel.Workbooks.Open($File) ForEach ($Worksheet in @($Workbook.Sheets)) { # Find Method https://msdn.microsoft.com/en-us/vba/excel-vba/articles/range-find-method-excel $Found = $WorkSheet.Cells.Find($SearchText) #What If ($Found) { # Address Method https://msdn.microsoft.com/en-us/vba/excel-vba/articles/range-address-property-excel $BeginAddress = $Found.Address(0,0,1,1) #Initial Found Cell [pscustomobject]@{ WorkSheet = $Worksheet.Name Column = $Found.Column Row =$Found.Row Text = $Found.Text Address = $BeginAddress Path = $File.FullName } Do { $Found = $WorkSheet.Cells.FindNext($Found) $Address = $Found.Address(0,0,1,1) If ($Address -eq $BeginAddress) { BREAK } [pscustomobject]@{ WorkSheet = $Worksheet.Name Column = $Found.Column Row =$Found.Row Text = $Found.Text Address = $Address Path = $File.FullName } } Until ($False) } Else { Write-Warning "[$($WorkSheet.Name)] Nothing Found!" } } $workbook.close($false) } [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$excel) [gc]::Collect() [gc]::WaitForPendingFinalizers() Remove-Variable excel -ErrorAction SilentlyContinue }
To use the above function, for .xlsx and .xls files, we can see the following example (the function will recurse in all folders and subforlders if you will give a simple path with no Excel file specified):
### It will search for the text string defined in the Excel file specified: Search-Excel -Source "C:\Results\Try.xlsx" -SearchText Try ### It will recurse for all Excel files and will search for the text string specified: Search-Excel -Source "C:\Results\" -SearchText Try
Here it is, instead, a way to search a specific string in multiple Word files, like Excel files above, in a function form:
Set-StrictMode -Version latest Function Search-Word { [cmdletbinding()] Param ( [parameter(Mandatory)] [ValidateScript({ Try { If (Test-Path -Path $_) {$True} Else {Throw "$($_) is not a valid path!"} } Catch { Throw $_ } })] [string]$Source, [parameter(Mandatory)] [string]$SearchText #You can specify wildcard characters (*, ?) ) $Word = New-Object -ComObject Word.Application $Files = Get-Childitem $Source -Include *.docx,*.doc -Recurse | Where-Object { !($_.psiscontainer) } Foreach ($File In $Files) { Try { $Source = Convert-Path $File } Catch { Write-Warning "Unable locate full path of $($Source)" BREAK } $Document = $Word.Documents.Open($File.FullName,$false,$true) $Range = $Document.Content If($Range.Text -Match $SearchText){ [pscustomobject]@{ File = $File.FullName Match = $SearchText Text = $Matches[0] } } Else { Write-Warning "[$($File.FullName)]: Nothing Found!" } $Document.close($false) } [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$Word) [gc]::Collect() [gc]::WaitForPendingFinalizers() Remove-Variable Word -ErrorAction SilentlyContinue }
To use the above function, for .docx and .doc files, we can see the following example (the function will recurse in all folders and subforlders if you will give a simple path with no Word file specified):
### It will search for the text string defined in the Word file specified: Search-Word -Source "C:\Results\Try.docx" -SearchText computer ### It will recurse for all Word files and will search for the text string specified: Search-Word -Source "C:\Results\" -SearchText computer
The DocumentFormat.OpenXml approach
Another way of searching and extracting data with Powershell will be using DocumentFormat.OpenXml assembly and importing it in Powershell, and then extracting properties and methods to manipulate Word, Excel and Powerpoint files, but this will be investigated more in a next article.
The main features of DocumentFormat.OpenXml are:
- High-performance generation of word-processing documents, spreadsheets, and presentations.
- Populating content in Word files from an XML data source.
- Splitting up (shredding) a Word or PowerPoint file into multiple files, and combining multiple Word/PowerPoint files into a single file.
- Extraction of data from Excel documents.
- Searching and replacing content in Word/PowerPoint using regular expressions.
- Updating cached data and embedded spreadsheets for charts in Word/PowerPoint.
- Document modification, such as removing tracked revisions or removing unacceptable content from documents.