Metadata in a PDF file contains essential information about the document, such as its title, author, creation date, modification date, and keywords. Extracting this metadata is useful for applications such as document management systems, data analysis, and automation tasks. This article shows how to read metadata from a PDF using C#, with a list of steps followed by example code.
Steps to Read Metadata from PDF using C#
- Configure your IDE to use GroupDocs.Metadata for .NET, for example by adding the GroupDocs.Metadata NuGet package to your project
- Create a Metadata object, passing the path of the PDF file to its constructor
- Check that the file format is recognized and the document is not encrypted before reading its metadata
- Pass a predicate to the Metadata.FindProperties method to select the properties you need
- Loop through the returned properties and print the name and value of each one
Reading metadata from PDF files in C# gives developers access to document properties such as title, author, creation date, modification date, and keywords, which document management systems, data analysis, and automated workflows can build on. You can follow the above steps on Windows, macOS, or Linux as long as .NET is installed. No software beyond the library itself is needed to extract PDF metadata in C#. Once you have set up the recommended library and adjusted the file paths, you can integrate the following code into your projects.
Code to Read Metadata from PDF using C#
using System;
using System.Linq;
using System.Text.RegularExpressions;
using GroupDocs.Metadata;
using GroupDocs.Metadata.Common;
using GroupDocs.Metadata.Tagging;

namespace ReadMetadataFromPDFUsingCSharp
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Set the license to avoid the limitations of the Metadata library
            License lic = new License();
            lic.SetLicense(@"GroupDocs.Metadata.lic");

            // Pass the absolute or relative path of the document to the Metadata constructor
            using (Metadata metadata = new Metadata(@"input.pdf"))
            {
                // Continue only if the format is recognized and the document is not encrypted
                if (metadata.FileFormat != FileFormat.Unknown && !metadata.GetDocumentInfo().IsEncrypted)
                {
                    Console.WriteLine();

                    // Fetch all metadata properties that fall into a particular category
                    var properties = metadata.FindProperties(p => p.Tags.Any(t => t.Category == Tags.Content));
                    Console.WriteLine("The metadata properties describing some characteristics of the file content: title, keywords, language, etc.");
                    foreach (var property in properties)
                    {
                        Console.WriteLine("{0} = {1}", property.Name, property.Value);
                    }

                    // Fetch all properties having a specific type and value
                    var year = DateTime.Today.Year;
                    properties = metadata.FindProperties(p => p.Value.Type == MetadataPropertyType.DateTime &&
                                                              p.Value.ToStruct(DateTime.MinValue).Year == year);
                    Console.WriteLine("All datetime properties with the year value equal to the current year");
                    foreach (var property in properties)
                    {
                        Console.WriteLine("{0} = {1}", property.Name, property.Value);
                    }

                    // Fetch all properties whose names match the specified regex.
                    // Note: as written, ^ anchors only "author" and $ anchors only the last
                    // alternative, so "company" matches anywhere in a property name.
                    const string pattern = "^author|company|(.+date.*)$";
                    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
                    properties = metadata.FindProperties(p => regex.IsMatch(p.Name));
                    Console.WriteLine("All properties whose names match the following regex: {0}", pattern);
                    foreach (var property in properties)
                    {
                        Console.WriteLine("{0} = {1}", property.Name, property.Value);
                    }
                }
            }
        }
    }
}
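The name filter in the listing relies on regex alternation, which binds more loosely than the `^` and `$` anchors, so it is worth checking which names a pattern actually selects. The following is a minimal sketch that uses only the .NET regex classes, not GroupDocs.Metadata. The property names are hypothetical and chosen only for illustration, and the `Grouped` pattern is an assumed stricter variant, not part of the original example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexAnchorDemo
{
    // Pattern from the listing: "^" applies only to "author" and "$" only to the last alternative.
    public const string Ungrouped = "^author|company|(.+date.*)$";

    // Assumed stricter variant: the group makes both anchors apply to every alternative.
    public const string Grouped = "^(author|company|.+date.*)$";

    // Returns the names matched by the pattern, ignoring case.
    public static string[] Matching(string pattern, params string[] names)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        return names.Where(n => regex.IsMatch(n)).ToArray();
    }

    public static void Main()
    {
        // Hypothetical property names, used only to show the difference.
        string[] names =
        {
            "Author", "AuthorEmail", "Company", "ParentCompanyId",
            "CreatedDate", "Date", "Title"
        };

        Console.WriteLine("Ungrouped: " + string.Join(", ", Matching(Ungrouped, names)));
        Console.WriteLine("Grouped:   " + string.Join(", ", Matching(Grouped, names)));
    }
}
```

With these sample names, the ungrouped pattern selects Author, AuthorEmail, Company, ParentCompanyId, and CreatedDate, while the grouped variant selects only Author, Company, and CreatedDate. Neither pattern selects a property named exactly Date, because `.+` requires at least one character before "date". Which behavior is preferable depends on how broadly you want the filter to match.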
In conclusion, this article has shown how to get the metadata of a PDF in C#. Using the GroupDocs.Metadata library, developers can extract information such as the document title, author, creation date, modification date, and keywords from PDF documents. These techniques support applications for document management, data analysis, and automation tasks. We suggest experimenting with different PDF files and exploring additional metadata properties to extend what your C# applications can extract.
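One way to try this on several files is sketched below. It reuses only the GroupDocs.Metadata calls that appear in the example code and passes a predicate that always returns true, so every property is listed. The folder name `pdfs` is a hypothetical default, and license setup is omitted, so the library's unlicensed limitations would apply. Treat it as a starting point under those assumptions, not a definitive implementation.

```csharp
using System;
using System.IO;
using GroupDocs.Metadata;
using GroupDocs.Metadata.Common;

internal static class PdfMetadataDump
{
    private static void Main(string[] args)
    {
        // Hypothetical default folder; pass a real folder as the first argument.
        string folder = args.Length > 0 ? args[0] : "pdfs";
        if (!Directory.Exists(folder))
        {
            Console.WriteLine("Folder not found: {0}", folder);
            return;
        }

        foreach (string path in Directory.EnumerateFiles(folder, "*.pdf"))
        {
            Console.WriteLine("== {0} ==", Path.GetFileName(path));
            try
            {
                using (Metadata metadata = new Metadata(path))
                {
                    // Same guard as the main example: skip unknown or encrypted files.
                    if (metadata.FileFormat == FileFormat.Unknown ||
                        metadata.GetDocumentInfo().IsEncrypted)
                    {
                        Console.WriteLine("skipped (unknown format or encrypted)");
                        continue;
                    }

                    // A predicate that always returns true selects every property.
                    foreach (var property in metadata.FindProperties(p => true))
                    {
                        Console.WriteLine("{0} = {1}", property.Name, property.Value);
                    }
                }
            }
            catch (Exception ex)
            {
                // Report the failure and move on to the next file.
                Console.WriteLine("failed: {0}", ex.Message);
            }
        }
    }
}
```

Catching the exception per file keeps one damaged document from stopping the whole run. Once you know which properties matter, replace the always-true predicate with a narrower one.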
In a previous article, we provided a tutorial on extracting metadata from PPTX files using C#. For more detail on that topic, refer to our guide on how to read metadata from PPTX using C#.