Read Metadata from PDF using C#

Metadata in PDF files contains essential information about the document, such as the title, author, creation date, modification date, and keywords. Extracting this metadata is useful for document management systems, data analysis, and automation tasks. This article explains how to read metadata from a PDF using C#, with an outline of the steps and example code.

Steps to Read Metadata from PDF using C#

Configure your IDE to use GroupDocs.Metadata for .NET for extracting metadata from PDF files
Create a Metadata object using the file path of the PDF file as a parameter for its constructor
Establish criteria for validating the collected metadata information
Specify a condition for using the Metadata.FindProperties method
Loop through each of the properties one by one

Extracting metadata from PDF files using C# gives developers access to document properties such as title, author, creation date, modification date, and keywords, which can be crucial for document management systems, data analysis, and automated workflows. You can follow the above instructions on Windows, macOS, or Linux as long as .NET is installed. There is no need to install extra software to extract the metadata of a PDF in C#. Once you set up the recommended library and adjust the file paths, you can integrate the following code into your projects.

Code to Read Metadata from PDF using C#

	using System;
	using System.Linq;
	using System.Text.RegularExpressions;
	using GroupDocs.Metadata;
	using GroupDocs.Metadata.Common;
	using GroupDocs.Metadata.Tagging;

	namespace ReadMetadataFromPDFUsingCSharp
	{
	    internal class Program
	    {
	        static void Main(string[] args)
	        {
	            // Set License to avoid the limitations of Metadata library
	            License lic = new License();
	            lic.SetLicense(@"GroupDocs.Metadata.lic");

	            // Pass absolute or relative path of document to Metadata's constructor
	            using (Metadata metadata = new Metadata(@"input.pdf"))
	            {
	                // Continue only if the format is recognized and the file is not encrypted
	                if (metadata.FileFormat != FileFormat.Unknown && !metadata.GetDocumentInfo().IsEncrypted)
	                {
	                    Console.WriteLine();

	                    // Fetch all metadata properties that fall into a particular category
	                    var properties = metadata.FindProperties(p => p.Tags.Any(t => t.Category == Tags.Content));
	                    Console.WriteLine("The metadata properties describing some characteristics of the file content: title, keywords, language, etc.");
	                    foreach (var property in properties)
	                    {
	                        Console.WriteLine("{0} = {1}", property.Name, property.Value);
	                    }

	                    // Fetch all properties having a specific type and value
	                    var year = DateTime.Today.Year;
	                    properties = metadata.FindProperties(p => p.Value.Type == MetadataPropertyType.DateTime &&
	                        p.Value.ToStruct(DateTime.MinValue).Year == year);
	                    Console.WriteLine("All datetime properties with the year value equal to the current year");
	                    foreach (var property in properties)
	                    {
	                        Console.WriteLine("{0} = {1}", property.Name, property.Value);
	                    }

	                    // Fetch all properties whose names match the specified regex
	                    const string pattern = "^author|company|(.+date.*)$";
	                    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
	                    properties = metadata.FindProperties(p => regex.IsMatch(p.Name));
	                    Console.WriteLine("All properties whose names match the following regex: {0}", pattern);
	                    foreach (var property in properties)
	                    {
	                        Console.WriteLine("{0} = {1}", property.Name, property.Value);
	                    }
	                }
	            }
	        }
	    }
	}
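
The name filter in the last step relies on regex alternation, which is easy to misread. The following is a minimal sketch, independent of GroupDocs.Metadata, that applies the pattern `^author|company|(.+date.*)$` to a handful of hypothetical property names (they are illustrative only and not taken from a real PDF). It assumes C# 9 or later so that top-level statements are available.

```csharp
using System;
using System.Text.RegularExpressions;

// Alternation has the lowest precedence in a regex, so '^' binds only to
// "author" and '$' binds only to the "(.+date.*)" branch.
const string pattern = "^author|company|(.+date.*)$";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

// Hypothetical property names, chosen only to illustrate the matching rules.
string[] names = { "Author", "Company", "CreatedDate", "Title", "CoAuthor", "MyCompanyName" };

foreach (string name in names)
{
    Console.WriteLine("{0,-15} {1}", name, NameMatches(name));
}

// Returns true when the regex finds a match anywhere in the name.
bool NameMatches(string name) => regex.IsMatch(name);
```

Here "Author", "Company", "CreatedDate", and "MyCompanyName" match, while "Title" and "CoAuthor" do not: "author" must appear at the start of the name, "company" may appear anywhere, and the date branch needs at least one character before "date". If whole-name matching is what you want, one option is to group the alternatives, as in `^(author|company|.+date.*)$`.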


In conclusion, this article has shown how to get the metadata of a PDF in C#. Using the GroupDocs.Metadata library, developers can extract information such as the document title, author, creation date, modification date, and keywords from PDF documents. These techniques support applications for document management, data analysis, and automation tasks. We suggest experimenting with different PDF files and exploring additional metadata properties to extend what your C# applications can extract.

Previously, we published an in-depth tutorial on extracting metadata from PPTX files using C#. For more on that topic, see our guide on how to read metadata from PPTX using C#.
