Read Metadata from PDF using Java

PDF metadata records key details about a document, including its title, author, creation and modification dates, and keywords. Extracting it is useful in applications ranging from document management systems to data analysis and automation tasks. This article explains how to read metadata from PDF using Java, with a step-by-step breakdown of the procedure and example code.

Steps to Read Metadata from PDF using Java

Set up your IDE to use GroupDocs.Metadata for Java to extract metadata from PDF files
Instantiate a Metadata object, passing the PDF file path to its constructor
Define the specifications (rules) that decide which metadata properties to collect
Pass a specification as the condition to the Metadata.findProperties method
Iterate through the returned properties and print each name and value

Reading metadata from PDF files in Java gives developers access to document properties such as title, authorship, creation and modification dates, and keywords, which document management systems, data analysis, and automated workflows rely on. You can follow these instructions on Windows, macOS, or Linux, as long as Java is installed. No software other than the GroupDocs.Metadata library is needed to extract metadata of PDF in Java. After configuring the library and adjusting the license and input file paths, you can integrate the following code into your project.
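Since the file paths usually need adjusting per machine, it can help to build and check the input path before handing it to the library. The following is a minimal sketch that uses only the standard java.nio.file API; the class name InputPathSketch, its helper methods, and the "documents" folder are hypothetical names chosen for illustration and are not part of GroupDocs.Metadata.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: not part of GroupDocs.Metadata
class InputPathSketch {

    // Builds the path from separate segments so the same code works with
    // Windows and Unix-style separators.
    static Path resolveInput(String directory, String fileName) {
        return Paths.get(directory, fileName).toAbsolutePath().normalize();
    }

    // True only when the path points at an existing, readable regular file.
    static boolean isReadableFile(Path path) {
        return Files.isRegularFile(path) && Files.isReadable(path);
    }

    public static void main(String[] args) {
        // "documents" and "input.pdf" are placeholder names
        Path input = resolveInput("documents", "input.pdf");
        if (isReadableFile(input)) {
            System.out.println("Ready to open: " + input);
        } else {
            System.out.println("File not found or not readable: " + input);
        }
    }
}
```

The resulting path can be converted with toString() and used wherever the listing expects a file path string.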

Code to Read Metadata from PDF using Java

	import com.groupdocs.metadata.Metadata;
	import com.groupdocs.metadata.core.FileFormat;
	import com.groupdocs.metadata.core.IReadOnlyList;
	import com.groupdocs.metadata.core.MetadataProperty;
	import com.groupdocs.metadata.core.MetadataPropertyType;
	import com.groupdocs.metadata.licensing.License;
	import com.groupdocs.metadata.search.FallsIntoCategorySpecification;
	import com.groupdocs.metadata.search.OfTypeSpecification;
	import com.groupdocs.metadata.search.Specification;
	import com.groupdocs.metadata.tagging.Tags;
	import java.util.Calendar;
	import java.util.Date;
	import java.util.regex.Matcher;
	import java.util.regex.Pattern;

	public class ReadMetadataFromPDFUsingJava {
	    public static void main(String[] args) {

	        // Set License to avoid the limitations of Metadata library
	        License license = new License();
	        license.setLicense("GroupDocs.Metadata.lic");

	        Metadata metadata = new Metadata("input.pdf");
	        // Continue only if the format is recognized and the document is not encrypted
	        if (metadata.getFileFormat() != FileFormat.Unknown && !metadata.getDocumentInfo().isEncrypted()) {
	            System.out.println();

	            // Fetch all metadata properties that fall into a particular category
	            IReadOnlyList<MetadataProperty> properties = metadata.findProperties(new FallsIntoCategorySpecification(Tags.getContent()));
	            System.out.println("The metadata properties describing some characteristics of the file content: title, keywords, language, etc.");
	            for (MetadataProperty property : properties) {
	                System.out.println(String.format("Property name: %s, Property value: %s", property.getName(), property.getValue()));
	            }

	            // Fetch all properties having a specific type and value
	            int year = Calendar.getInstance().get(Calendar.YEAR);
	            properties = metadata.findProperties(new OfTypeSpecification(MetadataPropertyType.DateTime).and(new YearMatchSpecification(year)));
	            System.out.println("All datetime properties with the year value equal to the current year");
	            for (MetadataProperty property : properties) {
	                System.out.println(String.format("Property name: %s, Property value: %s", property.getName(), property.getValue()));
	            }

	            // Fetch all properties whose names match the specified regex
	            // (the pipes are plain alternation and must not be escaped)
	            Pattern pattern = Pattern.compile("^author|company|(.+date.*)$", Pattern.CASE_INSENSITIVE);
	            properties = metadata.findProperties(new RegexSpecification(pattern));
	            System.out.println(String.format("All properties whose names match the following regex: %s", pattern.pattern()));
	            for (MetadataProperty property : properties) {
	                System.out.println(String.format("Property name: %s, Property value: %s", property.getName(), property.getValue()));
	            }
	        }
	    }

	    // Define your own specifications to filter metadata properties.
	    // The nested classes are static so they can be created without an outer instance.
	    public static class YearMatchSpecification extends Specification {
	        private final int year;

	        public YearMatchSpecification(int year) {
	            this.year = year;
	        }

	        public final int getValue() {
	            return year;
	        }

	        // Accepts a property whose date value falls in the configured year
	        @Override
	        public boolean isSatisfiedBy(MetadataProperty candidate) {
	            Date date = candidate.getValue().toClass(Date.class);
	            if (date != null) {
	                Calendar calendar = Calendar.getInstance();
	                calendar.setTime(date);
	                return getValue() == calendar.get(Calendar.YEAR);
	            }
	            return false;
	        }
	    }

	    public static class RegexSpecification extends Specification {
	        private final Pattern pattern;

	        public RegexSpecification(Pattern pattern) {
	            this.pattern = pattern;
	        }

	        // Accepts a property if the pattern matches anywhere in its name
	        @Override
	        public boolean isSatisfiedBy(MetadataProperty metadataProperty) {
	            Matcher matcher = pattern.matcher(metadataProperty.getName());
	            return matcher.find();
	        }
	    }
	}
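The matching logic inside the two custom specifications depends only on standard Java classes, so it can be tried out without the library or a PDF file. The sketch below is an illustration under assumptions: it takes the intended name pattern to be ^author|company|(.+date.*)$, and the class PropertyFilterSketch, its methods, and the sample property names are hypothetical rather than taken from GroupDocs.Metadata.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.regex.Pattern;

// Hypothetical stand-alone class: not part of GroupDocs.Metadata
class PropertyFilterSketch {

    // Alternation binds loosely, so this pattern reads as three alternatives:
    // "^author" (name starts with author), "company" (anywhere in the name),
    // and "(.+date.*)$" (the word date preceded by at least one character).
    static final Pattern NAME_PATTERN =
            Pattern.compile("^author|company|(.+date.*)$", Pattern.CASE_INSENSITIVE);

    // Same check as RegexSpecification: find() accepts a match anywhere in the name.
    static boolean matchesName(String propertyName) {
        return NAME_PATTERN.matcher(propertyName).find();
    }

    // Same check as YearMatchSpecification: compares the calendar year of a date.
    static boolean isInYear(Date date, int year) {
        if (date == null) {
            return false;
        }
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(date);
        return calendar.get(Calendar.YEAR) == year;
    }

    public static void main(String[] args) {
        // Sample property names, made up for this demonstration
        String[] names = {"Author", "CreatedDate", "Company", "Title", "Date"};
        for (String name : names) {
            System.out.println(name + " -> " + matchesName(name));
        }

        // A mid-year date keeps the result independent of the local time zone
        Date sample = new GregorianCalendar(2020, Calendar.JUNE, 15).getTime();
        System.out.println("In 2020: " + isInYear(sample, 2020));
    }
}
```

Note that a property named exactly "Date" is not matched, because the third alternative requires at least one character before "date". If such names should be included, the pattern would need .* in place of .+ there.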


In summary, this article has shown how to get metadata of PDF in Java. With the Metadata library, developers can retrieve information such as document titles, author details, creation and modification dates, and keywords from PDF documents, and use it in applications for document management, data analysis, and automation. We encourage you to experiment with various PDF files and explore additional metadata properties.

In a previous article, we presented a detailed tutorial on extracting metadata from PPTX files using Java. To learn more, see our guide on how to read metadata from PPTX using Java.
