OpenAI


OpenAI and Windows Resource to HTML form

I wanted to see if OpenAI could convert a dialog box windows resource to an HTML form. Here is the dialog code followed by resulting HTML code:

				
IDD_BOOKMARK_FROM_LINKS DIALOGEX 0, 0, 211, 125
STYLE DS_SETFONT | DS_MODALFRAME | DS_FIXEDSYS | WS_POPUP | WS_CAPTION | WS_SYSMENU
CAPTION "Bookmark from links"
FONT 8, "MS Shell Dlg", 400, 0, 0x1
BEGIN
DEFPUSHBUTTON   "OK",IDOK,24,104,50,14
PUSHBUTTON      "Cancel",IDCANCEL,79,104,50,14
GROUPBOX        "Options",IDC_STATIC,7,52,197,44
CONTROL         "Remove leaders and page numbers",IDC_REMOVE_LEADERS_AND_PAGE_NUMBERS,
"Button",BS_AUTOCHECKBOX | WS_TABSTOP,19,80,152,10
CONTROL         "Use only first line of text from the link",IDC_USE_ONLY_FIRST_LINE_OF_TEXT_FROM_LINK,
"Button",BS_AUTOCHECKBOX | WS_TABSTOP,19,67,169,10
LTEXT           "From:",IDC_STATIC,15,20,20,8
EDITTEXT        IDC_TRANSCRIPT_PAGE_FROM,40,18,33,14,ES_AUTOHSCROLL | ES_NUMBER
CONTROL         "",IDC_TRANSCRIPT_PAGE_FROM_SPIN,"msctls_updown32",UDS_SETBUDDYINT | UDS_ALIGNRIGHT | UDS_AUTOBUDDY | UDS_ARROWKEYS,51,5,11,14
LTEXT           "To:",IDC_STATIC,88,20,10,8
CONTROL         "end of file",IDC_TO_END_OF_FILE,"Button",BS_AUTORADIOBUTTON | WS_GROUP,106,15,47,10
CONTROL         "page:",IDC_TO_PAGE,"Button",BS_AUTORADIOBUTTON,106,27,34,10
EDITTEXT        IDC_TRANSCRIPT_PAGE_TO,140,26,33,14,ES_AUTOHSCROLL | ES_NUMBER
CONTROL         "",IDC_TRANSCRIPT_PAGE_TO_SPIN,"msctls_updown32",UDS_SETBUDDYINT | UDS_ALIGNRIGHT | UDS_AUTOBUDDY | UDS_ARROWKEYS,160,17,11,14
GROUPBOX        "Pages",IDC_STATIC,7,4,197,43
PUSHBUTTON      "Help",IDC_BOOKMARKS_FROM_LINKS_HELP,135,104,50,14
END

				
					

Bookmark from links







Options

Remove leaders and page numbers


Use only first line of text from the link

From:



Pages

end of file


page:

We are going to port some of our plug-in code to a cloud environment, so I was looking at how this could be automated, or at least semi-automated. The results above are completely unedited. In the next few days we will look at the underlying C++ code to see whether that conversion can be automated in any way.
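For comparison, a rule-based converter for the simplest controls might look like the following minimal Python sketch. It is not what OpenAI produced and it is not our production code: the function name rc_line_to_html and the choice of HTML elements are assumptions made for illustration, and only DEFPUSHBUTTON, PUSHBUTTON, LTEXT, and EDITTEXT statements are handled.

```python
import html
import re

# Matches simple statements such as: PUSHBUTTON "Cancel",IDCANCEL,79,104,50,14
SIMPLE = re.compile(r'^(DEFPUSHBUTTON|PUSHBUTTON|LTEXT)\s+"([^"]*)",\s*(\w+)')
# Matches: EDITTEXT IDC_SOMETHING,x,y,w,h,styles
EDIT = re.compile(r'^EDITTEXT\s+(\w+)')


def rc_line_to_html(line):
    """Convert one resource-script control line to an HTML snippet.

    Returns None for statements this sketch does not understand
    (GROUPBOX, CONTROL, and so on). The element mapping is an
    illustrative assumption, not a complete translation.
    """
    line = line.strip()
    match = SIMPLE.match(line)
    if match:
        kind, text, ctrl_id = match.groups()
        label = html.escape(text)
        if kind == "LTEXT":
            return f"<label>{label}</label>"
        # Treat the dialog's default button as the form's submit button.
        button_type = "submit" if kind == "DEFPUSHBUTTON" else "button"
        return f'<button type="{button_type}" id="{ctrl_id}">{label}</button>'
    match = EDIT.match(line)
    if match:
        # ES_NUMBER restricts the edit box to digits, so use a number input.
        input_type = "number" if "ES_NUMBER" in line else "text"
        return f'<input type="{input_type}" id="{match.group(1)}">'
    return None
```

Positions and sizes are ignored here; a real converter would also need to translate dialog units into CSS layout and handle the generic CONTROL statement used for check boxes, radio buttons, and spin controls.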


Variable Data Printing Standards


Variable Data Printing Standards: what matters in 2025

Updated: 16 Aug 2025 • Reading time: ~8 minutes

Variable Data Printing (VDP) shows up in statements, labels, loyalty mailers, shipping docs, tickets, and more. Standards make these jobs reliable across tools, DFEs, and presses. This guide explains which standards exist, what each one is for, and how to choose the right path for your jobs.

TL;DR

PDF/VT is the ISO standard for VDP. Today that means PDF/VT‑1 and ‑2 on PDF 1.6, and PDF/VT‑3 on PDF 2.0 with PDF/X‑6. See the PDF Association’s overview of ISO 16612 and ISO’s page for PDF/X‑6 (ISO 15930‑9).
PPML is an XML VDP language from the PODi community, still supported in some workflows. The spec is hosted by PRINT Technologies: PPML 2.2 PDF.
AFP/IPDS remains common in high‑volume transactional print. See the AFP Consortium and an update on the IPDS reference.
JDF/JMF (and XJDF) handle job tickets and messaging for workflow automation. Start at CIP4 and the JDF/XJDF specifications.
Color management uses ICC profiles. See ISO 15076‑1 and the latest ICC profile spec ICC.1:2022‑05.
In practice, PDF workflows dominate modern VDP because RIPs optimize reused static resources well. See Global Graphics’ white paper High‑performance VDP using PDF.

The landscape at a glance

Area	Primary standard(s)	What it does	Typical use
VDP document format	PDF/VT (VT‑1/2 on PDF 1.6; VT‑3 on PDF 2.0 via PDF/X‑6)	Encodes variable and transactional jobs in PDF with record structure and metadata	General VDP, statements, labels, hybrid jobs
Alternative VDP PDL	PPML	XML language for personalized print	Installed bases, legacy or vendor‑specific flows
Transactional print architecture	AFP with IPDS	PDL plus bi‑directional printer stream	Banks, utilities, large mailers
Job tickets and workflow	JDF/JMF and XJDF	Job intent, resources, status messaging	MIS to DFE automation
Print‑ready PDF base	PDF/X‑4 (ISO 15930‑7), PDF/X‑5 (ISO 15930‑8), PDF/X‑6 (ISO 15930‑9)	Exchange profiles that PDF/VT builds on	Prepress handoff, proofing
Color management	ISO 15076‑1 and ICC.1:2022‑05	Device‑independent color and output consistency	All of the above
Barcodes in VDP	GS1 DataMatrix, GS1 Digital Link	2D carriers for retail and supply chain	Labels, POS, track‑and‑trace

PDF/VT today

What it is. PDF/VT specifies how to carry variable and transactional jobs in PDF, defining document structure and metadata so devices and DFEs can process large runs reliably. Earlier profiles PDF/VT‑1 and PDF/VT‑2 are based on PDF/X‑4 and PDF/X‑5 (PDF 1.6). The newest profile, PDF/VT‑3, aligns with PDF 2.0 via PDF/X‑6. See the PDF Association’s ISO 16612 page and ISO’s listings for 16612‑3 and 15930‑9.

Why you care. PDF/VT works with mainstream PDF RIPs and supports transparency, ICC color, and external resources per the underlying PDF/X rules.

Tip: If you are moving to PDF 2.0 in prepress, target PDF/X‑6 and PDF/VT‑3 so the whole toolchain is consistent.

PPML in brief

PPML is an XML‑based language for VDP created in the PODi community and maintained under PRINT Technologies. If you inherit PPML jobs and your DFE handles them well, conversion is optional. See the spec PPML 2.2. For the historical PPML/VDX exchange format, see the CGATS application note CGATS.20 PPML/VDX and the ANSI preview here.

AFP and IPDS for high‑volume work

AFP (MO:DCA) is a mature architecture aimed at large, fast, reliable transactional print. IPDS is the bi‑directional stream between host and printer for page data and feedback. See the AFP Consortium and IPDS updates here.

Workflow plumbing: JDF, JMF, and XJDF

JDF is the job ticket, JMF is the messaging. They coordinate job intent, resources, and device status across MIS, prepress, and finishing. XJDF is the modernized successor. Start at CIP4 and the specification library.

How PDF/VT and PDF/X fit together

PDF/X‑4 and X‑5 are based on PDF 1.6. X‑5 allows external content and n‑colorant profiles. See ISO 15930‑7 and ISO 15930‑8.
PDF/X‑6 is based on PDF 2.0 and is the foundation for PDF/VT‑3. See ISO 15930‑9.

Color management

Use ICC v4 or later profiles aligned with ISO 15076. Keep device profiles stable and embed or reference correctly per your PDF/X flavor. See ISO 15076‑1 and the latest ICC spec ICC.1:2022‑05.

Barcodes inside VDP: GS1 2D

GS1 DataMatrix guideline: overview and technical intro.
GS1 Digital Link standard: spec and general intro here.

Performance tips that matter on press

Reuse static content aggressively. Cache logos, backgrounds, and fonts so your RIP does less work per record. See Global Graphics’ VDP performance guide.
Stay within PDF/X expectations. Avoid unwanted overprints, ensure correct transparency handling, and size images for final use. See the Ghent Workgroup specs.
Use record structure when your DFE benefits from it. PDF/VT’s DPart and DPM are designed for this. See the PDF Association’s Technical Introduction to PDF/VT.
Designer and developer guidance for efficient VDP PDFs is available from the PDF Association: Best Practice (Designer Edition) and the Global Graphics overview Best Practice VDP.
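To make the idea of record structure concrete, here is a small conceptual model in Python. It is only a sketch: the class DPartNode, the function pages_matching, and the "postcode" key are hypothetical names, and the code models the concept (a tree whose leaves map records to page spans, with metadata attached) rather than the actual PDF syntax defined by the standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DPartNode:
    """Conceptual stand-in for one node of a document part hierarchy."""
    metadata: dict = field(default_factory=dict)  # plays the role of DPM
    pages: Optional[range] = None                 # page span, leaf nodes only
    children: List["DPartNode"] = field(default_factory=list)


def pages_matching(node, key, value):
    """Collect page indexes of every leaf whose metadata has key == value."""
    pages = []
    if node.pages is not None and node.metadata.get(key) == value:
        pages.extend(node.pages)
    for child in node.children:
        pages.extend(pages_matching(child, key, value))
    return pages


# Three recipient records; the first and last share a postcode.
job = DPartNode(children=[
    DPartNode({"postcode": "AB1"}, range(0, 2)),
    DPartNode({"postcode": "CD2"}, range(2, 5)),
    DPartNode({"postcode": "AB1"}, range(5, 6)),
])
```

With a structure like this, a DFE or downstream tool can pick out or route whole records by metadata instead of guessing where one recipient's pages end and the next begin.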

Choosing the right standard

You have…	Consider	Why
Modern PDF workflow and RIPs, need flexibility across devices	PDF/VT on PDF/X‑4 or X‑6	Plays to PDF strengths and DFE optimization. Easy to preflight and archive.
Existing PPML devices or legacy jobs	PPML	Keep what works if your engine supports it and performance is acceptable.
Enterprise transactional environment with AFP fleet	AFP/IPDS	Tight integration, throughput, and reliability with existing infrastructure.
Complex automation from MIS to finishing	JDF/JMF or XJDF	Ticketing plus device messaging across vendors.

Validation and testing

Preflight against the correct PDF/X flavor before calling a job PDF/VT compliant. See GWG preflight guidance and callas’ white paper Everything you ever needed to know about PDF preflight.
Use test suites when moving platforms. The PDF Association hosts the Cal Poly PDF/VT‑1 Test File Suite.
Barcode verification for GS1 2D is essential on labels and retail. Follow sizing and quiet‑zone rules in the GS1 DataMatrix guideline.

Quick FAQ

Is PDF/VT mandatory for VDP?
No. It is the ISO way to do VDP in PDF, and it aligns well with today’s PDF/X and RIPs. Many shops still run PPML or AFP where it fits.

What changed with PDF/VT‑3?
VT‑3 aligns VDP with PDF 2.0 and PDF/X‑6, which clarifies features and interoperability in modern toolchains. See ISO 16612‑3 and ISO 15930‑9.

Do I need JDF if I already use PDF/VT?
Not required. JDF/JMF or XJDF handles workflow and device messaging, while PDF/VT carries pages and records. Many shops use both.

Where Mapsoft fits

If you want to generate personalized PDFs from data, our Engage tools do that from templates and a data source, producing PDF ready for print workflows.

Engage Connect – server and web integration for VDP and web to print.
Mail Merge for PDF – desktop workflow to create personalized PDF output.

Sources and further reading

PDF/VT overview and resources: PDF Association; ISO 16612‑3.
PDF/X family: PDF Association; ISO entries for 15930‑7 (PDF/X‑4), 15930‑8 (PDF/X‑5), 15930‑9 (PDF/X‑6).
PPML and PPML/VDX: PPML 2.2; CGATS.20 App Note; PPML/VDX info.
AFP/IPDS: AFP Consortium; IPDS reference update.
ICC color: ISO 15076‑1; ICC.1:2022‑05.
GS1: GS1 DataMatrix Guideline; GS1 Digital Link Standard.
VDP performance with PDF: Global Graphics white paper.
Best practices for VDP PDFs: PDF Association; Global Graphics Best Practice VDP.
PDF/VT test files: Cal Poly PDF/VT‑1 Test Suite.


What is a PDF file? Unlock the Benefits Today!



What is a PDF file? A PDF (Portable Document Format) file represents documents in a manner independent of the hardware, software, and operating system used to create them. This portability allows any device with a PDF reader to open and view PDF files.

PDF builds on the imaging model of the PostScript page description language, but unlike PostScript it is a structured file format rather than a programming language. A PDF file comprises four main elements:

A header that identifies the version of the PDF format (for example, %PDF-1.7).
A body that contains the objects making up the document’s content. Text, images, vector graphics, fonts, and other types of data can all be included, along with optional structures such as an outline (bookmarks) that lets the user navigate through the document.
A cross-reference table that lists the byte offset of each object in the file. It acts as an index, allowing the PDF reader to locate specific objects quickly without reading the whole file.
A trailer that gives the location of the cross-reference table and the total number of entries in it, and points to key objects such as the document catalog, the document information dictionary, and the encryption dictionary (if the file is encrypted).
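The header and the end of the trailer are plain text, so they are easy to inspect. The following Python sketch reads the version from the header and the cross-reference offset that follows the final startxref keyword. The function name inspect_pdf and the skeletal SAMPLE bytes are illustrative assumptions; SAMPLE is not a complete, viewable PDF.

```python
import re

# A skeletal file: header, one object, cross-reference table, trailer.
SAMPLE = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n45\n%%EOF\n"
)


def inspect_pdf(data: bytes):
    """Return (version, xref_offset) for the given PDF bytes."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    if header is None:
        raise ValueError("missing %PDF- header")
    # The last 'startxref' gives the byte offset of the newest
    # cross-reference section (files may be updated incrementally).
    offsets = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    if not offsets:
        raise ValueError("missing startxref entry")
    return header.group(1).decode("ascii"), int(offsets[-1])


version, xref_offset = inspect_pdf(SAMPLE)
print(version, xref_offset)
```

A reader works in this order too: it jumps to the end of the file, follows the offset to the cross-reference data, and from there locates only the objects it needs.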

PDF files can also have extra features like interactive forms, annotations, and hyperlinks.

A PDF file’s internal structure consists of a series of objects that represent the document’s various elements, such as text, images, and interactive elements. The document object model is a tree-like structure that organizes these objects to represent the logical structure of a document and to define the relationships between its various objects.

A PDF file contains a series of dictionaries and streams that contain metadata and other information about the document. This includes information such as the document’s author, title, and subject, as well as the fonts and colors used.

PDF files have a specific structure that includes a variety of objects.

Document information dictionary: Contains metadata about the PDF document such as the title, author, subject, and keywords.
Page tree: A tree-like structure that defines the layout and order of the PDF document’s pages.
Pages: A page object represents each page in a PDF document, defining the size and orientation of the page as well as any content displayed on the page.
Content streams: These streams contain the PDF document’s actual content, such as text, images, and graphics. They are usually compressed to reduce the size of the PDF file.
Resources: Objects that the content streams use to display the content of the PDF document. Fonts, images, patterns, and color spaces are examples of resources.
Annotations: Interactive elements such as links, buttons, and form fields that are added to the pages of a PDF document.
Outlines: These objects define the PDF document’s hierarchical outline (bookmarks), allowing the user to navigate through the document by clicking on headings or other defined elements.

PDF files include metadata, which provides information about the document not contained in the content itself. This can include keywords, the author of the document, and the software used to create it.

You can use a variety of software tools to create PDF files, including Adobe Acrobat, LibreOffice, and Ghostscript. Adobe Acrobat or other PDF viewers allow for viewing and editing these files.

Mapsoft and PDF

If you’re looking for a company with unparalleled expertise in PDF, look no further than Mapsoft. Our Technical Director, Michael Peters, was instrumental in developing the first-ever PDF Export for Adobe PageMaker in collaboration with Adobe Systems, Inc. With years of experience in this domain, we have an array of plug-ins that operate seamlessly within Adobe Acrobat, and we’re also proud to be an OEM licensee of the Adobe PDF Library. Whether you need customized PDF solutions or products, we’ve got you covered. Get in touch with us today to learn more.


Why Plugins Matter?


Plugging Plug-ins – Why Third-Party Software Matters

Any professional racing driver will tell you that there’s no such thing as too much power. Give them a new, 1000-horsepower engine and after 5 laps, they’ll pull into the pits and say: “Great, but can you give me 1100bhp?” It’s just the same with software – especially software that’s as versatile as Adobe Acrobat and CC products such as Adobe InDesign, Adobe Illustrator and Adobe Photoshop.

The Inevitable Limitations of Software Applications

No matter how powerful, flexible or easy-to-use the application, as soon as users get to grips with it, they’ll find it doesn’t quite do exactly what they want it to. Or they’ll want it to be just that bit easier to do a certain function or perhaps be able to batch functions together. This isn’t greed, or customers being niggly – on the contrary, it’s actually a compliment that the original application is proving useful. It simply underlines that there’s no such thing as the perfect program.

Bridging User Needs with Third-Party Solutions

Users often don’t express their exact needs initially. Instead, they highlight desired improvements to existing solutions. This scenario opens opportunities for third-party developers and their plugins. These developers typically engage closely with user communities, such as forums, to understand their needs. Questions like “How can I do this?” or “Is there a tool for that?” signal potential market gaps.

For instance, repeated requests to mask sensitive information in PDF documents indicate a demand for new solutions. This was the case for Mapsoft. By aligning closely with the Adobe user community, Mapsoft identified and filled such needs, expanding its range of plugins.

Among its offerings, Impress Pro stands out. This plugin allows adding text stamps to documents, serving as watermarks or headers and footers. Other innovative solutions include MaskIt, for hiding confidential content, and DogEars, a tool that marks pages for easy reference, akin to a physical bookmark. Additionally, TOCBuilder offers the creation of a linked and printable table of contents, enhancing document navigation.

So what should you look for in a third-party developer?

Evaluating a Developer’s Endorsement and Partnerships

Firstly, consider if the developer is endorsed by the main vendor’s partner programme. This is crucial. For instance, Mapsoft, an Adobe Business Partner, boasts over 30 years of experience developing plugins for Adobe products.

Assessing Product Integration with Main Vendor’s Technology

Secondly, evaluate how the developer’s products integrate with the main vendor’s technology. Products should be developed using the main vendor’s core technology to ensure reliability and seamless functionality. Mapsoft exemplifies this by licensing Adobe’s core technology for their plugins and customized products.

Opportunities for Product Evaluation

Thirdly, check if the product can be evaluated before purchase. This is vital to ensure it meets user needs. Developers confident in their solutions typically offer evaluation versions. Mapsoft, for example, provides free evaluation versions of all their plugins on their website.

Developer Support and User References

Finally, consider the developer’s support and the availability of user references. This indicates a long-term commitment to quality and customer satisfaction. With over 30 years in the sector and partnerships with high-profile companies like Network Rail, Xerox, and Hallmark Cards, Mapsoft demonstrates its expertise and dedication. They also offer one year of free support for their software solutions.

Conclusion

By keeping these points in mind, you can ensure you choose effective and reliable plugins that enhance your main application, streamline tasks, and add valuable features and functionality.


Is PDF accessible?


Overview

Accessibility in software refers to the design and development of software that is usable by people with disabilities. Keyboard shortcuts, screen reader compatibility, and high contrast modes are examples of such features. It also includes ensuring that the software can be used with assistive technology, such as screen readers and magnifiers, and that it can be navigated using only a keyboard. Text-to-speech and speech-to-text functionality can also be included in accessible software, making it easier for people with disabilities to interact with the software.

Accessible software is essential because it ensures that everyone, regardless of ability, can use and benefit from it. Making sure that people with disabilities have equal access to information and technology is not only a legal requirement, but also a moral imperative.

The Portable Document Format (PDF) is a file format developed by Adobe Systems. PDF makes it possible to distribute documents with original formatting intact. PDF files are created by scanning an original print document or by using a variety of popular software applications.

Accessibility

The popularity of PDF has created concerns about accessibility, particularly for users of screen readers and for those who have low vision. While Adobe has taken steps to permit access to those who use screen readers, it is essential that documents be correctly marked up (commonly referred to as “tagged”) so that screen readers have the information they need to identify items such as headings and alt text for images. Tables must also be marked up so that screen reader users can navigate them and clearly understand the association of data with appropriate column and row names.

Tagged PDF

Few authors are currently creating tagged PDF files, either because this requires additional effort or because of lack of awareness. Authors are also limited by the capabilities of their word processing or desktop publishing tools, many of which have PDF export capabilities that do not currently support tagged PDF. Microsoft Office, particularly with its most recent versions, does provide good PDF exporting, assuming that appropriate styles are used when first creating a document in Word.

Available Documentation

Adobe provides accessibility documentation at adobe.com/accessibility. Among other resources available from this site, Adobe has developed a variety of Acrobat accessibility training resources that describe in detail the process of creating accessible PDF documents using Word, InDesign, and Acrobat.

Support In Operating Systems

PDF accessibility also requires support from operating system and assistive technology developers. In Microsoft Windows, both JAWS and NVDA support tagged PDF. However, there is currently no support for tagged PDF in other operating systems.

Is PDF the Correct Choice of Format

Despite advances in accessibility, many users and advocacy groups continue to recommend that PDF documents be accompanied, or replaced, by alternative format documents that are more universally accessible, such as HTML. PDF unfortunately is still not indexed as well as HTML and so if content is to be used for SEO then it is often converted to HTML.


Adobe PDF Base-14 Fonts


A number of fonts are included with Adobe Acrobat and therefore don’t need to be embedded in PDF files. In our products Impress, Impress Pro and TOCBuilder these fonts are marked in the font lists in Red:

4 font sets in the Helvetica family: normal, bold, italic (oblique), and bold italic, at any size. The XSL-FO “sans-serif” font family is normally mapped to “Helvetica”.
4 font sets in the Times family: normal, bold, italic, and bold italic, at any size. The XSL-FO “serif” font family is normally mapped to “Times”.
4 font sets in the Courier family: normal, bold, italic (oblique), and bold italic, at any size. The XSL-FO “monospace” font family is normally mapped to “Courier”.
1 font set in the Symbol family: normal, at any size. “Symbol” is normally used for Greek letters and some symbols like: Ω, φ, ≠, ©.
1 font set in the ZapfDingbats family: normal, at any size. “ZapfDingbats” is normally used for Zapf dingbats like: ✌, ✍, ❀, ☺.
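The five families above add up to 14 fonts. As a quick reference, the Python sketch below lists their PostScript names as given in the PDF specification and checks whether a font name is one of them. The helper name needs_embedding is an assumption for illustration and is not part of any Mapsoft product.

```python
# PostScript names of the standard 14 fonts.
BASE_14 = frozenset({
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def needs_embedding(font_name: str) -> bool:
    """Return True when the font is not one of the Base-14 fonts."""
    return font_name not in BASE_14
```

Note that the check is on the exact PostScript name, so "Times-Roman" is recognised while a family name such as "Times" or a lookalike such as "Arial" is not.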

Also see our blog on the demise of Type 1 fonts: Type 1 Font Support Ending


Camelot Project – the Precursor to PDF and Acrobat


The Camelot Project

J. Warnock

This document describes the base technology and ideas behind the project named “Camelot.” This project’s goal is to solve a fundamental problem that confronts today’s companies. The problem is concerned with our ability to communicate visual material between different computer applications and systems. The specific problem is that most programs print to a wide range of printers, but there is no universal way to communicate and view this printed information electronically. The popularity of FAX machines has given us a way to send images around to produce remote paper, but the lack of quality, the high communication bandwidth and the device specific nature of FAX has made the solution less than desirable. What industries badly need is a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks. These documents should be viewable on any display and should be printable on any modern printers. If this problem can be solved, then the fundamental way people work will change.

The invention of the PostScript language has gone a long way to solving this problem. PostScript is a device independent page description language. Adobe’s PostScript interpreter has been implemented on over 100 commercially available printer products. These printer products include color machines, high resolution machines, high speed machines and low-cost machines. Over 4000 applications output their printed material to PostScript machines. This support for PostScript as a standard make the PostScript solution a candidate for this electronic document interchange.

Within the PostScript and Display PostScript context the “view and print anywhere” problem has been implemented and solved. Since most applications have PostScript print drivers, documents from a wide variety of applications can be viewed from operating systems that use Display PostScript. PostScript files can be shipped around communication networks and printed remotely. “Encapsulated PostScript” is a type of PostScript file that can be used by many applications to include a PostScript image as part of a page the application builds.

The reason the Display PostScript and PostScript solutions are not a total solution in today’s world is that this solution requires powerful desktop machines and PostScript printers. The Display PostScript and PostScript solutions are the correct long-term solution as the power of machines increases over time, but this solution offers little help for the vast majority of today’s users with today’s machines.

The Camelot Project is an attempt to define technologies and products that will give the value that Display PostScript and PostScript delivers to the vast number of installed machines that exists today. For the purposes of this discussion these machines include 640K Intel 286/386/486 machines (PC compatibles), Apple Macintosh machines, mainframes, and workstations. The displays must include CGA, EGA, VGA and any other higher resolution or color displays supported by the above machines.

Our vision for Camelot is to provide a collection of utilities, applications, and system software so that a corporation can effectively capture documents from any application, send electronic versions of these documents anywhere, and view and print these documents on any machines.

There are at least two technical approaches to the Camelot project. Both solutions depend on the PostScript technology. One approach is to try to make Display PostScript and PostScript implementations smaller and faster so that they can run on the vast majority of today’s machines. This approach has been tried and is extremely difficult.

A second approach is to divide the problem into smaller problems. This approach would allow each piece to run independently on the smaller machines while achieving acceptable performance and a solution for the complete problem. This latter approach requires that the problem be divided in a way that is natural for users, and provides a solution for every user. An approach to the Camelot project will now be described that will divide the problem into smaller pieces. This solution depends on a unique property of the PostScript language.

PostScript, as an interpretive language, has some properties that other interpretive languages do not have. In particular, the semantics of operators is not fixed. Operators can be redefined to have any desired behavior. This property of PostScript allows the execution of a PostScript file to have side effects that are very different from the normal printing of a page. An example might be instructive. Suppose a PostScript file draws 10 sided polygon with the following PostScript procedure:

				
/poly
    {1 0 moveto
        /ang 36 def
        10 {ang cos ang sin lineto
            /ang ang 36 add def
        }repeat
    }def

This procedure will build a path that is a ten sided polygon. In this procedure the verbs: “moveto” and “lineto” have the standard semantics of building a PostScript path within the PostScript Language.

By redefining “moveto” and “lineto” very different things can happen. For example, if these operators are defined as follows:

				
/moveto
    {exch writenumber writenumber (moveto) writestring}def
/lineto
    {exch writenumber writenumber (lineto) writestring}def

then when the “poly” procedure is executed a file is written that has the following contents:

				
1.0 0.0 moveto
0.809 0.588 lineto
0.309 0.951 lineto
-0.309 0.951 lineto
-0.809 0.588 lineto
-1.0 0.0 lineto
-0.809 -0.588 lineto
-0.309 -0.951 lineto
0.309 -0.951 lineto
0.809 -0.588 lineto
1.0 0.0 lineto

In this example the new redefined “moveto” and “lineto” definitions don’t build a path. Instead they write out the coordinates they have been given and then write out the names of their own operations. The resulting file that is written by these new definitions draws the same polygon as the original file but only uses the “moveto” and “lineto” operators. Here, the execution of the PostScript file has allowed a derivative file to be generated. In some sense this derivative file is simpler and uses fewer operators than the original PostScript file but has the same net effect. We will call this operation of processing one PostScript file into another form of PostScript file “rebinding.”
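The same rebinding idea can be imitated outside PostScript. The Python sketch below is an analogy rather than part of the original paper: the procedure looks its drawing operators up in a table, and the table is filled with versions that record their arguments instead of building a path. The names poly, make_rebound_ops, and fmt are assumptions chosen for this illustration.

```python
import math


def fmt(value):
    """Round to three places; adding 0.0 turns a rounded -0.0 into 0.0."""
    return str(round(value, 3) + 0.0)


def poly(ops):
    """Port of the /poly procedure; `ops` maps operator names to callables."""
    ops["moveto"](1.0, 0.0)
    ang = 36
    for _ in range(10):
        rad = math.radians(ang)
        ops["lineto"](math.cos(rad), math.sin(rad))
        ang += 36


def make_rebound_ops(out):
    """Build 'rebound' operators that append text lines to `out`."""
    def emit(name):
        def op(x, y):
            out.append(f"{fmt(x)} {fmt(y)} {name}")
        return op
    return {"moveto": emit("moveto"), "lineto": emit("lineto")}


lines = []
poly(make_rebound_ops(lines))
print("\n".join(lines))
```

Running poly with a different table, for example one whose operators draw on a canvas, would leave the procedure itself untouched, which is the property the rebinding approach relies on.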

The above example illustrates a capability of the PostScript language that is not frequently used. This “rebinding” of the language, however, is extremely valuable. The Camelot project depends on variations on this idea.

The approach we will take with Camelot is to define a new language of operators and conventions. For the purposes of this discussion we will call this language “Interchange PostScript” or IPS. IPS will primarily contain the graphics and imaging operators of PostScript. The language will be defined so that any IPS file is a valid PostScript file. The file will have the appropriate baggage so that it is a valid EPS file. IPS files will print on PostScript printer and will be able to be used by applications that accept EPS files. IPS will also be structured so that the complete PostScript parser is not necessary to read any file written in IPS. IPS will have an adequate set of operators so that any practical document expressed in PostScript can be represented in IPS. There will be situations in IPS where the IPS file cannot represent visual situations that can be theoretically generated in PostScript. However we believe these situations are extremely rare, and all practical application documents can be represented efficiently in IPS. The right way to think about IPS is as it relates to English. No person in the world knows every English word, but a small subset of the English words, and certain usage patterns enable people to consistently communicate.

Once we have defined IPS, we will build a version of the PostScript interpreter (IPS binder) that will read any PostScript file and rebind that file into an IPS file. The IPS binder can be quite small in that it does not need the graphics, font or device machinery contained in full PostScript interpreter. Another function of the IPS binder will be to include reconstituted fonts into the IPS file. The idea here is to include just the characters of a font that are actually used in the document. A result of including the necessary characters from the fonts used is that an IPS file will be completely self contained. In other words, when I send a file around the country, I don’t have to worry about whether the receiving location has all the fonts required by the document. The current situation is that complex font substitution schemes are used to deal with locations not having the appropriate fonts.
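The first step in including only the characters that are actually used is to work out which characters each font needs. The Python sketch below shows that bookkeeping under simplifying assumptions: the document is represented as a list of (font name, text) runs, and the function name used_characters is hypothetical. Extracting the corresponding glyph outlines from a real font is not shown.

```python
from collections import defaultdict


def used_characters(runs):
    """Map each font name to the sorted characters drawn with it.

    `runs` is an iterable of (font_name, text) pairs, an assumed
    simplification of the text-showing operations in a document.
    """
    used = defaultdict(set)
    for font_name, text in runs:
        used[font_name].update(text)
    return {name: "".join(sorted(chars)) for name, chars in used.items()}


subset = used_characters([
    ("Times-Roman", "Camelot"),
    ("Helvetica", "IPS"),
    ("Times-Roman", "came"),
])
print(subset)
```

Only the characters collected for each font would then need to travel with the file, which is what keeps a self-contained document from growing by the size of every complete font it touches.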

Once IPS is defined and the IPS binder implemented, then users can capture any PostScript file emitted by a PostScript driver, and convert that file to a self contained IPS file. This file can be shipped anywhere around the network and printed on any PostScript machine (management utilities will be written to ease this printing process.)

In addition to the IPS binder, a viewer and browser will be written that will read IPS files, and render those files on displays or to dumb raster printers. It is believed that IPS interpreters can be substantially simpler, and smaller than full PostScript interpreters. It is also believed that an IPS interpreter can have acceptable performance on small machines. The real hope is to make the IPS viewer and browser small enough so that it can co-exist with other applications. It is interesting to think about what those applications can be.

One obvious application for the IPS viewer is in its use in electronic mail systems. Imagine being able to send full text and graphics documents (newspapers, magazine articles, technical manuals etc.) over electronic mail distribution networks. These documents could be viewed on any machine and any selected document could be printed locally. This capability would truly change the way information is managed. Large centrally maintained databases of documents could be accessed remotely and selectively printed remotely. This would save millions of dollars in document inventory costs.

Specific large visual data bases like the value-line stock charts, encyclopedias, atlases, Military maps, Service Manuals, Time-Life Books etc. could be shipped on CD-ROM’s with a viewer. This would allow full publication (text, graphics, images and all) to be viewed and printed across a very large base of machines.

Imagine if the IPS viewer is also equipped with text searching capabilities. In this case the user could find all documents that contain a certain word or phrase, and then view that word or phrase in context within the document. Entire libraries could be archived in electronic form, and since IPS files are self-contained, would be printable at any location.

One of the central requirements of the Camelot Project is that the IPS file format is device independent. This is essential because it is necessary to be able to print the documents on color or black and white machines — on low or high resolution machines. This requirement is also essential in order to visualize the documents at various magnifications on the screen. For example, it is imperative that the user be able to magnify portions of complex maps, so that subportions of the image are easy to read even on low resolution displays.

To accomplish the above requirement it is necessary that consistent font rendering machinery be available to the viewer. For this reason the viewers will need to contain the full ATM implementations as part of each system.

In considering all the requirements of corporations regarding documents, it is important to structure Camelot components so that they can be sold in ways that are useful to the corporations. Several ideas have come to mind.

Components of Camelot are generally not interesting to single users. The exception to this is in the distribution of large generally useful databases. If someone produced a CD-ROM with “maps of the world” on it, then one can imagine selling a retail package with one viewer and the CD-ROM.

In most other applications, the distribution of information is to many people. In these latter cases a corporation would like a copy of the viewer for every PC. One can imagine viewers integrated into mail systems, or as general stand-alone browsing systems. In any event corporations should be interested in site-licensing arrangements.
(more to come)

Author: John Warnock

Editor for the purposes of this page: Michael Peters

Associated Links:

History of PDF

A Short History of PDF (Portable Document Format)

Adobe Systems made the PDF specification available free of charge in 1993. In the early years PDF was popular mainly in desktop publishing workflows, and the first PDF export was created for PageMaker 5 by Mapsoft. PDF competed with a variety of formats such as DjVu, Envoy, Common Ground Digital Paper, Farallon Replica and even Adobe’s own PostScript format.

Released as an ISO standard

PDF was a proprietary format controlled by Adobe until it was released as an open standard on July 1, 2008, and published by the International Organization for Standardization as ISO 32000-1:2008,[5][6] at which time control of the specification passed to an ISO Committee of volunteer industry experts. In 2008, Adobe published a Public Patent License to ISO 32000-1 granting royalty-free rights for all patents owned by Adobe that are necessary to make, use, sell, and distribute PDF-compliant implementations.[7]

PDF 1.7, the sixth edition of the PDF specification and the version that accompanied Acrobat 8, became ISO 32000-1. It includes some proprietary technologies defined only by Adobe, such as Adobe XML Forms Architecture (XFA) and the JavaScript extension for Acrobat, which are referenced by ISO 32000-1 as normative and indispensable for the full implementation of the ISO 32000-1 specification. These proprietary technologies are not standardized, their specifications are published only on Adobe’s website, and many of them are not supported by popular third-party implementations of PDF.

In December 2020, the second edition of PDF 2.0, ISO 32000-2:2020, was published, including clarifications, corrections and critical updates to normative references.[13] ISO 32000-2 does not include any proprietary technologies as normative references.[14]

Information taken in part from Wikipedia

Author: Michael Peters

Mapsoft is a member of the PDF Association https://pdfa.org/.

Summary of the Structure of PDF files

Deeper Insight into the Complex Structure of PDF Files and Their Key Components.

The structure of the PDF file format can be viewed as a combination of different file types presented in a single container. This is because a PDF file contains text, vector art, images and fonts, and other file formats can be embedded as well, even the native files that were used to create the PDF in the first place.

The complex structure of PDF files consists of objects that can be connected to each other directly or indirectly. Indirection is often used because an object is needed multiple times, as would be the case for a logo, font or color.

The objects within a PDF file can be divided into the following types:

Dictionaries

A group of key/value entries containing direct objects or references to indirect objects. Dictionaries can be seen as the glue holding together the elements in a PDF file. A typical page dictionary has the following entries:

The Contents stream has an attributes dictionary that contains a filter name and the length of the stream.

The CropBox array contains the coordinates of the rectangle that defines the area that is visible on the page.

The MediaBox array contains the coordinates of the rectangle that defines the media size. This will typically match a standard media size such as Letter or A4 and will allow the PDF page to be reliably printed on a device that contains these standard media sizes.

The Resources dictionary contains references and information for elements that are needed to reliably output the visual elements of the page such as colors, fonts and images.
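
To make the entries above more concrete, here is a minimal Python sketch that models a page dictionary and writes it out in PDF dictionary syntax. Everything in it is an assumption made for illustration: the `Ref` class, the `to_pdf` helper, the object numbers and the US Letter box values are hypothetical and are not taken from any real file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as '12 0 R' (object number, generation)."""
    num: int
    gen: int = 0


# Hypothetical page dictionary; object numbers and sizes are made up.
page = {
    "Type": "Page",
    "Parent": Ref(3),
    "MediaBox": [0, 0, 612, 792],  # US Letter in points (1/72 inch)
    "CropBox": [0, 0, 612, 792],
    "Contents": Ref(12),
    "Resources": {
        "Font": {"F1": Ref(20)},
        "XObject": {"Im0": Ref(21)},
        "ExtGState": {"GS0": Ref(22)},
        "ColorSpace": {"CS0": Ref(23)},
    },
}


def to_pdf(obj):
    """Serialise the simplified model above into PDF object syntax."""
    if isinstance(obj, Ref):
        return f"{obj.num} {obj.gen} R"
    if isinstance(obj, dict):
        inner = " ".join(f"/{key} {to_pdf(value)}" for key, value in obj.items())
        return f"<< {inner} >>"
    if isinstance(obj, list):
        return "[" + " ".join(to_pdf(item) for item in obj) + "]"
    if isinstance(obj, str):
        return "/" + obj  # in this model, Python strings stand for PDF names
    return str(obj)


print(to_pdf(page))
```

Note how the Resources entry is itself a dictionary whose values are indirect references, which is what allows a font or image to be shared between pages.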

Streams

A sequence of operators that draw content onto the page. Normally the stream also relies on elements of the page Resources dictionary such as colors and fonts. The content of a page is stored either as a single stream or as an array of streams.

				
					q
567.48 61.011 -540 720 re
W* n
q
/GS0 gs
0 720 -541.1399536 0 567.4799194 61.0105438 cm
/Im0 Do
Q
Q
/CS0 cs 0.302 0.302 0.302  scn
1 i 
/GS1 gs
56.7 286.911 m
56.7 295.191 56.7 303.471 56.7 311.751 c
59.1 311.751 61.5 311.751 63.9 311.751 c
63.9 306.831 63.9 301.911 63.9 296.991 c
65.88 296.991 67.8 296.991 69.72 296.991 c
69.72 301.191 69.72 305.391 69.72 309.591 c
72 309.591 74.22 309.591 76.5 309.591 c
76.5 305.391 76.5 301.191 76.5 296.991 c
81.06 296.991 85.62 296.991 90.18 296.991 c
90.18 293.631 90.18 290.271 90.18 286.911 c
79.02 286.911 67.86 286.911 56.7 286.911 c
f*

You can see that there are several references to items in the page resources dictionary:

GS0 is a reference to a graphics state and gs is the operator that sets it.

Im0 is an XObject image and the Do operator draws the image.

CS0 is a reference to a color space. The cs operator selects it for non-stroking (fill) operations and the scn operator sets the fill color within it.

You can also see usage of several path operators: re (rectangle), m (moveto), c (curve) and f* (fill, using the even-odd rule).
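
Operands in a content stream come before the operator that consumes them. The Python sketch below groups the tokens of a simple stream into operator and operand pairs. It is a simplified illustration rather than a full parser: the function name `parse_content_stream` and the lookup table are assumptions made here, and it only understands numbers, names and the operators that appear in the sample above (no strings, arrays, comments or inline images).

```python
# Operators used in the sample stream, with informal descriptions.
OPERATOR_NAMES = {
    "q": "save graphics state",
    "Q": "restore graphics state",
    "re": "rectangle",
    "m": "moveto",
    "c": "curve",
    "f*": "fill (even-odd rule)",
    "W*": "clip (even-odd rule)",
    "n": "end path without painting",
    "gs": "set graphics state parameters",
    "cm": "concatenate matrix",
    "Do": "draw XObject",
    "cs": "set fill color space",
    "scn": "set fill color",
    "i": "set flatness tolerance",
}


def parse_content_stream(text):
    """Group whitespace-separated tokens into (operator, operands) pairs."""
    operations = []
    operands = []
    for token in text.split():
        if token in OPERATOR_NAMES:
            operations.append((token, operands))
            operands = []
        elif token.startswith("/"):
            operands.append(token)  # a name such as /GS0
        else:
            operands.append(float(token))  # a number
    return operations


sample = "q 567.48 61.011 -540 720 re W* n /GS0 gs /Im0 Do Q 56.7 286.911 m f*"
for operator, operands in parse_content_stream(sample):
    print(f"{operator:3} {OPERATOR_NAMES[operator]:30} {operands}")
```

The names that appear as operands (/GS0, /Im0) are the keys that a viewer would look up in the page Resources dictionary.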

Text strings

These can either be ANSI (single byte characters) or Unicode (multi-byte). An example is the representation of the last modified date in the catalog dictionary.
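
As a rough Python sketch of how such strings might be read: a Unicode text string starts with the UTF-16 big-endian byte-order mark, and a date is a string of the form D:YYYYMMDDHHmmSS followed by time zone information. The function names and sample values below are hypothetical, Latin-1 is used only as an approximation of the single-byte encoding, and only the full date form is handled.

```python
import codecs
from datetime import datetime, timedelta, timezone


def decode_text_string(raw: bytes) -> str:
    """Decode a text string: UTF-16BE if it starts with the BOM, else single-byte."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be")
    # Assumption: Latin-1 approximates the single-byte encoding; a few
    # code points differ in the real encoding table.
    return raw.decode("latin-1")


def parse_pdf_date(value: str) -> datetime:
    """Parse a full date such as D:20230115103000Z or D:20230115103000+01'00'."""
    body = value[2:] if value.startswith("D:") else value
    stamp = datetime.strptime(body[:14], "%Y%m%d%H%M%S")
    zone = body[14:]
    if zone == "":
        return stamp  # no zone given: leave the datetime naive
    if zone == "Z":
        return stamp.replace(tzinfo=timezone.utc)
    sign = -1 if zone[0] == "-" else 1
    hours = int(zone[1:3])
    minutes = int(zone[4:6]) if len(zone) >= 6 else 0
    offset = sign * timedelta(hours=hours, minutes=minutes)
    return stamp.replace(tzinfo=timezone(offset))


print(decode_text_string(b"\xfe\xff\x00H\x00i"))
print(parse_pdf_date("D:20230115103000+01'00'").isoformat())
```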

Images

Images are normally held within the page resources, and the image stream has an associated attributes dictionary that describes the data within the stream. BitsPerComponent gives the number of bits used for each color component of a single pixel (dot) within the image. The ColorSpace entry describes the color model that is used to define the colors within the image.
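
These attributes are enough to work out how much uncompressed sample data an image holds. The Python sketch below assumes the common device color spaces and that each row of samples is padded to a whole number of bytes; the function name and the example dimensions are hypothetical.

```python
# Number of color components per pixel for the common device color spaces.
COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def image_data_size(width, height, bits_per_component, color_space):
    """Return the size in bytes of the uncompressed image samples."""
    bits_per_row = width * COMPONENTS[color_space] * bits_per_component
    bytes_per_row = (bits_per_row + 7) // 8  # each row is padded to a whole byte
    return bytes_per_row * height


# A 100 x 50 pixel RGB image with 8 bits per component.
print(image_data_size(100, 50, 8, "DeviceRGB"))
```

The size on disk is usually much smaller because the stream is compressed by the filter named in its attributes dictionary.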

Names

Used normally to provide a name that can be used to refer to a dictionary or dictionary item. For example, the pages dictionary has a name “Type” with the value “Pages” and a single page has a name of “Type” with a value of “Page”.

Arrays

Ordered collections that hold values and/or references to other elements. For an example see the Real numbers section below.

Real numbers

Decimal numbers, used for example to define the rectangle of the page media box.
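
As a hypothetical illustration, the media box of a US Letter page could be written as the array [0.0 0.0 612.0 792.0], giving the lower-left and upper-right corners in points (1/72 inch). The Python sketch below parses such an array and converts it to a page size in inches; the function names are assumptions made for this example.

```python
def parse_number_array(text):
    """Parse an array of numbers such as '[0.0 0.0 612.0 792.0]'."""
    return [float(token) for token in text.strip().strip("[]").split()]


def box_size_inches(box):
    """Return (width, height) in inches for a box given in points."""
    llx, lly, urx, ury = box
    return (abs(urx - llx) / 72.0, abs(ury - lly) / 72.0)


media_box = parse_number_array("[0.0 0.0 612.0 792.0]")
print(box_size_inches(media_box))
```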

Integers

Whole numbers, used for example to record the total number of pages in the PDF file.

For further details on the structure of the PDF file format see the PDF Specification at https://www.adobe.com/devnet/pdf/pdf_reference.html

Contact:

Michael Peters