Perl hacker…?

12 April 2006, 10:20 CDT

Any Perl hackers out there? I'm trying to manipulate (add text to) an existing PDF. I like PDF::Reuse, however:

– There is no support for non-embedded fonts. This also means that fonts whose glyphs are only partially embedded are only partially usable. Embedding the entire font set from Adobe InDesign seems to be hit or miss.
– Some embedded fonts have their glyph sets reported incorrectly, so the alphabet becomes “DEFGHIJ…XYZABC”. This is a problem when you’re trying to spell English words: the characters map in the order in which the glyphs appear, so “A” shows up in the result as “D”, etc.
– prStrWidth() does not work correctly with non-native fonts. When I tried to right-justify a string of text, it based the string’s width on Helvetica instead of the font I was actually using. This was easy to get around with Font::TTFMetrics: instead of calling prText() directly, I added another subroutine to my script that looks something like this:

## Calculate the left x position needed to make sure
## a string is properly right-justified.
## Right alignment via prText() in PDF::Reuse does not work
## with non built-in fonts. It appears to fall back to
## Helvetica metrics to determine the position if the
## face in question is not one of the standard PDF fonts.
##
## Expects two globals set elsewhere in the script:
##   $font_metric - a Font::TTFMetrics object for the TTF in use
##   $font_size   - the current font size in points
sub prTextRight {
    my ($xpos, $ypos, $txt_string) = @_;
    # string_width() returns the width in font design units.
    # Dividing by units-per-em and multiplying by the font size
    # gives points (PDF user space, 72 points per inch).
    my $str_width_units = $font_metric->string_width($txt_string);
    my $str_width_pt    = $str_width_units * $font_size
                          / $font_metric->get_units_per_em();
    my $lpos = $xpos - $str_width_pt;
    prText($lpos, $ypos, $txt_string);
    return 1;
}
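The arithmetic inside prTextRight() can be checked without a PDF or a font file. A minimal pure-Perl sketch, assuming a hard-coded table of hypothetical advance widths (a real script would read them from the font, e.g. with Font::TTFMetrics) and a plain sum of per-character advances with no kerning; the sub names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical advance widths in font units; not taken from a real font.
my %advance = ('O' => 1600, 'h' => 1100, 'i' => 500, 'o' => 1100);
my $units_per_em = 2048;   # assumed; TrueType fonts declare their own value

# Total advance width of a string in font units (plain sum, no kerning).
sub string_width_units {
    my ($str) = @_;
    my $total = 0;
    for my $ch (split //, $str) {
        die "no width for '$ch'\n" unless exists $advance{$ch};
        $total += $advance{$ch};
    }
    return $total;
}

# Left x position that makes $str end exactly at $right_x.
sub right_justified_x {
    my ($right_x, $str, $font_size) = @_;
    my $width_pt = string_width_units($str) * $font_size / $units_per_em;
    return $right_x - $width_pt;
}

printf "start text at x = %.4f\n", right_justified_x(500, 'Ohio', 12);
```

With these made-up widths, "Ohio" is 4300 units, or about 25.2 pt at 12 pt, so the text starts about 25.2 pt left of x = 500.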

Also, PDF::Reuse only estimates the width of the string, using a fixed scale of 1000 units per em:

$w = $w / 1000 * $FontSize
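The `/ 1000` in that line is the scale of the standard Type 1 (AFM) font metrics, where widths are given in thousandths of an em. TrueType fonts declare their own units-per-em (2048 is common), which is why the prTextRight() sub above divides by get_units_per_em() instead. A minimal sketch of the general conversion, with hypothetical numbers; `units_to_points` is an illustrative name, not a PDF::Reuse function:

```perl
use strict;
use warnings;

# Convert a width in font design units to points.
# PDF::Reuse's  $w / 1000 * $FontSize  is the special case $units_per_em == 1000.
sub units_to_points {
    my ($width_units, $font_size, $units_per_em) = @_;
    return $width_units / $units_per_em * $font_size;
}

# Hypothetical: a 1024-unit advance from a font with 2048 units per em.
my $correct = units_to_points(1024, 12, 2048);   # scaled by the font's own value
my $wrong   = units_to_points(1024, 12, 1000);   # wrongly assumes 1000 units/em
printf "correct %.3f pt, assuming 1000 units/em %.3f pt\n", $correct, $wrong;
```

Feeding raw TrueType widths into a formula that assumes 1000 units per em would roughly double every width for a 2048-unit font.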

I have also tried PDF::API2 to get around the embedded-font problem, because API2 can use fonts from a TTF file and does not rely solely on the glyphs embedded in the document. Unfortunately, API2 is extremely complex – far more than I need – and PDF::API2::Lite does not support working with an existing PDF document. Sadly, API2’s CMap handling bails on TTF files, but seems to be okay with OTF files. API2 is also much slower than PDF::Reuse, which is one of the reasons I prefer Reuse.

Why all this trouble for one PDF? Unfortunately, it isn’t one PDF. It’s 88 of them, one for each county in Ohio, each with a consistent design but unique information. If each document is three pages, this quickly gets out of hand. Need to make a minor change to the design? Good luck. Are you going to spend the next 3 weeks painstakingly updating all 264 pages? What happens when 2004’s data is replaced with 2005’s? Another 4 weeks of going page by page, updating everything and making absolutely sure everything is what it should be and where it should be. Ugh.

InDesign includes some very rudimentary mail-merge features via a plugin, but it was never meant for more than one or two fields, as opposed to the 40 or 50 per page I’m working with. InDesign also requires significant effort to get the data formatted so it knows what to work with. Perl lets me query MySQL when I need to and put each element exactly where I want it at runtime. 🙂

(I’m not terribly interested in GUI solutions like Crystal Reports. I want to be able to open a shell, run one command, and have all these PDFs generated. I can build logic into the Perl script that determines the most recent year available, so if I add data to the DB and then run the script, it will “autoupdate” everything.)
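The “autoupdate” idea could look roughly like this. A minimal pure-Perl sketch: the hash stands in for what a DBI query against the MySQL database might return (the county/year data is hypothetical), the script picks the most recent year for which every county has data, and it plans one output file per county. Requiring every county is one possible reading of “most recent year available”; a plain max() over all years would also work.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical stand-in for rows pulled from MySQL (e.g. with DBI):
# county => years that have data loaded.
my %years_loaded = (
    Adams   => [2003, 2004, 2005],
    Allen   => [2003, 2004, 2005],
    Ashland => [2003, 2004],
);

# Most recent year for which every county has data, so a partially
# loaded year is never published. Returns undef if there is none.
sub latest_complete_year {
    my ($loaded) = @_;
    my %count;
    for my $years (values %$loaded) {
        my %seen;                                   # ignore duplicate years
        $count{$_}++ for grep { !$seen{$_}++ } @$years;
    }
    my $counties = keys %$loaded;                   # number of counties
    my @complete = grep { $count{$_} == $counties } keys %count;
    return @complete ? max(@complete) : undef;
}

my $year = latest_complete_year(\%years_loaded);
die "no year has data for every county\n" unless defined $year;

# One output file per county; a real script would build the PDF here.
for my $county (sort keys %years_loaded) {
    my $out = sprintf '%s_%d.pdf', lc $county, $year;
    print "would build $out\n";
}
```

Here 2005 is skipped because one county has no 2005 data yet, so the run falls back to 2004 for all counties.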

So, do any Perl hackers or other hardcore OSS folks out there have any ideas?

—Update—

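This might be a fix, albeit a really bad hack. I need to test it a little more before submitting it. I also need to work on a better solution for the prStrWidth() issue. The patch adds a trailing argument to prText(), after the rotation, giving the path to the TTF file. prText() then reads the font’s cmap to work out the glyph offset instead of relying on the hard-coded 29 (0x1d), and falls back to 29 with a warning if the file can’t be opened.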

--- PDF-Reuse-0.33/Reuse.pm     2005-11-15 14:45:31.000000000 -0500
+++ PDF-Reuse-0.33.alt-fix-usecmap/Reuse.pm     2006-04-12 16:58:57.000000000 -0400
@@ -6,6 +6,9 @@
 
 require    Exporter;
 require    Digest::MD5;
+require Font::TTF::Font;
+require Font::TTF::Cmap;
+
 use autouse 'Carp' => qw(carp
 cluck
 croak);
@@ -416,7 +419,8 @@
 my $TxT   = shift;
 my $align = shift || 'left';
 my $rot   = shift || '0';
-
+  my $fontFileResource = shift;
+
 my $width = 0;
 my $x_align_offset = 0;
 
@@ -473,8 +477,22 @@
 else
 {   my $text;
 $TxT =~ s/\\(\d\d\d)/chr(oct($1))/eg;
+        ## What is the proper offset to use for locating the
+        ## glyphs in the font resource?
+        my $cmapOffset = 0x1d;
+        my $fObj = Font::TTF::Font->open($fontFileResource) || 0;
+
+        if ( $fObj ) {
+          my $fCmap=$fObj->{cmap}->read;
+          # We'll use "A" as an arbitrary baseline
+          my $gid = $fCmap->ms_lookup(0x41);
+          $cmapOffset = 0x41 - $gid;
+          $fObj->release; # free the memory
+        } else {
+          warn("Warning!  Unable to open the font file $fontFileResource, using default cmap offset 0x1d");
+        }
 for (unpack ('C*', $TxT))
-         {  $text .= sprintf("%04x", ($_ - 29));
+         {  $text .= sprintf("%04x", ($_ - $cmapOffset));
 }
 $stream .= $xPos+$x_align_offset . " $yPos Td \<$text\> Tj ET\n";
 }
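The core of the patch is plain arithmetic, so it can be tried without a PDF. A minimal pure-Perl sketch of the encoding loop in the patched prText(), with the offset passed in rather than read from a font file (`encode_glyphs` and `offset_from_gid_of_A` are illustrative names, not part of PDF::Reuse). The stock offset of 29 (0x1d) puts “A” at glyph 36, which appears to match the standard Macintosh glyph order many TrueType fonts follow, where the space is glyph 3.

```perl
use strict;
use warnings;

# Mirrors the patched loop: each byte of the string becomes a 2-byte
# glyph ID (character code minus a constant offset), hex-encoded for a
# <...> Tj string.
sub encode_glyphs {
    my ($text, $offset) = @_;
    my $hex = '';
    for my $code (unpack 'C*', $text) {
        my $gid = $code - $offset;
        # Guard added for this sketch; the patch itself does not check.
        die "character $code maps to a negative glyph ID\n" if $gid < 0;
        $hex .= sprintf '%04x', $gid;
    }
    return $hex;
}

# The offset as the patch derives it: code of "A" minus the glyph ID
# that the font's cmap gives for "A".
sub offset_from_gid_of_A {
    my ($gid_A) = @_;
    return 0x41 - $gid_A;
}

# Stock PDF::Reuse offset of 29: "A" -> glyph 36 (0x24).
print encode_glyphs('AB', 29), "\n";                       # 00240025
# Hypothetical subset font whose cmap puts "A" at glyph 3.
print encode_glyphs('AB', offset_from_gid_of_A(3)), "\n";  # 00030004
```

A single offset only works if every character’s glyph sits the same distance from its character code. It cannot fix a font whose glyphs are stored in some other order.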
