Watermarking printouts in the Linux world

Printouts most often are a waste of paper.

But much worse is obsolete stationery: it’s not only a waste of paper, but a waste of money, too, as companies typically pay to have it professionally printed.

With today’s high-quality color laser printers in many offices, a better solution is to print on plain paper and merge the stationery design into the printout as a watermark. But while many Windows printer drivers support this right out of the box, Linux and its “Common Unix Printing System” (CUPS) is surprisingly unprepared for it. Fortunately, as it is Linux, you can create your own solution rather easily.

A really short primer on CUPS printing

I won’t be going into details here, because learning CUPS is worth an article series all by itself. But to cut things short, it’s sufficient to keep the following facts in mind:

  • When creating your printing infrastructure with Linux, you’re handling PostScript most of the time. Or rather, CUPS is doing so.
  • You can easily create a print server on any Linux box and have all your client stations learn the available printer queues from that box (and you can even have “MS Windows” clients use those queues, too – see the sketch after this list).
  • Clients send their print jobs to their local CUPS instance, which forwards the jobs to the selected printer queue on the print server.
  • A “filter” for that printer queue will handle the conversion from Postscript to the printer-specific format.
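For illustration, setting up and sharing a queue on the print server might look like this (queue name, device URI and PPD path are made-up examples, not part of our actual setup):

# on the print server: create a queue and share it on the network
lpadmin -p letterhead -v socket://printer.example.com:9100 -P /root/printer.ppd -E
lpadmin -p letterhead -o printer-is-shared=true
cupsctl --share-printers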

Again, there’s no out-of-the-box watermarking component in this picture. But with a few tools (typically available for your Linux distro, too) you can add this functionality yourself.

The goal

We wanted to have various printer queues, one for each combination of printer and type of stationery, with the print server doing all the work – especially mixing in the stationery layout.

By selecting the proper queue, the user would have their print job “watermarked” with whatever makes up that specific form or letter paper. The printers then only need a single paper bin, fed with plain white paper. Of course, if you require different paper sizes or weights, you’ll still need multiple paper sources.

The solution

To achieve this, we had to do the following:

  1. Create a PDF that contains the stationery’s design. This was no extra work, as this is already the output of the original process and would be needed for separate printing, too.
  2. Create a tool chain to merge the “watermark” PDF and the actual print job.
  3. Integrate the tool chain into the CUPS processing.

We selected “pdftk” as our central tool to add the watermark to the print job. And since CUPS works with PostScript, which is closely related to the page description language used in PDFs, all we had to do was convert the print job to a PDF, use “pdftk” to add the watermark, and convert the resulting PDF back to PostScript. All this can be packed into a shell script like the following:
#!/bin/bash

logfile=/tmp/watermarkpdf.log
watermark=/etc/cups/watermark.pdf

tempdir=$(mktemp -d)

echo "$(date) $0 $* (tempdir: $tempdir)" >> "$logfile"

# Command line arguments (the fixed CUPS filter calling convention)
job="$1"
user="$2"
title="$3"
numcopies="$4"
options="$5"
filename="$6"

# CUPS hands the job data to a filter either as a sixth argument
# (a file name) or on stdin; "-" makes cat read stdin
if [ -z "$filename" ] ; then
filename="-"
fi

cat "$filename" > "$tempdir/ps.in"

# convert PostScript to PDF
/usr/bin/ps2pdf "$tempdir/ps.in" "$tempdir/pdf.in" 2>>"$tempdir/err"

# watermarking: place the stationery PDF behind the job's content
/usr/bin/pdftk "$tempdir/pdf.in" background "$watermark" output "$tempdir/pdf.out" 2>>"$tempdir/err"

# convert PDF back to PostScript, writing to stdout for the next filter
/usr/bin/pdftops "$tempdir/pdf.out" - 2>>"$tempdir/err"

# clean-up
rm -rf "$tempdir"

“pdftk” wouldn’t read stdin, so you have to take a slight detour via temporary files – which helps during debugging, too. And of course there’s plenty that could be improved… starting with a way to specify the watermark PDF via some option, rather than hard-coding it into the script (see the sketch below). Also note the handling of the “filename” argument: CUPS hands the job data to a filter either as a sixth command line argument or on stdin, and the script has to cope with both cases.
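As one possible refinement (an untested sketch – the “watermark” option name is made up), the options string that CUPS passes as the fifth argument could be parsed to select the watermark, so that a job submitted via “lp -d myqueue -o watermark=invoice file.ps” would pick /etc/cups/invoice.pdf. A naive parse, ignoring quoted option values, might look like this:

# sketch: select the watermark via a job option instead of hard-coding it
for opt in $options ; do
case "$opt" in
watermark=*) watermark="/etc/cups/${opt#watermark=}.pdf" ;;
esac
done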

You may have noticed that while we use “ps2pdf” to convert the input stream, we don’t use “pdf2ps” to convert back. We experienced significant print quality problems and color changes with “pdf2ps”, while “pdftops” handled the job quite nicely. And looking at Stefaan Lippens’ blog post on the subject, there may be other severe drawbacks to using pdf2ps, too. Especially the significant increase in output file size may impact memory consumption and network traffic for networked PostScript printers.

Place that script in CUPS’ filter directory, which is “/usr/lib64/cups/filter/” in our case (obviously, a server running a 64-bit variant of Linux – SLES 11 SP1), and make sure it is executable.
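For reference, the installation might look like this (“watermarkpdf” is just the name we used – and mind the ownership and permissions, as CUPS refuses to run filters it considers unsafe):

cp watermarkpdf /usr/lib64/cups/filter/
chown root:root /usr/lib64/cups/filter/watermarkpdf
chmod 755 /usr/lib64/cups/filter/watermarkpdf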

Now how to invoke this script? When creating your queue, CUPS creates a PPD (“PostScript Printer Description”) file from the printer-specific PPD and saves it under /etc/cups/ppd/<queuename>.ppd. All you have to do is add a single line to that file, somewhere in the head section:

*cupsFilter: "application/vnd.cups-postscript 100 watermarkpdf"

The documentation of that command can be found on the CUPS web site. “watermarkpdf” was the name I gave to the script when testing this, so you may need to replace this with your own script’s name.
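To verify that the filter is actually invoked, send any PostScript file to the watermarked queue and check the script’s log file (the queue name again being an example):

lp -d letterhead /path/to/some/file.ps
tail /tmp/watermarkpdf.log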

With a script per (watermarked) queue, you’re ready to rumble. Oh, and if anyone has an easy way of including the watermark file name in individual print jobs or at least in the queue’s PPD, please drop me a note in the comments section. Thanks!


Converting and adding OpenLDAP schema files

This is no “problem report”, but more of a little helper article, the first in my new category “howto”.

There are some articles on the net covering the conversion of OpenLDAP schema files to LDIF format and the import of the result into your server(s). Unfortunately, all articles I’ve read so far were either too complex in their approach, too simple (leaving out important steps), or contained mistakes – all this for a rather simple task.

From time to time, we need to add another schema to our LDAP servers, but all we have is a .schema file, while our OpenLDAP servers are configured the OLC way (“LDIF configuration”).

The task is simple, done in 5 steps (a sketch follows the list):

  1. create a temporary directory and put a simple dummy config file in there, with a single line to include your schema file
  2. use slaptest to convert to LDIF format
  3. remove the ordering information ({0}) from the generated LDIF file’s name and content, as well as the operational attributes slaptest adds
  4. import the LDIF file to your OpenLDAP server
  5. clean up the temporary directory
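A minimal sketch of those five steps (schema name and paths are examples – this is not the actual schema2ldif.sh):

#!/bin/bash
tmpdir=$(mktemp -d)

# step 1: dummy config that merely includes the schema to convert
echo "include /etc/openldap/schema/example.schema" > "$tmpdir/convert.conf"

# step 2: let slaptest do the conversion into an OLC-style directory
slaptest -f "$tmpdir/convert.conf" -F "$tmpdir"

# step 3: drop the {0} ordering prefix from file name and content, complete
# the DN, and remove the operational attributes slaptest generates
sed -e 's/{0}example/example/' \
    -e 's/^dn: cn=example$/dn: cn=example,cn=schema,cn=config/' \
    -e '/^structuralObjectClass:/d' -e '/^entryUUID:/d' \
    -e '/^creatorsName:/d' -e '/^createTimestamp:/d' \
    -e '/^entryCSN:/d' -e '/^modifiersName:/d' -e '/^modifyTimestamp:/d' \
    "$tmpdir/cn=config/cn=schema/cn={0}example.ldif" > example.ldif

# step 4 is the ldapadd shown below; step 5: clean up
rm -rf "$tmpdir"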

You can use our little helper script (schema2ldif.sh) to create the LDIFs directly, or look up how it can be done and adapt it to better suit your needs.

Importing the resulting file is easily done via “ldapadd”:

ldapadd -D "configUser" -Wx -h yourhost -f yourfile.ldif

Of course, you need to insert your own credentials, host name and LDIF file.

That’s all.


Open-E DSS V7 – definitely an improvement

Despite having had serious problems with previous versions, we never gave up on Open-E’s DSS software – an integrated, Linux-based NAS/SAN product with a browser-based management interface.

Recently, version 7 was released. The most highlighted enhancement is in the area of fail-over clustering, which is not the primary focus of our current installation. Nevertheless, we gave DSS V7 a test run to determine whether two (from our point of view) major problems of the previous versions had been fixed:

  1. DSS V6 for too long used an old version of SCST with a severe data corruption bug that drove us crazy until we found a way to work around it.
  2. With our 60+ Fiber Channel groups, several with more than one disk device, administering the FC groups was no longer possible, as the web client never completed (re-)loading the list of disk devices.

Our tests have been really promising: neither problem could be reproduced in our test installation. Especially the SCST bug was a real pain, as our work-around had to be adapted for (and tested against) every DSS update – now we’ll be able to just (test and) install new versions without much overhead.

On the other hand, the problem of handling large numbers of FC groups and devices may just be postponed: with 15 groups and FC LUNs, it took around 10 seconds to display the list of available LUNs for a group. When we had created 60 FC groups and around 70 disk LUNs, that time went up to 25 seconds – almost half a minute of waiting whenever you click on an FC group. This delay makes working with that part of the browser interface a patience-straining experience, and even worse, making changes to the disk assignments takes equally long per change. But to me, the interface seems slow yet robust: V6 gave up much earlier, and when exceeding the implied limit, it simply never finished loading that LUN list.

Unfortunately, DSS V7 is still rather locked down by Open-E, so you cannot easily add your own software to the server, despite it being simply a repackaged & enhanced Linux distribution. When I once inquired about shell access, protection of intellectual property was the argument given against it. I would find that more understandable if their argument targeted potential support problems when running customer-specific software in their environment… and even at the risk of “voiding the warranty”, I’d put my own backup software client on the server: scanning tens of thousands of files locally on the server causes much less impact on other users than doing so via an NFS share.

All in all, V7 looks like an improvement to us. We’ll nevertheless have to take a closer look at whether those bug fixes justify the cost, but for anyone starting off fresh and in need of a neatly integrated piece of software to handle NFS, Fiber Channel, iSCSI and Microsoft & Apple file sharing services, I can recommend DSS V7 without hesitation.
