Monday, March 6, 2017

Running a GDI printer under Linux part 6 - Writing the printer software



There are some other articles on this subject, as well as the motivations for this work, on my homepage.

In this final article of the series I will look at some details of writing the actual printer software. The main idea, as I said before, is to mimic the original working driver (under the other OS) as closely as possible, so you will not have to think too much about command details or discover the meaning of every register inside the printer interface. If it works for the other OS, it will work for Linux as well. Of course, you will run into problems with timings, with required commands (most are garbage, as you will discover, which explains why my Linux driver is twice as fast), and with the minimum parameters needed to set up a correct printed image. The easiest way to get the parameters is to print the same material (say, an image in MS-Paintbrush) with several page sizes and note what changed between the captured data for each case.
The pulse timings are more difficult to get. You will have to experiment with them, placing delays with usleep() or even sleep(). First get the log generated by Bochs, calling the function bx_printf() in the devices.cc source file to get the timings in machine clocks, and convert them to microseconds. I have rewritten my Linux driver several times to get the best fit.
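The conversion from logged machine clocks to usleep() delays can be sketched as below. This is only an illustration: ticks_to_usec, delay_usec and the ticks_per_second parameter are hypothetical names, and the rate at which the logged counter advances is an assumption you have to take from your own Bochs setup.

```c
/* Convert a tick count from the emulator log to microseconds.
 * ticks_per_second is an assumption you must supply: the rate at which
 * the logged counter advances in your configuration. */
double ticks_to_usec(unsigned long long ticks, double ticks_per_second)
{
    return (double)ticks * 1e6 / ticks_per_second;
}

/* Delay between two logged events, rounded up to whole microseconds,
 * ready to be passed to usleep(). */
unsigned long delay_usec(unsigned long long t0, unsigned long long t1,
                         double ticks_per_second)
{
    double us = ticks_to_usec(t1 - t0, ticks_per_second);
    unsigned long whole = (unsigned long)us;

    return (us > (double)whole) ? whole + 1 : whole;
}
```

Rounding up is a deliberate choice here: a delay that is slightly too long is usually safer than one that is too short, and you can shorten it later by experiment.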
This article will be a tour through my printer driver's implementation, so you get a feeling for what to do once you start exploring your own printer. As I did myself, you can start coding your driver even before everything is well understood, if you follow the mimic principle. Much of my printer's functionality was discovered that way, as my first driver was too crude to be useful. I dare you to write a working driver, even if it only prints a tiny line, like my first experiment. Go ahead! Make your printer sing!

Mimicking the original driver

Will you have the chance to share some of my code? I don't really know, but I hope so. Low-priced printers share several characteristics with mine: (1) they don't have enough memory, which means you will have to print in "bands", or "strips of paper", while the laser is burning the image onto the printer's drum; (2) most of them need a fast way to transfer huge amounts of image data from the processor to the printer's memory in real time, so you will find non-standard protocols for that; (3) for the same reason, a compression algorithm will be used to encode the image data. The compression is the tricky part of the reverse engineering, and not so easy to mimic unless you understand it fully. I asked you in a previous article to make several patterned images to discover how it works. If you don't understand it fully, there is no way to write a printer driver, sorry.
Another matter of concern here is the parallel port mode. If your printer supports SPP, EPP, and ECP, don't choose the last one! Linux is very efficient, so try to use the simplest protocol, because it is easier to debug. I mean SPP (standard parallel port), of course. A difficult issue I found in my printer was with the band sizes. First, each band must have an integer number of rows, in my case 4800 dots per row. Second, when the size of the compressed data varies too much, the printer gets lost, so I had to add a dynamic band sizing patch to my original compression algorithm (to see the full compression algorithm, please get the driver's source at ml85p-0.0.5.tar.gz, or at Metalab under /pub/Linux/hardware/drivers). The band is resized as it is compressed by fragments of code like:

       if ((cnt < LINE_SIZE) &&
          ((pktcnt + pcnt/256) < 5000) &&
          (linecnt<LINES_BY_PAGE))
       {
          cnt += LINE_SIZE;
       }
In this code, LINE_SIZE is the number of bytes in each printer row (4800/8 = 600), pktcnt is the number of packets of compressed data assembled so far, and pcnt is the number of similar data found, which will generate further packets, each with a size of at most 256. This guarantees that my compressed bands will be about 5000 packets in size. The implementation seems very complex, but you have to take into account that each packet can only hold 2, 3 or more bytes of data, so I have to take care not to overflow the printed lines. When such an overflow occurs, I see an ugly shifted page, or even a black band at the end of the page (and there goes my precious toner...).
You will have to recognize what is needed to reset the printer and which is the actual print page command. This is easy to get. Just capture your printer data without printing any page and you will get the reset procedure. There are different kinds of commands, at least in my printer, so take care with the strobing of commands. The Samsung printer has two "kinds of strobe", I suppose one for selecting the printer's ASIC register and the other for sending the register's new value. The lpoutw function shows the two strobe sequences:

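void
lpoutw ( int data,int type ) {
 int mask=0;
 outlp(data);           /* put the value on the data lines */
 if (type) {
  mask=2;               /* second kind of strobe: different AUTOFD level */
 }
 coutlp( 4+mask );      /* control lines: strobe bit off */
 coutlp( 5+mask );      /* control lines: strobe bit on */
 sinpwfast(0x7f);       /* status check (0x7f), as the original driver does */
 coutlp( 4+mask );      /* strobe bit off again */
 toggle_control(17);    /* not well understood, but required (see below) */
}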
The type argument tells it which kind of strobe to use. Of course, both generate a physical STB pulse, but with different AUTOFD signal levels. The toggle_control function call at the end is not well understood, but I told you that I mimic the Windows driver. If I take out this function call, my whole driver stops working, so let us leave it there! Your printer will probably not be the same, but pay attention to all control signals or you will be in trouble.
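To make the values 4, 5, 6 and 7 in lpoutw less mysterious, here is a small sketch that spells them out with the standard bit assignments of the PC parallel port control register. It assumes that coutlp() writes its argument to that register, which the text suggests but the listing does not show; strobe_sequence and the CTL_ names are hypothetical.

```c
/* Bit positions in the PC parallel port control register (base + 2). */
#define CTL_STROBE 0x01  /* pin 1,  inverted by the port hardware */
#define CTL_AUTOFD 0x02  /* pin 14, inverted by the port hardware */
#define CTL_INIT   0x04  /* pin 16, not inverted */
#define CTL_SELIN  0x08  /* pin 17, inverted by the port hardware */

/* Fill seq[0..2] with the three control values written for one strobe:
 * strobe bit off, strobe bit on, strobe bit off again.
 * A nonzero type keeps the AUTOFD bit set during the whole pulse. */
void strobe_sequence(int type, unsigned char seq[3])
{
    unsigned char mask = type ? CTL_AUTOFD : 0;

    seq[0] = (unsigned char)(CTL_INIT | mask);
    seq[1] = (unsigned char)(CTL_INIT | CTL_STROBE | mask);
    seq[2] = (unsigned char)(CTL_INIT | mask);
}
```

With type 0 this gives 4, 5, 4 and with type 1 it gives 6, 7, 6, the same numbers lpoutw passes to coutlp().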

Help from ghostscript

Ghostscript is a nice PostScript interpreter that translates not only to the printer languages of most common printers, but also to several formats not related to printers. I need my printer to "understand" PostScript, so this is the way to go. I use Ghostscript to translate to an easy-to-process format and send that to my printer.
I have chosen pbmraw as my target output format. One way is to call Ghostscript to translate the PostScript source into several pbmraw formatted pages, and then call my driver to read them (this format is very easy to read!) and send them to the printer. The problem with this approach is that it needs lots of disk space, for the image files are very large. The best approach is to pipe Ghostscript's output directly into my driver, so there are no disk accesses at all. This is Unix magic! The kernel connects Ghostscript with my driver, and as Ghostscript sends me image data, I read it and process it on the fly. The driver must know the exact end of each picture to look for the header of the next, or it will get lost.
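Reading a pbmraw header from a pipe could look like the sketch below. It is not the code from my driver; pbm_read_number and pbm_read_header are hypothetical names, and it assumes the usual raw PBM layout: the magic "P4", then width and height as decimal text (possibly with '#' comment lines), then a single whitespace character followed by the packed rows.

```c
#include <stdio.h>
#include <ctype.h>

/* Read one unsigned decimal field of a PBM header, skipping whitespace
 * and '#' comments. Returns -1 on end of file or malformed input. */
static long pbm_read_number(FILE *f)
{
    int c;
    long n = 0;

    for (;;) {
        c = getc(f);
        if (c == '#') {                 /* comment runs to end of line */
            while ((c = getc(f)) != '\n' && c != EOF)
                ;
        }
        if (c == EOF)
            return -1;
        if (!isspace(c))
            break;
    }
    if (!isdigit(c))
        return -1;
    while (isdigit(c)) {
        n = n * 10 + (c - '0');
        c = getc(f);
    }
    /* the character that ended the field has been consumed */
    return n;
}

/* Parse a raw PBM ("P4") header. On success stores width and height in
 * pixels and the number of bytes per packed row, and returns 0. */
int pbm_read_header(FILE *f, long *width, long *height, long *row_bytes)
{
    if (getc(f) != 'P' || getc(f) != '4')
        return -1;
    *width = pbm_read_number(f);
    *height = pbm_read_number(f);
    if (*width <= 0 || *height <= 0)
        return -1;
    *row_bytes = (*width + 7) / 8;      /* rows are padded to whole bytes */
    return 0;
}
```

Since each page carries exactly height times row_bytes bytes of image data after the header, counting them is what lets the driver find the header of the next page in the same stream.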
Another problem I found was the page size. My default printer page is 4800 by 6774 pixels (as I only plan to use A4 paper) and Ghostscript-generated pages vary in size. So my routines have to be careful about that and fill the missing space, or clip when the picture is larger, for both the width and the height of the pictures. After the piping mechanism was ready, the only disk space used was for the compressed images. This is a much lower requirement, and it is temporary, as I remove each page after it is sent to the printer.
There are some tricks for doing this in a modular fashion. First you save the bitmap dimensions when reading its header (I use the bmwidth and bmheight variables). Then, when your get_bitmap() function is called, which returns one byte of bitmap data, you check whether the bitmap's width is greater than your page image size. If it is larger, simply read your page width and skip the remaining bytes from the bitmap file; otherwise read its real size. If you clear the bitmap buffer beforehand (the memset() call), the space to the right of each line will be blank, as expected.

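unsigned char
get_bitmap () {
 if (bmcnt==0) {                    /* line buffer used up: fetch the next line */
  memset(bmbuf,0,800);              /* start blank, so missing pixels print white */
  if (linecnt<(bmheight-topskip)) {
   if (bmwidth > 800) {
    fread(bmbuf,1,800,bitmapf);     /* bitmap wider than the buffer: clip */
    bitmap_seek(bmwidth-800);       /* and discard the rest of the line */
   }
   else {
    fread(bmbuf,1,bmwidth,bitmapf); /* narrower: the tail stays blank */
   }
  }
  bmptr = bmbuf+leftskip/8;         /* drop leftskip pixels of left margin */
  bmcnt = LINE_SIZE;
  linecnt++;
 }
 bmcnt--;
 return *bmptr++;
}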
The bitmap_seek() function is supposed to do a seek, but I can't call fseek() directly, as I'm reading from a pipe! It just reads and discards bytes. The variables bmptr, bmcnt, and bmbuf implement a simple buffer that supplies the next bitmap bytes when get_bitmap() is called again. Notice linecnt, which tracks the current line of the printed page, and topskip and leftskip, which control the margins at the top and left of the printed page. Ghostscript tends to put a larger margin at the top and left than I want, so the control only reduces those margins. It would be easy to make them grow, though, if needed.
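A read-and-discard "seek" of this kind can be sketched as follows. This is not the bitmap_seek() from my driver, just a minimal illustration with a hypothetical name (skip_bytes) and an explicit stream argument.

```c
#include <stdio.h>

/* Discard n bytes from a stream that cannot seek (a pipe, for example).
 * Returns the number of bytes actually skipped, which is less than n
 * only if the stream ended or failed first. */
long skip_bytes(FILE *f, long n)
{
    unsigned char scratch[256];
    long skipped = 0;

    while (skipped < n) {
        size_t want = sizeof scratch;
        size_t got;

        if ((long)want > n - skipped)
            want = (size_t)(n - skipped);
        got = fread(scratch, 1, want, f);
        if (got == 0)
            break;                      /* end of stream or read error */
        skipped += (long)got;
    }
    return skipped;
}
```

Checking the return value tells the caller when Ghostscript closed the pipe in the middle of a page, which is otherwise easy to miss.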

Pulse timing and status checking

A problem with gathering the pulses from Bochs is that, although the simulation is perfect in every detail, including the real time clock of the virtual machine, the real hardware (the printer) is not part of the simulation. It just "thinks" it is connected to a slow machine. So most status reads will return an already ready condition. To know exactly what to look for, there are several possibilities:
  • you can "unofficially" check the status of the printer's port each time you change something by outputting data to it. When the printer driver checks it again, you will notice which bits changed and get an idea of what is being tested. This is not infallible, but it gives you a hint.
  • you can disassemble at the point where the original driver checks the status. Notice there are several such checks, and you must get them all to be sure you understand it. This breakpoint is tricky to set, as I will explain below.
  • The first spying I did was to read the printer's status port each time something is written to it and log it to stderr or to the impr.log file. This file (impr.log) is opened at the very start of Bochs and closed before it finishes, so I can write stuff to it during the run. This spying is placed in the bx_devices_c::outp(Bit16u addr, Bit32u value, unsigned io_len) routine, after checking that our printer port is being accessed.
        if ((addr <= 0x37a) && (addr >= 0x378)) {
                port_real_outb(addr,value);
                st379 = port_real_inb(0x379);
                fprintf(stderr,"O%x,%x i1(%x)\n",(addr-0x378),value,st379);
        }
    Notice that port_real_outb() and port_real_inb() are not part of Bochs. I added them to interface directly with the hardware. This routine is called when the virtual machine tries to simulate an output. My code translates the simulated access into a real hardware access, but also reads the status port into st379, so we can see the status change when the printer hardware detects the command. Otherwise I would not see much, because the simulation is very slow compared with the hardware.
    Of course, this output is shown in real time. Sometimes I replace stderr with impr_log (see my patched Bochs source) and log to it for further analysis, but it is good to see it in real time as well.
    To get the status checking disassembly, I modified the bx_devices_c::inp() function to also print the instruction pointer (EIP) when a port is read. My capturing statement is fprintf(impr_log,"I%x(%x) 0x%x\n",(addr-0x378),ret,EIP);, so I get in the output (the impr.log file) something like I1(7f) 0x80020965, the last large number being the EIP register at the time of the status check. Then I filter all those statements (with grep I1, for instance), cut everything except the last number, and run the result through sort | uniq to get a list of all the status checking points. I marvel at how useful simple programs like uniq and sort are, and at how long I lived without them (programming under MS-DOS/Windows).
    After getting these status checking points, I restart Bochs (with the printer installed, of course) and disassemble several bytes after the input instruction, as in the following example (here I included one byte more at the beginning to show the reported "in" instruction):
            <bochs:6> disas 0x8001f350 0x8001f356
     8001f350: ec: in AL, DX
     8001f351: 24f0: and AL, #f0
     8001f353: 3cf0: cmp AL, #f0
     8001f355: 7522: jnz +#22
    Most of the time you don't really need to know which bit is the meaningful one. You can just do the same test in your Linux driver. First get the addresses from the impr_log file, then stop Bochs by pressing <Ctrl>-c (to stop the simulation) and execute the instruction disas 0x8001f350 0x8001f356. The second address is where the disassembly stops, so give something like ten bytes after the start address. Repeat the process for all the checkpoints you recorded with the procedure given before. You don't need to understand everything now, just record the assembly output. You don't need to know much assembly language either, just your machine's architecture (registers) and a handful of logical instructions. In the example given, we are testing whether the four high order bits of the status port are all turned on. If you can't understand this, please go read a good assembly language book or call a friend for help.
    If you have an SMP (multiple processor) machine, you have to watch out for trouble when disabling interrupts. It is better to try first with only one processor. When everything works fine, you can write an SMP version of your code. There are critical parts of the data transfer where you will need all the speed possible, or the printer will lose data, so the use of cli() and sti() in some places is unavoidable.
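The disassembled status test shown earlier (in AL,DX / and AL,#f0 / cmp AL,#f0 / jnz) translates almost word for word into C. The sketch below is only an illustration: status_ready and wait_status_ready are hypothetical names, and the status byte is obtained through a caller-supplied function so that the logic can be tried without touching the hardware.

```c
/* Same test as the disassembly: mask with 0xf0 and compare with 0xf0.
 * Returns 1 when the four high order status bits are all set. */
int status_ready(unsigned char status)
{
    return (status & 0xf0) == 0xf0;
}

/* Poll a status source until the test passes or max_tries reads have
 * been made. Returns 0 on success, -1 if the condition never showed up.
 * read_status stands in for the real port read in your driver. */
int wait_status_ready(unsigned char (*read_status)(void), long max_tries)
{
    while (max_tries-- > 0) {
        if (status_ready(read_status()))
            return 0;
    }
    return -1;
}
```

The retry limit is an addition of this sketch; without one, a printer that is switched off leaves the driver spinning forever.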

    Tools for the future

    Real time techniques are invaluable for analyzing unknown data streams. We can make a versatile logic analyzer with RT-Linux plus some driver code and a suitable graphical interface. I plan to make something like that available in the near future, not only to detect and reverse engineer printers, but anything connected to the parallel port or even to other ports. The only concern here is that we have to stop the CPU until the trigger conditions occur, and if they don't occur you will have a rock-solid frozen machine. Of course, we can use an interrupt source to do the triggering, but this doesn't guarantee real time performance, because RT-Linux, while much faster to react than a normal Linux kernel driver, has a finite response time.


    Rildo Pragana <rildo@pragana.net>     Adventures in Linux Programming 

    Running a GDI printer under Linux part 5 - Solving the compress puzzle




    No matter what acquisition method you use, you will finally get some compressed data. I expect most of the data will be output at the base port (0x378 for the normal parallel port), and if you are using my patched Bochs, you will see lines with O0,<something> in profusion. First try to run several distinct subjects, for instance a blank page and your printer's test page, to see where large differences are found. Forget about sparse signals, like accesses tagged as O2,XX or I1(XX), for this first attempt.
    Then you will need to prepare yourself for several discovery experiments. You will have to use simple programs like Xpaint or bitmap to make your "drawings". Here are some example test patterns I have used to inspect my printer:

    A block of known dimensions like this is useful for some tests. You may need to experiment with several block sizes to discover the size of your packets. Sometimes it is not apparent where the data has changed, for a change in one packet can propagate to many others. In that case, you will need a shifted pixel bitmap like the one shown in the Xpaint image below. A very important note: select zero margins when printing. You will probably use MS-Paintbrush for this purpose. It introduces a white margin on the paper (and so in the compressed data stream), so before printing set the four margins to zero, after entering the dialog with Alt-t. If you forget to do this, your whole bitmap will be shifted and much more difficult to understand.
    In the first experiments, don't push too hard. It is better to do many small experiments than a large one trying to cover every possibility. And also, don't be too economical. Spend some time trying to "understand" the language of your printer. Print the same subject several times (a simple block like the first figure, for instance) and try to find where the similarities reside, what is meaningful and what doesn't matter. Remember that the enemy has littered the terrain with a lot of garbage to make your life difficult or nearly impossible.
    Those who follow my counsel will reach the final victory!

    This second bitmap was one of my final test patterns. Notice that the first line has an interesting, almost random pattern. It is not line noise. I made it to convince myself that my reconstruction of the compression algorithm was correct.
    Draw up a strategy to analyze your compressed data. For instance, draw many 1-line bitmaps with (1) 16 white dots + 1 black dot; (2) 64 black + 1 white; (3) 15 white + 1 black; etc. Your first goal is to know the size of a completely filled packet. My packets were 4 bytes long, but yours can be other sizes. If you do a systematic analysis, it is not so difficult to find out.
    Don't despair if something goes wrong, or seems too random. Nothing is random, because it must be printed, and your printer is very, very much more stupid and dumb than you! If something is going wrong, it is because you are not yet prepared for more complex experiments. Try many sizes of patterns and prefer simple patterns. Count the number of valid packets M$-Windows is sending to your printer. Count both with white pages and with a page with a single 63 pixel line on the top row (to do this experiment, create a 64x1 bitmap with Xpaint and fill every pixel except the last with ink). Why 63 and not 64? Because this will make most of the bits in the packet flip, which is most visible!
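If you prefer to generate such one-line patterns with a program instead of drawing them, the sketch below packs a row the way the PBM format stores it (leftmost pixel in the most significant bit, 1 meaning black). The function name make_test_row is hypothetical, and it covers only the simple case of a run of black pixels starting at the left edge.

```c
#include <string.h>

/* Pack a one-row test pattern in PBM bit order. The first `black` pixels
 * of the `width`-pixel row are inked, the rest stay white.
 * row must hold (width + 7) / 8 bytes. */
void make_test_row(unsigned char *row, int width, int black)
{
    int x;

    memset(row, 0, (size_t)((width + 7) / 8));
    for (x = 0; x < black && x < width; x++)
        row[x / 8] |= (unsigned char)(0x80 >> (x % 8));
}
```

For the 63-of-64 experiment this yields seven bytes of 0xff followed by 0xfe, which is easy to recognize later if it shows up unchanged in a capture.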
    Look at the compressed data for my winprinter, as given in the first article of this series. It was not easy at all to decipher! I made many experiments (about a hundred printed pages, I guess). Well, I'm not a genius, so I had to experiment more, as I expect you will too.

    The best tool to use is, of course, Bochs, because you will need only one machine, but it is more time intensive and cannot solve your problems if your printer requires a faster data dump. The best test to see whether it fits is to print a test page from your printer driver inside Bochs. If it works, even leaving a scrambled image after the first lines, it is suitable. Don't go to RT-Linux just because your printer can't print a full page inside Bochs. RT-Linux is much harder to use and set up. I did both, and my printer also scrambled its printing, but the first printed lines are the only thing you need to concentrate on while deciphering your compressed data.
    Next time I will show more details on discovering the protocol, some settings of my printer, and how I discovered them. It's just a wrap-up of the series. If you have any questions while the matter is hot, please ask me by e-mail. I plan to port more printers, if someone gives me another winprinter as a gift. I will not buy another printer, because I have plenty of printers now. Who wants another GDI printer ported to Linux?

    Rildo Pragana <rildo@pragana.net>     Adventures in Linux Programming 

    Running a GDI printer under Linux part 4 - Real Time Techniques




    While Bochs is a good tool when speed is not a limiting factor, there are times when we need to see the printer events at full speed. This is somewhat more complex, and we will need to compile a special kernel to host our capturing tool.
    There is a kernel extension known as RT-Linux that is valuable for this kind of signal processing, allowing us to use a second computer as a low cost logic analyzer. RT-Linux lets us have full control of our machine, running the regular Linux kernel as a lower priority task or thread than its specially designed threads. Suppose that this second machine, or spy machine, is connected to a modified "T" printer cable and can capture every signal change that occurs during some time interval. We can also set a triggering condition to start the capture, or filter what is important enough to be stored. Such a logic analyzer is even better than many commercial instruments available. And if we run the printer from a computer slower than our spy, we can get every detail of the parallel port signals.

    The T Centronics cable

    You will need to open the DB-25 connector of a standard PC-to-printer cable and solder several wires, taking care not to disturb the existing connections. At the other end of these wires you put another DB-25 connector, which will be attached to a printed circuit, perforated, or even solderless board with some TTL data buffers acting as a kind of data selector. This is needed because our parallel port has only 5 input signals (we could save one part, but we chose to stay with just 4 signals for simplicity), so we select which signals to spy on by lowering the level of only one such buffer at a time. In the circuit given, for example, the SEL_HIGH_DATA selection corresponds to the value 11111110 (binary) being output through the spy's data port. BASEPORT is the base address of the spy's parallel port, usually 0x378 (but it could be 0x3bc; check /proc/ioports for your parport0 device).

    Fortunately, only a small number of signals are present at the port: 8 data signals, 4 control signals and 5 status signals. With a simple circuit like this: (you may also get the xfig original circuit)

    and a small real time thread code like this:
    void *lpt_thread_code( void *data ){
        int rb1,rb2;
        int cnt=4095; 
        while (cnt--) {
            while (!(inb(BASEPORT+1)&STB))
                ;
            outb(SEL_HIGH_DATA,BASEPORT);
            rb1 = (inb(BASEPORT+1) << 1)&0xf0;
            outb(SEL_LOW_DATA,BASEPORT);
            rb2 = ((inb(BASEPORT+1) >> 3)&0x0f) | rb1;
            rtf_put(1,&rb2,1);
            while ((inb(BASEPORT+1)&STB))
                ;
        } 
        pthread_wait_np();
        return NULL;
    }
    it will be possible, for instance, to get strobed data until our buffer is full and then collect the gathered data from a /dev/rtf0 fifo device fed by our real time module. The drawback is that, while the thread code is active, no other activity can take place in our spy machine, not even timer interrupts, nor keyboard events, nothing, nada, niente. That's why it is crucial to have a well designed thread, or we will have to push the reset button too often and wait for fsck!
    The code given is easy to understand. First, in the first while loop, it waits until the STB signal is low (the bit reads as set because it is inverted). Then it gets the four most significant bits of the spied port's data bus through its own status port (BASEPORT+1), shifts them to align with bit positions d7-d4, and finally repeats the procedure with the least significant bits, shifting to the right and combining them with the first 4 bits. The call to rtf_put makes the result available later, when the acquisition phase finishes. The last while statement waits for the release of the STB signal. Of course, this code doesn't work with a GDI printer, because such a printer doesn't honor STB or the other standard lpt signals, but it illustrates the kind of procedure we want to code. The counter cnt limits our acquisition so we don't hang forever. And the pthread_wait_np() call makes the thread stop until the next period (which will never arrive).
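The nibble arithmetic in the thread code can be checked separately from the hardware. The sketch below isolates it in a plain function (combine_nibbles is a hypothetical name); the two shifts imply that the four buffered signals arrive on bits 6 to 3 of each status read.

```c
/* Rebuild one data byte from two status-port reads, as the thread does.
 * The high nibble is shifted left by 1 to land on bits 7..4,
 * the low nibble is shifted right by 3 to land on bits 3..0. */
unsigned char combine_nibbles(unsigned char status_high,
                              unsigned char status_low)
{
    unsigned char hi = (unsigned char)((status_high << 1) & 0xf0);
    unsigned char lo = (unsigned char)((status_low >> 3) & 0x0f);

    return (unsigned char)(hi | lo);
}
```

Bits outside positions 6 to 3 of either read are masked away, so the STB bit and the unused low bits never leak into the reconstructed byte.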
    The real time fifo can be read like any other character device, for instance by just cat'ing the /dev/rtf0 device or copying it to a file. This thread code is not magic. You must make many experiments, changing the code to suit your printer's protocol. A good first try is to read all the ports, save the value in a temporary integer, and compare each new reading with the value read before. When the value differs, you put it in the fifo, save it in the temporary variable, and repeat the process. Then look at the captured data for any interesting pattern in it.
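The "store only when it changes" idea can be sketched with an array standing in for the port reads, so it runs on any machine. In a real thread each sample would come from the inb() calls and each stored value would go to rtf_put(); the function name capture_changes and its parameters are hypothetical.

```c
#include <stddef.h>

/* Keep only the samples that differ from the previous one.
 * Returns the number of values stored in out (at most out_max). */
size_t capture_changes(const unsigned int *samples, size_t n,
                       unsigned int *out, size_t out_max)
{
    size_t i, stored = 0;
    unsigned int last = 0;
    int have_last = 0;

    for (i = 0; i < n && stored < out_max; i++) {
        if (!have_last || samples[i] != last) {
            out[stored++] = samples[i];
            last = samples[i];
            have_last = 1;
        }
    }
    return stored;
}
```

The out_max limit plays the same role as cnt in the thread code: the capture ends even if the signals never stop changing.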

    Some advice on the real time threads usage

    It is out of scope to explain here how to compile and install RT-Linux. But when you write RT device drivers, remember that you are at an even lower level than the kernel, and as such, printk and other non-reentrant kernel functions cannot be called. There are a few library routines available from the RT-Linux site, but we don't really need many. You can make your real time thread communicate through shared memory, but fifos are easier to work with for streamed data like ours. You can attach a time stamp to your readings by calling clock_gethrtime( rtl_getschedclock() ). It returns the current real time scheduler time in nanoseconds. Some parallel port pins have inverted logic. Please don't invert them in real time. You need most of the CPU time for gaining speed. Instead, save them as they are and write a utility routine to post-process your data.
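A post-processing routine for the inverted pins can be as small as the sketch below. It assumes the capture holds raw status-register reads from a standard PC parallel port, where bit 7 (BUSY) is the one the hardware inverts; for raw control-register values the inverted bits are 0, 1 and 3. The names are hypothetical, and your own wiring decides which mask applies.

```c
/* Bits that the PC parallel port hardware inverts between pin and register. */
#define STATUS_INVERTED  0x80  /* status bit 7 (BUSY) */
#define CONTROL_INVERTED 0x0b  /* control bits 0, 1, 3 (STROBE, AUTOFD, SELECT IN) */

/* Undo the hardware inversion on a buffer of captured register values,
 * after the capture, in user space. */
void fix_polarity(unsigned char *buf, unsigned long n, unsigned char inverted)
{
    unsigned long i;

    for (i = 0; i < n; i++)
        buf[i] ^= inverted;
}
```

Because XOR is its own inverse, running the same routine twice gives back the raw capture, so nothing is lost by keeping only the corrected file.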
    We can implement many triggering sources, both by analysing the signals and by letting the real time thread start after some event. For instance, if the interesting part of your measurement can be identified by listening to the printer's sound, you can have a manual trigger. To implement it, create a command fifo (say /dev/rtf1), put the following: rtf_create_handler(IN_FIFO, &cmd_handler); in the init_module function, and have an external (user level) procedure pipe the command into the IN_FIFO device (defined by #define IN_FIFO 1). Generally you will need a more sophisticated triggering procedure. Use counters in the thread code to start capturing after a given event occurs "n" times, so you will have a window into the data gathered.
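A counter-based trigger of the kind just described might be sketched like this. The struct and function names are hypothetical, and the "event" is simplified to one particular sample value; in your thread it could be any condition on the port readings.

```c
/* Hypothetical trigger state: start capturing after the n-th occurrence
 * of an event (here, a particular value seen on the port). */
struct trigger {
    unsigned int event;     /* value that counts as one occurrence */
    unsigned long wanted;   /* occurrences required before capturing */
    unsigned long seen;     /* occurrences counted so far */
};

/* Feed one sample; returns 1 once capturing should be active. */
int trigger_feed(struct trigger *t, unsigned int sample)
{
    if (t->seen < t->wanted && sample == t->event)
        t->seen++;
    return t->seen >= t->wanted;
}
```

Once the trigger has fired it stays fired, so the capture loop only needs to test the return value on every sample.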
    I am lucky, because my printer didn't require so many real time tools to be analyzed. Next time I will show some of the reasoning I used to understand the compressed data and what kind of patterned data I sent to the printer, drawn with Xpaint and printed with MS-Paintbrush on the Bochs virtual machine side.


    Rildo Pragana <rildo@pragana.net>     Adventures in Linux Programming 

    Running a GDI printer under Linux part 3 - Tools and Techniques




    There are some other articles on this subject, as well as the motivations for this work, on my homepage. This is the third article in the series Fighting against GDI. As good soldiers, all Linuxers are invited to battle the enemy, writing their winprinter drivers on behalf of the free software world. Please forgive me for the bad style and language, for I don't have a good English teacher available to revise my writings. Feel free to send me corrections.

    About two weeks ago I went to a store to buy our monthly food and supplies and I found a real bargain there: a laser printer from Samsung, the ML-85G, for under R$ 600 (Brazilian "reais"; one real is worth about 0.52 US dollars). Usually I steer clear of GDI printers, because of the trouble of getting them to work. But I thought the time had arrived: Linux is proliferating around us and I have several customers who want to join the new wave. I decided then to enter the game, and as a good gambler I was not sure I was going to win, but I bet on trying.
    At first, I connected the printer to an old 486 with Win95 installed, attached to my home network, to see how useful it was going to be. With Samba and ghostview, including gprint, it worked well, but I noticed that most of the time Win95 was calculating the pixels and, as the memory inside the printer was only 512K, there had to be some kind of very fast protocol to send the data while the printer drum was being burned by the laser beam. Of course, I have had some previous exposure to lasers and electronics technology, as I make my living designing parts and assemblies for industry.
    Then I looked at the various alternatives available. I already knew that disassembling Windows programs is a burden at the least, as one should expect to find plenty of self-modifying code, calls followed by extra inline arguments, and all kinds of masquerading that turn it into a difficult enterprise. But what should we expect from an interface with a limited number of pins? We have only 8 data bits, 5 input status signals and 4 control signals. So it should not be as close to impossible as one might expect.
    Two alternative schemes caught my attention: (1) simulate a complete hardware environment, so I can trace all accesses to those parallel port pins and get a sequence of events; (2) capture data in real time, using another gadget, or even another, faster PC to accumulate the results while the printing occurs at normal speed.
    I have tried both approaches with the most appropriate tools I could find, and this article shows some of the reasoning and critical issues involved in the acquisition process. The guiding point of my process is capturing a large data collection and then analysing it with the best intelligent tool invented so far: the human brain! Nobody could possibly create something so complex that it couldn't be simulated if we know every detail of the interface and protocols. So don't expect me to explain everything, just patterns of data that will be simulated by another piece of software, the Linux device driver. I actually found that data with similar properties could be grouped into meaningful subroutines (that's exactly why subroutines exist: to group similar procedures), and then some revealing details appear over time from our blindly gathered data.

    Some protocol issues

    The parallel port protocol, as we know it, came from Centronics. It has very simple handshake signals to tell the printer when the data is ready to be read, and to tell the computer that the printer has actually processed the character before moving on to the next one.

    Almost every parallel port printer follows this de facto standard, later formalized as IEEE 1284 (IEEE is the Institute of Electrical and Electronics Engineers, an association of which I'm proud to be a member). The basic signals are the STB, BUSY and ACK signals, as shown in the figure. GDI printers don't honor these signals. In fact, I found that my laser printer transfers most of its data without any handshake signal being generated or tested.
    What to look for, then? Think! Use your intuition, or ask a friend if you are clueless. You can also send some captured data to me, but please try first to find out its meaning by yourself, for I'm not a magician, nor do I have all the time in the world to spare. If you find something really difficult to work out with your tools, then please send it to me. I will probably find it interesting too and can help you conduct more experiments and solve the maze. The central idea is to look for discrepant values embedded in an otherwise boring and repetitive pattern. Yes! The enemy uses such camouflage to steer you toward an uninteresting signal or pattern, so that you get lost even before reaching the first real signals. Please look at this captured data from my first experiment with RT-Linux and the circuit I will show you later:
    80 a0 00 a0 89 8a a6 07 a7 8b 8b 89 8c 8c 04 94
    3f 95 58 94 95 89 8a a6 07 a7 8b 8b 9a 89 8a a6
    07 a7 8b 89 8a a6 07 a7 8d 46 89 8a a6 07 a7 89
    8a a6 07 a7 8e 89 8a a6 07 a7 8d 4f 89 8a a6 07
    a7 89 8a a6 07 a7 8e 89 8a a6 07 a7 8d 01 89 8a
    a6 07 a7 8e 97 00
    This is very boring and doesn't reveal what it really is. It was captured from the data lines (D0~D7) of the printer port by a second machine, qualified with the STB (strobe) signal. This means that, when STB goes active, the capture stores one value (8 bits), then waits until STB is inactive, and repeats the process. However, this data becomes more meaningful if we rewrite it like this:
    80 a0 00 a0
    89 8a a6 07 a7 8b 8b
    89 8c 8c 04 94 3f 95 58 94 95
    89 8a a6 07 a7 8b 8b 9a
    89 8a a6 07 a7 8b
    89 8a a6 07 a7 8d 46
    89 8a a6 07 a7
    89 8a a6 07 a7 8e
    89 8a a6 07 a7 8d 4f
    89 8a a6 07 a7
    89 8a a6 07 a7 8e
    89 8a a6 07 a7 8d 01
    89 8a a6 07 a7 8e 97 00
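The capture rule described above (store one byte each time STB goes active, then wait for it to go inactive) can be sketched in a few lines. This is a minimal illustration, not the RT-Linux code used for the real capture: the `sample` structure and `capture_on_strobe()` are hypothetical names, and the input is assumed to be an array of already sampled port states.

```c
#include <stddef.h>
#include <stdint.h>

/* One raw sample of the port as seen by the capturing machine.
 * Field names are illustrative only. */
struct sample {
    uint8_t data;   /* D0..D7 */
    int     stb;    /* non-zero while STB is active */
};

/* Keep one byte for each inactive->active transition of STB.
 * Returns the number of bytes stored in out (at most out_max). */
size_t capture_on_strobe(const struct sample *in, size_t n,
                         uint8_t *out, size_t out_max)
{
    size_t stored = 0;
    int was_active = 0;

    for (size_t i = 0; i < n && stored < out_max; i++) {
        if (in[i].stb && !was_active)
            out[stored++] = in[i].data;   /* first sample of a strobe pulse */
        was_active = in[i].stb != 0;
    }
    return stored;
}
```

Samples taken while STB stays active are ignored, so a long strobe pulse still yields a single byte.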
    In the end, it turned out that this data is not really data but a giant stream of commands given to the printer ASIC, serving as camouflage for a small number of required signals. As I said before, the data is sent with its own embedded handshake (the most significant bits). To read between the lines of the data, remember the old psychological tests you took to enter high school, or the puzzles your parents bought for you. It is like a game, and if you're intelligent (aren't you?) you are going to win.  You don't need any fancy programs, just a couple of sed scripts. Personally, I like tcl, because it is my one-size-fits-all tool for many jobs. If I need a GUI (graphical user interface), it's there with tk. If I need some interface to other libraries, it's already implemented or almost ready. And if I really need to customize some new feature, it is easily interfaced with C, even more easily than many toolkits available for unices.

    Bochs and boxes

    Kevin Lawton gave me a great gift by writing this nice and well documented program. Mandrakesoft decided to make it GPLed and sponsored Lawton's company, so we have the best tool to spy deep inside unknown programs, like the GDI layer of MS-Windows.  Although Bochs itself has a nice, documented instrumentation interface, I'm somewhat undisciplined, so I made several patches to its sources as quick-and-dirty tricks for capturing the input/output accesses at the parallel port and passing them on to the real hardware.
    Such accesses must be made from a root (superuser) account and be allowed by the ioperm() or iopl() system calls. Please refer to their manpages to learn how to use them. They are also a quick way of writing the final device driver outside the Linux kernel. When I have some time to clean up the mess I've left...
    Bochs is a very flexible PC hardware simulator that also simulates the peripheral hardware of a virtual machine, like the video card, mouse, keyboard and disks. The cpu can be chosen from a 386, 486 or Pentium class machine, and even the instructions-per-second speed is selectable.  But all of this is done in software, so the speed is significantly lower than real hardware, though still useful for many experiments. Most of my GDI printer spying was done inside this virtual machine. You need to prepare a disk from scratch inside a linux file to get it running. Note that I said "a disk from scratch" and not a partition.  You can use fdisk with the filename, after reading the documentation and choosing a suitable disk geometry, then create your msdos partition and format it. Then you can copy the needed files with the mtools package programs. I prefer instead to mount the partition with something like:
    mount -t msdos -o loop,offset=$[ 63*512 ] /opt/252M /bdsk
    Notice the expression $[ 63*512 ] in shell syntax. The 512 is the size of each disk sector and the 63 is the first sector of my msdos partition, as reported by fdisk. If you're a systems administrator you're on your own; otherwise, RTFM.
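The offset arithmetic is simple enough to check by hand, but when you juggle several disk images a tiny helper avoids mistakes. The sketch below is only an illustration: `partition_offset()` and `loop_mount_options()` are hypothetical names, and the only fact taken from the text is that the offset is the first sector times the sector size.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Byte offset of a partition inside a disk image file:
 * first sector (as reported by fdisk) times the sector size. */
uint64_t partition_offset(uint64_t first_sector, uint32_t sector_size)
{
    return first_sector * (uint64_t)sector_size;
}

/* Write the matching mount option string, e.g. "loop,offset=32256".
 * Returns what snprintf returns (the length of the full string). */
int loop_mount_options(char *buf, size_t size,
                       uint64_t first_sector, uint32_t sector_size)
{
    return snprintf(buf, size, "loop,offset=%" PRIu64,
                    partition_offset(first_sector, sector_size));
}
```

For the image used here, `partition_offset(63, 512)` gives 32256, which is the value the shell expression expands to.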
    Unfortunately, Bochs (which sounds like "box", referring to the linux and bsd boxes) doesn't have parallel port support, not even in its BIOS (basic input output system).  As I plan to make all parallel port accesses actually occur, I just started main() with an ioperm(0x378,3,1). Please get the patched version of Bochs. For your convenience I included a pre-compiled binary, but it is better to compile it yourself, as you will need to change the basic capture conditions several times with the feedback from previous runs. You must also be aware that Bochs needs the disk to be ready before it starts. You can make it boot from a floppy disk, but in the long run it's better to install the target operating system inside the simulated disk image.  You may find it difficult to install from scratch, in which case you should install it very plainly (vga, ms mouse, standard keyboard, etc.) on another machine and then copy the pre-installed version to your disk image. Mount it as shown above. And please be patient, as Bochs is very, very slow, since it simulates every instruction. Anyway, it is fairly accurate, and that is more important to our experiments than speed. You will have plenty of time to think about the results of the previous runs while it runs.
    The real capture is written inside two routines under iodev/devices.cc in the Bochs source directory. The routines are bx_devices_c::inp() and bx_devices_c::outp().  We only need them to see whether our i/o range is being selected and then write the data to a pre-opened log file.  We have to write very compact records so as not to spend much time inside our code. It's far better to leave the interpretation to other, offline programs.  Here is a sample of my captured data:
    O2,2 i1(f)
    I1(f)
    I2(c2)
    O2,0 i1(4f)
    I1(4f)
    I1(4f)
    I2(c0)
    O2,2 i1(f)
    I1(f)
    I2(c2)
    O2,0 i1(4f)
    I1(4f)
    I1(4f)
    I2(c0)
    O2,2 i1(f)
    I1(f)
    Here "O" means output and "I" means input. The number next to it is the offset from the base port (0x378 for the first lpt), followed by the data value. An "i" entry means that we have read the port without any request from the guest operating system. I do this so we know what could be read from the status port just after MS-Windows writes to the data or control port (where Windows didn't request it). Then I compare this previously read value with the same data when it is requested later. Remember that the "virtualized" machine is hundreds of times slower than a real one. This makes no difference to Windows, because Bochs fools it into thinking that only a short time has elapsed, but the real devices (like the printer) respond much faster.
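An offline tool has to turn these compact records back into something structured. The following is a minimal sketch of such a parser, assuming the record layout is exactly what the sample shows (a decimal port offset, hexadecimal values, and an optional "i" shadow read after an "O" record). The `log_event` structure and `parse_log_line()` are hypothetical names, not part of the Bochs patch.

```c
#include <stdio.h>

struct log_event {
    char     kind;          /* 'O' = write by the guest, 'I' = read by the guest */
    unsigned offset;        /* port offset from the base: 0 data, 1 status, 2 control */
    unsigned value;         /* byte written or read */
    int      has_shadow;    /* 1 if an "i" shadow read follows an 'O' record */
    unsigned shadow_offset;
    unsigned shadow_value;
};

/* Parse one log line. Returns 0 on success, -1 if the line is not recognised. */
int parse_log_line(const char *line, struct log_event *ev)
{
    unsigned off, val, soff, sval;
    int n;

    ev->has_shadow = 0;

    /* "O2,2 i1(f)" or just "O0,89" */
    n = sscanf(line, "O%u,%x i%u(%x)", &off, &val, &soff, &sval);
    if (n >= 2) {
        ev->kind = 'O';
        ev->offset = off;
        ev->value = val;
        if (n == 4) {
            ev->has_shadow = 1;
            ev->shadow_offset = soff;
            ev->shadow_value = sval;
        }
        return 0;
    }

    /* "I1(4f)" */
    if (sscanf(line, "I%u(%x)", &off, &val) == 2) {
        ev->kind = 'I';
        ev->offset = off;
        ev->value = val;
        return 0;
    }
    return -1;
}
```

Comparing `shadow_value` of an "O" record with the `value` of the next "I" record on the same offset shows whether the printer answered faster than the simulated Windows could ask.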
    Notice that the data given at the beginning of this article was extracted from large pieces of this kind of data stream. It was filtered with grep for only the "O0," lines, edited to strip every "O0," and "i1(.*)", and then grouped, starting at the first byte of the repeating pattern, by a small tcl program:

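    #!/usr/local/bin/tclsh
    
    # Reads one hex byte per line from stdin and prints them grouped:
    # a new output line starts at each 0x89, or when the counter reaches 16.
    set s ""
    set n 0
    while {[gets stdin line] >= 0} {
        # skip blank or malformed lines instead of reusing the previous byte
        if {[scan $line %x byte] != 1} continue
        if {$byte == 0x89} {
            puts $s
            set s ""
            set n 0
        }
        if {[incr n] == 16} {
            set n 0
            puts $s
            set s ""
        }
        append s "[format %02x $byte] "
    }
    puts $s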
    Of course, you can easily construct your own tools with your favorite language. Use your creativity!
    Sometimes hardware time-critical issues prevent us from getting the data with a working printer. In that case, Bochs can't give us much. We could look for another simulator, but I will show you some alternative ways (with RT-Linux) of gathering the data with the printer running at full speed. You will need to fire up your soldering iron or grab your wire-wrapping tool to assemble a small 4-part circuit. For now, get a couple of printer cables (yes, I said a couple), some 74HC374 integrated circuits, and a printed circuit board or some experimenter's perforated board.
    There is no way the devil can hide himself. The believers will always win!
    Wait for the next article and send me your comments. The saga will continue.


    Rildo Pragana <rildo@pragana.net>     Adventures in Linux Programming 

    Running a GDI printer under Linux (part 2)

    If you read my previous article, you may have noticed that I didn't give the protocol for sending the compressed data to the printer device. By that time, I had already dissected it. Now I'm going to show you a very rough hack that makes this baby work under Linux with a simple program.
    As my program accesses the machine's hardware directly, including disabling and re-enabling interrupts around the critical parts, you will have to run it as root (superuser). Please don't consider this program a gem, or even alpha-level software. It's just a hack to show that it is possible to construct a device driver for the printer.

    The code

    Now let's look at the code. It even starts to print earlier than the OEM driver distributed by Samsung for their ML-85G (M$ is not very good at improving their own software!). But don't expect anything fancy, as this is still very experimental. I don't know yet how to send several pages at once, but I'm working on this "feature".
    The fragment of code
      outb(0x7f,LPPORT);
      outb(0x83,LPPORT);
      outb(0x40,LPPORT);
      outb(0x80,LPPORT);
    
    just sends a block of white space, as discussed in the previous article. If you change this to something like
      outb(0x7f,LPPORT);
      outb(0xbf,LPPORT);
      outb(0x4f,LPPORT);
      outb(0x80,LPPORT);
    
    you will see a narrow, 1-pixel-high line. This is what the program actually does now. I will have to write a program to translate pbm or another graphic data format into this compressed data format.
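The packing step of such a translator could look like the sketch below. It is only a sketch under a stated assumption: the bit layout (24 payload bits split little-endian, 6 bits per packet byte) is not documented anywhere, it is inferred from the packets shown in this series, 7F-83-40-80 for white, 7F-BF-4F-80 for the black line and 4D-B0-49-E0 from the preliminary report, together with the MSB and odd-parity rules given there. `gdi_pack()` and its helpers are hypothetical names.

```c
#include <stdint.h>

/* Number of 1 bits in a byte. */
static int ones(uint8_t v)
{
    int n = 0;
    for (; v; v >>= 1)
        n += v & 1;
    return n;
}

/* Set bit 6 when needed so that the whole byte has odd parity. */
static uint8_t odd_parity(uint8_t v)
{
    return (ones(v) % 2 == 0) ? (uint8_t)(v | 0x40) : v;
}

/* Pack 24 payload bits into one 4-byte packet.
 * rep != 0: a = repeat count minus 1, b = byte to repeat, c = trailing byte.
 * rep == 0: a, b, c are three literal pixel bytes.
 * ASSUMPTION: payload is split little-endian, 6 bits per packet byte. */
void gdi_pack(int rep, uint8_t a, uint8_t b, uint8_t c, uint8_t out[4])
{
    uint32_t v = (uint32_t)a | ((uint32_t)b << 8) | ((uint32_t)c << 16);

    /* byte 1: MSB = 0, bit 6 = REP (no parity adjustment here) */
    out[0] = (uint8_t)((rep ? 0x40 : 0x00) | (v & 0x3f));
    /* bytes 2..4: MSB = 1, 0, 1 and bit 6 adjusts the parity to odd */
    out[1] = odd_parity((uint8_t)(0x80 | ((v >> 6) & 0x3f)));
    out[2] = odd_parity((uint8_t)((v >> 12) & 0x3f));
    out[3] = odd_parity((uint8_t)(0x80 | ((v >> 18) & 0x3f)));
}
```

With this layout, `gdi_pack(1, 0xff, 0x00, 0x00, p)` reproduces the white packet 7F-83-40-80 and `gdi_pack(1, 0xff, 0xff, 0x00, p)` reproduces 7F-BF-4F-80. Whether literal (REP=0) packets use the same bit order has not been checked against captured data.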
    The heart of the program is the set of command sequences cmd_seq, cmd_seq2, etc. They access the registers of the chip inside the printer to control it. I don't know exactly what they do, but who cares? I just mimic the data flow of the other programs to tell it to do what I need! This is my "secret".
    Each pair of values in cmd_seq is called cmd_code and cmd_type. The cmd_type selects a different kind of "strobe" given to the parallel port, and it is really required. Don't ask me why. In the lists, a value of -2 stands for a short pause and -1 marks the end of the sequence.
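The walk over such a list can be factored into one small routine instead of being repeated for each sequence. The sketch below is an assumption-laden illustration, not the driver's code: `run_sequence()`, `seq_ops` and the trace helpers are hypothetical names, and the two callbacks stand in for the register write (lpoutw in the listing) and the short sleep.

```c
#include <stddef.h>

#define SEQ_END   (-1)   /* end of the command list */
#define SEQ_PAUSE (-2)   /* short delay; no cmd_type follows it */

/* Hypothetical hooks: in a real driver emit would write to the port
 * and pause would sleep for a few milliseconds. */
struct seq_ops {
    void (*emit)(void *ctx, int cmd_code, int cmd_type);
    void (*pause)(void *ctx);
    void *ctx;
};

/* Walk a list of (cmd_code, cmd_type) pairs.
 * Returns the number of pairs handed to emit. */
size_t run_sequence(const int *cmd, const struct seq_ops *ops)
{
    size_t emitted = 0;

    for (;;) {
        int code = *cmd++;
        if (code == SEQ_END)
            break;
        if (code == SEQ_PAUSE) {
            ops->pause(ops->ctx);
            continue;
        }
        ops->emit(ops->ctx, code, *cmd++);   /* the type follows the code */
        emitted++;
    }
    return emitted;
}

/* Dry-run recorder: counts pauses and remembers the last pair seen. */
struct seq_trace { int pauses; int last_code; int last_type; };

static void trace_emit(void *ctx, int code, int type)
{
    struct seq_trace *t = ctx;
    t->last_code = code;
    t->last_type = type;
}

static void trace_pause(void *ctx)
{
    struct seq_trace *t = ctx;
    t->pauses++;
}
```

Running a sequence through the recorder first is a cheap way to check that a hand-edited list still has its pairs aligned before sending anything to the printer.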
    Another interesting "feature" is that, if you choose carefully where to stop giving commands to the printer, you can. See the commented-out part where it prints "going to sleep" and sleeps for some seconds. The original driver (from Samsung, I think) never stops giving commands to the printer. Well, this will be subject to more experimentation. I don't want a driver that sends data to the device every millisecond or so, just to see whether the printer is out of paper or has been disconnected from the computer. It's a terrible waste of time and cpu resources.
    Here is the code:
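    /* The header names were lost from this listing; these are the ones
     * the calls below need (glibc on x86). */
    #include <stdio.h>      /* printf, sprintf, sscanf */
    #include <stdlib.h>     /* exit */
    #include <unistd.h>     /* usleep, sleep, getopt */
    #include <sys/io.h>     /* outb, inb, iopl, ioperm */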
    //#include "ml85p.h" this is already included here :)
    
    #define PIXELS_BY_ROW  4800
    #define ROWS_BY_BAND   104
    #define MAX_BAND  61
    
    /* We must control the device bypassing the kernel driver,
     * because the interface don't follow any standard handshake
     * procedure. In the future, we can write a real device driver
     * to overcome this inconvenience.
     */
    #define LPPORT   0x378
    
    #define COUTLP(d) do { outb( (d),LPPORT+2 ); usleep(10000); } while (0)
    #define OUTLP(d) do { outb( (d),LPPORT ); usleep(10000); } while (0)
    #define SINLP() (inb( LPPORT+1 ))
    
    
    int cmd_seq[] = { 
    0x80, 1, 0xa0, 1, 0x00, 0, 0xa0, 1, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8b, 1, 0x8b, 1, -2,
     0x89, 0, 0x8c, 1, 0x8c, 1, 0x04, 0, 0x94, 1, -2, 
     0x3f, 0, 0x95, 1, 0x58, 0, 0x94, 1, 0x95, 1, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8b, 1, 0x8b, 1, 0x9a, 0,
    -1 };
    
    int cmd_seq2[] = {
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8b, 1, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8e, 1, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8e, 1, 0x97, 1, 0x00, 0, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8e, 1, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8b, 1, 0x8b, 1, 0x9d, 0, -2,
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0xa0, 1, 
     0x00, 0, 0xa0, 1, 0x81, 1, 0x15, 0, 0x82, 1,
     0x00, 0, 0x83, 1, 0x00, 0, 0x84, 1, 0x00, 0,
     0x85, 1, 0x24, 0, 0x86, 1, 0x01, 0, 0x87, 1, 0x3b, 0,
     0x88, 1, 0x0d, 0, 0x81, 1, 0x82, 1, 0x83, 1, 0x84, 1,
     0x85, 1, 0x86, 1, 0x87, 1, 0x88, 1, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8e, 1, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, -2,
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8e, 1, 0x93, 1, 0x00, 0, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x80, 1, -2, 
    -1 }; 
    
    int cmd_seq3[] = {
    0x89, 1, 0x8a, 1, 0x97, 1, 0x00, 0, 0xa6, 1, 0x07, 0, 0xa7, 1, -2, 
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x8e, 1, 
     0x93, 1, 0x00, 0, 0x80, 1,  
    -1 };
    
    int cmd_pause[] = {
    0x89, 1, 0x8a, 1, 0xa6, 1, 0x07, 0, 0xa7, 1, 0x80, 1, 
    -1 };
    
    static int last_ctrl=0x06;
     
    void
    sdelay() {
     __sti();
     usleep(5000);
     __cli();
    }
    
    void
    putmsg (char *msg) {
     __sti();
     printf("%s\n",msg);
     __cli();
    }
    
    int 
    sinp () {
     return (inb(LPPORT+1));
    }
    
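    /* Busy-wait until the status port reads exactly `wait`. */
    int 
    sinpwfast (int wait) {
     __sti();
     while (inb(LPPORT+1) != wait) ;
     __cli();
     return wait;
    }
    
    /* Same as sinpwfast(), but logs the value being waited for. */
    int 
    sinpw (int wait) {
     //int timeout=80000;
     __sti();
     printf("sinpw: %x\n",wait);
    
     while (inb(LPPORT+1) != wait) ;
     /*{
      if (timeout-- == 0) {
       __sti();
       iopl(0);
       exit(1);
      }
     }*/
     __cli();
     return wait;
    }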
    
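    /* Write one byte to the data port (base+0). */
    void
    outlp( int data ) {
     //printf("O0,%x\n",data);
     outb(data,LPPORT);
    }
    
    /* Write one byte to the control port (base+2). */
    void
    coutlp( int data ) {
     //printf("O2,%x\n",data);
     outb(data,LPPORT+2);
    }
    
    /* Busy-wait until the status port reads 0xff; `data` is only logged. */
    void
    lpwait (int data) {
     __sti();
     printf("lpwait(%x)\n",data);
     while (SINLP() != 0xff) ;
     __cli();
    }
    
    /* Toggle control bit 1 nine times, logging the status port each time.
     * `times` is currently unused. */
    void
    toggle_wait (int times) {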
     int i;
     __sti();
     last_ctrl = inb(LPPORT+2)&2;
     for (i=0;i<9;i++) {
      printf("I9(%x) ",inb(LPPORT+1));
      fflush(stdout);
      coutlp( last_ctrl );
      last_ctrl = last_ctrl ^ 2;
     }
     coutlp(6);
     __cli();
    }
    
    void
    toggle_control (int times) {
     int i;
     __sti();
     for (i=0;i= 0) {
       cmd_type = *cmd++;
      }
      if (cmd_code == -1) {
       break;
      }
      if (cmd_code == -2) {
       __sti();
       usleep(5000);
       __cli();
       continue;
      }
      lpoutw(cmd_code,cmd_type);
     }
    }
    
    static int first_block=1;
    
    void
    print_band ( int band ) {
     int line;
     int group;
     char msg[50];
     sprintf(msg,"band %d",band);
     putmsg(msg);
     if (first_block) {
      first_block=0;
      
      outb(0x7f,LPPORT);
      outb(0xbf,LPPORT);
      outb(0x4f,LPPORT);
      outb(0x80,LPPORT);
    
      for (line=1;line<104;line++) {
       for (group=0;group<242;group++) {
        outb(0x7f,LPPORT);
        outb(0x83,LPPORT);
        outb(0x40,LPPORT);
        outb(0x80,LPPORT);
       }
       outb(0x4c,LPPORT);
       outb(0x83,LPPORT);
       outb(0x40,LPPORT);
       outb(0x80,LPPORT);
      }
     }
     else {
      for (line=0;line<104;line++) {
       for (group=0;group<242;group++) {
        outb(0x7f,LPPORT);
        outb(0x83,LPPORT);
        outb(0x40,LPPORT);
        outb(0x80,LPPORT);
       }
       outb(0x4c,LPPORT);
       outb(0x83,LPPORT);
       outb(0x40,LPPORT);
       outb(0x80,LPPORT);
      }
     }
    }
    
    void
    print_page ( int maxbands, int firstpage ) {
     int band;
     int *cmd;
     int cmd_code,cmd_type;
     
     putmsg("phase 4 *** printing page");
     if (firstpage) {
      cmd = cmd_seq2;
     } 
     else {
      cmd = cmd_seq3;
     }
     while (1) {
      cmd_code = *cmd++;
      if (cmd_code >= 0) {
       cmd_type = *cmd++;
      }
      if (cmd_code == -1) {
       break;
      }
      if (cmd_code == -2) {
       __sti();
       usleep(5000);
       __cli();
       continue;
      }
      lpoutw(cmd_code,cmd_type);
     }
     coutlp(6);
     outlp(0xff);
     coutlp(4);
     coutlp(5);
     
     for (band=0;band= 0) {
        cmd_type = *cmd++;
       }
       if (cmd_code == -1) {
        break;
       }
       if (cmd_code == -2) {
        __sti();
        usleep(1000);
        __cli();
        continue;
       }
       lpoutw(cmd_code,cmd_type);
      }
      coutlp(6);
      outlp(0xff);
      coutlp(4);
      coutlp(5);
     }
    }
    
    int
    main ( int argc, char *argv[] ) {
     int c;
     int reset_flag=1;
     int maxbands=25;
     int print_flag=1;
     
     while ((c = getopt(argc,argv,"rnb:")) != -1) {
      switch (c) {
       case 'r': {
        reset_flag = 1 - reset_flag; /* toggle it */
        break;
       }
       case 'b': {
        sscanf(optarg,"%d",&maxbands);
        break;
       }
       case 'n': {
        print_flag=0;
        break;
       }
      }
     }
     //ioperm( LPPORT,3,1 );
     iopl(3);
     __cli();
     if (reset_flag)
      reset_printer();
    
    /* printf("going to sleep...\n");
     sleep(10);
     printf("back to work...\n");
    */
     if (print_flag)
      print_page( maxbands, 1 );
    
     //ioperm( LPPORT,3,1 );
     __sti();
     iopl(0);
     return 0;
    }
    
    Now I have to go back to work. Next time, I will show some of the tools I have used to get to this point. The main idea is to filter things to extract what matters. For instance, I have written several small tcl scripts that read the large files captured from Bochs and extract the commands that write to the ASIC registers. I have also hacked Bochs a little to send the data to the real printer and to write each i/o access within the range of the printer port registers to a compact log file.
    I would like to thank all the people who have written to me. Their encouragement is an essential ingredient in keeping up the patience to explore the unknown, running the same tests thousands of times to discover the workings of this creature. Of course, it was worth the time spent, and I hope other laser winprinters will follow. All of you can count on my help and experience to get your GDI printers working under Linux too. Please continue writing to me.
    Now I look at my little printer and run the driver above to see it singing. It sounds wonderful, especially after the hard work done to get it to even start printing. And I play with it, saying "Print, baby, print!", as the sheets of paper run through its body.
    Rildo Pragana <rildo@pragana.net>

    Running a GDI printer under Linux (a preliminary report)


    This short article shows what I have dissected from a winprinter, or GDI printer, to make it usable with ghostscript or other Linux tools. This kind of gadget is completely non-standard, with the excuse of making it cheaper, but it is also more convenient to its manufacturer and to the commercial software monopolist, as such printers are very difficult to interface. I also plan to release, in a later article, the techniques I have used to gain insight into its workings, if enough people are interested. Everything I have got my hands on is freely available in source code, and with a handful of TTL integrated circuits and RT-Linux you can intercept everything going on at your parallel interface or printer port, even without a standard IEEE protocol. Another nice tool for exploring is Bochs, a Pentium emulator written by Kevin Lawton and sponsored by Mandrakesoft, a Linux distributor.

    I will try to make all the information contained here accurate and easy to follow, but I don't assume any responsibility for the consequences or damages caused by its use or misuse. In other words, no guarantees, please.

    My winprinter is a Samsung ML-85G (G means GDI here), from a Korean brand better known for its video monitors, and it is listed as a paperweight in the Printer-HOWTO. Otherwise, it is a very beautiful piece of hardware, reasonably fast (8 ppm), as silent as a laser printer can be, and fully controllable by software (there isn't even a power switch!). Its only drawback is being a GDI printer. Well, things are going to change, at least for this particular make of printer. Wouldn't you like to explore yours too? Send me an e-mail and let me discuss with you what you have. (Anyhow, I love to receive e-mails.)

    Data compression

    To make the data transfer faster, our printer puts several constraints on the data exchange, so that only one port is active during most of the data transfer: the base port. Its address is normally 0x378, but it depends on which parallel port your system is configured for.

    But how can it work with only one data port, without any kind of handshake to tell the receiving side that the data is ready? Some rules make this possible. Let us see them:

    1. Each data chunk is composed of four bytes.
    2. The first and third bytes have the MSB (most significant bit) set to zero, whereas the second and fourth have the same bit turned on.
    3. Except for the first byte, all the others must have odd parity. The parity is determined by the number of "ones" the byte contains. To adjust the parity, the second MSB (bit 6) is used.
    4. The first byte can be of two different kinds: (a) stream bits; or (b) RLE (run-length encoded) bits. This is selected by its second MSB. This is the only byte where bit 6 is not reserved for parity adjustment.

    Here is a sketch of the command word (4 bytes) for sending data to the printer.
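The framing rules listed above can be checked mechanically, which is handy when looking through captured data for well-formed packets. The following is a minimal sketch that tests only what the rules state (the alternating MSBs and the odd parity of bytes two to four); `gdi_packet_ok()` and `gdi_packet_is_rle()` are hypothetical names.

```c
#include <stdint.h>

/* Number of 1 bits in a byte. */
static int ones(uint8_t v)
{
    int n = 0;
    for (; v; v >>= 1)
        n += v & 1;
    return n;
}

/* Returns 1 if the four bytes respect the framing rules, 0 otherwise. */
int gdi_packet_ok(const uint8_t p[4])
{
    /* Rule 2: the MSBs follow the pattern 0, 1, 0, 1. */
    if ((p[0] & 0x80) || !(p[1] & 0x80) || (p[2] & 0x80) || !(p[3] & 0x80))
        return 0;

    /* Rule 3: bytes 2 to 4 each carry an odd number of 1 bits. */
    for (int i = 1; i < 4; i++) {
        if (ones(p[i]) % 2 == 0)
            return 0;
    }
    return 1;
}

/* Rule 4: bit 6 of the first byte selects RLE (1) or stream (0) data. */
int gdi_packet_is_rle(const uint8_t p[4])
{
    return (p[0] & 0x40) != 0;
}
```

Sliding such a check over a captured byte stream, four bytes at a time, is one way to find where the pixel data starts and the register commands end.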
    There are two kinds of data packets, selected by the REP bit (see figure). When REP=0, A7-A0, B7-B0, and C7-C0 are normal pixel data, forming a 24-bit packet. When REP=1, A7-A0 is the count of repetitions (minus 1), B7-B0 is the data byte to be repeated, and C7-C0 is an additional 8-bit value appended to the end of the run. This second format is useful when the data repeats itself in 8-bit patterns, so dithering for gray-scale data is best done with patterns of 8 bits, or some multiple of eight, for better compression.

    Some examples may clarify the compression algorithm. Most pages are made of large white spaces with a few islands of painted pixels. Those pages will be full of the packet (numbers given in hexadecimal) 7F-83-40-80. As you can see, the most significant bits of the four bytes conform to the packet rules given above, and REP=1 in the first byte (7F = 01111111b). So we have run-length-encoded data, with the counter A = 11111111b, which means 255+1 = 256 (decimal) repetitions. The data to be repeated is B = 00 and the byte to be appended is C = 00 as well. This packet therefore represents (256+1)*8 = 2056 pixels of white space. A more interesting example is 4D-B0-49-E0, representing the pattern 10011100b (where 1 is a black pixel and 0 is white) repeated fourteen times, followed by 10000000b (only the first pixel black).

    Printer page dimensions

    My ML-85G printer has only 512K bytes of memory and so has to receive the page's pixels in parts, or chunks, called "bands" in GDI nomenclature. A standard A4 (or rather, letter) sized page is made of 61 of those bands, each composed of 104 scan lines. Each line has 4800 pixels at a resolution of 600 dpi (dots per inch), so we can calculate the total number of pixels in each band as 4800 * 104 = 499200. This size in bytes is 499200/8 = 62400. Of course, this is easily buffered inside the printer, which has 512K of memory, but not so easily for all the bands together, which add up to 61 * 62400 = 3806400 bytes, about seven times the printer's memory!

    In the next article, I will explain the other protocols for setting up the printer and checking its status. Of course, this other part will be needed for writing a driver for the domesticated monster.

    I would also like to make it known that Samsung refused to give me technical information about the ML-85G. Maybe they don't even know how their printer works, or maybe they signed a contract with M$ not to release such information. Anyway, I don't require their information anymore.

    Rildo Pragana
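The two worked packets and the band arithmetic above can be reproduced with a short decoder. This is a sketch under an assumption: since the figure is not available here, the bit layout (six payload bits per byte, assembled little-endian into A, B and C) is inferred from the two examples, 7F-83-40-80 and 4D-B0-49-E0, and is not taken from any documentation. All names (`gdi_payload`, `gdi_unpack()`, `gdi_pixels()`, `band_bytes()`) are hypothetical.

```c
#include <stdint.h>

struct gdi_payload {
    int     rep;       /* 1 = run-length packet, 0 = three literal bytes */
    uint8_t a, b, c;   /* the A, B and C fields of the packet */
};

/* Extract REP, A, B and C from one 4-byte packet.
 * ASSUMPTION: each byte carries 6 payload bits, lowest bits first. */
void gdi_unpack(const uint8_t p[4], struct gdi_payload *out)
{
    uint32_t v = (uint32_t)(p[0] & 0x3f)
               | ((uint32_t)(p[1] & 0x3f) << 6)
               | ((uint32_t)(p[2] & 0x3f) << 12)
               | ((uint32_t)(p[3] & 0x3f) << 18);

    out->rep = (p[0] & 0x40) != 0;
    out->a = (uint8_t)(v & 0xff);
    out->b = (uint8_t)((v >> 8) & 0xff);
    out->c = (uint8_t)((v >> 16) & 0xff);
}

/* Pixels represented by a packet: (A + 1) repeated bytes plus the
 * trailing byte for RLE packets, three bytes for literal packets. */
unsigned gdi_pixels(const struct gdi_payload *pl)
{
    if (pl->rep)
        return ((unsigned)pl->a + 1u + 1u) * 8u;
    return 24u;
}

/* Bytes needed for one band at 1 bit per pixel. */
unsigned long band_bytes(unsigned pixels_per_row, unsigned rows_per_band)
{
    return (unsigned long)pixels_per_row * rows_per_band / 8ul;
}
```

Decoding 7F-83-40-80 this way gives A = FF, B = 00, C = 00 and 2056 pixels, and 4D-B0-49-E0 gives A = 0D (fourteen repetitions), B = 9C and C = 80, in agreement with the text. `band_bytes(4800, 104)` returns 62400.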