Friday, April 6, 2018

JavaScript modules on the server and in the browser

Web applications, especially larger ones, need JavaScript libraries and other resources, such as style sheets (Bootstrap is a typical example) and additional HTML or plain-text pages, to work properly. Some of these resources must be loaded sequentially while the page itself is loading, which is why we use elements such as <script> and <link> (the latter for style sheets), but many of these programs are only needed much later. Some can be loaded in the background and in parallel, several at a time. This considerably speeds up the presentation of the web page to the client, who is waiting anxiously. Besides the elements above (<script> and <link>), we can issue XHR calls, that is, XMLHttpRequest, or use the more modern Fetch API. The problem with the elements already mentioned is that the browser has to load and run them in order, because it has no idea of the dependencies contained in the scripts. If we knew there were no dependencies between the scripts, we could load them in parallel. Hence the term “Asynchronous Module Definition”, or AMD, one of the standards for defining modules whose dependencies are explicitly declared and which can be loaded on demand by the browser. This is exploited by module loaders such as RequireJS (http://requirejs.org/) and many others that aim to optimize module loading in the browser.

The AMD (Asynchronous Module Definition) format is an attempt to provide a solution for JavaScript modules that client-side (browser) web developers can use today. It grew out of the real-world experience of the Dojo group (https://dojotoolkit.org/) in modularizing its code. Development of the standard was later handed over to the AMDjs group (https://github.com/amdjs). It is a proposal for defining modules in which both the modules and their dependencies (on other modules) can be loaded independently and asynchronously. AMD is also a step in the right direction toward the module system proposed for future standardization in ECMAScript (JavaScript). Although it was initially drafted as a format for CommonJS, it is a standard separate from the latter.
To create modules, we can use a few patterns available in JavaScript. The first is the “immediately invoked function expression” (IIFE), that is, an expression that defines a function and allows it to be invoked immediately. We achieve this by placing the (anonymous) function definition between parentheses. Otherwise the definition would be parsed as a declaration rather than an expression, and trying to execute it immediately would result in a syntax error.
(function(){
 // function body
})()

The two parentheses ( ) at the end are responsible for invoking the function immediately. We could place arguments inside them to be passed to the function.
This pattern (IIFE) lets us encapsulate code inside the function, so that we can use that code from outside without knowing in detail what the IIFE does. Moreover, since the body of a function has its own scope and forms a closure, we can create variables inside it without polluting the global namespace. We can even have several function definitions and return an object with the functions we want to export; everything else stays invisible because it lives inside the closure. This leads us to the second useful pattern, the “revealing module pattern”.
var globModulo = function(){
 function ola(){
   console.log('Olá, módulo!');
 }
 return {
   ola: ola
 }
}()

After storing the return value in globModulo, we can call globModulo.ola( ) to invoke the function defined internally. Obviously, several other functions could be defined inside this block. Note that the function definition is not wrapped in parentheses as above. They are not needed here because the statement does not start with the function keyword: it starts with a variable assignment, so the function is already parsed as an expression and can be invoked immediately.
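
To make the privacy argument concrete, here is a minimal sketch of the same pattern holding private state and receiving an argument through the invoking parentheses. The names counterModule, start, total, increment and value are invented for this illustration only.

```javascript
// Revealing module pattern with private state.
// "counterModule" and its members are hypothetical names used only here.
var counterModule = (function (start) {
  var total = start;            // private: lives only inside the closure

  function increment() {
    total += 1;
    return total;
  }

  function value() {
    return total;
  }

  // Only what is returned here is visible from outside.
  return {
    increment: increment,
    value: value
  };
})(10);                         // the argument 10 is bound to "start"

counterModule.increment();
counterModule.increment();
console.log(counterModule.value());      // current count
console.log(typeof counterModule.total); // the private variable is not exposed
```

The variable total can only be reached through the two exported functions, which is exactly the encapsulation the pattern is meant to provide.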

The most common module formats are CommonJS, AMD, and UMD, in addition to the new format standardized by ES6.


CommonJS modules

Modules standardized by CommonJS are defined as regular js files whose content assigns to exports or module.exports. The module system then adds a wrapper that receives these variables as parameters, along with a few others, as we will see.

An example of a module, using the CommonJS implementation found in NodeJS, is the following:
// saudacoes.js
exports.bomdia = function() {
return "Bom dia!";
};
exports.boanoite = function() {
return "Boa noite!";
};

In the main program, or in any other module that needs to use this “saudacoes” module, we write:
var saudacoes = require("./saudacoes.js");
Or simply,
var saudacoes = require("saudacoes");
After this definition, we can use the functions saudacoes.bomdia() and saudacoes.boanoite(). However, in this second form (which gives the module name rather than a direct path to a js file), the module must live in a subdirectory of “node_modules” and we will need a package.json file there, which can be created with the “npm init” command.


All functions and variables created in the module and not exported are completely invisible from outside, thanks to the wrapper that is introduced.


(function(exports, require, module, __filename, __dirname) {
   module.exports = exports = {};
   // the content of the module file lives here
});


For our convenience, __filename and __dirname are, respectively, the name of the source file and the directory where it resides. The ‘module’ variable is an object with items such as “id”, “exports”, “parent”, “paths”, and “children” (an array of other modules). It contains circular references, which prevents it from being serialized with JSON.stringify(). In that case, we can use the “json-stringify-safe” module for the job.
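
The effect of the wrapper can be imitated in a few lines. The sketch below is only an illustration under simplifying assumptions: it is not NodeJS's actual loader (which also resolves paths, caches modules and supports nested require calls), and loadModule and stubRequire are hypothetical names.

```javascript
// Simplified sketch of a CommonJS-style loader. This is NOT Node's actual
// implementation; loadModule and stubRequire are hypothetical names.
function stubRequire(name) {
  throw new Error('nested require is not supported in this sketch: ' + name);
}

function loadModule(source, filename) {
  var mod = { id: filename, exports: {} };
  var slash = filename.lastIndexOf('/');
  var dirname = slash === -1 ? '.' : (filename.slice(0, slash) || '/');

  // Wrap the module text in a function that receives the five parameters.
  var wrapper = new Function(
    'exports', 'require', 'module', '__filename', '__dirname',
    source
  );
  wrapper(mod.exports, stubRequire, mod, filename, dirname);
  return mod.exports;
}

var source = [
  'var secret = "not exported";',
  'exports.bomdia = function () { return "Bom dia!"; };',
  'exports.whereAmI = function () { return __dirname; };'
].join('\n');

var saudacoes = loadModule(source, '/app/saudacoes.js');
console.log(saudacoes.bomdia());
console.log(saudacoes.whereAmI());
console.log(typeof saudacoes.secret);   // variables of the module stay hidden
```

Because the module text runs inside the wrapper function, secret is a local variable of that function and never leaks to the caller.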

AMD modules

In the browser it is more interesting to define AMD (asynchronous) modules. An explicit advantage of this kind of module is that modules can be loaded independently, without us worrying about the order in which they load. An AMD module is defined with a single function call:
  define(id?, dependencies?, factory);
where the module id is an optional string, and the array of dependencies on other modules is optional as well. The “factory” parameter is a function that actually creates the module; for example, if the module were jQuery, it would return the familiar “$” function.
Let us look at an example with an HTML page, a module (amdmod.js), and a startup script (main.js), using the RequireJS loader (http://requirejs.org/), built specifically to load AMD modules in the browser.

index.html
-------------
<!DOCTYPE html>
<html>
    <head>
        <title>Uma pagina html</title>
        <!-- the data-main attribute tells require.js
             to load the main.js script after require.js
             has loaded -->
        <script data-main="main" src="require.js"></script>
    </head>
    <body>
        <h1>Texto do header h1</h1>
    </body>
</html>


amdmod.js
--------------
define(['jquery'] , function ($) {
    return function () {
        console.log("FACTORY em amdmod.js");
        // use jQuery to get the text of the h1 element
        console.log($("h1").text());
    };
});

main.js
---------
requirejs(["amdmod"], function(amdfun) {
    console.log("LOADED amdmod.js");
    amdfun();
});
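
To see why the order of definition does not matter, it helps to look at a toy registry that resolves dependencies only when a module is requested. This is a synchronous, in-memory sketch and not how RequireJS is implemented (it does no script loading and does not detect circular dependencies); toyDefine, toyRequire and resolveModule are hypothetical names.

```javascript
// Toy, synchronous AMD-style registry. This is NOT how RequireJS is
// implemented; toyDefine, toyRequire and resolveModule are hypothetical names.
var registry = Object.create(null);

function toyDefine(id, deps, factory) {
  registry[id] = { deps: deps, factory: factory, resolved: false, value: undefined };
}

function resolveModule(id) {
  var entry = registry[id];
  if (!entry) {
    throw new Error('Module not defined: ' + id);
  }
  if (!entry.resolved) {
    // Resolve dependencies first, then call the factory exactly once.
    var args = entry.deps.map(resolveModule);
    entry.value = entry.factory.apply(null, args);
    entry.resolved = true;
  }
  return entry.value;
}

function toyRequire(ids, callback) {
  return callback.apply(null, ids.map(resolveModule));
}

// The dependent module is registered BEFORE its dependency.
toyDefine('amdmod', ['texts'], function (texts) {
  return function () { return 'h1 says: ' + texts.h1; };
});
toyDefine('texts', [], function () {
  return { h1: 'Texto do header h1' };
});

var output = toyRequire(['amdmod'], function (amdfun) {
  return amdfun();
});
console.log(output);
```

Since each module states its dependencies explicitly, the registry can run the factories in the right order no matter when the definitions arrived.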

UMD modules

As software developers, we often face the possibility of using a given module in the browser environment and also on the server (nodejs). The inclusion mechanisms work differently in the two cases. When you include a module or library with a <script> tag, you are usually creating global variables that other scripts can access, whereas one of the advantages of using RequireJS (or other module loaders) is that it removes the need to depend on global variables. How can we have the same library loaded in both ways? The answer can be found in UMD (Universal Module Definition) modules. In practice, the procedure consists of wrapping the code in a kind of boilerplate that detects which module system is available in the environment and picks the one that works.

(function (root, factory) {
    if (typeof define === 'function' && define.amd) {
        // a define function was found and it looks like
        // AMD, so register an anonymous AMD module
        define(['b'], factory);
    } else if (typeof exports === 'object') {
        // looks like nodejs (CommonJS style)
        module.exports = factory(require('b'));
    } else {
        // no module system found, define it
        // as a global (root == window)
        root.returnExports = factory(root.b);
    }
}(this, function (b) {
    // use 'b' as appropriate,
    // then return whatever the module should export.
    // It can be an object or simply a function.
    return {};
}));




Here is a video showing how to use CommonJS modules in NodeJS (server-side JavaScript).

Monday, February 19, 2018

Reactive Programming

Reactive programming is one of the most modern ways of developing applications with visual, interactive parts on the web, whether on desktops or smartphones, where all browsers support modern versions of JavaScript.

This article is a transcript of a video, available on YouTube, that I recorded some time ago. Watch the video if you prefer.


Hello, everyone. Today we are going to talk a little about reactive programming. This new programming paradigm, which is not really that new, is very much in vogue, especially for programming interactive web pages, or web applications. The same goes for mobile phones, the smartphones.

What is the advantage of reactive programming? Let us see. Everyone here has had some contact with spreadsheets. Spreadsheets work as follows: say that in a given cell I put a mathematical expression a = b + c, that is, cell "a" (it does not have to be exactly that letter, it is just to keep things simple) depends on cell "b" and on cell "c". When cell "b" changes, or when cell "c" changes, the value in cell "a" becomes wrong and needs to be updated. That is exactly the principle of reactive programming. We create dependencies. Cell "a", in this case, reacts to any change in cells "b" and "c". This can be implemented in many ways, including with events, which is a traditional way of implementing it in JavaScript. So let us elaborate a little more on this.
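
The spreadsheet idea can be sketched in a few lines of plain JavaScript using simple listeners. This is only an illustration of the principle, not the API of any reactive library; Cell and computedCell are hypothetical names.

```javascript
// Minimal spreadsheet-like cells. Cell and computedCell are hypothetical
// names for this sketch, not the API of any reactive library.
class Cell {
  constructor(value) {
    this.value = value;
    this.listeners = [];
  }
  get() { return this.value; }
  set(value) {
    this.value = value;
    // Notify every cell that depends on this one.
    this.listeners.forEach(function (listener) { listener(); });
  }
  subscribe(listener) { this.listeners.push(listener); }
}

function computedCell(deps, formula) {
  function evaluate() {
    return formula.apply(null, deps.map(function (d) { return d.get(); }));
  }
  var cell = new Cell(evaluate());
  deps.forEach(function (d) {
    d.subscribe(function () { cell.set(evaluate()); });
  });
  return cell;
}

var b = new Cell(2);
var c = new Cell(3);
var a = computedCell([b, c], function (x, y) { return x + y; }); // a = b + c

var before = a.get();   // value computed from the initial b and c
b.set(10);              // changing b triggers the recomputation of a
var after = a.get();
console.log(before, after);
```

Nobody calls a recalculation explicitly: setting b is enough for a to be brought up to date, which is the "reaction" the paradigm is named after.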

Developers decided to write a manifesto about reactivity. This manifesto has 4 basic principles of reactivity that any library should follow in order to stay compatible. The first is responsiveness, or interactivity: responsiveness means the speed with which the application should react to any event, so to speak, or to any change made by the user. (The user clicked somewhere.) The application is expected to respond quickly. That is responsiveness.

The second is resilience, which means being tolerant of certain errors and responding in the expected way to whatever error occurs. For example, say we request a certain service that is not available (its server is down); it should just sit there quietly and not "mess up" the whole page or stop the page from working because of that one module (say it is a weather module). The rest of the page keeps working, showing news or other things, while the part that would show the weather, the temperature, the pressure, the forecast for the next hours, and so on, has its server down, and that does not prevent the rest of the page from working. That is resilience.
It must also be scalable: if we build a more complex page or a more complex web application, with more features, it should not get slower just because it is more complex. Likewise, if the amount of data grows because we are "fishing" data from several servers at the same time, that should not hold back the speed; in other words, the speed should stay high.

And the basic principle: it is message-driven, or event-driven. What does that mean? The user clicks, the application responds. The user presses a key, the application responds. And so on.
Another interesting principle, embedded in those four although we did not state it explicitly, is that the world is asynchronous. Let us take an example. We are going to make some coffee: we take the ground coffee, put it in the machine, or in the electric coffee maker, and soon afterwards we find out that there is no sugar. There are two ways to handle this. The first: ah, there is no sugar, so we will not make coffee yet. First I go and buy the sugar; when I come back, I put the ground coffee in the coffee maker, turn the machine on, wait for it to finish, and when the coffee is ready I add the sugar and drink it. That makes me spend more time. It would be more natural, hence our saying that the world is asynchronous, to say: no, I will put in the ground coffee, turn on the coffee maker, and go to the supermarket to buy the sugar, and meanwhile the coffee maker is working while I am buying the sugar. When I come back, the coffee should be ready, or almost ready; I wait just a little longer for it to finish, sweeten it, and drink it. I will be much happier with this second alternative.
This is also an illustration of how the world is asynchronous.

Another way of thinking about reactivity is with streams. What are streams? A kind of array, only dynamic. A queue with a head and a tail, where new data is inserted at the head and data already inserted is removed from the tail. For example, suppose I had a series of events that I wanted to keep in order: I would put them into this stream and, at the other end, process them event by event, preserving the history, including the order in which they occurred. This is very practical. Let us not think about events directly, but say I have a very large file, so large that my entire memory could not hold the data it contains. I would have to read it in parts and process each one, then read another part and process it. Instead of breaking it into pieces myself, I turn it into a stream, that is, "send me a part of it", and while it is arriving I am already processing it. So I am processing in a reactive way. While the data is arriving I am reading it, filtering it, doing some ordering; for example, I could be counting occurrences of characters, the frequency with which each character appears in the text (letter a, so many times...). I would be counting and keeping state about this text, even if the text is larger than my memory, because I read it only as I need it. So, by turning it into a stream, I can process it as if I were reading it sequentially. I do not have to stop, read a piece, mark where I stopped, and so on.
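
The character-counting example can be sketched with a generator standing in for the stream. This is a simplified illustration: chunksOf only simulates data arriving piece by piece (a real program would take the chunks from a file or network stream), and both function names are hypothetical.

```javascript
// chunksOf simulates data arriving piece by piece; in a real program the
// chunks would come from a file or network stream. Both function names are
// hypothetical.
function* chunksOf(text, size) {
  for (var i = 0; i < text.length; i += size) {
    yield text.slice(i, i + size);
  }
}

function countCharacters(chunks) {
  var counts = new Map();
  // Only one chunk is held at a time; the running counts are the only state.
  for (var chunk of chunks) {
    for (var ch of chunk) {
      counts.set(ch, (counts.get(ch) || 0) + 1);
    }
  }
  return counts;
}

var counts = countCharacters(chunksOf('abracadabra', 4));
console.log(counts.get('a'), counts.get('b'), counts.get('r'));
```

The counting code never needs the whole text at once, so the same loop would work for input far larger than the available memory.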

Now, many libraries exist for reactive programming in JavaScript. Some of them are famous because most of the famous ones are associated with GUIs, that is, graphical user interfaces. One of them is Meteor, which automates (takes care of everything in) the development of complete web applications, server side and client side. I have even written a book about it (you can find it down below).
And while I am at it, do not forget to "like" the video and also subscribe to the channel, if you have not subscribed yet.

Meteor, then, is one of those famous libraries using reactive programming. Another interesting one is React, from Facebook, a library that only takes care of the view, that is, the appearance of the program on the web, the DOM, which is where we draw what will appear on the browser screen (input boxes, buttons, text boxes, and so on... icons, images, paragraphs, etc.). So React takes care only of the visual part; it does not take care of the back end, the server, and so on. And there are many more; in fact, I counted more than 20 libraries that are fairly popular, some of them very modular. Personally, I like Vue.js a lot (I believe it is pronounced that way because it is sort of "French"). Vue is also dedicated only to the view, the browser side, but it is very flexible and much simpler than Angular, for example. Angular is another reactive library, but designed with a different way of thinking. Normally this reactivity, this processing of data as streams, is a one-way flow. The characters come in and are processed, in this case displayed, placed in the DOM; in the case of React in the virtual DOM, and in Vue as well. Most reactive libraries work like that: they keep a copy of the DOM (a virtual DOM), and this virtual DOM is exported, mirrored into the real DOM (the one that is actually being shown), because the virtual DOM makes access faster.

But we are already stretching this video a lot, so I will leave further discussion for another occasion. Thank you very much for watching.
See you soon.

Thursday, February 15, 2018

Published books

I have a few published books that may interest readers.
Visit my author page on Amazon, or see the links below.

Author page:  https://www.amazon.com/Rildo-Pragana/e/B01LYZ2TM4

Books:

Curso Intensivo de JavaScript.  Ideal for beginners in the language.
https://www.amazon.com/Curso-Intensivo-JavaScript-Portuguese-Pragana-ebook/dp/B077Z86X6Y/ref=asap_bc?ie=UTF8

NodeJS: JavaScript no Servidor.  JavaScript, dedicated to server programming using NodeJS, based on the V8 engine.
https://www.amazon.com/Nodejs-javascript-no-servidor-Portuguese-ebook/dp/B01LZMH242/ref=asap_bc?ie=UTF8

Programando aplicações com AngularJS.  Using the Angular framework (version 1).
https://www.amazon.com/Programando-aplica%C3%A7%C3%B5es-com-AngularJS-Portuguese-ebook/dp/B0189PK16U/ref=asap_bc?ie=UTF8

Meteor Prático.  Using the Meteor framework, which works on the client side as well as on the server.
https://www.amazon.com/Meteor-pr%C3%A1tico-Portuguese-Rildo-Pragana-ebook/dp/B01H7J6NHA/ref=asap_bc?ie=UTF8

Curso Rápido da Linguagem C.  Good as a complement to the texts used in engineering courses or wherever C is taught to university students, but also suitable for beginners in C.
https://www.amazon.com/Curso-r%C3%A1pido-linguagem-C-Portuguese-ebook/dp/B078WKK8KQ/ref=sr_1_1?s=digital-text&ie=UTF8&qid=1518702260&sr=1-1&keywords=curso+rapido+de+C

English translation of the NodeJS book above.
https://www.amazon.com/NodeJS-server-side-javascript-Rildo-Pragana-ebook/dp/B01M1NHIE5/ref=asap_bc?ie=UTF8

Several other books have translations, all available on Amazon.  (Tip: search for Rildo Pragana.)

Monday, March 6, 2017

Running a GDI printer under Linux part 6 - Writing the printer software


There are some other articles on this subject, as well as the motivations for this work, on my homepage.

In this final article of the series I will look at some details of writing the actual printer software. The main idea, as I said before, is to mimic as closely as possible the original working driver (under the other OS), so you will not have to think too much about command details or discover the meaning of every register inside the printer interface. If it works for the other OS, it will work for Linux as well. Of course, you will run into problems with timings, required commands (most are garbage, as you will discover, which explains why my Linux driver is twice as fast), and some minimum parameters needed to set up a correct printing image. The easiest way to get the parameters is to print the same material (say, an image in MS Paintbrush) with several page sizes and note what changes in the captured data between the cases.
The pulse timings are more difficult to get. You will have to experiment with them, placing delays with usleep() or even sleep(). First get the log generated by Bochs, calling the function bx_printf() in the devices.cc source file to record the timings in machine clocks, and convert them to microseconds. I have rewritten my Linux driver several times to get the best fit.
This article is a tour through my printer driver's implementation, so you get a feeling for what to do once you start discovering your printer. As I did myself, you can start coding your driver even before everything is well understood, if you follow the mimic principle. Much of my printer's functionality was discovered that way, as my first driver was too crude to be useful. I dare you to write a working driver, even if it only prints a tiny line, like my first experiment. Go ahead! Make your printer sing!

Mimicking the original driver

Will you have the chance to reuse some of my code? I don't really know, but I hope so. Low-priced printers share several characteristics with mine: (1) they don't have enough memory, which means you will have to print in "bands", or "strips of paper", while the laser is burning the image onto the printer's drum; (2) most of them need a fast way to transfer huge amounts of image data from the processor to the printer's memory in real time, so you will find non-standard protocols for that; (3) for the same reason, a compression algorithm is used to encode image data. The compression is the tricky part of the reverse engineering, and it is not easy to mimic unless you understand it fully. I asked you in a previous article to make several patterned images to discover how it works. If you don't understand it fully, there is no way to write a printer driver, sorry.
Another matter of concern here is the parallel port emulation. If your printer supports SPP, EPP, and ECP, don't choose the last one! Linux is very efficient. Try to use the simplest protocol, because it is easier to debug. I mean SPP (standard parallel port), of course. A difficult issue I found with my printer was the band sizes. First, each band must have an integer number of rows, in my case of 4800 dots per line. Second, when the size of the compressed data varies too much, the printer gets lost, so I had to add a dynamic band sizing patch to my original compression algorithm (to see the full compression algorithm, please get the driver's source at ml85p-0.0.5.tar.gz, or at Metalab under /pub/Linux/hardware/drivers). The band is resized as it is compressed by fragments of code like:

       if ((cnt < LINE_SIZE) &&
          ((pktcnt + pcnt/256) < 5000) &&
          (linecnt<LINES_BY_PAGE))
       {
          cnt += LINE_SIZE;
       }
In this code, LINE_SIZE is the number of bytes in each printer row (4800/8 = 600), pktcnt is the number of packets of compressed data assembled so far, and pcnt is the count of similar data found so far, which will generate further packets of at most 256 each. This guarantees my compressed bands will be about 5000 packets in size. The implementation seems very complex, but you have to take into account that each packet can hold only 2, 3 or more bytes of data, so I have to take care not to overflow the printed lines. When such an overflow occurs, I see an ugly shifted page, or even a black band at the end of the page (and there goes my precious toner...).
You will have to recognize what is needed to reset the printer and which is the actual print page command. This is easy to get. Just capture your printer data without printing any page and you will get the reset procedure. There are different kinds of commands, at least in my printer, so take care with the strobing of commands. The Samsung printer has two "kinds of strobe", I suppose one for selecting the printer's ASIC register and the other for sending the register's new value. The lpoutw function shows the two strobe sequences:

void
lpoutw ( int data,int type ) {
 int mask=0;
 char s[100];
 outlp(data);
 if (type) {
  mask=2;
 }
 coutlp( 4+mask );
 coutlp( 5+mask );
 sinpwfast(0x7f);
 coutlp( 4+mask );
 toggle_control(17);
}
The type argument tells it which kind of strobe to use. Of course, both generate a physical STB pulse, but with different AUTOFD signal levels. The toggle_control function call at the end is not well understood, but I told you that I mimic the Windows interface. If I take out this function call, my whole driver stops working, so let us leave it there! Your printer will possibly not be the same, but pay attention to all control signals or you will be in trouble.

Help from ghostscript

Ghostscript is a nice PostScript interpreter that translates not only to the printer languages of most common printers, but also to several formats not related to printers. I need my printer to "understand" PostScript, so this is the way to go. I use ghostscript to translate to an easy-to-process format and send that to my printer.
I have chosen pbmraw as my target output format. I call ghostscript to translate the PostScript source into several pbmraw formatted pages, and then call my driver to read them (this format is very easy to read!) and send them to the printer. The problem with this approach is that it needs lots of disk space, for the image files are very large. A better approach is to pipe ghostscript's output directly into my driver, so there are no disk accesses at all. This is unix magic! The kernel connects ghostscript with my driver, and as ghostscript sends me image data, I read it and process it on the fly. The driver must know the exact end of each picture in order to find the header of the next one, or it will get lost.
Another problem I found was the page size. My default printer page is 4800 by 6774 pixels (as I only plan to use A4 paper) and ghostscript-generated pages vary in size. So my routines have to be careful about that and fill the missing space, or clip when the picture is larger, for both the width and the height of the pictures. Once the piping mechanism was ready, the only disk space used was for the compressed images. This is a much lower requirement, and it is temporary, as I remove each page after it is sent to the printer.
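
To show how little is needed to find the dimensions and the start of the image data in a raw PBM page, here is a hedged sketch in JavaScript (the driver itself is written in C). It assumes the standard raw PBM layout: the magic "P4", then width and height in ASCII, with optional "#" comment lines, followed by one whitespace byte and rows padded to whole bytes. parsePbmHeader and isSpace are hypothetical helpers, not part of the ml85p driver.

```javascript
// Sketch of reading a raw PBM ("P4") header from a byte array.
// parsePbmHeader and isSpace are hypothetical helpers, not driver code.
function isSpace(byte) {
  return byte === 0x20 || byte === 0x09 || byte === 0x0a || byte === 0x0d;
}

function parsePbmHeader(bytes) {
  var pos = 0;
  var tokens = [];
  while (tokens.length < 3) {
    // Skip whitespace and "#" comment lines between tokens.
    while (pos < bytes.length && (isSpace(bytes[pos]) || bytes[pos] === 0x23)) {
      if (bytes[pos] === 0x23) {
        while (pos < bytes.length && bytes[pos] !== 0x0a) { pos++; }
      } else {
        pos++;
      }
    }
    if (pos >= bytes.length) { throw new Error('Truncated PBM header'); }
    var token = '';
    while (pos < bytes.length && !isSpace(bytes[pos])) {
      token += String.fromCharCode(bytes[pos]);
      pos++;
    }
    tokens.push(token);
  }
  if (tokens[0] !== 'P4') { throw new Error('Not a raw PBM file'); }
  var width = parseInt(tokens[1], 10);
  var height = parseInt(tokens[2], 10);
  return {
    width: width,
    height: height,
    rowBytes: Math.ceil(width / 8),   // each row is padded to a whole byte
    dataOffset: pos + 1               // one whitespace byte ends the header
  };
}

var header = Uint8Array.from('P4\n# page 1\n4800 6774\n', function (ch) {
  return ch.charCodeAt(0);
});
var info = parsePbmHeader(header);
console.log(info.width, info.height, info.rowBytes, info.dataOffset);
```

With rowBytes and height known, the reader can tell exactly how many bytes belong to the current page, and therefore where the header of the next page begins in the pipe.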
There are some tricks for doing this in a modular fashion. First you save the bitmap file dimensions when reading its header (I use the bmwidth and bmheight variables). Then, when your get_bitmap() function is called, which returns one byte of bitmap data, you check whether the bitmap's width is greater than your page image size. If it is larger, simply read your page width and skip the remaining bytes of the line from the bitmap file; otherwise read its real size. If you clear the bitmap buffer beforehand (the memset() call), the space to the right of each line will be blank, as expected.

unsigned char
get_bitmap () {
 FILE *dbgf;
 int i,k,tmp;
 if (bmcnt==0) {
  memset(bmbuf,0,800);
  if (linecnt<(bmheight-topskip)) {
   if (bmwidth > 800) {
    fread(bmbuf,1,800,bitmapf);
    bitmap_seek(bmwidth-800);
   }
   else {
    fread(bmbuf,1,bmwidth,bitmapf);
   }
  }
  bmptr = bmbuf+leftskip/8;
  bmcnt = LINE_SIZE;
  linecnt++;
 }
 bmcnt--;
 return *bmptr++;
}
The bitmap_seek() function is supposed to do a seek, but I can't call fseek() directly, as I'm reading from a pipe! It just reads and discards bytes. The variables bmptr, bmcnt, and bmbuf implement a simple buffer that supplies the next bitmap bytes when get_bitmap() is called again. Notice linecnt, which tracks the line of the printed page output, and topskip and leftskip, which allow control of the margins at the top and left of the printed page. Ghostscript tends to put a larger margin at the top and left than I want, so the control only reduces those margins. It would be easy, though, to make them grow if needed.
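
The clip-or-pad idea behind get_bitmap() can be isolated in a tiny function. The sketch below is in JavaScript and only mirrors the idea (a zero-filled buffer plays the role of the memset() call); fitRow is a hypothetical helper and not a translation of the driver code.

```javascript
// Clip or pad one bitmap row to the page width. fitRow is a hypothetical
// helper mirroring the idea of get_bitmap(), not a translation of it.
function fitRow(sourceRow, pageRowBytes) {
  var out = new Uint8Array(pageRowBytes);          // zero-filled: blank to the right
  var usable = Math.min(sourceRow.length, pageRowBytes);
  out.set(sourceRow.subarray(0, usable));          // bytes past "usable" are dropped
  return out;
}

var narrow = fitRow(Uint8Array.of(0xff, 0x0f), 4);      // shorter than the page row
var wide = fitRow(Uint8Array.of(1, 2, 3, 4, 5, 6), 4);  // longer than the page row
console.log(Array.from(narrow), Array.from(wide));
```

A narrow row comes out padded with blank bytes on the right, and a wide row is simply cut at the page width.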

Pulse timing and status checking

A problem with gathering the pulses from Bochs is that, although the simulation is perfect in every detail, including the real time clock of the virtual machine, the real hardware (the printer) is not simulated. It just "thinks" it is connected to a slow machine. So most status reads will return an already-ready condition. To know exactly what to look for, there are several possibilities:
  • You can "unofficially" check the status of the printer's port each time you change something by outputting data to it. When the printer driver checks it again, you will notice which bits changed and get an idea of what is being tested. This is not infallible, but it gives you a hint.
  • You can disassemble at the point where the original driver checks the status. Notice there are several such checks, and you must get them all to be sure you understand it. This breakpoint is tricky to set, as I will explain below.
  • The first spying I did was to read the printer's status port each time something is written to it and log it to stderr or to the impr.log file. This file (impr.log) is opened at the very start of Bochs and closed before it finishes, so I write stuff to it during the run. This spying is put in the bx_devices_c::outp(Bit16u addr, Bit32u value, unsigned io_len) routine, after checking that our printer port is being accessed.
          if ((addr <= 0x37a) && (addr >= 0x378)) {
                    port_real_outb(addr,value);
                    st379 = port_real_inb(0x379);
                    fprintf(stderr,"O%x,%x i1(%x)\n",(addr-0x378),value,st379);
          }
    Notice that port_real_outb() and port_real_inb() are not part of Bochs. I have included them to interface directly with the hardware. This routine is called when the virtual machine tries to simulate an output. My code translates the simulation into a real hardware access, but also reads the status port into st379, so we can see the status changing when the printer hardware detects the command. Otherwise I would not see much, because the simulation is very slow compared with the hardware.
    Of course, this output will be shown in real time. Sometimes I replace stderr with impr_log (see my patched Bochs source) and log to it for further analysis, but it is good to see it in real time as well.
    To get the status checking disassembly, I modify the bx_devices_c::inp() function to also print the instruction pointer (EIP) when a port is being read. My capturing statement is fprintf(impr_log,"I%x(%x) 0x%x\n",(addr-0x378),ret,EIP);, so I get in the output (the impr.log file) something like I1(7f) 0x80020965, the last large number being the EIP register at the time of the status check. Then I filter all those statements (with grep I1, for instance), edit to cut everything except the last number, and then pipe it through sort | uniq to get a list of all status checking points. I marvel at how useful simple programs like uniq and sort are, and at how long I lived without them (programming under msdos/windows).
    After getting these status checking points, I restart bochs (with the printer installed, of course) and disassemble several bytes after the input instruction, as in the following example (here I included 1 extra byte at the beginning to show the reported "in" instruction):
            <bochs:6> disas 0x8001f350 0x8001f356
     8001f350: ec: in AL, DX
     8001f351: 24f0: and AL, #f0
     8001f353: 3cf0: cmp AL, #f0
     8001f355: 7522: jnz +#22
    Most of the time you don't really need to know which bit is most meaningful. You can just do the same test in your Linux driver. First you need the addresses from the impr_log file; then you stop bochs by pressing <Ctrl>-c (to stop the simulation) and execute the instruction disas 0x8001f350 0x8001f356. The second address is where to stop the disassembly, so give something like ten bytes after the start address. Repeat the process for all the checkpoints you recorded with the procedure given before. You don't need to understand everything now, just record the assembly output. You don't really need to know much assembly language, only your machine's architecture (registers) and a handful of logical instructions. In the example given, we are testing whether the four high-order bits of the status port are all turned on. If you can't understand this, please go read a good assembly language book or ask a friend for help.
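
The log filtering and the status test described above can be sketched as follows. This is JavaScript for illustration only (the real work was done with grep, sort, uniq and C); uniqueStatusCheckpoints and statusReady are hypothetical names, and the sample log is made up from the line formats quoted in this article.

```javascript
// uniqueStatusCheckpoints mimics "grep I1 | cut | sort | uniq" on the log;
// statusReady mirrors the "and AL,#f0 / cmp AL,#f0" test. Both names are
// hypothetical, and the sample log lines below are made up for illustration.
function uniqueStatusCheckpoints(logText) {
  var seen = {};
  logText.split('\n').forEach(function (line) {
    var match = /^I1\([0-9a-f]+\) (0x[0-9a-f]+)$/.exec(line.trim());
    if (match) {
      seen[match[1]] = true;       // keep each EIP value only once
    }
  });
  return Object.keys(seen).sort();
}

function statusReady(status) {
  return (status & 0xf0) === 0xf0;   // all four high-order bits set
}

var sampleLog = [
  'I1(7f) 0x80020965',
  'O0,1b i1(7f)',
  'I1(ff) 0x8001f350',
  'I1(7f) 0x80020965'
].join('\n');

var checkpoints = uniqueStatusCheckpoints(sampleLog);
console.log(checkpoints);
console.log(statusReady(0x7f), statusReady(0xff));
```

Each address in the resulting list is a candidate for the disas command, and statusReady is the kind of test that can then be copied into the Linux driver.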
    If you have an SMP (multiprocessor) machine, watch out for trouble when disabling interrupts. It is better to try first with only one processor. When everything works fine, you can write an SMP version of your code. Some critical parts of the data transfer need all the speed you can get, or the printer will lose data, so using cli() and sti() in some places is unavoidable.

    Tools for the future

    Real time techniques are invaluable for analyzing unknown data streams. We can make a versatile logic analyzer with RT-Linux plus some driver code and a suitable graphical interface. I plan to make something like that available in the near future, not only to detect and reverse engineer printers, but anything connected to the parallel port or even other ports. The only concern here is that we have to stop the CPU until the trigger conditions occur, and if they never occur you will have a rock-solid frozen machine. Of course, we can use an interrupt source to do the triggering, but this doesn't guarantee real time performance, because RT-Linux, while much faster to react than a normal Linux kernel driver, has a finite response time.


    Rildo Pragana <rildo@pragana.net>     Adventures in Linux Programming 

    Running a GDI printer under Linux part 5 - Solving the compress puzzle



    There are some other articles on this subject, as well as the motivations for this work, on my homepage.

    No matter what acquisition method you use, you will eventually end up with some compressed data. I expect most of the data to be output at the base port (0x378 for the usual parallel port), and if you are using my patched Bochs, you will see lines with O0,<something> in profusion. First try printing several distinct subjects, for instance a blank page and your printer's test page, to see where the large differences are found. Forget about sparse signals, like accesses tagged as O2,XX or I1(XX), for this first attempt.
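    A quick way to see where two captures start to diverge is to compare them byte by byte. The following is a minimal sketch in C, assuming both captures were already reduced to plain arrays of data bytes; first_difference is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first byte where two captured streams differ.
 * If one stream is a prefix of the other, the length of the shorter one
 * is returned; if they are identical, the common length is returned. */
size_t first_difference(const unsigned char *a, size_t na,
                        const unsigned char *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return i;
    return n;
}
```

    Everything before the returned index is common to both pages, so it is probably setup and not picture data.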
    Then you will need to prepare yourself for several discovery experiments. You will have to use simple programs like Xpaint or bitmap to make your "drawings". See some example test patterns I have used to inspect my printer:

    A block of known dimensions like this is useful for some tests. You may need to experiment with several block sizes to discover the size of your packets. Sometimes it is not apparent where the data has changed, because a change in one packet can propagate to many others. In that case, you will need a shifted pixel bitmap like the one shown in the Xpaint image below. A very important note: select zero margins when printing. You will probably use MS-Paintbrush for this purpose. It introduces a white margin on the paper (and so, in the compressed data stream), so before printing set the four margins to zero, after entering the dialog with Alt-t. If you forget to do this, your whole bitmap will be shifted and much more difficult to understand.
    In the first experiments, don't push too hard. It is better to do many small experiments than a large one trying to cover every possibility. And also, don't be too economical. Spend some time trying to "understand" the language of your printer. Print the same subject several times (a simple block like the first figure, for instance) and try to find where the similarities reside, and what is meaningful and what doesn't matter. Remember, the enemy has littered the terrain with a lot of garbage to make your life difficult or nearly impossible.
    Those who follow my counsel will reach the final victory!

    This second bitmap was one of my final test patterns. Notice that the first line has an interesting, almost random pattern. It is not line noise. I made it to convince myself that my reconstruction of the compression algorithm was correct.
    Draw up a strategy to analyze your compressed data. For instance, draw many 1-line bitmaps with (1) 16 white dots + 1 black dot; (2) 64 black + 1 white; (3) 15 white + 1 black; etc. Your first goal is to find the size of a completely filled packet. My packets were 4 bytes long, but yours may have other sizes. If you do a systematic analysis, it is not very difficult to find out.
    Don't despair if something goes wrong, or seems too random. Nothing is random, because it must be printed, and your printer is much, much more stupid than you! If something is going wrong, it is because you are not yet prepared for more complex experiments. Try many sizes of patterns and prefer simple patterns. Count the number of valid packets M$-windows is sending to your printer. Count both with a white page and with a page that has a single 63-pixel line right at the top row (to do this experiment, create a 64x1 bitmap with Xpaint and fill every pixel except the last with ink). Why 63 and not 64? Because this makes most of the bits in the packet flip, which makes the change most visible!
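    To know in advance what the uncompressed line looks like, it helps to build the same pattern in memory. Below is a minimal sketch in C for the "63 black + 1 white" experiment. The most-significant-bit-first packing is an assumption of this sketch (your bitmap format or printer may pack pixels the other way), and make_test_row is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill `row` with a one-line, 1-bit-per-pixel test pattern:
 * the first `black` pixels are inked (bit = 1), the remaining pixels of
 * the `width`-pixel line stay white (bit = 0). Pixels are packed
 * most-significant-bit first, which is an assumption of this sketch.
 * Returns the number of bytes used, or 0 if the buffer is too small. */
size_t make_test_row(unsigned char *row, size_t cap,
                     size_t width, size_t black)
{
    size_t bytes = (width + 7) / 8;
    assert(row != NULL && black <= width);
    if (bytes > cap)
        return 0;
    memset(row, 0, bytes);
    for (size_t x = 0; x < black; x++)
        row[x / 8] |= (unsigned char)(0x80u >> (x % 8));
    return bytes;
}
```

    For a 64-pixel line with 63 inked pixels this gives seven bytes of 0xff followed by 0xfe, a handy reference when you look for the same run in the compressed stream.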
    Look at the compressed data for my winprinter, which I gave in the first article of this series. It was not easy at all to decipher! I made many experiments (about a hundred printed pages, I guess). Well, I'm not a genius, so I had to experiment a lot, as I expect you will too.

    The best tool to use is, of course, Bochs, because you will need only one machine, but it is more time consuming and cannot solve your problem if your printer requires a faster data dump. The best way to see if it fits is to print a test page from your printer driver inside Bochs. If it works, even if the image is scrambled after the first lines, it is suitable. Don't go to RT-Linux just because your printer can't print a full page inside Bochs. RT-Linux is much harder to set up and use. I did both, and my printer also scrambled its printing, but the first printed lines are the only thing you need to concentrate on while deciphering your compressed data.
    Next time I will show more details on discovering the protocol, some settings of my printer, and how I discovered them. It will be just a wrap-up of the series. If you have any questions while the matter is hot, please ask me by e-mail. I plan to port more printers, if someone gives me another winprinter as a gift. I will not buy another printer, because I have plenty of printers now. Who wants another GDI printer ported to Linux?

    Rildo Pragana <rildo@pragana.net>     Adventures in Linux Programming 

    Running a GDI printer under Linux part 4 - Real Time Techniques



    There are some other articles on this subject, as well as the motivations for this work, on my homepage.

    While Bochs is a good tool when speed is not a limiting factor, there are times when we need to see the printer events at full speed. This is somewhat more complex, and we will need to compile a special kernel to host our capturing tool.
    There is a kernel extension known as RT-Linux that is valuable for this kind of signal processing, allowing us to use a second computer as a low-cost logic analyzer. RT-Linux lets us have full control of our machine, running the regular Linux kernel as a task or thread of lower priority than its specially designed threads. Suppose that this second machine, or spy machine, is connected to a modified "T" printer cable and can capture every signal change that occurs during some time interval. We can also set a triggering condition for starting the capture, or filter what is important to store. Such a logic analyzer is even better than many commercial instruments available. And if we drive the printer from a computer slower than our spy, we can get every detail of the parallel port signals.

    The T Centronics cable

    You will need to open the DB-25 connector of a standard PC-to-printer cable and solder several wires, taking care not to disturb the existing connections. At the other end of these wires you put another DB-25 connector, which will be attached to a printed circuit board, a perforated board, or even a solderless board holding some TTL data buffers that act as a kind of data selector. This is needed because our parallel port has only 5 input signals (we could save one part, but we chose to stay with just 4 signals for simplicity), so we select which signals to spy on by lowering the level of only one of these buffers at a time. In the circuit given, for example, the SEL_HIGH_DATA selection corresponds to the value 11111110 (binary) being output through the spy's data port. BASEPORT is the base address of the spy's parallel port, usually 0x378 (but it could be 0x3bc; check /proc/ioports for your parport0 device).

    Fortunately, only a small number of signals are present at the port: 8 data signals, 4 control signals and 5 status signals. With a simple circuit like this: (you may also get the original xfig circuit)

    and a small real time thread code like this:
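    /* Real time thread: capture up to 4095 strobed bytes from the spied port. */
    void *lpt_thread_code( void *data ){
        int rb1,rb2;
        int cnt=4095;               /* bound the capture so we never hang forever */
        while (cnt--) {
            /* spin until the strobe bit reads 1 on our status port */
            while (!(inb(BASEPORT+1)&STB))
                ;
            /* high nibble: arrives on status bits 6..3, move it to bits 7..4 */
            outb(SEL_HIGH_DATA,BASEPORT);
            rb1 = (inb(BASEPORT+1) << 1)&0xf0;
            /* low nibble: same status bits, move them to bits 3..0 */
            outb(SEL_LOW_DATA,BASEPORT);
            rb2 = ((inb(BASEPORT+1) >> 3)&0x0f) | rb1;
            rtf_put(0,&rb2,1);      /* queue one byte on fifo 0 (/dev/rtf0) */
            /* wait for the strobe to be released */
            while ((inb(BASEPORT+1)&STB))
                ;
        }
        pthread_wait_np();          /* sleep until the next period */
        return NULL;
    }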
    it becomes possible, for instance, to get strobed data until our buffer is full and then collect the gathered data from the /dev/rtf0 fifo device fed by our real time module. The drawback is that, while the thread code is active, no other activity can take place in our spy machine, not even timer interrupts or keyboard events, nothing, nada, niente. That's why it is crucial to have a well designed thread, or we will have to push the reset button too often and wait for fsck!
    The code given is easy to understand. First, in the first while loop, it waits until the STB signal goes low (the loop tests for a 1 because the signal is inverted). Then it gets the four most significant bits of the spied port's data bus through the spy's own status port (BASEPORT+1), shifts them to align with bit positions d7-d4, and finally repeats the procedure with the four least significant bits, shifting them to the right and combining them with the first 4 bits. The call to rtf_put makes the result available later, when the acquisition phase finishes. The last while statement waits for the release of the STB signal. Of course, this code doesn't work with a GDI printer, because such a printer doesn't honor the STB or other standard lpt signals, but it illustrates the kind of procedure we want to code. The counter cnt limits our acquisition so we don't hang forever. And the pthread_wait_np() call makes the thread stop until the next period (which will never arrive).
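    The nibble arithmetic is easy to get wrong, so it is worth checking on its own, away from the hardware. Here is a minimal sketch in C that repeats the two shift-and-mask steps of the thread code on plain values; combine_nibbles is a hypothetical helper, not part of the real time module.

```c
/* Rebuild one data byte from two status-port reads, as the thread code
 * does: the first read carries D7..D4 on status bits 6..3, the second
 * read carries D3..D0 on the same bits. All other bits are ignored. */
unsigned char combine_nibbles(unsigned char status_high,
                              unsigned char status_low)
{
    unsigned char high = (unsigned char)((status_high << 1) & 0xf0);
    unsigned char low  = (unsigned char)((status_low  >> 3) & 0x0f);
    return (unsigned char)(high | low);
}
```

    For example, the data byte a6 shows up as 50 on the first status read and as 30 on the second, whatever the state of the remaining status bits.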
    The real time fifo can be read like any other character device, for instance by just cat'ing the /dev/rtf0 device or copying it to a file. This thread code is not magic. You must make many experiments, changing the code to suit your printer's protocol. A good first try is to read all the ports, save the result in a temporary integer and compare each new reading with the value read before. When the value differs, you put it in the fifo, save it in the temporary variable and repeat the process. Then look at the captured data for any interesting pattern in it.
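    The "store only when it changes" idea can be sketched in plain C as follows. This is a minimal sketch, not real time code: the samples come from an array so the logic can be tried on any machine, whereas in the thread they would come from inb() and go to the fifo. The name record_changes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Keep only the samples that differ from the one read just before.
 * Returns the number of values stored in `out` (at most `cap`). */
size_t record_changes(const unsigned char *samples, size_t n,
                      unsigned char *out, size_t cap)
{
    size_t stored = 0;
    int last = -1;                      /* no value seen yet */
    assert(samples != NULL && out != NULL);
    for (size_t i = 0; i < n && stored < cap; i++) {
        if (samples[i] != last) {
            last = samples[i];          /* remember the new value */
            out[stored++] = samples[i]; /* and record the change */
        }
    }
    return stored;
}
```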

    Some advice on the real time threads usage

    It is out of scope to explain here how to compile and install RT-Linux. But when you write RT device drivers, remember that you are at a lower level than even the kernel, and as such, printk and other non-reentrant kernel functions cannot be called. There are a few library routines available from the RT-Linux site, but we don't really need many. You can make your real time thread communicate through shared memory, but fifos are easier to work with for streamed data like ours. You can attach a time stamp to your readings by calling clock_gethrtime( rtl_getschedclock() ). It returns the current time of the real time scheduler in nanoseconds. Some parallel port pins have inverted logic. Please don't invert them in real time: you need most of the CPU time to gain speed. Instead, save them as they are and write a utility routine to post-process your data.
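    Such a post-processing routine can be very small. The sketch below, in C, flips the inverted bits of a whole capture buffer offline. On a standard PC parallel port the BUSY line (status bit 7) is inverted by the hardware, hence the default mask; which bits you really need to flip depends on your wiring, so the mask is a parameter. The names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* On a standard PC parallel port the BUSY line (status bit 7) is
 * inverted by the hardware. Other masks may apply to your wiring. */
#define STATUS_INVERTED_MASK 0x80u

/* Undo the hardware inversion on a whole capture buffer, offline. */
void fix_inverted_bits(unsigned char *buf, size_t n, unsigned char mask)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(buf[i] ^ mask);
}
```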
    We can implement many triggering sources, both by analysing the signals and by starting the real time thread only after some event. For instance, if the interesting part of your measurement can be identified by listening to the printer's sound, you can have a manual trigger. To implement it, create a command fifo (say /dev/rtf1), put the following: rtf_create_handler(IN_FIFO, &cmd_handler); in the init_module function, and make an external (user level) procedure write the command into the IN_FIFO device (defined by #define IN_FIFO 1). Generally you will need a more sophisticated triggering procedure. Use counters in the thread code to start capturing after a given event has occurred "n" times, so you will have a window into the data gathered.
    I am lucky because my printer didn't require so many real time tools to be analyzed. Next time I will show some of the reasoning I used to understand the compressed data and what kind of patterned data I sent to the printer, drawn with Xpaint and printed by MS-Paintbrush on the Bochs virtual machine side.


    Rildo Pragana <rildo@pragana.net>     Adventures in Linux Programming 

    Running a GDI printer under Linux part 3 - Tools and Techniques




    There are some other articles on this subject, as well as the motivations for this work, on my homepage. This is the third article in the series Fighting against GDI. As good soldiers, all Linuxers are invited to battle the enemy, writing drivers for their winprinters on behalf of the free software world. Please forgive me for the bad style and language, for I don't have a good English teacher available to revise my writing. Feel free to send me corrections.

    About two weeks ago I went to a store to buy our monthly food and supplies and I found a real bargain there: a laser printer from Samsung, the ML-85G, for under R$ 600 (Brazilian "reais"; one real is worth about 0.52 US dollars). Usually I stay away from GDI printers, because of the trouble of getting them to work. But I thought the time had arrived: Linux is proliferating around us and I have several customers who want to ride the new wave. So I decided to enter the game, and like a good gambler I was not sure I was going to win, but I placed my bet.
    At first, I connected the printer to an old 486 with Win95 installed and attached to my home network, to see how useful it was going to be. With Samba, ghostview, including gprint, it worked well, but I noticed that most of the time Win95 was calculating the pixels and, as the memory inside the printer was only 512K, there had to be some kind of very fast protocol to send the data while the printer drum was being written by the laser beam. Of course, I have had some previous exposure to lasers and electronics technology, as I make my living designing parts and assemblies for industry.
    Then I looked at the various alternatives available. I already knew that disassembling Windows programs is a burden, to say the least, as one should expect to find plenty of self-modifying code, calls followed by extra inline arguments, and all kinds of disguises that turn it into a difficult enterprise. But what should we expect from an interface with a limited number of pins? We have only 8 data bits, 5 input status signals and 4 more control signals. So it should not be as close to impossible as one might expect.
    Two alternative schemes caught my attention: (1) simulate a complete hardware environment, so I can trace all accesses to those parallel port pins and get a sequence of events; (2) capture data in real time, using another gadget, or even another faster PC, to accumulate the results while the printing occurs at normal speed.
    I have tried both approaches with the most appropriate tools I could find, and this article shows some of the reasoning and critical issues involved in the acquisition process. The guiding idea of my process is to capture a large collection of data and then analyse it with the most intelligent tool invented so far: the human brain! Nobody can create something so complex that it cannot be simulated if we know every detail of the interface and protocols. So don't expect me to explain everything, just patterns of data that will be reproduced by another piece of software, the Linux device driver. I actually found that data with similar properties could be grouped into meaningful subroutines (that's exactly why subroutines exist: to group similar procedures), and then, with time, revealing details emerge from our blindly gathered data.

    Some protocol issues

    The parallel port protocol, as we know it, came from Centronics. It has very simple handshake signals to tell the printer when the data is ready to be read, and to tell the computer that the printer has actually processed the character, before moving on to the next one.

    Almost every parallel port printer follows this de facto standard, later formalized as IEEE 1284 (IEEE is the Institute of Electrical and Electronics Engineers, an association of which I'm proud to be a member). The basic signals are STB, BUSY and ACK, as shown in the figure. GDI printers don't honor these signals. In fact, I found that my laser printer transfers most of its data without any handshake signal being generated or tested.
    What to look for then? Think! Use your intuition, or ask a friend if you are clueless. You can also send some captured data to me, but please try first to find out its meaning by yourself, for I'm not a magician, nor do I have all the time in the world to spare. If you find something really difficult to work out with your tools, then please send it to me. I will probably find it interesting too and can help you conduct more experiments and solve the maze. The central idea is to look for discrepant values embedded in an otherwise boring and repetitive pattern. Yes! The enemy uses such camouflage to steer you towards an uninteresting signal or pattern, so you will get lost even before reaching the first real signals. Please look at this data captured in my first experiment with RT-Linux and the circuit I will show you later:
    80 a0 00 a0 89 8a a6 07 a7 8b 8b 89 8c 8c 04 94
    3f 95 58 94 95 89 8a a6 07 a7 8b 8b 9a 89 8a a6
    07 a7 8b 89 8a a6 07 a7 8d 46 89 8a a6 07 a7 89
    8a a6 07 a7 8e 89 8a a6 07 a7 8d 4f 89 8a a6 07
    a7 89 8a a6 07 a7 8e 89 8a a6 07 a7 8d 01 89 8a
    a6 07 a7 8e 97 00
    This is very boring and doesn't reveal what it really is. It was captured from the data lines (D0~D7) of the printer port by a second machine, qualified with the STB (strobe) signal. This means that, when STB went active, the spy stored one value (8 bits), then waited until STB was inactive, and repeated the process. However, this data becomes more meaningful if we rewrite it like this:
    80 a0 00 a0
    89 8a a6 07 a7 8b 8b
    89 8c 8c 04 94 3f 95 58 94 95
    89 8a a6 07 a7 8b 8b 9a
    89 8a a6 07 a7 8b
    89 8a a6 07 a7 8d 46
    89 8a a6 07 a7
    89 8a a6 07 a7 8e
    89 8a a6 07 a7 8d 4f
    89 8a a6 07 a7
    89 8a a6 07 a7 8e
    89 8a a6 07 a7 8d 01
    89 8a a6 07 a7 8e 97 00
    In the end, it turned out that this data is not really data but a giant stream of commands given to the printer ASIC, serving as camouflage for a small number of required signals. As I said before, the data is sent with its own embedded handshake (the most significant bits). To read between the lines of the data, remember the old psychological tests you were given to enter high school, or the puzzles your parents bought for you. It is like a game, and if you're intelligent (aren't you?) you are going to win. You don't need any fancy programs, just a couple of sed scripts. Personally, I like Tcl, because it is my one-size-fits-all tool for many jobs. If I need a GUI (graphical user interface), it's there with Tk. If I need an interface to other libraries, it's already implemented or almost ready. And if I really need to customize some new feature, it is easily interfaced with C, even more easily than many toolkits available for unices.

    Bochs and boxes

    Kevin Lawton gave me a great gift by writing this nice and well documented program. Mandrakesoft decided to make it GPLed and sponsored Lawton's company, so we can have the best tool to spy deep inside unknown programs, like the GDI layer of MS-Windows. Although Bochs itself has a nice and documented instrumentation interface, I'm somewhat undisciplined, so I made several patches to its sources as quick-and-dirty tricks for capturing the input/output accesses at the parallel port and passing them on to the real hardware.
    Such accesses must be made from a root (superuser) account and be allowed by the ioperm() or iopl() system calls. Please refer to their manpages to learn how to use them. They are also a quick way of writing the final device driver outside the Linux kernel. When I have some time to clean up the mess I've left...
    Bochs is a very flexible PC hardware simulator that also simulates the peripheral hardware of a virtual machine, like the video card, mouse, keyboard and disks. The CPU can be chosen from a 386, 486 or Pentium class machine, and even the instructions-per-second speed is selectable. But all of this is done in software, so the speed is significantly lower, though still useful for many experiments. Most of my GDI printer spying was done inside this virtual machine. You need to prepare a disk from scratch inside a Linux file to get it running. Note that I said "a disk from scratch" and not a partition. You can use fdisk with the filename, after reading the documentation and choosing a suitable disk geometry, then create your MS-DOS partition and format it. Then you can copy the files needed with the programs from the mtools package. I prefer instead to mount the partition with something like:
    mount -t msdos -o loop,offset=$[ 63*512 ] /opt/252M /bdsk
    Notice the expression $[ 63*512 ] in shell syntax. The 512 is the size of each disk sector and the 63 is the first sector of my MS-DOS partition, as reported by fdisk. If you're a systems administrator you're on your own; otherwise, RTFM.
    Unfortunately, Bochs (which sounds like "box", a reference to Linux and BSD boxes) doesn't have parallel port support, not even in its BIOS (basic input output system). As I planned to make all parallel port accesses actually occur, I just started main() with an ioperm(0x378,3,1). Please get the patched version of Bochs. For your convenience I included a pre-compiled binary, but it is better to compile it yourself, as you will need to change the basic capture conditions several times with the feedback from previous runs. You must be aware too that Bochs needs the disk to be ready before it starts. You can make it boot from a floppy disk, but in the long run it's better to install the target operating system inside the simulated disk image. You may find it difficult to install from scratch, in which case you should do a very plain install (VGA, MS mouse, standard keyboard, etc.) on another machine and then copy the pre-installed version to your disk image. Mount it as shown above. And please be patient, as Bochs is very, very slow because it simulates every instruction. Anyway, it is fairly accurate, and that's more important to our experiments than speed. You will have plenty of time to think about the results of the previous runs while the next one is running.
    The real capture is written inside two routines in iodev/devices.cc in the Bochs source directory. The routines are bx_devices_c::inp() and bx_devices_c::outp(). We only need them to check whether our i/o range is being accessed and then write the data to a pre-opened log file. We have to write very compact records so as not to spend much time inside our code. It's far better to leave the interpretation to other, offline programs. Here is a sample of my captured data:
    O2,2 i1(f)
    I1(f)
    I2(c2)
    O2,0 i1(4f)
    I1(4f)
    I1(4f)
    I2(c0)
    O2,2 i1(f)
    I1(f)
    I2(c2)
    O2,0 i1(4f)
    I1(4f)
    I1(4f)
    I2(c0)
    O2,2 i1(f)
    I1(f)
    Here "O" means output and "I" input. The number right after it is the offset from the base port (0x378 for the first lpt), followed by the data value. An "i" entry means that we read the port without any request from the guest operating system. I do this so we can know what could be read from the status port just after MS-Windows writes to the data or control port (when Windows didn't request it). Then I compare this previously read value with the same data when it is requested later. Remember that the "virtualized" machine is hundreds of times slower than a real one. This doesn't matter to Windows, because Bochs fools it into thinking that only a short time has elapsed, but the real devices (like the printer) respond much faster.
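    If you want to process these records in a program instead of with grep and an editor, the record format described above is simple to parse. Here is a minimal sketch in C, assuming the log looks exactly like the sample shown; it decodes only the first record on each line, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* One decoded capture record. */
struct port_access {
    char     kind;    /* 'O' write, 'I' requested read, 'i' extra read */
    unsigned offset;  /* port offset from the base address */
    unsigned value;   /* data value, parsed from hexadecimal */
};

/* Parse the first record on a log line such as "O2,2 i1(f)" or "I1(4f)".
 * Returns true and fills `rec` on success, false for anything else. */
bool parse_access(const char *line, struct port_access *rec)
{
    assert(line != NULL && rec != NULL);
    if (sscanf(line, "O%x,%x", &rec->offset, &rec->value) == 2) {
        rec->kind = 'O';
        return true;
    }
    if (sscanf(line, "I%x(%x)", &rec->offset, &rec->value) == 2) {
        rec->kind = 'I';
        return true;
    }
    if (sscanf(line, "i%x(%x)", &rec->offset, &rec->value) == 2) {
        rec->kind = 'i';
        return true;
    }
    return false;
}
```

    Keeping only the records with kind 'O' and offset 0 gives the stream of bytes written to the data port.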
    Notice that the data given at the beginning of this article was extracted from large pieces of this kind of data stream. It was filtered with grep to keep only the "O0," lines, edited to strip all the "O0," and "i1(.*)" parts, and then grouped, with each group starting at the first byte of the repeating pattern, by a small Tcl program:

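    #!/usr/local/bin/tclsh
    # Regroup input holding one hex byte per line: start a new output
    # line at each 0x89 byte, and wrap any line that reaches 16 bytes.

    set s ""
    set n 0
    while {[gets stdin line] >= 0} {
        # skip lines that do not hold a hex byte
        if {[scan $line %x byte] != 1} continue
        # 0x89 opens the repeating pattern: flush what we have so far
        if {$byte == 0x89 && $s ne ""} {
            puts $s
            set s ""
            set n 0
        }
        append s [format "%02x " $byte]
        if {[incr n] == 16} {
            puts $s
            set s ""
            set n 0
        }
    }
    # print the last, possibly incomplete, group
    if {$s ne ""} {puts $s}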
    Of course, you can easily build your own tools in your favorite language. Use your creativity!
    Sometimes there are time-critical hardware issues that prevent us from getting the data with a working printer. In that case, Bochs can't give us much. We could look for another simulator, but I will show you an alternative way (with RT-Linux) of gathering the data with the printer running at full speed. You will need to fire up your soldering iron or grab your wire-wrapping tool to assemble a small 4-part circuit. For now, get a couple of printer cables (yes, I said a couple), some 74HC374 integrated circuits, and a printed circuit board or some experimenter's perforated board.
    There is no way the devil can hide. The believers will always win!
    Wait for the next article and send me your comments. The saga will continue.


    Rildo Pragana <rildo@pragana.net>     Adventures in Linux Programming