Utilities

All the software I developed is licensed under the GNU General Public License v3. This page hosts some useful tools to test pattern mining in synthetic noisy tensors, two Beamerposter themes, and five Shell scripts.

Contents:

Testing pattern mining in synthetic noisy tensors
Beamerposter themes for the DCC
srtfold
seq-phragmen
pdf-page-grep
offline-emerge
trivialibre

Testing pattern mining in synthetic noisy tensors

This archive contains the source code of four commands that are useful to test pattern mining in synthetic noisy tensors. The first one, gennsets, generates randomly positioned n-sets of arbitrary sizes in a null tensor of arbitrary sizes. Then, the second one, noise, or the third one, num-noise, makes this "perfect" tensor noisy. The difference between the two is that noise outputs another Boolean tensor (the noise only switches Boolean values), whereas num-noise makes the tensor fuzzy (the noise turns Boolean values into degrees of membership). Finally, the fourth command, n-set-diff, provides a tuple-based comparison of the hidden patterns (as output by gennsets) with those discovered in the noisy version of the tensor.

Please take the time to read the INSTALL files before compiling these commands, and the README files before using them.
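The sketch below illustrates the kind of noise that only switches Boolean values. It switches every cell of a Boolean tensor independently with a given probability. It is a minimal sketch and not the noise command. The following are all hypothetical:

- the function name flip_noise_sketch;
- its arguments: two dimension sizes, a probability, and an optional seed;
- the input format: one line "i j" per cell equal to 1 in a 2-dimensional tensor, with indices starting at 0.

```shell
#!/bin/sh
# Minimal sketch of Boolean noise (NOT the actual noise command).
# Hypothetical input format: one line "i j" per cell equal to 1 in a
# ROWS x COLS Boolean tensor (indices start at 0). Every cell is
# independently switched with probability P; the cells equal to 1 in the
# result are printed in the same format, row by row.
# Usage: flip_noise_sketch ROWS COLS P [SEED] < tuples
flip_noise_sketch() {
  awk -v rows="$1" -v cols="$2" -v p="$3" -v seed="${4:-1}" '
    { one[$1, $2] = 1 }              # remember the cells equal to 1
    END {
      srand(seed)                    # reproducible pseudo-random draws
      for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++) {
          v = ((i, j) in one)        # current Boolean value of cell (i, j)
          if (rand() < p) v = !v     # switch it with probability p
          if (v) print i, j
        }
    }'
}

# Example: switch about 10% of the cells of a 4 x 5 tensor hiding a 2 x 2 n-set.
printf '%s\n' '0 0' '0 1' '1 0' '1 1' | flip_noise_sketch 4 5 0.1 42
```

With a probability of 0, the input tuples are printed unchanged. With a probability of 1, the complement of the tensor is printed.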

If you want to reproduce the experiments reported in the article introducing Fenster, you may be interested in these Bash scripts. The source code of DCE is no longer available on its author's website. Since it was distributed under the terms of the GNU GPL, we are free to redistribute it. Thus, here is the source code of DCE. The only improvement with respect to the original code is the ability to specify a different minimal size constraint for each dimension of the tensor.

Beamerposter themes for the DCC

This archive contains a Beamerposter theme. The background and most of the colors come from DCC's website. The archive also contains an example source file, a Makefile, and the resulting PDF. The example poster presents some commands that are useful to structure a poster.

To use the theme, beamer and beamerposter are (obviously) required. On Debian and derivatives (including the *buntus), use the package manager to install latex-beamer and texlive-latex-extra. If you do not want the whole texlive-latex-extra package (or if your distribution of choice does not ship beamerposter), visit beamerposter's website.

If you feel nostalgic about DCC's previous website, here is another Beamerposter theme that reuses its background.

`srtfold`

Given a maximal number of characters per line as argument, srtfold reformats .srt subtitles so that they satisfy that constraint and display at most two lines at a time. The only exception is a word that exceeds the constraint on its own. Consider, for instance, these subtitles:

	1
	00:01:10,000 --> 00:01:20,000
	No entanto, apesar da eficácia de seu trabalho, ele se encontrou enfrentando o que ele descreveu como um dilema crescente.

	2
	00:01:20,000 --> 00:01:25,000
	Quando voltei para Chicago, pensei na história dele até casa.

Processing them, srtfold 40 outputs:

	1
	00:01:10,000 --> 00:01:13,902
	No entanto,
	apesar da eficácia de seu trabalho,

	2
	00:01:13,902 --> 00:01:20,000
	ele se encontrou enfrentando o que
	ele descreveu como um dilema crescente.

	3
	00:01:20,000 --> 00:01:25,000
	Quando voltei para Chicago,
	pensei na história dele até casa.

And srtfold 20 outputs:

	1
	00:01:10,000 --> 00:01:10,976
	No entanto,

	2
	00:01:10,976 --> 00:01:13,902
	apesar da eficácia
	de seu trabalho,

	3
	00:01:13,902 --> 00:01:16,748
	ele se encontrou
	enfrentando o que

	4
	00:01:16,748 --> 00:01:20,000
	ele descreveu como
	um dilema crescente.

	5
	00:01:20,000 --> 00:01:22,258
	Quando voltei
	para Chicago,

	6
	00:01:22,258 --> 00:01:25,000
	pensei na história
	dele até casa.

srtfold tries to avoid widows and orphans, and it heuristically balances the number of characters per line. It prefers to break lines after punctuation marks. With a lower priority, it also prefers longer second lines. If an input subtitle cannot be displayed on at most two lines, its time interval is divided into sub-intervals whose durations are proportional to their numbers of characters.

srtfold can also process slightly malformed subtitles, which it fixes:
more than one blank line between subtitles;
missing or superfluous spaces around "-->";
missing hours;
dots separating seconds from milliseconds;
incorrect padding.

It removes superfluous spaces between words, as well as HTML tags, but keeps the input line breaks. Here is the Shell script. Executing it with no argument prints a usage help.
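Dividing a time interval into sub-intervals proportional to character counts amounts to accumulating those counts. Here is a minimal sketch of that arithmetic, not srtfold's code. Some assumptions apply:

- The function name split_interval_sketch is hypothetical.
- Its interface is hypothetical: start and end times in milliseconds, then one argument per output subtitle text.
- awk's length counts characters or bytes depending on the awk implementation and the locale.
- srtfold's exact counting and rounding may differ.

```shell
#!/bin/sh
# Hypothetical sketch: divide [START_MS, END_MS] into consecutive
# sub-intervals whose durations are proportional to the lengths of the
# text chunks given as further arguments; print them as .srt time lines.
# Usage: split_interval_sketch START_MS END_MS CHUNK...
split_interval_sketch() {
  start=$1 stop=$2
  shift 2
  printf '%s\n' "$@" |
  awk -v start="$start" -v stop="$stop" '
    # Format a time in milliseconds as HH:MM:SS,mmm (rounded to the ms).
    function ts(ms,    h, m, s) {
      ms = int(ms + 0.5)
      h = int(ms / 3600000); ms -= h * 3600000
      m = int(ms / 60000);   ms -= m * 60000
      s = int(ms / 1000);    ms -= s * 1000
      return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    { len[NR] = length($0); total += len[NR] }   # characters per chunk
    END {
      prev = start
      for (i = 1; i <= NR; i++) {
        cum += len[i]                            # characters so far
        cur = start + (stop - start) * cum / total
        print ts(prev) " --> " ts(cur)
        prev = cur
      }
    }'
}

split_interval_sketch 70000 80000 'ab' 'cde' 'fghij'
```

Take three texts of 2, 3, and 5 characters spanning 00:01:10,000 to 00:01:20,000. The inner boundaries fall 2 and 5 seconds after the start, and the last sub-interval ends exactly at the original end time.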

To improve the readability of the output subtitles, some spaces in the input subtitles can be made non-breaking. Appropriate substitutions depend on the language. Two sed substitutions are commented out in the script. The first makes non-breaking every space right before ":", "?" or "!" (useful in French). The second does the same for every space right after a word consisting of a single alphanumeric character.
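For illustration, here is one way to write the first kind of substitution. It is not necessarily the one commented out in the script. The sketch assumes UTF-8 subtitles, in which the no-break space U+00A0 is the two bytes 0xC2 0xA0. The function name french_nbsp_sketch is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: make non-breaking (U+00A0, UTF-8 bytes C2 A0)
# every space right before "?", "!" or ":", as French typography expects.
nbsp=$(printf '\302\240')
french_nbsp_sketch() {
  # \1 puts back the punctuation mark captured after the space.
  sed "s/ \([?!:]\)/$nbsp\1/g"
}

printf '%s\n' 'Pourquoi ? Voici la réponse : rien !' | french_nbsp_sketch
```

Applied to a subtitle file before srtfold, such a substitution prevents a line from starting with a lone "?", "!" or ":".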

`seq-phragmen`

seq-phragmen is this Shell script (or this one in English); the examples below use the Portuguese version, seq-phragmen_pt. It implements Phragmén's ordered method ("formulation 3" on page 22 of Phragmén's and Thiele's election methods, by Svante Janson). The method aggregates ballots, which are rankings of candidates in order of preference. It has been used in Sweden since 1921. It satisfies a combination of mathematical properties that is unique (as of 2024), in particular "committee monotonicity" and "proportional justified representation". Brill et al. prove the latter property in Phragmén's voting methods and justified representation.

Each candidate must correspond to one line of a file. For example:

	$ cat candidatos
	Alice A.
	Bob B.
	Carole C.
	Peggy P.
	Quinn Q.
	Rupert R.

To break ties, seq-phragmen prefers the candidate that comes first in the file. This file must be given as an argument. If it is the only argument, seq-phragmen runs interactively: it repeatedly lists the candidates, each preceded by a number, and asks for a ballot:

	$ seq-phragmen_pt candidatos
	Aqui estão os 6 candidatos em ordem de desempate:

	1. Alice A.	3. Carole C.	5. Quinn Q.
	2. Bob B.	4. Peggy P.	6. Rupert R.

	Digite a cédula, uma sequência de números em ordem decrescente de preferência,
	ou nenhum dígito se não há mais votos:

The ballot must be a sequence of numbers, listing the corresponding candidates in decreasing order of preference. Apart from digits and line breaks, any characters can separate the numbers. The numbers of the candidates the voter disapproves of must not be entered. If a number corresponds to no candidate or occurs several times, the ballot is ignored. The same sequences can also be lines in one or more files given as additional arguments. In that case, seq-phragmen does not run interactively and empty ballots are ignored. In interactive mode, by contrast, a sequence without any digit ends the input.
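The method can be sketched in a few lines of awk. This is a minimal sketch under stated assumptions, not the seq-phragmen script. The function name phragmen_sketch is hypothetical, and the sketch only covers the non-interactive mode and prints no explanation. It follows the computations that --verbose displays further down:

1. Each voter carries a load, initially 0.
2. In every round, a candidate's reduced votes are the number of voters whose highest-ranked remaining choice it is, divided by 1 plus the sum of their loads.
3. The candidate with the most reduced votes is ranked next. In case of a tie, the candidate appearing earliest in the file wins.
4. Each of that candidate's supporters then carries a load equal to 1 divided by those reduced votes.

```shell
#!/bin/sh
# Minimal sketch of Phragmén's ordered method with reduced votes
# (hypothetical helper: non-interactive only, no --verbose).
# Usage: phragmen_sketch CANDIDATES BALLOTS...
phragmen_sketch() {
  awk '
    FNR == NR { name[++nc] = $0; next }   # candidates, in tie-breaking order
    {
      # A ballot: candidate numbers separated by any non-digit characters.
      k = split($0, f, "[^0-9]+")
      len = 0; ok = 1
      split("", seen)
      for (j = 1; j <= k; j++) {
        if (f[j] == "") continue
        c = f[j] + 0
        if (c < 1 || c > nc || (c in seen)) { ok = 0; break }
        seen[c] = 1
        b[nb + 1, ++len] = c
      }
      # Keep only valid, non-empty ballots.
      if (ok && len > 0) { nb++; blen[nb] = len; pos[nb] = 1; load[nb] = 0 }
    }
    END {
      while (1) {
        split("", n); split("", p)
        for (i = 1; i <= nb; i++) {
          # Skip the preferences of ballot i that are already ranked.
          while (pos[i] <= blen[i] && (b[i, pos[i]] in ranked)) pos[i]++
          if (pos[i] > blen[i]) continue
          c = b[i, pos[i]]
          n[c]++; p[c] += load[i]   # supporters of c and sum of their loads
        }
        best = 0
        for (c = 1; c <= nc; c++)   # strict ">" keeps the earliest on ties
          if ((c in n) && (best == 0 || n[c] / (1 + p[c]) > bestv)) {
            best = c; bestv = n[c] / (1 + p[c])
          }
        if (best == 0) break        # no ballot supports an unranked candidate
        ranked[best] = 1
        print name[best]
        for (i = 1; i <= nb; i++)   # supporters now carry 1 / reduced votes
          if (pos[i] <= blen[i] && b[i, pos[i]] == best) load[i] = 1 / bestv
      }
    }' "$@"
}
```

Run as phragmen_sketch candidatos cédulas on the 1690 ballots of the example below, the sketch should print the same ranking as seq-phragmen: Alice A., Bob B., Peggy P., Carole C., Quinn Q., Rupert R.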

For instance, to aggregate, non-interactively, 1034 ballots "Alice A. > Bob B. > Carole C.", 519 ballots "Peggy P. > Quinn Q. > Rupert R.", 90 ballots "Alice A. > Bob B. > Quinn Q.", and 47 ballots "Alice A. > Peggy P. > Quinn Q.":

	$ yes 1 2 3 | head -n 1034 > cédulas
	$ yes 4 5 6 | head -n 519 >> cédulas
	$ yes 1 2 5 | head -n 90 >> cédulas
	$ yes 1 4 5 | head -n 47 >> cédulas
	$ seq-phragmen_pt candidatos cédulas
	Alice A.
	Bob B.
	Peggy P.
	Carole C.
	Quinn Q.
	Rupert R.

seq-phragmen takes a few seconds to process the 1690 ballots. It ranks every candidate approved at least once. Writing --verbose (or -v) before the candidate file details the computations:

	$ seq-phragmen_pt -v candidatos cédulas
	* 1171 / (1 + 0) = 1171 voto(s) reduzido(s) para Alice A.
	* 519 / (1 + 0) = 519 voto(s) reduzido(s) para Peggy P.

	Com 1171 voto(s) reduzido(s) e possível desempate, seguinte no ranqueamento: Alice A.

	* 1124 eleitor(es) satisfeito(s), agora em posição 1124 / 1171 = 0,959863, prefere(m) Bob B.
	* 47 eleitor(es) satisfeito(s), agora em posição 47 / 1171 = 0,0401366, prefere(m) Peggy P.
	* 1124 / (1 + 0,959863) = 573,509 voto(s) reduzido(s) para Bob B.
	* 566 / (1 + 0,0401366) = 544,159 voto(s) reduzido(s) para Peggy P.

	Com 573,509 voto(s) reduzido(s) e possível desempate, seguinte no ranqueamento: Bob B.

	* 1034 eleitor(es) satisfeito(s), agora em posição 1034 / 573,509 = 1,80294, prefere(m) Carole C.
	* 90 eleitor(es) satisfeito(s), agora em posição 90 / 573,509 = 0,156929, prefere(m) Quinn Q.
	* 1034 / (1 + 1,80294) = 368,898 voto(s) reduzido(s) para Carole C.
	* 566 / (1 + 0,0401366) = 544,159 voto(s) reduzido(s) para Peggy P.
	* 90 / (1 + 0,156929) = 77,7922 voto(s) reduzido(s) para Quinn Q.

	Com 544,159 voto(s) reduzido(s) e possível desempate, seguinte no ranqueamento: Peggy P.
	(...)

These explanations match those on pages 23 and 24 of Phragmén's and Thiele's election methods, where the same example is worked out. Called with no argument, seq-phragmen prints a usage help.
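The verbose output can be summarized in formula form. The notation is introduced here and is not seq-phragmen's:

- S_c is the set of voters whose highest-ranked choice not yet ranked is c.
- N_c is the size of S_c.
- ℓ_i is the share of a seat that voter i carries. It is 0 until one of the voter's choices is ranked.

The reduced votes V_c of c are N_c divided by 1 plus the sum P_c of those shares. Once the candidate c* with the most reduced votes is ranked, each of its supporters carries 1/V_{c*}. This is why a group of N satisfied voters is reported at position N / V_{c*}. The second line below reproduces the second round of the example, where the 1171 voters satisfied by Alice A. each carry 1/1171.

```latex
\begin{aligned}
V_c &= \frac{N_c}{1 + P_c}, \qquad P_c = \sum_{i \in S_c} \ell_i, \qquad
\ell_i \leftarrow \frac{1}{V_{c^\ast}} \ \text{for every } i \in S_{c^\ast},\\
V_{\text{Bob B.}} &= \frac{1124}{1 + \frac{1124}{1171}} \approx 573.509, \qquad
V_{\text{Peggy P.}} = \frac{519 + 47}{1 + 0 + \frac{47}{1171}} \approx 544.159.
\end{aligned}
```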

`pdf-page-grep`

pdf-page-grep searches PDF files for patterns (by default, basic regular expressions) and concatenates the pages with matches. The arguments starting with "-" (e.g., -F or --ignore-case) are treated as options and passed to grep. The output file is named after the "basename" of the last matching PDF file, followed by "-matches.pdf".

The script is short and simple, which makes it a good opportunity to learn; for that purpose, it is heavily commented. Moreover, Lucas Westermann did me the honor of writing a pedagogical article about it. The article was published in issue 89 (pages 10–11) of Full Circle Magazine, which is freely readable.

If you want a text output with only the matching lines rather than a PDF output, install pdfgrep, a full-featured program that is probably in the repositories of your GNU/Linux distribution.

Installation

Besides grep, the script mainly relies on pdfinfo, pdftotext, pdfunite, and pdfjam. In Debian and derivatives, the package named "poppler-utils" provides the first three commands and "texlive-extra-utils" provides pdfjam. Both packages must be installed. pdf-page-grep was tested on bash and dash but probably works on other shells as well. Just download it and execute it! Running it with no argument prints a usage help.

Non-interactive usage

pdf-page-grep can be used non-interactively: redirect its standard input from a file with one pattern per line and an empty line at the end. Its exit status is 0 if some pages matched the patterns, and 1 otherwise.
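The core of the approach can be sketched with the commands listed above. This is a simplified, hypothetical sketch, not pdf-page-grep itself. It handles a single PDF file, passes no options to grep, and skips pdfunite, which is only needed to concatenate pages from several files. The function name pdf_page_grep_sketch is an assumption, as is the naming of the output file with basename after stripping ".pdf". Like the non-interactive mode described above, it reads patterns until an empty line and returns 1 when no page matches.

```shell
#!/bin/sh
# Simplified, hypothetical sketch of pdf-page-grep's core: read patterns
# from standard input until an empty line, find the pages whose text
# matches one of them, and extract those pages with pdfjam.
# Usage: pdf_page_grep_sketch FILE.pdf < patterns
pdf_page_grep_sketch() {
  file=$1
  set --                               # reuse "$@" to collect grep arguments
  while IFS= read -r pat && [ -n "$pat" ]; do
    set -- "$@" -e "$pat"
  done
  [ "$#" -gt 0 ] || return 2           # no pattern given
  pages=$(pdfinfo "$file" | awk '/^Pages:/ { print $2 }')
  matches=
  p=1
  while [ "$p" -le "${pages:-0}" ]; do
    # Text of page p only, written to standard output ("-").
    if pdftotext -f "$p" -l "$p" "$file" - | grep -q "$@"; then
      matches=${matches:+$matches,}$p  # comma-separated page list
    fi
    p=$((p + 1))
  done
  [ -n "$matches" ] || return 1        # status 1: no matching page
  pdfjam --outfile "$(basename "$file" .pdf)-matches.pdf" "$file" "$matches"
}
```

For instance, printf '%s\n' 'Phragm' 'Thiele' '' | pdf_page_grep_sketch article.pdf would write article-matches.pdf with the pages mentioning either word.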

`offline-emerge`

This Bash script lets you keep the power of Gentoo's package manager on a machine without an Internet connection. Some actions require fetching files from the Net. Instead of fetching them, every such action writes the URLs of these files to a removable device (typically a USB key). A script on the removable device then downloads them from any *NIX system (GNU/Linux, BSD, Mac OS X, etc.) connected to the Internet. Back on the Gentoo machine, a command executes every postponed action. In this way, despite the lack of an Internet connection, it is easy to update the Portage tree, to install new software, or to update the whole system.
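On the connected machine, the download step boils down to fetching a list of URLs. The sketch below only illustrates that idea. It is not the script offline-emerge puts on the removable device. The layout under the given directory is hypothetical: a file named urls with one URL per line, and a distfiles subdirectory. The function name fetch_postponed_sketch is hypothetical as well. It uses curl, although wget would do as well.

```shell
#!/bin/sh
# Hypothetical sketch of the "online" side: DIR/urls lists one URL per
# line; each file is saved under DIR/distfiles unless it is already there.
# Usage: fetch_postponed_sketch DIR
fetch_postponed_sketch() {
  dir=$1
  rc=0
  mkdir -p "$dir/distfiles" || return 1
  while IFS= read -r url; do
    [ -n "$url" ] || continue                 # skip blank lines
    out=$dir/distfiles/${url##*/}              # file name = last URL component
    [ -e "$out" ] && continue                  # already downloaded
    # -f: fail on HTTP errors; -L: follow redirections.
    curl -fL -o "$out" "$url" || { rm -f "$out"; rc=1; }
  done < "$dir/urls"
  return "$rc"
}
```

Because already present files are skipped, an interrupted download session can simply be restarted.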

Installation

An ebuild is available here. Place it in your overlay and execute:

# emerge offline-emerge

If you do not have an Internet connection, you first have to manually download the source code and place it among your distfiles.

Finally, you have to define the variable MOVINGDIR in /etc/make.conf. It must contain the path to the directory offline-emerge will use on your removable device. For example, to use /media/usbdisk/moving-portage as that directory:

# echo 'MOVINGDIR="/media/usbdisk/moving-portage"' >> /etc/make.conf

The first execution of offline-emerge creates the files on the removable device.

Usage

If you can read French, the French version of this section may be useful to you. To learn everything about offline-emerge, please read its manual:

$ man offline-emerge

`trivialibre`

According to its Web site, "Trivialibre is a set of questions dealing with Free Software for the famous Trivial Pursuit game". My tiny contribution consists of a small Shell script that asks those questions in graphical dialog boxes. Because the original questions are in French, you are probably not interested in them (if you are, download this archive instead). However, you may still find the script useful, for instance to study for exams. Adding or removing questions is only a matter of editing the files in the "categories" directory. Adding or removing categories is only a matter of adding or removing files in that directory. The script itself need not be touched. It was translated into English, as were a few questions from Trivialibre, which serve as examples.

Features

The player chooses a category or selects "Random!" for a random selection;
The player has 20 s (with a progress bar) to find an answer (or not);
This delay, in seconds, is only a default value and can be changed (it is the script's only, optional, option);
The questions not yet asked are stored on disk, so that the game can be quit without the risk of being asked the same questions next time;
When a category is exhausted, all its questions are asked again in a different order.
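The last two features can be sketched as follows. Like the script, the sketch uses shuf from GNU coreutils. Several parts are assumptions and not trivialibre's actual format:

- the file layout: one question per line in each category file;
- the storage of the remaining questions in a hypothetical CATEGORY.left file;
- the function name next_question_sketch.

```shell
#!/bin/sh
# Hypothetical sketch: print the next unasked question of a category and
# remove it from the list of remaining questions stored on disk. When the
# category is exhausted, all its questions are shuffled again.
# Usage: next_question_sketch CATEGORY_FILE
next_question_sketch() {
  left=$1.left                                   # remaining questions
  [ -s "$left" ] || shuf "$1" > "$left" || return 1
  head -n 1 "$left"                              # question to ask now
  tail -n +2 "$left" > "$left.tmp" && mv "$left.tmp" "$left"
}
```

Since the remaining questions live in a file, quitting between two calls loses nothing. The next call resumes where the previous one stopped.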

The script is short and simple, which makes it a good opportunity to learn; for that purpose, it is heavily commented. Moreover, Lucas Westermann did me the honor of writing a pedagogical article about it. The article was published in issue 58 (pages 5–7) of Full Circle Magazine, which is freely readable.

Installation

The script relies on Zenity, which is, I believe, installed on most GNU/Linux distributions (especially those using GNOME). If it is not installed, it is probably in your distribution's repositories. The other commands the script executes should not cause any trouble if your system has a reasonably recent version of GNU coreutils (the script uses shuf). The script has been tested on bash, dash, and zsh. It should also work on other shells, such as ksh.

To install the script, download this archive (or this one, for hundreds of questions in French) and uncompress it wherever you want. Then, just execute trivialibre to play. You can, of course, create a launcher (typically in the "Games" menu); an icon is included in the archive.
