Utilities
All the software I have developed is licensed under the GNU
General Public License v3. This page hosts tools to test
pattern mining in synthetic noisy tensors, two Beamerposter
themes, and four Shell scripts.
Testing pattern mining in synthetic noisy tensors
This archive contains the source code of four commands to
test pattern mining in synthetic noisy tensors. The first
one, gennsets, generates randomly positioned n-sets of any
size in a null tensor of any size. Then, the second one,
noise, or the third one, num-noise, makes this "perfect"
tensor noisy. The difference between the two commands is
that noise outputs another Boolean tensor (the noise only
flips Boolean values), whereas num-noise makes the tensor
fuzzy (the noise turns Boolean values into degrees of
membership). Finally, the fourth command, n-set-diff,
provides a tuple-based comparison of the hidden patterns (as
output by gennsets) with those discovered in the noisy
version of the tensor.
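To make the first two steps concrete, here is a minimal Python sketch of the idea: planting random n-sets in an all-zero Boolean tensor, then flipping cells at random. It is not the code of gennsets or noise; the function names, the parameters, and the representation of the tensor as the set of its cells equal to 1 are assumptions made for this illustration.

```python
import itertools
import random

def plant_nsets(dims, pattern_sizes, how_many, rng):
    """Return (patterns, tensor): random n-sets and the set of cells they cover.

    The tensor is stored as the set of its cells equal to 1 (an assumption of
    this sketch); every other cell of the dims-shaped tensor is 0.
    """
    patterns = []
    tensor = set()
    for _ in range(how_many):
        # One subset of indices per dimension; the pattern covers their
        # Cartesian product.
        nset = tuple(frozenset(rng.sample(range(d), s))
                     for d, s in zip(dims, pattern_sizes))
        patterns.append(nset)
        tensor.update(itertools.product(*nset))
    return patterns, tensor

def flip_noise(tensor, dims, p, rng):
    """Flip every cell independently with probability p (Boolean noise)."""
    noisy = set()
    for cell in itertools.product(*(range(d) for d in dims)):
        present = cell in tensor
        if rng.random() < p:
            present = not present
        if present:
            noisy.add(cell)
    return noisy

if __name__ == "__main__":
    rng = random.Random(42)
    dims = (8, 8, 8)
    patterns, tensor = plant_nsets(dims, (3, 3, 3), 2, rng)
    noisy = flip_noise(tensor, dims, 0.05, rng)
    print(len(tensor), "cells planted;", len(noisy), "cells set after noise")
```

A fuzzy variant in the spirit of num-noise would store a degree of membership per cell instead of a Boolean value.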
Please take the time to read the INSTALL files before
compiling these commands and the README files before using
them.
If you want to reproduce the experiments reported in the
article introducing Fenster, you may be interested in these
Bash scripts. The source code of DCE is no longer available
on its author's website. Since it was distributed under the
terms of the GNU GPL, we are free to redistribute it. Thus,
here is the source code of DCE. The only change with respect
to the original code is the added possibility to specify
different minimal size constraints for the different
dimensions of the tensor.
Beamerposter themes for the DCC
This archive contains a beamerposter theme. The background
and most of the colors come from DCC's website. The archive
also contains an example source file, a Makefile, and the
resulting PDF. The example poster presents some useful
commands to structure it.
To use the theme, beamer and beamerposter are (obviously)
required. On Debian and derivatives (including the *buntu
family), use the package manager to install latex-beamer
and texlive-latex-extra. If you do not want the whole
texlive-latex-extra package (or if your distribution of
choice does not ship beamerposter), visit beamerposter's
website.
If you feel nostalgic about DCC's previous website, here is
another beamerposter theme that reuses its background.
srtfold
Given a maximal number of characters per line as its
argument, srtfold reformats .srt subtitles to satisfy that
constraint (except for words that, alone, exceed it) and to
display at most two lines at a time. Consider for instance
these subtitles:
1
00:01:10,000 --> 00:01:20,000
No entanto, apesar da eficácia de seu trabalho, ele se encontrou enfrentando o que ele descreveu como um dilema crescente.
2
00:01:20,000 --> 00:01:25,000
Quando voltei para Chicago, pensei na história dele até casa.
Processing them, srtfold 40 outputs:
1
00:01:10,000 --> 00:01:13,902
No entanto,
apesar da eficácia de seu trabalho,
2
00:01:13,902 --> 00:01:20,000
ele se encontrou enfrentando o que
ele descreveu como um dilema crescente.
3
00:01:20,000 --> 00:01:25,000
Quando voltei para Chicago,
pensei na história dele até casa.
And srtfold 20 outputs:
1
00:01:10,000 --> 00:01:10,976
No entanto,
2
00:01:10,976 --> 00:01:13,902
apesar da eficácia
de seu trabalho,
3
00:01:13,902 --> 00:01:16,748
ele se encontrou
enfrentando o que
4
00:01:16,748 --> 00:01:20,000
ele descreveu como
um dilema crescente.
5
00:01:20,000 --> 00:01:22,258
Quando voltei
para Chicago,
6
00:01:22,258 --> 00:01:25,000
pensei na história
dele até casa.
srtfold tries to avoid widows and orphans. It heuristically
balances the number of characters per line. It prefers to
break lines after punctuation marks. With a lower priority,
it prefers longer second lines too. If an input subtitle
cannot be displayed within two lines during the provided
time interval, that interval is divided into sub-intervals
whose durations are proportional to the numbers of
characters. srtfold can process slightly malformed subtitles
(more than one blank line between subtitles, missing or
superfluous spaces around "-->", missing hours, dots
separating seconds from milliseconds, incorrect padding) and
fixes them. It removes superfluous spaces between words, but
keeps the input line breaks. It removes HTML tags too. Here
is the Shell script. Executing it with no argument makes it
print usage help.
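The proportional division of a time interval can be sketched in a few lines of Python. This is not the code of srtfold (a Shell script). The function names are made up, and the weight of a piece (its number of characters plus one) is an assumption inferred from the timestamps of the "srtfold 20" example above, which it happens to reproduce.

```python
def split_interval(start_ms, end_ms, pieces):
    """Return the boundaries (in ms) of sub-intervals proportional to the pieces."""
    # Assumption: a piece weighs its number of characters plus one separator.
    weights = [len(piece) + 1 for piece in pieces]
    total = sum(weights)
    boundaries = [start_ms]
    elapsed = 0
    for weight in weights[:-1]:
        elapsed += weight
        boundaries.append(start_ms + round((end_ms - start_ms) * elapsed / total))
    boundaries.append(end_ms)
    return boundaries

def timestamp(ms):
    """Format a number of milliseconds as an .srt timestamp."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"

if __name__ == "__main__":
    # First input subtitle of the example, cut as in the "srtfold 20" output
    # (the two lines of a piece are joined by one space here).
    pieces = [
        "No entanto,",
        "apesar da eficácia de seu trabalho,",
        "ele se encontrou enfrentando o que",
        "ele descreveu como um dilema crescente.",
    ]
    for boundary in split_interval(70_000, 80_000, pieces):
        print(timestamp(boundary))
```

With these four pieces, the weights are 12, 36, 35 and 40, so the first boundary falls 10 s × 12 / 123 ≈ 0.976 s after the start.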
To improve the readability of the output subtitles, some
spaces in the input subtitles can be made non-breaking.
Appropriate substitutions depend on the language. Two sed
substitutions are commented out in the script: one makes
non-breaking every space right before ":", "?" or "!"
(useful in French), the other every space right after a word
consisting of a single alphanumerical character.
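For illustration, here are the two substitutions described above as Python regular expressions. They are a sketch of the idea, not a transcription of the sed commands in the script; the function name is hypothetical.

```python
import re

NBSP = "\u00a0"  # non-breaking space

def protect_spaces(text):
    """Make non-breaking the spaces where a line break would read badly."""
    # Space right before ":", "?" or "!" (French typography).
    text = re.sub(r" (?=[:?!])", NBSP, text)
    # Space right after a word made of a single alphanumerical character.
    return re.sub(r"\b([^\W_]) ", lambda m: m.group(1) + NBSP, text)

if __name__ == "__main__":
    print(protect_spaces("Quoi ? Non !").replace(NBSP, "~"))
    print(protect_spaces("o que é a vida").replace(NBSP, "~"))
```

The demonstration prints "~" in place of each non-breaking space so that the substitutions are visible.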
seq-phragmen
seq-phragmen is this Shell script (or this one, in
English). It implements Phragmén's ordered method
("formulation 3" on page 22 of Phragmén’s and Thiele’s
election methods, by Svante Janson). The method aggregates
ballots, which are rankings of candidates in order of
preference. It has been used in Sweden since 1921 and
satisfies a combination of mathematical properties that is
unique (as of 2024), in particular "committee monotonicity"
and "proportional justified representation". Brill et
al. prove the latter property in Phragmén’s voting methods
and justified representation.
Each candidate must correspond to one line in a file. For
example:
$ cat candidatos
Alice A.
Bob B.
Carole C.
Peggy P.
Quinn Q.
Rupert R.
To break ties, seq-phragmen prefers the candidate listed
earlier in the file. This file must be given as an
argument. If it is the only argument, seq-phragmen runs
interactively. It repeatedly lists the candidates, each
preceded by a number, and asks for a ballot:
$ seq-phragmen_pt candidatos
Aqui estão os 6 candidatos em ordem de desempate:
| 1. Alice A. | 3. Carole C. | 5. Quinn Q.  |
| 2. Bob B.   | 4. Peggy P.  | 6. Rupert R. |
Digite a cédula, uma sequência de números em ordem decrescente de preferência,
ou nenhum dígito se não há mais votos:
That ballot must be a sequence of numbers, in decreasing
order of preference for the corresponding candidates. Apart
from digits and line breaks, any characters can separate the
numbers. The numbers of disapproved candidates must not be
entered. If a number corresponds to no candidate or occurs
several times, the ballot is ignored. The same sequences can
be lines in one or more files given as additional
arguments. In that case, seq-phragmen does not run
interactively and empty ballots are ignored, whereas, in
interactive mode, a sequence with no digit ends the input.
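The validation rules of a ballot can be sketched in Python as follows. This is only an illustration of the rules stated above, not the code of seq-phragmen; the function name and its return convention (None for an ignored ballot) are assumptions.

```python
import re

def parse_ballot(line, n_candidates):
    """Return the ranked candidate numbers, or None if the ballot is ignored."""
    # Any characters other than digits separate the numbers.
    numbers = [int(token) for token in re.findall(r"[0-9]+", line)]
    if any(not 1 <= number <= n_candidates for number in numbers):
        return None  # a number corresponds to no candidate
    if len(set(numbers)) != len(numbers):
        return None  # a number occurs several times
    return numbers   # empty if the line holds no digit

if __name__ == "__main__":
    for line in ("1 > 2 > 5", "4,5,6", "1 1 2", "7", ""):
        print(repr(line), "->", parse_ballot(line, 6))
```

An empty list is what distinguishes "no digit" (end of input in interactive mode, ignored ballot otherwise) from an invalid ballot.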
For example, to non-interactively aggregate 1034 ballots
"Alice A. > Bob B. > Carole C.", 519 ballots "Peggy P. >
Quinn Q. > Rupert R.", 90 ballots "Alice A. > Bob B. > Quinn
Q.", and 47 ballots "Alice A. > Peggy P. > Quinn Q.":
$ yes 1 2 3 | head -1034 > cédulas
$ yes 4 5 6 | head -519 >> cédulas
$ yes 1 2 5 | head -90 >> cédulas
$ yes 1 4 5 | head -47 >> cédulas
$ seq-phragmen_pt candidatos cédulas
Alice A.
Bob B.
Peggy P.
Carole C.
Quinn Q.
Rupert R.
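seq-phragmen takes seconds to process the 1690 ballots. It
ranks every candidate approved at least once. Writing
--verbose (or -v) before the file with the candidates
details the computations. The trace below comes from the
Portuguese version of the script: "voto(s) reduzido(s)"
means "reduced vote(s)", "eleitor(es) satisfeito(s)" means
"satisfied voter(s)", and "seguinte no ranqueamento" means
"next in the ranking":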
$ seq-phragmen_pt -v candidatos cédulas
* 1171 / (1 + 0) = 1171 voto(s) reduzido(s) para Alice A.
* 519 / (1 + 0) = 519 voto(s) reduzido(s) para Peggy P.
Com 1171 voto(s) reduzido(s) e possível desempate, seguinte no ranqueamento: Alice A.
* 1124 eleitor(es) satisfeito(s), agora em posição 1124 / 1171 = 0,959863, prefere(m) Bob B.
* 47 eleitor(es) satisfeito(s), agora em posição 47 / 1171 = 0,0401366, prefere(m) Peggy P.
* 1124 / (1 + 0,959863) = 573,509 voto(s) reduzido(s) para Bob B.
* 566 / (1 + 0,0401366) = 544,159 voto(s) reduzido(s) para Peggy P.
Com 573,509 voto(s) reduzido(s) e possível desempate, seguinte no ranqueamento: Bob B.
* 1034 eleitor(es) satisfeito(s), agora em posição 1034 / 573,509 = 1,80294, prefere(m) Carole C.
* 90 eleitor(es) satisfeito(s), agora em posição 90 / 573,509 = 0,156929, prefere(m) Quinn Q.
* 1034 / (1 + 1,80294) = 368,898 voto(s) reduzido(s) para Carole C.
* 566 / (1 + 0,0401366) = 544,159 voto(s) reduzido(s) para Peggy P.
* 90 / (1 + 0,156929) = 77,7922 voto(s) reduzido(s) para Quinn Q.
Com 544,159 voto(s) reduzido(s) e possível desempate, seguinte no ranqueamento: Peggy P.
(...)
These explanations match those on pages 23 and 24 of
Phragmén’s and Thiele’s election methods, where the same
example is treated. Called with no argument, seq-phragmen
prints usage help.
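The computation shown in the verbose trace can be sketched in Python. The sketch below is inferred from that trace, not from the source of seq-phragmen: at each round, every ballot backs its preferred candidate not yet ranked, a candidate's reduced votes are its number of backers divided by one plus the sum of their positions, and the backers of the selected candidate all move to position 1 / (reduced votes). The function name and the data representation are assumptions.

```python
from collections import Counter

def seq_phragmen(candidates, ballots):
    """Rank candidates; ballots are sequences of 1-based candidate numbers."""
    groups = Counter(tuple(ballot) for ballot in ballots)  # identical ballots
    position = dict.fromkeys(groups, 0.0)  # position of each voter of a group
    ranked, ranking = set(), []
    while True:
        votes, positions, backers = Counter(), Counter(), {}
        for ballot, count in groups.items():
            # Preferred candidate of this ballot among those not yet ranked.
            top = next((c for c in ballot if c not in ranked), None)
            if top is None:
                continue  # exhausted ballot
            votes[top] += count
            positions[top] += count * position[ballot]
            backers.setdefault(top, []).append(ballot)
        if not votes:
            return ranking
        reduced = {c: votes[c] / (1 + positions[c]) for c in votes}
        # Ties go to the candidate listed first (lowest number).
        winner = max(sorted(reduced), key=reduced.get)
        for ballot in backers[winner]:
            position[ballot] = 1 / reduced[winner]
        ranked.add(winner)
        ranking.append(candidates[winner - 1])

if __name__ == "__main__":
    names = ["Alice A.", "Bob B.", "Carole C.", "Peggy P.", "Quinn Q.", "Rupert R."]
    ballots = ([(1, 2, 3)] * 1034 + [(4, 5, 6)] * 519
               + [(1, 2, 5)] * 90 + [(1, 4, 5)] * 47)
    print("\n".join(seq_phragmen(names, ballots)))
```

Grouping identical ballots keeps the number of operations per round proportional to the number of distinct ballots rather than to the number of voters.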
pdf-page-grep
pdf-page-grep searches for patterns (by default, basic
regular expressions) in PDF files and concatenates the pages
with matches. The arguments starting with "-" (e.g., -F or
--ignore-case) are considered options. They are passed to
grep. The output file is named after the "basename" of the
last matching PDF file followed by "-matches.pdf".
The script is very short and simple. It is an opportunity to
learn. For this purpose, it is quite heavily
commented. Moreover, Lucas Westermann did me the honor of
writing a pedagogical article about it. This article was
published in issue 89 (pages 10–11) of the Full Circle
Magazine, which is freely readable.
If you do not want a PDF output but a text output with only
the matching lines, then install pdfgrep, a fully-featured
program that is probably in the repositories of your
GNU/Linux distribution.
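The page selection at the heart of pdf-page-grep can be pictured with the following Python sketch. It assumes that the text of each page has already been extracted (the script relies on pdftotext for that) and that a page is kept as soon as any pattern matches one of its lines. Both the function name and that "any pattern" rule are assumptions, and Python's regular expressions differ in syntax from grep's basic ones.

```python
import re

def matching_pages(page_texts, patterns, ignore_case=False):
    """Return the 1-based numbers of the pages where a pattern matches a line."""
    flags = re.IGNORECASE if ignore_case else 0
    regexes = [re.compile(pattern, flags) for pattern in patterns]
    return [number
            for number, text in enumerate(page_texts, start=1)
            if any(regex.search(line)
                   for line in text.splitlines()
                   for regex in regexes)]

if __name__ == "__main__":
    pages = ["Introduction\nfoo", "bar", "Foo again"]
    print(matching_pages(pages, ["foo"]))
    print(matching_pages(pages, ["foo"], ignore_case=True))
```

The selected page numbers would then be handed to a tool that extracts and concatenates those pages.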
Installation
Besides grep, the script mainly relies on pdfinfo,
pdftotext, pdfunite, and pdfjam. In Debian and derivatives,
the package named "poppler-utils" provides the first three
commands and "texlive-extra-utils" provides pdfjam. Both
packages must be installed. pdf-page-grep was tested on bash
and dash but probably works on other shells as well. Just
download it and execute it! Running it with no argument
makes it print usage help.
Non-interactive usage
pdf-page-grep can be used non-interactively. To do so,
redirect the standard input to a file with one pattern per
line and an empty line at the end. pdf-page-grep's exit
status is 0 if pages matched the patterns, and 1 otherwise.
offline-emerge
This Bash script lets you keep the power of Gentoo's package
manager without a local Internet connection. Every action
that requires fetching files from the Net instead lists the
URLs of these files on a removable device (typically a USB
key). A script on the removable device takes care of the
download from any *NIX system (GNU/Linux, BSD, Mac OS X,
etc.) connected to the Internet. Back on the Gentoo system,
a command executes every postponed action. In this way,
despite the lack of an Internet connection, it is easy to
update the Portage tree, to install new software, or to
update the whole system.
Installation
An ebuild is
available here. Place
it in your overlay and execute:
# emerge offline-emerge
If you do not have an Internet connection, you first have
to manually
download the source code and place it among your
distfiles.
Finally, you have to define the variable MOVINGDIR
in /etc/make.conf. This variable must contain the path to
the directory offline-emerge will use on your removable
device. For example, to use /media/usbdisk/moving-portage:
# echo 'MOVINGDIR="/media/usbdisk/moving-portage"' >> /etc/make.conf
At the first execution of offline-emerge, the
files on the removable device will be created.
Usage
If you can read
French, the
French version of this section may be useful to you. To
know everything about offline-emerge, please
read its manual:
$ man offline-emerge
trivialibre
According to its Web site, "Trivialibre is a set of
questions dealing with Free Software for the famous Trivial
Pursuit game". My tiny contribution consists of a small
Shell script that asks those questions in graphical dialog
boxes. Because the original questions are in French, you are
probably not interested in them (if you are, download this
archive instead). However, you may still be interested in
the script as a way to, for instance, study for
exams. Indeed, adding or removing questions is only a matter
of editing the files in the "categories" directory, and
adding or removing categories of questions is only a matter
of adding or removing files in this directory. The script
itself need not be touched and was translated into
English. A few questions from Trivialibre were translated as
well to serve as examples.
Features
- The player chooses a category or selects "Random!" for a
random selection;
- The player gets 20 s (with a progress bar) to find an
answer (or not);
- This delay, in seconds, is actually a default value and
can therefore be modified (it is the only, optional, option
of the script);
- The questions not yet asked are stored on disk so that the
game can be quit without risking being asked the same
questions next time;
- When a category is exhausted, all its questions are asked
again in a different order.
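The last two features can be pictured with this Python sketch of a persistent pool of questions. It is not the code of the script (which is written in Shell and uses shuf); the function name, the file layout (one pending question per line), and the parameters are assumptions.

```python
import random
from pathlib import Path

def next_question(all_questions, pool_file, rng=random):
    """Pop one question not yet asked, keeping the rest in pool_file."""
    pool_file = Path(pool_file)
    pending = []
    if pool_file.exists():
        pending = pool_file.read_text(encoding="utf-8").splitlines()
    if not pending:
        # First run or exhausted category: take all the questions again,
        # in a new random order.
        pending = list(all_questions)
        rng.shuffle(pending)
    question = pending.pop()
    # Store what remains so that quitting the game loses nothing.
    pool_file.write_text("".join(q + "\n" for q in pending), encoding="utf-8")
    return question

if __name__ == "__main__":
    import os
    import tempfile
    questions = ["Who started the GNU project?", "What does GPL stand for?"]
    with tempfile.TemporaryDirectory() as directory:
        pool = os.path.join(directory, "pending")
        for _ in range(4):
            print(next_question(questions, pool))
```

Each question is printed once before any of them is repeated, even if the program is stopped and restarted between two calls.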
The script is very short and simple. It is an opportunity to
learn. For this purpose, it is quite heavily
commented. Moreover, Lucas Westermann did me the honor of
writing a pedagogical article about it. This article was
published in issue 58 (pages 5–7) of the Full Circle
Magazine, which is freely readable.
Installation
The script relies on Zenity, which is, I believe, installed
on most GNU/Linux distributions (especially those using
GNOME). If that is not the case, it is probably in the
repositories. The other executed commands should not cause
any trouble if your system has a reasonably recent version
of GNU coreutils (the script uses shuf). The script has been
tested on bash, dash and zsh. It should probably work on
other shells such as ksh as well.
To install the script, this archive (or this one for
hundreds of questions in French) must first be downloaded
and uncompressed wherever you want. Then, simply execute
trivialibre to play. It is, of course, possible to create a
launcher (typically in the "Games" menu), and an icon is
available in the archive.