Research prototypes
Every piece of software I develop is licensed under the GNU General Public License v3. Data-Peeler, fitcare, multidupehack (which superseded Fenster in every way), Biceps and NclusterBox are available from this page. Data-Peeler and Fitcare were designed and implemented within the LIRIS laboratory when I was working at INSA-Lyon. Please send me an e-mail if you are interested in using and/or improving my most recent developments.
Data-Peeler
From a dataset, d-peeler computes every closed n-set (i.e., every maximal rectangle of 1s, modulo permutations on any of the sets) satisfying given constraints. With the --minimize (-m) option, it also post-processes these patterns to output a minimization of the input dataset.
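To make the definition of a closed n-set concrete, here is a minimal brute-force sketch in Python. It is not how d-peeler works internally; it only checks the definition on a tiny relation. The function names, the encoding of the relation as a set of tuples, and the example data are all assumptions made for illustration.

```python
from itertools import product


def is_all_ones(relation, nset):
    """True if every tuple in the Cartesian product of the n sets is in the relation."""
    return all(t in relation for t in product(*nset))


def is_closed(relation, nset, domains):
    """True if nset covers only 1s and no single element can be added to any
    of its sets without covering a 0 (i.e., the rectangle is maximal)."""
    if not is_all_ones(relation, nset):
        return False
    for dim, domain in enumerate(domains):
        for element in domain - nset[dim]:
            extended = list(nset)
            extended[dim] = nset[dim] | {element}
            if is_all_ones(relation, extended):
                return False  # a strictly larger rectangle of 1s exists
    return True


# Hypothetical ternary relation: the tuples listed are the 1s, everything else is 0.
relation = {
    ("a", "1", "x"), ("a", "2", "x"),
    ("b", "1", "x"), ("b", "2", "x"),
    ("a", "1", "y"),
}
domains = [{"a", "b"}, {"1", "2"}, {"x", "y"}]

print(is_closed(relation, ({"a", "b"}, {"1", "2"}, {"x"}), domains))  # True
print(is_closed(relation, ({"a"}, {"1"}, {"x"}), domains))            # False
```

The second pattern is not closed because "b" can be added to its first set while still covering only 1s.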
Related publications
Data-Peeler was first
presented at SDM'08:
A longer version was published in
the ACM TKDD journal:
Loïc Cerf, Jérémy Besson, Céline
Robardet, and Jean-François
Boulicaut.
Closed
Patterns Meet n-ary Relations.
ACM
Transactions on Knowledge Discovery from Data,
3(1):1–36, March 2009.
If you
use Data-Peeler, or a
modified version of it, and publish your results, we ask you
to cite this reference.
Download
Here is the source code. Please take the time to read the INSTALL file before compiling d-peeler and the README file before using it.
Fitcare
From a classified dataset, fitcare computes the bodies of the rules concluding on the classes such that every rule is frequent in one class and not frequent in any of the other classes. Either every frequency threshold is bound to a parameter set by the user or these parameters are automatically learned. Then, fitcarc can apply these rules to unclassified data.
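The condition on rule bodies can be illustrated with a small Python sketch. It is only a reading of the sentence above, not the fitcare implementation: it assumes one absolute frequency threshold per class, whereas the actual tool may organize its thresholds differently. The function names and the toy data are hypothetical.

```python
def support(body, transactions):
    """Number of transactions that contain every attribute of the body."""
    return sum(1 for t in transactions if body <= t)


def concluded_class(body, data_by_class, thresholds):
    """Return the single class in which the body is frequent, or None if the
    body is frequent in no class or in more than one class."""
    frequent_in = [
        label
        for label, transactions in data_by_class.items()
        if support(body, transactions) >= thresholds[label]
    ]
    return frequent_in[0] if len(frequent_in) == 1 else None


# Hypothetical classified dataset: each transaction is a set of attributes.
data_by_class = {
    "pos": [{"a", "b"}, {"a", "b", "c"}, {"a"}],
    "neg": [{"b", "c"}, {"c"}, {"a", "c"}],
}
thresholds = {"pos": 2, "neg": 2}  # assumed: one threshold per class

print(concluded_class({"a", "b"}, data_by_class, thresholds))  # pos
print(concluded_class({"c"}, data_by_class, thresholds))       # neg
print(concluded_class({"b", "c"}, data_by_class, thresholds))  # None
```

The body {"b", "c"} is rejected because it is frequent in no class; the empty body would be rejected too, because it is frequent in both.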
Related publications
Fitcare was first
presented at DaWaK'08:
Loïc Cerf, Dominique Gay, Nazha Selmaoui,
and Jean-François
Boulicaut.
A
Parameter-Free Associative Classification
Method. In
DaWaK'08: Proceedings of
the Tenth International Conference on Data Warehousing and
Knowledge Discovery, pages 293–304. Springer,
September 2008. Acceptance rate: 33%.
A longer version was published in
the DKE journal:
If you
use fitcare, or a modified
version of it, and publish your results, we ask you to cite
this reference.
Download
Here is the source code. Please take the time to read the INSTALL file before compiling fitcare and fitcarc. The installation includes man pages for both commands.
Multidupehack
From a fuzzy tensor, multidupehack
computes
every (closed) noise-tolerant n-set satisfying given
constraints.
Related publication
Multidupehack was
presented at ICDE'14:
If you
use multidupehack, or a
modified version of it, and publish your results, we ask you
to cite this reference.
Download
Here is the source code. Please take the time to read the INSTALL file before compiling multidupehack and the README file before using it.
This
archive contains the datasets and the script to rerun
all the experiments (and more) reported in the
article Enforcement of Minimal Size and Area
Constraints before and while Mining Patterns in Fuzzy
Tensors, which will be presented at SAC 2023.
Biceps
Given a real-valued matrix, Biceps lists muscly biclusters. A bicluster is a subset of rows associated with a subset of columns. In any column of a muscly bicluster, the values in the rows of the bicluster are all strictly greater than those in the rows outside it. Moreover, the rows of the bicluster must be neither a subset nor a superset of the rows of another bicluster of greater or equal quality.
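The column condition can be checked with a few lines of Python. This sketch covers only the "strictly greater than the values outside" requirement; the quality-based selection among biclusters is left out because the quality measure is not defined on this page. The function name, the list-of-lists matrix encoding, and the example values are assumptions.

```python
def satisfies_column_condition(matrix, rows, cols):
    """True if, in every selected column, each value in the selected rows is
    strictly greater than every value in the other rows.
    Assumes `rows` is a non-empty set of row indices."""
    all_rows = range(len(matrix))
    for c in cols:
        inside = [matrix[r][c] for r in rows]
        outside = [matrix[r][c] for r in all_rows if r not in rows]
        # With no row left outside, the condition holds trivially.
        if outside and min(inside) <= max(outside):
            return False
    return True


# Hypothetical 4x3 real-valued matrix.
matrix = [
    [9, 1, 8],
    [7, 2, 9],
    [3, 5, 2],
    [1, 6, 1],
]

print(satisfies_column_condition(matrix, {0, 1}, [0, 2]))  # True
print(satisfies_column_condition(matrix, {0, 1}, [0, 1]))  # False
```

In the second call, column 1 fails: the values 1 and 2 in rows 0 and 1 are not greater than the values 5 and 6 in the other rows.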
Related publication
Biceps was published in
the Data Mining and Knowledge
Discovery journal:
If you
use Biceps, or a modified
version of it, and publish your results, we ask you to cite
this reference.
Download
Here is the source code. Please take the time to read the INSTALL file before compiling biceps and the README file before using it.
This archive
contains the datasets and the scripts to rerun all the
experiments reported in the aforementioned article.
NclusterBox
NclusterBox modifies patterns, which hold in a fuzzy tensor, to maximize their explanatory power, and selects an ordered subset of the resulting patterns to summarize this tensor.
Related publication
NclusterBox was
presented at SAC'23:
If you
use NclusterBox, or a
modified version of it, and publish your results, we ask you
to cite this reference.
Download
Here is the source code. Please take the time to read the INSTALL file before compiling nclusterbox and the README file before using it.
This archive contains the datasets and the scripts to rerun all the experiments reported in the aforementioned article, and this one contains all those reported in the article and the supplemental material submitted to IEEE Transactions on Knowledge and Data Engineering.