Research Prototypes

Every piece of software I develop is licensed under the GNU General Public License v3. Data-Peeler, fitcare, multidupehack (which supersedes Fenster in every way), Biceps and NclusterBox are available from this page. Data-Peeler and fitcare were designed and implemented within the LIRIS laboratory when I was working at INSA-Lyon. Please send me an e-mail if you are interested in using and/or improving my most recent developments.


Data-Peeler

From a dataset, d-peeler computes every closed n-set (i.e., every maximal rectangle of 1s, up to permutations of the elements of any of the sets) satisfying the given constraints. With the --minimize (-m) option, it also post-processes these patterns to output a minimization of the input dataset.
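To make the notion of a closed n-set concrete, here is a minimal brute-force sketch, not d-peeler's algorithm (which prunes the search instead of enumerating every candidate). It assumes the n-ary relation is given as a Python set of tuples and that every component of an n-set is nonempty; the function names and the `min_sizes` parameter (standing in for one kind of constraint) are hypothetical.

```python
from itertools import combinations, product


def nonempty_subsets(elements):
    """All nonempty subsets of a (small) set, as frozensets."""
    items = sorted(elements)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def holds(nset, relation):
    """True if every tuple of X1 x ... x Xn belongs to the relation."""
    return all(t in relation for t in product(*nset))


def is_closed(nset, domains, relation):
    """True if no element can be added to any Xi while the n-set still holds.

    Checking single-element additions suffices: if a larger extension held,
    any single-element extension contained in it would hold too.
    """
    for i, domain in enumerate(domains):
        for element in domain - nset[i]:
            extended = list(nset)
            extended[i] = nset[i] | {element}
            if holds(extended, relation):
                return False
    return True


def closed_nsets(domains, relation, min_sizes=None):
    """Enumerate closed n-sets; min_sizes is a hypothetical size constraint."""
    min_sizes = min_sizes or [1] * len(domains)
    result = []
    for nset in product(*(nonempty_subsets(d) for d in domains)):
        if any(len(x) < m for x, m in zip(nset, min_sizes)):
            continue  # violates the (hypothetical) minimal-size constraint
        if holds(nset, relation) and is_closed(nset, domains, relation):
            result.append(nset)
    return result
```

For instance, with the ternary relation containing every tuple starting with "a" plus ("b", "x", 1), the sketch reports two closed 3-sets: ({a}, {x, y}, {1, 2}) and ({a, b}, {x}, {1}).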

Related publications

Data-Peeler was first presented at SDM'08:

Article Loïc Cerf, Jérémy Besson, Céline Robardet, and Jean-François Boulicaut. Data-Peeler: Constraint-Based Closed Pattern Mining in n-ary Relations. In SDM'08: Proceedings of the Eighth SIAM International Conference on Data Mining, pages 37–48. SIAM, April 2008. Acceptance rate: 14%.

A longer version was published in the ACM TKDD journal:

Article Loïc Cerf, Jérémy Besson, Céline Robardet, and Jean-François Boulicaut. Closed Patterns Meet n-ary Relations. ACM Transactions on Knowledge Discovery from Data, 3(1):1–36, March 2009.
If you use Data-Peeler, or a modified version of it, and publish your results, we ask you to cite this reference.

Download

Here is the source code. Please take the time to read the INSTALL file before compiling d-peeler and the README file before using it.

Fitcare

From a classified dataset, fitcare computes the bodies of the rules concluding on the classes such that every rule is frequent in its class and infrequent in every other class. Each frequency threshold depends on a parameter that is either set by the user or automatically learned. The companion command, fitcarc, can then apply these rules to unclassified data.
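The selection criterion above can be illustrated with a small brute-force sketch. It assumes transactions are (itemset, class) pairs, measures frequency relative to the size of each class, and uses two user-set thresholds, `min_freq` and `max_other_freq`; these names and the exhaustive enumeration are hypothetical simplifications, and the sketch neither learns the thresholds nor reproduces how fitcarc applies the rules.

```python
from itertools import combinations


def class_frequencies(body, transactions):
    """Relative frequency of an itemset body within each class."""
    counts, totals = {}, {}
    for items, label in transactions:
        totals[label] = totals.get(label, 0) + 1
        if body <= items:
            counts[label] = counts.get(label, 0) + 1
    return {c: counts.get(c, 0) / totals[c] for c in totals}


def selective_rules(transactions, min_freq, max_other_freq, max_len=2):
    """Keep (body, class) pairs frequent in the class, infrequent elsewhere."""
    items = sorted(set().union(*(t for t, _ in transactions)))
    rules = []
    for k in range(1, max_len + 1):
        for body in map(frozenset, combinations(items, k)):
            freqs = class_frequencies(body, transactions)
            for c, f in freqs.items():
                others = [g for d, g in freqs.items() if d != c]
                if f >= min_freq and all(g < max_other_freq for g in others):
                    rules.append((body, c))
    return rules
```

With strict inequality on the other classes, a body that is exactly as frequent in two classes is rejected for both, which matches the "not frequent in any of the other classes" requirement.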

Related publications

Fitcare was first presented at DaWaK'08:

Article Loïc Cerf, Dominique Gay, Nazha Selmaoui, and Jean-François Boulicaut. A Parameter-Free Associative Classification Method. In DaWaK'08: Proceedings of the Tenth International Conference on Data Warehousing and Knowledge Discovery, pages 293–304. Springer, September 2008. Acceptance rate: 33%.

A longer version was published in the DKE journal:

Article Loïc Cerf, Dominique Gay, Nazha Selmaoui-Folcher, Bruno Crémilleux, and Jean-François Boulicaut. Parameter-free Classification in Multi-Class Imbalanced Data Sets. Data & Knowledge Engineering, 87:109–129, September 2013.
If you use fitcare, or a modified version of it, and publish your results, we ask you to cite this reference.

Download

Here is the source code. Please take the time to read the INSTALL file before compiling fitcare and fitcarc. The installation includes man pages for both commands.

Multidupehack

From a fuzzy tensor, multidupehack computes every (closed) noise-tolerant n-set satisfying given constraints.
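As an illustration of noise tolerance, here is a minimal sketch of a membership test, under assumptions that are ours rather than taken from this page: the fuzzy tensor is a dictionary from cells to degrees in [0, 1] (missing cells have degree 0), the noise of a cell is 1 minus its degree, and an n-set is noise-tolerant when, for each dimension i, every element of that dimension accumulates at most epsilon_i noise within its slice of the sub-tensor. The function names are hypothetical, and the sketch does not enumerate patterns or check closedness.

```python
from itertools import product


def slice_noise(tensor, nset, dim, element):
    """Total noise (1 - degree) over the cells of the n-set involving element."""
    parts = [[element] if i == dim else s for i, s in enumerate(nset)]
    return sum(1.0 - tensor.get(cell, 0.0) for cell in product(*parts))


def is_noise_tolerant(tensor, nset, epsilons):
    """True if every element of every dimension stays within its epsilon."""
    return all(
        slice_noise(tensor, nset, i, e) <= eps
        for i, (elements, eps) in enumerate(zip(nset, epsilons))
        for e in elements
    )
```

For example, with degrees 1.0, 0.9, 0.8 and 0.2 for the cells (a, x), (a, y), (b, x) and (b, y), the 2-set ({a}, {x, y}) tolerates epsilons of 0.5 in both dimensions, whereas ({a, b}, {x, y}) does not, since element b accumulates about 1.0 noise.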

Related publication

Multidupehack was presented at ICDE'14:

Article Loïc Cerf and Wagner Meira Jr. Complete Discovery of High-Quality Patterns in Large Numerical Tensors. In ICDE'14: Proceedings of the 30th International Conference on Data Engineering, pages 448–459. IEEE Computer Society, April 2014. Associated poster. Acceptance rate: 20%.
If you use multidupehack, or a modified version of it, and publish your results, we ask you to cite this reference.

Download

Here is the source code. Please take the time to read the INSTALL file before compiling multidupehack and the README file before using it.

This archive contains the datasets and the script to rerun all the experiments (and more) reported in the article Enforcement of Minimal Size and Area Constraints before and while Mining Patterns in Fuzzy Tensors, which will be presented at SAC 2023.

Biceps

Given a real matrix, Biceps lists muscly biclusters. A bicluster is a subset of rows associated with a subset of columns. In every column of a muscly bicluster, the values in the rows of the bicluster are all strictly greater than the values in the rows outside it. Moreover, the rows of the bicluster must be neither a subset nor a superset of the rows of another bicluster of greater or equal quality.
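The column condition of a muscly bicluster can be checked directly. The sketch below, assuming a matrix given as a list of rows of numbers, returns every column in which a given set of rows strictly dominates all the other rows. It is only a naive check with hypothetical function names, not Biceps's subquadratic algorithm, and it ignores the quality-based subset/superset condition, whose quality measure is not described here.

```python
def muscly_columns(matrix, rows):
    """Columns where every value in `rows` exceeds every value outside `rows`."""
    rows = set(rows)
    others = [r for r in range(len(matrix)) if r not in rows]
    columns = []
    for j in range(len(matrix[0])):
        inside = min(matrix[r][j] for r in rows)
        # With no outside rows, the condition holds vacuously.
        outside = max((matrix[r][j] for r in others), default=float("-inf"))
        if inside > outside:
            columns.append(j)
    return columns


def satisfies_column_condition(matrix, rows, columns):
    """True if the bicluster (rows, columns) meets the strict-dominance rule."""
    return set(columns) <= set(muscly_columns(matrix, rows))
```

Ties break the condition because the inequality is strict: in the matrix [[5, 1, 3], [4, 2, 3], [0, 9, 1]], the single row 0 dominates in column 0 only, since it ties with row 1 in column 2.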

Related publication

Biceps was published in the Data Mining and Knowledge Discovery journal:

Article Bernardo Abreu, João Paulo Ataide Martins, and Loïc Cerf. Developing Biceps to Completely Compute in Subquadratic Time a New Generic Type of Bicluster in Dense and Sparse Matrices. Data Mining and Knowledge Discovery, 36(4):1451–1497, July 2022.
If you use Biceps, or a modified version of it, and publish your results, we ask you to cite this reference.

Download

Here is the source code. Please take the time to read the INSTALL file before compiling biceps and the README file before using it.

This archive contains the datasets and the scripts to rerun all the experiments reported in the aforementioned article.

NclusterBox

NclusterBox modifies patterns, which hold in a fuzzy tensor, to maximize their explanatory power, and then selects an ordered subset of the resulting patterns to summarize this tensor.

Related publication

NclusterBox was presented at SAC'23:

Article Victor Henrique Silva Ribeiro and Loïc Cerf. Summarizing Fuzzy Tensors with Sub-Tensors. In SAC'23: Proceedings of the 38th ACM/SIGAPP Symposium On Applied Computing, pages 362–364. ACM Press, March 2023. Associated poster.
If you use NclusterBox, or a modified version of it, and publish your results, we ask you to cite this reference.

Download

Here is the source code. Please take the time to read the INSTALL file before compiling nclusterbox and the README file before using it.

This archive contains the datasets and the scripts to rerun all the experiments reported in the aforementioned article, and this one contains those needed to rerun all the experiments reported in the article, and its supplemental material, submitted to IEEE Transactions on Knowledge and Data Engineering.
