Mass Estimation

This site contains the source code for mass estimation and its applications to various data mining tasks.

Mass estimation is a core data modelling method that models data distributions in terms of mass distribution, rather than density distribution, to solve various data mining problems. Like density estimation, mass estimation has been applied to many data mining tasks, e.g., anomaly detection, classification, clustering, information retrieval and regression. One advantage of mass-based algorithms is that they can run orders of magnitude faster than their density-based counterparts. Papers on mass estimation are listed in the references below.
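To make the mass/density distinction concrete, here is a minimal Java sketch. It is our own illustration, not code from this package. It assumes the informal definition of the mass of a region as the number of points that fall inside it, and compares this with a histogram-style density, which additionally divides by the sample size and the interval width. The class and method names are hypothetical.

```java
/**
 * Illustrative sketch (not from the Mass Estimation package): the mass of an
 * interval is the number of sample points inside it, whereas a histogram-style
 * density also divides by the sample size and the interval width.
 */
class MassVersusDensity {

    /** Mass of the half-open interval [lo, hi): how many points lie inside it. */
    static int mass(double[] data, double lo, double hi) {
        int count = 0;
        for (double x : data) {
            if (x >= lo && x < hi) {
                count++;
            }
        }
        return count;
    }

    /** Histogram-style density over [lo, hi): mass / (n * width). */
    static double density(double[] data, double lo, double hi) {
        return mass(data, lo, hi) / (data.length * (hi - lo));
    }

    public static void main(String[] args) {
        double[] data = {0.1, 0.2, 0.3, 0.4, 1.0, 3.0, 5.0, 7.0};
        // Both intervals contain 4 points, so their masses are equal ...
        System.out.println(mass(data, 0.0, 0.5) + " vs " + mass(data, 0.5, 8.0));
        // ... but the wide interval has a much lower density.
        System.out.println(density(data, 0.0, 0.5) + " vs " + density(data, 0.5, 8.0));
    }
}
```

In this toy sample both intervals hold 4 points (equal mass), while their densities are 1.0 and about 0.067, because density depends on the interval width and mass does not.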

The table below lists the software currently available for each data mining task. The software is written in Java and integrated into WEKA.

Task | DEMass | Mass |
---|---|---|
Anomaly Detection | DEMass-LOF (JAR), LiNearN (JAR) | iForest, SCiForest, ReMass-iForest, HS-Tree |
Classification | DEMass-Bayes (JAR) | MassBayes (JAR) |
Clustering | DEMass-DBSCAN (JAR), LiNearN-Cluster (JAR) | MassTER (JAR) |
Information Retrieval | | ReFeat, ReMass-ReFeat |

All software can be run within the WEKA GUI unless stated otherwise.

Alternatively, the software can be run from the command line. The following MassTER example uses a data set with 4 attributes (3 numeric attributes plus the class attribute) and constructs 1,000 trees with a maximum tree height of 6 (i.e., 3 attributes × 2 levels), each built from a random subset of 256 instances. It assumes that all of the relevant JAR files are in the current directory. Note: '-c last' specifies the class attribute.

`java -classpath "*" weka.clusterers.MassTER -c last -t data_set.arff -A 3 -D -E 5 -H 2 -N 1000 -W 256`
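The tree-height arithmetic and the per-tree subsampling described in the MassTER example can be sketched as follows. This is a hypothetical illustration, not MassTER's implementation: drawing each subset without replacement is our assumption, the data set size of 5,000 is made up, and the class and method names are our own.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch of the ensemble set-up in the MassTER example: 1,000
 * trees, each grown from a random subset of 256 instances, with maximum
 * height = numeric attributes x levels per attribute. Drawing the subset
 * without replacement is an assumption, not MassTER's documented behaviour.
 */
class EnsembleSetupSketch {

    /** Maximum tree height, e.g. 3 attributes x 2 levels = 6. */
    static int maxTreeHeight(int numAttributes, int levelsPerAttribute) {
        return numAttributes * levelsPerAttribute;
    }

    /** Returns k distinct indices from 0..n-1 using a partial Fisher-Yates shuffle. */
    static int[] subsampleIndices(int n, int k, Random rng) {
        if (k > n) {
            throw new IllegalArgumentException("subsample larger than data set");
        }
        int[] pool = new int[n];
        for (int i = 0; i < n; i++) {
            pool[i] = i;
        }
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i); // pick uniformly from the not-yet-chosen tail
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return Arrays.copyOf(pool, k);
    }

    public static void main(String[] args) {
        int numTrees = 1000;
        int subsampleSize = 256;
        int dataSetSize = 5000; // hypothetical data set size
        Random rng = new Random(42);
        System.out.println("maximum tree height = " + maxTreeHeight(3, 2));
        for (int t = 0; t < numTrees; t++) {
            int[] sample = subsampleIndices(dataSetSize, subsampleSize, rng);
            // A real implementation would grow tree t from these instances here.
        }
    }
}
```

Each tree sees only a small, fixed-size subset, so the cost of building one tree does not grow with the size of the full data set.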

The latest method for (level-1) multi-dimensional mass estimation is now available (see the reference [Half-Space Mass] below).
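As a rough illustration of level-1 multi-dimensional mass, the following Java sketch follows our reading of the half-space mass idea. Each of t random half-spaces is defined by a random direction and a split point drawn from the middle of a random subsample's projected range. The mass of a query point is the average fraction of the subsample that falls on the query's side of each split. The class name, the handling of the λ parameter (`lambda`) and the example settings are our assumptions; the [Half-Space Mass] paper is the authoritative description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Sketch of half-space mass (our reading of the idea, not the authors' code).
 * Per half-space: random unit direction, random subsample projected onto it,
 * split drawn uniformly from the middle lambda fraction of the projected range.
 */
class HalfSpaceMassSketch {
    private final double[][] directions;
    private final double[] splits;
    private final double[] fractionBelow;

    HalfSpaceMassSketch(double[][] data, int numHalfSpaces, int subsampleSize, double lambda, long seed) {
        Random rng = new Random(seed);
        int psi = Math.min(subsampleSize, data.length);
        directions = new double[numHalfSpaces][];
        splits = new double[numHalfSpaces];
        fractionBelow = new double[numHalfSpaces];
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            indices.add(i);
        }
        for (int t = 0; t < numHalfSpaces; t++) {
            double[] dir = randomUnitVector(data[0].length, rng);
            Collections.shuffle(indices, rng); // subsample without replacement
            double[] proj = new double[psi];
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < psi; i++) {
                proj[i] = dot(data[indices.get(i)], dir);
                min = Math.min(min, proj[i]);
                max = Math.max(max, proj[i]);
            }
            double mid = (min + max) / 2.0;
            double halfWidth = lambda * (max - min) / 2.0;
            double split = mid - halfWidth + 2.0 * halfWidth * rng.nextDouble();
            int below = 0;
            for (double p : proj) {
                if (p < split) {
                    below++;
                }
            }
            directions[t] = dir;
            splits[t] = split;
            fractionBelow[t] = below / (double) psi;
        }
    }

    /** Average fraction of each subsample lying on the same side of the split as x. */
    double mass(double[] x) {
        double sum = 0.0;
        for (int t = 0; t < splits.length; t++) {
            boolean isBelow = dot(x, directions[t]) < splits[t];
            sum += isBelow ? fractionBelow[t] : 1.0 - fractionBelow[t];
        }
        return sum / splits.length;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    /** Normalised Gaussian vector, i.e. a uniformly random direction. */
    private static double[] randomUnitVector(int dim, Random rng) {
        double[] v = new double[dim];
        double norm = 0.0;
        for (int i = 0; i < dim; i++) {
            v[i] = rng.nextGaussian();
            norm += v[i] * v[i];
        }
        norm = Math.sqrt(norm);
        for (int i = 0; i < dim; i++) {
            v[i] /= norm;
        }
        return v;
    }

    public static void main(String[] args) {
        double[][] grid = new double[100][]; // 10 x 10 grid on [0, 1]^2
        for (int i = 0; i < 100; i++) {
            grid[i] = new double[] {(i / 10) / 9.0, (i % 10) / 9.0};
        }
        HalfSpaceMassSketch hm = new HalfSpaceMassSketch(grid, 500, 64, 1.0, 7L);
        System.out.println("centre:  " + hm.mass(new double[] {0.5, 0.5}));
        System.out.println("outlier: " + hm.mass(new double[] {100.0, 100.0}));
    }
}
```

Points near the centre of the data tend to sit on the larger side of a random split and so receive higher mass than far-away points, which is what makes such a measure usable as a data depth or anomaly score.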

The first (one-dimensional) mass estimation method is implemented in MATLAB.
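The following Java sketch shows level-1 one-dimensional mass estimation as we understand the original formulation. For a sorted sample x_1 ≤ … ≤ x_n, each of the n − 1 binary splits s_i between consecutive points is weighted by p(s_i) = (x_{i+1} − x_i)/(x_n − x_1), and mass(x) = Σ_i m_i(x) p(s_i), where m_i(x) is the number of sample points on the same side of s_i as x. Placing each split at the midpoint of its gap is our assumption, the class name is hypothetical, and this is not a port of the MATLAB code.

```java
import java.util.Arrays;

/**
 * Sketch of level-1 one-dimensional mass estimation (our reading of the
 * formulation, not a port of the MATLAB code):
 *   mass(x) = sum_{i=1}^{n-1} m_i(x) * p(s_i),  p(s_i) = (x_{i+1} - x_i) / (x_n - x_1),
 * where m_i(x) counts the sample points on x's side of split s_i.
 * Splits are placed at gap midpoints (an assumption).
 */
class OneDimMassSketch {

    static double massAt(double[] sample, double x) {
        int n = sample.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two points");
        }
        double[] xs = sample.clone();
        Arrays.sort(xs);
        double range = xs[n - 1] - xs[0];
        if (range == 0.0) {
            throw new IllegalArgumentException("all points are identical");
        }
        double mass = 0.0;
        // 0-based loop: split i lies between xs[i - 1] and xs[i], with i points on its left.
        for (int i = 1; i < n; i++) {
            double split = (xs[i - 1] + xs[i]) / 2.0;
            double prob = (xs[i] - xs[i - 1]) / range;
            int sideMass = (x <= split) ? i : n - i;
            mass += sideMass * prob;
        }
        return mass;
    }

    public static void main(String[] args) {
        double[] sample = {1, 2, 3, 4, 5};
        for (double x : sample) {
            System.out.println(x + " -> " + massAt(sample, x));
        }
    }
}
```

For the sample {1, 2, 3, 4, 5} this gives masses 2.5, 3.25, 3.5, 3.25 and 2.5, peaking at the median and falling off towards the extremes.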

[DEMass-Bayes, DEMass-DBSCAN, DEMass-LOF] Ting, K. M.; Washio, T.; Wells, J. R.; Liu, F. T.; Aryal, S. (2013). "DEMass: a new density estimator for big data". Knowledge and Information Systems 35 (3): 1–32.

[LiNearN, LiNearN-Cluster] Wells, J. R.; Ting, K. M.; Washio, T. (2014). "LiNearN: A New Approach to Nearest Neighbour Density Estimator". Pattern Recognition 47: 2702–2720.

[ReMass-iForest, ReMass-ReFeat] Aryal, S.; Ting, K. M.; Wells, J. R.; Washio, T. (2014). "Improving iForest with relative mass". Advances in Knowledge Discovery and Data Mining: 510–521.