Binh Tran

PhD Student, School of Engineering and Computer Science


Thesis Info

Research Interests: Evolutionary Computation, Feature Selection, Feature Construction, Dimensionality Reduction, Classification, Machine Learning
Thesis Title: Evolutionary Computation for Feature Manipulation in Classification on High-Dimensional Data
Supervisor: Prof Mengjie Zhang and Dr Bing Xue

Refereed Journal Papers

Binh Tran, Bing Xue, Mengjie Zhang, and Su Nguyen. "Investigation on Particle Swarm Optimisation for Feature Selection on High-dimensional Data: Local Search and Selection Bias", Connection Science, vol. 28, no. 3, pp. 270-294, 2016 (accepted 13 April 2016).
@ARTICLE{Tran2016investigation,
AUTHOR = {Tran, Binh and Xue, Bing and Zhang, Mengjie and Nguyen, Su},
TITLE = {Investigation on Particle Swarm Optimisation for Feature Selection on High-dimensional Data: Local Search and Selection Bias},
JOURNAL = {Connection Science},
VOLUME = {28},
NUMBER = {3},
PAGES = {270--294},
YEAR = {2016}
}

Feature selection is an essential step in classification tasks with a large number of features, such as gene expression data. Recent research has shown that particle swarm optimisation (PSO) is a promising approach to feature selection. However, PSO can also become trapped in local optima, especially on gene selection problems with a huge search space. We therefore developed a PSO algorithm (PSO-LSRG) that combines a fast local search with a gbest resetting mechanism to improve the performance of PSO for feature selection. Furthermore, since many existing PSO-based feature selection approaches on gene expression data suffer from feature selection bias, i.e. no unseen test data is used, two sets of experiments on ten gene expression datasets were designed: with and without feature selection bias. Compared with standard PSO, PSO with gbest resetting only, and PSO with local search only, PSO-LSRG achieved a substantial dimensionality reduction and a significant improvement in classification performance in both sets of experiments. PSO-LSRG outperforms the other three algorithms when feature selection bias exists. When there is no feature selection bias, PSO-LSRG selects the smallest number of features in all cases, but the classification performance is slightly worse in a few cases, which may be caused by overfitting. This shows that feature selection bias should be avoided when designing a feature selection algorithm to ensure its generalisation ability on unseen data.
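As an illustration only (not the paper's implementation), the interplay between standard PSO updates and a gbest resetting step can be sketched as follows. The inertia/acceleration constants, the toy fitness function, and the choice to re-randomise gbest on stagnation are all placeholder assumptions:

```python
import random

def pso_feature_selection(num_features, fitness, swarm_size=10,
                          iterations=50, stagnation_limit=5, seed=0):
    """Continuous PSO over [0, 1]^d; a position value > 0.5 means the
    corresponding feature is selected.  When gbest stagnates for
    `stagnation_limit` iterations it is re-randomised (gbest resetting)."""
    rng = random.Random(seed)
    mask = lambda p: [x > 0.5 for x in p]
    pos = [[rng.random() for _ in range(num_features)] for _ in range(swarm_size)]
    vel = [[0.0] * num_features for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(mask(p)) for p in pos]
    g = max(range(swarm_size), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    stagnant = 0
    for _ in range(iterations):
        improved = False
        for i in range(swarm_size):
            for d in range(num_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(mask(pos[i]))
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
                    improved = True
        stagnant = 0 if improved else stagnant + 1
        if stagnant >= stagnation_limit:
            gbest = [rng.random() for _ in range(num_features)]  # reset gbest
            gbest_fit = fitness(mask(gbest))
            stagnant = 0
    return mask(gbest), gbest_fit

# Hypothetical fitness: reward hitting the "relevant" features, penalise size.
RELEVANT = {0, 2, 4}
def toy_fitness(selected):
    hits = sum(1 for i in RELEVANT if selected[i])
    return hits - 0.1 * sum(selected)

subset, score = pso_feature_selection(10, toy_fitness)
```

In a real wrapper approach the fitness would evaluate a classifier on the selected subset; the resetting step only needs to detect that gbest has stopped improving and move it elsewhere in the search space.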

Binh Tran, Bing Xue and Mengjie Zhang. "Genetic Programming for Feature Construction and Selection in Classification on High-dimensional Data", Memetic Computing, vol. 8, no. 1, pp. 3-15 (accepted 4 December 2015, DOI: 10.1007/s12293-015-0173-y).
@ARTICLE{Tran2015genetic,
AUTHOR = {Tran, Binh and Xue, Bing and Zhang, Mengjie},
TITLE = {Genetic programming for feature construction and selection in classification on high-dimensional data},
JOURNAL = {Memetic Computing},
VOLUME = {8},
NUMBER = {1},
PAGES = {3--15},
YEAR = {2015}
}

Classification on high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study to investigate the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. The cases where overfitting occurred are analysed via the distribution of features. Further analysis is also performed to show why the constructed features can achieve promising classification performance.
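A minimal sketch of the representation idea: a tree-based GP individual encodes a constructed feature, and the original features appearing in its leaves are the ones implicitly selected. The operator set and tuple-based tree encoding below are illustrative assumptions, not the paper's setup:

```python
import operator

# Leaves are ('x', i) for original feature i; internal nodes are
# (op_symbol, left_subtree, right_subtree).
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def evaluate(tree, sample):
    """Evaluate a GP expression tree on one sample (list of feature values)."""
    if tree[0] == 'x':
        return sample[tree[1]]
    op, left, right = tree
    return OPS[op](evaluate(left, sample), evaluate(right, sample))

# Constructed feature f = x0 * x1 - x2; its leaves {x0, x1, x2} are the
# implicitly selected original features.
feature = ('-', ('*', ('x', 0), ('x', 1)), ('x', 2))
value = evaluate(feature, [2.0, 3.0, 1.0])  # 2*3 - 1 = 5.0
```

Evaluating such a tree on every sample yields one new high-level feature column, which can be used alone or together with the selected leaf features.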

Refereed Conference Papers

Binh Tran, Bing Xue and Mengjie Zhang. "A PSO Based Hybrid Feature Selection Algorithm for High-Dimensional Classification". Proceedings of the 2016 IEEE World Congress on Computational Intelligence / IEEE Congress on Evolutionary Computation (WCCI/CEC 2016). pp. 3801-3808.
@INPROCEEDINGS{Tran2016pso,
AUTHOR = {Tran, Binh and Xue, Bing and Zhang, Mengjie},
TITLE = {A PSO Based Hybrid Feature Selection Algorithm for High-Dimensional Classification},
BOOKTITLE = {Proceedings of 2016 IEEE World Congress on Computational Intelligence/ IEEE Congress on Evolutionary Computation (WCCI 2016 /CEC2016)},
YEAR = {2016},
PAGES = {3801--3808},
PUBLISHER = {IEEE}
}

Recent research has shown that Particle Swarm Optimisation (PSO) is a promising approach to feature selection. However, applying it on high-dimensional data with thousands to tens of thousands of features is still challenging because of the large search space. While filter approaches are time efficient and scalable for high-dimensional data, they usually obtain lower classification accuracy than wrapper approaches. On the other hand, wrapper methods require a longer running time than filter methods due to the learning algorithm involved in fitness evaluation. This paper proposes a new strategy of combining filter and wrapper approaches in a single evolutionary process in order to achieve smaller feature subsets with better classification performance in a shorter time. A new local search heuristic using symmetric uncertainty is proposed to refine the solutions found by PSO, and a new hybrid fitness function is used to better evaluate candidate solutions. The proposed method is examined and compared with three recent PSO based methods on eight high-dimensional problems of varying difficulty. The results show that the new hybrid PSO is more effective and efficient than the other methods.
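Symmetric uncertainty itself has a standard definition, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), a normalised mutual information in [0, 1]. A minimal self-contained computation for discrete features (how the paper uses the measure inside its local search is not reproduced here):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 1 means fully dependent,
    0 means independent (or both variables constant)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))  # I(X;Y) = H(X)+H(Y)-H(X,Y)
    return 2 * mutual_info / (hx + hy)

# A feature identical to the class labels scores 1; an independent one scores 0.
dependent = symmetric_uncertainty([0, 0, 1, 1], [0, 0, 1, 1])
independent = symmetric_uncertainty([0, 0, 1, 1], [0, 1, 0, 1])
```

The symmetry and the [0, 1] normalisation make SU convenient for ranking features against the class label without bias toward features with many values.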

Binh Tran, Bing Xue and Mengjie Zhang. "Bare-Bone Particle Swarm Optimisation for Simultaneously Discretising and Selecting Features for High-Dimensional Classification". Proceedings of the 19th European Conference on the Applications of Evolutionary Computation (EvoApplications 2016, EvoIASP 2016). Lecture Notes in Computer Science. pp. 701-718.
@INPROCEEDINGS{Tran2016bare,
AUTHOR = {Tran, Binh and Xue, Bing and Zhang, Mengjie},
TITLE = {Bare-Bone Particle Swarm Optimisation for Simultaneously Discretising and Selecting Features for High-Dimensional Classification},
BOOKTITLE = {Applications of Evolutionary Computation: 19th European Conference, EvoApplications 2016, Porto, Portugal, Proceedings, Part I},
YEAR = {2016},
PAGES = {701--718},
PUBLISHER = {Springer}
}

Feature selection and discretisation have shown their effectiveness for data preprocessing, especially on high-dimensional data with many irrelevant features. While feature selection selects only relevant features, feature discretisation finds a discrete representation of the data that retains enough information while ignoring minor fluctuations. These techniques are usually applied in two stages, discretisation followed by selection, since many feature selection methods work only on discrete features. Most commonly used discretisation methods are univariate, discretising each feature independently; as a result, the feature selection stage may not work effectively because information about feature interactions is not considered during discretisation. In this study, we propose a new method called PSO-DFS using bare-bone particle swarm optimisation (BBPSO) for discretisation and feature selection in a single stage. The results on ten high-dimensional datasets show that PSO-DFS obtains a substantial dimensionality reduction for all datasets. The classification performance is significantly improved or at least maintained on nine out of ten datasets by using the transformed "small" data obtained from PSO-DFS. Compared to the two-stage approach which uses PSO for feature selection on the discretised data, PSO-DFS achieves better performance on six datasets, and similar performance on three datasets with a much smaller number of features selected.
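One way to make the single-stage idea concrete is a cut-point encoding, sketched below. This is an assumption for illustration, not the encoding from the paper: each feature carries one candidate cut point (the vector a particle would hold), and a cut point falling outside the feature's observed range is interpreted as "drop this feature", so discretisation and selection are decided by the same vector:

```python
def discretise_and_select(data, cut_points, lo, hi):
    """Hypothetical joint encoding: feature d is kept only if its cut point
    lies inside the observed range [lo[d], hi[d]]; kept features are
    binarised around their cut point."""
    keep = [d for d in range(len(cut_points)) if lo[d] <= cut_points[d] <= hi[d]]
    return [[int(row[d] > cut_points[d]) for d in keep] for row in data]

rows = [[0.2, 5.0], [0.8, 1.0]]
# Feature 1's cut point (10.0) lies outside its range [0.0, 6.0] -> dropped.
transformed = discretise_and_select(rows, [0.5, 10.0], [0.0, 0.0], [1.0, 6.0])
```

Under such an encoding, optimising the cut-point vector simultaneously chooses which features survive and where their discretisation thresholds fall.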

Binh Tran, Bing Xue and Mengjie Zhang. "Improved PSO for Feature Selection on High-Dimensional Datasets". Proceedings of the 10th International Conference on Simulated Evolution and Learning (SEAL 2014). Lecture Notes in Computer Science. Vol. 8886. Dunedin, New Zealand. December 15-18, 2014. pp. 503-515.
@INPROCEEDINGS{Tran2014improved,
AUTHOR = {Tran, Binh and Xue, Bing and Zhang, Mengjie},
TITLE = {Improved PSO for Feature Selection on High-Dimensional Datasets},
BOOKTITLE = {Proceedings of the 10th International Conference on Simulated Evolution and Learning},
YEAR = {2014},
PAGES = {503--515},
PUBLISHER = {Springer}
}

Classification on high-dimensional (i.e. thousands of dimensions) data typically requires feature selection (FS) as a pre-processing step to reduce the dimensionality. However, FS is a challenging task even on datasets with hundreds of features. This paper proposes a new particle swarm optimisation (PSO) based FS approach to classification problems with thousands or tens of thousands of features. The proposed algorithm is examined and compared with three other PSO based methods on five high-dimensional problems of varying difficulty. The results show that the proposed algorithm can successfully select a much smaller number of features and significantly increase the classification accuracy over using all features. The proposed algorithm outperforms the other three PSO methods in terms of both the classification performance and the number of features. Meanwhile, the proposed algorithm is computationally more efficient than the other three PSO methods because it selects a smaller number of features and employs a new fitness evaluation strategy.

Binh Tran, Bing Xue and Mengjie Zhang. "Overview of Particle Swarm Optimisation for Feature Selection in Classification". Proceedings of the 10th International Conference on Simulated Evolution and Learning (SEAL 2014). Lecture Notes in Computer Science. Vol. 8886. Dunedin, New Zealand. December 15-18, 2014. pp. 605-617.
@INPROCEEDINGS{Tran2014overview,
AUTHOR = {Tran, Binh and Xue, Bing and Zhang, Mengjie},
TITLE = {Overview of Particle Swarm Optimisation for Feature Selection in Classification},
BOOKTITLE = {Proceedings of the 10th International Conference on Simulated Evolution and Learning},
YEAR = {2014},
PAGES = {605--617},
PUBLISHER = {Springer}
}

Feature selection is a process of selecting a subset of relevant features from a large number of original features to achieve similar or better classification performance and improve computational efficiency. As an important data pre-processing technique, research into feature selection has been carried out over the past four decades. Determining an optimal feature subset is a complicated problem. Due to the limitations of conventional methods, evolutionary computation (EC) has been proposed to solve feature selection problems. Particle swarm optimisation (PSO) is an EC technique which has recently attracted much interest from researchers in the field. This paper presents a review of PSO for feature selection in classification. After describing the background of feature selection and PSO, recent work involving PSO for feature selection is reviewed. Current issues and challenges are also presented for future research.