Ensembling Methods for Selecting a Splitting Attribute in Decision Trees Learning Algorithms

Abstract

Learning a decision tree involves choosing, at each node, an attribute on which to split the dataset, and the efficiency of the resulting tree depends on this choice. Classical decision tree learning algorithms such as ID3 and CART enumerate all the attributes of the original sample, which is time-consuming because the splitting criterion must be computed over all objects for every attribute. It was previously shown that using evolutionary algorithms to optimize thresholds in decision tree learning can significantly speed up training without loss of classification quality. Studies comparing various attribute selection methods have also demonstrated the high efficiency of the Separation Measure method. Since a group of methods is known to be able to outperform any individual method, this article compares the effectiveness of attribute selection methods with their ensemble. Because a task can have hundreds of attributes, classical voting schemes are not applicable, so a voting algorithm for attribute selection was developed and implemented. The method was evaluated on several classification problems; classification accuracy, averaged over all tasks, is used as the measure of effectiveness.

Keywords: Decision trees, attribute selection methods, ensemble

Introduction

At present, artificial intelligence methods are actively used in all branches of human activity. Many applied tasks involve amounts of data too large for a person to process manually in a reasonable time. These conditions create a need both to improve existing technologies and to develop new ones. One line of development is the design of additional algorithms for existing machine learning methods in order to increase their efficiency (Mitrofanov, 2020; Polin et al., 2020). Decision trees are among the most popular and effective machine learning methods (Breiman et al., 1984). Their popularity stems from the relatively easy interpretability of the results, even for specialists in fields other than machine learning (Polin et al., 2020; Vinogradov et al., 2009), and this popularity in turn gives decision trees a wide scope for further development. There are many different algorithms for learning decision trees, and decision trees serve as building blocks of larger algorithms such as random forest and gradient boosting. They are applied to classification, clustering, forecasting and other tasks (Polin et al., 2020).

Decision trees are trained by sequentially splitting the original sample at each new node until the nodes are declared leaves (Mitrofanov & Semenkin, 2019). The learning process has several areas for improvement, one of which is the selection of the attribute on which the sample is split at a given node (Mitrofanov & Semenkin, 2021).
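For concreteness, the following is a minimal sketch (an illustration under simplifying assumptions, not the implementation used in this study) of where attribute selection sits inside a CART-style learning loop; the stopping rule and threshold choice are deliberately simplified, and select_attribute stands for whichever selection method is plugged in.

```python
import numpy as np

def select_attribute(X, y):
    """Placeholder for the attribute selection step studied in this paper.
    Illustration only: here the attribute with the largest variance is chosen."""
    return int(np.argmax(X.var(axis=0)))

def build_tree(X, y, depth=0, max_depth=5, min_samples=5):
    """Grow a CART-style tree by splitting the sample at each new node.
    Class labels y are assumed to be non-negative integers."""
    # Declare a leaf when the node is pure, too small, or too deep.
    if len(np.unique(y)) == 1 or len(y) < min_samples or depth >= max_depth:
        return {"leaf": True, "class": int(np.bincount(y).argmax())}

    attr = select_attribute(X, y)               # the step this paper focuses on
    threshold = float(np.median(X[:, attr]))    # simplified threshold choice
    left = X[:, attr] <= threshold
    if left.all() or not left.any():            # the split separates nothing
        return {"leaf": True, "class": int(np.bincount(y).argmax())}

    return {"leaf": False, "attr": attr, "threshold": threshold,
            "left": build_tree(X[left], y[left], depth + 1, max_depth, min_samples),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth, min_samples)}
```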

The article compares the effectiveness of several methods for selecting the splitting attribute with that of an ensemble of such methods on practical classification problems. The CART decision tree learning algorithm (Breiman et al., 1984) was taken as a basis.

Problem Statement

The classification problem is solved using a decision tree. The main drawback of decision tree learning algorithms is the cost of choosing the splitting attribute of the original sample: classical learning algorithms use exhaustive enumeration of the attributes, which is a resource-intensive procedure. The attribute selection procedure therefore needs to be improved.

Research Questions

The presented work explores the following research questions:

  • The need for a procedure that ensembles attribute selection methods.
  • The efficiency of such an ensembling procedure compared with the individual attribute selection methods.

Purpose of the Study

The purpose of this work is to develop and study a procedure for ensembling attribute selection methods. The article compares the effectiveness of this procedure with that of the attribute selection methods considered previously.

Research Methods

Previously, the following attribute selection methods were considered (Hall, 1999; Kira & Rendell, 1992; Kramer, 1975; Singh et al., 2014; Volkov et al., 2019; Yu & Liu, 2003); an illustrative sketch of two of these scores is given after the list:

  • Selecting a splitting attribute by variance.
  • Selecting a splitting attribute by the mean absolute difference.
  • Selecting a splitting attribute based on the variance ratio.
  • ReliefF.
  • Fisher scoring.
  • Chi-squared score.
  • Correlation-based attribute selection.
  • Fast correlation-based filter.
  • Separation measure.
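For illustration only, the sketch below shows how two of the simpler filter scores from this list (variance and Fisher score) can be computed; the function names and implementation details are assumptions made for this example rather than code from the cited works.

```python
import numpy as np

def variance_score(X):
    """Variance-based selection: score each attribute by its sample variance."""
    return X.var(axis=0)

def fisher_score(X, y):
    """Fisher score: per-attribute ratio of between-class scatter to within-class scatter."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)   # small constant guards against division by zero
```

With either score, the splitting attribute is the one with the largest value, e.g. int(np.argmax(fisher_score(X, y))).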

The effectiveness of the presented attribute selection methods was studied on the classification problems listed in Table 1 (Machine Learning Repository, 2022):

Table 1 - Classification tasks

It’s well known that individual methods, even the most effective ones, can perform worse than a group of methods. Therefore, it is proposed to develop a procedure for ensembling attribute selection methods. Classic voting methods won't work for attribute selection, as there can be hundreds or even thousands of attributes. A new approach to the collective choice of an attribute in decision tree training was therefore developed and implemented; it computes the overall importance of an attribute from several filter methods.

Each individual attribute selection method calculates an "importance" score for every attribute and selects the attribute with the highest score. The collective choice method first normalizes the "importance" coefficients produced by each of the methods used:

norm_coef = (coef - min(coef)) / max(coef),

where coef is the attribute "importance" coefficient calculated by one of the methods, and norm_coef is the normalized "importance" coefficient. For methods that rank attributes in descending order of the coefficients, the coefficients are reversed:

new_coef = 1 - norm_coef,

where new_coef is the reversed coefficient, so that all methods rank attributes in ascending order. The resulting sets of coefficients are then summed over the attributes, and the attribute with the largest sum is selected.
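A minimal sketch of this collective choice is given below, assuming each base method is represented by a scoring function that returns one coefficient per attribute, together with a flag indicating whether it ranks attributes in ascending order; the weights argument anticipates the weighted variant examined in the Findings section. The names and signatures are illustrative, not the authors' code.

```python
import numpy as np

def ensemble_select(X, y, methods, weights=None):
    """Collective attribute choice: normalize each method's coefficients,
    reverse descending-order methods, sum per attribute, take the argmax.

    methods : list of (score_fn, ascending) pairs, where score_fn(X, y) returns
              one "importance" coefficient per attribute and ascending=True
              means larger coefficients indicate better attributes.
    weights : optional per-method weights (e.g. each method's accuracy).
    """
    if weights is None:
        weights = np.ones(len(methods))

    total = np.zeros(X.shape[1])
    for (score_fn, ascending), w in zip(methods, weights):
        coef = np.asarray(score_fn(X, y), dtype=float)
        # The paper's normalization (coef - min(coef)) / max(coef);
        # a small constant guards against division by zero.
        norm_coef = (coef - coef.min()) / (coef.max() + 1e-12)
        if not ascending:
            # new_coef = 1 - norm_coef: reverse methods that rank in descending order
            norm_coef = 1.0 - norm_coef
        total += w * norm_coef

    return int(np.argmax(total))   # attribute with the largest summed score
```

For example, reusing the filter scores sketched earlier, the call could be ensemble_select(X, y, [(lambda X, y: X.var(axis=0), True), (fisher_score, True)]).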

To study the effectiveness of the developed ensemble procedure, we will consider different groups of attribute selection methods:

  • All considered methods.
  • 3 best methods.
  • 3 average methods.
  • 3 worst methods.

We will compare the results obtained with the efficiency of the best, average and worst individual method on each of the tasks. For this purpose, the effectiveness of the attribute selection methods on each task is presented in Table 2:

Table 2 - Effectiveness of attribute selection methods

Findings

To study the developed approach, for each task we choose, based on their effectiveness in Table 2, the 3 best, 3 average, 3 worst and the best-average-worst (BAW) attribute selection methods for ensembling.

Table 3 compares the ensemble method with different groups of methods and individual attribute selection methods.

Table 3 - Ensembling efficiency

It can be seen from the results that none of the ensembles performs better than the best attribute selection method. This is most likely because the other methods in the ensemble make a negative contribution. To reduce the negative contributions and increase the positive ones, each attribute selection method is given a weight in accordance with its effectiveness in Table 2.
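In terms of the ensemble_select sketch given earlier, this weighting only changes the weights argument; the following illustration uses placeholder accuracies and an arbitrary synthetic dataset rather than the actual values and tasks from Table 2.

```python
import numpy as np

# Arbitrary synthetic data standing in for one of the classification tasks.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 2] > 0).astype(int)

# Placeholder per-method accuracies standing in for the values of Table 2.
accuracies = [0.91, 0.87, 0.82]

# Placeholder base methods, each given as (score_fn, ascending) as in ensemble_select;
# fisher_score is the function from the earlier sketch.
chosen_methods = [
    (lambda X, y: X.var(axis=0), True),                             # variance
    (lambda X, y: np.abs(X - X.mean(axis=0)).mean(axis=0), True),   # mean absolute difference
    (fisher_score, True),
]

attr = ensemble_select(X, y, chosen_methods, weights=accuracies)
```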

Table 4 - Ensemble efficiency with weighting coefficients

Ensembling with weights also doesn’t offer any performance gains over the best attribute selection method (see Table 4).

Conclusion

Let us average the efficiency of attribute selection methods and method ensembles over all classification problems for comparison (see Table 5).

Table 5 - Results averaged over 12 tasks

Based on the results obtained, we can conclude that the most efficient method for selecting an attribute in the decision tree learning algorithm is the Separation Measure method; none of the considered ensembles outperformed it.

Acknowledgments

This work was supported by the Ministry of Science and Higher Education of the Russian Federation (Grant № 075-15-2022-1121).

References

  • Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey.

  • Hall, M. A. (1999). Correlation-based Feature Selection for Machine Learning. Hamilton.

  • Kira, K., & Rendell, L. A. (1992). A practical approach to feature selection. In Machine learning proceedings 1992 (pp. 249-256). Morgan Kaufmann.

  • Kramer, G. (1975). Mathematical methods of statistics. Mir.

  • Machine Learning Repository. (2022). [Electronic resource]. Retrieved on October 05, 2022, from https://archive.ics.uci.edu/ml/index.php

  • Mitrofanov, S. A., & Semenkin, E. S. (2019). Differential evolution in the decision tree learning algorithm. Siberian Journal of Science and Technology, 20(3), 312-319.

  • Mitrofanov, S. A., & Semenkin, E. S. (2021). Tree retraining in the decision tree learning algorithm. IOP Conference Series: Materials Science and Engineering, 1047(1), 012082.

  • Mitrofanov, S. A. (2020). Application of genetic programming algorithm for designing decision trees and their ensembles. IOP Conference Series: Materials Science and Engineering, 734(1), 012098.

  • Polin, Y. A., Zudilova, T. V., Ananchenko, I. V., & Voytiuk, T. E. (2020). Decision Trees in Classification Problems: Application Features and Methods for Improving the Quality of Classification. Modern High Technologies, 9, 59-63.

  • Singh, B., Kushwaha, N., & Vyas, O. (2014). A Feature Subset Selection Technique for High Dimensional Data Using Symmetric Uncertainty. Journal of Data Analysis and Information Processing, 2, 95-105.

  • Vinogradov, A. N., Lebedev, A. N., & Tereshonok, M. V. (2009). Using Decision Trees in Problems of Classification and Identification of Radio Signals. T-Comm - Telecommunications and Transport, (S-DSPA), 28-30.

  • Volkov, A., Yatskov, N., & Grinev, V. (2019). Selecting informative features of human gene exons. Journal of the Belarusian State University. Maths. Computer Science, 1, 77-89.

  • Yu, L., & Liu, H. (2003). Feature selection for high-dimensional data: a fast correlation-based filter solution. Proceedings of the Twentieth International Conference (ICML) (Washington), 856-863.
