Can We Really Differentiate Between Generations? A Data Mining Approach

Ağaoğlu, Mustafa; Yurtkoru, E. Serra; Börü, Deniz Elber

doi:https://doi.org/10.15405/epsbs.2021.02.15


Abstract

Both academicians and practitioners frequently discuss generational differences in the workforce, yet research findings are contradictory, some supporting and some refuting these arguments. The aim of this study is to test whether Gen X and Gen Y employees can be differentiated based on their attitudes toward work and their work environment expectations. The study was conducted on 1633 employees (45% Gen X and 55% Gen Y) using a multi-item questionnaire. A data mining approach was used to analyze the data, and five classification techniques were applied: three decision tree algorithms (CART, CHAID, and C5.0), support vector machines (SVM), and artificial neural networks (ANN). The C5.0 algorithm outperformed the other classifiers in specificity, precision, and accuracy; ANN performed better than the other techniques only in recall. The results revealed that the most important variables differentiating Gen X and Gen Y were “using technology for all daily chores”, “ease of technology use”, “entrepreneurial intention”, “changing jobs frequently”, and “having friends who are from different cultures and religions”. The accuracy rate was just above 70%, which is relatively good for social science research based on survey data. Still, it is not high enough to distinguish generations from a data mining perspective.

Keywords: Attitudes toward work, classification, data mining, generations, performance measure, work life expectations

Introduction

While work life is still struggling to adapt to Gen Y employees and has yet to embrace Gen Z employees, a question remains to be answered: can we really differentiate between generations?

Since the famous Baby Boomers, the business literature has been interested in generations, and this interest has been amplified by the technology-driven Gen Ys. Many studies investigate the differences between generations in terms of various organizational issues. With seven different generations alive today and four, in some cases even five, of them working together, this interest is quite understandable. However, not all studies find significant differences between generations.

This study aims to differentiate between Gen X and Gen Y employees, the largest generations in the workforce, based on their attitudes toward work life and their expectations about the work environment, using data mining techniques.

Generations

As early as the 19th century, philosophers and scientists such as Comte and Quetelet began to discuss the link between date of birth and development in society. In the 1920s, the sociologist Karl Mannheim argued that sharing the same experiences provides a frame of reference, a distinct consciousness that can influence people’s lives (Alwin & Mc Cammon, 2003; Joshi et al., 2011).

Based on this theory, generations are cohort groups who have similar values, ideas, and attitudes shaped by the common experiences of living in the same timeframe. However, the idea of distinct generations is quite complex: even though there is generally a consensus on approximate timelines and generation names, the existence and impact of generations are not easily documented. Although sources disagree on the specific dates separating the generations, the most frequently used timelines and names are as follows: the Greatest Generation (born 1901-1924), who experienced profound economic and social turmoil, World War I, and eventually World War II; the Silent Generation (born 1925-1945), who experienced the Great Depression and lived through the world wars; the Baby Boomers (born 1946-1964), who lived through the Cold War and the human rights movement; Generation X or Gen X (born 1965-1979), who lived through the global oil crisis and the continuation of the Cold War; Generation Y or Gen Y, sometimes called the Millennials (born 1980-2000), who lived through the Gulf War in Iraq, September 11, and globalization; Generation Z or Gen Z (born 2001-2010), who have lived in an era of terrorism, global recession, and climate change; and Generation Alpha, born after 2010 and expected to be born until 2025 (Kelan, 2014; Lyons & Kuron, 2014).

Apart from intercultural differences, countries have roughly similar generations, since most have experienced, and are still experiencing, the same global events, such as the World Wars, oil crises, global recessions, and climate change. In Turkey, country-specific events can also be included to separate generations into those who lived through the last days of the Ottoman Empire and World War II; those who lived through the foundation and early days of the Turkish Republic; those who lived through the multiparty system and later the military takeover; those who lived through the left-right conflict; those who lived through terrorism and the military coup period, experiencing both welfare and crisis; and lastly, those who experienced better economic conditions and an increase in conservatism. All of these fall approximately within the same timelines as the generations named above (Adıgüzel et al., 2014). In today’s technology-driven life, technological developments have also begun to separate cohorts from each other: the Baby Boomers can be called the radio generation, Gen X the television generation, Gen Y the computer and Internet generation, and Gen Z the smartphone generation. This categorization is also in line with technological development in Turkey; hence, these generational cohorts are relevant for Turkey as well.

Generations in the organizational context

From an organizational perspective, generations are considered cohorts who share similar lifestyles, work values, and work life expectations. For example, Baby Boomers are characterized by a high sense of loyalty and long-term commitment to their workplace. Gen Xs are labelled as highly motivated and respectful of authority, and women’s labor force participation increased rapidly among Gen Xs (Dixon et al., 2013; Sparks, 2012). Gen Ys have quite different values and expectations from the generations preceding them. They are said to be more confident and autonomous, more mobile and less committed, and drawn to technology and social networks (Alexander & Sysko, 2013; Dixon et al., 2013; Holt et al., 2012). They want to shape their own lives, value work-life balance more than ever, and desire flexibility (Yüksekbilgili, 2013). Compared to Baby Boomers and Gen Xs, Gen Ys are more inclined towards entrepreneurship (Holt et al., 2012; Keleş, 2013; Yurtkoru & Elber Börü, 2019). It has also been argued that, because Gen Ys did not witness times of economic crisis and grew up with the Internet, interactive video games, and TV game shows as entertainment, they like to have fun and win, consume fast, and have high expectations without wanting to pay the price, and that this generation does not like to work (Alexander & Sysko, 2013; Kuyucu, 2014; Yüksekbilgili, 2013). Gen Zs are very new to work life and there is not yet enough evidence to profile them; however, early signs point to emerging characteristics such as being tech-savvy, pampered, protected, risk-averse, and empowered (Queen, 2015).

Data mining classification models

Data mining applications are popular tools for understanding and solving business problems. However, they are not frequently used to analyze survey data in business research, although they can be very helpful in revealing hidden information. Data mining can best be described as a process of exploring large amounts of data to uncover meaningful patterns and rules and to predict behaviors or outcomes (Linoff & Berry, 2011; Luan, 2002). It is the process of extracting valuable information and knowledge, including trends, associations, changes, anomalies, and meaningful structures, that is hidden in complex or large datasets (Han et al., 2012). Classification is one of the most widespread data mining techniques.

The purpose of classification is to assign a record, described by a set of variables, a class label from a set of possible values using a classifier model. To build the classifier, a learning algorithm is applied to a training dataset in which the class of each record is already known. After this learning phase, the classifier is evaluated on an independent test dataset. If its performance is high enough, it can then be used for prediction.

Numerous algorithms can be used to build classifier models, such as artificial neural networks (ANN), decision tree algorithms, support vector machines (SVM), logistic regression, and discriminant analysis (DA). As mentioned above, ANN, SVM, and three different decision tree algorithms are used in this study.

Inspired by the human brain, an ANN is a system in which information is exchanged between interconnected neurons. Using the given information, the connections between neurons are improved by adjusting the weights between them, which makes the ANN capable of learning. Since an ANN does not require the relationship between the variables and the classes to be specified in advance, it can be used to model complex patterns. ANN can be applied to any classification problem and tolerates noisy data. However, training an appropriate model can take a long time, and the resulting model is often hard to interpret because of its hidden layer structure and nodes (Agaoglu, 2016; Han et al., 2012).
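The weight-update idea described above can be illustrated with a single artificial neuron. This is a minimal sketch, not the network used in this study (which, as reported in the Findings, had two hidden layers): one sigmoid neuron is trained by gradient descent on a hypothetical toy dataset (the logical OR of two inputs), and the learning rate and number of epochs are arbitrary assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, lr=0.5, epochs=2000):
    """Fit one sigmoid neuron with per-sample gradient descent on log-loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            out = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = out - y  # derivative of log-loss w.r.t. the weighted input
            # Learning = nudging each connection weight against the error gradient.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if score >= 0.5 else 0


# Hypothetical toy data: logical OR, a linearly separable problem.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 1]
w, b = train_neuron(X, y)
print([predict(w, b, x) for x in X])  # [0, 1, 1, 1]
```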

SVM is a classification and regression technique that seeks a hyperplane separating the classes by maximizing the margin while minimizing the classification error (Cortes & Vapnik, 1995).
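The margin-maximization idea can be sketched for the simplest, linear case. This is a minimal illustration under stated assumptions, not the RBF-kernel SVM used in this study: it minimizes the regularized hinge loss by batch subgradient descent on a hypothetical, linearly separable toy dataset, with labels coded as -1 and +1 and arbitrary values for the regularization strength `lam`, the learning rate, and the number of epochs.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.01, epochs=1000):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))) by subgradient descent.

    Labels must be -1 or +1.
    """
    n, d = len(samples), len(samples[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the margin-widening regularizer
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: two well-separated groups.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A kernelized SVM replaces the dot product `w.x` with kernel evaluations against support vectors, which is what allows non-linear class boundaries.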

Decision tree algorithms recursively split the instances into mutually exclusive subgroups. The process stops when no further split improves the statistical or impurity measure. Information Gain, Gain Ratio, and the Gini Index (used in ID3, C5.0, and CART, respectively) are the best-known impurity measures in decision trees (Breiman et al., 1984; Quinlan, 2004). Additionally, to improve performance, boosting, an ensemble learning method, is applied with C5.0 as the base classifier (Breiman, 1998; Schapire, 1990). In boosting, a strong classifier is built from weak classifiers that are learned iteratively. The CHAID algorithm, on the other hand, uses the chi-square test for multiway splitting.
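The three impurity measures named above can be computed directly from class counts. The sketch below is a minimal pure-Python illustration, assuming base-2 logarithms and a hypothetical split of eight records (labelled "X" and "Y" for the two generations) into two child nodes; real implementations such as C5.0 and CART add further details (missing-value handling, pruning) that are omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: probability that two randomly drawn records belong to different classes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy reduction achieved by splitting `parent` into `children` (ID3 criterion)."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


def gain_ratio(parent, children):
    """Information gain normalized by split information (C4.5/C5.0 criterion)."""
    n = len(parent)
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children if ch)
    return information_gain(parent, children) / split_info if split_info > 0 else 0.0


# Hypothetical node with 4 Gen X ("X") and 4 Gen Y ("Y") records, split into two children.
parent = ["X"] * 4 + ["Y"] * 4
children = [["X", "X", "X", "Y"], ["X", "Y", "Y", "Y"]]
print(round(entropy(parent), 4), round(gini(parent), 4))  # 1.0 0.5
print(round(information_gain(parent, children), 4))       # 0.1887
print(round(gain_ratio(parent, children), 4))             # 0.1887
```

Because this split produces two equal-sized children, its split information is exactly 1 bit, so the gain ratio equals the information gain; unequal or many-way splits would be penalized.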

Classifiers are evaluated with performance measures based on the correctness of their decisions, such as precision, specificity, recall, and accuracy. In a binary classification task, the class value is either Positive (P) or Negative (N), and four terms are needed to understand these performance measures and the confusion matrix. True Positives (TP) are actual positive (P) records that the classifier correctly identifies as positive, whereas False Positives (FP) are actual negative (N) records that the classifier incorrectly identifies as positive. Similarly, True Negatives (TN) are actual negative (N) records correctly identified as negative, whereas False Negatives (FN) are actual positive (P) records incorrectly identified as negative (Agaoglu, 2016). These terms are also shown in the confusion matrix in Table 01.
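The four counts can be tallied by comparing actual and predicted labels record by record. This is a minimal sketch; the labels "X" and "Y" are hypothetical stand-ins for Gen X and Gen Y, and treating "X" as the positive class is an assumption for illustration only, since the text does not state which generation was coded as positive.

```python
def confusion_counts(actual, predicted, positive):
    """Tally TP, FP, TN, FN for a binary task, given which label counts as positive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1  # predicted positive
        else:
            counts["FN" if a == positive else "TN"] += 1  # predicted negative
    return counts


actual = ["X", "X", "Y", "Y", "X"]
predicted = ["X", "Y", "Y", "X", "X"]
print(confusion_counts(actual, predicted, positive="X"))
# {'TP': 2, 'FP': 1, 'TN': 1, 'FN': 1}
```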

Table 1 - Confusion matrix

The performance measures are calculated as shown in Equations (1) to (4). Accuracy is the number of correct predictions divided by the number of all predictions. Precision is the number of correct positive predictions divided by the number of predicted positives. Recall is the number of correct positive predictions divided by the number of actual positives, and specificity is the number of correctly predicted negatives divided by the number of actual negatives (Agaoglu, 2016). Here P = TP + FN denotes the number of actual positives and N = TN + FP the number of actual negatives.

$\text{Accuracy} = \frac{TP + TN}{P + N}$ (1)

$\text{Precision} = \frac{TP}{TP + FP}$ (2)

$\text{Recall} = \frac{TP}{P}$ (3)

$\text{Specificity} = \frac{TN}{N}$ (4)
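Equations (1) to (4) translate directly into code. A minimal sketch follows, using P = TP + FN and N = TN + FP; the counts in the usage line are hypothetical round numbers chosen only to make the arithmetic easy to check.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)  # Eq. (1): correct / all predictions


def precision(tp, fp):
    return tp / (tp + fp)  # Eq. (2): correct positives / predicted positives


def recall(tp, fn):
    return tp / (tp + fn)  # Eq. (3): P = TP + FN actual positives


def specificity(tn, fp):
    return tn / (tn + fp)  # Eq. (4): N = TN + FP actual negatives


# Hypothetical counts: 50 actual positives and 50 actual negatives.
tp, tn, fp, fn = 40, 45, 5, 10
print(accuracy(tp, tn, fp, fn), round(precision(tp, fp), 4), recall(tp, fn), specificity(tn, fp))
# 0.85 0.8889 0.8 0.9
```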

Problem Statement

Based on this understanding, differences between generations can help explain how members of a generation act and interact in the workplace (Vincent, 2005). As organizations’ long-run success is determined by the behavior of these generations, knowing how to motivate or influence employees of different generations will help managers to control a wide range of organizational outcomes, such as conflict, turnover, and socialization, which have been a great challenge since Gen Y employees entered the workforce (Luttrell & McLean, 2013).

Consequently, studying generations will help to design work environments that fit employee needs, to motivate and manage different generations so that they work together coherently and efficiently, and to increase their well-being. Traditionalists are now very rare in the workforce, Baby Boomers who are still working are retiring, and Gen Zs are only beginning to enter, so it is too early to profile them; this leaves Gen Xs and Gen Ys. If we can understand and name the differences between these two generations, we can also use this knowledge to project and describe Gen Zs in the coming years.

Research Questions

Our study focuses on describing the similarities and dissimilarities between two generations, namely Gen X and Gen Y, using a data mining approach. Hence, our research questions are:

Q1: To what extent are Gen X employees and Gen Y employees similar?

Q2: To what extent are Gen X employees and Gen Y employees dissimilar?

Purpose of the Study

The purpose of this study is to test whether Gen X and Gen Y employees can be differentiated based on their attitudes toward work and work environment expectations. If the cohorts are as distinct as the literature proposes, we should be able to classify Gen X and Gen Y employees correctly from their preferences in work life. Classification models will be used to test this proposition: a high predictive accuracy would indicate that the generations are distinct. The classification models will also identify the most discriminating criteria for distinguishing the generations’ work styles.

In our study, five classification techniques, namely three decision tree algorithms (C5.0, CART, and CHAID), support vector machines (SVM), and artificial neural networks (ANN), are used to build classifiers on a dataset of responses given by Gen X and Gen Y employees to the Work Life Expectations & Attitudes (WLEA) scale (Elber Börü & Yurtkoru, 2016; Yurtkoru & Elber Börü, 2015), and the performances of these classifiers are compared.

Research Methods

Instrument

As briefly discussed above, the literature on generations implies that attitudes toward work and work environment expectations differ among employees of different generations. In addition, especially for Gen Y employees, technology proneness and entrepreneurial intention should be taken into consideration. Attitudes toward work and work environment expectations are measured with Yurtkoru and Elber Börü’s Work Life Expectations & Attitudes scale (WLEA) (Elber Börü & Yurtkoru, 2016; Yurtkoru & Elber Börü, 2015). Since WLEA contains a considerable number of items measuring technology proneness, no additional scale is used to measure technology. However, to measure entrepreneurial intention, Liñán and Chen’s (2009) 10-item scale is also included in the questionnaire. The items are measured on a five-point interval scale where “totally disagree” equals 1 and “totally agree” equals 5. Altogether, the instrument consists of 110 items, including the ten entrepreneurial intention items, which are averaged into a single entrepreneurial intention score.
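The averaging of the entrepreneurial intention items can be expressed as a small scoring function. This is a sketch under assumptions: the function name is hypothetical, the ten responses in the example are invented, and no item is treated as reverse-coded, since the text does not mention reverse coding.

```python
from statistics import mean

N_EI_ITEMS = 10              # Liñán and Chen's (2009) entrepreneurial intention scale
SCALE_MIN, SCALE_MAX = 1, 5  # 1 = "totally disagree", 5 = "totally agree"


def entrepreneurial_intention_score(responses):
    """Return the mean of the ten entrepreneurial intention item responses."""
    if len(responses) != N_EI_ITEMS:
        raise ValueError(f"expected {N_EI_ITEMS} responses, got {len(responses)}")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError("responses must lie on the 1-5 scale")
    return mean(responses)


# Hypothetical respondent.
print(entrepreneurial_intention_score([4, 5, 3, 4, 4, 5, 2, 3, 4, 4]))  # 3.8
```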

Sample

Data for our study were collected from 1633 employees in Istanbul, Turkey. The sample consists of 737 (45.1%) Gen X employees born between 1965 and 1980 and 896 (54.9%) Gen Y employees born between 1980 and 1995. The demographics of the sample are given in Table 02.

Table 2 - Demographics of the sample

Prior to the data mining process, the dataset was randomly divided into training and test sets: approximately 80% of the records (1297) were used for training, and the remaining approximately 20% (336) were used to test the classifiers. The respondent’s generation is the class variable (target value), which plays the role of the dependent variable in classical statistical methods. All other variables are input variables used to predict the class variable.
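The reported split (1297 training and 336 test records, about 79.4% and 20.6%) is close to, but not exactly, 80/20. One way a random partition produces such sizes is to assign each record to the training set independently with probability 0.8, as in the minimal sketch below. This is an assumption about the mechanism, not the authors' documented procedure, and the seed is arbitrary, so the resulting sizes will differ slightly from the study's.

```python
import random


def random_partition(n_records, train_share=0.8, seed=42):
    """Assign each record index to the training set independently with probability train_share."""
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    train, test = [], []
    for idx in range(n_records):
        (train if rng.random() < train_share else test).append(idx)
    return train, test


train_idx, test_idx = random_partition(1633)
print(len(train_idx), len(test_idx))  # sizes near an 80/20 split; exact values depend on the seed
```

Per-record assignment keeps the procedure simple but does not guarantee exact proportions; a shuffled list cut at a fixed index would give exactly 1306 training records instead.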

Findings

The performances of the five classification techniques applied are assessed using the test data according to precision, specificity, recall, and accuracy.

ANN classifier

A feed-forward neural network trained with backpropagation was used. The classifier topology has two hidden layers, the first with 30 nodes and the second with 10 nodes. The final model is the network with the highest accuracy, selected after the stopping criterion had been satisfied for all candidate networks. The classifier achieved 103 true positives (TP) and 127 true negatives (TN), so 230 of the 336 test records were correctly labeled. On the other hand, there were 50 false negatives (FN) and 56 false positives (FP), so 106 records were incorrectly labeled (see Table 03).
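The reported topology can be made concrete with a forward pass through layers of 30 and 10 hidden nodes. The sketch below is an assumption-laden illustration: backpropagation training is omitted, the weights are random, sigmoid activations and a single output node for the two classes are assumed, and the input size `N_INPUTS` is hypothetical because the exact number of input variables is not reported.

```python
import math
import random

N_INPUTS = 101  # hypothetical number of input variables
LAYER_SIZES = [N_INPUTS, 30, 10, 1]  # inputs -> hidden (30) -> hidden (10) -> output


def init_layer(n_in, n_out, rng):
    """Each node holds n_in small random weights followed by a bias term."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] + [0.0] for _ in range(n_out)]


def forward(layers, x):
    """Propagate one input vector through all layers with sigmoid activations."""
    activation = x
    for layer in layers:
        activation = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(node[:-1], activation)) + node[-1])))
            for node in layer
        ]
    return activation


rng = random.Random(0)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]
n_params = sum(len(node) for layer in layers for node in layer)
output = forward(layers, [3.0] * N_INPUTS)  # e.g. every item answered with 3
print(n_params, len(output))  # 3381 1
```

Even with this modest topology the network has thousands of weights, which illustrates why such models are harder to interpret than a decision tree.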

Table 3 - Confusion matrix of the ANN classifier

SVM classifier

A radial basis function (RBF) kernel was used in the SVM classifier. The confusion matrix in Table 04 shows the distribution of predictions. The SVM classifier produced 93 TP and 128 TN, so 221 test records were correctly classified. On the other hand, there were 60 FN and 55 FP, so 115 records were misclassified.
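With an RBF kernel, the SVM compares records through K(x, z) = exp(-gamma * ||x - z||^2) and classifies a record by the sign of a weighted sum of kernel values with the support vectors. The sketch below assumes hypothetical support vectors, dual coefficients (already multiplied by the class labels), bias, and gamma; the study does not report its SVM parameter values.

```python
import math


def rbf_kernel(x, z, gamma=0.1):
    """K(x, z) = exp(-gamma * squared Euclidean distance between x and z)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma=0.1):
    """Return +1 or -1 from the sign of sum_i coef_i * K(sv_i, x) + bias."""
    score = sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs))
    return 1 if score + bias >= 0 else -1


# Hypothetical trained model with one support vector per class.
svs = [[1.0, 1.0], [-1.0, -1.0]]
coefs = [1.0, -1.0]
print(rbf_kernel([0, 0], [1, 1], gamma=0.5))           # exp(-1), about 0.3679
print(svm_decision([0.9, 1.2], svs, coefs, bias=0.0))  # 1
```

A larger gamma makes each support vector's influence more local, which allows more flexible boundaries but increases the risk of overfitting.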

Table 4 - Confusion matrix of the SVM classifier

C5.0 classifier

Multiway splits with gain ratio as the impurity measure were used. For the stopping rule, the minimum number of instances per child was set to two. Additionally, boosting with 50 trials was applied. Table 05 shows the distribution of predictions: there were 92 TP and 146 TN, so 238 instances were correctly labeled, and 61 FN and 37 FP, so 98 records were incorrectly labeled.
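Boosting can be illustrated with AdaBoost, a standard boosting algorithm, using one-split decision stumps as weak classifiers. This is a swapped-in, simplified technique: C5.0's own boosting (run here with 50 trials) uses full C5.0 trees and differs in its details. The toy data are hypothetical and chosen so that no single stump classifies all records correctly, while three boosted stumps do.

```python
import math


def train_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best one-split rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                preds = [sign if x[j] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] >= thr else -sign


def adaboost(X, y, rounds):
    """Labels are -1/+1; returns a list of (vote weight, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # guard against a perfect stump (division by zero)
        if err >= 0.5:  # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified records, decrease the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, x)) for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D data: positives at both ends, negatives in the middle.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([ensemble_predict(model, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```

The best single stump misclassifies two of the six records; reweighting forces later stumps to focus on those records, and their weighted vote recovers the interval pattern.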

Table 5 - Confusion matrix of the C5.0 classifier

CART classifier

The CART classifier used the Gini index as the impurity measure with binary splits. For the stopping rule, the minimum change in impurity was set to 0.0001. Table 06 shows the predictions: the CART classifier produced 78 TP and 134 TN, so 212 records were correctly classified, and 75 FN and 49 FP, so 124 records were misclassified.
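A CART-style binary split on one input can be found by trying every threshold and keeping the one with the lowest weighted Gini index of the two children. The sketch below assumes a single hypothetical five-point item and invented Gen X/Gen Y labels; the 0.0001 minimum impurity change is the stopping-rule value reported above, applied here as a simple check on whether the best split is worth making.

```python
from collections import Counter

MIN_IMPURITY_CHANGE = 0.0001  # stopping-rule value reported for the CART classifier


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted child Gini) for the best split `value <= t` vs `value > t`."""
    n = len(values)
    best_t, best_score = None, gini(labels)
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right child empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical answers to one 1-5 item, with the respondent's generation as the class.
values = [1, 2, 2, 3, 4, 4, 5, 5]
labels = ["X", "X", "X", "X", "Y", "Y", "Y", "Y"]
t, score = best_binary_split(values, labels)
split_worthwhile = gini(labels) - score >= MIN_IMPURITY_CHANGE
print(t, score, split_worthwhile)  # 3 0.0 True
```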

Table 6 - Confusion matrix of the CART classifier

CHAID classifier

The CHAID classifier uses the chi-square test for multiway splitting. For the stopping rule, epsilon for convergence was set to 0.0001. As shown in Table 07, there were 81 TP and 126 TN, so 207 instances were correctly labeled, whereas there were 72 FN and 57 FP, so 129 instances were incorrectly labeled.
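CHAID chooses splits using the chi-square test of independence between a predictor's categories and the class. The sketch below computes the Pearson chi-square statistic for a hypothetical 2 x 2 table of answer category by generation, and a p-value via the identity P(chi-square with 1 df > s) = erfc(sqrt(s / 2)), which holds only for one degree of freedom. Full CHAID additionally merges categories and adjusts p-values, which is omitted here.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = "agree" / "do not agree", columns = Gen X / Gen Y.
table = [[30, 70], [70, 30]]
stat, dof = chi_square(table)
p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None  # closed form for 1 df only
print(stat, dof, p_value < 0.001)  # 32.0 1 True
```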

Table 7 - Confusion matrix of the CHAID classifier

The confusion matrices of all five classifiers are combined in Table 08, which shows the differences between the classifiers in terms of TP, TN, FN, and FP. According to TP values, ANN is the best and CART the worst, whereas C5.0 is the best and CHAID the worst in terms of TN values. Similarly, ANN is the best (fewest FN) and CART the worst in terms of FN values, whereas C5.0 is the best (fewest FP) and CHAID the worst in terms of FP values.

Table 8 - Combined confusion matrices of the five classifiers

Table 9 - Performance measures of the five classifiers

In summary, C5.0 outperformed the other classifiers in specificity, precision, and accuracy, whereas ANN performed better than the other techniques only in recall (see Table 09).
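The comparison can be reproduced from the confusion-matrix counts reported for each classifier in this section. The sketch below recomputes the four measures from those counts with Equations (1) to (4) and picks the best classifier per measure. The values are recomputed from the text, not copied from Table 09, so rounding in the published table may differ.

```python
# TP/TN/FP/FN counts on the 336 test records, as reported for each classifier.
counts = {
    "ANN":   {"TP": 103, "TN": 127, "FP": 56, "FN": 50},
    "SVM":   {"TP": 93,  "TN": 128, "FP": 55, "FN": 60},
    "C5.0":  {"TP": 92,  "TN": 146, "FP": 37, "FN": 61},
    "CART":  {"TP": 78,  "TN": 134, "FP": 49, "FN": 75},
    "CHAID": {"TP": 81,  "TN": 126, "FP": 57, "FN": 72},
}


def measures(c):
    """Apply Equations (1)-(4) to one classifier's counts."""
    return {
        "accuracy": (c["TP"] + c["TN"]) / sum(c.values()),
        "precision": c["TP"] / (c["TP"] + c["FP"]),
        "recall": c["TP"] / (c["TP"] + c["FN"]),
        "specificity": c["TN"] / (c["TN"] + c["FP"]),
    }


results = {name: measures(c) for name, c in counts.items()}
for name, m in results.items():
    print(f"{name:6s}" + "  ".join(f"{k}={v:.3f}" for k, v in m.items()))

best = {k: max(results, key=lambda name: results[name][k])
        for k in ("accuracy", "precision", "recall", "specificity")}
print(best)  # {'accuracy': 'C5.0', 'precision': 'C5.0', 'recall': 'ANN', 'specificity': 'C5.0'}
```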

Conclusion

In this study, we applied data mining techniques to test whether Gen X and Gen Y employees can be differentiated based on their attitudes toward work, work environment expectations, and entrepreneurial intentions.

Findings revealed that the five most important variables differentiating between Gen X and Gen Y are “using technology for all daily chores”, “ease of technology use”, “entrepreneurial intention”, “changing jobs frequently”, and “having friends who are from different cultures and religions”. These variables were found to be important in at least three classifiers. Other outstanding variables were “when a task is wanted from me I should be informed why I should do that task”, “when I get a new task, I ask what benefit I will get from it”, “I like competition”, “no restrictions to be at the office as long as you do the job”, and “flexible working conditions”. In line with the literature, these results indicate that Gen Y employees are those who use technology more easily and for all their chores, change jobs more frequently, desire to found their own companies, do not want to be bound to offices, and need flexible working conditions (Adıgüzel et al., 2014; Keleş, 2013; Yüksekbilgili, 2013; Yurtkoru & Elber Börü, 2019). In addition, they want to know why a task is asked of them and what benefit they would get from doing it. Based on this information, we conclude that there is a difference between Gen X and Gen Y employees in their attitudes toward work, work environment expectations, and entrepreneurial intentions.

On the other hand, the results indicated that even the best-performing classifier has an accuracy rate just above 70% (C5.0 correctly classified 238 of the 336 test records, about 70.8%). Although this is relatively good considering the nature of social science research, it is not high enough to distinguish generations from a data mining perspective. Naturally, this study is limited by its sample, and the results should be tested further. Consequently, the question of generational differences remains open to further study.

References

Adıgüzel, O., Batur, H. Z., & Ekşili, N. (2014). Kuşakların değişen yüzü ve y kuşağı ile ortaya çıkan yeni çalışma tarzı: mobil yakalılar [The changing face of generations and the new working style that emerged with Generation Y: Mobile collars]. Süleyman Demirel Üniversitesi Sosyal Bilimler Enstitüsü Dergisi, 19(1), 165-182.
Agaoglu, M. (2016). Predicting instructor performance using data mining techniques in higher education. IEEE Access, 4, 2379-2387.
Alexander, C. S., & Sysko, J. M. (2013). I’m gen y, I love feeling entitled, and it shows. Academy of Educational Leadership Journal, 17(4), 127-131.
Alwin, D. F., & Mc Cammon, R. J. (2003). Generations, cohorts, and social change. In J. T. Mortimer & M. J. Shanahan (Eds.), Handbook of the life course (pp. 23-50). Kluwer Academic Publishers.
Breiman, L. (1998). Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics, 26(3), 801-849.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth International Group, CA.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
Dixon, G., Mercado, A., & Knowles, B. (2013). Followers and Generations in the Workplace. Engineering Management Journal, 25(4), 62-72.
Elber Börü, D., & Yurtkoru, E. S. (2016). Yeni kuşakların iş yaşamı tarzları üzerine ölçek geliştirme çalışması [Scale development study on the work life styles of new generations]. IV. Örgütsel Davranış Kongresi Bildiriler Kitabı, 64-69.
Han, J., Kamber, M., & Pei, J. (2012). Data mining concepts and techniques. Morgan Kaufmann Publishers.
Holt, S., Marques, J., & Way, D. (2012). Bracing for the Millennial Workforce: Looking for Ways to Inspire Generation Y. Journal of Leadership, Accountability & Ethics, 9(6), 81-93.
Joshi, A., Dencker, J. C., & Franz, G. (2011). Generations in organizations. Research in Organizational Behavior, 31, 177-205.
Kelan, E. K. (2014). Organising generations – what can sociology offer to the understanding of generations at work? Sociology Compass, 8, 20–30.
Keleş, H. N. (2013). Girişimcilik eğiliminin kuşak farkına göre incelenmesi [Investigation of entrepreneurship tendency according to generation difference]. Sosyal ve Ekonomik Araştırmalar Dergisi, 26, 23-43.
Kuyucu, M. (2014). Y kuşağı ve Facebook: y kuşağının Facebook kullanım alışkanlıkları üzerine bir inceleme [Generation Y and Facebook: A review on the Facebook usage habits of millennials]. Elektronik Sosyal Bilimler Dergisi, 13(49), 55-83.
Liñán, F., & Chen, Y. W. (2009). Development and cross-cultural application of a specific instrument to measure entrepreneurial intentions. Entrepreneurship Theory and Practice, 33(3), 593-617.
Linoff, G. S., & Berry, M. J. A. (2011). Data mining techniques for marketing, sales, and customer relationship management (3rd ed.). Wiley Publishing Inc., Indianapolis.
Luan, J. (2002). Data mining and its applications in higher education. New Directions for Institutional Research. Wiley Periodicals, Inc.
Luttrell, R., & McLean, D. (2013). A new generation of professionals: Working with Millennials in 5 easy steps. Public Relations Tactics, 20(4), 15.
Lyons, S., & Kuron, L. (2014). Generational differences in the workplace: A review of the evidence and directions for future research. Journal of Organizational Behavior, 35, 139–157.
Queen, M. (2015). Ready or not… Here come gen z. https://www.linkedin.com/pulse/ready-here-come-gen-z-michael-mcqueen/
Quinlan, R. (2004). C5.0: An informal tutorial. http://www.rulequest.com/see5-unix.html.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197-227.
Sparks, A. M. (2012). Psychological empowerment and job satisfaction between Baby Boomer and Generation X nurses. Journal of Nursing Management, 20, 451–460.
Vincent, J. A. (2005). Understanding generations; political economy and culture in an ageing society. The British Journal of Sociology, 56(4), 579-599.
Yüksekbilgili, Z. (2013). Türk tipi Y kuşağı [Turkish-type Generation Y]. Elektronik Sosyal Bilimler Dergisi, 12(45), 342-353.
Yurtkoru, E. S., & Elber Börü, D. (2015). Are we really different? Comparison of Generation X and Generation Y employees. Paper presented at European Association of Work and Organizational Psychology Congress –Respectful and effective leadership - managing people and organizations in turbulent times. Oslo, Norway.
Yurtkoru, E. S., & Elber Börü, D. (2019). The determinants of entrepreneurial intention of employees in Turkey. Paper presented at 15th International Strategic Management Conference. 27-29. Poznan, Poland.

Copyright information

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

About this article

Publication Date

13 February 2021

Article Doi

https://doi.org/10.15405/epsbs.2021.02.15

eBook ISBN

978-1-80296-100-3

Publisher

European Publisher

Volume

101


Edition Number

1st Edition

Pages

1-224

Subjects

National interest, national identity, national security, national consciousness, social relations, public relation, public organizations, linguocultural identity, linguistics

Cite this article as:

Ağaoğlu, M., Yurtkoru, E. S., & Börü, D. E. (2021). Can We Really Differentiate Between Generations? A Data Mining Approach. In C. Zehir, A. Kutlu, & T. Karaboğa (Eds.), Leadership, Innovation, Media and Communication, vol 101. European Proceedings of Social and Behavioural Sciences (pp. 162-172). European Publisher. https://doi.org/10.15405/epsbs.2021.02.15
