Abstract
In this article, the authors assess the methodological reliability of big data processing in sociological research. The authors compare the sten score method and cluster analysis as methods of processing the results of socio-psychological tests aimed at identifying groups of young people potentially vulnerable to drug addiction. The survey was conducted in eight universities in a city in Siberia with a large student population, where 22884 students aged from 18 to 25 were questioned. First, the obtained results were processed by using the sten score method. Then, cluster analysis was conducted to define a high-risk group of students having a propensity for drug consumption. Advantages and disadvantages of the two methods for processing a large sample of data are compared. The results of this comparison demonstrate that cluster analysis is the more appropriate method for this type of research, as it produces statistically correct data. The use of cluster analysis makes it possible to work with any type of information, both qualitative and quantitative. On the other hand, the sten score method can only be applied in certain conditions, i.e. where the original distribution resembles a normal distribution; where some theoretical basis to expect normal distribution exists; and where there is certainty that the normalization group is sufficiently large and representative to be a true reflection of the population.
Keywords: Sten score method, cluster analysis, drug use, drug addiction, sociological research, statistical information
Introduction
The paper compares two methods of processing psychological test results: the sten score method and cluster analysis. The object of the investigation is the results obtained during work on the project “Social-psychological testing of students, aimed at early detection of non-medical consumption of narcotic and psychotropic substances”. The study was conducted in eight universities of a Siberian city with a large student population; 22884 students aged from 18 to 25 were questioned. First, the obtained results were processed by using the sten score method. Then, cluster analysis was conducted to define a high-risk group of students having a propensity for drug consumption.
Problem Statement
The paper considers the methodological reliability of big data processing in sociological research. One of the tasks of the study, which was solved by statistical methods, was to identify amongst students a group of individuals who, according to their personal characteristics and circumstances, are vulnerable to drug use and addiction (Newcomb & Bentler, 1988; Restivo & Loughlin, 2016).
Research Questions
The research presented in this article focuses on the psychological predisposition to drug use (Dembo et al., 1988) and addiction, whereby a combination of certain character traits, education, attitude to life (Drugs, Brains, and Behavior: The Science of Addiction, 2014), personal and environmental coping mechanisms (Sinha, 2008), family relationships (Barrett & Turner, 2006), sense of loneliness, negative emotional ambiance (Hayaki et al., 2005) and other factors may form a psycho-protective coping behavior (Spooner & Hetherington, 2004) that makes a person vulnerable to drug addiction (Patrick, Wightman, Schoeni & Schulenberg, 2012). It is assumed that in those situations, bans, high prices or other obstacles in obtaining drugs do not play a significant role (Blum et al., 2012; Sinha, 2008; Lukianova, Fell & Sibers, 2015).
To reveal the presence of individuals with this predisposition, socio-psychological testing was conducted in groups. The respondents were asked to fill in a questionnaire that included 35 statements: 27 of them were direct markers, and 8 were inverse markers of drug-addict behavior (Spooner & Hetherington, 2004). Filling in the "passport part" of the questionnaire helped the respondents to familiarize themselves with the theme of the questionnaire and prepared them for work with the main block of questions. Each respondent was given the following instruction: "Read carefully each statement and choose from the proposed answers the one that, in your opinion, is the most accurate. The selected variant of the answer should be marked with a "+" sign in the corresponding table cell". The possible answers were “yes”, “rather yes”, “rather no” and “no”.
The average duration of the procedure was no more than 30 minutes.
The test was applied in the study of Siberian city students aimed at identifying a high-risk group prone to drug-addict behavior. To process the results, the sten scores method (Kline, 2000) was used. The test and the specific method were specifically designed for this research. However, the sten scores method has its limitations. First, the sten scores method requires information in numerical form, whereas in the test respondents answer ‘yes’, ‘rather yes’, ‘rather no’ or ‘no’, so the information is given on an ordinal scale. Hence, answers are translated into numbers; consequently, the ordinal scale is transformed into a numerical scale. This leads to the loss and distortion of information. Since there are no correct procedures for transferring qualitative information into quantitative information, there is doubt about the correctness of the results obtained. Consequently, the sten scores method can only be applied in certain conditions (Kline, 2000): (1) where the original distribution resembles normal distribution; (2) where the authors have some theoretical basis to expect normal distribution, and (3) where the authors are confident that their normalization group is sufficiently large and representative to be a true reflection of the population. However, a sufficiently large number of respondents cannot always be guaranteed when the sten scores method is used.
The authors propose to use cluster analysis to process such statistical information. The advantage of this method is that it works with any type of information, both qualitative and quantitative, as well as with data of various types. Cluster analysis is applicable to working with small samples and its purpose is to divide objects into a predetermined number of classes in such a way that the objects inside the class are similar to each other, whereas the objects from different classes are different. Applying this method makes it possible to isolate the risk group according to the criteria specified in the test, and also to estimate statistical reliability of the results.
Purpose of the Study
The purpose of the study is to compare the two methods of statistical analysis with regard to the correctness of their application for identifying respondents who are at risk of drug-dependent behavior.
Research Methods
Sten scores
First, the authors interpreted the results by using sten scores, sten being an abbreviation for 'Standard Ten'. The procedure is described as follows.
Answers for direct markers of DAB (drug-addict behaviour) are translated into ‘response scores’: the answer ‘yes’ is interpreted as ‘4’, ‘rather yes’ as ‘3’, ‘rather no’ as ‘2’, and ‘no’ as ‘1’.
Answers for inverse markers of DAB (drug-addict behaviour) are translated into ‘response scores’ in reverse order: the answer ‘yes’ is interpreted as ‘1’, ‘rather yes’ as ‘2’, ‘rather no’ as ‘3’, and ‘no’ as ‘4’.
The ‘raw score’ for every ith respondent is calculated as the sum ${x}_{i}$ of all 35 response scores.
The mean $M$ and the standard deviation $s$ are calculated from the raw scores of all respondents.
By using the formula ${z}_{i}=\left({x}_{i}-M\right)/s$, the raw scores are translated into ‘standard scores’.
By using the sten formula ${\mathrm{sten}}_{i}=2{z}_{i}+5.5$, the obtained ‘standard scores’ are translated into ‘standard tens’ or ‘stens’.
The percentage of respondents whose results in stens are above 7.5 (high level of psychological propensity for DAB) is calculated. This value is also calculated for different universities and other subgroups of interest (age, gender, year, program of study, etc.).
The percentage of respondents whose results in stens are below 3.5 (low level of psychological propensity for DAB) is calculated. This value is also calculated for different universities and other subgroups of interest (age, gender, year, program of study, etc.).
The percentage of respondents whose results in stens are within the range from 3.5 to 7.5 (middle level of psychological propensity for DAB) is calculated. This value is also calculated for different universities and other subgroups of interest (age, gender, year, program of study, etc.).
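The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the raw scores are made-up values, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev


def sten_scores(raw_scores):
    """Convert raw scores x_i to stens: z_i = (x_i - M) / s, sten_i = 2 * z_i + 5.5."""
    m = mean(raw_scores)
    s = stdev(raw_scores)  # sample standard deviation (assumption of this sketch)
    return [2 * (x - m) / s + 5.5 for x in raw_scores]


def propensity_level(sten):
    """Classify one sten value using the 3.5 and 7.5 thresholds from the text."""
    if sten > 7.5:
        return "high"
    if sten < 3.5:
        return "low"
    return "middle"


def level_shares(raw_scores):
    """Percentage of respondents at each level of psychological propensity."""
    levels = [propensity_level(s) for s in sten_scores(raw_scores)]
    return {name: 100 * levels.count(name) / len(levels)
            for name in ("low", "middle", "high")}


# Made-up raw scores (sums of 35 response scores) for five respondents.
raw = [60, 70, 80, 90, 100]
stens = sten_scores(raw)
shares = level_shares(raw)
print(shares)
```

The same `level_shares` call can be applied to any subgroup (a single university, an age band, and so on) by passing only that subgroup's raw scores, provided the mean and standard deviation are meant to be computed within that subgroup.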

Figure 01 presents the distribution of these three levels of psychological propensity for DAB in the different universities. Labels A, ..., H on the horizontal axis stand for the universities; the percentages of respondents with low, middle and high scores are shown in blue, red and green, respectively. One can see that the percentage of high score respondents varies from 12.13% to 24.79%, the percentage of middle score respondents varies from 64.23% to 69.7%, and the percentage of low score respondents varies from 9.6% to 21.42%, depending on the university. Therefore, the percentage of middle score respondents varies much less than the percentages of high and low score respondents.
The main objective of the study was to estimate the total number of respondents with a high level of psychological propensity for drug-addict behaviour (Shelehov, Kornetov & Grebennikova, 2016), or high-risk population, as a percentage of the sample. Figure
These results are very close to the results of similar studies conducted in Kazan, Birsk and Izhevsk (Cheverikina, 2012). However, the authors can identify a number of problems arising when sten scores are used to achieve this objective.
First, the sten scores method needs information in numerical form, whereas in the test respondents answer ‘yes’, ‘rather yes’, ‘rather no’ or ‘no’, which means that the information is given on an ordinal scale. Hence, at steps 1 and 2, answers are translated into numbers; consequently, the ordinal scale is transformed into a numerical scale. From the statistical point of view, such transformations are unfounded and they result in the misrepresentation of information.
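A small made-up example illustrates why coding an ordinal scale as numbers is problematic. Both codings below preserve the order of the answers, so neither is more "correct" than the other, yet they rank the same two hypothetical respondents in opposite ways. The codings and answers are assumptions chosen for illustration only.

```python
# Two order-preserving codings of the same ordinal answers (both hypothetical).
EQUAL_STEPS = {"no": 1, "rather no": 2, "rather yes": 3, "yes": 4}
UNEQUAL_STEPS = {"no": 1, "rather no": 2, "rather yes": 3, "yes": 6}


def total(answers, coding):
    """Sum of response scores under a given numeric coding."""
    return sum(coding[a] for a in answers)


# Answers of two hypothetical respondents to three direct markers.
respondent_a = ["yes", "no", "no"]
respondent_b = ["rather yes", "rather no", "rather no"]

for name, coding in (("equal steps", EQUAL_STEPS), ("unequal steps", UNEQUAL_STEPS)):
    print(name, total(respondent_a, coding), total(respondent_b, coding))
```

Under the equal-step coding respondent B scores higher than respondent A; under the unequal-step coding the order is reversed, although the answers themselves have not changed.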
Then, the sten scores are in fact linearly transformed normalised standard scores (see steps 4-6); after normalising, the authors investigate not absolute but relative numbers. Hence, the sten score indicates an individual's approximate position with respect to the population of values and, therefore, to other people in that population. So, the high-risk population can be estimated only in reference to the sample. In other words, the percentage does not depend significantly on the sample: if the authors investigate two groups of people, one with a high level of DAB and another with a low level, in every group the authors will most probably obtain the same percentage of the high-risk population. This argument is supported by an investigation the authors conducted using the same data. At step 5, the authors calculated the means ${M}_{k}$ and standard deviations ${s}_{k}$ for all universities $k\in \left\{A,\dots ,H\right\}$ and then recalculated the standard scores by using the formula ${z}_{i}^{k}=\left({x}_{i}^{k}-{M}_{k}\right)/{s}_{k}$, where ${x}_{i}^{k}$ is the raw score of the $i$th respondent from the $k$th university. After that, the authors performed steps 6-9. Figure
Consequently, the main objective of the stens method is not to determine the percentage of high-level score and low-level score respondents, but to recognize the respondents belonging to these 16% groups. This is the reason for the wide application of the method in the interpretation of IQ and other similar tests (Anastasi, 1988). In other studies, it can be applied in certain conditions (Kline, 2000): (1) where the original distribution resembles normal distribution; (2) where the authors have some theoretical basis to expect normal distribution, and (3) where the authors are confident that their normalization group is sufficiently large and representative to be a true reflection of the population.
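The argument can be checked numerically. A sten above 7.5 is equivalent to a standard score above 1, so under a normal distribution roughly 16% of any group falls above the threshold, whatever the group's actual level. The sketch below uses made-up raw scores and a hypothetical helper `high_share`; the second group is simply the first shifted upwards by 50 points, and the sample standard deviation is an assumption of the sketch.

```python
from statistics import NormalDist, mean, stdev


def high_share(raw_scores, threshold=7.5):
    """Percentage of respondents whose within-group sten exceeds the threshold."""
    m, s = mean(raw_scores), stdev(raw_scores)
    stens = [2 * (x - m) / s + 5.5 for x in raw_scores]
    return 100 * sum(st > threshold for st in stens) / len(stens)


# Made-up raw scores: the second group has uniformly higher scores.
group_low = [40, 45, 50, 50, 55, 55, 60, 60, 65, 75]
group_high = [x + 50 for x in group_low]

share_low = high_share(group_low)
share_high = high_share(group_high)

# Share expected above sten 7.5 (z > 1) if scores were exactly normal.
normal_share = 100 * (1 - NormalDist().cdf(1.0))

print(share_low, share_high, round(normal_share, 2))
```

Both groups yield the same high-score share even though every respondent in the second group has a much higher raw score, which is why within-group stens identify who belongs to the upper tail rather than how large the high-risk population is.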
Besides, sometimes the results are essentially different with respect to the group where the mean and the variance are calculated. For example, for university H, Figure
Cluster analysis
Cluster analysis divides data into groups (clusters) based only on information that describes the objects and their relationships. The requirement is that the objects within a cluster must be similar to one another and significantly different from the objects in other groups. The greater the similarity within a cluster, and the greater the difference between clusters, the better the clustering. In cluster analysis, the characteristics of objects (variables) can be of any type: nominal, ordinal or quantitative. The task is to define a distance between objects and between clusters (Everitt, Landau, Leese & Stahl, 2011).
In this case, the authors have 35 questions, which is a rather large number of variables. Besides, all direct markers are assumed to be equivalent, as are all inverse markers. Consequently, the authors chose as variables the numbers of each kind of response for the direct and inverse markers; as a result, the authors obtained eight variables. After that, the authors divided all respondents within universities into three clusters, as in the experiment described above, by using the software package “Statistica 10” (StatSoft, 2013). Here Figure
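As an illustration of the general idea, the following is a minimal sketch of one common clustering procedure, k-means with Euclidean distance. The source does not state which algorithm or distance was used in the study, so the choice of k-means, the function `kmeans`, the starting centres and the toy points are all assumptions made for illustration.

```python
import math


def kmeans(points, centers, iterations=100):
    """Lloyd's algorithm: alternate assignment and centre update until labels stop changing."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centre.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy two-dimensional data forming three visibly separate groups.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1), (11, 0)]
labels, centers = kmeans(points, centers=[points[0], points[3], points[6]])
print(labels)
```

The number of clusters is fixed in advance by the number of starting centres, which matches the description of dividing objects into a predetermined number of classes.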
Var1 – number of answers “yes” for the inverse markers;
Var2 – number of answers “rather yes” for the inverse markers;
Var3 – number of answers “rather no” for the inverse markers;
Var4 – number of answers “no” for the inverse markers;
Var5 – number of answers “yes” for the direct markers;
Var6 – number of answers “rather yes” for the direct markers;
Var7 – number of answers “rather no” for the direct markers;
Var8 – number of answers “no” for the direct markers.
Points mark mean values of the variables for each cluster.
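The eight variables can be derived from a respondent's 35 answers by simple counting. The sketch below is illustrative only: the function name is hypothetical, and the positions of the inverse markers within the questionnaire are not given in the source, so placing them first is an assumption.

```python
from collections import Counter

ANSWERS = ("yes", "rather yes", "rather no", "no")


def count_variables(answers, inverse_items):
    """Return [Var1, ..., Var8] for one respondent.

    answers: list of 35 answer strings, one per statement.
    inverse_items: set of zero-based indices of the inverse markers.
    """
    inverse = Counter(a for i, a in enumerate(answers) if i in inverse_items)
    direct = Counter(a for i, a in enumerate(answers) if i not in inverse_items)
    # Var1-Var4 count inverse-marker answers, Var5-Var8 count direct-marker answers.
    return [inverse[a] for a in ANSWERS] + [direct[a] for a in ANSWERS]


# Hypothetical layout: the 8 inverse markers are the first 8 statements.
inverse_items = set(range(8))
answers = ["yes"] * 5 + ["rather yes"] * 3 + ["no"] * 20 + ["rather no"] * 7
variables = count_variables(answers, inverse_items)
print(variables)
```

Because the variables are counts, Var1 to Var4 always sum to 8 and Var5 to Var8 always sum to 27 for a fully completed questionnaire.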
One can see that in both cases, Cluster 3 (green line) corresponds to the low-risk groups. The number of answers “yes” for the inverse markers (Var 1) and the number of answers “no” for the direct markers (Var 8) exceed the corresponding values in other clusters, and exceed the numbers of other answers within the clusters. For university H, Cluster 1 (blue line) corresponds to the high-risk group: among the clusters, it has the smallest number of answers “yes” for the inverse markers (Var 1) and of answers “no” for the direct markers (Var 8), and the greatest numbers of answers “yes” and “rather yes” for the direct markers (Var 5 and Var 6). For university E, Cluster 2 (red line) corresponds to the high-risk group: the number of answers “yes” for the direct markers (Var 6) essentially exceeds similar values for other clusters. Cluster 2 (red line) for university H and Cluster 1 (blue line) for university E correspond to the middle-risk groups. The analysis of the variable variances shows that the probability of incorrect clustering is not more than 0.001.
Table
As for the inverse markers, the number of answers “yes” is the largest in most cases (about 3-4), except for the high-risk group at university H, where the number of answers “rather yes” is the largest. Therefore, these variables do not characterize well the groups with different DAB. Most probably, this is connected with the small number of inverse markers.
Figure
Recall that the high-risk respondents of both universities agree with the direct markers in 12-13 cases out of 27 on average. This is a rather large number, and it is similar in both universities in spite of the fact that the numbers of respondents in these two cases differ a lot. Hence, cluster analysis can be used to estimate the number of high-risk respondents both for large and for small samples.
Findings
The comparison of the two methods of processing the results of psychological testing of young people, aimed at determining the group at greatest risk by propensity to use narcotic substances, showed that cluster analysis is a mathematically and statistically correct method for this type of research. Having compared the advantages and disadvantages of the two methods for processing a large sample of data, the authors suggest that the use of cluster analysis for the processing of statistical information makes it possible to work with both qualitative and quantitative data of various types.
Conclusion
A comparison of the two methods demonstrated the advantages of applying the cluster analysis method to the processing of statistical data (both qualitative and quantitative) to determine correctly the respondents' belonging to a certain class. The authors suggest that this method can be successfully applied in the processing of sociological data, and other data of socioeconomic direction.
Acknowledgments
The research was completed as part of the project State assignment №27.4344.2017/5.1 "Improving the mechanisms of mentoring underaged citizens, including those from disadvantaged families, in work placements".
References
 Anastasi, A. (1988). Psychological testing. New York: Macmillan; London: Collier Macmillan.
 Barrett, A., Turner, R. (2006). Family structure and substance use problems in adolescence and early adulthood: examining explanations for the relationship. Addiction. 101, 109–120. doi: 10.1111/j.1360-0443.2005.01296.x
 Blum, K., Chen, A., Giordano, J., Borsten, J., Chen, T., Hauser, M., Barh, D. (2012). The Addictive Brain: All Roads Lead to Dopamine. Journal of Psychoactive Drugs. 44(2). Retrieved from http://www.tandfonline.com/doi/abs/10.1080/02791072.2012.685407
 Cheverikina, E.A. (2012). Sotsialnopsihologicheskie osobennosti studentov vuzov, sklonnyih k zavisimosti ot psihoaktivnyih veschestv. Kazanskiy pedagogicheskiy zhurnal. 56. Retrieved from http://cyberleninka.ru/article/n/sotsialnopsihologicheskieosobennostistudentovvuzovsklonnyhkzavisimostiotpsihoaktivnyhveschestv
 Dembo, R., Dertke, M., Borders, S., Washburn, M., Schmeidler, J. (1988). The relationship between physical and sexual abuse and tobacco, alcohol, and illicit drug use among youths in a juvenile detention center. The International Journal of the Addictions. 23, 351–378. Retrieved from https://link.springer.com/article/10.1007/BF02888935
 Drugs, Brains, and Behavior: The Science of Addiction. (2014). Retrieved from https://www.drugabuse.gov/publications/drugsbrainsbehaviorscienceaddiction/drugabuseaddiction
 Electronic Version: StatSoft, Inc. (2013). Electronic Statistics Textbook. Tulsa, OK: StatSoft. Retrieved from: http://www.statsoft.com/textbook/.
 Everitt, B., Landau, S., Leese, M., Stahl, D. (2011). Cluster Analysis. John Wiley & Sons Ltd, United Kingdom.
 Hayaki, J., Stein, M.D., Lassor, J.A., Herman, D.S., Anderson, B.J. (2005). Adversity among drug users: relationship to impulsivity. Drug Alcohol Depend. 78, 65–71. doi: 10.1016/j.drugalcdep.2004.09.002
 Kline, P. (2000). The handbook of psychological testing. (2d ed.). London and New York. Routledge, Retrieved from https://books.google.ru/books?id=lm2RxaKaok8C&printsec=frontcover&hl=ru&source=gbs_ViewAPI&redir_esc=y#v=onepage&q&f=false
 Lukianova, N., Fell, E., Sibers, J. (2015). Constructing images of the future: investigating the problem. Vestnik nauki Sibiri. 2(17), 37-46.
 Newcomb, M.D., Bentler, P.M. (1988). Impact of adolescent drug use and social support on problems of young adults: A longitudinal study. Journal of Abnormal Psychology. 97. doi: 10.1037/0021-843X.97.1.64
 Patrick, M. E., Wightman, P., Schoeni, R.F., & Schulenberg, J. E. (2012). Socioeconomic Status and Substance Use Among Young Adults: A Comparison Across Constructs and Drugs. Journal of Studies on Alcohol and Drugs. 73(5), 772–782. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3410945/
 Restivo, S., Loughlin, J. (2016). Critical Sociology of Science and Scientific Validity. Science Communication. 8(3), 486–508. doi: 10.1177/107554708700800304
 Shelehov, I. L., Kornetov, A. N., Grebennikova, E. V. (2016). Suitsidologiya: istoriya i sovremennyie predstavleniya. Tomsk : Izdatelstvo Tomskogo gosudarstvennogo pedagogicheskogo universiteta.
 Sinha, R. (2008), Chronic Stress, Drug Use, and Vulnerability to Addiction. Annals of the New York Academy of Sciences, 1141, 105–130. doi:10.1196/annals.1441.030
 Spooner, C., Hetherington, K. (2004). Social determinants of drug use (Technical Report Number 228). Retrieved from National drug and alcohol research centre, university of new south Wales, Sydney https://ndarc.med.unsw.edu.au/sites/default/files/ndarc/resources/TR.228.pdf
Copyright information
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
About this article
Publication Date
19 February 2018
eBook ISBN
9781802960341
Publisher
Future Academy
Volume
35
Edition Number
1st Edition
Pages
1-1452
Subjects
Business, business innovation, science, technology, society, organizational behaviour, behaviour behaviour
Cite this article as:
Lukianova, N., Burkatovskaya, Y., & Fell, E. (2018). Sten Score Method And Cluster Analysis: Identifying Respondents Vulnerable To Drug Abuse. In I. B. Ardashkin, N. V. Martyushev, S. V. Klyagin, E. V. Barkova, A. R. Massalimova, & V. N. Syrov (Eds.), Research Paradigms Transformation in Social Sciences, vol 35. European Proceedings of Social and Behavioural Sciences (pp. 779-789). Future Academy. https://doi.org/10.15405/epsbs.2018.02.92