What Are the Differences Between Statistics and Data Science?
The first steps of statistics date back to the Islamic Golden Age (8th to 13th centuries). The Arab mathematician and cryptographer Al-Khalil wrote a book that contains combinations and permutations. At that time, statistics covered only a few phenomena, but today it is a broad body of mathematical methods concerned with collecting, analyzing, and presenting data. Statistical methods fall into two types: descriptive and inferential. Descriptive statistics summarizes a dataset with numerical measures such as the mean, median, and variance, and depicts it with bar charts, dot plots, and histograms. These are complemented by KDE (kernel density estimation) curves, the PDF (probability density function), and shape measures such as skewness and kurtosis, so that statisticians can easily interpret results. Inferential statistics generally relies on hypothesis tests and estimation; its main purpose is to draw conclusions about a population, and to make predictions, using models fitted to sample data. One weakness of statistics is that it can be manipulated, because results depend strongly on the sample taken from a population. If the sample is not chosen so that every member of the population has an equal chance of selection, the results will not be objective.
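The descriptive and inferential ideas above, and the sampling problem, can be illustrated with a short Python sketch that uses only the standard library. The dataset and the population are hypothetical, invented for illustration, and `describe` and `mean_confidence_interval` are hypothetical helper names, not part of any library. Skewness and excess kurtosis use the simple moment-based formulas (bias-corrected variants also exist), and the interval uses a normal approximation rather than the t distribution that is usually preferred for small samples.

```python
import math
import random
import statistics


def describe(data):
    """Descriptive statistics: center, spread, and shape of a dataset (hypothetical helper)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population SD, used by the moment-based shape measures
    return {
        "mean": mean,
        "median": statistics.median(data),
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        # Moment-based skewness: > 0 means a longer right tail
        "skewness": sum((x - mean) ** 3 for x in data) / (n * sd ** 3),
        # Moment-based excess kurtosis: 0 for a normal distribution
        "excess_kurtosis": sum((x - mean) ** 4 for x in data) / (n * sd ** 4) - 3,
    }


def mean_confidence_interval(data, confidence=0.95):
    """Inferential statistics: normal-approximation interval for the population mean (hypothetical helper)."""
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(len(data))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical sample used for the descriptive and inferential steps
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(data)
low, high = mean_confidence_interval(data)
print(summary)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")

# Sampling bias: a sample that is not chosen equally distorts the estimate.
random.seed(42)
population = [random.gauss(50, 10) for _ in range(10_000)]  # hypothetical population
random_sample = random.sample(population, 200)  # every member equally likely
biased_sample = sorted(population)[-200:]       # only the 200 largest values
population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)
print(f"population {population_mean:.1f}, random {random_mean:.1f}, biased {biased_mean:.1f}")
```

For the small dataset the mean is 5, the median 4.5, the skewness about 0.66 (a slight right tail) and the excess kurtosis about -0.22. The random sample's mean stays close to the population mean, while the sample built from only the largest values overestimates it badly, which is the loss of objectivity described above.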
Data science is one of the most in-demand career paths today, thanks to the digitalization of the world. John Tukey is often credited as a forerunner of the field: his 1962 paper "The Future of Data Analysis" anticipated much of what is now called data science. It is an interdisciplinary field that combines statistics, scientific methods, mathematics, and programming. Its scope includes not only data visualization but also artificial intelligence (AI) and machine learning algorithms. The data science life cycle consists of capturing, maintaining, processing, analyzing, and communicating data. Data scientists often work with large datasets and typically use more than one programming language to do so efficiently. The field overlaps with other disciplines such as data mining, data engineering, and mathematics.
Although data science and statistics share the same origins, opinions differ about how the two relate. Some argue that data science is distinct from statistics because it focuses on digital data. Others say that data science is not a new field but simply another name for statistics, or an applied branch of it. These views show that the two fields have both similarities and differences; here we focus on the differences. First, data science draws on several fields, such as programming, mathematics, and engineering, whereas statistics is based mainly on mathematics. Data science extracts information from data, while statistics plans how data are gathered and analyzed. Data science relies on computational techniques, whereas statistics is the science of data itself. Data science applies scientific methods, whereas statistics uses mathematical formulas. Data science analyzes data to understand trends, patterns, and behaviors, whereas statistics represents data in charts and graphs such as bar plots and histograms. In short, I think data science is the new, digitalized form of statistics, because data science cannot exist without statistical concepts.