Research computing resources and services for Data Science

What is Data Science?

A definition from Wikipedia [1]:

“Data science, also known as data-driven science, is an interdisciplinary field about scientific processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, machine learning, data mining, and predictive analytics, similar to Knowledge Discovery in Databases (KDD)”.

Whereas statistics gives people a direct path to data, Data Science has emerged from an indirect path in which computer scientists mediate between people and data. Data Science aims to provide natural human-data interfaces, where people can interact with information through Open Data (e.g. Drupal/DKAN), Open Knowledge (e.g. The Open Knowledge Network), Open Systems (e.g. The Open Group), and open source software (e.g. deep learning packages such as TensorFlow and PyCasp) and platforms (e.g. CDH, Cloudera's Hadoop distribution).

Data Science is an interdisciplinary field because it adopts techniques and theories from a broad spectrum of fields in mathematics, statistics, operations research, information science, and computer science. These include signal processing, probability models, machine learning, statistical learning, data mining, databases, data engineering, pattern recognition and learning, visualization, predictive analytics, uncertainty modeling, data warehousing, data compression, computer programming, artificial intelligence, and high performance computing. Data Science also applies across a wide range of domains, including the sciences, finance, the social sciences, and the humanities.


