RVO

Introduction

RVO, the Research Variable Ontology, proposes a schema that enterprises can use to record their data analytics experiments and to serve as a knowledge base for learning and recommendation support. RVO is designed around research variables, which form the basis of the hypotheses that analysts test by building models. RVO can answer an array of questions that data analysts raise when they start designing a solution, and it can provide recommendations and alternatives. All facts and expert knowledge recorded in RVO are traceable to their origin (i.e. a person, a publication or a validated model). RVO follows best practices in ontology design and reuses existing data models and vocabularies (such as DBPedia, RDF-Cube and FABIO) to facilitate efficient reuse in real-world applications through semantic technologies and open standards (RDF, OWL, SPARQL).

The purpose of RVO is to:

Establish a common terminology for the organization to represent classes and properties in the empirical analytics domain, relating to existing taxonomies, ontologies and data standards.
Capture established (even if contradictory) complex analytics knowledge in a particular domain, together with its origin. This knowledge may include known relationships between variables, how a variable is linked within a particular model, the relationships between variables and datasets, and the relationships between variables and analytics models.
Integrate existing ontologies to enrich the knowledge base of an organization when conducting analytics. These can be domain-specific ontologies or ontologies representing concepts, people, standards and so on. This integration makes the knowledge base available across the organization.
Assist data analytics process stages such as variable selection, data source selection, dataset selection and evaluation.

The components of the ontology are shown in the diagram below. The main classes of the ontology are Variable, Measure, Model and LinkedVariables. The LinkedVariables class captures a link between any two variables. The classes are defined in the following table. rvo:origin is an important property defined in RVO to facilitate traceability of knowledge, and it has three sub-properties. The origin of a Variable, a Model or a LinkedVariables instance can be an expert or a research paper. Additionally, the origin of a LinkedVariables instance can be a Model.
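The origin rule described above can be sketched in a few lines of plain Python. This is only an illustration and not part of RVO itself: the names `OriginKind`, `Origin`, `ALLOWED_ORIGINS` and `check_origin` are hypothetical, and the real ontology expresses the same constraint through rvo:origin and its sub-properties in OWL.

```python
from dataclasses import dataclass
from enum import Enum


class OriginKind(Enum):
    """The three kinds of origin named in the text (labels are illustrative)."""
    EXPERT = "expert"            # a person
    PUBLICATION = "publication"  # a research paper
    MODEL = "model"              # a validated analytics model


# Which origin kinds each class may carry, following the paragraph above:
# every class may come from an expert or a paper; only a link between
# variables may additionally come from a model.
ALLOWED_ORIGINS = {
    "Variable": {OriginKind.EXPERT, OriginKind.PUBLICATION},
    "Model": {OriginKind.EXPERT, OriginKind.PUBLICATION},
    "LinkedVariables": {
        OriginKind.EXPERT,
        OriginKind.PUBLICATION,
        OriginKind.MODEL,
    },
}


@dataclass(frozen=True)
class Origin:
    kind: OriginKind
    label: str  # hypothetical free-text identifier of the source


def check_origin(rvo_class: str, origin: Origin) -> bool:
    """Return True if this kind of origin is permitted for the given class."""
    return origin.kind in ALLOWED_ORIGINS[rvo_class]
```

For example, `check_origin("LinkedVariables", Origin(OriginKind.MODEL, "some model"))` is accepted, while the same origin attached to a `"Variable"` is rejected.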

Key Concepts	Definition
Concept	Concept identifies any domain concept that links a variable to a real-world entity. It is used to link domain ontologies to RVO and to provide context for the variable.
Variable	A variable is a metric that quantifies a concept. It is associated with a value, and that value may change.
Measure	Measures are the metrics for the observed values of a variable.
LinkedVariables	The LinkedVariables class describes a connection between two variables, the nature of this connection and its origin.
Origin	Origin is where a concept (e.g. Variable, LinkType, Measure) is first defined or mentioned. RVO has three origins: an expert (a person), a reference to a publication, and an analytics model.
LinkType	LinkType describes the type of link that exists between two or more variables. The identified link types are: Causal: one variable causes another. Investigative Causal: there is a hypothesis that one variable causes another, and the hypothesis still needs to be tested. NonCausal: the variables are proven not to have a causal relationship. Correlated: the variables are dependent or associated. Proxy: one variable can be used in place of the other.
Data Source	Data source refers to a data provider.
Dataset	Dataset refers to a collection of observations for a set of measures, usually stored in a data file.
Data Structure	Data structure refers to the metadata needed to understand a dataset, such as which measures are captured and their units of measure.
Model	A model is composed of a set of variables and a set of equations that establish relationships between the variables in order to describe a particular phenomenon or system.
Model Type	The Model Type class defines the type of a model (e.g. a statistical model such as logistic regression, or a Bayesian model).
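To make the LinkedVariables and LinkType rows more concrete, here is a minimal sketch of how recorded links could support variable selection. It uses plain Python rather than RDF or SPARQL, and everything in it is an assumption made for illustration: the class and function names, the sample variables (`interest_rate`, `household_income`, `postcode`, `house_price`) and the origin strings are invented and do not come from the RVO dataset.

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    """The five link types listed in the table."""
    CAUSAL = "Causal"
    INVESTIGATIVE_CAUSAL = "Investigative Causal"
    NON_CAUSAL = "NonCausal"
    CORRELATED = "Correlated"
    PROXY = "Proxy"


@dataclass(frozen=True)
class LinkedVariables:
    source: str          # variable at one end of the link
    target: str          # variable at the other end
    link_type: LinkType  # nature of the connection
    origin: str          # where the link was recorded (kept as text here)


def candidate_predictors(links, target):
    """List variables worth considering when modelling `target`.

    Causal and investigative-causal links are followed from source to
    target. Correlated links are treated as direction-less. NonCausal
    links are skipped, and Proxy links are ignored to keep the sketch short.
    """
    directed = {LinkType.CAUSAL, LinkType.INVESTIGATIVE_CAUSAL}
    results = []
    for link in links:
        if link.link_type in directed and link.target == target:
            results.append((link.source, link.link_type, link.origin))
        elif link.link_type is LinkType.CORRELATED and target in (link.source, link.target):
            other = link.target if link.source == target else link.source
            results.append((other, link.link_type, link.origin))
    return results


# Hypothetical sample links, invented for this example only.
SAMPLE_LINKS = [
    LinkedVariables("interest_rate", "house_price", LinkType.CAUSAL,
                    "expert: hypothetical analyst"),
    LinkedVariables("house_price", "household_income", LinkType.CORRELATED,
                    "publication: hypothetical paper"),
    LinkedVariables("postcode", "house_price", LinkType.NON_CAUSAL,
                    "model: hypothetical regression"),
]

for name, link_type, origin in candidate_predictors(SAMPLE_LINKS, "house_price"):
    print(f"{name}: {link_type.value} ({origin})")
```

Because each result carries its origin, a recommendation can always be traced back to the expert, publication or model that recorded the link. In an RDF deployment the same lookup would more naturally be written as a SPARQL query over the ontology.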

Research Variable Ontology

Download Ontology

Choose the right format

RVO

Dataset for House Price Prediction Variables