RVO, the Research Variable Ontology, proposes a schema that enterprises can use to record their data analytics experiments and to serve as a knowledge base for learning and recommendation support. RVO is designed around research variables, which form the basis of the hypotheses that analysts test by building models. RVO can answer a range of questions that data analysts raise when they start designing a solution, and it can provide recommendations and alternatives. Every fact or piece of expert knowledge recorded in RVO is traceable to its origin (i.e. a person, a publication, or a validated model). RVO follows best practices in ontology design and reuses existing data models and vocabularies (such as DBPedia, RDF-Cube and FABIO) to facilitate efficient reuse in real-world applications through semantic technologies and open standards (RDF, OWL, SPARQL).
The purpose of RVO is to:
The components of the ontology are shown in the diagram below. The main classes of the ontology are Variable, Measure, Model and LinkedVariables. The LinkedVariables class captures a link between any two variables. Details of the classes are given in the following table. rvo:origin is an important property that RVO defines to make knowledge traceable; it has three sub-properties. The origin of a Variable, Model or LinkedVariables instance can be an expert or a research paper. The origin of a LinkedVariables instance can additionally be a Model.
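The origin rule described above can be sketched in plain Python. This is only an illustration, not RVO's actual OWL definition: the names `OriginKind`, `Origin`, `ALLOWED_ORIGINS` and `check_origin` are hypothetical, and the sketch covers only the three classes named in the paragraph.

```python
from dataclasses import dataclass
from enum import Enum


class OriginKind(Enum):
    """The three kinds of origin named in the text (names are illustrative)."""
    EXPERT = "expert"
    PUBLICATION = "publication"
    MODEL = "model"


@dataclass(frozen=True)
class Origin:
    kind: OriginKind
    label: str  # e.g. a person's name, a citation, or a model identifier


# Which origin kinds each class may carry, following the paragraph above:
# every listed class accepts an expert or a publication; only
# LinkedVariables additionally accepts a Model.
ALLOWED_ORIGINS = {
    "Variable": {OriginKind.EXPERT, OriginKind.PUBLICATION},
    "Model": {OriginKind.EXPERT, OriginKind.PUBLICATION},
    "LinkedVariables": {
        OriginKind.EXPERT,
        OriginKind.PUBLICATION,
        OriginKind.MODEL,
    },
}


def check_origin(class_name: str, origin: Origin) -> bool:
    """Return True if `origin` is a permitted origin for `class_name`."""
    return origin.kind in ALLOWED_ORIGINS.get(class_name, set())
```

Under these assumptions, `check_origin("LinkedVariables", Origin(OriginKind.MODEL, "m1"))` is accepted, while the same origin on `"Variable"` is rejected. In the ontology itself, such a restriction would more naturally be expressed through the domains and ranges of the rvo:origin sub-properties.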
|Concept||A Concept identifies any domain concept that links a variable to a real-world entity. It is used to link domain ontologies to RVO and to give the variable context.|
|Variable||A Variable is a metric that quantifies a concept. It has an associated value, and that value may change.|
|Measure||Measures are the metrics for the observed values of a variable.|
|LinkedVariables||The LinkedVariables class describes a connection between two variables, the nature of this connection and its origin.|
|Origin||Origin is where a concept (e.g. Variable, LinkType, Measure) is first defined or mentioned. RVO has three kinds of origin: an expert (a person), a reference to a publication, and an analytics model.|
|LinkType||Link Type describes the type of link that exists between two or more variables.
Identified Link Types are:
|Data Source||Data source refers to a data provider.|
|Dataset||Dataset refers to a collection of observations for a set of measures, usually stored in a data file.|
|Data Structure||Data Structure refers to the metadata needed to understand a dataset, such as which measures are captured and their units of measure.|
|Model||The model is composed of a set of variables and a set of equations that establish relationships between the variables to describe particular phenomena or a system.|
|Model Type||The Model Type class defines the type of a model (e.g. a statistical model such as logistic regression, or a Bayesian model).|
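To show how the classes in the table fit together, here is a minimal in-memory sketch in Python. It is not the RVO schema itself: the field names, the `links_for` helper, the example variables and the `"correlation"` link type are all hypothetical, chosen only to illustrate how a recorded link carries its type and origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    concept: str  # the domain concept this variable quantifies


@dataclass(frozen=True)
class LinkedVariables:
    source: Variable
    target: Variable
    link_type: str  # hypothetical label for the nature of the link
    origin: str     # where the link was first stated (expert, paper, model)


def links_for(variable, links):
    """Return every recorded link that involves `variable` on either side."""
    return [link for link in links if variable in (link.source, link.target)]


# Invented example data, for illustration only.
age = Variable("age", "Person")
income = Variable("income", "Person")
spend = Variable("monthly_spend", "Customer")

knowledge_base = [
    LinkedVariables(age, income, "correlation", "expert: analyst A"),
    LinkedVariables(income, spend, "correlation", "model: regression run 1"),
]

related = links_for(income, knowledge_base)
```

Here `related` holds both links, each with its origin attached, which is the kind of lookup an analyst might run when asking what is already known about a variable. In a real deployment the same question would be a SPARQL query over the RDF data.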