RVO (Research Variable Ontology) proposes a schema that enterprises can use to record their data analytics experiments and to build a knowledge base for learning and recommendation support. RVO is designed around research variables, which form the basis of the hypotheses that analysts test by building models. RVO can answer a range of questions that data analysts raise when they start designing a solution, and it can provide recommendations and alternatives. Every fact or piece of expert knowledge recorded in RVO is traceable to its origin (i.e. a person, a publication, or a validated model). RVO follows best practices in ontology design and reuses existing data models and vocabularies (such as DBPedia, RDF-Cube, and FABIO) to facilitate efficient reuse in real-world applications through semantic technologies and open standards (RDF, OWL, SPARQL).
The purpose of RVO is to:
The components of the ontology are shown in the diagram below. The main classes of the ontology are Variable, Measure, Model and LinkedVariables. The LinkedVariables class captures a link between any two variables. Details of the classes are given in the following table. rvo:origin is an important property defined in RVO to facilitate traceability of knowledge; it contains three sub-properties. The origin of a Variable, Model or LinkedVariables instance can be an expert or a research paper. Additionally, the origin of a LinkedVariables instance can be a Model.
Key Concepts | Definition |
---|---|
Concept | A Concept identifies any domain concept that links a variable to a real-world entity. It is used to link domain ontologies to RVO and to provide context for the variable. |
Variable | A Variable is a metric that quantifies a concept; it has an associated value, and that value may change. |
Measure | Measures are the metrics for the observed values of a variable. |
LinkedVariables | LinkedVariables class describes a connection between two variables, the nature of this connection and its origin. |
Origin | Origin is where a concept (e.g. Variable, LinkType, Measure) is first defined or mentioned. RVO has three origins: an expert (person), a reference to a publication, and an analytics model. |
LinkType | Link Type describes the type of link that exists between two or more variables. Identified Link Types are: |
Data Source | Data source refers to a data provider. |
Dataset | Dataset refers to a collection of observations for a set of measures, usually stored in a data file. |
Data Structure | Data structure refers to the metadata needed to understand a dataset, such as which measures are captured and their units of measure. |
Model | The model is composed of a set of variables and a set of equations that establish relationships between the variables to describe particular phenomena or a system. |
Model Type | The Model Type class is used to define types of model (e.g. statistical models such as logistic regression, Bayesian models). |
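
To make the traceability idea concrete, the following is a minimal sketch in plain Python of how Variable, Model, LinkedVariables and their origins relate. It is not RVO itself and does not use RDF or OWL. The field names, the `links_for` and `trace_origin` helpers, the sample variables, the `"correlation"` link type and the citation string are all hypothetical, chosen only for illustration. The sketch assumes, as described above, that a Model originates from an expert or a publication, while a LinkedVariables instance may also originate from a Model.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Expert:
    name: str


@dataclass(frozen=True)
class Publication:
    citation: str


@dataclass(frozen=True)
class Variable:
    label: str
    concept: str  # domain concept that gives the variable its context


@dataclass(frozen=True)
class Model:
    label: str
    model_type: str
    variables: Tuple[Variable, ...]
    origin: Union[Expert, Publication]  # assumption: models come from a person or a paper


@dataclass(frozen=True)
class LinkedVariables:
    source: Variable
    target: Variable
    link_type: str  # hypothetical label; the real link types are not listed here
    origin: Union[Expert, Publication, Model]


def links_for(variable: Variable, links: List[LinkedVariables]) -> List[LinkedVariables]:
    """Return every recorded link that involves the given variable."""
    return [link for link in links if variable in (link.source, link.target)]


def trace_origin(link: LinkedVariables) -> list:
    """Follow a link's origin through any Model until an expert or publication is reached."""
    chain = []
    origin = link.origin
    while isinstance(origin, Model):
        chain.append(origin)
        origin = origin.origin
    chain.append(origin)
    return chain


# Hypothetical sample data, for illustration only.
age = Variable("customer_age", concept="Customer")
churn = Variable("churn_flag", concept="Subscription")
paper = Publication("hypothetical citation")
model = Model("churn_model", "logistic regression", (age, churn), origin=paper)
link = LinkedVariables(age, churn, link_type="correlation", origin=model)
links = [link]

for step in trace_origin(link):
    print(type(step).__name__)
```

Here `links_for(age, links)` answers the kind of question an analyst might ask at design time ("what is already known about this variable?"), and `trace_origin` shows where that knowledge came from. In an actual RVO deployment the same lookups would presumably be expressed as SPARQL queries over the RDF data.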