This document summarizes experiences with Semantic MediaWiki (SMW) from several wiki projects at Rensselaer Polytechnic Institute. It identifies some common pitfalls of SMW, including difficulties with knowledge modeling, knowledge organization and context, and collaboration protocols. It also describes a human test of SMW usability. In that test, participants scored lower with SMW than with MediaWiki on factual questions, and one of the two groups scored much lower with SMW on insight questions requiring abductive reasoning. The document examines potential approaches to address SMW's limitations, such as extending its syntax and introducing context models. It emphasizes that successful SMW projects require both software engineering and knowledge engineering practices.
1. The Unbearable Lightness of Wiking
- A Study of SMW Usability
Jie Bao
(joint work with Li Ding)
Tetherless World Constellation,
Rensselaer Polytechnic Institute (RPI),
Troy, NY, USA
baojie@cs.rpi.edu
Spring SMW Con, May 22, 2010, MIT
2. Goal
To identify a few common pitfalls and limitations of Semantic MediaWiki in:
Knowledge modeling,
Knowledge organization and context,
Collaboration protocols, and
Project management.
To examine some potential approaches to solving these problems.
3. Background
Experiences from the
Tetherless World Constellation wiki (http://tw.rpi.edu )
Data-gov Wiki (http://data-gov.tw.rpi.edu )
RPI Map (http://map.rpi.edu)
CNL (Controlled Natural Language) Wiki (http://tw.rpi.edu/proj/cnl/)
A human test of SMW usability at RPI
5. TW Wiki
+
It is successful as a wiki
Lots of hands-on experience gained from it
Blog, mail archive, issue tracker, projects, publications, tasks, …
Semi-successful as a semantic wiki
–
Few people use the "semantics"
People get confused when adding papers, etc.
The majority are not impressed by its add-on services
Constant ontology war
Privacy wins, openness out.
6. TW Wiki
–
Many templates + many queries => slow
Weak connectivity to other semantic apps
Turns out to be a project that can eat a huge amount of time
9. Data-gov Wiki
+
A semantic registry for US government data
400+ RDFized datasets
7.3 billion triples
Now linked from data.gov
Carefully curated
–
SMW can't host the RDF graphs directly (because of convenience, expressivity, and scalability)
Most of the demos are not SMW-based (due to training cost and SMW limitations)
The RDF export of SMW is not very friendly to RDF browsers such as Tabulator.
Plan to migrate to Semantic Drupal
11. RPI Map
map.rpi.edu
+
Linking live external data (the majority of our efforts)
Bus, people, class, events, or satellites
Map on wiki
Tetherless map extension
Serves e-Science projects
–
UI (skin) design is time-consuming
No clear benefits from "semantics" (but certainly from tagging), at least for now.
13. CNL Wiki
tw.rpi.edu/proj/cnl
+
We can represent/edit OWL
We can represent rules
We can represent ICs (integrity constraints)
We do add a CNL interface to ontologies on SMW
–
But all are quite limited
Page-centric organization is restrictive
UI nightmare: Semantic Forms is limited
Due to considerable learning cost, developers may still prefer Java to MW+SMW+SF+StringFunctions+ etc.
14. Human Study - Hypothesis
Semantic MediaWiki technology captures the wisdom of crowds to develop a knowledge base.
Semantic MediaWiki technology is a better technology than MediaWiki for retrieving and understanding knowledge.
Semantic MediaWiki technology is a better technology than MediaWiki for answering abductive questions.
18. Human Study - Questions
Factual questions, e.g.,
What is GC's real name? (Answer: Danny Brown)
Who is the second youngest Survivor: Gabon contestant? (Answer: Ken)
Insight (inductive) questions (subjective questions that require abductive reasoning to answer), e.g.,
Based on the Episode 4 Tribe Switch scene, who likes or dislikes whom?
Answer 1: Ace expressed contempt for Kelly
Answer 2: Crystal expressed contempt for Randy
Answer 3: Charlie loves Marcus
19. Human Study - Result
Main Effect for Wiki (MW vs. SMW)
Comparison  Condition    Mean (SD)     Statistics          Result
Factual     MW Group A   M=.81 (.08)   t(8)=2.29, p<.05    MW better than SMW
            SMW Group A  M=.69 (.10)
Factual     MW Group B   M=.76 (.11)   t(8)=3.13, p<.01    MW better than SMW
            SMW Group B  M=.57 (.09)
Insight     MW Group A   M=.66 (.13)   non-significant
            SMW Group A  M=.69 (.24)
Insight     MW Group B   M=.61 (.16)   t(8)=4.33, p<.001   MW better than SMW
            SMW Group B  M=.20 (.14)

Main Effect for Group (A vs. B)
Comparison  Condition    Mean (SD)     Statistics          Result
Insight     SMW Group A  M=.69 (.24)   t(8)=4.00, p<.001   Groups A and B the same except for SMW insight
            SMW Group B  M=.20 (.14)
20. Human Study - Result
SMW + Group A
Subject   Changes  Semantic Changes
User402   36       16
User405   24       10
User410   44       21
User411   7        1
User415   34       1
TOTAL     145      49

SMW + Group B
Subject   Changes  Semantic Changes
User401   31       0
User406   27       2
User409   12       2
User413   52       0
User417   33       3
TOTAL     155      7
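The totals in the two tables can be re-checked, and turned into a share of semantic edits, with a few lines of Python. This is only a re-tabulation of the numbers above; reading "Semantic Changes" as edits that added or changed semantic annotations is an assumption, not a definition given on the slide.

```python
# Edit counts transcribed from the tables above: user -> (changes, semantic changes).
GROUP_A = {"User402": (36, 16), "User405": (24, 10), "User410": (44, 21),
           "User411": (7, 1), "User415": (34, 1)}
GROUP_B = {"User401": (31, 0), "User406": (27, 2), "User409": (12, 2),
           "User413": (52, 0), "User417": (33, 3)}


def semantic_share(group):
    """Return (total changes, total semantic changes, semantic fraction) for one group."""
    total = sum(changes for changes, _ in group.values())
    semantic = sum(sem for _, sem in group.values())
    return total, semantic, semantic / total


for name, group in (("SMW + Group A", GROUP_A), ("SMW + Group B", GROUP_B)):
    total, semantic, share = semantic_share(group)
    print(f"{name}: {semantic}/{total} semantic changes = {share:.1%}")
# prints:
# SMW + Group A: 49/145 semantic changes = 33.8%
# SMW + Group B: 7/155 semantic changes = 4.5%
```

The recomputed sums match both TOTAL rows. Note that User402 and User410 together account for 37 of Group A's 49 semantic changes.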
21. Wiki
Simplicity: least training required to contribute.
AAA: Anybody can say Anything Anywhere
NPOV: neutral point of view (among other collaboration protocols of Wikipedia)
22. Semantic Wiki
• Can semantic wikis reproduce the success of wikis, which are among the most prominent forms on the Web for harnessing the distributed, collective efforts of users to create content and knowledge online?
• We have seen encouraging success in quite a few projects.
• However, our real-world experiences have also revealed some issues.
23. Knowledge Modeling
Myth: users can do RDF-style (triple-based) modeling on SMW.
Fact: few users are able to do this (at least without substantial training).
24. “Big Fat Page” effects
Students in our test largely failed to do collective annotation:
The difference between categories and properties is not easy to understand (we saw a lot of misuse, such as Category:hug).
Describing a thing with triples requires "thinking in RDF", which takes some experience.
It is a big headache to choose the right vocabulary, and it is hard to know which vocabulary to reuse.
As a result, many of the test subjects simply used the wiki as a notepad, adding few semantic annotations and producing a single long "usual" wiki page.
25. Schema or not schema?
Two common knowledge models on a semantic wiki:
"Schema"-based modeling, often represented in the form of predefined wiki templates that "common" users of the wiki use to access data via forms or prebuilt queries.
cf. "infobox" in Wikipedia
=> stable, shared knowledge
Arbitrary RDF-style semantic markup, heavily used by a select, elite few
=> less structured, less shared knowledge
A carefully pre-populated wiki "schema" (template) is as important as a schema in a database project.
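To make the "template as schema" idea concrete, here is a minimal Python model of a template that maps form fields to properties and rejects input that does not fit. The template, the property names ("Has title", "Has author", "Published in year"), and the required fields are hypothetical; in an actual SMW deployment this role is played by a MediaWiki template plus a Semantic Forms form, not by code like this.

```python
# Hypothetical "Publication" template: each form field maps to one property.
PUBLICATION_TEMPLATE = {
    "title": "Has title",
    "author": "Has author",
    "year": "Published in year",
}
REQUIRED = {"title", "author"}


def render_annotations(form_values, template=PUBLICATION_TEMPLATE, required=REQUIRED):
    """Turn form input into [[property::value]] lines, rejecting unknown or missing fields."""
    unknown = set(form_values) - set(template)
    missing = required - set(form_values)
    if unknown or missing:
        raise ValueError(f"unknown fields {sorted(unknown)}, missing fields {sorted(missing)}")
    # Dict order is preserved, so annotations come out in the order the form supplied them.
    return "\n".join(f"[[{template[field]}::{value}]]"
                     for field, value in form_values.items())


print(render_annotations({"title": "The Unbearable Lightness of Wiking",
                          "author": "Jie Bao", "year": "2010"}))
# prints:
# [[Has title::The Unbearable Lightness of Wiking]]
# [[Has author::Jie Bao]]
# [[Published in year::2010]]
```

The user only fills in fields; the choice of properties, which is where the slides report most annotation errors, is made once by whoever designs the template.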
27. Evolvable Schema
From a collaborator in our human test:
“Our best experiences with deploying semantic wikis are
those where there is a smaller cadre of people who think
semantically, and a larger group of people who interact mainly with
forms-based entry and prebuilt queries.”
“Database schemas tend to be too crude and too slow to evolve. The
RDF graph model and schema-last modeling seems deeply
right to me in this context.”
28. Organization and Context
Myth: a semantic wiki, like a wiki, allows you to write things freely.
Fact: SMW does not support AAA
Every "triple" has to be on its subject's page.
E.g., "South Park episode X is a parody of this film" can only be said on X's page.
Each subject and property of a triple must be a local page name.
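The page-centric constraint can be illustrated with a simplified annotation extractor in Python. The regex handles only the basic `[[property::value]]` and `[[property::value|label]]` forms and is an illustrative assumption, not SMW's actual parser; "Episode X" and "Film Y" are placeholder page names.

```python
import re

# Simplified inline annotation: [[property::value]] or [[property::value|display text]].
# Category links such as [[Category:Person]] (single colon) deliberately do not match.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def page_triples(page_title, wikitext):
    """Return the triples a page contributes; the subject is always the page itself."""
    return [(page_title, prop.strip(), value.strip())
            for prop, value in ANNOTATION.findall(wikitext)]


text = "Episode X is a parody of [[parody of::Film Y|this film]]."
print(page_triples("Episode X", text))
# prints: [('Episode X', 'parody of', 'Film Y')]
```

Whatever page the annotated text is placed on becomes the subject of every triple it yields, which is exactly the restriction described above.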
29. Organization and Context
Why may this be problematic?
It may require the creation of many trivial, small pages.
It is troublesome to describe things (e.g., an external URL) that have no corresponding wiki pages.
It discourages users because it is difficult to determine where to write knowledge (i.e., the best "subject" pages).
Many users are confused by query-based pages: they do not know how to trace the source of the queried results when they want to change a query-based page.
30. Organization and Context
Potential Solution
Extending the SMW syntax
[[Cartman::friend of::Butters]]
Introducing a context model to SMW
Context:Where, Who, When
In the triple store, associate each triple with a context.
There is no longer any need to use the subject to locate a triple.
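The proposal above can be sketched in Python, assuming the three-part form is written exactly as on the slide and that "where" is the page the statement was written on, "who" the editor, and "when" the edit time. The regex, the `Context`/`Statement` classes, and the fallback to the page as subject for two-part annotations are illustrative assumptions, not an existing SMW feature.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical extended syntax [[subject::property::object]]; the two-part
# form [[property::object]] keeps current behaviour (the page is the subject).
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]:|]+)(?:::([^\[\]:|]+))?\]\]")


@dataclass(frozen=True)
class Context:
    where: str      # page on which the statement was written
    who: str        # author of the edit
    when: datetime  # time of the edit


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    obj: str
    context: Context


def extract(wikitext, page, author, when):
    """Return every annotation on a page as a statement tagged with its context."""
    ctx = Context(page, author, when)
    statements = []
    for first, second, third in ANNOTATION.findall(wikitext):
        if third:  # three-part form: the subject is stated explicitly
            subject, prop, obj = first, second, third
        else:      # two-part form: fall back to the page as subject
            subject, prop, obj = page, first, second
        statements.append(Statement(subject.strip(), prop.strip(), obj.strip(), ctx))
    return statements


edited = datetime(2010, 5, 22, tzinfo=timezone.utc)
for s in extract("Notes: [[Cartman::friend of::Butters]]", "South Park notes", "Alice", edited):
    print(s.subject, "|", s.prop, "|", s.obj, "| written on", s.context.where)
# prints: Cartman | friend of | Butters | written on South Park notes
```

Because each statement carries its own context, a store built on these records can be indexed by context, subject, property, or object, rather than by the page that happens to hold the text.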
31. Collaboration Protocol
Myth: a semantic wiki, as a wiki always does, allows compromises between different points of view.
Fact: a semantic wiki only allows one version of the (semantic) "truth".
A triple cannot be both true and not true.
32. Ontology War
Batman is a man
No! Batman is only a fictional character
(Image: http://www.gambling911.com/files/publisher/cat-fight-032609L.jpg)
Collaboration protocol support needed!
33. Collaboration Protocol
Avoiding edit wars in Wikipedia:
NPOV: allows multiple points of view to coexist on one page, backed by verifiable sources.
Natural language text can accommodate and explain multiple points of view on a single page.
34. Collaboration Protocol
Two possible approaches
To have categories and typed links optionally contextualized by authors, similar to the tag contextualizing mechanism in Delicious and Flickr.
http://example.com/author/term (contextualized name)
http://example.com/term (non-contextualized name)
To introduce a context model of SMW knowledge statements, so that different versions of the truth may be formally represented with explicitly given sources.
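A minimal sketch of the first approach in Python, assuming names are built exactly as in the two URL patterns above (http://example.com/author/term vs. http://example.com/term). The helper functions and the authors "alice" and "bob" are hypothetical.

```python
from collections import defaultdict

BASE = "http://example.com/"  # placeholder namespace taken from the slide


def term_uri(term, author=None):
    """Contextualized name when an author is given, otherwise the shared name."""
    return f"{BASE}{author}/{term}" if author else f"{BASE}{term}"


def is_contextualized(uri):
    """True if the URI carries an author segment before the term."""
    return "/" in uri[len(BASE):]


# Each (hypothetical) author categorizes Batman under their own contextualized name,
# so the two conflicting views are stored side by side instead of overwriting each other.
categories = defaultdict(set)
categories["Batman"].add(term_uri("Man", author="alice"))
categories["Batman"].add(term_uri("FictionalCharacter", author="bob"))

for uri in sorted(categories["Batman"]):
    print("Batman is in category", uri)
print("shared categories:", [u for u in categories["Batman"] if not is_contextualized(u)])
# prints:
# Batman is in category http://example.com/alice/Man
# Batman is in category http://example.com/bob/FictionalCharacter
# shared categories: []
```

A reader who only wants the community's agreed view can keep the non-contextualized names; one who wants to see the disagreement can group the contextualized names by author.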
35. SMW Project Management
Lesson 1: A successful SMW project needs both good software engineering and good knowledge engineering practices
It's not always low-cost
Document it
Lesson 2: Keep scalability in mind
Heavy use of templates and queries may kill your site
Crawlers are coming!
How will you do load balancing?
36. SMW Project Management
Lesson 3: Design the UI intuitively, so that users know how to add, delete, and update data
Don't expect users to know how to create a page
Minimize required clicks as much as you can
Lesson 4: Don't expect users to contribute semantic annotations (unless they are forced to)
Even if they do, don't expect them to be "right" (remember Wikipedia's categories).
That's also largely true for filling in forms.
Writing an ontology or an "ask" query? Never, ever.
And it's not because of a lack of training, but a lack of incentive.
37. SMW Project Management
Lesson 5: Don’t try to compete with RDF/SPARQL
There are a lot of scenarios that SMW/ASK can't handle, or can only handle awkwardly.
SMW is in a different niche
……
38. Reality Check
Users will use a special markup to add annotations to the wiki text
The primary goal is to enable text-based editing, but strongly structured content is allowed.
Page-centric knowledge organization fits a wiki better (than viewing triples as the primary units)
Formal semantics via a mapping to OWL DL
Queried lists are more accurate, easier to create, and easier to maintain than manually edited listings.
Tractable query language (P-Time)
Wikis are now an IT code word for "zero-training" [2]
[1] Markus Krötzsch, Denny Vrandečić, Max Völkel, Heiko Haller, Rudi Studer. Semantic Wikipedia. Journal of Web Semantics 5/2007, pp. 251–261. Elsevier, 2007.
[2] http://ontolog.cim3.net/file/work/SemanticWiki/SWiki-06_Future-of-SemanticWiki_20090305/SemanticWiki-Future--MarkGreaves_20090305.pdf