Glossary
Automated Classification
Use of technology to organize content into groups so it can be
retrieved when needed. The result of automated classification
is either a content collection clustered into groups (possibly
a candidate taxonomy) or content categorized according to a
pre-existing taxonomy. The
best results are obtained by defining a business process that combines
manual and automated processing so that technology is leveraged
and human editorial input is optimized.
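As a rough illustration of combining automated and manual processing, the sketch below categorizes text against a small pre-existing taxonomy and flags weak or tied matches for a human editor. The keyword lists, the "Beverages" category, the `min_hits` threshold, and the function name `classify` are all assumptions made for this example, not part of any particular product.

```python
import re

# Hypothetical taxonomy: category name -> keywords that signal it.
TAXONOMY_KEYWORDS = {
    "Snack food": {"chips", "popcorn", "pretzels", "nachos"},
    "Beverages": {"beer", "juice", "soda"},
}

def classify(text, min_hits=2):
    """Return (category or None, needs_review) for a piece of text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Score each category by how many of its keywords appear in the text.
    scores = {cat: len(words & kws) for cat, kws in TAXONOMY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    tied = sum(1 for s in scores.values() if s == top) > 1
    # Weak or ambiguous matches are routed to a human editor.
    needs_review = top < min_hits or tied
    return (best if top > 0 else None), needs_review

print(classify("Popcorn and pretzels on sale"))
print(classify("Beer and chips"))
```

Clear matches are filed automatically, while ambiguous ones are queued for review, which is one simple way to spend editorial effort only where the technology is unsure.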
Dublin Core
A set of 15 metadata elements (the Dublin Core Metadata Element
Set) used to describe and catalog content so it can be discovered
and retrieved. The Dublin Core is the de facto standard for cataloging
web content.
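A minimal sketch of what a Dublin Core description might look like in code, assuming a plain dictionary keyed by the 15 element names of the Dublin Core Metadata Element Set. The record values and the helper name `unknown_elements` are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

def unknown_elements(record):
    """Return the keys in a record that are not Dublin Core elements."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)

# Hypothetical record; the values are illustrative only.
record = {
    "title": "Glossary",
    "subject": "Taxonomy",
    "language": "en",
    "author": "Unknown",  # not a Dublin Core element ("creator" is)
}

print(unknown_elements(record))
```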
Information Retrieval Technologies
Automated methods to analyze, classify, search for, and retrieve
text. The basic principles of information retrieval (IR) are
based on research done in the 1940s and 1950s.
The key observation was that word frequency provides a useful
measure of significance. Many refinements have been made to this
simple observation using statistics, linguistics, logic,
and clever combinations of these methods.
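The word-frequency observation can be sketched in a few lines. This example assumes a tiny hand-picked stopword list and a simple lowercase tokenizer. Real systems refine both considerably, and the function name `top_terms` is hypothetical.

```python
from collections import Counter
import re

# Assumed stopword list; very common words carry little significance.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}

def top_terms(text, n=3):
    """Rank the words in a text by raw frequency, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

sample = "The taxonomy of the taxonomy is a taxonomy for search and search"
print(top_terms(sample, 2))
```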
Metadata
A common set of attributes that contain critical information to
describe and catalog content. The basic concept behind metadata
has been used to organize content since the beginning of clay
tablet and papyrus scroll collections 3000 years ago. Card and
book catalogs and bibliographic databases have used a commonly
understood metadata standard to organize large collections.
Dublin Core metadata example
| Metadata type | Dublin Core Elements |
| --- | --- |
| Asset metadata: the who, where, and when | Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language |
| Subject metadata: the what and why | Subject, Description, Coverage |
| Relational metadata: links between assets | Relation |
| Use metadata: how to monetize assets | Rights |
Taxonomy
Overall scheme for organizing content to solve a business problem,
such as improving search and browsing on an enterprise-wide
portal, enabling business users to syndicate content, or otherwise
providing the basis for content re-use. The basic idea behind
taxonomy is to provide a controlled vocabulary for metadata attributes,
and to specify relationships between terms in the controlled
vocabulary. The simplest relationships are broader, narrower,
and related, but relationships can be much more specific and
complex.
UNSPSC Taxonomy example
| Term | Relationship to Snack food |
| --- | --- |
| Prepared and preserved foods | Broader term |
| Snack food | |
| Corn chips | Narrower term |
| Popcorn | Narrower term |
| Nachos | Narrower term |
| Pretzels | Narrower term |
| Beer | Related term |
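The broader, narrower, and related relationships in the example can be modeled with a small data structure. This is only a sketch: the `Term` class and the helper functions are hypothetical names, and real taxonomy tools typically support more specific relationship types.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare terms by identity; the links are cyclic
class Term:
    name: str
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)

def add_narrower(parent, child):
    """Record a broader/narrower link in both directions."""
    parent.narrower.append(child)
    child.broader.append(parent)

def add_related(a, b):
    """Record a symmetric related-term link."""
    a.related.append(b)
    b.related.append(a)

# Terms taken from the UNSPSC example.
foods = Term("Prepared and preserved foods")
snack = Term("Snack food")
add_narrower(foods, snack)
for name in ("Corn chips", "Popcorn", "Nachos", "Pretzels"):
    add_narrower(snack, Term(name))
add_related(snack, Term("Beer"))

print([t.name for t in snack.narrower])
print([t.name for t in snack.related])
```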
XML Schema
Data models expressed in XML. XML schemas provide a means for defining
and implementing a consistent structure (syntax) and semantics
for XML documents, allowing machines to carry out rules made
by people. A faceted taxonomy provides the names of metadata
elements and a consistent set of attribute values, or vocabularies,
for filling the elements in an XML schema.
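To illustrate how a controlled vocabulary can constrain element values, the sketch below checks a record against a small vocabulary before writing it out as XML with Python's standard `xml.etree.ElementTree` module. It is not an XML Schema itself: the `record` wrapper element, the vocabulary contents, and the function name `to_xml` are assumptions made for the example. The namespace URI is the one published for Dublin Core elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical controlled vocabulary: element name -> allowed values.
VOCABULARY = {"subject": {"Snack food", "Popcorn", "Pretzels"}}

def to_xml(record):
    """Serialize a record, rejecting values outside the vocabulary."""
    root = ET.Element("record")  # assumed wrapper element name
    for element, value in record.items():
        allowed = VOCABULARY.get(element)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{value!r} is not allowed for {element!r}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

print(to_xml({"title": "Snack catalog", "subject": "Popcorn"}))
```

Elements without an entry in the vocabulary (such as `title` here) accept free text, while `subject` only accepts terms from the taxonomy.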