Document Management System
From Open Clip Art Library Wiki
This section is deprecated. You can help by updating it, or possibly deleting it.
The Open Clip Art Library expects to maintain a large number of SVG documents, and we want an easy way to update and keep track of them. This page gives an overview of what a document management system is, the features it provides over other types of systems, and the work being done at OCAL to develop a solution for our needs.
Documents
What is a document?
|
A document is a set of one or more files, such as:
|
Documents require the ability to:
|
What is a document management system?
|
A document management system is NOT:
|
Any of the above systems are acceptable for managing modest (<1000) collections of documents,
|
What document management systems already exist? Why a new one?
Browsing freshmeat.net shows that there have been a number of attempts at creating a document management system (including one of my own - http://docsys.sourceforge.net). A number are simply a web wrapper around a hierarchical file system, with a file uploader. Most are also web-oriented (many are CGI- or PHP-based), and thus cannot be easily scripted. This is a limitation: if you are dealing with tens of thousands of documents, you don't want to click through a web form to submit each and every file.
What we actually need is something with an easily scriptable API. A daemon-based service would work well, because the service could focus on simply managing the huge collection of files, and leave the interface (be it web, GUI, command line, or other) to any number of client programs.
What is 'dms' / 'Document::Manager'?
dms provides a daemon-based service that encapsulates a document management system. Think of it like an email server, but instead of sending it emails with to/from/subject headers, you send it documents with metadata. The service is provided by the 'dmsd' daemon, which runs continuously, accepting connections from clients, processing requests, and maintaining the document store.
The protocol used for communicating with dmsd is the 'Simple Object Access Protocol', or SOAP. SOAP is an XML-based protocol and is supported by a wide variety of languages. This means you can construct client interfaces in Perl, PHP, Java, Python, C++, or any other language that has a SOAP implementation. Perl's SOAP implementation is called SOAP::Lite and is being used for creating simple command-line tools.
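To make the wire format a little more concrete, here is a minimal sketch in Python of what a SOAP 1.1 request to a dmsd-like service could look like. Only the envelope namespace is standard; the method name add_document, the namespace urn:example-dms, and the parameter names are assumptions made for illustration and are not taken from dmsd's actual interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 envelope namespace
DMS_NS = "urn:example-dms"  # hypothetical service namespace, not dmsd's real one


def build_request(method, params):
    """Build a SOAP 1.1 envelope that calls `method` with simple string parameters."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{DMS_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical call: submit a document's metadata.
request_xml = build_request("add_document", {"title": "Red apple", "author": "jdoe"})
print(request_xml)
```

A real client would normally let a SOAP library build and send this envelope; the sketch only shows that a request is plain XML that any language can produce.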
dmsd is actually a very short, trivial interface around the Document::Manager Perl module. This module defines the programmatic API that clients use, handles metadata, and implements the high-level functionality of the system as a wrapper around the lower-level functionality. The low-level functionality is implemented in Document::Repository, which contains the logic for maintaining the document repository itself and for checking individual files in and out. Document::Repository knows nothing about metadata.
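The split between the two layers can be sketched as follows. This is an illustrative Python model of the design, not the Perl modules themselves: the class and method names (Repository, Manager, add, put, get, checkout, properties) are assumptions and should not be read as the actual Document::Manager or Document::Repository API.

```python
class Repository:
    """Low-level store: versioned file contents only, no metadata."""

    def __init__(self):
        self._revisions = {}  # doc_id -> list of bytes, one entry per revision
        self._next_id = 1

    def add(self, content):
        """Store a new document and return its id."""
        doc_id = self._next_id
        self._next_id += 1
        self._revisions[doc_id] = [content]
        return doc_id

    def put(self, doc_id, content):
        """Check in a new revision and return its 1-based revision number."""
        self._revisions[doc_id].append(content)
        return len(self._revisions[doc_id])

    def get(self, doc_id, revision=None):
        """Check out a revision (the latest one by default)."""
        revisions = self._revisions[doc_id]
        return revisions[-1] if revision is None else revisions[revision - 1]


class Manager:
    """High-level API: wraps a Repository and adds metadata handling."""

    def __init__(self, repository):
        self._repo = repository
        self._metadata = {}  # doc_id -> dict of metadata fields

    def add(self, content, **metadata):
        doc_id = self._repo.add(content)
        self._metadata[doc_id] = dict(metadata)
        return doc_id

    def checkout(self, doc_id):
        return self._repo.get(doc_id)

    def properties(self, doc_id):
        return dict(self._metadata[doc_id])


manager = Manager(Repository())
doc_id = manager.add(b"<svg/>", title="Red apple", author="jdoe")
print(doc_id, manager.properties(doc_id))
```

Keeping metadata out of the repository layer means the storage logic can be changed without touching the API that clients see.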
How is document management distinct from source or content management?
Source code systems such as CVS or Subversion are useful for managing collections of source files for building an application. They allow the user to track and manage changes made across all files within the hierarchy. However, these systems tend not to provide mechanisms for operating on the metadata of the individual files; for instance, they don't have mechanisms for selecting the set of files with a given subject or keyword. They also tend to have the hierarchical nature of their contents hard-coded. With source code you rarely need to suddenly re-organize all of the files to browse by author or title, but this is a very common need for collections of documents.
Also, from a more practical standpoint, source code management systems tend to be fairly technical in nature, since by definition their users are technical folk. Unfortunately, for non-technical users such as artists or business people, this complexity can be a major roadblock. Since many types of documents (like binary or XML formats) aren't really amenable to line diff, many of the strengths of source code management systems don't apply, and so they tend to be overkill.
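The two metadata operations mentioned above, selecting by keyword and re-organizing by an arbitrary field, can be sketched in a few lines of Python. The records and field names here are made up for illustration.

```python
from collections import defaultdict

# Hypothetical metadata records; the field names are illustrative only.
documents = [
    {"doc_id": 1, "title": "Red apple", "author": "jdoe", "keywords": ["fruit", "food"]},
    {"doc_id": 2, "title": "Oak tree", "author": "asmith", "keywords": ["plant"]},
    {"doc_id": 3, "title": "Banana", "author": "jdoe", "keywords": ["fruit"]},
]


def with_keyword(docs, keyword):
    """Select the documents tagged with a given keyword."""
    return [d for d in docs if keyword in d["keywords"]]


def group_by(docs, field):
    """Re-organize the collection into 'folders' keyed by any metadata field."""
    groups = defaultdict(list)
    for d in docs:
        groups[d[field]].append(d["doc_id"])
    return dict(groups)


print([d["title"] for d in with_keyword(documents, "fruit")])
print(group_by(documents, "author"))
```

Because the grouping is computed from metadata rather than stored as a directory layout, the same collection can be browsed by author, by title, or by any other field without moving a single file.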
Content management systems are similar in some ways to source code management systems. They maintain the individual pieces of content in a hierarchy, track changes, and allow presentation of the collection as a whole. They differ from a source code management system in that they often include a 'state' for a piece of content - it may be published, retired, or scheduled for release on a particular date, for instance. Often, content management systems also track metadata for the individual pieces. However, such systems generally have a fixed notion of how the user will be presented with the collection of information - i.e. through a web browser.
Both source code and content management systems can be, and have been, used for managing documents, and for certain applications they are a very good match. However, in many circumstances they are overkill or inappropriate for the need. For instance, a business wishing to gather several million policy, form, and procedure documents together may find a source code system or a website content management system too cumbersome or too complex for its needs.
Document management systems are geared towards addressing these sorts of niches. For instance, a company may need to manage several terabytes of documents scanned from microfilm, or millions of patient record files.
Interface Ideas
Basic File Listing
doc_id title size date author
Newest Additions
This page lists the latest submissions and their status. We make the images visible immediately, both to give the artist quick feedback that their image has been uploaded successfully, and to encourage site visitors to review and rate new submissions. Images that have been marked down below a certain threshold will be suppressed from this list.
title status
title is a link to the detail info page.
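The listing rule described above (newest first, with badly rated images suppressed) might be sketched like this. The rating field, the cutoff value, and the sample data are assumptions; the page does not say how ratings are stored or what the threshold is.

```python
from datetime import date

RATING_THRESHOLD = -3  # hypothetical cutoff; the page does not specify one

# Hypothetical submissions with an aggregate rating from site visitors.
submissions = [
    {"title": "Red apple", "status": "new", "submitted": date(2005, 3, 1), "rating": 2},
    {"title": "Blurry blob", "status": "new", "submitted": date(2005, 3, 3), "rating": -5},
    {"title": "Oak tree", "status": "reviewed", "submitted": date(2005, 3, 2), "rating": 0},
]


def newest_additions(items, threshold=RATING_THRESHOLD, limit=10):
    """Return (title, status) rows, newest first, hiding items rated below the threshold."""
    visible = [s for s in items if s["rating"] >= threshold]
    visible.sort(key=lambda s: s["submitted"], reverse=True)
    return [(s["title"], s["status"]) for s in visible[:limit]]


print(newest_additions(submissions))
```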
Most Frequently Downloaded this Month
List of all SVGs ordered by number of downloads (highest first)
title num_downloads [dl]
title is a link to the detail info page. This report would show up as a side box somewhere on the homepage. It would reset itself at the beginning of each month.
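One way to get the monthly reset for free is to derive the report from a download log rather than from a counter that has to be zeroed. The sketch below assumes such a log exists; its shape and the sample entries are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical download log: (title, day of download).
download_log = [
    ("Red apple", date(2005, 3, 1)),
    ("Oak tree", date(2005, 3, 2)),
    ("Red apple", date(2005, 3, 9)),
    ("Banana", date(2005, 2, 27)),  # previous month, so not counted for March
]


def top_downloads(log, year, month, limit=5):
    """Count downloads within one calendar month, highest first."""
    counts = Counter(
        title for title, day in log if (day.year, day.month) == (year, month)
    )
    return counts.most_common(limit)


print(top_downloads(download_log, 2005, 3))
```

Since only entries from the requested month are counted, the report starts from empty at the beginning of each month without any explicit reset step.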
Author's Art List
The middle of the screen is a list of all SVGs submitted by the given artist, ranked by number of downloads (most popular first), or by age (newest first):
thumbnail status title num_views num_downloads
Side panels include:
|
About the Author Box
|
Statistics
|
SVG Document Detail Page
This page displays a single SVG and the details about it. In the center of the page is a preview of the image. Below it is a 'comment list' and a short form for adding a comment about the image. This is not intended to be a sophisticated commenting system, just a quick way of jotting down comments about the image.
|
Other info included:
|
Links to do the following actions:
|
Clipart Requests
The current wiki page seems to be working fairly well, but possibly something more structured would be nice? This is probably lower priority for now. Ideas...
- Include request 'age', and by default sort with newest requests at the top.
- 'Renew' requests, so requestors can keep their request towards the top
- Vote mechanism so multiple people can indicate desire for the same pieces.
- Each requestor only allowed one active 'renew' per request
- Separate list of "recently filled requests", perhaps with a short list on the homepage
- Mechanism for the requestor to indicate if they feel the request has been adequately filled.
- May need to allow expirations (must have it within X days, else don't bother).
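As a rough sketch of how these ideas could fit together, the Python model below tracks age, votes, a single renewal, and an optional expiration date for each request. All names are hypothetical, and reading "one active renew" as "a request can be renewed once" is an assumption about what the list intends.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Request:
    description: str
    created: date
    renewed: Optional[date] = None   # set by the requestor's single renewal
    expires: Optional[date] = None   # optional "need it by" date
    voters: set = field(default_factory=set)

    def vote(self, user):
        """Record a vote; a set keeps each user's vote unique."""
        self.voters.add(user)

    def renew(self, today):
        """Move the request back to the top, allowed once (assumed rule)."""
        if self.renewed is not None:
            raise ValueError("request was already renewed once")
        self.renewed = today

    def sort_date(self):
        return self.renewed or self.created


def active_requests(requests, today):
    """Unexpired requests, most recently created or renewed first."""
    live = [r for r in requests if r.expires is None or r.expires >= today]
    return sorted(live, key=lambda r: r.sort_date(), reverse=True)


today = date(2005, 3, 10)
requests = [
    Request("A penguin", created=date(2005, 3, 1)),
    Request("A teapot", created=date(2005, 2, 1)),
    Request("A flag", created=date(2005, 3, 5), expires=date(2005, 3, 8)),
]
requests[1].renew(today)    # the older request moves back to the top
requests[0].vote("alice")
requests[0].vote("alice")   # a repeated vote from the same user is ignored
print([r.description for r in active_requests(requests, today)])
```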
Forms
|
Search forms
|
Upload forms
|
Package generation form
|
Where can I get DMS from?
Currently the most recent development is in Open Clip Art Library's CVS, as module dms:
cvs -d :ext:USERNAME@freedesktop.org:/cvs/clipart co dms
Check our CVS instructions for more help on CVS:
The most recent affiliated parts of DMS are available here through these modules:
svg_metadata dms-client-cgi
DMS and other parts of the system are also available on CPAN (http://cpan.perl.org), but the versions there are releases and not necessarily the most up-to-date versions.

