MarkJOwen

Thinking Business Analysis; Thinking Information Management; Looking at UX


Tag Archives: metadata

The Inaugural Conference of the Swiss ARMA Chapter

Posted on 04/12/2011 by markjowen

In my previous post I mentioned that I was heading to Switzerland to attend the first meeting of the Swiss Chapter of ARMA.

Here’s a run-down of that event…

The conference was held at the Hotel Victoria, with seating arranged for 50 people. Thanks to the kindness of the organisers, another seat was found for me.

Before the conference started, I had the opportunity to chat with Jürg Hagemann (the President of the Chapter) and Jürg Meier (the Vice-President), and asked them how the formation of the Swiss Chapter came about.

It turns out that the idea was talked about over a few beers with Gavin Siggers, the Director of ARMA Europe. (Gavin used to work at the same place as Jürg Hagemann, and now works at the same place as Jürg Meier.)

(Note – Jürg Hagemann has also written a good post on this event; the link is in the Useful Links section at the end of this post.)

Then, after an official welcome, Gavin Siggers gave a presentation on ARMA. ARMA is a predominantly American organisation, but about 2 years ago, a region was set up in Europe. Normally individual chapters are formed, and then a region is formed to encapsulate the chapters. However, in this case, ARMA formed a region first (Europe).

What dictates the forming of a chapter is the number of members. Apparently, in the Europe region, 32% of the members are in the UK, 23% in Switzerland, 6% in France, 5% in the Netherlands, and 4% in Germany. But, I was to discover, this was not only the inaugural meeting of the Swiss Chapter, but also of the first Chapter in the Europe region! (Even though the UK has more members, a Chapter has not yet been formed there.)

Jürg Hagemann also explained that, in Switzerland, professional associations are focused predominantly on the public sector, and are strongly linked to archiving. The “Swiss Association of Archivists” (SAA) is the main association that covers this. In contrast, ARMA Switzerland is mainly a forum for members from the private sector, where the main goal is to offer networking opportunities, as well as events and activities that encourage a better understanding of Enterprise Information Management (EIM), along with establishing information governance standards. However, the SAA and Swiss ARMA will coördinate their activities for the most benefit (it doesn’t make sense if SAA holds a vendor fair one week, and then Swiss ARMA does the same the week after).

On a personal note, I encourage any activity that leads to a better understanding of information governance, and it seemed that the people at this inaugural meeting really had the right idea!

After a few other “housekeeping” tasks, the “public” part of the conference started. Oracle was the sponsor, and Peter Gobonya (WebCenter Territory Account Manager) gave a bit of a spiel on Oracle’s solution for managing records held on disparate systems.

Then…the keynote speaker (Christian Walker) swooped in with his Xoom tablet under his arm, and a grin on his face. Chris’s keynote speech was on “Unstructured Data”, and was based on several blog posts he had written which had caught Jürg Hagemann’s attention (links to the presentation and the posts are at the end of this post).

Chris went through his presentation relatively quickly, as he wanted to leave a lot of room for discussion afterwards. And there were a lot of questions, which Chris answered with aplomb, adding oodles of useful advice and many jewels of wisdom (which I have captured and will share at a later stage).

After a coffee break, Uli Zipfel (from a pharma company) shared with us his presentation on “Governance for unstructured electronic information – a streamlined approach”.

This was a fascinating look into how the challenge of exponential growth in information, combined with increasing regulatory and other legal requirements, was tackled so that every employee could share information while still complying with the regulations.

From the sea of unstructured information that was scattered across multiple locations (hard drives, etc.), a governance framework was put into place. This involved, first, categorising information as records – either official records (critical to business continuance) or convenience records (reference records, etc.). Further to this, the location of the electronic information was taken into account – convenience documents were generally found on file shares and hard drives, while official documents were found in Enterprise Content Management systems. Depending on the type of record and the location, retention policies and schedules were put into place. After the specified retention period, the records would be disposed of. (Prior to that, lists would be circulated to check whether the records were still required.)

Uli’s presentation also generated a lot of discussion – clearly something that was relevant to the people present.

And this brought us to the end of the meeting.

One thing that was really clear to me was that this Chapter of ARMA is made up of a group of really smart people. And the Chapter has some excellent plans. In fact, the members were presented with a six-page document of topics (including policy management, litigation, education, archiving, and more), each with related questions, from which they could choose subjects for future workshops.

Overall I was very impressed, and wish them well. I’ll try to stay in touch with them and see how they progress.

Useful Links

  • A couple of reasons for me to travel to Switzerland
  • ARMA website
  • ARMA Europe
  • Jürg Hagmann’s Post on the event
  • Chris Walker’s presentation (Slideshare)

Chris’s blog posts that were the basis for his presentation

  • ECM for Unstructured Content Only? No Way
  • Come Together
  • Mythical Beasties

Posted in Information Management | Tagged ARMA, ECM, metadata | 2 Replies

Using a network file share – a case study

Posted on 24/06/2011 by markjowen
[Image: Storage folders (via Wikipedia)]

This post continues from the previous one, where I debated some of the drawbacks of using a network file share.

In that last post I mentioned there are some situations where using a file share can still be useful.

Case Study

Company A has been storing content on a file share for many, many years. People have been granted access to specific folders, and they also have the freedom to create sub-folders (which they do).

Over time, the file share becomes unmanageable. As Adrian pointed out in his post, this has several disadvantages: nothing can be easily found; a lot of the information is inaccessible; collaboration is not as effective as it could be; and so on.

Recently Company A became aware of SharePoint. “Cool!” they thought. “Let’s move everything into a document library. Then we won’t have problems any more.”

Is SharePoint the answer?

I certainly agree that a product like SharePoint can be useful. Once the content is in SharePoint, it can be further categorized using metadata, made accessible through the use of views, etc, and can be easily searched. Company A also thought this way.

Company A considered a couple of options here:

  1. Move the content from the file share to a SharePoint document library, and then just get SharePoint to index it, so that a search can be done whenever anyone needs to find something.
  2. Move the content from the file share to the document library, and then add appropriate metadata (enrich), and then also perform a crawl.

Let’s look at the options

Both options 1 and 2 are good. Having all the content in SharePoint means that it’s all there in one place. Security can be applied, as well as versioning.

Option 2 increases the findability of the content even more by adding rich, meaningful metadata. Company A can create a taxonomy that allows the content to be suitably categorised. Combined with customizable views, users can display the content of the document libraries in multiple different ways.

Disadvantages
  • As mentioned, there are terabytes of content in the fileshares.
    Moving terabytes of content into a document library would mean that the database would then be terabytes in size. Unless the data is properly optimized and maintained, this will be a big hit on performance.
  • In the fileshares there are thousands of text documents, spreadsheets, PDFs, images, CAD drawings, project files, mp3 files, films, executables, and a wide assortment of other file formats from different applications.
    SharePoint can accept all these formats but, by default, a lot of these file formats are excluded from being placed in a document library, and exceptions have to be made.

Alternatives

The file share has worked well for a long time. The main concerns were:

a. manageability – it was hard to manage security on the fileshares, as well as to keep track of when files were modified. It was also extremely difficult to navigate the folder structure; and

b. findability – it was almost impossible to locate any files (unless you knew where they were in the first place).

Keeping this in mind, here are a couple of alternatives Company A could consider:

  1. Keep everything in the fileshare, but configure SharePoint to crawl, and index the files in it.
  2. Keep “static” files in the fileshare, and move “dynamic” ones into SharePoint.

Let’s look at the alternatives

Alternative 1

The advantage of the first scenario is that you avoid that very, very large database. By setting up the fileshare as a content source, you can configure SharePoint to crawl it. And, as a result, users can perform searches to find what they want. Scheduling incremental crawls allows SharePoint to pick up any changes that are made to the content of the files.

The disadvantage of this becomes obvious when the security on a file changes. A full crawl is necessary to pick up any security changes. This means that if there are regular security changes (new users being granted access to the share, access to documents being changed, etc.), regular full crawls are required – and these can take a very long time (especially over a slow or busy network).

Alternative 2

In this second scenario, a little bit of work is required to identify all the documents that are not “active” (i.e. the documents that users are not currently modifying). This would include films, images, executables, and any files that are not “dynamic”.

Then the company could move any documents that are “dynamic” (still being edited, etc) into SharePoint. Then, as described above, extra metadata can be added to improve findability.

The fileshare can then be treated as an “archive”, and the security changed to Read Only. This will ensure that no documents get modified, and therefore the content only has to be crawled once.

Alternatively, lock down the file share so that no one can modify the permissions on folders or documents. Because there are no security changes, no full crawl is required. Regular incremental crawls can be scheduled to pick up any changes to the content of the documents.

Another alternative

The other day I was watching a demo of AvePoint’s File Share Connector. This connector allowed users working in SharePoint to interact seamlessly with the documents that were actually in the file share.

The obvious advantage of this is that SharePoint functionality is available, without jamming all the files in the database.

I was pretty impressed with what I saw. However, I haven’t used the connector myself in a real-life situation, so I cannot comment on it further. If you have used it, please feel free to let me know.


Posted in Information Management | Tagged case study, collaboration, fileshares, indexing, metadata, Search, SharePoint | 2 Replies

Determining the real number of crawled documents in a SharePoint-Documentum system

Posted on 17/08/2010 by markjowen

Technical Post ahead…

So you’ve been able to configure SharePoint so that it can crawl and index content in a Documentum docbase.

Cool.

You go and look at the crawl log, and can see that x number of documents have been crawled. But what does that actually mean?

You fire up your favourite DQL query tool (Repoint, dqMan, Documentum Administrator) and, with panache, type in a query that will return a count on the dm_document table.

For example:

SELECT count(*) FROM dm_document

But…the value returned does not quite match what SharePoint’s crawl log was showing.

This is because SharePoint crawls files based on the extension of the file (PDF, DWG, DOC, etc). And when you run a query against Documentum, the table you are querying against does not have extensions listed.

As shown above, the table to query against is dm_document. This contains all the document objects in the docbase, and lists each object’s “content_type”. This is not the same as the file type. (It is also not the same as what SharePoint defines as a “content type”.) In Documentum, content_types identify the file format of the object’s content.

Here’s an example list of the sort of Content Types that can be found in the dm_document table.

Content Types
abt, amipro, bmp, cgm, crtext, css, csv, ddd, did, dtd, excel, excel12book, excel5book, excel8book, excel8template, gif, html, jpeg, js, mp3, mpeg, ms_access7, ms_access8, msg, msproject, msw12, msw12me, msw6, msw6template, msw8, msw8template, msww, pdf, pdx, powerpoint, ppt8, ppt8_template, ps, quicktime, rtf, swf, text, tiff, trn, txt, wld, wmv, wp6, wp7, wp8, xml, xpt, xsl, zip

So even if you ran a DQL query that listed the objects in dm_document, you would only see the “content_type”, which doesn’t help you work out which documents are crawled.
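For example, a query along these lines (purely illustrative) returns only Documentum’s format name:

SELECT object_name, a_content_type FROM dm_document

All you get back is a format name such as msw8 or excel8book – not the .doc or .xls extension that SharePoint’s crawler cares about.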

The “content_type” in dm_document references the table dm_format. This table lists the content types and their associated file formats. Here’s an example of this table:


Content Types, Descriptions, and associated File Extensions (format name (extension) – description):

bmp (bmp) – BMP Image (Windows, OS/2)
crtext (txt) – Text Document (Windows)
css (css) – Cascading Style Sheet Document
csv (csv) – Comma Separated Values
dtd (dtd) – Document Type Definition
excel (xls) – Excel 3.x worksheet (MacOS, Windows)
excel12book (xlsx) – Microsoft Office Excel Workbook
excel5book (xls) – Excel workbook 5.0 (MacOS), 5.0-7.0 (Windows)
excel8book (xls) – Microsoft Office Excel Worksheet 8.0-2003 (Windows)
excel8template (xlt) – Microsoft Office Excel Template 8.0-2003 (Windows)
gif (gif) – GIF image
html (htm) – HTML Document
jpeg (jpg) – JPEG Image
js (js) – JavaScript File
mdoc55 (fm) – FrameMaker 5.5 document
mp3 (mp3) – MP3 File
msg (msg) – Outlook Message Format
msproject (mpp) – MS Project – project
msw12 (docx) – Microsoft Office Word Document
msw12me (docm) – Microsoft Office Word Macro-enabled Document
msw6 (doc) – Word 6.0 (MacOS), 6.0-7.0 (Windows)
msw6template (dot) – Word 6.x template (MacOS, Windows)
msw8 (doc) – Microsoft Office Word Document 8.0-2003 (Windows)
msww (doc) – Word 1.x, 2.x (Windows)
ms_access7 (mdb) – Access 95 database
ms_access8 (mdb) – Access 97 / 2000 database
pdf (pdf) – Watermarked Acrobat PDF
pdx (pdx) – Acrobat Catalog index file
powerpoint (ppt) – PowerPoint pre-3.0 (MacOS, Windows)
ppt8 (ppt) – Microsoft Office PowerPoint Presentation 8.0-2003 (Windows)
ppt8_template (pot) – Microsoft Office PowerPoint Template 8.0-2003 (Windows)
ps (ps) – PostScript
quicktime (mov) – QuickTime Movie
rtf (rtf) – Rich Text Format (RTF)
swf (swf) – Shockwave Flash File
text (txt) – Text Document (Unix)
tga (tga) – Targa Image
tiff (tif) – TIFF Image
trn (trn) – Transaction file
txt (txt) – Log File
wmv (wmv) – Windows Media Video
wp6 (wpd) – WordPerfect 6.0
wp7 (wpd) – WordPerfect 7.0
wp8 (wpd) – WordPerfect 8.0
xml (xml) – XML Document
xsl (xsl) – XSL File

From this you can see that multiple content_types can map to the same DOS extension.
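If you want to check this in your own docbase, a quick sanity check (a sketch, assuming the standard dm_format object type) is to group the formats by extension:

SELECT dos_extension, count(*) FROM dm_format GROUP BY dos_extension HAVING count(*) > 1

This lists every extension that is shared by more than one content type (doc, xls, txt, wpd, and so on in the table above).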

So – the trick is to work out what the file formats (or DOS extensions) are of the documents in the docbase.

What follows is what I did. (Note – I am not the world’s smartest guy, so there may be a better way to do this.)

1. Determine the content types that are in use in the docbase

SELECT distinct d.a_content_type, f.dos_extension FROM dm_document d, dm_format f WHERE d.a_content_type=f.name

This gives you a list of:

  • the unique content types of the documents in the docbase,
  • and the extensions.

Now with this list, you can do a quick cross-reference.

2. Compare the DOS extensions in the list with the File Types list in SharePoint, and you can confirm which objects (documents) in Documentum are actually being crawled by SharePoint.
Take the content type/DOS extension list from above and remove all the ones that SharePoint doesn’t crawl.

3. Then run another DQL command. This time use the following query:

SELECT count(*) FROM dm_document WHERE a_content_type IN ( list content_types from above )

And this should give you a count of the documents in the Documentum docbase that SharePoint crawls – a number that should be much, much closer to the one listed in SharePoint’s crawl log.
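As a purely hypothetical example (the actual list depends on what step 1 returned for your docbase, and on which file types your SharePoint farm is configured to crawl), the final query might look something like this:

SELECT count(*) FROM dm_document WHERE a_content_type IN ('msw8', 'msw12', 'excel8book', 'excel12book', 'ppt8', 'pdf', 'html', 'text')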

One thing to be aware of – if your docbase is very active (that is, documents are being worked on most of the day), then the count that the DQL query gives will be as of the time the query was run. The number listed in the crawl log is from the time the crawl was last run. This may cause a difference.

Posted in Information Management | Tagged content types, Documentum, DQL, metadata, Search, SharePoint | Leave a reply

Killa Hertz & The Case of the Missing Documents – Part 8

Posted on 16/08/2010 by markjowen

… continued from Part 7  —  [Other Episodes]

Killa had told Trudy that he suspected a problem with memory on the web services server. It was way under the required amount, and Trudy had arranged to get more put in. In the meantime, Killa was going to show Trudy how to split up the crawl into smaller groups.

 

Killa Hertz & The Case of the Missing Documents – Part 8

Trudy was sitting next to me. She had her notebook with her. This one had little flowers in the corner. She wrote down the date and looked at me expectantly.

“Ok Trudy, what we need to do here is split up the crawl. At the moment, all the documents are being crawled. As you mentioned, it takes about a week. If it fails, then you gotta start it again. And hope that it isn’t going to fail again.”

Trudy’s head nodded as if it was on a spring.

“We need to split up the documents so that we can crawl them in smaller groups. That way, if a crawl does fail, it means that you only have to crawl that small amount again.” Trudy scribbled furiously.

“So, let’s look at the docs. Normally, I find the best way to split them up is by size. Let’s have a look at the spread.” I fired up Documentum Administrator and typed in a DQL command to get the maximum size, the minimum size, and the average size, of the documents. The smallest was 0 bytes. The largest was just over a Gigabyte. The average size was just under 1 Megabyte.
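For the curious, a DQL query along these lines does the trick – a sketch assuming the documents are standard dm_document objects (adjust to suit your own object types):

SELECT min(r_content_size), max(r_content_size), avg(r_content_size) FROM dm_document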


“Ok Trudy – it looks like you’ve got lots of small documents, and a bunch of large documents.” She looked at the results from the DQL query, and I heard her inhale quickly.

“Wow – we’ve got a document that’s over a Gigabyte – I didn’t know that.” she said. “Do you know what it is?”

I didn’t look at her, but just said “That’s up to you to find out. I just want to make sure everything can be crawled.” She wrote down the size of the largest file and put a big circle around it.

Carrying on, I said, “We’ll make a new content source. But first – is there a Test system running here?” I didn’t want to do anything on their production system.

Quickly Trudy opened another browser session and opened the Administrator for the TEST environment. After a bit of clicking, I got to the screen displaying the content sources. “Trudy – I’m gonna make a new content source that only crawls documents under 250kb”.

Luckily Mike had given me a few tips, so I knew that the content source couldn’t be created through the normal SharePoint method. At the top of the screen was an extra tab titled “Wingspan Connectors”. (Wingspan was the company that sold eResults). I clicked on it and was presented with more tabs and a screen that displayed the current content source.

“Look here, Trudy, this is where the content source needs to be created.” At the top of the page was a drop-down with the words “Add New”. “What I recommend that you do is create a new content source using exactly the same details as the current one. Make sure that the name of the docbase it will be connecting to is the same, and also that you’ve selected the right Item Types – these are the tables in the docbase that define which objects you want crawled, as well as the source for the crawled metadata. Give it a different name of course.

“Down here at the bottom,” I said, using the mouse pointer to make sure it was really clear, “this is where you need to specify the document sizes.” There was a section called “Custom Filter”. “Because you have already defined which tables will be used, you only have to type in the criteria portion of the query. Like this:”

ANY r_version_label='CURRENT' AND r_content_size < 250000

Trudy scribbled furiously on her pad. The kid obviously wanted to make sure that she didn’t miss anything.

“Once that is done, click on the other subtabs and make sure that everything matches the existing content source. Once that’s done – click on Save.” Trudy stopped me. She wanted to grab a screen shot of what I had done. Smart idea – a picture is often worth a thousand words. I clicked on Save. The original Content Management screen appeared.

“Then,” I said as I reached for my coffee, “then, you need to make sure that the crawl properties are properly picked up. You’ll see here, next to the new content source, is a link titled “Metadata”. Click on this, and the metadata window opens up. By clicking on ‘Generate’ eResults will read the tables you’ve selected and will create a crawled property for each column heading.”

As Trudy wrote this all down I added, “Even though you can get eResults to create corresponding Managed Properties, in this case, you don’t have to.”

Trudy looked up. “Umm – why’s that?”

“Well, eResults needs to know what column headings to use as crawled properties. However, the managed properties are the things that SharePoint looks after. When the original content source was created, the managed properties were generated. So these already exist.”

Trudy wrote it all down. She was on her fifth page, and she wrote the date on the top of each page, along with a page number.

“Now – you want to make sure that this ‘connection’ will work.” I clicked back to the Content Management page. Next to the name of the listed content source was a link ‘Test’. “You click on this, and then on the Run button that is now enabled. This will cause eResults to make a connection based on the settings we have just put in.”

While I explained this to Trudy, I carried out the same actions. There was a small delay, and then the screen was filled with information. The time the crawl started was listed, as well as an assortment of other data indicating that eResults was able to connect to the docbase. Under this, the crawled properties that were found for the document were listed, as well as their values. “See,” I said, “it’s working.”

Trudy’s eyes lit up. “Cool!”

“Now, you need to create other content sources. Even though you can use whatever size limit you want, I recommend the following:

  • documents equal to, or greater than 250kb, and less than 1Mb
  • documents equal to, or larger than 1 Mb, and less than 100Mb
  • documents equal to, or larger than 100Mb.

It looks like it’s the large documents that are causing the web service server to choke. Fortunately, there aren’t many of these.”
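For reference, the custom filters for those three ranges would look something like this (sketches only – they assume the same simple byte values used in the first filter, so adjust the numbers to whatever you treat 250kb, 1Mb, and 100Mb as):

ANY r_version_label='CURRENT' AND r_content_size >= 250000 AND r_content_size < 1000000
ANY r_version_label='CURRENT' AND r_content_size >= 1000000 AND r_content_size < 100000000
ANY r_version_label='CURRENT' AND r_content_size >= 100000000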

Trudy was busy taking screen shots, and making notes. “Thanks, Killa. I’ll get these created before the weekend. But do you think that it’s really necessary if we are putting in extra memory?”

I looked at Trudy. “Yes – the memory will help, but to be double sure, the separate content sources mean that if there are any problems, you won’t have to waste another week.” She smiled sheepishly. “Oh yeah – you mentioned that.”

“One last thing – once you’ve set up the extra content sources, disable indexing on the original content source. This will clear the index for that content source without affecting the index from the SharePoint content sources. And it will also prevent having duplicate data in the index.”

I grabbed my hat and my jacket. That hot weather was now a thunderstorm.

 

“Once you’ve got the content sources, and the extra memory set up, kick off a crawl. Let me know how it goes.”

 

to be continued …

… Part 9

—  (Other episodes)

 

Posted in Information Management | Tagged Content Management, content sources, Documentum, metadata, SharePoint | Leave a reply

Enterprise Search & Federated Search – what’s the difference?

Posted on 31/05/2010 by markjowen

In this post, I look at the difference between Enterprise Search & Federated Search …


Enterprise Search

In Enterprise Search, content is indexed locally.

That is, the content is made available to the search engine, and the metadata, and if possible, the content itself, is indexed and stored in a database.

Federated Search

Federated Search utilises the search results that are provided by an “external” system. (“External” refers to another document management system or search application.)


Enterprise Search & Federated Search – Different Indexing Methods

There are a few interesting things about the different indexing methods.

In Enterprise Search, because the documents are available to the indexer, all the metadata and content can be crawled and indexed.

With Federated Search, the results are whatever is returned by the external system. This relies on the external system’s indexing capabilities.

On the other hand, accessing and indexing documents locally (in an Enterprise Search) usually requires loading each document into a locally accessible location. Doing this can chew up a lot of bandwidth. Then there is the corresponding CPU utilisation to do the actual indexing. And to keep the index up-to-date, further crawls/indexing need to be performed regularly. If the documents are relatively “close” (within the same subnet), then this should not be too much of a problem.

However, when the documents are located in, or indexed by, remote (outside the firewall) systems, a federated search has certain benefits. All that is being transferred is the query and the results that match that query. The indexing process is being handled by the remote system. And this includes the regular indexing required to keep the index up-to-date.


Should you use Federated Search, or Enterprise Search?

So the decision whether to use Federated Search or Enterprise Search depends on:

  • the resources available (how much grunt does your indexing server have?),
  • the availability of the content that needs to be indexed (can the indexer access the content directly?),
  • the bandwidth available, and
  • how confident you are that the search results returned contain 100% of the content that meets the search criteria (are you happy with the results that the remote system provides?).

 


Posted in Information Management | Tagged enterprise, federated search, indexing, metadata, Search | 1 Reply

Why the government is implementing a Digital Continuity Strategy

Posted on 14/05/2010 by markjowen

What use is information if you cannot access it anymore? This post looks quickly at the digital continuity strategy that a government department is committing to set up.


Did you know?

“74% of the local public sector organisations admitted to holding some digital information they can no longer access.”

This is incredible! It means that roughly three out of every four of these organisations are holding digital information that they can no longer use!

Why? Because they do not have a defined method for retaining and storing their digital information.


Archives New Zealand (a government organisation that is the “official guardian of New Zealand’s public archives”) have been tasked with creating a digital continuity strategy.

This will ensure that valuable information is preserved and migrated when necessary to the latest formats and media, appropriate metadata is attached, and documents that are no longer relevant are securely deleted.

Wow – that is precisely the way that they can improve the “findability” of digital information.

Metadata that describes the information (whether it be film, document, etc.) is essential.

Everyone speaking the same language

Further to that, there are a couple of other very important things to take into consideration.

One is the “vocabulary“. Do all departments/organisations speak the same language? All the various departments involved need to ensure that they use a common vocabulary when describing something. Otherwise, there will still be a very large number of documents that are “unable to be accessed”.

Another thing that is very important is “buy in“. With many different departments, it is important that they all agree to the above-mentioned strategies. Having only 2 out of every 10 organisations following the new strategy means that there is still a lot of information that can’t be found.

Tackling the Problem – the Digital Continuity Strategy

Fortunately, Archives NZ has addressed these extra points of consideration.

They have drawn up a Digital Continuity Action Plan. The focus of the plan in the first year is to raise awareness and understanding of the problem.

Excellent!

Communication and awareness are critical for success.

Part of this communication process will involve “refining the language used”. This will be done by looking at which words are used by CIOs, technology managers, record keepers and so on.

I am very keen to see how this initiative transpires. I’ll be watching with interest, and if I see anything of note, I will let you know.

 




 

Posted in Information Management | Tagged Archives New Zealand, archiving, Digital, metadata, records | Leave a reply