Killa Hertz & The Case of the Missing Documents – Part 9

An ECM Detective story - Killa Hertz and the case of the Missing Documents
…continued from Part 8 —  [All Episodes]

While Killa and Trudy were waiting for extra memory to be put in the Web Services server, Killa showed Trudy how she could split up the crawl of the documents in the Documentum server into smaller jobs. It had been several weeks since he had heard from her, and Killa planned to make contact again to see how things were going.

 

Killa Hertz & The Case of the Missing Documents – Part 9

It had been a month. I hadn’t heard anything from Trudy, so I gave her a call.

“Killa!” was the first thing I heard after I got put through to her. “I’m really glad you called,” she squeaked over the telephone.

“And…” I asked, “how did things go?”

“It worked!” she cried. “After the extra memory was put in, I did as you showed me, and the total number of documents being crawled is almost the same as the number of documents in the Documentum repository!”

“Excellent. I’m on my way over to check things out.”

I drove over to the office where Trudy was working. She met me at the door and led me to where she worked. There were still piles of paper everywhere, and that photo of her dog was still there.

Pulling up a spare seat, Trudy logged onto the system. “Look,” she said – “there is 2GB of memory on the Web Services server now.” She showed me the screen, which clearly showed the correct amount of memory.

“Ok, let’s see the crawl log.” Trudy switched to another screen where I could see SharePoint’s crawl log. Looking at the bottom of the screen, I could see that the last full crawl had taken place last week. It looked like it had been successful.

Leaning over Trudy’s shoulder, I grabbed a notepad, and one of the pens in the container on Trudy’s desk. I jotted down the numbers of documents that SharePoint had crawled.

Trudy ctrl-tabbed to a spreadsheet she had open. The numbers of documents that SharePoint had crawled were listed. And she had listed the counts from running a DQL query to determine the number of each content type in the docbase. Each value was in a different color.

I looked through her numbers. “Looks good, kid,” I said to her with a smile. “Looks like that problem is fixed.” She swayed back and forth with excitement. “And I’ve confirmed with everyone here. They can all find the documents they are looking for,” she squeaked.

“Good – let me just check it one more time.” I went back to her spreadsheet. Fired up DA, and ran a few more queries. The numbers still looked good. “Well, Trudy, there’s not much more to do.” She looked at me coyly. “I guess we should celebrate.” The look on her face was adorable. Sort of between puppy dog and baby fur seal. “No need to, Trudy. Just doing my job.”

I picked up my hat, swung my jacket over my shoulder, and walked out to the car park.

Of all the lawyer firms, in all the cities…

Well, it was a good job. I’m glad I could help the dame. I headed in the direction of O’Leary’s.


The End

 



Killa Hertz & The Case of the Missing Documents – Part 7


… continued from Part 6  —  [Other Episodes]

Killa Hertz’ friend Mike Budrewski had analysed the logs and had determined that there was a Java memory error. Killa was investigating further.

 

Killa Hertz & The Case of the Missing Documents – Part 7

“Trudy – I need to look at the web server.”

Trudy looked up with a puppy dog look in her eyes. She quickly opened up a new remote session and logged me onto the web server.

“OK”, I said to myself, “somehow this thing is throwing a memory error.” I fired up the task manager. The thing was using a little more memory than normal, but it looked OK.

Suddenly my mobile phone rang. Trudy jumped. The girl was skittery.

I answered the phone and heard Mike’s voice. “Killa, I was able to find some documentation about this eResults application. There’s nothing explicit about the error, but it clearly states that it requires 2 gigabytes of RAM. How much is that thing running?” “1 gig,” I replied. I was happy. It looked like an open-and-shut case.

At the same time, I was annoyed. Why the hell was a law firm skimping on things like memory?

I looked over at Trudy – she was busy staring at numbers in a spreadsheet. “Trudy – this server doesn’t have enough memory. What can you do to get another gig installed?” She looked up. She wrinkled her nose, and trundled her chair next to mine. Her perfume was clearly set on “Kill” this morning.

“Umm, let’s have a look.” The web service server was a virtual one. That meant that, in principle, it should be easy to increase the memory. “Yes,” she said, in that excited voice of hers. “However, I’ll have to let the boss know. We’ll probably need to take the server off-line.” I went and grabbed a coffee while she called her boss.

After 5 minutes, she came into the coffee room. “Sure thing Killa, we can do it tonight.” I put down my cup. “Trudy – how long does it take to crawl all the documents in the docbase?” “Well…” she started. “The last time we did it, it took about a week.”

Suddenly, I had the urge to be sitting on a stool at O’Leary’s with a glass of Jack.

“A week is a long time to see if this is going to work.” Even though I was getting paid by the day, there were still limits.

“Let’s see if we can split up the load.” Trudy’s eyes opened wide. She was a good kid.

“Look – you’ve got over 800,000 documents in there. We’ll split up the documents into smaller groups. Then we start a crawl on each group of documents. If this memory increase doesn’t help, and a crawl doesn’t work properly, then it doesn’t mean we have to recrawl all the documents.”
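For the technically curious, the idea of splitting one huge crawl into smaller jobs can be sketched in a few lines of Python. This is only an illustration under assumptions: the story does not say how the groups were defined, and the names `split_into_groups` and `groups_to_recrawl` are hypothetical. The point is simply that a failed group can be re-crawled on its own.

```python
from typing import Dict, Iterator, List, Sequence


def split_into_groups(doc_ids: Sequence[str], group_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most group_size document ids."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for start in range(0, len(doc_ids), group_size):
        yield list(doc_ids[start:start + group_size])


def groups_to_recrawl(results: Dict[str, bool]) -> List[str]:
    """Return the names of the groups whose crawl did not succeed."""
    return [name for name, succeeded in results.items() if not succeeded]
```

If one group fails after the memory upgrade, only that group's name comes back from `groups_to_recrawl`, and the rest of the index stays as it is.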


Trudy ran over to her desk and grabbed a pad of paper (it had roses in the corner of each page) and a pen. “Let me get this down,” she said.

“Ok Trudy, let me show you what needs to be done.”

to be continued…

Part 8

 



Killa Hertz & The Case of the Missing Documents – Part 6

… continued from Part 5 —  [All Episodes]

Killa Hertz had worked through the night with help from Trudy. They had gone through the indexing process. It looked like the answer could be in the eResults log. Killa had sent it to his super-geek friend Mike to see if he could make sense of it.

 

Killa Hertz & The Case of the Missing Documents – Part 6

The alarm clock went off at 8am. Swinging my arm, I knocked the thing off the bedside table. Being electric, it just kept beeping. I pulled the plug out of the wall.

After leaving Trudy’s office last night, I made a phone call. My friend Mike was awake. I expected that. He liked his internet games. I swung past his place with the CD. Trudy had made sure that there was only the eResults log on it.

Mike invited me to stay while he analysed the log.  His flat was small, and messy, and there was no bourbon. I declined. “Mike – call me in the morning when you have an answer.”

So now – it was morning. Still hot and as sticky as it was last night. After swallowing two cups of coffee, I headed into Trudy’s office. She was there looking at the system. “Hi Killa!” she squeaked far too enthusiastically. I hate morning people. “Have you heard anything?” I told her that Mike would call me as soon as he had news for me.

“But you know Trudy, it could be that the system is choking while it’s doing the indexing. Let’s have another look at it.”

She logged onto the system for me and then let me sit in her chair. I had a look at the Crawler Impact Rules in SharePoint. There were none. I poked around and checked out a few other things. The system was 32-bit. Not the best, but that didn’t explain why the crawls were suddenly stopping short. There were a few settings in the registry that could be tweaked to increase the amount of memory used. But, again, no point changing those…yet. I made a note of them anyway.

Around 9:30, my cell phone rang. It was Mike. He wanted me to come around.

Knocking on his door, I was met by Mike in the same clothes that he had on the last time I saw him. He was talking fast. Clearly a sign of too many caffeine-loaded energy drinks. I didn’t want to be around when those wore off.

Mike pulled a stool over next to his chair. The computer screen was filled with the error logs. “I looked through the logs, Killa. There’s a hellova lot of information in there. I went through each line. This is a smart app.” I could hear that Mike was impressed. “There are a lot of errors, but they are nothing to be worried about. It looks like the system is just reporting that it couldn’t find certain things. These don’t look like they are causing the crawl to fail. I double-checked them anyway. It took me a while, but about an hour ago I think I finally pinned it down.”

I glanced at Mike. He liked his moment of importance. “So what do ya think it is?” I asked. “Memory,” he said. “But their SharePoint system is running fine,” I said. “No – not the SharePoint server – it’s a Java error.”

“I need coffee” I said. His response was to thrust a can of energy drink in my hand. It was better than nothing.


I thought back over the process. SharePoint indexed the docs. But Java wasn’t used for that. The documents were transferred in batches from the Documentum docbase to the SharePoint server first. And this was via a web server that did use Java.

“Mike – I’ve gotta go check something. I’ll call you.” Mike handed me a pile of paper. It was a printout of the error log with the Java error highlighted. “As always – Thanks”.

I arrived back at Trudy’s office. “Trudy – give me access to your web server.”

to be continued…

Part 7

 



Killa Hertz & The Case of the Missing Documents – Part 5

… continued from Part 4 —  [All Episodes]

Killa Hertz had taken control of the computer. Running a few DQL queries he had been able to determine how many, and what sort of, documents there actually were in the docbase. The number that SharePoint was crawling didn’t match…

 

Killa Hertz & The Case of the Missing Documents – Part 5

After taking a swig of cold coffee, I decided to learn more about eResults.

It was a protocol handler that allowed SharePoint to talk with Documentum. It also kept track of the security of the documents in Documentum, which ensured that security trimming was applied correctly to the documents returned in the search results.

Looking at SharePoint’s Search Administration screen and the configuration screen for eResults, I deduced the following:

  • A content source has been set up that points to the Documentum docbase.
  • At regular intervals, SharePoint connects to Documentum using the information defined in the content source.
  • Based on a custom filter, SharePoint retrieves a list of the documents held in Documentum, 20,000 at a time.
  • Using a web server as the intermediary, SharePoint copies the documents from the docbase to a folder on the Index server.
  • The documents are then crawled, and when each document is finished, it is deleted from the folder.
  • At the same time that the list of documents is retrieved from Documentum, the Documentum security is also translated so that it matches the corresponding groups in Active Directory.
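The flow in the list above can be sketched roughly as follows. This is a simulation under assumptions, not eResults' real code: `fetch` and `index` are hypothetical stand-ins for the web-server transfer and the SharePoint indexer, the staging folder is any local directory, and the security translation step is left out. Only the batch size of 20,000 comes from the story.

```python
import os
from typing import Callable, Iterator, List

BATCH_SIZE = 20_000  # list size retrieved per request, as noted above


def batches(doc_ids: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield the document list in chunks of at most `size` ids."""
    for start in range(0, len(doc_ids), size):
        yield doc_ids[start:start + size]


def crawl_docbase(
    doc_ids: List[str],
    fetch: Callable[[str], bytes],   # hypothetical: returns a document's content
    index: Callable[[str], None],    # hypothetical: indexes the file at a path
    staging_dir: str,
    size: int = BATCH_SIZE,
) -> int:
    """Stage each batch in staging_dir, index every file, then delete it."""
    crawled = 0
    for batch in batches(doc_ids, size):
        staged = []
        # Copy the batch from the docbase into the staging folder.
        for doc_id in batch:
            path = os.path.join(staging_dir, doc_id)
            with open(path, "wb") as handle:
                handle.write(fetch(doc_id))
            staged.append(path)
        # Crawl each staged file and remove it once it is finished.
        for path in staged:
            index(path)
            os.remove(path)
            crawled += 1
    return crawled
```

A failure in `fetch` partway through (say, an out-of-memory error on the intermediary server) would stop the loop early, leaving the returned count well short of the number of documents in the docbase.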

Luckily eResults kept detailed logs about its activities. I opened the latest one. There was a lot of information. I started looking through it.

There were several places where the word “error” appeared in the log. They looked like pretty harmless entries, but I wanted to be sure. I’d have to call in a favour.

Mike Budrewski was an old friend of mine. He was born with a copy of “The Geek’s Guide to Being a Geek” in his hands. What Mike didn’t know about technology wasn’t worth knowing. Problem with Mike was, if you didn’t have a keyboard and a monitor, he didn’t really feel comfortable talking to you. Mike wasn’t really a people person.

I looked at Trudy. She was looking tired. “Let’s call it a night” I said. “Copy these log files onto a CD for me. I’ve got a guy who will look over them tonight, and I should have an answer by tomorrow.”

Trudy looked pleased. It was 3 o’clock in the morning, and the caffeine was starting to wear off.


to be continued…

Part 6

 



Killa Hertz & The Case of the Missing Documents – Part 4

… continued from Part 3 [All Episodes]

Killa and Trudy were delving deeper into the problem. The crawl log indicated that it had finished crawling successfully, but something wasn’t right…

 

Killa Hertz & The Case of the Missing Documents – Part 4

“How many documents are in the docbase, Trudy?”

“500 000” she said, but she didn’t look me in the eye. I could see that she wasn’t really sure.

“OK, Trudy, let’s swap seats, and let’s find out what’s going on.” She jumped up and I took her place.

Starting Documentum’s Administrator tool (DA), I opened up the DQL screen. This would let me query the docbase using Documentum’s query language. I prefer Repoint for this type of job, but running a query through DA was just as good. I ran a count on the number of objects in the docbase.

The result I got back was 824,129. Trudy was surprised. “Wow!” she said in that squeaky, excited voice. “That’s a lot more than I had expected.”

I was curious about what these documents were. What type of files they were.

Quickly I ran another DQL query and got a list of the content types. I looked through the list. There were Excel 3.0 spreadsheets, Excel 8 spreadsheets, Word 6 documents, Word 8 documents, PDF files, scanned Tiffs, several mpgs, and mp3 files, log files, jpeg files, html files, and several rich-text format files.
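The two queries might have looked something like the DQL below, held here as Python strings. The story does not quote the actual queries, so these are assumptions: `dm_document` and `a_content_type` are standard Documentum names, but whether the count was run against `dm_document` or a broader type is a guess. The helper `tally_by_format` is hypothetical and works on rows that have already been fetched by whatever DQL client is in use.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed DQL, not quoted from the story.
COUNT_ALL = "SELECT count(*) FROM dm_document"
COUNT_BY_FORMAT = (
    "SELECT a_content_type, count(*) FROM dm_document GROUP BY a_content_type"
)


def tally_by_format(rows: Iterable[Tuple[str, int]]) -> Counter:
    """Sum (format, count) rows into one total per content type."""
    totals: Counter = Counter()
    for content_type, count in rows:
        totals[content_type] += count
    return totals
```

Summing the values of the resulting counter should give the same figure as the plain count, which is a quick sanity check before comparing against SharePoint.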

“OK – now let’s look at the files that SharePoint can crawl.” Starting up SharePoint’s Central Administration, I navigated to the Shared Service Provider, and then to Search Administration. After clicking on the File Types link, I noted down the file types listed.

Cross-referencing the two, I could see that for most of the files in the docbase there was a suitable iFilter in SharePoint that let SharePoint read the document. (The mpg files, the mp3 files, the jpeg files and tiff files weren’t getting crawled.) They were using the PDF iFilter from Foxit.

I ran the count query again, this time only including the content types that were crawled. The count came to 801,232 objects. Even taking into account the documents that SharePoint didn’t crawl, there was still a discrepancy. Why was SharePoint stopping happily after about 350 thousand documents?
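Put side by side, the numbers make the size of the gap plain. The figures below are the ones quoted in the story (the 354,054 is the crawl-log count from Part 3); the variable names are only for illustration.

```python
total_in_docbase = 824_129   # count of all objects from the first DQL query
crawlable = 801_232          # objects whose content type SharePoint can crawl
crawled = 354_054            # documents reported in SharePoint's crawl log

not_crawlable = total_in_docbase - crawlable   # media and image files
missing = crawlable - crawled                  # should be crawled, but are not
coverage = crawled / crawlable                 # share of crawlable docs indexed

print(f"Not crawlable: {not_crawlable:,}")     # Not crawlable: 22,897
print(f"Missing:       {missing:,}")           # Missing:       447,178
print(f"Coverage:      {coverage:.0%}")        # Coverage:      44%
```

The file types without an iFilter account for fewer than 23,000 documents, so they cannot explain a shortfall of more than 447,000.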

Grabbing my notepad, I jotted down what I knew so far:

  1. Documents were stored in Documentum
  2. Documents were also stored in SharePoint doclibs
  3. There were two content sources set up
    1. One for SharePoint
    2. One for Documentum.
  4. Most of the file formats had a suitable iFilter that allowed SharePoint to crawl them.
  5. According to SharePoint’s crawl log, there were no errors, but it would stop crawling too early, missing a large bulk of the documents.

So … what was I missing? I looked further into the process…

to be continued…

Part 5



Killa Hertz & The Case of the Missing Documents – Part 3

… continued from Part 2 [All Episodes]

Trudy had explained the set-up at the law firm to Killa. Now Killa wants to find out more.

Killa Hertz & The Case of the Missing Documents – Part 3

“Ok, let’s see if I’ve got this right”, I said. “You know that the documents are in the system, but they aren’t showing up in the search?”

The group of lawyers in cheap suits all nodded together like a set of those toy dogs you see in the back of old people’s cars.

Trudy explained that they had browsed to a document directly in the system using the client interface, so they knew it was there, but when they did a search in SharePoint nothing was being found.

“And”, I asked, while looking around for a coffee machine, “do you know whether these documents are actually being indexed?”

“Oh yes”, Trudy replied, her voice hitting a high C. “The crawl runs every 2 hours.”

“Yeah – but is it working properly? Is the crawl actually crawling the documents?” There was silence. Trudy glanced nervously at the suits, and then back at me.

In a plaintive voice, she said, “I’m not sure”.

I’ve worked cases like this for more years than I’d like to remember. You get to learn that things are not always as they appear.

“Let me see the logs.”

Trudy led the way to her office. It was a real contrast to the reception area which was uncomfortably clean, and lifeless. In her office, there was a desk covered with papers. On the wall was a whiteboard on which she had written several numbers and drawn arrows. Next to her computer was a picture of a dog – some small, fluffy thing…

Why was I not surprised.

An ECM Detective story - Killa Hertz and the case of the Missing Documents - Trudy's dog

The office wasn’t big, and, fortunately, the suits had dispersed to their dark corners with neon lighting.

Trudy lifted a pile of paper from a chair, and dragged it over to her desk, so I could sit on it. With her thin fingers, she logged into the system and opened the crawl log. She slid her chair to one side, so I could see it.

Because the documents were stored in Documentum, as well as SharePoint, there were two content sources. (That’s another name for the place where the documents are.)

One was the default one which pointed at all the documents in the SharePoint repositories  (as well as the web pages, etc. that made up SharePoint). The other pointed to the Documentum repository.

I pulled out my notepad, grabbed a pen that was lurking under some industry magazine on Trudy’s desk, and started writing things down.

“OK, Trudy – according to this crawl log, 354,054 documents in the Documentum repository have been crawled. But how many are actually there?”

to be continued…

Part 4

 



Killa Hertz & The Case of the Missing Documents – Part 2

… continued from Part 1 [All Episodes]

Private Eye Killa Hertz has received a call from Trudy who works at a law firm. Their search tool was not returning all their documents. Killa heads to their office to investigate.

Killa Hertz & The Case of the Missing Documents – Part 2

Trudy introduced some of the suits. They were lawyers, and they all looked like they had spent too much time in small rooms with neon lighting.

“Ok – give me the facts,” I said. They started their story.

They kept all their critical documents in a Documentum system. For a document to get into the system required checks, reviews, and sign-offs. And to get access to a document was also damn difficult. Not just anyone could waltz in and read or modify a document. It sounded more secure than Sing Sing.

An ECM Detective story - Killa Hertz and the case of the Missing Documents - EMC security is like a prison

Some hired gun, with an IQ that was higher than my hourly rate, had recommended that they also use SharePoint 2007. This meant that they could also use SharePoint’s storage repositories for less critical documents, as well as create a “Portal” so that specific users would have their own customized work area. The only Portal I knew of was the back door of O’Leary’s bar. A “Portal” that I was frequently being ejected from by large gorillas.

Trudy explained that they were using special web parts that allowed SharePoint to display the folders and documents stored in Documentum. They had created lots of SharePoint sites so that each ambulance chaser could easily work with the documents and tools that they needed.

To make it easy for these shysters to search across everything (SharePoint and Documentum), an “Enterprise Search” site had been set up.

Trudy explained, in an excited voice, that they used something called “eResults” that let SharePoint crawl, and index, the documents in Documentum. The results were combined with the info that SharePoint had on the docs, etc, that it had in its own repositories, and when a search was done, the results were all displayed in one easy-to-read page.

A few of the lower level lawyers (the ones in the cheap suits) had complained that the search was not returning documents that they knew were in the system. It wasn’t until one of the partners of the law firm had complained that it became serious.

That’s when I was called…

to be continued…

…. Part 3

 
