As an editor and information architect, I find the topic of creating indexes somewhat interesting, although like many things that are theoretically interesting, the actual application of it is sheer tedium. As a result I’ve been putting off indexing the new book Fly Fishing in the Russian Far East because I knew it was going to be a long and tedious process. However, the book really needed one and we needed to get it done. (Does anybody have a copy of the first Joy of Homebrewing, before it was indexed? Great book, nearly useless.)
At any rate, a beautiful book needs a beautiful index. I finally sat down to do it and realized I’d had a stroke of good luck, as the author had thought about this a little bit and sent me an Excel file with a list of categories and all of the words in each category. I realized that what I had here was a concordance file. Now strictly speaking a concordance is a list of words in a book (for instance, all of the times “Jesus” appears in the Bible), while an index is a list of ideas. In other words, sometimes we’ll go to an index looking for a certain word, but often we are looking for an answer to a question: “What flies should I use for grayling?” As a result there are a fascinating number of classifications and taxonomies in use, and each book will pretty much require a unique solution. In our case, the author, Mikhail Skopets, had divided it up into Geographic Regions, Species, Flies, and General. This is pretty much exactly how I would use the index to answer questions about the book. Once again, when I went off to research just how to do this, I couldn’t find the info, so once I figured it out I thought I would share.
Tip: As an aside, if you ever get stuck on any taxonomy problem, it always helps to start your organization by the old who, what, where, why, when, and how. Here we have the Where, What, and How covered. The General category then picks up all of the subheads such as “Fishing Techniques for Stargazers” which were too granular to keep in the TOC. I think you can effectively answer any question about finding information in this book with this scheme.
In general there are two ways to index a book in Word. You can go through the entire book manually and mark up each word, term, or section you think you want to index; then you go to References | Insert Index and it will generate a very nice alphabetic index, complete with subentries and cross references. The other way is to build a concordance file and have Word go through and find all of the words in the file.
I’m not going to be one of those bloggers who reposts a help file or somebody else’s tutorial and calls it content. There is a lot of information out there on how to build Word indexes. However, to get what we wanted, an alphabetico-classed index for the semantically inclined, Word doesn’t get you there unless you want to manually edit the file, so this post will tell you how to do it automatically using Word.
There are some caveats here and I’m still in proof-of-concept so I’ll call them out, but I hope you find this helpful.
Caveat 1: Word is a word processing platform, not a publishing platform. It’s not really made for books, so we have to bend it up a little bit.
As I said, what Mikhail had given me was a concordance file. Word doesn’t get everything right, but in this case Microsoft did a smart thing. A concordance file is a list of words in a document (book) that you want to find. What makes Word’s version brilliant is that it uses a two-column table for the concordance file. The first column is the list of words you are looking for; the second column is the idea you want to link to that word.
This makes it much more usable than a strict concordance file. For instance, in the left column you can see the actual text from the book, including things like the scientific name. Odds are pretty good that while the scientific name for round whitefish is of passing interest to me, unless I’m a professional ichthyologist (which I almost did become at one point) I won’t be using that term to look them up in the index. So in the right column are the terms that you will see in the index to find the text in the left column. Therefore, the text “Round Whitefish (Prosopium cylindraceum)” will be in the index under its full text name, “Round Whitefish (Prosopium cylindraceum)”; its common name, “Round Whitefish”; and its Latin name, “Prosopium cylindraceum.” You have to make lots of choices like this, and I’m still editing the concordance file to optimize it.
Tip: Using a concordance file lets you iterate through your index design much more easily than manually marking up text.
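To make the two-column idea concrete outside of Word, here is a rough Python sketch of the same structure. The `CONCORDANCE` list and the `entries_for` helper are purely illustrative names, not anything Word provides; the sketch only models the description above, where one piece of book text fans out to several index entries.

```python
# A concordance modeled as rows of (text to find in the book, entry shown in the index).
# The same left-column text can appear on several rows, one per index entry.
CONCORDANCE = [
    ("Round Whitefish (Prosopium cylindraceum)", "Round Whitefish (Prosopium cylindraceum)"),
    ("Round Whitefish (Prosopium cylindraceum)", "Round Whitefish"),
    ("Round Whitefish (Prosopium cylindraceum)", "Prosopium cylindraceum"),
]


def entries_for(text, concordance):
    """Return every index entry linked to a given piece of book text."""
    return [entry for find, entry in concordance if find == text]


whitefish_entries = entries_for("Round Whitefish (Prosopium cylindraceum)", CONCORDANCE)
print(whitefish_entries)
```

Keeping the rows as plain data like this is also what makes the iteration cheap: you change a row and rerun, rather than hunting through the manuscript.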
Caveat 2: Word doesn’t always behave as advertised.
Notice in a couple of places I’ve appended the term “Latin:” to the terms in the right column? Well that’s because in theory that should give me a heading with subheadings in the index file, like so:
Coregonus ussuriensis, 99
Coregonus subautumnalis, 234
So far, I have not gotten this to work. Big sad face, because that would be pretty sweet. And while we are at it, I had taken the 20 or so separate chapters and put them into a Master Document, the way you are supposed to build long documents in Word, but I could not get the collapsed chapters to expand ever again and therefore could not index them. I tried this in both Word 2007 and 2010. Master Documents have long been known to be poorly implemented (in fact, in my VBA days I wrote my own implementation so we could use that functionality at Microsoft), but after fighting with it for a couple of hours and rebuilding multiple versions, I just gave up and made one long document. Your results may vary, and if you can get Master Documents to work, this indexing technique should still theoretically work.
Okay, so I took the all-up concordance file and made an index. To do this you will first have to make your own concordance file, which is just a two-column table, and save it as a document. (This is no small feat on its own, by the way, but still a time saver; to ease it I suggest collecting terms as you write and edit.) Go to References | Index and click Insert Index (in 2010; for 2007, well, you’ll figure it out):
Click AutoMark, then navigate to and choose your concordance file. Word then passes through the document and inserts an XE (index entry) field code next to the matches for each term in the left column.
You can click on and edit these if you need to add subentries, etc. So for instance, I could go back through and add my “Latin names:” bit here to make that work. These will expand your document and change pagination. You can turn them off by either ALT+F9 (I think) or hitting the little paragraph marker thingy on the toolbar (the Show/Hide control).
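For reference, the codes in question are XE (index entry) fields, and in Word’s field syntax a colon separates a main entry from its subentry. The tiny Python helper below (`xe_field` is a made-up name, not a Word API) just builds that text so you can see what a hand-edited field would look like.

```python
def xe_field(main, sub=None):
    """Build the text of a Word XE field; a colon separates main entry and subentry."""
    entry = main if sub is None else f"{main}:{sub}"
    # The braces stand in for Word's field braces, which are inserted with CTRL+F9
    # rather than typed as ordinary characters.
    return '{ XE "' + entry + '" }'


plain = xe_field("Coregonus ussuriensis")
nested = xe_field("Latin names", "Coregonus ussuriensis")
print(plain)
print(nested)
```

The second call is the shape you would be aiming for with the “Latin names:” idea: one heading, with each species as a subentry beneath it.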
Caveat 3: Concordance files do not automatically give you all of the subentries and see alsos you would get from manual tagging. But that’s okay.
Because while I called this Creating Categorized Indexes in Word (and it is Indexes and not Indices, btw), what we are really going to do is Insert Multiple Indexes in Word. So you can mix and match techniques to suit your fancy.
Okay, so the first time you hit References | Index, click Insert Index, and run AutoMark, it places the index entries (the field codes) in the book. Now go to the end of the book, place your cursor there, and do it again, this time clicking OK. This places the actual index in the book. If you just run an all-up concordance file you get a very nice, not very useful (for this book) alphabetized index. I inserted the “Index” heading here, but Word formats the rest in several canned styles, which you can edit manually if you want:
Caveat 4: Manually editing Word References (TOC, indexes, etc.) is a bear. But it doesn’t matter.
Because if you are doing this for publication you are probably going to import this into InDesign or another actual publishing program and do all of your styling there. If you need to repaginate, etc., you can right-click on the index and choose to either update pages or update the whole file. You might want to do this, for instance, after you hide the field codes.
Tip: Earlier I mentioned that Word is for documents, by which I mean electronic documents, and not for published books. As such, the index is actually linked to the pages and you can CTRL+SHIFT a topic to go to it in the document. Okay, well, I never put an index in something I wasn’t printing and don’t find this so valuable. So, if you are absolutely, positively done and want to format the index for printing, then it’s much easier to right-click the index to highlight it and hit CTRL+SHIFT+F9 to break all of the links. That is not an easy hot key sequence to find, so you might want to file it away.
So, that was to show how to use a concordance file. Breaking the index up into categories takes just a few more simple steps. First, break your concordance file into multiple files: Concordance_fish, Concordance_geo, etc. (I recommend using the Split Table feature, if you don’t know about it.) But also keep an all-up version. Again, you pretty much want to do this once your all-up version is completely edited, so I would do my trial runs there and then edit the separate docs. You’ll see why in a bit.
Then, if you have already indexed the book, remove all of the existing field codes by going to Replace | Special | Field to get ^d for the search term, and leave Replace with: blank:
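If your all-up list lives somewhere scriptable (say, the original Excel sheet exported as rows), the split itself is mechanical. Here is a minimal Python sketch, assuming a hypothetical extra category column on each row; the sample rows are made up for illustration, and the output names simply follow the Concordance_fish, Concordance_geo pattern above.

```python
from collections import defaultdict

# Hypothetical all-up concordance with an extra category column:
# (category, text to find, index entry). These rows are invented samples.
ALL_UP = [
    ("fish", "Round Whitefish (Prosopium cylindraceum)", "Round Whitefish"),
    ("geo", "Kamchatka", "Kamchatka"),
    ("fish", "Taimen", "Taimen"),
    ("general", "Fishing Techniques", "Fishing Techniques"),
]


def split_by_category(rows):
    """Group rows into one two-column concordance per category."""
    tables = defaultdict(list)
    for category, find, entry in rows:
        tables[f"Concordance_{category}"].append((find, entry))
    return dict(tables)


tables = split_by_category(ALL_UP)
for name, table_rows in tables.items():
    print(name, len(table_rows))
```

Each resulting table would still need to be pasted into its own Word document as a two-column table before AutoMark can use it.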
Sorry, I don’t have SnagIt on this computer so I can’t capture the Special menu, but you’ll figure it out. It’s also a handy search and replace to remember, by the way. Oh, and delete the index if you have one. Okay, now you have a document with no field markers and no index. Go to your concordance files and pick the one that represents the category you want first. Use that file to index the book, then insert the index. Remove all of the field codes in the document, index with the next file, and so on. Each time, Word is smart enough to ask whether you want to replace the existing index; say no. If you don’t remove the field codes between passes, then on each pass you will get the full index from all of the files you have run so far. Now if you really, really need the field codes to be hyperlinked to the index, I think you can run the all-up file one last time at the end. It will certainly add the codes to the text; whether you can link them back to the indexes in the back I haven’t fully tested yet.
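If the reason for stripping the field codes between passes isn’t obvious, this toy Python model of the procedure may help. It is only a simulation under simplifying assumptions (pages are plain strings, matching is a substring test, and every function name and sample page is invented); it does not drive Word in any way.

```python
def automark(pages, concordance, marks):
    """Add a (page, entry) mark for each concordance term found on a page."""
    for page_no, text in pages.items():
        for find, entry in concordance:
            if find in text:
                marks.append((page_no, entry))


def build_index(marks):
    """Build an index from every mark currently sitting in the document."""
    index = {}
    for page_no, entry in marks:
        index.setdefault(entry, []).append(page_no)
    return {entry: sorted(set(nums)) for entry, nums in sorted(index.items())}


def categorized_indexes(pages, concordances, strip_between_passes=True):
    """Run one mark-and-insert pass per category concordance."""
    marks, result = [], {}
    for name, concordance in concordances.items():
        if strip_between_passes:
            marks.clear()  # stands in for the Replace ^d step
        automark(pages, concordance, marks)
        result[name] = build_index(marks)
    return result


# Invented sample pages and concordances, for illustration only.
PAGES = {12: "Taimen in the Amur basin", 40: "Grayling on the Amur"}
CONCORDANCES = {
    "Species": [("Taimen", "Taimen"), ("Grayling", "Grayling")],
    "Geographic Regions": [("Amur", "Amur")],
}

clean = categorized_indexes(PAGES, CONCORDANCES)
leaky = categorized_indexes(PAGES, CONCORDANCES, strip_between_passes=False)
print(clean["Geographic Regions"])
print(leaky["Geographic Regions"])
```

In the `leaky` run, the Geographic Regions index also picks up the species entries, because the marks from the earlier pass are still in the document when the second index is built.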
Remember I was saying that this allows you to iterate through your design? Well, there is a great example in this shot. In the text in the chapters for each fish species there is a subheading called “Fishing Techniques.” Here you can see that by using that term in the concordance file it picked them all up, and there are a lot of them because Mikhail has really studied them and this is one of the strengths of the book. But this pile of page numbers would be pretty frustrating if I just want to go Asp fishing for the afternoon. So, I think I’ll actually go through the text and add the fish name to each of these subheadings and then add all of those entries separately into the concordance file. That way I’ll get Steelhead Fishing Techniques, Taimen Fishing Techniques, etc. They will also be put under the correct category (General). Likewise with “Habitat and Biology.” Information design is iterative. This is a very powerful tool.
Caveat 5: You absolutely want to do this as the very last thing.
I’ll be honest, we are just now importing this into InDesign, and while I take it InDesign will bring over the Word entries, once we add images and stuff over there the page numbers will all have to be manually adjusted at the end of the process, but I take it that happens anyway. I did however create a categorized (alphabetico-classed, if you will) index file that at least gives me the relative placement of the terms. And despite fighting with Master Document for over two hours and doing a fair bit of editing on the concordance file, I generated a 1,795-entry index in under 4 hours. Again, I think the blog took longer than the actual work. I’m sure if you put this in your bag of tricks that you will be able to expand on the technique and refine its application. If so, please give me a shout! Damn, I just broke 2000 words on this “short” post.
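Generating those per-species rows by hand would be tedious, so here is a short Python sketch of how they could be produced. The species list holds only the names mentioned in this post, not the book’s full list, and `species_rows` is a hypothetical helper; it assumes the subheadings in the text have already been renamed to include the fish name.

```python
# Species named in this post; illustrative, not the book's complete list.
SPECIES = ["Steelhead", "Taimen", "Asp"]
SUBHEADS = ["Fishing Techniques", "Habitat and Biology"]


def species_rows(species, subheads):
    """Return concordance rows such as ('Taimen Fishing Techniques', same text)."""
    rows = []
    for name in species:
        for subhead in subheads:
            term = f"{name} {subhead}"
            rows.append((term, term))  # find text and index entry are identical here
    return rows


rows = species_rows(SPECIES, SUBHEADS)
for find, entry in rows:
    print(find)
```

The rows would then go into whichever category file should hold them (General, in this scheme).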
- Write a book
- Write a concordance file (do it as you write or edit the book)
- Break it into categories
- Insert an index based on the first category
- Remove index fields
- Insert second index
- Do not replace existing
- Repeat until done
To read more from Jon Tobey, visit his home blog here.