Friday, February 22, 2008

Managing papers with GMail

As a graduate student, I read a lot of papers. I then want to write notes about these papers, categorize them, find them quickly, etc. Although this is a common problem for graduate students (or anyone else keeping track of documents), there are few free solutions that are any good. Thus, I rolled my own using GMail.

Available Solutions are Limited

Unfortunately, there aren't many free solutions for managing papers. In fact, the only decent one I've found is Richard Cameron's CiteULike. CiteULike provides all the necessities: online storage, tagging, metadata search, and note taking. It also has two other draws: one-click paper bookmarking from supported sites, and social features for sharing and collaboration.

However, CiteULike has a deal-breaker for me: its search capabilities are very limited. It provides keyword search only over paper titles, author last names, venues, and a part of the abstract (to the best of my knowledge, since it doesn't list what it searches). It does not search the paper's full text, or even your notes. This can make finding papers based on vaguely remembered information very difficult.

Using GMail to Manage Papers

To address CiteULike's limited search, I decided to manage papers with GMail. The basic idea is that I keep each paper and its notes in an email thread, and add further notes as replies to the thread. This lets me write richly formatted notes and use GMail's search over each paper's full text and any notes I've written.

Below, I describe the steps to set up the solution, add a new paper, take notes on a paper, and find a paper I've read. Finally, I compare the advantages of using this solution to using CiteULike.

Setup

Setup is trivial: it consists of creating a new GMail account for storing papers. I'll refer to this account as papers@gmail.com.

Adding a New Paper

After creating the account, I add new papers by sending email to papers@gmail.com. To ease finding the paper later, I use the following steps, which take only a minute or so:

  1. Start a new email to papers@gmail.com. Then, fill in the paper information. The key is to put the paper title as the subject, and to include the author name, venue, and any other metadata you may want to search for later in the body. The image to the right is an example.
  2. Attach the PDF or PS file of the paper to the email.
  3. Send the email. Since it's to me, it will appear in my inbox.
  4. Reply to the email with the full text of the paper (if necessary, delete any quoted text first). To get the text from the PS or PDF file, I use the pstotext or ps2ascii Linux programs. The xclip program is handy for putting the text in the clipboard, from which I paste it into the reply.

These steps accomplish three things. First, they store the PDF or PS of the paper in GMail. Second, they make the paper's full text searchable. Finally, they put the paper's author, venue, year, and other important data in an email with an attachment. This last point matters because the metadata email is the only one in the thread with an attachment, so I can search over just this information by restricting the search to emails with attachments (see the section on finding papers below).
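For anyone who would rather script these steps than do them by hand, here is a minimal sketch in Python using only the standard library. It only builds the two messages (the metadata email with the attachment, and the reply holding the full text); actually sending them is left out. The function names, the example.com message-ID domain, and the body layout are my own assumptions for illustration, and the full text is assumed to have been extracted already (for example with pstotext). The In-Reply-To and References headers are the standard way to mark a reply, which mail clients commonly use for threading.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Placeholder account name used throughout this post.
PAPERS_ADDRESS = "papers@gmail.com"


def build_paper_email(title, authors, venue, year, pdf_bytes, pdf_name):
    """Build the first message: title as subject, metadata in the body, paper attached."""
    msg = EmailMessage()
    msg["From"] = PAPERS_ADDRESS
    msg["To"] = PAPERS_ADDRESS
    msg["Subject"] = title
    # Hypothetical domain; any domain works for generating a unique ID.
    msg["Message-ID"] = make_msgid(domain="example.com")
    msg.set_content(f"Authors: {authors}\nVenue: {venue}\nYear: {year}\n")
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=pdf_name
    )
    return msg


def build_full_text_reply(first_msg, full_text):
    """Build the reply that holds the paper's extracted text (no attachment)."""
    reply = EmailMessage()
    reply["From"] = PAPERS_ADDRESS
    reply["To"] = PAPERS_ADDRESS
    reply["Subject"] = "Re: " + first_msg["Subject"]
    # Point back at the first message so the two land in one thread.
    reply["In-Reply-To"] = first_msg["Message-ID"]
    reply["References"] = first_msg["Message-ID"]
    reply.set_content(full_text)
    return reply
```

Keeping the attachment on the first message only is what makes the "emails with attachments" search trick work: the full-text reply and later notes never carry one.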

Taking Notes

Papers I have added but not finished reading are in my inbox. As I read a paper, I add notes by replying to the conversation from within GMail (first deleting the quoted text). Thus, my notes can use GMail's rich text features, such as lists and bolding.

Once I finish reading a paper, I apply the appropriate tags (GMail calls them labels) to the conversation. Finally, I archive the conversation.

Finding a Paper

To find a paper, I use GMail's search functionality. This searches the full paper text and all notes, and supports searching on tags and dates. Furthermore, due to how I add papers, I can find paper titles by restricting the search to email subjects, or restrict it to emails with attachments to find author names, venues, and other information in the first email of each paper.
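To make these restrictions concrete, here is a small sketch in Python that assembles search strings from GMail's search operators (subject:, has:attachment, label:, after:, and before:). The helper name build_query and its parameters are hypothetical; the resulting string is simply what I would type into the GMail search box.

```python
def build_query(terms="", subject_only=False, first_email_only=False,
                label=None, after=None, before=None):
    """Assemble a GMail search string for the paper account.

    subject_only     -- match paper titles only (the subject line)
    first_email_only -- match only the metadata email, which is the one
                        message per paper that has an attachment
    label            -- a tag applied to the conversation
    after, before    -- dates written as YYYY/MM/DD
    """
    parts = []
    if terms:
        parts.append(f"subject:({terms})" if subject_only else terms)
    if first_email_only:
        parts.append("has:attachment")
    if label:
        parts.append(f"label:{label}")
    if after:
        parts.append(f"after:{after}")
    if before:
        parts.append(f"before:{before}")
    return " ".join(parts)


# Find a paper by title words.
print(build_query("query optimization", subject_only=True))
# prints: subject:(query optimization)

# Find papers by author name, looking only at the metadata emails.
print(build_query("Smith", first_email_only=True))
# prints: Smith has:attachment

# Full-text search within one tag, limited to recent additions.
print(build_query("join", label="databases", after="2008/01/01"))
# prints: join label:databases after:2008/01/01
```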

Comparing Solutions

Given the above procedure, GMail can compete with CiteULike as a system for managing papers. However, though better in some ways, it is also limited in others.

Specifically, my solution has these limitations:

  • No one-click adding of papers from supported sites.
  • No automatic BibTeX generation. However, though not quite as good, BibTeX entries from CiteSeer, Google Scholar, or other sites can still be saved as notes.
  • Can't easily edit existing notes. Instead, must copy and paste the old note into a new note, then delete the original.
  • No social or community features, such as sharing papers.

However, my solution has these advantages:

  • Can search the full text of papers and notes.
  • Supports more sophisticated searches, including dates.
  • Richly formatted notes, and a nice interface for writing and reading them.
  • Can easily print or forward one or all notes about a paper (tip: before printing/forwarding all notes, delete the note containing the full paper text, then restore it afterwards).

Depending on which advantages matter more to you, it may be worth giving this a try.

6 comments:

Anonymous said...

I always did find this a curiously elegant and simple means of organization. :) Maintainable and clear.

Pedro DeRose said...

Yeah, it works surprisingly well. Someday I may write a Firefox extension or somesuch to add one-click paper bookmarking. Then it'd compare even better with CiteULike.

Madalin said...

You might want to try zotero inside firefox.

Pedro DeRose said...

madalin: You might want to try zotero inside firefox.

Looking at the web site, Zotero does look very promising. I'll have to try it out and make a post!

ymerej said...

Did you consider Google Docs to do this?

Pedro DeRose said...

In truth, I didn't consider Google Docs. However, to my understanding, it wouldn't support searching within the PDF files. Also, I'm not sure the interface would end up as nicely for skimming papers and notes; GMail's conversation interface works out surprisingly well.

Actually, something I found recently and am considering is Evernote. It has rich notes, and I think it can search within PDFs.