11/30/17

Download all your email information using this Python 3.6 program

Python 3.6 program to download email information


I have been collecting my emails in Gmail for years and need to trim them to get more space. They were taking up over 5 GB and there are over 100,000 of them.

I looked for tools to do this, and the best I found was Thunderbird, but it did not satisfy me.

So I decided to write a program to download all the email information except the actual attachments, so that I could later write code to sift through the emails and get some analytics. That way I can trim batches of emails that meet arbitrary criteria that I control.

I looked at using the Python imaplib and email modules for this. But when putting together examples from the web, I ran into a lot of issues:


  1. speed: most examples only fetched one email at a time, while IMAP supports fetching a whole batch in one fetch
  2. character encoding errors, especially with emails from years ago
  3. lots of other details.
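
To illustrate the first issue, here is a minimal sketch of batched fetching. It is not the code from download_emails.py; the function names and the batch size of 500 are my own assumptions. It assumes conn is an already logged-in imaplib.IMAP4_SSL session with a mailbox selected, and that uids is a list of UID strings.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_headers_in_batches(conn, uids, batch_size=500):
    """Fetch size and headers for many UIDs using one FETCH per batch.

    `conn` is assumed to be a logged-in imaplib.IMAP4_SSL object with a
    mailbox already selected (hypothetical helper, not from the original).
    """
    fetched = []
    for batch in chunked(uids, batch_size):
        # A comma-separated UID list lets the server answer in one round trip.
        message_set = ",".join(batch)
        status, data = conn.uid(
            "FETCH", message_set, "(RFC822.SIZE BODY.PEEK[HEADER])"
        )
        if status != "OK":
            raise RuntimeError(f"FETCH failed for UIDs {batch[0]}..{batch[-1]}")
        # imaplib returns (envelope, payload) tuples interleaved with bare
        # b")" separators; keep only the tuples.
        fetched.extend(item for item in data if isinstance(item, tuple))
    return fetched
```

The UID list itself can come from conn.uid("SEARCH", None, "ALL"), splitting the first element of the returned data on whitespace and decoding each piece to str. BODY.PEEK is used so that fetching does not mark the messages as read.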


Well, I got the code working, and it downloads my 100,000 emails in about 40 minutes.

I then save the data for analysis to either a JSON file or a CSV file.
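
As a rough sketch of that saving step (not the actual code from the program), something like the following would write a list of per-email dictionaries to either format. The function name save_records and its parameter names are hypothetical.

```python
import csv
import json


def save_records(records, base_path, file_type):
    """Write a list of dicts to base_path + '.json' or base_path + '.csv'.

    Hypothetical helper: `base_path` has no extension, `file_type` is
    either "json" or "csv".
    """
    path = f"{base_path}.{file_type}"
    if file_type == "json":
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(records, fh, ensure_ascii=False, indent=1)
    elif file_type == "csv":
        # Use the union of all keys, in first-seen order, as the columns.
        fieldnames = []
        for record in records:
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)
        # newline="" is what the csv module expects for output files.
        with open(path, "w", encoding="utf-8", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError(f"unsupported file type: {file_type!r}")
    return path
```

In the CSV case, a list value such as the attachment filenames is written via str(), so it ends up in the cell in Python list format.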

Here are the fields in the files:

n: number of email from 1 to n in order fetched
From: information from email header
To: information from email header
Subject: information from email header
Date: information from email header
Received: information from email header (useful for getting the sender's IP address or computing the domain)
Rfc822msgid: unique message id (for Gmail, you can just paste this into the search box to pull up that email)
Size: total message size including attachments
uid: IMAP unique id, which can be used to fetch the same email again
Attachments: filenames of the attachments, in Python list format
text/plain: plain text of the email body, if present
text/html: if the email does not have plain text, then it will try to fill in the html text of the body
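
To make the fields above more concrete, here is a minimal sketch of how several of them could be pulled out of one raw message with the standard email module, with fallbacks for the character encoding errors mentioned earlier. This is my own illustration, not the code in download_emails.py: the helper names are hypothetical, and taking Rfc822msgid from the Message-ID header is an assumption. Size and uid are left out because they come from the IMAP fetch, not from the message itself.

```python
import email
from email.errors import HeaderParseError
from email.header import decode_header


def decode_header_safely(raw):
    """Decode an RFC 2047 header, replacing bytes that cannot be decoded."""
    try:
        chunks = decode_header(raw)
    except HeaderParseError:
        return str(raw)
    parts = []
    for text, charset in chunks:
        if isinstance(text, bytes):
            try:
                parts.append(text.decode(charset or "ascii", errors="replace"))
            except LookupError:  # charset name unknown to Python
                parts.append(text.decode("latin-1", errors="replace"))
        else:
            parts.append(text)
    return "".join(parts)


def decode_payload(part):
    """Return the decoded text of a non-multipart MIME part."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        return payload.decode("latin-1", errors="replace")


def extract_fields(raw_bytes):
    """Build a dict of header, attachment, and body fields for one email."""
    msg = email.message_from_bytes(raw_bytes)
    record = {
        name: decode_header_safely(msg.get(name, ""))
        for name in ("From", "To", "Subject", "Date")
    }
    # Assumption: Rfc822msgid corresponds to the Message-ID header.
    record["Rfc822msgid"] = msg.get("Message-ID", "")
    attachments, plain, html = [], None, None
    for part in msg.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if filename:
            # Record only the name; the attachment data is not kept.
            attachments.append(decode_header_safely(filename))
        elif part.get_content_type() == "text/plain" and plain is None:
            plain = decode_payload(part)
        elif part.get_content_type() == "text/html" and html is None:
            html = decode_payload(part)
    record["Attachments"] = attachments
    record["text/plain"] = plain or ""
    # The HTML body is only filled in when there is no plain text.
    record["text/html"] = "" if plain else (html or "")
    return record
```

Decoding with errors="replace" trades exactness for robustness: a badly encoded old email yields a few replacement characters instead of stopping a 100,000-message download.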

To use it, edit the code and set the username and password variables.
You must also set the ouput_filename variable to the path and name of the output file. Do not add an extension; that is set by the next variable, output_type, as either json or csv.

This program works for Gmail, but you must go to the Gmail settings and enable IMAP.
It has not been tested with other email services, but it may work with them if you adjust imapAddress and, if needed, the port number in the call (993 here):

ms = imaplib.IMAP4_SSL(imapAddress, port=993 ) # open imap session ms
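
For context, a minimal sketch of how that call might sit in a small connection helper is below. The function open_session and its parameter names are hypothetical and not taken from download_emails.py; only the imaplib calls themselves are standard.

```python
import imaplib


def open_session(imap_address, username, password, port=993, mailbox="INBOX"):
    """Open an IMAP-over-SSL session, log in, and select a mailbox read-only.

    Hypothetical helper; for another provider, pass its IMAP host name and,
    if it differs from 993, its SSL port.
    """
    ms = imaplib.IMAP4_SSL(imap_address, port=port)  # open imap session ms
    ms.login(username, password)
    # readonly=True so that nothing is flagged or changed on the server.
    status, _ = ms.select(mailbox, readonly=True)
    if status != "OK":
        ms.logout()
        raise RuntimeError(f"could not select mailbox {mailbox!r}")
    return ms
```

For Gmail the host is imap.gmail.com, so a call would look like open_session("imap.gmail.com", username, password).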


Enjoy.
Source code:  download_emails.py