11/30/17

Download all your email information using this Python 3.6 program



I have been collecting my emails in Gmail for years and need to trim them to free up space. They were taking up over 5 GB, and there are over 100,000 of them.

I looked for tools to do this, and the best I found was Thunderbird, but it did not satisfy me.

So I decided to write a program to download all the email information except the actual attachments. Later I can write code to sift through the emails and get some analytics, so I can trim batches of emails that meet arbitrary criteria that I control.

I looked at using the Python imaplib and email modules for this. But when putting together examples from the web, I ran into a lot of issues:


  1. speed: most examples only fetched one email at a time, when IMAP supports fetching batches in one fetch
  2. character encoding errors, especially with emails from years ago
  3. lots of other details.
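
The batching idea in item 1 can be sketched as follows. This is a minimal sketch, not the code from download_emails.py: the function names chunk_uids and fetch_headers_in_batches and the batch size of 500 are assumptions made for illustration. It assumes session is an imaplib.IMAP4_SSL object that is already logged in with a mailbox selected.

```python
import imaplib  # the session passed in below is an imaplib.IMAP4_SSL object


def chunk_uids(uids, batch_size=500):
    """Yield comma-separated UID strings, batch_size UIDs per string.

    Hypothetical helper; accepts UIDs as bytes (as imaplib returns them)
    or as str/int.
    """
    for start in range(0, len(uids), batch_size):
        batch = uids[start:start + batch_size]
        yield ",".join(
            u.decode() if isinstance(u, bytes) else str(u) for u in batch
        )


def fetch_headers_in_batches(session, batch_size=500):
    """Yield (envelope, header_bytes) tuples for every message.

    Hypothetical helper: one FETCH command is sent per batch of UIDs
    instead of one per message.
    """
    status, data = session.uid("search", None, "ALL")
    uids = data[0].split()
    for uid_set in chunk_uids(uids, batch_size):
        # BODY.PEEK avoids marking the messages as read.
        status, data = session.uid(
            "fetch", uid_set, "(RFC822.SIZE BODY.PEEK[HEADER])"
        )
        # imaplib returns message data as tuples; the bare bytes items
        # between them are only closing parentheses.
        for item in data:
            if isinstance(item, tuple):
                yield item
```

Each yielded tuple holds the server's response line and the raw header bytes, which can then be handed to the email module for parsing.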


Well, I got the code working, and it downloads my 100,000 emails in about 40 minutes.

I then save the data for analysis to either a JSON file or a CSV file.

Here are the fields in the files:

n: number of the email, from 1 to n, in the order fetched
From: from the email header
To: from the email header
Subject: from the email header
Date: from the email header
Received: from the email header (useful for finding the sender's IP address or computing the domain)
Rfc822msgid: unique message ID (in Gmail, you can paste this into the search box to pull up that email)
Size: total message size, including attachments
uid: IMAP unique ID, which can be used to fetch the same email again
Attachments: attachment filenames, in Python list format
text/plain: plain text of the email body, if present
text/html: HTML text of the body, filled in only if the email has no plain text
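
To illustrate how fields like these can be pulled out of a raw message, here is a minimal sketch using the standard email package. It is not the code from download_emails.py: the names summarize_message and body_text are hypothetical, the mapping of Rfc822msgid to the Message-ID header is an assumption, and the n and uid fields are left out because they come from the IMAP session rather than the message itself.

```python
from email import policy
from email.parser import BytesParser


def body_text(msg, kind):
    """Return the 'plain' or 'html' body as str, or '' if there is none."""
    part = msg.get_body(preferencelist=(kind,))
    if part is None:
        return ""
    try:
        return part.get_content()
    except (LookupError, UnicodeError):
        # Unknown or broken charset (common in old mail): fall back to
        # UTF-8 and replace undecodable bytes instead of failing.
        payload = part.get_payload(decode=True) or b""
        return payload.decode("utf-8", errors="replace")


def summarize_message(raw_bytes):
    """Build one record (a dict) from the raw bytes of a message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    record = {
        key: str(msg.get(key, ""))
        for key in ("From", "To", "Subject", "Date")
    }
    record["Received"] = [str(h) for h in msg.get_all("Received", [])]
    # Assumption: Rfc822msgid corresponds to the Message-ID header.
    record["Rfc822msgid"] = str(msg.get("Message-ID", ""))
    record["Size"] = len(raw_bytes)
    # Only filenames are kept; attachment contents are not stored.
    record["Attachments"] = [
        part.get_filename()
        for part in msg.iter_attachments()
        if part.get_filename()
    ]
    plain = body_text(msg, "plain")
    record["text/plain"] = plain
    # HTML is filled in only when there is no plain-text body.
    record["text/html"] = "" if plain else body_text(msg, "html")
    return record
```

Using policy.default makes the parser decode encoded header words (for example in Subject) into ordinary strings, which takes care of many of the header encoding problems.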

To use it, edit the code and set the username and password variables.
You must also set the ouput_filename variable to the path and name of the output file. Do not add an extension; that is determined by the next variable, output_type, which is either json or csv.
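
As a rough sketch of how one filename plus an output type can drive both formats, consider the following. The function save_records and its parameter names are hypothetical and are not taken from download_emails.py; it assumes each email has already been turned into a dict of fields.

```python
import csv
import json


def save_records(records, output_filename, output_type="json"):
    """Write a list of dicts to output_filename + '.json' or '.csv'.

    Returns the full path that was written.
    """
    path = f"{output_filename}.{output_type}"
    if output_type == "json":
        with open(path, "w", encoding="utf-8") as f:
            json.dump(records, f, ensure_ascii=False, indent=1)
    elif output_type == "csv":
        # Collect column names in first-seen order across all records.
        fieldnames = []
        for record in records:
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)
        # newline="" is what the csv module expects for output files.
        with open(path, "w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError("output_type must be 'json' or 'csv'")
    return path
```

In the CSV case, a list value such as the attachment filenames is written as its Python string form, so it can be read back later with ast.literal_eval if needed.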

This program works for Gmail, but you must go to the Gmail settings and enable IMAP.
This program has not been tested with other email services, but it may work with them if you adjust imapAddress and, if needed, add the port number to the call (change the number 993):

ms = imaplib.IMAP4_SSL(imapAddress, port=993 ) # open imap session ms


Enjoy.
Source code:  download_emails.py

7 comments:

  1. Solved:
    https://stackoverflow.com/questions/25413301/gmail-login-failure-using-python-and-imaplib
    https://github.com/google/gmail-oauth2-tools/wiki/OAuth2DotPyRunThrough
    https://pymotw.com/2/imaplib/

  2. This comment has been removed by a blog administrator.

  3. Hi Gerry! What an awesome piece of code. Thanks for all your time.
    I'm very new to Python and am running into only one problem: I've modified it to suit my needs and it runs perfectly in Thonny. But when I run the file from Terminal in macOS, I get an error saying:
    cannot import name BytesParser
    I have installed email using pip install, as I know that Thonny keeps its own Python libraries. Any suggestions?

    Thanks
    Bruce

  4. It only downloaded the first 50 emails from Gmail. Any idea how I can download all the emails in one go?

  5. I used python Jupyter notebook and saved it to csv format.

  6. While using the Jupyter notebook, I'm getting the following errors:

    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    in
    26 for id,msg in enumerate((m[1] for m in data if isinstance(m,tuple))):
    27 pos = i + id + 1
    ---> 28 parts = decode_email(msg, pos, key_map)
    29 pos = i + id + 1
    30 parts['uid'] = str(int(uids[pos - 1]))

    in decode_email(msg_str, pos, key_map)
    3 p = BytesParser()
    4 message = p.parsebytes(msg_str) # get header
    ----> 5 parts = parse_parts(message, key_map) # add header parts specified in key_map
    6 parts['Size'] = len(msg_str)
    7 plain_body = ''

    in parse_parts(msg, key_map)
    8 f = key_map[hkey]
    9 if f:
    ---> 10 fparts = f(raw)
    11 for k in fparts: parts[k] = fparts[k]
    12 else: parts[hkey] = raw

    TypeError: 'str' object is not callable
    -------------------------------------------------------------------------

    Can anyone help me solve this?


Please comment or give suggestions for follow-up articles.