Feb. 11, 2019, noon

How to automate repetitive Microsoft Word Documents using Python 3.

I remember a couple of years ago at my internship, I had to edit and update the values of a generic report to send to multiple clients. All I had to do was change the company name, the year of the report, and the table values. It was a pretty generic report, so it didn't require much editing. Still, I remember how tedious it was to open the document, find the values I had to replace, and save it. I would often double-check that I had replaced all the values, and I had to do this for multiple reports. Back then neither my Python skills nor my VBA skills were up to par, so I never thought about trying to automate the process.

Now, with my graduation imminent, I have started looking for jobs (I mean, I always have been, but now it's critical that I do so). One of the things I found extremely tedious was writing cover letters. I know I should be writing a personalized cover letter for each company, but most of the jobs I've been searching and applying for have pretty much the same requirements, so I wrote a generic one. Even so, opening the cover letter document and changing the company name, the position, and where I found the job each time was quite tedious. So I decided to automate the process: all I have to do is enter the company name, the position, and where I found the posting in the terminal, and voila, a cover letter in less than a second!

 

 

How to do it (assuming you already have Python installed)

We're going to use the python-docx package for this simple script. This package allows you to write a new Word document or edit an existing one. In cmd (on Windows) or your terminal, enter the following:

 

pip install python-docx
 
 
Now in your text editor or IDE, we import the package.
 
 
#main.py
from docx import Document
from docx.shared import Inches
 

 

From reading the python-docx documentation, I only need three things: the .add_paragraph method to add paragraphs to the document, the .paragraph_format.first_line_indent property to indent the first line of each paragraph, and the .save method to save the document. (Note: the full documentation also shows you how to create tables, headings, change font size, etc.)

To reduce the number of lines we write in this program, we only need to create two functions.

 

def savefile(filename):
    filesave = "{whatever}_Lorem_Impsum.docx" #the file that it will be saved as
    fileformat = filesave.format(whatever=filename) #replaces the {whatever} with filename 
    return doc.save(fileformat) #saves the file under the formatted name
 
 
In this function, the filename variable will be whatever you want to input. We use that input through .format, which puts it in place of {whatever}. This will make more sense later when we see what the filename variable will be. It is important to end the string with .docx, because python-docx writes files in the .docx format. If you use a different extension or leave it off, you may not be able to open the file. We then call doc.save to save our file, where doc is the Document object we create later in the main part of the program. This function is useful because it lets us tell apart the different files we have created.
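To check the naming step on its own, here is a minimal sketch. The names build_filename and save_letter are hypothetical (they are not part of python-docx), and unlike savefile, the document is passed in as an argument instead of being read from a global doc.

```python
def build_filename(company, template="{whatever}_Lorem_Impsum.docx"):
    """Return the output filename for a company (hypothetical helper)."""
    return template.format(whatever=company)


def save_letter(doc, company):
    """Save doc under the generated name and return that name.

    doc is assumed to be a python-docx Document, or any object
    that offers a .save(path) method.
    """
    path = build_filename(company)
    doc.save(path)
    return path


print(build_filename("Apple"))  # Apple_Lorem_Impsum.docx
```

Returning the path makes it easy to print exactly which file was written.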
 
 
def addpara(paragraph):
    add = doc.add_paragraph(paragraph) #add paragraph
    add.paragraph_format.first_line_indent = Inches(0.25) #indents the first line of the paragraph
    return add #return the paragraph in case we want to format it further
 
 
With this addpara function, we can add a paragraph and indent its first line with a single call later on.
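If you would rather not depend on a global doc, a variant could take the document and the indent as arguments. This is only a sketch: add_indented_paragraph is a hypothetical name, doc is assumed to be a python-docx Document, and indent is assumed to be a length such as Inches(0.25).

```python
def add_indented_paragraph(doc, text, indent):
    """Add text as a new paragraph and indent its first line.

    Assumptions: doc is a python-docx Document and indent is a length
    such as docx.shared.Inches(0.25). Both are passed in explicitly
    rather than read from module-level variables.
    """
    para = doc.add_paragraph(text)
    para.paragraph_format.first_line_indent = indent
    return para  # the paragraph object, in case more formatting is needed
```

With python-docx installed, a call would look like add_indented_paragraph(Document(), "Hello", Inches(0.25)).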
 
Now we begin writing the program.
 
if __name__ == '__main__':
    count = 0
    tries = 1
    while count < tries: #count never changes, so the loop repeats until you close the program
        doc = Document()
        comp = input("Company name? ")
        job_title = input("Position applying for: ")
        source = input("Where did you find the job posting? ")
 
 
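Here we have a while loop that keeps the program from closing and allows you to create as many documents as you want. Because count is never increased, the condition count < tries stays true and the loop repeats until you close the program. Each input(...) call lets you type an answer when prompted while the program runs. We store each answer in a variable so we can use it later on.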
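If you would prefer a loop that can end without closing the window, one option is to stop when the company name is left blank. The sketch below is an alternative to the loop in this post, not the same thing: collect_answers is a hypothetical name, and the ask parameter exists only so the function can be tried without typing at a prompt.

```python
def collect_answers(ask=input):
    """Yield (company, job_title, source) until the company is left blank.

    ask defaults to the built-in input. Accepting another callable is an
    assumption of this sketch that makes the loop easy to test.
    """
    while True:
        company = ask("Company name? (leave blank to quit) ").strip()
        if not company:
            break  # an empty answer ends the loop
        job_title = ask("Position applying for: ").strip()
        source = ask("Where did you find the job posting? ").strip()
        yield company, job_title, source
```

You would then write for company, job_title, source in collect_answers(): and build one document per iteration.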
 
 
...........
    while count < tries:
        doc = Document()
        comp = input("Company name? ") #input allows us to enter a value when we run the program
        job_title = input("Position applying for: ")
        source = input("where did you find the job posting? ")

        #Start the cover letter 
        Dear = "Dear {company}'s Hiring Manager," 
        format_dear = Dear.format(company=comp) 

 

The .format(company=comp) will replace the {company} from the string with variable comp. So for example, 

 

>>> comp = "Apple"
>>> Dear = "Dear {company}'s Hiring Manager,"
>>> format_dear = Dear.format(company=comp)
>>> print(format_dear)
Dear Apple's Hiring Manager,

 

Now we add another paragraph. Since we are creating a document from scratch, we will have to type what we want to say here and set it as a variable just like what we did previously. This time we will be using triple quotes because it will allow us to write a multi-line string.

 

...............
        #Start the cover letter 
        Dear = "Dear {company}'s Hiring Manager," 
        format_dear = Dear.format(company=comp) 
        addpara(format_dear) 
        #new paragraph
        firstPara = """Lorem Ipsum blah di blah blah {job_title} blah di blah from {source} blah blah blah blah blah blah blah blah cool no way blah blah \
blah blah blah"""
        format_first = firstPara.format(job_title=job_title, source=source)  #replaces the {..} with the variables we set as the input
        #call on our addpara function
        addpara(format_first) #paragraph now created

 

Notice the backslash at the end of the first line of the string. This backslash is important because it continues the string onto the next line. If you start a new line inside the triple-quoted string without the backslash, Python keeps the line break as part of the string, and it will mess up the formatting of the document. In addition, notice that our formatting line fills in two variables. We can fill in as many variables as we want, as long as we separate them with commas.
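To see what the backslash does, here is a small comparison you can run on its own. The variable names are made up for this illustration.

```python
# A backslash at the end of the line joins it to the next line.
with_backslash = """first line \
second line"""

# Without the backslash, the line break stays inside the string.
without_backslash = """first line
second line"""

print(repr(with_backslash))     # 'first line second line'
print(repr(without_backslash))  # 'first line\nsecond line'
```

Note the space before the backslash: the two lines are joined exactly as written, so without it the last word of one line would run into the first word of the next.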

Now if we don't have anything to replace, we simply skip the .format line and just use the addpara function like so:

 

...............
        secondPara = """hello this is my second paragraph blah di blah di blah""" 
        addpara(secondPara)

 

Once you feel that you are ready to save, we can simply call on our savefile function we wrote earlier. 

 

............... 
        secondPara = """hello this is my second paragraph blah di blah di blah""" 
        addpara(secondPara)

        #Saving the file

        savefile(comp) #use our savefile function to save the file
        print(comp, "file has been saved") #printing it lets us know that it executed the previous line

 

Here I pass the variable comp to the savefile function because I want to name the file after the company. You can change this to whatever you want. The print statement confirms that the save on the previous line was executed. With that, the file is saved.
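One way to keep the wording of the letter separate from the code that writes the document is to build all the paragraph strings in one function. This is a sketch under my own assumptions: build_paragraphs is a hypothetical name, and the text is the same placeholder wording used in this post.

```python
def build_paragraphs(company, job_title, source):
    """Return the letter's paragraphs as a list of strings (placeholder text)."""
    greeting = "Dear {company}'s Hiring Manager,".format(company=company)
    first = (
        "Lorem Ipsum blah di blah blah {job_title} "
        "blah di blah from {source} blah blah blah"
    ).format(job_title=job_title, source=source)
    second = "hello this is my second paragraph blah di blah di blah"
    return [greeting, first, second]


for text in build_paragraphs("Apple", "Data Analyst", "LinkedIn"):
    print(text)
```

Each string in the returned list can then be handed to addpara in a loop, and the wording can be checked without creating a Word file.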

 

How to run the program

To use this file, simply go to the folder where it is saved and double-click it. This should open the terminal/cmd and prompt you with the first question. Alternatively, open a terminal in that folder and run python main.py. Type in your answer, then press Enter, and you will be prompted with the next question. After you answer the last question and press Enter, the document is saved in the same folder as the Python file. The program then starts over with the first question and keeps running until you close the window.

 

Conclusion

This was a quick, dirty and simple tutorial on how to automate changing select words, phrases, and values in Word documents without even having to open the document itself. While my example is very simple, reading the full documentation will let you automate other things, such as creating tables and changing their values.