w3resource

Mastering python-docx: Create and Manipulate Word Documents

In today's digital age, automating document creation is a valuable skill for developers, data analysts, and educators. The python-docx library is a powerful, open-source Python package that allows you to programmatically create, read, and modify Microsoft Word (.docx) files without needing Microsoft Word installed.

Whether you're generating reports, invoices, educational materials, or templates, python-docx makes it easy to handle Word documents in your Python scripts.


What is python-docx?

python-docx is a Python library for working with .docx files (the format used by Microsoft Word 2007 and later). It supports:

  • Creating new Word documents from scratch
  • Reading and extracting content from existing documents
  • Modifying documents (adding text, images, tables, etc.)
  • Applying formatting like bold, italic, fonts, colors, and alignment

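It does not support the older binary .doc format. When you open and re-save an existing file, python-docx generally leaves in place any content it does not model (for example, footnotes or embedded charts), so that content usually survives the round trip.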

The official documentation is available at python-docx.readthedocs.io, and the source code is on GitHub.

Installation

Installing python-docx is straightforward using pip:


pip install python-docx

Typical output (paths and version numbers will vary):

Collecting python-docx
  Downloading python_docx-1.2.0-py3-none-any.whl.metadata (2.0 kB)
Requirement already satisfied: lxml>=3.1.0 in i:\users\me\anaconda3\lib\site-packages (from python-docx) (5.3.0)
Requirement already satisfied: typing_extensions>=4.9.0 in i:\users\me\anaconda3\lib\site-packages (from python-docx) (4.12.2)
Downloading python_docx-1.2.0-py3-none-any.whl (252 kB)
Installing collected packages: python-docx
Successfully installed python-docx-1.2.0

pip also installs the required dependencies, lxml and typing_extensions, if they are missing. In the output above, both were already installed.

Getting Started: Creating a Simple Document

Let's start with the basics.

Python Code :


from docx import Document

# Create a new Document
doc = Document()

# Add a heading
doc.add_heading('Welcome to python-docx Library!', level=0)

# Add a paragraph
doc.add_paragraph('Here is a simple paragraph added programmatically.')

# Add a bulleted list
doc.add_paragraph('Color1', style='List Bullet')
doc.add_paragraph('Color2', style='List Bullet')
doc.add_paragraph('Color3', style='List Bullet')

# Save the document
doc.save('d:/sample1_document.docx')

This code creates a new Word file with a title (level=0 applies the Title style), a paragraph, and three bullet points.

Adding Formatted Text

Python Code :


from docx import Document
from docx.shared import RGBColor, Pt

doc = Document()
paragraph = doc.add_paragraph()
run = paragraph.add_run('This text is bold and red.\n')
run.bold = True
run.font.color.rgb = RGBColor(255, 0, 0)

run = paragraph.add_run('This is italic and larger.')
run.italic = True
run.font.size = Pt(18)

doc.save('d:/sample_formatted_text.docx') 

Adding Tables

Tables are useful for structured data.

Python Code :


from docx import Document

doc = Document()
doc.add_heading('Sample Table', level=1)

table = doc.add_table(rows=4, cols=3)
table.style = 'Table Grid'

# Header row
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Name'
hdr_cells[1].text = 'Age'
hdr_cells[2].text = 'City'

# Data rows
row_cells = table.rows[1].cells
row_cells[0].text = 'Iorwerth Alfred'
row_cells[1].text = '20'
row_cells[2].text = 'New York'

row_cells = table.rows[2].cells
row_cells[0].text = 'Fatmire Maya'
row_cells[1].text = '25'
row_cells[2].text = 'London'

row_cells = table.rows[3].cells
row_cells[0].text = 'Henok Culann'
row_cells[1].text = '25'
row_cells[2].text = 'Paris'

doc.save('d:/sample_table_document.docx')  # forward slashes avoid backslash escape sequences

Adding Images

Insert pictures easily:

Python Code :


from docx import Document
from docx.shared import Inches

doc = Document()
doc.add_heading('Document with Image', 0)
doc.add_paragraph('Here is an image:')
doc.add_picture('d:/sample_image.jpeg', width=Inches(4.0))

doc.save('d:/image_document.docx')

Reading an Existing Document

To extract text from a .docx file:


from docx import Document

doc = Document('d:/sample_table_document.docx')

for paragraph in doc.paragraphs:
    print(paragraph.text)

# Extract tables
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text) 

Adding Headers and Footers with python-docx

Headers and footers are essential for professional documents like reports, theses, or educational materials. They typically contain page numbers, document titles, dates, or logos, appearing consistently at the top (header) or bottom (footer) of each page.

The python-docx library makes it straightforward to add and customize headers and footers programmatically. Headers and footers are tied to document sections—most documents have one section, but you can create more for different layouts (e.g., portrait vs. landscape pages).

Basic Header and Footer

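A new document has a single section. The first time you access a section's header or footer paragraphs, python-docx creates an empty header or footer containing one paragraph that you can write into.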


from docx import Document

doc = Document()

# Access the first (and only) section
section = doc.sections[0]

# Header
header = section.header
header_paragraph = header.paragraphs[0]
header_paragraph.text = "My Document Title - Confidential"

# Footer text. The page number itself needs a PAGE field, which python-docx
# can only add by editing the underlying XML (it has no high-level API for it).
footer = section.footer
footer_paragraph = footer.paragraphs[0]
footer_paragraph.text = "\tPage "

# Add some content to see the effect
doc.add_heading('Sample Report', 0)
for i in range(10):
    doc.add_paragraph(f"This is paragraph {i+1}.")

doc.save('d:/sample_header_footer.docx')
 

Advanced Features

  • Styles: Customize built-in styles or apply them to paragraphs and runs.
  • Headers and Footers: Add page numbers, logos, and other repeated content.
  • Sections: Control page orientation and margins.
  • Core Properties: Set document metadata such as author and title.

For templating (replacing placeholders), consider combining with libraries like docxtpl.

Why use python-docx in Education?

For educational portals and teachers:

  • Generate personalized certificates or worksheets
  • Automate report cards
  • Build lesson plans that embed data such as schedules or score tables
  • Process student submissions programmatically

Because it needs only Python (no Word installation), it can run on a server and generate documents on demand as part of a learning management system.

Summary

python-docx is a lightweight, efficient tool that brings Word document automation to Python. It's ideal for beginners and experts alike, enabling everything from simple reports to complex dynamic documents.

Explore the official documentation for more advanced features.
