Documentation


What Documentation Options Exist?

In software development, there are numerous documentation options that can be broadly divided into two main categories: documentation as a standalone document and documentation within the code. Both approaches have different strengths and should be combined as needed, depending on the context, to provide a comprehensive source of information.

In the case of research software, documenting the experiments conducted with the software is an essential task in addition to documenting the software itself, and aspects of reproducibility should be considered as well. A simple mechanism for this is a RUNME script that performs the essential steps; its description and documentation are subject to the same requirements as those of the software itself.
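A RUNME script can be as simple as one Python file that runs each step in a fixed order and records the environment it ran in. The following is a minimal sketch. It assumes the analysis consists of Python functions. The step functions `prepare_data` and `analyse`, the seed value, and the output file `run_record.json` are hypothetical placeholders for a project's real steps.

```python
"""RUNME: reproduce the main result of this project (hypothetical example)."""
import json
import platform
import random
import sys
from datetime import datetime, timezone

SEED = 42  # fixed seed so that random steps give the same result on every run


def prepare_data(rng):
    # Hypothetical step: generate synthetic input data.
    return [rng.gauss(0.0, 1.0) for _ in range(1000)]


def analyse(data):
    # Hypothetical step: compute a summary statistic.
    return sum(data) / len(data)


def main(record_path="run_record.json"):
    rng = random.Random(SEED)
    data = prepare_data(rng)
    result = analyse(data)
    # Record what is needed to rerun the experiment and compare its result.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "platform": platform.platform(),
        "seed": SEED,
        "result": result,
    }
    with open(record_path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return record


if __name__ == "__main__":
    print(main()["result"])
```

Writing the environment and seed next to the result means that a later rerun on another machine can be compared against the original record.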

Documentation as a Standalone Document

This type of documentation includes various documents created and made available alongside the code, covering a wide range of information. These include:

  • README Files: Provide an introduction and overview of the project, its goals, and how to use it or contribute to it.
  • Installation Instructions: Guide users through installing the software, including its dependencies on other software packages and the required working environment, unless these are covered in a separate document.
  • User Manuals: Provide detailed instructions on how to use the software and explain its functions.
  • Contributing Documents: Explain how interested parties can contribute to the project, including style conventions and guidelines for pull requests.
  • Developer Documentation: Provide technical details and explanations for developers working on or extending the project. This documentation can be internal (for team members) or external (for the developer community). Especially for larger projects, it should also describe the team hierarchy (core developers, module owners, quality control and testing).

Finally, a list of general guides for creating documentation:

Documentation Within the Code

This form of documentation is embedded directly in the source code and includes:

  • Variable and Function Names: Should be self-explanatory and make the purpose or function of the code element clear.
  • Docstrings and Annotations: Provide explanations and context for functions, classes, and modules directly in the code. They are particularly useful because many languages have tools (see below) that automatically generate documentation from them.
  • Comments: Provide additional explanations or context for code sections or blocks that may not be immediately understandable. Comments should not restate individual lines of code in colloquial terms but explain the idea or concept behind a code block. They can also hold temporary notes or TODOs for future revisions.
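The following Python function combines all three forms of in-code documentation. It has a self-explanatory name and a docstring, here in the Google docstring style (one common convention, chosen as an assumption). It also has a comment that explains the idea behind a block rather than restating each line. The function itself is a hypothetical example.

```python
def mean_squared_displacement(positions, lag):
    """Return the mean squared displacement of a 1-D trajectory.

    Args:
        positions: Sequence of particle positions, one per time step.
        lag: Number of time steps between the two compared positions.

    Returns:
        The average of (x[t + lag] - x[t])**2 over all valid t.

    Raises:
        ValueError: If lag is not between 1 and len(positions) - 1.
    """
    if not 1 <= lag < len(positions):
        raise ValueError("lag must be between 1 and len(positions) - 1")
    # Compare each position with the one `lag` steps later; averaging over all
    # start times reduces noise compared with using a single pair of positions.
    displacements = [positions[t + lag] - positions[t]
                     for t in range(len(positions) - lag)]
    return sum(d * d for d in displacements) / len(displacements)
```

For a particle moving one unit per step, `mean_squared_displacement([0, 1, 2, 3], 2)` averages two displacements of 2 and returns `4.0`.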

Documentation in the Scientific Context

Both types of documentation play an important role in software development and use. While standalone documentation provides a broad overview and detailed instructions, documentation within the code enables a deeper understanding of the project’s functionality and internal structure.

The presentation so far applies to any type of software. So what are the specific requirements for documenting scientific software? General software is often created by a relatively small team of developers and contributors for a much larger group of users, so its documentation focuses heavily on installation instructions and user manuals. Scientific software, on the other hand, often has hardly any users beyond its own developers. Moreover, especially in small projects, software is often developed by doctoral students and then reused and extended by the next generation. The original developers and later users are therefore much less likely to overlap in time than with general software. Additionally, for general software it is usually sufficient to achieve ONE usable result, whereas for the scientific process it is extremely important to reproduce the exact result of a computer experiment or data analysis as accurately as possible. Consequently, the focus of documentation shifts much more towards documenting how results are created and towards developer documentation. Questions to be answered include:

  • What research question should be answered?
  • What possibly simplifying assumptions were made and may need to be reconsidered, removed, or simply taken into account in subsequent extensions?
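One lightweight way to keep the answers to these questions next to the results is to store them in a machine-readable record written alongside each output. The sketch below assumes a Python project. The class `ExperimentRecord`, its fields, and the example research question are hypothetical and should be adapted to the project.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ExperimentRecord:
    """Process documentation stored next to the results of one experiment."""
    research_question: str
    # Simplifying assumptions that later extensions may need to revisit.
    assumptions: List[str] = field(default_factory=list)
    parameters: Dict[str, Any] = field(default_factory=dict)

    def to_json(self):
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


record = ExperimentRecord(
    research_question="Does the diffusion coefficient depend on temperature?",
    assumptions=["Particles do not interact",
                 "Temperature is constant during a run"],
    parameters={"temperature_K": 300, "n_steps": 10000},
)
print(record.to_json())
```

Because the record round-trips through JSON, a successor can load it together with the results and see which assumptions those results depend on.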

Often, development is directly tied to funding, and when the funding ends, the software project is paused as well (ideally only temporarily). As a result, there is usually no direct handover from the current team to its successors, which places additional requirements on process documentation. Besides the questions of reproducibility mentioned above, questions of traceability, such as the reasons behind design decisions or the deployment procedures, are central. It is also important to document which authentications and permissions these processes require.

What Requirements Are There for Software Documentation?

The requirements for software documentation are diverse and cover various documentation types: interface documentation, how-tos, tutorials, and getting-started guides for users, as well as process documentation, codes of conduct, and style guides for developers. There are numerous proposals, approaches, and best practices for effective documentation, including arc42 and C4, the Divio Documentation System, and Architecture Decision Records. Self-documenting code is important but not a panacea (see “Limits of self-documenting code”).
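Architecture Decision Records are short, numbered text files, one per decision. A widely used template, proposed by Michael Nygard, has the sections Status, Context, Decision, and Consequences. The helper below is a minimal sketch that writes the next record in that format. The directory (for example `doc/adr`), the four-digit file numbering, and the function names are assumptions for illustration, not a fixed standard.

```python
import re
from datetime import date
from pathlib import Path

TEMPLATE = """# {number}. {title}

Date: {today}

## Status

{status}

## Context

{context}

## Decision

{decision}

## Consequences

{consequences}
"""


def slugify(title):
    # "Use HDF5 for output" -> "use-hdf5-for-output"
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def new_adr(directory, title, context, decision, consequences, status="Accepted"):
    """Write the next numbered decision record and return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Existing records are named like 0001-some-title.md.
    existing = [int(p.name[:4])
                for p in directory.glob("[0-9][0-9][0-9][0-9]-*.md")]
    number = max(existing, default=0) + 1
    path = directory / f"{number:04d}-{slugify(title)}.md"
    path.write_text(TEMPLATE.format(
        number=number, title=title, today=date.today().isoformat(),
        status=status, context=context, decision=decision,
        consequences=consequences), encoding="utf-8")
    return path
```

A call such as `new_adr("doc/adr", "Use HDF5 for simulation output", context=..., decision=..., consequences=...)` creates `0001-use-hdf5-for-simulation-output.md` in an empty directory. Later successors can then read why a choice was made, not only what was chosen.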

Documentation should generally be written in English; user manuals may additionally be provided in the native language of the respective target group. It is important to consider ethical aspects, such as avoiding discrimination. Documentation should not be written “on the fly” but always with the target audience of the respective documentation type in mind, taking into account their knowledge of the project and their expectations of it. It should also be carefully decided which documentation types to provide, so that the additional effort of creating and, above all, maintaining them remains manageable. The most important rule is that every form of documentation is always updated together with the code: incorrect, and therefore misleading, documentation is worse than missing documentation.
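For Python projects, the standard-library `doctest` module is one way to enforce this rule for usage examples. Examples written in docstrings are executed as tests, so an example that no longer matches the code makes the test run fail. A minimal sketch follows; the function `celsius_to_kelvin` is a hypothetical example.

```python
import doctest


def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin.

    >>> celsius_to_kelvin(0)
    273.15
    >>> celsius_to_kelvin(-273.15)
    0.0
    """
    if celsius < -273.15:
        raise ValueError("temperature below absolute zero")
    return celsius + 273.15


if __name__ == "__main__":
    # Reports every docstring example whose output no longer matches the code.
    results = doctest.testmod()
    print(f"{results.attempted} examples run, {results.failed} failed")
```

Running `python -m doctest -v file.py` has the same effect from the command line, so the check can also be added to a CI pipeline.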

Important in the context of good scientific practice:

What Software Supports Writing Documentation?

When writing documentation, especially for application programming interfaces (APIs), various software tools can be helpful, listed here in alphabetical order:

  • doxygen: A documentation generator used to create documentation from annotated source code files. It supports multiple programming languages, including C++, C, Java, Objective-C, Python, and others. Doxygen extracts the structure of a program from the source code and uses structured comments embedded in the code to generate technical documentation in formats such as HTML and LaTeX.

  • javadoc: A documentation generator for Java code that creates HTML documentation from Java source code. Javadoc uses special comments in the source code to generate the documentation, providing a standardized method for documenting Java APIs.

  • pydoc: Similar to javadoc, but for Python. Pydoc automatically generates documentation from Python modules. The documentation can be displayed as text in the terminal or as an HTML page. Pydoc uses the docstrings embedded in Python code to extract information about functions, classes, and modules.

  • Swagger (its specification is now known as the OpenAPI Specification): A toolkit for designing, building, documenting, and using RESTful web services. Swagger lets developers describe the structure of their APIs in a machine-readable form, which facilitates generating documentation and SDKs and makes APIs easier to discover and use.

  • UML (Unified Modeling Language): A standardized modeling language that allows the structure and design of software to be visually represented. UML offers various diagram types to visualize different aspects of a system, including class diagrams, sequence diagrams, and use case diagrams.

What these tools have in common is that they generate the resulting documentation from code components and/or code annotations, which minimizes the effort of writing it.
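The following sketch uses pydoc to illustrate this principle: the same docstring that a developer reads in the source is rendered once as plain text and once as an HTML page. The function `rectangle_area` is a hypothetical example.

```python
import pydoc


def rectangle_area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height


# Plain-text rendering, similar to what help() shows in a terminal.
text = pydoc.render_doc(rectangle_area, renderer=pydoc.plaintext)
print(text)

# HTML rendering of the same docstring, e.g. for publishing on a web page.
html = pydoc.html.page("rectangle_area", pydoc.html.document(rectangle_area))
```

For whole modules, the command `python -m pydoc -w <module>` writes an HTML file named after the module into the current directory.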

Where Can I Make Documentation Publicly Available?

Documentation should ideally be published “close” to the code, for example in the same repository. In addition, CI-based solutions connected to the repository, such as readthedocs.com or github.io (GitHub Pages), provide an automated way to keep the documentation up to date. If the software is developed on the GitLab instances of GWDG or MPCDF, it makes sense to provide the documentation there as well, via GitLab Pages.
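In a CI setup, the build step only has to write the generated HTML into the directory that the hosting service publishes. The sketch below assumes the common GitLab Pages setup, in which a CI job named `pages` publishes a `public/` directory. It uses pydoc on a module given on the command line. The script name `build_docs.py` is hypothetical, and a larger project would more likely call a dedicated generator such as doxygen from its CI configuration.

```python
import importlib
import pydoc
import sys
from pathlib import Path


def build_docs(module_name, out_dir="public"):
    """Render the API documentation of one module to out_dir/index.html."""
    module = importlib.import_module(module_name)
    contents = pydoc.html.document(module)
    page = pydoc.html.page(f"{module_name} documentation", contents)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index = out / "index.html"
    index.write_text(page, encoding="utf-8")
    return index


if __name__ == "__main__":
    # Called from the CI job, e.g.: python build_docs.py <module-name>
    if len(sys.argv) == 2:
        print(build_docs(sys.argv[1]))
    else:
        print("usage: python build_docs.py <module-name>")
```

Because the pages are rebuilt on every push, the published documentation cannot silently drift away from the code in the repository.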

Local solutions such as the integrated help browser of MATLAB or GNU Octave’s integrated development environment (IDE) are also good options for keeping documentation close to the working environment of developers and users.

However, it is generally important to make documentation available, publicly if possible. This spares users from having to generate it themselves with the tools above, where differing tool versions can have fatal consequences or cost a great deal of time. Public availability, for example on central documentation servers or community platforms, also gives the software additional visibility.

How Should I License the Documentation?

If possible, the documentation should be licensed similarly to the software itself. Alternatively, Creative Commons (CC) licenses (CC BY/CC0) can be used. A disclaimer is as important in the documentation as in the software license; CC licenses already include one in their full legal text, for example Section 5 of CC BY 4.0. If Max-Planck-Innovation is involved in commercializing the software, the documentation can also be licensed as know-how, in which case it will be provided under a proprietary license.
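One way to make the chosen license visible in every documentation file is a machine-readable SPDX license identifier near the top of each file. This approach is an assumption here and is not required by the text above. The SPDX identifiers for the licenses named above are `CC-BY-4.0` and `CC0-1.0`. The sketch below lists documentation files that lack such an identifier. The `docs` directory and the `*.md` pattern are hypothetical defaults.

```python
import re
from pathlib import Path

ALLOWED = {"CC-BY-4.0", "CC0-1.0"}  # SPDX identifiers of the CC licenses named above
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def license_of(path, max_lines=5):
    """Return the SPDX identifier declared in the first lines of a file, or None."""
    with open(path, encoding="utf-8") as fh:
        for _, line in zip(range(max_lines), fh):
            match = SPDX_RE.search(line)
            if match:
                return match.group(1)
    return None


def unlicensed_docs(doc_dir="docs", pattern="*.md"):
    """List documentation files without an allowed SPDX license identifier."""
    return sorted(str(p) for p in Path(doc_dir).rglob(pattern)
                  if license_of(p) not in ALLOWED)
```

In a Markdown file the identifier can sit in an HTML comment such as `<!-- SPDX-License-Identifier: CC-BY-4.0 -->`, so it does not appear in the rendered page.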

How Should Documentation Be Written to Avoid and Prevent Discrimination?

To avoid and prevent discrimination, documentation should use carefully selected examples and gender-neutral language, and it should avoid discriminatory terms such as “master-slave” in favor of alternatives such as “hub-spokes” or “main-subordinate.”
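A simple automated check can flag such terms before documentation is published. The sketch below reports the term pair named above together with the suggested alternatives. The word list is deliberately minimal and is an assumption; each project should extend it. A human should still review the results, since word lists cannot judge context.

```python
import re

# Discouraged terms mapped to the alternatives suggested above (extend as needed).
REPLACEMENTS = {
    "master-slave": "main-subordinate or hub-spokes",
    "master/slave": "main-subordinate or hub-spokes",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in REPLACEMENTS) + r")\b",
    re.IGNORECASE,
)


def find_discouraged_terms(text):
    """Return (line number, term, suggestion) for each discouraged term in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            term = match.group(1)
            findings.append((lineno, term, REPLACEMENTS[term.lower()]))
    return findings
```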