
Merge pull request #15 from learning-at-home/update_readme

Update README + Docs
UARTman 5 years ago
Parent
commit
e59158f744

+ 46 - 23
README.md

@@ -1,41 +1,64 @@
 ## Tesseract
 
-[![<ORG_NAME>](https://circleci.com/gh/learning-at-home/tesseract.svg?style=svg)](https://circleci.com/gh/learning-at-home/tesseract)
-
+[![Build status](https://circleci.com/gh/learning-at-home/tesseract.svg?style=shield)](https://circleci.com/gh/learning-at-home/tesseract)
+[![Documentation Status](https://readthedocs.org/projects/learning-at-home/badge/?version=latest)](https://learning-at-home.readthedocs.io/en/latest/?badge=latest)
 
 Distributed training of large neural networks across volunteer computers.
 
 ![img](https://i.imgur.com/GPxolxb.gif)
 
-__[WIP]__ - this branch is in progress of updating. If you're interested in supplementary code for [Learning@home paper](https://arxiv.org/abs/2002.04013), you can find it at https://github.com/mryab/learning-at-home .
-
+**[WIP]** - this branch is a work in progress. If you're interested in
+supplementary code for [Learning@home paper](https://arxiv.org/abs/2002.04013),
+you can find it at https://github.com/mryab/learning-at-home .
 
 ## What do I need to run it?
-* One or several computers, each equipped with at least one GPU
-* Each computer should have at least two open ports (if not, consider ssh port forwarding)
-* Some popular Linux x64 distribution
-  * Tested on Ubuntu16.04, should work fine on any popular linux64 and even MacOS;
-  * Running on Windows natively is not supported, please use vm or docker;
+
+- One or several computers, each equipped with at least one GPU
+- Each computer should have at least two open ports (if not, consider ssh port
+  forwarding)
+- Some popular Linux x64 distribution
+  - Tested on Ubuntu16.04, should work fine on any popular linux64 and even
+    MacOS;
+  - Running on Windows natively is not supported, please use vm or docker;
 
 ## How do I run it?
-1. Clone or download this repo. `cd` to its root directory.
-2. Grab or build a working python enviromnent. [Anaconda](https://www.anaconda.com/) works fine.
-3. Install packages from `requirements.txt`
-4. Go to [./experiments](./experiments) and follow the README.md from there
 
+Currently, there is no easy way to run it. There are some tests (see [`./tests/benchmark_throughput.py`](./tests/benchmark_throughput.py)
+or the CI logs), and we want to expand them. If you want to
+do something complex with it, please contact us by opening an issue (less preferred: [telegram](https://t.me/justheuristic)).
+
+## `tesseract` quick tour
+
+**Trainer process:**
 
-## tesseract quick tour
+- **`RemoteExpert`**(`tesseract/client/remote_expert.py`) behaves like a pytorch
+  module with autograd support but actually sends requests to a remote runtime.
+- **`GatingFunction`**(`tesseract/client/gating_function.py`) finds best experts
+  for a given input and either returns them as `RemoteExpert` or applies them
+  right away.
 
-__Trainer process:__
-  * __`RemoteExpert`__(`lib/client/remote_expert.py`) behaves like a pytorch module with autograd support but actually sends request to a remote runtime.
-  * __`GatingFunction`__(`lib/client/gating_function.py`) finds best experts for a given input and either returns them as `RemoteExpert` or applies them right away.
+**Runtime process:**
 
-__Runtime process:__
-  * __`TesseractRuntime`__ (`lib/runtime/__init__.py`) aggregates batches and performs inference/training of experts according to their priority. 
-  * __`TesseractServer`__ (`lib/server/__init__.py`) wraps runtime and periodically uploads experts into `TesseractNetwork`.
+- **`TesseractRuntime`** (`tesseract/runtime/__init__.py`) aggregates batches
+  and performs inference/training of experts according to their priority.
+- **`TesseractServer`** (`tesseract/server/__init__.py`) wraps runtime and
+  periodically uploads experts into `TesseractNetwork`.
 
-__DHT:__
-   * __`TesseractNetwork`__(`lib/network/__init__.py`) is a node of Kademlia-based DHT that stores metadata used by trainer and runtime.
+**DHT:**
+
+- **`TesseractNetwork`**(`tesseract/network/__init__.py`) is a node of
+  Kademlia-based DHT that stores metadata used by trainer and runtime.
 
 ## Limitations
-WIP
+
+**DHT**:
+
+- DHT functionality is severely limited by its inability to traverse NAT.
+- Because of this all the features that require DHT are in deep pre-alpha state
+  and cannot be used without special setup.
+
+**Runtime**:
+- You can reduce network load by 4x by passing quantized uint8 activations between experts.
+  Implement your own quantization or wait for tesseract v0.8.
+- Currently the runtime can form batches that exceed the maximal batch_size by up to task_size - 1.
+  We will fix that in the next patch.
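
The 4x figure in the first Runtime bullet follows from element width: a float32 activation takes 4 bytes, a uint8 code takes 1. Below is a minimal sketch of one possible scheme (per-tensor affine min/max quantization, standard library only). The names `quantize_uint8` and `dequantize_uint8` are hypothetical and are not part of tesseract; the README leaves the actual scheme to the user.

```python
import struct


def quantize_uint8(values):
    """Map floats onto 0..255 codes with a per-tensor affine transform."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for constant inputs
    codes = bytes(min(255, max(0, round((v - lo) / scale))) for v in values)
    return codes, lo, scale


def dequantize_uint8(codes, lo, scale):
    """Approximate inverse of quantize_uint8."""
    return [c * scale + lo for c in codes]


activations = [0.0, 0.25, -1.5, 3.0, 2.125]

# What would travel over the network as raw float32 values.
float32_payload = struct.pack(f"<{len(activations)}f", *activations)

codes, lo, scale = quantize_uint8(activations)
restored = dequantize_uint8(codes, lo, scale)

payload_ratio = len(float32_payload) / len(codes)  # 4 bytes vs 1 byte per element
max_error = max(abs(a - b) for a, b in zip(activations, restored))
```

The receiver also needs `lo` and `scale`, which adds a constant few bytes per tensor, and the rounding error per element is bounded by roughly half of `scale`.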

+ 19 - 0
docs/Makefile

@@ -0,0 +1,19 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS    =
+SPHINXBUILD   = sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

BIN
docs/_static/bug.gif


BIN
docs/_static/bug.odp


BIN
docs/_static/bug_preview.gif


+ 4 - 0
docs/_static/fix_rtd.css

@@ -0,0 +1,4 @@
+/* work around https://github.com/snide/sphinx_rtd_theme/issues/149 */
+.rst-content table.field-list .field-body {
+    padding-top: 8px;
+}

+ 245 - 0
docs/conf.py

@@ -0,0 +1,245 @@
+# -*- coding: utf-8 -*-
+#
+# Configuration file for the Sphinx documentation builder.
+#
+# This file does only contain a selection of the most common options. For a
+# full list see the documentation:
+# http://www.sphinx-doc.org/en/master/config
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+import sys
+
+from recommonmark.transform import AutoStructify
+from recommonmark.parser import CommonMarkParser
+
+
+# -- Project information -----------------------------------------------------
+sys.path.insert(0, '..')
+src_path = '../tesseract'
+project = 'tesseract'
+copyright = '2020, Learning@home & contributors'
+author = 'Learning@home & contributors'
+
+# The short X.Y version
+version = ''
+# The full version, including alpha/beta/rc tags
+release = 'latest'
+branch = 'master'
+
+
+# -- General configuration ---------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#
+# needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+    'sphinx.ext.autodoc',
+    'sphinx.ext.autosummary',
+    'sphinx.ext.doctest',
+    'sphinx.ext.mathjax',
+    'sphinx.ext.linkcode',  # link to github, see linkcode_resolve() below
+    'sphinx.ext.napoleon',  # alternative to numpydoc
+]
+
+# see http://stackoverflow.com/q/12206334/562769
+numpydoc_show_class_members = False
+
+mathjax_path = ('https://cdn.mathjax.org/mathjax/latest/MathJax.js?'
+                'config=TeX-AMS-MML_HTMLorMML')
+
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix(es) of source filenames.
+# You can specify multiple suffix as a list of string:
+#
+source_parsers = {'.md': CommonMarkParser}
+source_suffix = ['.rst', '.md']
+
+# The master toctree document.
+master_doc = 'index'
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#
+# This is also used if you do content translation via gettext catalogs.
+# Usually you set "language" from the command line for these cases.
+language = None
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'sphinx_rtd_theme'
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further.  For a list of options available for each theme, see the
+# documentation.
+#
+# html_theme_options = {}
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+# Custom sidebar templates, must be a dictionary that maps document names
+# to template names.
+#
+# The default sidebars (for documents that don't match any pattern) are
+# defined by theme itself.  Builtin themes are using these templates by
+# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
+# 'searchbox.html']``.
+#
+# html_sidebars = {}
+
+
+# -- Options for HTMLHelp output ---------------------------------------------
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'tesseractdoc'
+
+
+# -- Options for LaTeX output ------------------------------------------------
+
+latex_elements = {
+    # The paper size ('letterpaper' or 'a4paper').
+    #
+    # 'papersize': 'letterpaper',
+
+    # The font size ('10pt', '11pt' or '12pt').
+    #
+    # 'pointsize': '10pt',
+
+    # Additional stuff for the LaTeX preamble.
+    #
+    # 'preamble': '',
+
+    # Latex figure (float) alignment
+    #
+    # 'figure_align': 'htbp',
+}
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title,
+#  author, documentclass [howto, manual, or own class]).
+latex_documents = [
+    (master_doc, 'tesseract.tex', 'tesseract Documentation',
+     'Learning@home \\& contributors', 'manual'),
+]
+
+
+# -- Options for manual page output ------------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+    (master_doc, 'tesseract', 'tesseract Documentation',
+     [author], 1)
+]
+
+
+# -- Options for Texinfo output ----------------------------------------------
+
+# Grouping the document tree into Texinfo files. List of tuples
+# (source start file, target name, title, author,
+#  dir menu entry, description, category)
+texinfo_documents = [
+    (master_doc, 'tesseract', 'tesseract Documentation',
+     author, 'tesseract', 'One line description of project.',
+     'Miscellaneous'),
+]
+
+
+# -- Options for Epub output -------------------------------------------------
+
+# Bibliographic Dublin Core info.
+epub_title = project
+
+# The unique identifier of the text. This can be a ISBN number
+# or the project homepage.
+#
+# epub_identifier = ''
+
+# A unique identification for the text.
+#
+# epub_uid = ''
+
+# A list of files that should not be packed into the epub file.
+epub_exclude_files = ['search.html']
+
+
+# -- Extension configuration -------------------------------------------------
+
+# -- Options for intersphinx extension ---------------------------------------
+
+# Example configuration for intersphinx: refer to the Python standard library.
+intersphinx_mapping = {'https://docs.python.org/': None}
+
+# -- Options for todo extension ----------------------------------------------
+
+# If true, `todo` and `todoList` produce output, else they produce nothing.
+todo_include_todos = True
+
+
+def setup(app):
+    app.add_stylesheet("fix_rtd.css")
+    github_doc_root = 'https://github.com/rtfd/recommonmark/tree/master/doc/'  # TODO
+    app.add_config_value('recommonmark_config', {
+        'url_resolver': lambda url: github_doc_root + url,
+        'auto_toc_tree_section': 'Contents',
+        'enable_math': True,
+        'enable_inline_math': True,
+        'enable_eval_rst': True,
+        # 'enable_auto_doc_ref': True,
+    }, True)
+    app.add_transform(AutoStructify)
+
+
+#  Resolve function for the linkcode extension.
+
+
+def linkcode_resolve(domain, info):
+    def find_source():
+        # try to find the file and line number, based on code from numpy:
+        # https://github.com/numpy/numpy/blob/master/doc/source/conf.py#L286
+        obj = sys.modules[info['module']]
+        for part in info['fullname'].split('.'):
+            obj = getattr(obj, part)
+        import inspect
+        import os
+        fn = inspect.getsourcefile(obj)
+        fn = os.path.relpath(fn, start=os.path.dirname(src_path))
+        source, lineno = inspect.getsourcelines(obj)
+        return fn, lineno, lineno + len(source) - 1
+
+    if domain != 'py' or not info['module']:
+        return None
+    try:
+        filename = 'tesseract/%s#L%d-L%d' % find_source()
+    except Exception:
+        filename = info['module'].replace('.', '/') + '.py'
+    return "https://github.com/learning-at-home/tesseract/blob/%s/%s" % (branch, filename)
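
The line-range part of `linkcode_resolve` relies on `inspect.getsourcelines`, which returns the source lines of an object together with the number of its first line. A minimal sketch of just that step, with a hypothetical helper `source_anchor` and `json.dumps` standing in for a documented object (both are illustration only, not part of this `conf.py`):

```python
import inspect
import json


def source_anchor(obj):
    """Return a '#Lstart-Lend' fragment covering the definition of obj."""
    source, first_line = inspect.getsourcelines(obj)
    return "#L%d-L%d" % (first_line, first_line + len(source) - 1)


# json.dumps is only a convenient stdlib object whose source file is available.
anchor = source_anchor(json.dumps)
```

If the source cannot be located, `inspect.getsourcelines` raises, which is the case the `except Exception` fallback in `linkcode_resolve` covers.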

+ 31 - 0
docs/index.rst

@@ -0,0 +1,31 @@
+``learning@home::tesseract``
+====================================
+
+Tesseract lets you train huge neural networks on computers provided by volunteers. Powered by pytorch.
+
+.. image:: _static/bug.gif
+
+User guide:
+
+.. toctree::
+  :maxdepth: 2
+
+  user/quickstart.md
+
+
+API documentation:
+
+.. toctree::
+  :maxdepth: 2
+
+  modules/client.rst
+  modules/runtime.md
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
+
+.. _GitHub: https://github.com/learning-at-home/tesseract

+ 35 - 0
docs/make.bat

@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
+
+:end
+popd

+ 19 - 0
docs/modules/client.rst

@@ -0,0 +1,19 @@
+tesseract.client
+================
+
+.. automodule:: tesseract.client
+
+.. currentmodule:: tesseract.client
+
+.. raw:: html
+
+  This module lets you connect to distributed Mixture-of-Experts or individual experts hosted
+  <strike>in the cloud</strike> on someone else's computer.
+  <br><br>
+
+.. autoclass:: RemoteExpert
+   :members: forward
+
+.. autoclass:: GatingFunction
+   :members:
+   :member-order: bysource

+ 3 - 0
docs/modules/runtime.md

@@ -0,0 +1,3 @@
+# Runtime 
+
+TODO: explain runtime

+ 2 - 0
docs/requirements.txt

@@ -0,0 +1,2 @@
+recommonmark
+sphinx_rtd_theme

+ 1 - 0
docs/user/quickstart.md

@@ -0,0 +1 @@
+# Quick start

+ 11 - 1
setup.py

@@ -1,12 +1,22 @@
 from pkg_resources import parse_requirements
 from setuptools import setup
+import codecs
+import re
+import os
+
+here = os.path.abspath(os.path.dirname(__file__))
 
 with open('requirements.txt') as requirements_file:
     install_requires = [str(requirement) for requirement in parse_requirements(requirements_file)]
 
+# loading version from tesseract/__init__.py
+with codecs.open(os.path.join(here, 'tesseract/__init__.py'), encoding='utf-8') as init_file:
+    version_match = re.search(r"^__version__ = ['\"]([^'\"]*)['\"]", init_file.read(), re.M)
+    version_string = version_match.group(1)
+
 setup(
     name='tesseract',
-    version='0.7',
+    version=version_string,
     description='',
     long_description='',
     author='Learning@home authors',
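
The version lookup above can be tried in isolation. This sketch applies the same regular expression to an in-memory string instead of reading `tesseract/__init__.py`; the helper name `find_version` and the explicit `RuntimeError` are assumptions added for illustration (the code in the diff calls `.group(1)` directly and would fail with `AttributeError` if nothing matched).

```python
import re

# Same pattern as in setup.py: a line that starts with `__version__ = '...'`.
VERSION_PATTERN = r"^__version__ = ['\"]([^'\"]*)['\"]"


def find_version(source: str) -> str:
    """Extract the version string from the text of an __init__.py file."""
    match = re.search(VERSION_PATTERN, source, re.M)
    if match is None:
        raise RuntimeError("__version__ not found")
    return match.group(1)


init_source = "from .client import *\nfrom .utils import *\n\n__version__ = '0.7.1'\n"
version = find_version(init_source)  # '0.7.1'
```

`re.M` matters here: without it, `^` would only match at the very start of the file, and the assignment sits after the imports.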

+ 2 - 0
tesseract/__init__.py

@@ -2,3 +2,5 @@ from .client import *
 from .network import *
 from .server import *
 from .utils import *
+
+__version__ = '0.7.1'

+ 20 - 3
tesseract/client/gating_function.py

@@ -12,6 +12,21 @@ from ..utils import nested_map, check_numpy, run_and_await_k
 
 
 class GatingFunction(nn.Module):
+    """
+    A torch module that selects experts across the network and averages their predictions
+
+    :param in_features: common input size for experts and gating function
+    :param grid_size: tesseract dimensions that form expert uid (see below)
+    :param uid_prefix: common prefix for all expert uids
+     expert uid follows the pattern {uid_prefix}{0...grid_size[0]}.{0...grid_size[1]}...{0...grid_size[-1]}
+    :param network: TesseractNetwork where the experts reside
+    :param num_workers: number of threads for parallel network operation
+    :param k_best: queries this many experts with highest scores
+    :param k_min: makes sure at least this many experts returned output
+    :param timeout_after_k_min: waits for this many seconds after k_min experts returned results.
+     Any expert that didn't manage to return output after that delay is considered unavailable
+    :param expert_padding: internal value used to denote "absent expert". Should not coincide with any expert uid.
+    """
     def __init__(self, *, in_features, grid_size: Tuple[int], network, num_workers=None,
                  k_best, k_min=1, timeout_after_k_min=1.0, uid_prefix='', expert_padding=None):
         super().__init__()
@@ -25,8 +40,9 @@ class GatingFunction(nn.Module):
     def forward(self, input: torch.Tensor, *args, **kwargs) -> Tuple[List[List[RemoteExpert]], torch.Tensor]:
         """
         Choose k best experts with beam search, then call chosen experts and average their outputs.
+
         :param batch: named tensors, each tensor has 0-th axis dedicated to batch (aka batch-first)
-        :return: averaged predictions of all experts that delivered on time
+        :returns: averaged predictions of all experts that delivered on time
         """
         assert len(input.shape) == 2
 
@@ -68,12 +84,13 @@ class GatingFunction(nn.Module):
     def beam_search(self, grid_scores: List[torch.Tensor], k_best: int, **kwargs) -> List[List[RemoteExpert]]:
         """
         Find and return k best experts in the grid using (exact) beam search of the product space
+
         :param grid_scores: scores predicted for each dimension in the grid,
         :type grid_scores: a sequence of tensors of shape[batch_size, self.grid_size[i]]
         :param k_best: how many of the top experts participate in the computation
         :param kwargs: extra keyword parameters passed to self.network.first_k_active
-        :returns: a list of *batch_size* lists that contain chosen experts for one sample
-            each inner list contains RemoteExpert instances for *up to* k_best experts
+        :returns: a list of *batch_size* lists, one per sample; each inner list contains \
+         RemoteExpert instances for *up to* k_best experts
         """
         assert len(grid_scores) == len(self.grid_size)
         assert all(len(dim_scores.shape) == 2 for dim_scores in grid_scores)
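
To make the uid pattern from the class docstring concrete, here is a small sketch that enumerates every uid in a grid. The helper `enumerate_expert_uids` is hypothetical, and it assumes each index runs over `range(grid_size[i])`, i.e. the upper bound is exclusive; the docstring does not say whether the bound is inclusive.

```python
from itertools import product


def enumerate_expert_uids(uid_prefix, grid_size):
    """List all uids of the form {uid_prefix}{i0}.{i1}...{iN} for a grid."""
    index_ranges = (range(n) for n in grid_size)  # assumption: exclusive upper bound
    return [uid_prefix + ".".join(map(str, index)) for index in product(*index_ranges)]


uids = enumerate_expert_uids("expert.", (2, 3))
# ['expert.0.0', 'expert.0.1', 'expert.0.2', 'expert.1.0', 'expert.1.1', 'expert.1.2']
```

The number of candidate uids is the product of the grid dimensions, which is why `beam_search` works from one score tensor per dimension rather than one score per expert.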

+ 1 - 0
tesseract/client/remote_expert.py

@@ -25,6 +25,7 @@ class RemoteExpert(nn.Module):
         self._info = None
 
     def forward(self, *args, **kwargs):
+        """ Call RemoteExpert for the specified inputs and return its output(s). Compatible with pytorch.autograd. """
         assert len(kwargs) == len(self.info['keyword_names']), f"Keyword args should be {self.info['keyword_names']}"
         kwargs = {key: kwargs[key] for key in self.info['keyword_names']}
         # Note: we put keyword arguments in the same order as on a server to prevent f(a=1, b=2) != f(b=2, a=1) errors
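
The reordering in the lines above can be shown without any networking. In this sketch `order_kwargs` is a hypothetical standalone version of the two lines from `forward`, and `keyword_names` stands in for `self.info['keyword_names']`:

```python
def order_kwargs(kwargs, keyword_names):
    """Rebuild kwargs so that keys follow the order expected by the server."""
    assert len(kwargs) == len(keyword_names), f"Keyword args should be {keyword_names}"
    return {key: kwargs[key] for key in keyword_names}


keyword_names = ("a", "b")  # assumed server-side order

first = order_kwargs({"a": 1, "b": 2}, keyword_names)
second = order_kwargs({"b": 2, "a": 1}, keyword_names)

# Once flattened, both calls produce the same positional sequence.
assert list(first.values()) == list(second.values()) == [1, 2]
```

A missing or misspelled key surfaces as a `KeyError` in the dict comprehension, since the length check alone does not compare names.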

+ 1 - 1
tesseract/runtime/__init__.py

@@ -45,7 +45,7 @@ class TesseractRuntime(threading.Thread):
                     outputs = pool.process_func(*batch)
                     output_sender_pool.apply_async(pool.send_outputs_from_runtime, args=[batch_index, outputs])
                     progress.update(len(outputs[0]))
-                    progress.desc = f'{pool.uid=} {len(outputs[0])=}'
+                    progress.desc = f'pool.uid={pool.uid} batch_size={len(outputs[0])}'
             finally:
                 self.shutdown()
 

+ 1 - 1
tesseract/runtime/task_pool.py

@@ -116,7 +116,7 @@ class TaskPool(TaskPoolBase):
         return batch_tasks
 
     def run(self, *args, **kwargs):
-        print(f'Starting pool, {os.getpid()=}')
+        print(f'Starting pool, pid={os.getpid()}')
         pending_batches = {}  # Dict[batch uuid, List[SharedFuture]] for each batch currently in runtime
         output_thread = threading.Thread(target=self._pool_output_loop, args=[pending_batches],
                                          name=f'{self.uid}-pool_output_loop')
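
The two f-string changes in this commit replace the `{expr=}` form, which was added in Python 3.8, with explicit labels. The diff does not state a reason; compatibility with older interpreters is one plausible motive, and that is an assumption. The sketch below compares the two forms:

```python
import os
import sys

pid = os.getpid()

# Portable form, as used after this change.
portable = f'Starting pool, pid={pid}'

if sys.version_info >= (3, 8):
    # eval() keeps this snippet parseable on interpreters older than 3.8.
    debug_form = eval("f'Starting pool, {pid=}'")
    # For an int, repr() and str() agree, so both messages are identical.
    assert debug_form == portable

    uid = 'expert.0'
    # For a str, the `=` form shows the repr, quotes included.
    assert eval("f'{uid=}'") == "uid='expert.0'"
    assert f'uid={uid}' == 'uid=expert.0'
```

So for string-valued fields such as `pool.uid`, the new messages differ slightly from the old ones: the quotes disappear.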