Wednesday, October 14, 2015

from __experimental__ import something_new : running scripts from the command line.

EDIT: I just found out that the notation "from __experimental__ import" had already been suggested in a different context than the one I have been working on.   Perhaps I should use "__nonstandard__" instead of "__experimental__" to avoid any confusion.



In a post I wrote yesterday, I mentioned a way to run "experimental" code containing non-standard Python syntax (e.g. new keywords, not recognized by Python's interpreter) by using an "import hook" to convert the code into proper Python syntax prior to executing it.  One caveat of the approach I used was that it only worked if the "experimental" code was imported.  This restriction is also present in the MacroPy project (which I stumbled upon, and which is definitely a much more substantial project than the little toy I created).

Today, I have a new version that can effectively be run from the command line (I believe that the approach I use could also work for the MacroPy project).  This is version 4 found in this GitHub repository.

I will start with a concrete example taken from the repository (file test.py); the code below contains keywords and constructs that are definitely not valid in Python.


'''This is not a valid Python module as it contains two
   non-standard keywords:  repeat and function.  However,
   by using a custom importer, and the presence of the special
   import line below, these non-standard keywords will be converted
   into valid Python syntax prior to execution.
'''
from __experimental__ import repeat_keyword, function_keyword  # magic! :-)


def normal_syntax():
    '''Creates the list [4, 4, 4] by using the normal Python syntax,
       with a for loop and a lambda-defined function.
    '''
    res = []
    g = lambda x: x**2
    for _ in range(3):
        res.append(g(2))
    return res


def experimental_syntax():
    '''Creates the list [4, 4, 4] by using an experimental syntax
       with the keywords "repeat" and "function", otherwise
       using the same algorithm as the function called "normal_syntax".
    '''
    res = []
    g = function x: x**2
    repeat 3:
        res.append(g(2))
    return res


if __name__ == '__main__':
    if normal_syntax() == experimental_syntax():
        print("Success")
    else:
        print("Failure")

If you try to run this program from the command line using "python test.py" at your command/shell prompt ... it will definitely fail.  However, using the code from the repository, you can run it via "python import_experimental.py test".  The code inside import_experimental.py, which has many more comments than I would normally write, is the following:

''' A custom Importer making use of the import hook capability

https://www.python.org/dev/peps/pep-0302/

Its purpose is to convert would-be Python modules that use non-standard
syntax into a correct form prior to importing them.
'''

# imp is deprecated but I wasn't (yet) able to figure out how to use
# its replacement, importlib, to accomplish all that is needed here.
import imp
import re
import sys

MAIN = False
# raw string, so that \s reaches the regex engine as written
from_experimental = re.compile(r"(^from\s+__experimental__\s+import\s+)")


class ExperimentalImporter(object):
    '''According to PEP 302, an importer only requires two methods:
       find_module and load_module.
    '''

    def find_module(self, name, path=None):
        '''We don't need anything special here, so we just use the standard
           module finder which, if successful,
           returns a 3-element tuple (file, pathname, description).
           See https://docs.python.org/3/library/imp.html for details
        '''
        self.module_info = imp.find_module(name)
        return self

    def load_module(self, name):
        '''Load a module, given information returned by find_module().
        '''

        # According to PEP 302, the following is required
        # if reload() is to work properly
        if name in sys.modules:
            return sys.modules[name]

        path = self.module_info[1]  # see find_module docstring above
        module = None

        if path is not None:   # path=None is the case for some stdlib modules
            with open(path) as source_file:
                module = self.convert_experimental(name, source_file.read())

        if module is None:
            module = imp.load_module(name, *self.module_info)
        return module

    def convert_experimental(self, name, source):
        '''Used to convert the source code, and create a new module
           if one of the lines is of the form

               ^from __experimental__ import converter1 [, converter2, ...]

           (where ^ indicates the beginning of a line)
           otherwise returns None and lets the normal import take place.
           Note that this special code must be all on one physical line --
           no continuation allowed by using parentheses or a
           trailing backslash.

           "converters" are modules which must contain a function

               transform_source_code(source)

           which returns a transformed source.
        '''
        global MAIN
        lines = source.split('\n')

        for linenumber, line in enumerate(lines):
            if from_experimental.match(line):
                break
        else:
            return None  # normal importer will handle this

        # we started with: "from __experimental__ import converter1 [,...]"
        line = from_experimental.sub(' ', line)
        # we now have: "converter1 [,...]"
        line = line.split("#")[0]    # remove any end of line comments
        converters = line.replace(' ', '').split(',')
        # and now:  ["converter1", ...]

        # drop the "fake" import from the source code
        del lines[linenumber]
        source = '\n'.join(lines)

        for converter in converters:
            mod_name = __import__(converter)
            source = mod_name.transform_source_code(source)

        module = imp.new_module(name)
        # From PEP 302:  Note that the module object must be in sys.modules
        # before the loader executes the module code.
        # This is crucial because the module code may
        # (directly or indirectly) import itself;
        # adding it to sys.modules beforehand prevents unbounded
        # recursion in the worst case and multiple loading in the best.
        sys.modules[name] = module

        if MAIN:  # see below
            module.__name__ = "__main__"
            MAIN = False
        exec(source, module.__dict__)

        return module


sys.meta_path = [ExperimentalImporter()]

if __name__ == '__main__':
    if len(sys.argv) > 1:  # sys.argv[0] is always this script's own name
        # this program was started by
        # $ python import_experimental.py some_script
        # and we will want some_script.__name__ == "__main__"
        MAIN = True
        __import__(sys.argv[1])

One could easily write a shell script or batch file that would simplify execution to something like "my_python test".

It would be nice to remove the "imp" dependency and use the proper functions/methods from the importlib module, something which I have not been able to figure out (yet).  Anyone familiar with the importlib module is more than welcome to do it and tell me about it. ;-)
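
As a starting point, here is a minimal sketch of how the same idea might look with importlib only (the imp module was eventually removed in Python 3.12).  It is not the code from the repository: the class names ExperimentalFinder and ExperimentalLoader and the helper read_source are made up for illustration, and the sketch assumes that converters remain importable modules exposing transform_source_code(source).  The finder asks the standard PathFinder to locate the file and only takes over when the file contains the special import line.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import re
import sys

# Same marker line as in import_experimental.py; the capture group holds
# everything after "import" (converter names plus any trailing comment).
FROM_EXPERIMENTAL = re.compile(
    r"^from[ \t]+__experimental__[ \t]+import[ \t]+(.*)$", re.MULTILINE)


def read_source(path):
    '''Read a source file, honouring any PEP 263 encoding declaration.'''
    with open(path, "rb") as source_file:
        return importlib.util.decode_source(source_file.read())


class ExperimentalLoader(importlib.abc.Loader):
    '''Hypothetical loader: transforms the source, then executes it.'''

    def __init__(self, path):
        self.path = path

    def exec_module(self, module):
        source = read_source(self.path)
        match = FROM_EXPERIMENTAL.search(source)
        names = match.group(1).split("#")[0].split(",")
        # Blank out the "fake" import but keep its newline, so that line
        # numbers in tracebacks still match the original file.
        source = source[:match.start()] + source[match.end():]
        for name in names:
            converter = importlib.import_module(name.strip())
            source = converter.transform_source_code(source)
        exec(compile(source, self.path, "exec"), module.__dict__)


class ExperimentalFinder(importlib.abc.MetaPathFinder):
    '''Hypothetical finder: claims only modules containing the marker.'''

    def find_spec(self, fullname, path, target=None):
        spec = importlib.machinery.PathFinder.find_spec(fullname, path, target)
        if spec is None or not (spec.origin or "").endswith(".py"):
            return None   # builtin, extension module, namespace package, ...
        try:
            source = read_source(spec.origin)
        except (OSError, SyntaxError, UnicodeDecodeError):
            return None   # unreadable source: let the normal import decide
        if not FROM_EXPERIMENTAL.search(source):
            return None   # ordinary module: leave it to the normal machinery
        spec.loader = ExperimentalLoader(spec.origin)
        return spec


# Put the finder first, but keep the standard finders as a fallback.
sys.meta_path.insert(0, ExperimentalFinder())
```

Unlike the imp version, this sketch does not replace sys.meta_path, so ordinary modules are still handled by Python's own finders.  It also leaves out the trick of setting __name__ to "__main__" for a script named on the command line; that part would still have to be added.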

Also, writing more useful code converters than the two toy ones I created would likely be an interesting project.
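
To give an idea of what a converter involves, here is a rough sketch of one that could handle the "repeat" keyword.  It is not the converter found in the repository: the regular expression and the choice of "_" as the loop variable are assumptions made for illustration.  The only part dictated by the importer is the function name transform_source_code, which takes the source as a string and returns the transformed source.

```python
import re

# Matches "repeat <expression>:" at the start of a (possibly indented) line.
REPEAT = re.compile(
    r"^(?P<indent>[ \t]*)repeat[ \t]+(?P<count>.+?)[ \t]*:", re.MULTILINE)


def transform_source_code(source):
    '''Replace "repeat n:" by "for _ in range(n):" (illustrative only).

    This is a naive textual substitution: it does not tokenize the
    source, so a line inside a multi-line string that happens to start
    with "repeat" would be rewritten as well.
    '''
    return REPEAT.sub(r"\g<indent>for _ in range(\g<count>):", source)
```

A more robust converter would probably work on the token stream (using the tokenize module) rather than on raw text.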
