Monday, August 10, 2009

PEP Submission

Now that the coding is more or less complete, I have produced and submitted the following P.E.P. I have not officially been assigned a P.E.P. number, but I used 3144 because the newest one when I began writing was 3143.

PEP: 3144
Title: Asynchronous I/O For subprocess.Popen
Author: (James) Eric Pruitt, Charles R. McCreary, Josiah Carlson
Status: Draft
Type: Standards Track
Requires: 324
Created: 04-08-2009
Content-Type: text/plain
Python-Version: 3.2

Abstract:
In its present form, the subprocess.Popen implementation is prone to
dead-locking and blocking of the parent Python script while waiting on
data from the child process.

Copyright:
This P.E.P. is licensed under the Open Publication License;
http://www.opencontent.org/openpub/.

Motivation:
A search for "python asynchronous subprocess" will turn up numerous
accounts of people wanting to execute a child process and communicate
with it from time to time reading only the data that is available
instead of blocking to wait for the program to produce data [1] [2]
[3]. The current behavior of the subprocess module is that when a user
sends or receives data via the stdin, stderr and stdout file objects,
dead locks are common and documented [4] [5]. While communicate can be
used to alleviate some of the buffering issues, it will still cause
the parent process to block while attempting to read data when none is
available to be read from the child process.

Rationale:
There is a documented need for asynchronous, non-blocking
functionality in subprocess.Popen [6] [7, comments] [2] [3]. Inclusion
of the code would improve the utility of the Python standard library
that can be used on Unix based and Windows builds of Python.
Practically every I/O object in Python has a file-like wrapper of some
sort. Sockets already act as such and for strings there is StringIO.
Popen can be made to act like a file by simply using the methods
attached to the subprocess.Popen.stderr, stdout and stdin file-like
objects. But when using the read and write methods of those objects,
you do not have the benefit of asynchronous I/O. In the proposed
solution the wrapper wraps the asynchronous methods to mimic a file
object.

Reference Implementation:
I have been maintaining a Google Code repository that contains all of
my changes including tests and documentation [9] as well as a blog
detailing the problems I have come across in the development process
[10].

I have been working on implementing non-blocking asynchronous I/O in
the subprocess.Popen module as well as a wrapper class for
subprocess.Popen that makes it so that an executed process can take
the place of a file by duplicating all of the methods and attributes
that file objects have.

[1] http://mail.python.org/pipermail/python-bugs-list/2006-December/036524.html
[2] http://ivory.idyll.org/blog/feb-07/problems-with-subprocess
[3] http://stackoverflow.com/questions/636561/how-can-i-run-an-external-command-asynchronously-from-python
[4] http://docs.python.org/library/subprocess.html#subprocess.Popen.wait
[5] http://docs.python.org/library/subprocess.html#subprocess.Popen.kill
[6] http://bugs.python.org/issue1191964
[7] http://code.activestate.com/recipes/440554/
[8] http://code.google.com/p/subprocdev/source/browse/doc/subprocess.rst?spec=svn2c925e935cad0166d5da85e37c742d8e7f609de5&r=2c925e935cad0166d5da85e37c742d8e7f609de5#437
[9] http://code.google.com/p/subprocdev
[10] http://subdev.blogspot.com/

Monday, July 27, 2009

Prep the PEP

The CRLF issue on Windows is the only thing holding me back from producing a patch and PEP and submitting them for inclusion into the Python core. I posted on comp.lang.python, and until I get a response, I will be gathering sources and references and planning out my PEP, as well as looking over my code to see where optimizations can be made.

Friday, July 24, 2009

Unrelated

While chatting with an Argentinean buddy, he used an amusing portmanteau. He referred to "Windows" as "Guindous," a combination of the word Windows and the Spanish verb "guindar" or "to freeze." I was rather amused and felt like sharing the term with people.

stdout

I discovered that Wing IDE has beta releases with Python 3.1 support and have been using them to debug my code. My unit tests for my subprocess.Popen changes now work flawlessly on both Linux and Windows. The only things holding me back now are issues with stdout being opened in text mode by libc. See the following message from Amaury on Python-Dev:

> Ah so any streams opened in text mode on Windows will read '\n' newlines as
> '\r\n'?

No, it is the libc stdout which is opened in text mode. This simple program:
    int main() {
        printf("Hello\n");
    }
when run like this:
    program > out
will create a file ending with \r\n.

ReadFile and WriteFile (and other functions from the win32 API) are
unaware of this, and faithfully transmit the bytes without
modification.
This is causing my unit test for my file wrapper to fail; all of my '\n' newlines are converted to '\r\n'. I would greatly appreciate suggestions on how to deal with it, as well as on whether to use ctypes or add to _subprocess.c. If you are not subscribed to Python-Dev, you can view the community's discussion on whether ctypes or C code would be a better solution at this link: http://mail.python.org/pipermail/python-dev/2009-July/090720.html
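One way to cope with the translated newlines, sketched here as an assumption rather than the fix that ended up in the project, is to collapse '\r\n' back to '\n' in the bytes read from the child's pipe. The NewlineNormalizer class and its method names are hypothetical; it holds back a trailing '\r' so that a pair split across two reads is still collapsed.

```python
class NewlineNormalizer(object):
    """Hypothetical helper: collapse '\\r\\n' to '\\n' across chunked reads."""

    def __init__(self):
        # True when the previous chunk ended with a lone '\r' that may be
        # the first half of a '\r\n' pair.
        self._pending_cr = False

    def feed(self, chunk):
        """Return chunk with every complete '\\r\\n' replaced by '\\n'."""
        if self._pending_cr:
            chunk = b"\r" + chunk
            self._pending_cr = False
        if chunk.endswith(b"\r"):
            # Hold the '\r' back until the next chunk shows what follows it.
            chunk = chunk[:-1]
            self._pending_cr = True
        return chunk.replace(b"\r\n", b"\n")

    def flush(self):
        """Return any held-back '\\r' once the stream has ended."""
        tail = b"\r" if self._pending_cr else b""
        self._pending_cr = False
        return tail
```

Each chunk returned by a pipe read would be passed through feed(), with flush() called once the child closes its end. Note that this also rewrites any '\r\n' the child wrote on purpose, so it only suits text output.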

Thursday, July 16, 2009

C You Later

I have abandoned modifying _subprocess.c in favor of a path that will be much easier than pulling out the parts of Mark Hammond's C++ code that I need and converting them to C: Python ctypes. After importing ctypes, I can call "ctypes.windll.kernel32.PeekNamedPipe" to access the PeekNamedPipe function which is used in the modifications to subprocess.py. In C/C++, multiple return values are generally handled by passing variables by reference. ctypes duplicates this functionality, so I must pass the variables that will receive data using the ctypes.byref() function and then pack their values into a tuple.
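The byref() pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the peek_named_pipe name and its kernel32 parameter are hypothetical, and the handle is assumed to be a Windows pipe handle obtained elsewhere (for example through msvcrt.get_osfhandle). Only the PeekNamedPipe call itself and the ctypes functions are real APIs.

```python
import ctypes


def peek_named_pipe(handle, kernel32=None):
    """Return (bytes_read, total_available, bytes_left_in_message).

    Hypothetical wrapper; kernel32 can be injected so the function can be
    exercised without Windows.
    """
    if kernel32 is None:
        # ctypes.windll only exists on Windows.
        kernel32 = ctypes.windll.kernel32

    # DWORD out-parameters; PeekNamedPipe fills them in through pointers.
    bytes_read = ctypes.c_ulong(0)
    total_avail = ctypes.c_ulong(0)
    bytes_left = ctypes.c_ulong(0)

    # No buffer is supplied (None, 0), so nothing is copied out of the pipe;
    # only the counters are updated. Real code would likely declare argtypes
    # so that handles are passed at full pointer width.
    ok = kernel32.PeekNamedPipe(
        handle, None, 0,
        ctypes.byref(bytes_read),
        ctypes.byref(total_avail),
        ctypes.byref(bytes_left),
    )
    if not ok:
        raise OSError("PeekNamedPipe failed")

    # Unpack the C values into an ordinary Python tuple.
    return (bytes_read.value, total_avail.value, bytes_left.value)
```

The second element of the tuple is the one that matters for non-blocking reads: if it is zero, a read on the pipe would block.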

Tuesday, July 14, 2009

MSVC++ Runtime Library

When I examined the full build output from Microsoft Visual C++ Express, I discovered an error along the lines of "MSC_DLL_LIB" involving the program make_versioninfo.exe. Executed alone, make_versioninfo.exe complained of missing MSVC++ DLL runtime libraries, and after re-installing Microsoft Visual C++ Express, Python now builds successfully.

I adjusted Mark Hammond's code to be integrated into the _subprocess.c file and now that I can compile Python on Windows, I will be testing it to make sure it functions as expected and integrating the other two routines from PyWin32 that I need for my modification to subprocess.py.

Not DEP

I have modified one of the three functions from Mark Hammond's PyWin32 library to be integrated into the _subprocess.c file of the Python 3.1 source code. When I first attempted to compile the Python source code in Windows yesterday, I encountered a linking error: "LINK : fatal error LNK1181: cannot open input file '.\python31_d.lib'". Unable to compile the code, I went to the Python-Dev list for assistance and a few suggestions were made but none seemed to work. I disabled Windows Data Execution Prevention, restarted my machine and downloaded the Python 3.1 source code from the SVN repository. That code compiled fine and I considered the issue resolved.

As the old saying goes, "correlation does not imply causation." I have, again, encountered the same linking error and can no longer compile the Python source. I downloaded the Python 3.2 code from the Subversion repository again; it did not compile. I downloaded the official Python 3.1 code as both Gzipped and Bzipped tarballs; neither compiled.

Luckily, I thought ahead when I woke up and prepared to compile the _subprocess.c file with Mark Hammond's code integrated into it. I downloaded the Python 3.1 tar ball and attempted to compile taking screenshots at each step.

I downloaded the Python 3.1 source tar ball from python.org.


Using WinRAR, I extracted the source code.

I tried double clicking the pcbuild.sln file and right clicking and selecting open but neither of those actions did anything. The cursor changed to an hour glass briefly but Visual C++ did not open.

I then opened Visual C++ and went to "File," then "Open."


I browsed for the pcbuild.sln file and selected it to open.
A modal error dialog about a limitation in my version of Visual C++ popped up.

The project opens after hitting "OK" so I right click on the "python" portion of the project and hit "Build."

It proceeds to build but eventually errors at the end of the process.
Build log output: http://pastebin.com/m616681cc

Friday, July 10, 2009

Mid Term Summary

Asynchronous I/O for the Python subprocess module has been successfully implemented in both the Python 2k and Python3k branches. Building off of that functionality, I also implemented a wrapper class for the asynchronous I/O so that child processes could act as stand-ins for file objects. The unit tests I produced for the asynchronous I/O and file-like object run flawlessly.

In the last half of Google Summer of Code, I will be perfecting my file-like object for subprocess.Popen, porting Mark Hammond's win32 modules to C and integrating them with the _subprocess.c file, as the asynchronous I/O depends on his code to run. When all is said and done, I will produce a PEP for my changes and attempt to get them integrated into future Python releases.

Thursday, July 2, 2009

Core Conversion Complete

My file wrapper and Josiah's code now work in Python 3.1 and the changes have been committed to the Google Code project. The next thing to tackle is converting Mark Hammond's code from C++ to C and integrating it into the _subprocess.c file.

Tuesday, June 30, 2009

Not Far From The Tree

Now that my implementation of subprocess.Popen is complete and unit tested, I will be porting my changes to Python 3.1. I just compiled Python 3.1 on Debian, all modules included, and pdb is still malfunctioning, so if the code conversion isn't as smooth as I anticipate, I will be using print statements to debug, which is a bit difficult due to the nature of subprocess.Popen.

The license of my project has been changed from GPL to Apache License 2.0 so that there is a chance of it being integrated into the Python core. There are still some issues that I may have handled incorrectly as far as my subprocess.Popen file wrapper goes, mainly handling the "mode" argument that is used when one opens a file. Right now, it is mostly ignored, the exception being universal newline support.

Saturday, June 27, 2009

Updates

Though it has been a while since I have made a post, things are going well right now. Since my last post, I have created a Google Code Repository located at http://code.google.com/p/subprocdev/. I have integrated, modified and written tests for the code I got from Josiah Carlson. Two of the functions he left outside of his Popen patch were moved into subprocess.Popen. I made some aesthetic and functional changes to his recv_some and send_all functions.

Eventually, I got my code and Josiah's code just about fully unit tested in Python 2.7 and from there, proceeded to see if I could convert it over to 3.0. After I did this, I realized that it still ran, for the most part, in Python 2.7. With the following code, I attempted to get around the only thing that was causing me grief:

import sys
if sys.version_info[0] == 3:
    from io import BufferedWriter as buffer
else:
    class str(object):
        def __init__(self,a,b=None):
            self = a.__str__()
    def isinstance(a,b):
        if b == str:
            return hasattr(a,'strip')
        try:
            return a == b(a)
        except:
            return False

In Python 3.0, the str function can also accept a character encoding as an argument, so I tried to overload the Python 2.7 str class to make it 3.0 compliant, to no avail. I will be producing separate branches for Python 2.7 and 3.0 development in my Google Code Repository, but any suggestions on getting around the str issue are greatly appreciated.
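One possible way around the str issue, offered as a sketch and not as what the project ended up using, is to leave the built-ins alone and route all conversions through small helper functions chosen once at import time. The names PY3, text_type, binary_type, to_text and to_bytes are all hypothetical.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
    binary_type = bytes
else:
    # Only evaluated on Python 2, where the unicode built-in exists.
    text_type = unicode  # noqa: F821
    binary_type = str


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return text_type(value)


def to_bytes(value, encoding="utf-8"):
    """Return value as a byte string, encoding it first if it is text."""
    if isinstance(value, binary_type):
        return value
    return text_type(value).encode(encoding)
```

Because the built-in str and isinstance are never shadowed, the rest of the module behaves the same on both interpreters, and only the pipe read and write paths need to call the helpers.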

Thursday, June 18, 2009

After I contacted Josiah Carlson, author of the ActiveState patch for asynchronous I/O in subprocess.Popen, he was more than happy to release the code into the public domain, so my license conflicts with his patch are now resolved. Some of the tests I wrote for his patch have failed and he agreed to take a look at what I wrote to help me get the tests running properly.

Upon running my tests, he realized that pywin32 is not part of the standard library, which complicates my project a bit as far as Windows is concerned, so I am now looking for alternatives to Josiah Carlson's asynchronous I/O patch for subprocess.Popen on Windows. If I cannot find an alternative, I will be using ctypes with the parts of Mark Hammond's code that I need, license permitting. Any suggestions are greatly appreciated.

Wednesday, June 17, 2009

My Original Proposal

My proposal was never posted so I am putting it up now.

Proposal

* A long running issue with the subprocess module is the handling of asynchronous io (http://bugs.python.org/issue1191964, http://ivory.idyll.org/blog/feb-07/problems-with-subprocess). Several patches have been proposed and several alternative solutions suggested. The patches and the alternative solutions will be implemented initially for 2.7 and then for 3.0. The subprocess test suite will be augmented to fully test this new functionality.

* Reimplement the commands module in terms of subprocess functions (http://ivory.idyll.org/blog/mar-07/replacing-commands-with-subprocess). Since the commands module will soon be deprecated, this will provide a platform independent transition for those still using the commands module.
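As a rough illustration of the second item, a function from the commands module can be expressed with subprocess.Popen along these lines. This is a sketch under my own assumptions, not the implementation that was proposed: the function is named after commands.getstatusoutput, and returning the plain exit code instead of the raw wait status is a simplification.

```python
import subprocess


def getstatusoutput(cmd):
    """Run cmd through the shell and return (exit_code, output).

    stderr is merged into stdout, and one trailing newline is stripped
    from the output.
    """
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode, '\r\n' read back as '\n'
    )
    # communicate() reads until EOF and waits for the child to exit.
    output, _ = proc.communicate()
    if output.endswith("\n"):
        output = output[:-1]
    return proc.returncode, output
```

Since Popen with shell=True works on both POSIX and Windows, the same function serves as the platform independent replacement the item describes.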


The modifications to the subprocess module will be presented initially for code review on Rietveld. A message will be added to the above referenced issues so that all interested parties can comment.

Schedule

Start of program:

* Implement command modules using subprocess
* Extend test suite
* Post code review



Midterm evaluation:

* Address issues from code review and post patch
* Synthesize the patches from the above links
* Extend test suite
* Post code review


Final evaluation:

* Address issues from code review and post patch

Friday, June 12, 2009

Who will police the police?

It appears the recent release of Python 3.0 has a buggy pdb. When I first ran into the issue, I thought it might have had something to do with my unit tests or possible incompatibilities between the subprocess.Popen module and pdb. I wrestled with it before finally throwing a bit of "Google-Fu" at it and discovered that I was not the only person having this issue. (http://bugs.python.org/issue6126)

Python 2.7 is supposed to be a 2.X compatible backport of the features and syntax found in Python 3.0 so any code added to the 2.7 branch must be either a bug fix or a backport of a Python 3.0 feature. Before I learned about that, my proposal was directed towards Python 2.7. The pdb module in Python 2.7 works flawlessly and did not give me any trouble while I was debugging my code with it.

Until the issues with the Python 3.0 pdb module are resolved, I will be taking advantage of the forward compatibility and coding in Python 2.7 so that I will have a functional code debugger.

Tuesday, June 2, 2009

Google Code and License Conflicts

Until fairly recently, Google Code only supported Subversion code control publicly and Mercurial had to be specifically requested but now, Mercurial or Subversion can be used. Since I am using Mercurial to manage my branch of subprocess, this will make getting feedback on my code from the community much easier thanks to the large number of review features available.

As I was filling out the form to create a code repository, I ran into the issue of code licensing: Python has its own unique license, the Python License, which is considered compatible with GPL according to the Free Software Foundation. Most of the code I am using comes from activestate.com and as such falls under an MIT license. After contacting the Python-Dev list, I am still at a bit of a loss on selecting a license for the Google Code project and what impact the different licenses will have on being able to get my code pushed into the Python core.

Wednesday, May 27, 2009

Personal Revision Control

After I visited with my mentor, he had me look into using Mercurial with a Subversion repository. In Subversion, code control and all revisions are centralized, so while anyone can check out code, if you produce your own patches and would then like to revert your changes, you are out of luck since you need commit privileges to produce revision checkpoints. Mercurial, being a distributed revision control system, allows users to have their own revision checkpoints even if they do not have commit privileges. Eventually I came across hgsvn, which downloads a Subversion branch and then converts it to a Mercurial branch.

After downloading and installing the latest version and reading the documentation, everything seemed pretty straightforward. I executed the following command:

$ hgimportsvn http://svn.python.org/projects/python/trunk/Lib personalrc

Everything appeared to be going smoothly until I looked inside the folder and noticed that almost none of the files from the /python/trunk/Lib branch were in there. I spent quite a bit of time trying to figure out why it didn't work until I finally realized that, for some reason, it was not checking out the newest revision of the code. After entering the web interface of Python's SVN repository, I saw the latest revision was 72964. So I executed the following command:

$ hgimportsvn -r 72964 http://svn.python.org/projects/python/trunk/Lib personalrc

Everything was in its place now and I had the code revision control that Mercurial provides.

Monday, May 25, 2009

File I/O and Async

I have finished implementing the File I/O class for Popen, which uses the asynchronous I/O patch from ActiveState mentioned in my proposal. The test for the File I/O class was written ahead of time, but I strayed away from the TDD philosophy for the borrowed code as I did not fully understand how it was supposed to work. Since my File I/O class depends on the asynchronous output working properly, I am using those tests for a little bit in hopes of getting proper coverage.

I have run into the issue of debugging my code. The code I wrote failed the test I created and I have not been able to figure out why. It doesn't help that I have not gotten Wing IDE 3.1 to work with Python 2.7.

Friday, May 22, 2009

File I/O

In attempting to create a file object to act as a helper class, I ran into the issue of blocking I/O. I was initially going to implement the file helper class before implementing asynchronous I/O with Popen, but then I realized I would run into blocking when the file.write function was called; since we are treating it like a file object, the output isn't immediately important to us, so blocking doesn't make sense here. I have been looking into a specific piece of code on ActiveState with an async I/O implementation for Popen, and one of the comments contained information on non-blocking I/O, but for POSIX systems only. I need to figure out how to keep the I/O from blocking on Windows as well, since cross-platform compatibility is one of my overall goals.
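For reference, the POSIX-only approach usually looks something like the sketch below. It is an illustration under my own assumptions, not the code from the ActiveState comment: set_nonblocking and read_available are hypothetical names, and the fcntl module is not available on Windows, which is exactly the gap described above.

```python
import fcntl
import os


def set_nonblocking(fileobj):
    """Put the descriptor behind fileobj into non-blocking mode (POSIX only)."""
    fd = fileobj.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def read_available(fileobj, size=4096):
    """Return whatever bytes are ready right now.

    Returns None when nothing is available yet and b"" at end of file.
    os.read bypasses any buffering on fileobj, so the two should not be
    mixed on the same pipe.
    """
    try:
        return os.read(fileobj.fileno(), size)
    except BlockingIOError:
        # The pipe is still open but the writer has produced nothing yet.
        return None
```

With a child started through subprocess.Popen(..., stdout=subprocess.PIPE), calling set_nonblocking(proc.stdout) once and then polling read_available(proc.stdout) lets the parent carry on with other work between reads.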

Wednesday, May 20, 2009

Unfamiliar Territory

As I am not used to working with code I have not written, I am working on a visual diagram of the subprocess functions to give me an idea of the general stack trace for functions as they are called. It will give me something I can physically reference without having to backtrack while I am in the middle of coding. My first addition to the subprocess module is to be able to treat the subprocess objects like file I/O objects so that write, read and readlines can be used. However, as this was not in my initial proposal, I would like some feedback and criticism.
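To make the idea concrete, here is a deliberately simple sketch of a process that is driven like a file. It is my own illustration with blocking reads, not the design that will go into the module: the ProcessFile class and its finish_writing method are hypothetical names.

```python
import subprocess
import sys


class ProcessFile(object):
    """Hypothetical file-like facade over a child process (blocking)."""

    def __init__(self, args):
        self._proc = subprocess.Popen(
            args,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,  # exchange text instead of bytes
        )
        self.closed = False

    def write(self, data):
        """Send data to the child's stdin."""
        self._proc.stdin.write(data)
        self._proc.stdin.flush()

    def finish_writing(self):
        """Close the child's stdin so it sees end of file."""
        self._proc.stdin.close()

    def read(self, size=-1):
        """Read from the child's stdout; blocks until data or EOF."""
        return self._proc.stdout.read(size)

    def readlines(self):
        """Read the remaining output as a list of lines."""
        return self._proc.stdout.readlines()

    def close(self):
        """Close both pipes, wait for the child and return its exit code."""
        if not self.closed:
            if not self._proc.stdin.closed:
                self._proc.stdin.close()
            self._proc.stdout.close()
            self._proc.wait()
            self.closed = True
        return self._proc.returncode


if __name__ == "__main__":
    # Drive a small Python child that upper-cases its input.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    pf = ProcessFile(child)
    pf.write("hello\n")
    pf.finish_writing()
    print(pf.read())
    pf.close()
```

Every read here blocks until the child produces output or exits, which is the limitation that asynchronous I/O would remove.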

Sunday, May 17, 2009

Installed

I now have Python 2.7 installed on my system alongside my Python 2.5 installation. For developing the subprocess module, I have copied the subprocess.py file from the source. Because the Windows module requires an additional component written in C which I am not familiar with, I am going to begin developing on Linux first and then see if it runs on Windows as I may not need to modify any of the Windows C code.

Sunday, May 10, 2009

Python 2.7 Built

Python 2.7 Alpha 0 successfully built. Focusing on getting Python Virtual Env to work with it so I may use it alongside my existing Python 2.5 installation.

Monday, May 4, 2009

I have obtained the Python 2.7 Alpha 0 source through Subversion to develop subprocess so that it may be incorporated into the official 2.7 release. The next step is to set up Python Virtual Environment so I am able to use it alongside my current Python 2.5.4 build.