Fun with Visual Basic and Perl...
Gavin Smyth and Richard Caton
This is an article which appeared in EXE magazine in October 1998, under the title of "Ménage à Trois" (that's "Menage a Trois" for those of you with browsers that can't hack accents).
Just after this appeared in print, ActiveState announced the availability of their COM interface to Perl (PerlCOM and PerlCtrl, part of the Perl Developers Kit), making most of the effort we went to here redundant and making this comment rather silly! Oh well, it was a good training exercise...
As part of our work in Scientific Generics, we often have to massage the output of various software tools to a form suitable for feeding into other tools - nothing unusual about that, but recently we saw a need to merge a couple of programs, one written in Perl and the other in Visual Basic. This proved to be more of a challenge than we expected.
What we wanted to do was interactively extract some 2D polygon information from a file produced by a circuit design package and manipulate it on-screen. We already had suitable extraction code written in Perl and bitmap and vector graphics code in Visual Basic (to be more exact, Visual Basic with some support DLLs). We quickly rejected the approaches of porting one system to the other's language. The Perl code would have been too tortuous to translate to VB as it consisted mainly of pattern matching and string manipulation. Although Perl itself contains no graphical output mechanism, it is relatively easy to build and install Perl/Tk (even for 32-bit Windows), a version of Perl which includes the Tk widget set of Tcl/Tk fame. However, this is yet another graphics library to learn whereas we were already familiar with VB, and accessing the support DLLs from Perl would probably be as much work as connecting Perl and VB. For our immediate application, the major interface function to deal with was a 'hit test', taking an (x,y) co-ordinate pair as input and returning a couple of strings and an unbounded list of co-ordinate pairs representing the outline of a polygon.
Having decided to try to link the two languages, there were essentially two options open to us: use a custom intermediate C/C++ layer, or use the Microsoft Script Control with PerlScript.
With an intermediate layer, there are obviously two interfaces to worry about: I will tackle the simpler first, C++ to Perl. (We say C++ deliberately here since, as will be explained later, C++ support, Visual C++ to be precise, was important.) The standard Perl documentation describes the interface in detail and there is a good overview chapter in Advanced Perl Programming. In summary, we need to instantiate a Perl interpreter, load the code, invoke the functions defined in the script, and finally dispose of the interpreter. Parameters and results are passed on a Perl stack, which is manipulated via a few macros provided in header files present in the standard Perl distribution.
Building the Perl interpreter into C++ was quite easy, but there were a couple of troublesome areas:
Perl's connectivity is with C and not C++, so we had to insert some mucky preprocessor hackery to stop it getting confused, shown in Figure 1. There were three problems. The Perl headers use DEBUG differently to the normal Microsoft meaning, so we protected the definition around the Perl include. Perl includes math.h, but the inclusion of perl.h has to occur within an extern "C", causing complaints when the template definitions in math.h are parsed, so we explicitly included it first, outside the C linkage. Finally, Perl defines a Copy macro which interferes with a function declared later; fortunately we did not need the macro anyway, so the workaround was simply to undefine it.
#ifdef DEBUG
# define OLD_DEBUG
# undef DEBUG
#endif
#include <math.h> // Not because we need it, but because Perl screws it!
extern "C"
{
# include "EXTERN.h"
# include "perl.h"
}
#undef DEBUG
#undef Copy
#ifdef OLD_DEBUG
# define DEBUG
# undef OLD_DEBUG
#endif
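The save-and-restore trick used for DEBUG can be tried in isolation. The following is a minimal C++ sketch rather than the code we shipped: the `foreign` namespace is a made-up stand-in for a third-party header that uses DEBUG as an ordinary identifier, and `appDebugEnabled` is a hypothetical helper. One limitation is worth noting: the technique only remembers that the macro was defined, not its value, so the restore step has to hard-code whatever value the application expects.

```cpp
#include <string>

// Pretend the application build defines DEBUG as a flag.
#define DEBUG 1

// Remember that DEBUG was defined, then hide it from the foreign code.
#ifdef DEBUG
#  define OLD_DEBUG
#  undef DEBUG
#endif

// Stand-in for a third-party header that uses DEBUG as a plain identifier.
// This would not compile if DEBUG still expanded to 1.
namespace foreign {
    enum Level { QUIET, DEBUG };

    inline std::string levelName(Level level)
    {
        return level == DEBUG ? "debug" : "quiet";
    }
}

// Drop anything the foreign code defined, then restore our own flag.
#undef DEBUG
#ifdef OLD_DEBUG
#  define DEBUG 1   // the original value is not recoverable, so restate it
#  undef OLD_DEBUG
#endif

// Back in application code, DEBUG is the build flag again.
inline bool appDebugEnabled()
{
#ifdef DEBUG
    return true;
#else
    return false;
#endif
}
```

Compilers that support `#pragma push_macro` and `#pragma pop_macro` can preserve the value as well, at the cost of portability.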
Perl code can employ routines from a number of external modules, some of which may be separately compiled libraries. The Perl interpreter built into our program can quite happily access Perl modules but needs a bit more help with C modules: these modules are quite common - for example, socket handling uses one, and file access uses another. An initialisation function needs to be defined to bootstrap every compiled module being used, such as that in Figure 2, illustrating the initialisation function for the Perl IO module. In addition, we must link the library files for these modules into the build process. While all this is easy to do, it does mean that the application has to be rebuilt if we change our Perl script to use a different set of compiled modules.
extern "C"
{
    void boot_IO _((CV* cv));
}
static void xs_init(void)
{
    char *file = __FILE__;
    newXS("IO::bootstrap", boot_IO, file);
}
Figure 3 shows the Perl initialisation and deinitialisation code. The third and fourth arguments to perl_parse are an argc/argv combination: argv[0] behaves as an executable name and is essentially unimportant here, and argv[1] is the name of the script to load and run; further arguments and options can be specified by populating more of the argv array.
void CPerlSheet::LoadPerl()
{
    m_Perl = perl_alloc();
    perl_construct( m_Perl );
    static char* argv[] = { "PerlScan", "scan.pl", NULL };
    perl_parse( m_Perl, xs_init, sizeof( argv ) / sizeof( argv[ 0 ] ) - 1, argv, NULL );
}
void CPerlSheet::UnloadPerl()
{
    perl_destruct(m_Perl);
    perl_free(m_Perl);
    m_Perl = NULL;
}
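The argc calculation passed to perl_parse is easy to get wrong when the argument list grows, so here is a small self-contained C++ sketch of it. The helper `argCount` and both argument arrays are hypothetical illustrations: `-w` is Perl's ordinary warnings switch, and `board.dat` is an invented file name that the script would see in @ARGV. The sketch uses `const char*` for tidiness; the real perl_parse expects a plain `char**`.

```cpp
#include <cstddef>

// Number of real arguments in a NULL-terminated argv-style array:
// the array length minus the terminating NULL, known at compile time.
template <typename T, std::size_t N>
int argCount(T (&)[N])
{
    return static_cast<int>(N) - 1;
}

// argv[0] is a dummy program name, followed by the script to load.
static const char* plainArgs[] = { "PerlScan", "scan.pl", NULL };

// Switches go before the script name, script arguments after it,
// exactly as on a perl command line.
static const char* verboseArgs[] = { "PerlScan", "-w", "scan.pl", "board.dat", NULL };
```

Here `argCount(plainArgs)` is 2 and `argCount(verboseArgs)` is 4, matching what `sizeof( argv ) / sizeof( argv[ 0 ] ) - 1` would give for the same arrays.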
Our Perl script, along with the other Perl modules, loads the circuit design file and parses it into internal structures, as well as providing the hit test function we are really interested in. As mentioned earlier, communication with Perl is via a stack. We pass two integer arguments and expect a pair of strings and a list of co-ordinate pairs in return. It is much easier to flatten the whole result data structure and return all the elements on the stack than it is either to return a structure or to populate a passed in structure pointer: this is not because of any problem with the Perl script, but because it is easier to process the flattened list in C++. Figure 4 shows a fragment of code which just prints out the results and Figure 5 contains some much simplified Perl test code. The following stack manipulation macros from the Perl distribution are used here:
dSP - declare and initialise a local stack pointer
ENTER - create a stack frame
SAVETMPS - prepare to place temporary variables on the stack
PUSHMARK - mark the start of the stack so that Perl knows how many arguments to process
XPUSHs - push a scalar variable
PUTBACK - update the global stack pointer with respect to the local one
SPAGAIN - reread the global stack pointer back into the local one
POPi - pop off an integer
POPs - pop off a scalar variable
FREETMPS - release temporary variables (in this case, the returned values)
LEAVE - release the stack frame
These are explained in detail in the Perl on-line documentation. Notice that the return list is seen by C++ in reverse order to Perl.
dSP;
ENTER;
SAVETMPS;
PUSHMARK( sp );
// Push x and y
XPUSHs( sv_2mortal( newSViv( x ) ) );
XPUSHs( sv_2mortal( newSViv( y ) ) );
PUTBACK;
int stackSize = perl_call_pv( "Lookup", G_ARRAY );
SPAGAIN;
for( int i = 0; i < ( stackSize - 2 ) / 2; ++i )
{
    int y = POPi;
    int x = POPi;
    cout << "(" << x << ", " << y << ")\n";
}
char* s2 = SvPVX( POPs );
char* s1 = SvPVX( POPs );
cout << s1 << endl << s2 << endl;
PUTBACK;
FREETMPS;
LEAVE;
sub Lookup
{
    my( $x, $y ) = @_;
    my @ReturnArray = ( "String 1", "String 2", $x, $y, $x*2, $y*2, $x*3, $y*3 );
    @ReturnArray;
}
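The reverse-order unpacking of the flattened list can be illustrated without a Perl interpreter. In the C++ sketch below, a `std::vector<std::string>` stands in for the Perl return stack (an assumption made purely for illustration), `std::atoi` plays the role of POPi, and the `HitResult` structure and `unpack` function are hypothetical names. Unlike the fragment in Figure 4, this version fills the vertex array from the back so that the co-ordinates end up in the order the script listed them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

struct Vertex { int x; int y; };

struct HitResult {
    std::string refName;
    std::string description;
    std::vector<Vertex> vertices;   // in the order the script returned them
};

// 'stack' stands in for the Perl return stack: element 0 was pushed first,
// so values must be popped from the back, i.e. in reverse order.
HitResult unpack(std::vector<std::string> stack)
{
    // Two strings followed by a whole number of (x, y) pairs.
    assert(stack.size() >= 2 && (stack.size() - 2) % 2 == 0);

    HitResult result;
    const std::size_t pairs = (stack.size() - 2) / 2;
    result.vertices.resize(pairs);

    // The last pair comes off first, so fill the array from the end.
    for (std::size_t i = pairs; i-- > 0; ) {
        result.vertices[i].y = std::atoi(stack.back().c_str());
        stack.pop_back();
        result.vertices[i].x = std::atoi(stack.back().c_str());
        stack.pop_back();
    }

    // The two strings were pushed first, so they come off last.
    result.description = stack.back();
    stack.pop_back();
    result.refName = stack.back();
    stack.pop_back();
    return result;
}
```

Feeding it the list produced by the test script for x = 10 and y = 20 yields the reference name "String 1", the description "String 2" and three vertices starting at (10, 20).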
Now, what about VB to C++? The simplest mechanism from the C++ point of view is to expose a few functions in a DLL. Unfortunately, moving arrays this way is rather unpleasant, so we would have to extract the returned co-ordinates one at a time, implying that the list must be stored 'below' VB. Since we might have more than one hit test outstanding at any one time, we could not use a single global array, so we would have to allocate the lists on the heap and take care to deallocate them when we are finished with them - this is easier to control in C++ than in Perl. Having dabbled with the Active Template Library (ATL), we wondered if that could help: making the return structure a proper COM object means VB will 'know' how and when to deallocate it without extra assistance. (The ATL is why we used Visual C++ and therefore why we had to go to so much trouble to sort out the Perl include files.) Because we define the structures in a type library, we can treat them almost as any other control in VB. The bad news is that creating a suitable ATL control is not a trivial piece of work! We could have created an MFC control instead: that would have been only marginally easier, but brings a huge amount of unwanted baggage with it, both in terms of run time DLLs and in terms of source code to support operations in which we were not interested.
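The reason a COM object frees VB from explicit deallocation is reference counting. The C++ sketch below shows the idea with no COM headers at all: `Result` is a hypothetical stand-in for a COM-style object, `ResultRef` imitates what a client such as VB does behind the scenes when an object variable is assigned or goes out of scope, and `liveCount` exists only so that the behaviour can be observed. It is an illustration of the principle, not of how ATL implements it.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM-style object holding one hit test result.
class Result {
public:
    static int liveCount;             // objects currently alive (for observation)

    Result() : m_refs(0) { ++liveCount; }

    unsigned long AddRef() { return ++m_refs; }

    unsigned long Release()
    {
        assert(m_refs > 0);
        const unsigned long left = --m_refs;
        if (left == 0)
            delete this;              // the last reference frees the object
        return left;
    }

private:
    ~Result() { --liveCount; }        // only Release may destroy the object
    unsigned long m_refs;
};

int Result::liveCount = 0;

// Roughly what the client does for each object variable it holds.
class ResultRef {
public:
    explicit ResultRef(Result* p) : m_p(p) { if (m_p) m_p->AddRef(); }
    ResultRef(const ResultRef& other) : m_p(other.m_p) { if (m_p) m_p->AddRef(); }
    ResultRef& operator=(const ResultRef& other)
    {
        if (other.m_p) other.m_p->AddRef();   // add first: safe for self-assignment
        if (m_p) m_p->Release();
        m_p = other.m_p;
        return *this;
    }
    ~ResultRef() { if (m_p) m_p->Release(); }

private:
    Result* m_p;
};
```

Each hit test can hand back its own `Result`, several can be outstanding at once, and each disappears when the last variable referring to it does.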
Visual C++'s ATL wizard (standard with VC 5.0, but available for download from the Microsoft site for VC 4.2) quickly creates a project into which you can insert controls and other COM objects. A little bit of experimentation suggested that a full control is the easiest type of object to manipulate in VB, but that VB can also access simpler COM objects - for example, controls show up in the VB components dialog box, can be placed on forms, and can have their properties set in the property box; the objects termed 'simple COM objects' in ATL do not have these abilities, but can still be used in VB.
We created a couple of objects: a PerlSheet full control which is responsible for handling the Perl interpreter, and a Component simple object which contains the hit test return values. The most interesting function of the IPerlSheet interface is HitTest which returns an IComponent interface. Note the 'I's in the previous sentence: we are dealing with COM interfaces, which will be implemented in ATL classes - VB knows about interfaces, but cares little about the objects behind the scenes. The IDL source generated by the ATL wizard can be improved further: each class has a default interface and this should be marked hidden to tidy up the presentation of the class to object browsing tools (the OLE/COM browser tool that comes with Visual C++ is very useful for checking that you have defined the types you think you have); some COM containers can be more efficient if the classes they are using are marked nonextensible; and finally, since we wanted to restrict the Component object to be created only as the result of our hit test routine, we marked that interface noncreatable.
[
    object,
    uuid(B75E7D8D-1B05-11D2-A475-00A0245146B9),
    dual,
    hidden,
    nonextensible,
    helpstring("IPerlSheet Interface"),
    pointer_default(unique)
]
interface IPerlSheet : IDispatch
{
    [id(1), helpstring("Hit test")] HRESULT HitTest([in] short x, [in] short y, [out,retval] IComponent ** component);
};
The IDL definition for the PerlSheet interface is shown in Figure 6. Note the result value for the hit test function, a pointer to pointer to the component object interface: although the function itself is declared to have an HRESULT return value, the function's [out, retval] parameter appears as the result within VB while the HRESULT is used to trigger VB error processing. We have made all of the integer parameters short (16 bits) so that they map directly on to VB's Integer type. (It would probably have been more efficient to handle 32-bit integers, but within VB, using the Integer type comes much more naturally than using Long.) The hit test implementation, shown in Figure 7, includes the call to Perl discussed earlier, and instantiates a Component object in which the results are placed. Following the rules of COM, the object is not simply created via C++'s new operator, but instead uses COM creation, provided by the CComObject template. The actual result of the function is the IComponent interface of the Component object, obtained through QueryInterface so that VB can deal with it (it is apparently not enough to cast the component object to an IComponent). We could instead have returned an IDispatch interface, but an IComponent permits VB to use early binding instead of late.
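The split between the HRESULT and the [out, retval] parameter can be pictured in plain C++. This is a conceptual sketch only: `Status` and its constants are hypothetical stand-ins for HRESULT values, the failure rule and the result inside `HitTestRaw` are made up, and the wrapper merely imitates what the VB runtime does for us, namely turning the final out-parameter into a return value and a failure code into an error.

```cpp
#include <stdexcept>

typedef long Status;                  // hypothetical stand-in for HRESULT
const Status STATUS_OK      = 0;
const Status STATUS_FAIL    = -1;
const Status STATUS_POINTER = -2;

// COM-style signature: the C++ return value is only a status code and the
// real result travels through the final pointer parameter.
Status HitTestRaw(short x, short y, int* vertexCount)
{
    if (vertexCount == 0)
        return STATUS_POINTER;
    if (x < 0 || y < 0)
        return STATUS_FAIL;           // made-up failure rule for the sketch
    *vertexCount = 3;                 // made-up result for the sketch
    return STATUS_OK;
}

// What the caller effectively sees: the out-parameter becomes the
// function result and a failing status becomes a run-time error.
int HitTest(short x, short y)
{
    int count = 0;
    const Status status = HitTestRaw(x, y, &count);
    if (status < 0)
        throw std::runtime_error("HitTest failed");
    return count;
}
```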
The only other interesting function in the PerlSheet class is FinalRelease, in which the Perl interpreter is unloaded. There is also a FinalConstruct function, where I initially thought of invoking the Perl load routine: however, that function is called regardless of the user mode under which the control is operating (run-time or design-time), and we did not want to instantiate the interpreter at design time. We tried using the GetAmbientUserMode function to indicate which mode was applicable, but that seems to fail when executed from FinalConstruct - maybe the ambient object is not available that early in the component's lifetime, since GetAmbientUserMode behaved exactly as expected within the other methods: we did not have time to investigate, and so simply tied the creation of the Perl interpreter to the hit test function as we knew that would only be invoked at run time.
STDMETHODIMP CPerlSheet::HitTest(short x, short y, IComponent ** component)
{
    if( component == NULL )
        return E_POINTER;
    if( !m_Perl )
        LoadPerl();
    if( !m_Perl )
        return E_FAIL;
    CComObject< CComponent >* comp;
    HRESULT hr = CComObject< CComponent >::CreateInstance( &comp );
    if( FAILED( hr ) )
        return hr;
    // CreateInstance leaves the reference count at zero, so hold our own
    // reference while the object is being filled in
    comp->AddRef();
    // Stack manipulation and call omitted, but as shown earlier
    comp->m_NumVertices = ( stackSize - 2 ) / 2;
    comp->m_Vertices = new Vertex[ comp->m_NumVertices ];
    if( !comp->m_Vertices )
    {
        comp->Release();
        return E_OUTOFMEMORY;
    }
    for( int i = 0; i < comp->m_NumVertices; ++i )
    {
        comp->m_Vertices[ i ].y = POPi;
        comp->m_Vertices[ i ].x = POPi;
    }
    comp->m_Description = SvPVX( POPs );
    comp->m_RefName = SvPVX( POPs );
    PUTBACK;
    FREETMPS;
    LEAVE;
    // Hand the caller its own reference, then drop ours; if the query
    // fails, dropping ours destroys the object
    hr = comp->QueryInterface( IID_IComponent, (void**)component );
    comp->Release();
    return hr;
}
The Component class stores the two return strings as well as a vector of return co-ordinates and merely contains access functions for these. I chose to implement them as properties with only the get accessors defined so that they would be read only. A fragment of VB code to use the control is shown in Figure 8, where ctlSheet is a PerlSheet control placed on the VB form. Since Comp is an object, it has to be assigned using Set instead of the (default) Let: omitting the Set results in a very unhelpful VB error message - 'Run-time error 91: Object variable or With block variable not set.'
Dim Comp As Component
Set Comp = ctlSheet.HitTest(x, y)
Debug.Print Comp.RefName, Comp.Description, Comp.NumVertices
Debug.Print "First coord: ", Comp.X(1), Comp.Y(1)
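The shape of the Component class can be sketched in plain C++, leaving out all of the ATL and COM plumbing. Everything here is an illustrative assumption rather than our actual source: in particular, the sketch assumes the co-ordinate accessors are indexed from 1, as the VB fragment suggests, and it signals a bad index with an exception where the real control would return a failing HRESULT. Read-only behaviour comes simply from providing get accessors and nothing else.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct Vertex { short x; short y; };

// Read-only holder for one hit test result: values are set once at
// construction and exposed only through get accessors.
class Component {
public:
    Component(const std::string& refName,
              const std::string& description,
              const std::vector<Vertex>& vertices)
        : m_refName(refName), m_description(description), m_vertices(vertices) {}

    const std::string& RefName() const     { return m_refName; }
    const std::string& Description() const { return m_description; }
    short NumVertices() const { return static_cast<short>(m_vertices.size()); }

    // Indexed from 1 (an assumption based on the VB usage).
    short X(short index) const { return at(index).x; }
    short Y(short index) const { return at(index).y; }

private:
    const Vertex& at(short index) const
    {
        if (index < 1 || static_cast<std::size_t>(index) > m_vertices.size())
            throw std::out_of_range("vertex index");
        return m_vertices[index - 1];
    }

    std::string m_refName;
    std::string m_description;
    std::vector<Vertex> m_vertices;
};
```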
This mechanism suffers from one major flaw: if the Perl to VB interface changes, we have to change code in three languages. There is an alternative: expose the entire Perl API to VB, so that the C++ layer never needs to change. Although an interesting prospect, this sounded like far too much work for us at the moment, but the following describes something close to it...
Microsoft's Script Control lets COM aware applications invoke scripts: these scripts can be written in a number of languages, for example: the Microsoft supported VBScript and JScript, or PerlScript, such as that available from ActiveState. The interface to the scripting engine is not particularly efficient, effectively letting us execute a single statement or call one function at a time. Although the call can accept an arbitrary list of scalar parameters, it can return only a single value. More interesting is the ability to expose objects to the scripting engine, and hence to the script itself. Figure 9 shows a VB code fragment which executes a Perl script, where ctlScript is the Script Control and txtOutput is a text box, showing how Perl can access controls on a VB form.
ctlScript.Language = "PerlScript"
ctlScript.AddObject "textBox", txtOutput
ctlScript.AddCode "sub output { textBox->{text} = 'hello'; }"
ctlScript.Run "output"
Within the Perl code, the text box can be manipulated under the name of textBox, the first argument to AddObject: properties are read and written as members of a hash table (object->{propertyName}), and in theory methods can be invoked in a similar way (object->method()). However, experimentation indicated that method invocation did not work from PerlScript. This was unfortunate, because we had thought of using an invisible list box to store the co-ordinate values from the hit test function, using the list box's AddItem method within Perl.
Since the Run function can return only a single value, we were forced to create a Perl global variable for the results of the hit test and access the fields within it via trivial Perl functions: a simplified version of the code is illustrated in Figure 10.
ctlScript.AddCode "@global = ( 0, 1, 2, 3 );"
ctlScript.AddCode "sub accessGlobal { my( $index ) = @_; $global[ $index ]; }"
Result = ctlScript.Run("accessGlobal", 2)
In theory, once again, we should be able to use the Script Control's Eval method to evaluate an arbitrary expression in the script language, for example, replacing the call to Run and the accessGlobal function in Figure 10 with: Result = ctlScript.Eval ("$global[2]" ). However, this did not seem to work either.
It should be mentioned that the Eval and object method invocation operations both work as expected with JScript or VBScript so either we misunderstood the rather sparse documentation or there are a few bugs in the implementation of PerlScript we were using. Please note that ActiveState do point out that their current PerlScript package is a beta.
For brevity, the code samples shown here have had the entire Perl script specified within the argument to AddCode. It is trivial to place the bulk of the code elsewhere: define the Perl code as a module and then merely specify 'use Module' to AddCode.
With either of the mechanisms for handling the transfer of data structures we were tied to global data structures again, either a VB control or a Perl structure, making the processing of multiple hit tests awkward. Even if we had had no problems with methods or Eval, this makes PerlScript less attractive than the more complex C++ buffer layer.
We like the simplicity of the Script Control interface to PerlScript: one major benefit of this is that code written in only two languages has to be changed if the interface changes, compared with three when the custom C++ layer is included. However, the custom COM object layer scores much higher for the cleanness of the interface to VB - no creating dummy VB controls (when method invocation works) or using Perl globals to fudge the return of large structures. For the short term, our project will use an intermediate C++ layer, but PerlScript remains a viable option where parameter and result passing requirements are simpler.
Advanced Perl Programming, Srinivasan, S., O'Reilly 1997, ISBN 1-56592-220-4. As the title suggests, this book covers the bits of Perl you might want to explore after you have learned how to write a few moderately sized scripts. It includes discussions on complex data structures, packages leading on to object orientation, persistence and databases, networking with sockets and remote procedure calls, Perl/Tk, invoking C from Perl and vice versa. It is a book well worth reading, though the errata sheet available on the O'Reilly site is rather long.
Professional Visual C++ 5 ActiveX COM Control Programming, Li, S. and Economopoulos, P., Wrox Press 1997, ISBN 1-861000-37-5. This book gives a very good introduction to ActiveX objects, following a theoretical discussion of COM with the development of a control from scratch, before showing how much easier it is with ATL or MFC. The book is very pro-Microsoft, including a few sections that read like marketing blurb, but does fill a lot of the gaps in the Microsoft ATL documentation.
ActiveState site - http://www.activestate.com
Microsoft scripting site - http://msdn.microsoft.com/scripting
ATL - information can be found from Microsoft via http://msdn.microsoft.com/visualc or http://www.microsoft.com/com and there is a good online article describing how to define IDL for efficient COM object access at http://msdn.microsoft.com/library/techart/msdn_vbscriptcom.htm. There are also a number of independent ATL sites, the ones I found most useful being: World Of ATL and Widgetware's FAQ, although neither has been updated for quite a while.
Last modified on 9th April 2000