What would Perl do...
I always wondered why some people write unreadable Perl. The most common reason given seems to be 'It's faster that way'.
And so... using DTrace, and the extra probes I added, I thought I'd take a look.
# dtrace -l | grep perl
85614 perl1226 libperl.so Perl_sv_free del_sv
85615 perl1226 libperl.so Perl_sv_replace del_sv
85616 perl1226 libperl.so perl_run main_enter
85617 perl1226 libperl.so perl_parse main_enter
85618 perl1226 libperl.so perl_destruct main_enter
85619 perl1226 libperl.so perl_construct main_enter
85620 perl1226 libperl.so perl_alloc main_enter
85621 perl1226 libperl.so perl_run main_exit
85622 perl1226 libperl.so perl_parse main_exit
85623 perl1226 libperl.so perl_destruct main_exit
85624 perl1226 libperl.so perl_construct main_exit
85625 perl1226 libperl.so perl_alloc main_exit
85626 perl1226 libperl.so Perl_sv_dup new_sv
85627 perl1226 libperl.so Perl_newSVrv new_sv
85628 perl1226 libperl.so Perl_newSVsv new_sv
85629 perl1226 libperl.so Perl_newRV_noinc new_sv
85630 perl1226 libperl.so Perl_newSVuv new_sv
85631 perl1226 libperl.so Perl_newSViv new_sv
85632 perl1226 libperl.so Perl_newSVnv new_sv
85633 perl1226 libperl.so Perl_vnewSVpvf new_sv
85634 perl1226 libperl.so Perl_newSVpvn_share new_sv
85635 perl1226 libperl.so Perl_newSVhek new_sv
85636 perl1226 libperl.so Perl_newSVpvn new_sv
85637 perl1226 libperl.so Perl_newSVpv new_sv
85638 perl1226 libperl.so Perl_sv_newmortal new_sv
85639 perl1226 libperl.so Perl_sv_mortalcopy new_sv
85640 perl1226 libperl.so Perl_newSV new_sv
85641 perl1226 libperl.so Perl_pp_sort sub-entry
85642 perl1226 libperl.so Perl_pp_dbstate sub-entry
85643 perl1226 libperl.so Perl_pp_entersub sub-entry
85644 perl1226 libperl.so Perl_pp_last sub-return
85645 perl1226 libperl.so Perl_pp_return sub-return
85646 perl1226 libperl.so Perl_dounwind sub-return
85647 perl1226 libperl.so Perl_pp_leavesublv sub-return
85648 perl1226 libperl.so Perl_pp_leavesub sub-return
Using these probes, we can write some 'D' that tells us what Perl is doing at each of its phases - startup, parsing, execution, and cleanup.
First off, accessing function call parameters:
Given three essentially identical programs (call1.pl, call2.pl and call3.pl, in that order):
#!/usr/local/bin/perl -Tw
# call1.pl: operate on $_[0] directly
use strict;
my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";
sub func {
    $_[0] =~ s/there/There/;
    return $_[0];
}
#!/usr/local/bin/perl -Tw
# call2.pl: copy the argument with a list assignment from @_
use strict;
my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";
sub func {
    my ($val) = @_;
    $val =~ s/there/There/;
    return $val;
}
#!/usr/local/bin/perl -Tw
# call3.pl: copy the argument with shift
use strict;
my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";
sub func {
    my $val = shift;
    $val =~ s/there/There/;
    return $val;
}
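One caveat on "essentially identical": the elements of @_ are aliases for the caller's arguments, so the $_[0] version also changes $initial in the caller, while the other two work on a copy. A minimal sketch of the difference (the names by_index and by_shift are mine, not part of the scripts above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names, for illustration only.
# by_index works on $_[0], which is an alias for the caller's variable.
sub by_index {
    $_[0] =~ s/there/There/;
    return $_[0];
}

# by_shift copies the argument first, so the caller's variable is untouched.
sub by_shift {
    my $val = shift;
    $val =~ s/there/There/;
    return $val;
}

my $first  = "there once was a fish";
my $second = "there once was a fish";

my $r1 = by_index($first);
my $r2 = by_shift($second);

print "by_index: returned '$r1', caller now has '$first'\n";
print "by_shift: returned '$r2', caller still has '$second'\n";
```

Both calls return "There once was a fish", but only by_index leaves the caller's string modified.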
There is a myth (***) that using $_[0] is faster, as it doesn't create a temporary variable...
DTrace shows this to be untrue:
== call1.pl ==========================================================
perl*::perl_alloc:main_enter
perl*::perl_alloc:main_exit, (0/0) (53119 nS)
perl*::perl_construct:main_enter
perl*::perl_construct:main_exit, (12/0) (564370 nS)
perl*::perl_parse:main_enter
--> BEGIN, ./call1.pl
--> bits, /usr/local/lib/perl5/5.8.8/strict.pm
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (48060 nS)
--> import, /usr/local/lib/perl5/5.8.8/strict.pm
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (15398 nS)
<-- BEGIN, ./call1.pl (160/80) (1025874 nS)
perl*::perl_parse:main_exit, (299/42) (2856399 nS)
perl*::perl_run:main_enter
--> func, ./call1.pl
<-- func, ./call1.pl (1/0) (47723 nS)
perl*::perl_run:main_exit, (0/1) (265677 nS)
perl*::perl_destruct:main_enter
perl*::perl_destruct:main_exit, (0/2) (20763 nS)
total, total (0/0) (3789064 nS)
== call2.pl ==========================================================
perl*::perl_alloc:main_enter
perl*::perl_alloc:main_exit, (0/0) (53251 nS)
perl*::perl_construct:main_enter
perl*::perl_construct:main_exit, (12/0) (509684 nS)
perl*::perl_parse:main_enter
--> BEGIN, ./call2.pl
--> bits, /usr/local/lib/perl5/5.8.8/strict.pm
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (36748 nS)
--> import, /usr/local/lib/perl5/5.8.8/strict.pm
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (9797 nS)
<-- BEGIN, ./call2.pl (160/80) (924250 nS)
perl*::perl_parse:main_exit, (299/38) (2545953 nS)
perl*::perl_run:main_enter
--> func, ./call2.pl
<-- func, ./call2.pl (1/0) (42165 nS)
perl*::perl_run:main_exit, (0/1) (142393 nS)
perl*::perl_destruct:main_enter
perl*::perl_destruct:main_exit, (0/2) (20851 nS)
total, total (0/0) (3301007 nS)
== call3.pl ==========================================================
perl*::perl_alloc:main_enter
perl*::perl_alloc:main_exit, (0/0) (52927 nS)
perl*::perl_construct:main_enter
perl*::perl_construct:main_exit, (12/0) (607783 nS)
perl*::perl_parse:main_enter
--> BEGIN, ./call3.pl
--> bits, /usr/local/lib/perl5/5.8.8/strict.pm
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (37066 nS)
--> import, /usr/local/lib/perl5/5.8.8/strict.pm
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (10171 nS)
<-- BEGIN, ./call3.pl (160/80) (924824 nS)
perl*::perl_parse:main_exit, (297/37) (2543981 nS)
perl*::perl_run:main_enter
--> func, ./call3.pl
<-- func, ./call3.pl (1/0) (41833 nS)
perl*::perl_run:main_exit, (0/1) (140527 nS)
perl*::perl_destruct:main_enter
perl*::perl_destruct:main_exit, (0/2) (20273 nS)
total, total (0/0) (3395310 nS)
allocations / deallocations:
474 / 122 call3.pl
476 / 123 call2.pl
476 / 127 call1.pl
Each (N/M) pair in the output is the number of allocations and deallocations for that step, and
the func line, e.g. "<-- func, ./call2.pl (1/0)", is the same in all three runs: one allocation and no deallocations.
After all the test runs, I also print out the total allocations and deallocations for each script,
and it seems that the "my $val = shift" version is the most efficient,
using two fewer allocations (apparently during the parse phase: 297 against 299).
The deallocation count is interesting too: "$_[0]" uses 5 more deallocations than "my $val = shift" during
the parse phase (42 against 37), and "my ($val) = @_;" uses one more (38).
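The per-script totals appear to be simply the sum of the (allocations/deallocations) pairs on the exit lines. As a rough check, here is a small sketch that adds up the pairs from the call3.pl run; the helper name sum_counts is my own, and the data is copied from the output above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sum every "(allocs/deallocs)" pair in a block of
# trace output. The "(12345 nS)" timings have no slash, so they are skipped.
sub sum_counts {
    my ($trace) = @_;
    my ($allocs, $deallocs) = (0, 0);
    while ($trace =~ m{\((\d+)/(\d+)\)}g) {
        $allocs   += $1;
        $deallocs += $2;
    }
    return ($allocs, $deallocs);
}

# Exit lines from the call3.pl run (the enter lines carry no counts).
my $call3 = <<'END';
perl*::perl_alloc:main_exit, (0/0) (52927 nS)
perl*::perl_construct:main_exit, (12/0) (607783 nS)
<-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (37066 nS)
<-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (10171 nS)
<-- BEGIN, ./call3.pl (160/80) (924824 nS)
perl*::perl_parse:main_exit, (297/37) (2543981 nS)
<-- func, ./call3.pl (1/0) (41833 nS)
perl*::perl_run:main_exit, (0/1) (140527 nS)
perl*::perl_destruct:main_exit, (0/2) (20273 nS)
total, total (0/0) (3395310 nS)
END

my ($total_allocs, $total_deallocs) = sum_counts($call3);
printf "%d / %d call3.pl\n", $total_allocs, $total_deallocs;
# prints "474 / 122 call3.pl"
```

The same sum over the call1.pl and call2.pl runs gives 476 / 127 and 476 / 123, matching the table.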
Attempting to reduce the allocations further by working on $_ doesn't seem to help: the following code results in 474 allocations,
the same as the shift case, but with 3 extra deallocations, again in the parsing phase. Increasing the number of times that func
is called only increases the benefits of using shift.
#!/usr/local/bin/perl -Tw
use strict;
my $initial = "there once was a fish. Its feet were small";
$_ = $initial;
my $post = func();
print "$post\n";
sub func {
    s/there/There/;
    return $_;
}
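For the repeated-call case, a cross-check that needs no DTrace is the core Benchmark module. The sketch below is only that: the sub names and labels are mine, it measures CPU time rather than allocations, and its numbers may not line up with the probe counts above. Each iteration starts from a fresh copy of the string, because the $_[0] version modifies its argument in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $initial = "there once was a fish. Its feet were small";

# The three argument-access styles from call1.pl, call2.pl and call3.pl.
# The sub names are hypothetical, chosen for this sketch.
sub by_index { $_[0] =~ s/there/There/; return $_[0]; }
sub by_list  { my ($val) = @_; $val =~ s/there/There/; return $val; }
sub by_shift { my $val = shift; $val =~ s/there/There/; return $val; }

# Run each style for at least one CPU second and print a comparison table.
# Every closure pays the same cost for copying $initial into $s.
cmpthese(-1, {
    'index' => sub { my $s = $initial; by_index($s) },
    'list'  => sub { my $s = $initial; by_list($s) },
    'shift' => sub { my $s = $initial; by_shift($s) },
});
```

cmpthese prints the rate for each style and the percentage difference between them, so differences of a few percent are best read as noise unless they repeat across runs.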