What would Perl do...

I always wondered why some people write unreadable Perl. The most common reason given seems to be 'It's faster that way'. So, using DTrace and the extra probes I added, I thought I'd take a look.
# dtrace -l | grep perl
85614   perl1226        libperl.so                      Perl_sv_free del_sv
85615   perl1226        libperl.so                   Perl_sv_replace del_sv
85616   perl1226        libperl.so                          perl_run main_enter
85617   perl1226        libperl.so                        perl_parse main_enter
85618   perl1226        libperl.so                     perl_destruct main_enter
85619   perl1226        libperl.so                    perl_construct main_enter
85620   perl1226        libperl.so                        perl_alloc main_enter
85621   perl1226        libperl.so                          perl_run main_exit
85622   perl1226        libperl.so                        perl_parse main_exit
85623   perl1226        libperl.so                     perl_destruct main_exit
85624   perl1226        libperl.so                    perl_construct main_exit
85625   perl1226        libperl.so                        perl_alloc main_exit
85626   perl1226        libperl.so                       Perl_sv_dup new_sv
85627   perl1226        libperl.so                      Perl_newSVrv new_sv
85628   perl1226        libperl.so                      Perl_newSVsv new_sv
85629   perl1226        libperl.so                  Perl_newRV_noinc new_sv
85630   perl1226        libperl.so                      Perl_newSVuv new_sv
85631   perl1226        libperl.so                      Perl_newSViv new_sv
85632   perl1226        libperl.so                      Perl_newSVnv new_sv
85633   perl1226        libperl.so                    Perl_vnewSVpvf new_sv
85634   perl1226        libperl.so               Perl_newSVpvn_share new_sv
85635   perl1226        libperl.so                     Perl_newSVhek new_sv
85636   perl1226        libperl.so                     Perl_newSVpvn new_sv
85637   perl1226        libperl.so                      Perl_newSVpv new_sv
85638   perl1226        libperl.so                 Perl_sv_newmortal new_sv
85639   perl1226        libperl.so                Perl_sv_mortalcopy new_sv
85640   perl1226        libperl.so                        Perl_newSV new_sv
85641   perl1226        libperl.so                      Perl_pp_sort sub-entry
85642   perl1226        libperl.so                   Perl_pp_dbstate sub-entry
85643   perl1226        libperl.so                  Perl_pp_entersub sub-entry
85644   perl1226        libperl.so                      Perl_pp_last sub-return
85645   perl1226        libperl.so                    Perl_pp_return sub-return
85646   perl1226        libperl.so                     Perl_dounwind sub-return
85647   perl1226        libperl.so                Perl_pp_leavesublv sub-return
85648   perl1226        libperl.so                  Perl_pp_leavesub sub-return
Using these probes, we can write some D that tells us what Perl is doing in each of its phases: startup, parsing, execution, and cleanup. First up: accessing function call parameters. Consider three essentially identical programs:
#!/usr/local/bin/perl -Tw

use strict;

my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";

sub func {
    $_[0] =~ s/there/There/;
    return $_[0];
}
#!/usr/local/bin/perl -Tw

use strict;

my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";

sub func {
    my ($val) = @_;
    $val =~ s/there/There/;
    return $val;
}
#!/usr/local/bin/perl -Tw

use strict;

my $initial = "there once was a fish. Its feet were small";
my $post = func($initial);
print "$post\n";

sub func {
    my $val = shift;
    $val =~ s/there/There/;
    return $val;
}
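One caveat on "essentially identical": in Perl the elements of @_ are aliases for the caller's arguments, so the $_[0] version also changes $initial in the caller, while the other two work on a private copy. The printed result is the same for all three. A minimal sketch of the difference follows; the sub names in_place and on_copy are made up for illustration and are not part of the test scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names: in_place() mirrors the $_[0] version,
# on_copy() mirrors the "my $val = shift" version.
sub in_place {
    $_[0] =~ s/there/There/;    # $_[0] is an alias for the caller's variable
    return $_[0];
}

sub on_copy {
    my $val = shift;            # $val is a private copy of the argument
    $val =~ s/there/There/;
    return $val;
}

my $first  = "there once was a fish";
my $second = "there once was a fish";

my $r1 = in_place($first);
my $r2 = on_copy($second);

# Both return the capitalised string, but only in_place() altered its argument.
print "in_place: returned '$r1', caller now has '$first'\n";
print "on_copy:  returned '$r2', caller still has '$second'\n";
```

This does not affect the measurements that follow, but it matters if the caller uses the original string again.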
There is a myth (***) that using $_[0] is faster because it doesn't create a temporary variable. DTrace shows this to be untrue:
== call1.pl ==========================================================
  perl*::perl_alloc:main_enter
  perl*::perl_alloc:main_exit,  (0/0) (53119 nS)
  perl*::perl_construct:main_enter
  perl*::perl_construct:main_exit,  (12/0) (564370 nS)
  perl*::perl_parse:main_enter
   --> BEGIN, ./call1.pl
    --> bits, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (48060 nS)
    --> import, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (15398 nS)
   <-- BEGIN, ./call1.pl (160/80) (1025874 nS)
  perl*::perl_parse:main_exit,  (299/42) (2856399 nS)
  perl*::perl_run:main_enter
   --> func, ./call1.pl
   <-- func, ./call1.pl (1/0) (47723 nS)
  perl*::perl_run:main_exit,  (0/1) (265677 nS)
  perl*::perl_destruct:main_enter
  perl*::perl_destruct:main_exit,  (0/2) (20763 nS)
total, total (0/0) (3789064 nS)
== call2.pl ==========================================================
  perl*::perl_alloc:main_enter
  perl*::perl_alloc:main_exit,  (0/0) (53251 nS)
  perl*::perl_construct:main_enter
  perl*::perl_construct:main_exit,  (12/0) (509684 nS)
  perl*::perl_parse:main_enter
   --> BEGIN, ./call2.pl
    --> bits, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (36748 nS)
    --> import, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (9797 nS)
   <-- BEGIN, ./call2.pl (160/80) (924250 nS)
  perl*::perl_parse:main_exit,  (299/38) (2545953 nS)
  perl*::perl_run:main_enter
   --> func, ./call2.pl
   <-- func, ./call2.pl (1/0) (42165 nS)
  perl*::perl_run:main_exit,  (0/1) (142393 nS)
  perl*::perl_destruct:main_enter
  perl*::perl_destruct:main_exit,  (0/2) (20851 nS)
total, total (0/0) (3301007 nS)
== call3.pl ==========================================================
  perl*::perl_alloc:main_enter
  perl*::perl_alloc:main_exit,  (0/0) (52927 nS)
  perl*::perl_construct:main_enter
  perl*::perl_construct:main_exit,  (12/0) (607783 nS)
  perl*::perl_parse:main_enter
   --> BEGIN, ./call3.pl
    --> bits, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- bits, /usr/local/lib/perl5/5.8.8/strict.pm (3/2) (37066 nS)
    --> import, /usr/local/lib/perl5/5.8.8/strict.pm
    <-- import, /usr/local/lib/perl5/5.8.8/strict.pm (1/0) (10171 nS)
   <-- BEGIN, ./call3.pl (160/80) (924824 nS)
  perl*::perl_parse:main_exit,  (297/37) (2543981 nS)
  perl*::perl_run:main_enter
   --> func, ./call3.pl
   <-- func, ./call3.pl (1/0) (41833 nS)
  perl*::perl_run:main_exit,  (0/1) (140527 nS)
  perl*::perl_destruct:main_enter
  perl*::perl_destruct:main_exit,  (0/2) (20273 nS)
total, total (0/0) (3395310 nS)

allocations / deallocations:
     474 /      122 call3.pl            
     476 /      123 call2.pl            
     476 /      127 call1.pl            
Each exit line carries an (allocations/deallocations) pair, such as (0/1), followed by the elapsed time in nanoseconds. Counting these up, the figure for the call itself (for example "<-- func, ./call2.pl (1/0)") is always the same: one allocation.
After all the test runs, I also print out the total allocations for the script, and it seems that the "my $val = shift" version is the most efficient - using two fewer allocations (apparently during the parse phase).
The deallocation count is interesting too: "$_[0]" uses 5 more deallocations during the parse phase than the "my $val = shift" option, and "my ($val) = @_;" uses one more.
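The per-phase pairs add up to the totals printed at the end, which is a handy sanity check. As a rough sketch (the helper name sum_counts and the shortened line labels are my own, only the numbers are copied from the call1.pl trace), the pairs can be summed like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sum every "(allocations/deallocations)" pair
# found in a block of trace output.
sub sum_counts {
    my ($trace) = @_;
    my ($allocs, $frees) = (0, 0);
    while ($trace =~ m{\((\d+)/(\d+)\)}g) {
        $allocs += $1;
        $frees  += $2;
    }
    return ($allocs, $frees);
}

# The count-bearing lines of the call1.pl trace, labels shortened.
my $call1 = <<'END';
perl_alloc:main_exit (0/0)
perl_construct:main_exit (12/0)
<-- bits (3/2)
<-- import (1/0)
<-- BEGIN (160/80)
perl_parse:main_exit (299/42)
<-- func (1/0)
perl_run:main_exit (0/1)
perl_destruct:main_exit (0/2)
END

my ($allocs, $frees) = sum_counts($call1);
printf "%d / %d\n", $allocs, $frees;    # prints "476 / 127"
```

That matches the 476 / 127 reported for call1.pl, which suggests the counts on each line exclude those of nested calls.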
Trying to reduce the allocations further by working on $_ directly doesn't seem to help: the following code results in 474 allocations, the same as the shift case, but with 3 extra deallocations, again in the parsing phase. Increasing the number of times that func is called only increases the benefits of using shift.
#!/usr/local/bin/perl -Tw

use strict;

my $initial = "there once was a fish. Its feet were small";
$_ = $initial;
my $post = func();
print "$post\n";

sub func {
    s/there/There/;
    return $_;
}
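For anyone who wants to cross-check the repeated-call case without DTrace, the core Benchmark module can compare the three styles over many calls. This is only a sketch under my own assumptions: the sub names are hypothetical, each iteration copies the string first so that the in-place version always has something to substitute, and Benchmark reports CPU time rather than allocations, so the numbers will vary with Perl version and machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $text = "there once was a fish. Its feet were small";

# Hypothetical names for the three parameter-access styles.
sub by_index { $_[0] =~ s/there/There/; return $_[0] }
sub by_list  { my ($val) = @_; $val =~ s/there/There/; return $val }
sub by_shift { my $val = shift; $val =~ s/there/There/; return $val }

# Run each variant for at least one CPU second and print a comparison table.
# Every variant pays for the same fresh copy, so the copy cost cancels out.
cmpthese(-1, {
    index => sub { my $s = $text; by_index($s) },
    list  => sub { my $s = $text; by_list($s) },
    shift => sub { my $s = $text; by_shift($s) },
});
```

Treat the output as a rough comparison only; small differences between runs are likely to be noise.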