Dean::Util - Utilities created by Dean Serenevy



NAME

Dean::Util - Utilities created by Dean Serenevy


SYNOPSIS

 use Dean::Util qw/map_pair nsign min_max/;
 ...

Then later, to remove the dependence on Dean::Util:

 perl -MDean::Util -we insert_Dean_Util_functions The/Module.pm


DESCRIPTION

This is a set of utility functions that I find myself rewriting frequently. Normally, putting functions into a module introduces a dependency on that module which can be a hassle in some situations. This is a ``smart'' module which is capable of replacing the use Dean::Util... line with the code for the requested functions. Thus, machines that have Dean::Util installed can use it as a module, but when requested, a (Dean::Util) dependency-free version of the file may be made.


EXPORTED FUNCTIONS

:utility - Using Dean::Util

list_Dean_Util_functions

This function prints a column-formatted list of the functions included in the Dean::Util package.

check_Dean_Util_functions

This function attempts to verify that the Dean/Util.pm file is properly structured. It is intended to be run only by people who change Dean/Util.pm, to check that their code is formatted so that the module can parse it.

get_Dean_Util_code

Returns a hash ref with an entry of the following type for each function and variable defined in Dean::Util.

 name => { code    => '...',
           pod     => '...',
           depends => [ 'thing 1', 'thing 2', ... ]
         }

Some additional information may be included in each sub-hash for debugging purposes or internal use.

insert_Dean_Util_functions

Replaces all occurrences of ``use Dean::Util ...;'' (``...'' is everything up to the first semicolon, so don't use qw; ;) with the actual source code of the functions requested from Dean::Util. The original files are saved to a backup file which is just the original filename with a ~ appended. The list of files to modify is either included as a list of arguments or is read from @ARGV.

As in the function get_Dean_Util_function_string, the special symbols INCLUDE_POD and POD_ONLY may be used to indicate that all further inclusions (restricted to each individual ``use'' block) should include their POD documentation before the code, or exclude the code and only output the POD documentation. Example:

 use Dean::Util qw/max min INCLUDE_POD join_multi map_pair/;
 use Dean::Util qw/is_num is_int/;
 # ... later, possibly even after __END__
 use Dean::Util qw/POD_ONLY is_num is_int/;

This would include the code for max, min, join_multi, and map_pair, but POD documentation only for join_multi and map_pair. The code and POD documentation for is_num and is_int would be inserted separately.

Note: Multiple use Dean::Util inclusions may result in multiple subroutine definitions so don't use the same function twice unless they are in different scopes.

upgrade_Dean_Util_functions

Once insert_Dean_Util_functions has been used to ``export'' a list of Dean::Util functions, this command will replace Dean::Util function blocks with more recent function versions, thus upgrading the exported script.

get_Dean_Util_function_string

Returns the source code for the functions provided as arguments. If the argument list is empty, the function list is taken from @ARGV.

The special symbols INCLUDE_POD and POD_ONLY may be used to indicate that all further inclusions should include their POD documentation before the code, or exclude the code and only output the POD documentation. Example:

 get_Dean_Util_function_string qw/max min INCLUDE_POD join_multi map_pair/;

Would include the POD documentation for only join_multi and map_pair.

 get_Dean_Util_function_string qw/POD_ONLY format_cols/;

Would return just the POD documentation for format_cols.


EXPORTABLE FUNCTIONS

:numerical - Numerical Functions

$pi

The string, pi, to 30 digits after the decimal.

$e

The string, e, to 30 digits after the decimal.

max

See also: List::Util max

Return the maximum number in a list of values. All arguments must be numeric; use max_dirty for untrusted or mixed data.

min

See also: List::Util min

Return the minimum number in a list of values. All arguments must be numeric; use min_dirty for untrusted or mixed data.

max_dirty

Return the maximum number in a list of values. This version of max should be used for untrusted data since undefined or non-numeric values are silently ignored rather than throwing errors.

min_dirty

Return the minimum number in a list of values. This version of min should be used for untrusted data since undefined or non-numeric values are silently ignored rather than throwing errors.
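
As an illustration of the ``dirty'' behaviour described above, here is a minimal sketch. It assumes that ``non-numeric'' means whatever Scalar::Util's looks_like_number rejects; max_dirty_sketch is a hypothetical name and this is not the module's own code.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical stand-in for max_dirty: skip undefined and non-numeric values.
sub max_dirty_sketch {
    my $max;
    for my $x (@_) {
        next unless defined $x and looks_like_number($x);
        $max = $x if !defined $max or $x > $max;
    }
    return $max;    # undef when no numeric value was seen
}

print max_dirty_sketch(3, undef, "abc", 7.5, "2"), "\n";    # prints 7.5
```

A min version would only flip the comparison.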

fmax

 fmax { block } @list
 fmax \&sub, @list

Return the maximum function value given by evaluating the given code at each element of the list. The code may be either a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. If the code returns any undefined or non-numeric values, perl will issue warnings.

fmin

 fmin { block } @list
 fmin \&sub, @list

Return the minimum function value given by evaluating the given code at each element of the list. The code may be either a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. If the code returns any undefined or non-numeric values, perl will issue warnings.

fmax_dirty

 fmax_dirty { block } @list
 fmax_dirty \&sub, @list

Return the maximum function value given by evaluating the given code at each element of the list. The code may be either a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. If the code returns any undefined or non-numeric values, they will be ignored.

fmin_dirty

 fmin_dirty { block } @list
 fmin_dirty \&sub, @list

Return the minimum function value given by evaluating the given code at each element of the list. The code may be either a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. If the code returns any undefined or non-numeric values, they will be ignored.

minimizer

 minimizer { block } @list
 minimizer \&sub, @list

Return the item of @list which yields the minimum value when evaluated by the given code. The code may be provided either as a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. If the code returns any undefined or non-numeric values, perl will issue warnings.

maximizer

 maximizer { block } @list
 maximizer \&sub, @list

Return the item of @list which yields the maximum value when evaluated by the given code. The code may be provided either as a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. If the code returns any undefined or non-numeric values, perl will issue warnings.

minimizer_dirty

 minimizer_dirty { block } @list
 minimizer_dirty \&sub, @list

Return the item of @list which yields the minimum value when evaluated by the code. The code may be either a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. If the code returns any undefined or non-numeric values, they will be ignored and the corresponding list item will not be considered as a minimizer.

Note however that no filtering is performed on @list so undefined values will be passed to the subroutine as a normal element.

maximizer_dirty

 maximizer_dirty { block } @list
 maximizer_dirty \&sub, @list

Return the item of @list which yields the maximum value when evaluated by the code. The code may be either a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. If the code returns any undefined or non-numeric values, they will be ignored and the corresponding list item will not be considered as a maximizer.

Note however that no filtering is performed on @list so undefined values will be passed to the subroutine as a normal element.

ceil($)

If the argument is numeric, then returns the smallest integer which is greater than or equal to the given argument. Otherwise this function will spew warnings.

ceil_dirty($)

If the argument is numeric, then returns the smallest integer which is greater than or equal to the given argument. Otherwise this function will return undef.

floor($)

If the argument is numeric, then returns the largest integer which is less than or equal to the given argument. Otherwise this function spews warnings.

floor_dirty($)

If the argument is numeric, then returns the largest integer which is less than or equal to the given argument. Otherwise this function returns undef.

sum

See also: List::Util sum

Returns the sum of all numeric entries in a list. Undefined/non-numeric values cause warnings.

product

See also: List::Util reduce

Returns the product of all numeric entries in a list. Undefined/non-numeric values cause warnings.

average

Returns the average over all entries in a list. Undefined or non-numeric entries will spew warnings.

sum_dirty

Returns the sum of all numeric entries in a list. Undefined/non-numeric values are ignored.

product_dirty

Returns the product of all numeric entries in a list. Undefined/non-numeric values are ignored.

average_dirty

Returns the average over all entries in a list. Undefined or non-numeric entries contribute a 0 to the average.

min_max

Returns a pair ($m, $M) which is the minimum and maximum numbers, respectively, in a list of values without looping over the list twice. Undefined or non-numeric values will cause warnings.

max_min

Returns a pair ($M, $m) which is the maximum and minimum numbers, respectively, in a list of values without looping over the list twice. Undefined or non-numeric values will cause warnings.

min_max_dirty

Returns a pair ($m, $M) which is the minimum and maximum numbers, respectively, in a list of values without looping over the list twice. Undefined or non-numeric values are silently ignored.

max_min_dirty

Returns a pair ($M, $m) which is the maximum and minimum numbers, respectively, in a list of values without looping over the list twice. Undefined or non-numeric values are silently ignored.

sieve_of_eratosthenes

 my $sieve = sieve_of_eratosthenes( $n );
 sieve_of_eratosthenes( $m, $sieve );

Constructs a bitstring $sieve using the Sieve of Eratosthenes so that:

 vec($sieve, $n, 1) == 1   iff   $n is prime

If a sieve (or an undefined scalar) is provided as a second argument, it will be appended to.

Note: Since perl's length command deals only in bytes, this subroutine will round $n up to make sure that $sieve is correct to a whole number of bytes. In particular, you are guaranteed to be able to trust $sieve up to $n = 8 * length($sieve) - 1.
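
The bitstring layout and the byte rounding described above can be sketched as follows. This is a simplified illustration only: sieve_sketch is a hypothetical name, it always builds a fresh sieve, and it does not implement the append-to-existing-sieve form.

```perl
use strict;
use warnings;

# Hypothetical sketch: bit $i of the returned string is 1 iff $i is prime.
sub sieve_sketch {
    my ($n) = @_;
    my $limit = 8 * (int($n / 8) + 1) - 1;    # last bit of the final byte
    my $sieve = '';
    vec($sieve, $_, 1) = 1 for 2 .. $limit;   # bits 0 and 1 stay 0
    for (my $i = 2; $i * $i <= $limit; $i++) {
        next unless vec($sieve, $i, 1);
        for (my $j = $i * $i; $j <= $limit; $j += $i) {
            vec($sieve, $j, 1) = 0;           # strike out multiples of $i
        }
    }
    return $sieve;
}

my $sieve  = sieve_sketch(30);
my @primes = grep { vec($sieve, $_, 1) } 0 .. 8 * length($sieve) - 1;
print "@primes\n";    # prints 2 3 5 7 11 13 17 19 23 29 31
```

Asking for 30 yields a four-byte string, so the result is trustworthy up to 8 * 4 - 1 = 31.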

is_prime

Determine primality. Constructs the Sieve of Eratosthenes to determine primality. The sieve is reused for each call to is_prime so scripts are encouraged to prepare the sieve by calling is_prime on a large number before making multiple calls to is_prime.

 # SLOW: takes 21.89 seconds
 @primes = grep is_prime($_), 1..400000;
 # FAST: takes 1.387 seconds
 @primes = reverse grep is_prime($_), reverse 1..400000;

This function may take shortcuts when it can, so if you want to prepare the sieve, append the option ``force_sieve'':

 # SLOW:
 is_prime( 400000 ); # this test shortcuts since 400000 is even
 @primes = grep is_prime($_), 1..400000;
 # FAST:
 is_prime( 400000, force_sieve => 1 );
 @primes = grep is_prime($_), 1..400000;

next_prime

 my $m = next_prime( $n )

Compute the next prime integer larger than $n.

base_hash

Given a base, this function returns a hash which may be used in future calls to the other base functions.

A base is described by:

 integer <= 36 (0-9 a-z)
 array ref     (list of symbols, length == base, index i == i, yes you get to define zero)
 string        (string of symbols, shortcut for [split //, $str])
 hash ref      (the output of a previous call to base_hash, this is silly in this case)

base2base

 base2base( string, base, base )

String may be decimal. The following symbols are tried (in order) to be used as the punctuation between the integer and fraction part of the number:

 . , : ; _ | / \ - + ' ` "

Bases are described by:

 integer <= 36 (0-9 a-z)
 array ref     (list of symbols, length == base, index i == i, yes you get to define zero)
 string        (string of symbols, shortcut for [split //, $str])
 hash ref      (the output of base_hash)

base2integer

 base2integer( string, base )

Convert a string written in the given base to an integer. The string may not have a fractional part.

Base is described by:

 integer <= 36 (0-9 a-z)
 array ref     (list of symbols, length == base, index i == i, yes you get to define zero)
 string        (string of symbols, shortcut for [split //, $str])
 hash ref      (the output of base_hash or symbol => value pairs)
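
To make the ``array ref'' form of a base description concrete, here is a minimal sketch of converting a string to an integer with it. The name base2integer_sketch is hypothetical, only the array-ref form is handled, and this is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch: $symbols is an array ref where index i is the digit i.
sub base2integer_sketch {
    my ($str, $symbols) = @_;
    my %value;
    @value{@$symbols} = 0 .. $#$symbols;    # symbol => numeric value
    my $base = @$symbols;
    my $n    = 0;
    for my $ch (split //, $str) {
        die "Unknown symbol '$ch'\n" unless exists $value{$ch};
        $n = $n * $base + $value{$ch};      # Horner's rule
    }
    return $n;
}

print base2integer_sketch("ff",  [0 .. 9, 'a' .. 'f']), "\n";    # prints 255
print base2integer_sketch("101", ['0', '1']), "\n";              # prints 5
```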

base2decimal

 base2decimal( string, base )

String may be decimal. The following symbols are tried (in order) to be used as the punctuation between the integer and fraction part of the number:

 . , : ; _ | / \ - + ' ` "

Base is described by:

 integer <= 36 (0-9 a-z)
 array ref     (list of symbols, length == base, index i == i, yes you get to define zero)
 string        (string of symbols, shortcut for [split //, $str])
 hash ref      (the output of base_hash)

decimal2base

 decimal2base( string, base )

String may be decimal. The following symbols are tried (in order) to be used as the punctuation between the integer and fraction part of the number:

 . , : ; _ | / \ - + ' ` "

Base is described by:

 integer <= 36 (0-9 a-z)
 array ref     (list of symbols, length == base, index i == i, yes you get to define zero)
 string        (string of symbols, shortcut for [split //, $str])
 hash ref      (the output of base_hash)

factorial

 factorial( $n )

Returns $n! if $n is a non-negative integer.

:stat_prob - Statistical / Probability

prob_model_invariants

 prob_model_invariants( \%model, %options )

The model is a hash with keys the outcomes and values the corresponding probabilities. At most one of the probabilities may be undefined in which case it will be computed automatically (as $1 - \sum p_i$) and added to your passed probability model.

roll_dice

Roll n dice (default 1) and return the results. In scalar context, only the sum is returned. In list context, the individual rolls are returned as well as the final sum of the values (the sum is returned in the last position).

randomize

See also: List::Util shuffle

Randomize a list of values. Essentially the Fisher-Yates shuffle code from perlfaq4 (``How do I shuffle an array randomly?''). If the array is passed by reference then it will be altered, otherwise a copy is made. Returns a new list or a reference to a list depending on context.
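
For reference, the in-place Fisher-Yates shuffle mentioned above can be sketched like this. The name shuffle_in_place_sketch is hypothetical and only the pass-by-reference form is shown.

```perl
use strict;
use warnings;

# Hypothetical sketch of an in-place Fisher-Yates shuffle on an array ref.
sub shuffle_in_place_sketch {
    my ($array) = @_;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);                  # 0 <= $j <= $i
        @$array[$i, $j] = @$array[$j, $i];         # swap via array slice
    }
    return $array;
}

my @deck = (1 .. 10);
shuffle_in_place_sketch(\@deck);
print "@deck\n";
```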

one_var

 one_var( @data );
 one_var( \@data );
 one_var( \@data, $sorted );

Returns a hash (or hash reference if called in scalar context) of one-variable statistics on the input data. If the $sorted parameter is not defined then the data is assumed to be not sorted and the subroutine will make its own sorted copy of the data. If the $sorted parameter is defined but false, then the subroutine will sort @data in place (@data will be altered). If the $sorted parameter is true then the data will be assumed to be already sorted. The returned hash will have the following keys:

average
mean
x-bar

The average value of the data

sum
sum x

The summation of the data

sum_sq
sum x^2

The sum of the squares of the data

Svar
sample_variance

The sample variance, 1/(n-1) * sum (x_i - average)^2

Sx
sample_standard_deviation

The sample standard deviation, sqrt( Svar )

variance
sigma_sq

The population variance, E( (X - E(X))^2 )

sigma
standard_deviation

The population standard deviation, sqrt( variance )

n

The number of measurements in the sample

min

The smallest data element

max

The largest data element

Q1

The first quartile computed using broken ``Basic Math Course Method''.

Q2
med
median

The sample median

Q3

The third quartile computed using broken ``Basic Math Course Method''.

char:sum
char:Sigma
char:sigma

The corresponding Unicode characters: ``\x{2211}'', ``\x{03A3}'', ``\x{03C3}''. Be warned that char:sum is a different symbol than char:Sigma and that the terminal that you are writing to will need to understand UTF-8 font encodings.

Note: the list only needs to be sorted to compute the quartiles, min, median, and max values. If you are not interested in these values then you can speed up the computation by providing $sorted with a true value (regardless of whether the data is sorted) and simply ignoring those values in the output.
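
The difference between the Svar and variance keys can be illustrated with a short sketch. It assumes the population variance is computed as the sum of squared deviations divided by n; variances_sketch is a hypothetical helper, not part of Dean::Util, and it needs at least two data points.

```perl
use strict;
use warnings;

# Hypothetical sketch: sample variance (n-1 divisor) vs population variance (n divisor).
sub variances_sketch {
    my @x   = @_;
    my $n   = @x;
    my $sum = 0;
    $sum += $_ for @x;
    my $mean = $sum / $n;
    my $ss   = 0;
    $ss += ($_ - $mean)**2 for @x;    # sum of squared deviations
    return ( Svar => $ss / ($n - 1), variance => $ss / $n );
}

my %s = variances_sketch(2, 4, 4, 4, 5, 5, 7, 9);
printf "Svar = %.4f, variance = %.4f\n", $s{Svar}, $s{variance};
# prints Svar = 4.5714, variance = 4.0000
```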

percentile

 percentile($p, @data)
 percentile($p, \@data)
 percentile($p, \@data, $sorted)
 percentile($p, \@data, %options)

Return the $p-th percentile using the weighted average at X_{(n+1)p} method (http://www.xycoon.com/method_2.htm). That is, the number such that approximately 100 * $p percent of the data values are less than or equal to the given value. If an array reference is given as well as a third true value, the data will be assumed to be already sorted. The following options are available.

sorted

Boolean value indicating whether the data are sorted already. If not, they will be sorted numerically.

method

One of ``midpoint'', ``floor'', ``ceil'', or ``scaled''. This controls what to do when a percentile divider is between two entries. The default behavior is ``scaled'', the returned percentile will be an appropriate linear combination of the neighboring entries. The ``midpoint'' method always returns the midpoint of the neighboring entries. Finally, the ``floor'' and ``ceil'' methods always return the lower or higher neighbor respectively.

The ``method'' also affects the return value when return => "index" is enabled.

return

Either ``value'' or ``index''. Affects whether we return the actual percentile value, or simply its index in the array.

correlation

 my $r = correlation( \@X, \@Y );
 my %I = correlation( \@X, \@Y );
 my $r = correlation( \@X, \@Y, %options );

Pearson product-moment correlation coefficient.

one_var_x
one_var_y

The result hash from one_var()

sd_x
sd_y
mean_x
mean_y

The sample standard deviation and mean of x and y.

permutations

 permutations( $n );
 permutations( @list );  # 1 < @list !!
 permutations( \@list );

Return a list of all permutations of the given input list.

Note: This subroutine is slow and inefficient. If you want to use this for any real purpose then you should consider using Algorithm::Permute or Algorithm::FastPermute from CPAN.

k_arrangements

 k_arrangements( \@list, $k );
 k_arrangements( $n, $k );

Return a list of all arrangements (sub-permutations) of the given input list of length $k. If $n and $k are both integers, then simply the number of $k arrangements is returned.

Note: This subroutine is slow and inefficient. If you want to use this for any real purpose then you should consider looking up an XS module on CPAN.

arrangements

 arrangements( $n );
 arrangements( \@list );
 arrangements( \@list, $k );
 arrangements( $n, $k );
 arrangements( @list );  # @list > 2 !!!

Return a list of all arrangements (sub-permutations) of the given input list (regardless of length). If the list is provided as a reference and an integer $k is provided then the results will be restricted to length $k as in the k_arrangements subroutine.

Note: This subroutine is slow and inefficient. If you want to use this for any real purpose then you should consider looking up an XS module on CPAN.

k_combinations

 k_combinations( \@list, $k );
 k_combinations( $n, $k );

Return a list of all combinations of the given input list of length $k.

Note: This subroutine is slow and inefficient. If you want to use this for any real purpose then you should consider looking up an XS module on CPAN.

combinations

 combinations( $n );
 combinations( \@list );
 combinations( \@list, $k );
 combinations( $n, $k );
 combinations( @list );  # @list > 2 !!!

Return a list of all combinations of the given input list (regardless of length). If the list is provided as a reference and an integer $k is provided then the results will be restricted to length $k as in the k_combinations subroutine.

Note: This subroutine is slow and inefficient. If you want to use this for any real purpose then you should consider looking up an XS module on CPAN.

npdf

 npdf $x
 npdf $x, $mu
 npdf $x, $mu, $sigma

Compute the probability density at $x of a normal distribution with mean $mu and standard deviation $sigma (for a continuous distribution, P( X = $x ) itself is zero). $mu and $sigma are assumed to be 0 and 1 respectively if they are missing. $sigma must be positive.

ncdf

 ncdf $x
 ncdf $x, $mu
 ncdf $x, $mu, $sigma

Compute the probability P( X <= $x ) assuming a normal distribution with mean $mu and standard deviation $sigma. $mu and $sigma are assumed to be 0 and 1 respectively if they are missing. $sigma must be positive.

:math - Mathematical Functions

dotprod(\@\@)

 my $d = dotprod @x, @y;
 my $d = &dotprod(\@x, [1,2,3]);

Compute the dot product of two vectors

modular_inverse

 $inverse = modular_inverse( $x, $m );

Compute the inverse of $x in the group Z_m. The inverse will be within the set [0..$m-1].

Note: $x must be relatively prime to $m.

gcd

Compute the Greatest Common Divisor of a list of integers using the Euclidean algorithm. Negative numbers are treated as positives by this routine.

extended_euclidean_algorithm

 ($alpha, $beta, $d) = extended_euclidean_algorithm($a, $b)

For a pair of integers, a and b, perform the extended Euclidean algorithm to compute alpha, beta, and d such that:

 d = alpha * a  +  beta * b

In particular, if d = 1 then alpha = a^-1 mod b.
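
The relation above, and its use for modular inverses, can be sketched with the standard iterative form of the algorithm. The name egcd_sketch is hypothetical and non-negative integer inputs are assumed; this is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns ($alpha, $beta, $d) with
# $d = $alpha * $m + $beta * $n, assuming non-negative integers.
sub egcd_sketch {
    my ($m, $n) = @_;
    my ($x0, $x1, $y0, $y1) = (1, 0, 0, 1);
    while ($n != 0) {
        my $q = int($m / $n);
        ($m,  $n)  = ($n,  $m  - $q * $n);
        ($x0, $x1) = ($x1, $x0 - $q * $x1);
        ($y0, $y1) = ($y1, $y0 - $q * $y1);
    }
    return ($x0, $y0, $m);
}

my ($alpha, $beta, $d) = egcd_sketch(240, 46);
print "$alpha $beta $d\n";              # prints -9 47 2

# With d = 1, alpha reduced mod b is the inverse of a in Z_b.
my ($inv) = egcd_sketch(3, 7);
print $inv % 7, "\n";                   # prints 5, since 3 * 5 = 15 = 1 mod 7
```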

frac

 my ($N, $D) = frac( $dec )

Convert a decimal to a fraction. Returns undef if number is not rationalizable (must have repeating decimals).

ndiff(&;@)

 my $df = ndiff \&f;
 my $df = ndiff \&f, $x;

Perform numerical differentiation using the central difference formula.

 f'(a) \approx ( f(a+h) - f(a-h) ) / (2h)

If M \approx f(a) \approx f'''(c) for all c \in [a-h, a+h], then the total error (truncation plus round-off) is on the order of:

 error = M * (h^2/6 + eps/h)

where eps is the machine epsilon (eps = 2E-16 on 32-bit perl: 1 + 2E-16 != 1, but 1 + (2E-16)/2 == 1). Thus, the error is minimized when h \approx \sqrt[3]{eps}. We choose h = 2**(-20) = 0.00000095367431640625.

Examples:

 sub f { $_[0]**2 }
 my $df = ndiff \&f;
 printf "%.5f  |  %.5f\n", f($_), $df->($_) for 0..10;
 say "f'(3) = ", ndiff(\&f, 3);
 $df = ndiff { $_ ** 2 };
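
The central difference formula itself is short enough to sketch directly. The helper central_diff_sketch is hypothetical and takes a code reference and a point; it uses the step h = 2**-20 quoted above and does not reproduce the block syntax or curried form of ndiff.

```perl
use strict;
use warnings;

# Hypothetical sketch of f'(x) ~ ( f(x+h) - f(x-h) ) / (2h) with h = 2**-20.
sub central_diff_sketch {
    my ($f, $x) = @_;
    my $h = 2**-20;
    return ( $f->($x + $h) - $f->($x - $h) ) / (2 * $h);
}

printf "%.6f\n", central_diff_sketch(sub { $_[0]**2 }, 3);       # prints 6.000000
printf "%.6f\n", central_diff_sketch(sub { sin $_[0] }, 0);      # prints 1.000000
```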

Nintegrate

 Nintegrate { block } $a, $b, $n
 Nintegrate \&sub, $a, $b, $n

Integrate a function between two values using a composite Simpson's rule. The last argument $n is optional and specifies the number of intervals to divide the region into. The default is 1000.

The function is assumed to be continuous with continuous derivatives up to order 4. $n should be even, but we adjust it if it is not. The error is given by,

             5
        (b-a)     (4)
 err = --------  f  ( x )
             4
        180 n

for some x in the interval (a,b).
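
A composite Simpson's rule along the lines described above might look like the following sketch. The name simpson_sketch is hypothetical, the function is passed as a code reference rather than a block, and an odd $n is bumped to the next even number.

```perl
use strict;
use warnings;

# Hypothetical sketch of composite Simpson's rule on [$lo, $hi] with $n intervals.
sub simpson_sketch {
    my ($f, $lo, $hi, $n) = @_;
    $n = 1000 unless defined $n;
    $n++ if $n % 2;                       # Simpson's rule needs an even count
    my $h   = ($hi - $lo) / $n;
    my $sum = $f->($lo) + $f->($hi);
    for my $i (1 .. $n - 1) {
        # interior weights alternate 4, 2, 4, ..., 4
        $sum += ($i % 2 ? 4 : 2) * $f->($lo + $i * $h);
    }
    return $sum * $h / 3;
}

my $pi = 4 * atan2(1, 1);
printf "%.6f\n", simpson_sketch(sub { sin $_[0] }, 0, $pi);      # prints 2.000000
printf "%.6f\n", simpson_sketch(sub { $_[0]**3 }, 0, 2, 10);     # prints 4.000000
```

The second call is exact up to rounding because the fourth derivative of a cubic vanishes, which matches the error term above.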

interpolating_function

 interpolating_function \%function, $message, $nowarn

Returns a perl subroutine which interpolates %function linearly using interpolate. $message is an optional message that will be used if an input value is given which is out of range of the interpolator.

interpolate

 interpolate $x, \%function, \@keys, $message, $nowarn

Perform an interpolation of the provided function at the point $x. The keys of the function need not be evenly spaced, the value is approximated linearly. The last two parameters are optional, @keys is a sorted list of the keys of the function and $message is used in the error message that is printed if $x is out of range of the interpolator.

continuous_compounding

 continuous_compounding P => $P, r => $r, t => $t;
 continuous_compounding A => $A, P => $P, r => $r, t => $t, solve_for => $q;

Given any three of ``A'' (Accumulated balance), ``P'' (Principal balance), ``r'' (interest Rate), and ``t'' (Time to withdrawal), this function will return the fourth. If all four values are provided (presumably one of them will be undefined or contain garbage) then you must provide a ``solve_for'' key which points to one of ``A'', ``P'', ``r'', or ``t''. All values are case insensitive.
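
The four cases all follow from the standard continuous compounding formula A = P * exp(r * t). The sketch below is an illustration under that assumption: continuous_sketch is a hypothetical name, keys are case sensitive here, and the solve_for option is not handled.

```perl
use strict;
use warnings;

# Each entry solves A = P * exp(r * t) for one of the four quantities.
my %solve = (
    A => sub { my %v = @_; $v{P} * exp($v{r} * $v{t}) },
    P => sub { my %v = @_; $v{A} / exp($v{r} * $v{t}) },
    r => sub { my %v = @_; log($v{A} / $v{P}) / $v{t} },
    t => sub { my %v = @_; log($v{A} / $v{P}) / $v{r} },
);

# Hypothetical sketch: solve for whichever of A, P, r, t was not supplied.
sub continuous_sketch {
    my %v = @_;
    my ($want) = grep { !exists $v{$_} } qw(A P r t);
    die "Exactly one of A, P, r, t must be omitted\n" unless defined $want;
    return $solve{$want}->(%v);
}

my $A = continuous_sketch(P => 1000, r => 0.05, t => 10);
printf "%.2f\n", $A;                                             # prints 1648.72
printf "%.2f\n", continuous_sketch(A => $A, P => 1000, r => 0.05);   # prints 10.00
```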

discrete_compounding

 discrete_compounding P => $P, r => $r, t => $t, n => $n;
 discrete_compounding A => $A, P => $P, r => $r, t => $t, n => $n, solve_for => $q;

Given ``n'' (Number of compoundings per year) and any three of ``A'' (Accumulated balance), ``P'' (Principal balance), ``r'' (interest Rate), and ``t'' (Time to withdrawal), this function will return the fourth. If all five values are provided (presumably one of them will be undefined or contain garbage) then you must provide a ``solve_for'' key which points to one of ``A'', ``P'', ``r'', or ``t''. All values are case insensitive.

savings_plan

 savings_plan pmt => $pmt, r => $r, t => $t, n => $n;
 savings_plan A => $A, pmt => $pmt, r => $r, t => $t, n => $n, solve_for => $q;

Given ``n'' (Number of deposits per year), ``r'' (interest Rate), and any two of ``A'' (Accumulated balance), ``pmt'' (Payment amount), and ``t'' (Time to withdrawal), this function will return the third. If all five values are provided (presumably one of them will be undefined or contain garbage) then you must provide a ``solve_for'' key which points to one of ``A'', ``pmt'', ``r'', or ``t''. All values are case insensitive.

loan_payment

 loan_payment pmt => $pmt, r => $r, t => $t, n => $n;
 loan_payment L => $L, pmt => $pmt, r => $r, t => $t, n => $n, solve_for => $q;

Given ``n'' (Number of payments per year), ``r'' (interest Rate), and any two of ``L'' (Loan amount), ``pmt'' (Payment amount), and ``t'' (Time to full payback), this function will return the third. If all five values are provided (presumably one of them will be undefined or contain garbage) then you must provide a ``solve_for'' key which points to one of ``L'', ``pmt'', ``r'', or ``t''. All values are case insensitive.

union

 union( $L1, $L2, ... )

Return the list of (string) elements which appear in any of the given arrays. Objects are stringified, and the string values are returned. This may be upgraded to be smarter someday.

intersection

 intersection( $L1, $L2, ... )

Return the list of (string) elements which appear in all of the given arrays. Objects are stringified, and the string values are returned. This may be upgraded to be smarter someday.

difference

 difference( $L1, $L2, ... )

Return the list of (string) elements which appear in $L1 but not in any of the subsequent arrays. Objects are stringified, and the string values are returned. This may be upgraded to be smarter someday.

:list - List Utilities

binary_search(&@)

 binary_search { $_ > 4 } @sorted_nums;
 binary_search \&f, @sorted_nums;

Implements a binary search. Second argument must be an array (not a list) and must be sorted. Returns the index of the first element for which the function &f returns true. Returns undef if there is no such element.

Function must return true for all elements larger than desired element. To search for a particular element, the following must be done:

 my $i = binary_search { $_ >= 4 } @sorted_nums;
 $i = undef unless defined $i and $sorted_nums[$i] == 4;
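
The ``first index where the test becomes true'' contract can be sketched as below. The helper binary_search_sketch is hypothetical: it takes a code reference and an array reference instead of using the (&@) prototype, and it assumes the test is false for a prefix of the array and true for the rest.

```perl
use strict;
use warnings;

# Hypothetical sketch: index of the first element for which $test is true, or undef.
sub binary_search_sketch {
    my ($test, $array) = @_;
    my ($lo, $hi) = (0, scalar @$array);    # the answer lies in [$lo, $hi]
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        local $_ = $array->[$mid];          # let the test use $_ or $_[0]
        if ($test->($_)) { $hi = $mid }
        else             { $lo = $mid + 1 }
    }
    return $lo < @$array ? $lo : undef;
}

my @sorted = (1, 3, 4, 4, 7, 9);
print binary_search_sketch(sub { $_ >= 4 }, \@sorted), "\n";    # prints 2
```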

natural_sort

A ``fast, flexible, stable sort'' that sorts strings naturally (that is, numerical substrings are compared as numbers).

Code lifted from tye on perlmonks: http://www.perlmonks.org/?node_id=442285

Limitations: http://www.perlmonks.org/?node_id=483466

  It doesn't "properly" sort negative numbers, non-fixed decimal values,
  nor integers larger than 2^32-1.

natural_cmp

A fast, flexible, stable comparator that sorts strings naturally (that is, numerical substrings are compared as numbers).

Code lifted from tye on perlmonks: http://www.perlmonks.org/?node_id=442285

Limitations: http://www.perlmonks.org/?node_id=483466

  It doesn't "properly" sort negative numbers, non-fixed decimal values,
  nor integers larger than 2^32-1.

cartesian

 cartesian \@list1, \@list2, ...
 cartesian $n1, $n2, ...

Form the cartesian product of the elements in the lists. That is, all lists of the form [ $e1, $e2, ... ] where $e1 comes from @list1, and so on. This function returns an array reference in scalar context, and a list in list context.

In the second form, the lists [1..$n1], [1..$n2], ... will be constructed, and the cartesian product of those lists will be computed. Note however, that the two forms can not be combined, you must either provide only arrays or only numbers.
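
A minimal sketch of the first form (array references only) follows. The name cartesian_sketch is hypothetical, the numeric shortcut form is not handled, and a list is always returned.

```perl
use strict;
use warnings;

# Hypothetical sketch: extend every partial tuple by every element of the next list.
sub cartesian_sketch {
    my @result = ([]);
    for my $list (@_) {
        @result = map {
            my $prefix = $_;
            map { [ @$prefix, $_ ] } @$list;
        } @result;
    }
    return @result;
}

my @pairs = cartesian_sketch([1, 2], ['a', 'b']);
print join(" ", map { join "", @$_ } @pairs), "\n";    # prints 1a 1b 2a 2b
```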

transposed

 transposed \@LoL

Transpose the (possibly non-regular) list of lists @LoL. Returns a new list reference containing the objects in @LoL.

flatten

 flatten @LoLoLoL

Recursively runs through each element of the input list and returns all components as a single large list. Lists may be arbitrarily nested, and any objects which are not Perl ARRAY references will be considered plain elements. The expansion is done depth-first. Returns a reference in scalar context, and the list of elements in list context.

Example:

 @y = flatten [1, 2, 3], [4, 5], [[6, 7], 8, 9];
 say "Hooray!" if "@y" eq "1 2 3 4 5 6 7 8 9";
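
The depth-first expansion can be sketched in one recursive map. The name flatten_sketch is hypothetical and, unlike the description above, this version is meant for list context only.

```perl
use strict;
use warnings;

# Hypothetical sketch: expand ARRAY references depth-first, keep everything else.
sub flatten_sketch {
    return map { ref $_ eq 'ARRAY' ? flatten_sketch(@$_) : $_ } @_;
}

my @flat = flatten_sketch([1, 2, 3], [4, 5], [[6, 7], 8, 9]);
print "@flat\n";    # prints 1 2 3 4 5 6 7 8 9
```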

find_index

 find_index \&f, \@array
 find_index { BLOCK } \@array
 find_index { BLOCK } \@array, $start, $stop, $step

May be called with either a function or a block as the first argument. The function will then begin at $start (or zero) and then step by $step (or 1) until we reach $stop (or the end of the array).

$_ will be set to the current array entry which will also be passed to the function as its only argument. Thus you may use either $_ or $_[0] within your function.

$start may be greater than $stop, in which case we will proceed backwards. In all cases the sign of $step will be adjusted if necessary so that we finish in finite time.

find_index_with_memory

 find_index_with_memory \&f, \@array
 find_index_with_memory { BLOCK } \@array
 find_index_with_memory { BLOCK } \@array, $start, $stop, $step

May be called with either a function or a block as the first argument. The function will then begin at $start (or zero) and then step by $step (or 1) until we reach $stop (or the end of the array).

The function will set the caller's $a to the previous array entry and $b to the current array entry and will also pass the two entries to the function as its only arguments. Thus you may use either $a, $b or $_[0], $_[1] as the previous and current entries respectively.

$start may be greater than $stop, in which case we will proceed backwards. In all cases the sign of $step will be adjusted if necessary so that we finish in finite time.

first

See also: List::Util first

 first \&sub, @list         # if @list is not list of arrays
 first { block }  @list     # if @list is not list of arrays
 first { block } \@list
 first { block } \@list, $start_pos

Return the first item of @list for which the code returns true. Code may be either a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. You may pass @list by reference (which means that you must pass it by reference if it contains an array reference in its first entry). If you pass @list by reference and provide a third argument, then the third argument will be taken to be the first position that should be checked.

first_pos

See also: List::MoreUtils first_index

 first_pos \&sub, @list
 first_pos { block } @list
 first_pos { block } \@list, $start_pos

Return the index of the first item of @list for which the code returns true. Code may be either a subroutine reference or a code block. $_ will be set to each list entry and will also be passed in as the first (and only) argument. You may pass @list by reference (which means that you must pass it by reference if it contains an array reference in its first entry). If you pass @list by reference and provide a third argument, then the third argument will be taken to be the first position that should be checked. In this case the returned index will still correspond correctly to a position in @list.

bucketize

 my %buckets = bucketize { block } @list;
 my %buckets = bucketize \&tagger, @list;
 my $buckets = bucketize \&tagger, @list;

Partition items into buckets given a generic tagger. Returns hash ref in scalar context. Tagger should accept a single argument (or use $_) and should return a tag indicating the bucket to place the item in. Function is called in list context so that the following works as expected:

 %by_file_type = bucketize { /\.([^\.]+)$/ } @images;

Also note that values are given as bound aliases, so they can also be ``cleverly'' modified:

 # ("foo-bar", "foo-baz", "bip-bop")
 #  becomes: ( foo => ["bar","baz"], bip => ["bop"] )
 my %buckets = bucketize { s/^([^-]+)-//; $1 } @x;

partition

See also: List::MoreUtils part

 ($true, $false) = partition { block } @list
 ($true, $false) = partition \&test_func, @list

Partitions a list into two lists based on the truth value of a subroutine or block. The return value is two array references, the first of which is the elements of the original list for which the function returned true, and the second are those elements for which the function returned false.
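
A minimal sketch of the same idea in core Perl, for readers without the module at hand. The name partition_sketch is hypothetical and this is not the module's implementation; setting $_ for the test is an assumption carried over from the other list functions in this section.

```perl
use strict;
use warnings;

# Hypothetical stand-in for partition: split a list in two by a test.
sub partition_sketch(&@) {
    my ($test, @list) = @_;
    my (@true, @false);
    for my $item (@list) {
        local $_ = $item;                       # expose the item as $_
        if ($test->($item)) { push @true,  $item }
        else                { push @false, $item }
    }
    return (\@true, \@false);                   # two array references
}

my ($even, $odd) = partition_sketch { $_ % 2 == 0 } 1 .. 6;
print "even: @$even\n";   # even: 2 4 6
print "odd: @$odd\n";     # odd: 1 3 5
```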

even_positions

 @list_2 = even_positions @list_1;
 @list_2 = even_positions \@list_1;

Returns the elements of the list that have even indices. Argument may be list or arrayref, always returns a list of values.
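
Assuming the indices are zero-based (as Perl array indices are), the selection can be written with an array slice in core Perl. This is an illustrative sketch, not the module's code:

```perl
use strict;
use warnings;

my @list = qw/a b c d e/;

# Keep the elements whose zero-based index is even: 0, 2, 4, ...
my @even = @list[ grep { $_ % 2 == 0 } 0 .. $#list ];

print "@even\n";   # a c e
```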

odd_positions

 @list_2 = odd_positions @list_1;
 @list_2 = odd_positions \@list_1;

Returns the elements of the list that have odd indices. Argument may be list or arrayref, always returns a list of values.

suggestion_sort

 suggestion_sort \@list, \@preferred

Returns @list sorted by the order of the objects in @preferred. All elements are matched as strings and elements of @list that are not in @preferred are placed at the end of the resulting list in a way that preserves their original ordering within @list.

Notes: Undefined entries will be ignored. Only the first appearance of an element in the @preferred list will be considered. Repetitions in @list will be reduced to a single occurrence.
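
The ordering rules above can be sketched in core Perl as follows. The name suggestion_sort_sketch is hypothetical and the code is an illustration of the documented rules only; it assumes plain string elements.

```perl
use strict;
use warnings;

# Hypothetical stand-in for suggestion_sort, assuming string elements.
sub suggestion_sort_sketch {
    my ($list, $preferred) = @_;
    my %in_list;
    $in_list{$_} = 1 for grep { defined } @$list;

    my (%seen, @out);
    # Preferred items come first, in @preferred order, if present in @list.
    for my $p (grep { defined } @$preferred) {
        push @out, $p if $in_list{$p} && !$seen{$p}++;
    }
    # Everything else follows in its original order, without repeats.
    for my $x (grep { defined } @$list) {
        push @out, $x unless $seen{$x}++;
    }
    return @out;
}

print join(' ', suggestion_sort_sketch([qw/d b a c b/], [qw/a b z a/])), "\n";
# a b d c
```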

unique

See also: List::MoreUtils uniq

 my @u = unique @list;
 my @u = unique \@list;
 my $h = unique @list;
 my $h = unique \@list;

Takes a list (or reference to an array) and returns a list of unique (up to stringification) objects in apparently random order. In scalar context, a histogram (hash with objects as keys, and counts as values) is returned.

Note: List::MoreUtils::uniq preserves the original order of the elements.
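
The context-dependent return value can be sketched with a counting hash and wantarray. The name unique_sketch is hypothetical and this is not the module's code. In particular, this sketch returns hash keys, which are always strings, so references would come back stringified.

```perl
use strict;
use warnings;

# Hypothetical stand-in for unique: list of keys, or a histogram hashref.
sub unique_sketch {
    my @items = (@_ == 1 && ref($_[0]) eq 'ARRAY') ? @{ $_[0] } : @_;
    my %count;
    $count{$_}++ for @items;
    return wantarray ? keys(%count) : \%count;
}

my @u = unique_sketch(qw/a b a c b a/);      # list context
my $h = unique_sketch([qw/a b a c b a/]);    # scalar context

print join(' ', sort @u), "\n";              # a b c
print "$_ => $h->{$_}\n" for sort keys %$h;  # a => 3, b => 2, c => 1 (one per line)
```

The explicit sort in the example is needed precisely because hash key order is not predictable.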

lex_sort

 lex_sort @list_of_lists
 lex_sort sub{  }, @list_of_lists

Sort the lists lexicographically element-wise. The sorting subroutine may use the package variables $a and $b or may take two arguments, but need only worry about element-wise comparison.

Example:

 lex_sort( [qw/abc ac a/], [qw/abc ab c d/], [qw/x y z/], [qw/abc ab c/] )
 # gives:
 #  ( [qw/abc ab c/],
 #    [qw/abc ab c d/],
 #    [qw/abc ac a/],
 #    [qw/x y z/]
 #  )

Similarly with numerical data using: sub{ $a <=> $b }
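
One way to picture the element-wise comparison is the core-Perl sketch below. The helper lex_cmp is hypothetical, not part of the module, and the rule that a shorter list sorts before a longer one sharing its prefix is inferred from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two array refs element by element.
sub lex_cmp {
    my ($x, $y, $cmp) = @_;
    my $n = @$x < @$y ? @$x : @$y;          # length of the shorter list
    for my $i (0 .. $n - 1) {
        my $c = $cmp->($x->[$i], $y->[$i]);
        return $c if $c;                    # first difference decides
    }
    return @$x <=> @$y;                     # shared prefix: shorter first
}

my @lists  = ([qw/abc ac a/], [qw/abc ab c d/], [qw/x y z/], [qw/abc ab c/]);
my @sorted = sort { lex_cmp($a, $b, sub { $_[0] cmp $_[1] }) } @lists;

print join(' ', @$_), "\n" for @sorted;
# abc ab c
# abc ab c d
# abc ac a
# x y z
```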

:patterns - Tests and Patterns

$_re_int

Pattern which matches an integer expression. Beware, this pattern allows whitespace in the string which perl may not allow when interpreting strings as numbers. You may need to remove all whitespace from strings which match this pattern.

$_re_num

Pattern which matches a floating-point expression. Beware, this pattern allows whitespace in the string which perl may not allow when interpreting strings as numbers. You may need to remove all whitespace from strings which match this pattern.

$_re_exp

Pattern which matches an exponent part (Ex: 2.3 e -10) of a floating-point expression. Beware, this pattern allows whitespace in the string which perl may not allow when interpreting strings as numbers. You may need to remove all whitespace from strings which match this pattern.

$_re_wrd

Pattern which matches safe ``word-like'' data. This pattern does not match whitespace and most punctuation but does allow hyphens ``-'' and underscores.

is_int

Returns a true value if the argument looks like an integer expression. If no argument is provided, $_ is examined. Beware, this subroutine allows whitespace in the string which perl may not allow when interpreting strings as numbers. You may need to remove all whitespace from strings for which this subroutine returns true.

is_num

Returns a true value if the argument looks like a floating-point (or integer) expression. If no argument is provided, $_ is examined. Beware, this subroutine allows whitespace in the string which perl may not allow when interpreting strings as numbers. You may need to remove all whitespace from strings for which this subroutine returns true.

is_float

Returns a true value if the argument looks like a floating-point (or integer) expression. If no argument is provided, $_ is examined. Beware, this subroutine allows whitespace in the string which perl may not allow when interpreting strings as numbers. You may need to remove all whitespace from strings for which this subroutine returns true.

is_word

Returns a true value if the argument looks like a word. If no argument is provided, $_ is examined. Words do not have spaces and do not typically have punctuation, though hyphens ``-'' and underscores are allowed.

$_re_image_ext

Pattern which matches image-type filename extensions. The extensions matched (case insensitive) are:

BMP CMYK CMYKA DCM DCX DIB DPS DPX EPI EPS EPS2 EPS3 EPSF EPSI EPT FAX FITS FPX G3 GIF GIF87 GRAY ICB ICM ICO ICON IPTC JBG JBIG JP2 JPC JPEG JPG MAP MIFF MNG MONO MPC MTV MVG OTB P7 PAL PALM PBM PCD PCDS PCL PCT PCX PDB PGM PICON PICT PIX PLASMA PNG PNM PPM PSD PTIF RAS RGB RGBA RLA RLE ROSE SGI SUN SVG TGA TIF TIFF UYVY VDA VICAR VID VIFF VST WBMP X XBM XC XCF XPM XV XWD YUV

is_image_file

Returns a true value if the argument looks like an image file. If no argument is provided, $_ is examined. The extensions matched (case insensitive) are:

BMP CMYK CMYKA DCM DCX DIB DPS DPX EPI EPS EPS2 EPS3 EPSF EPSI EPT FAX FITS FPX G3 GIF GIF87 GRAY ICB ICM ICO ICON IPTC JBG JBIG JP2 JPC JPEG JPG MAP MIFF MNG MONO MPC MTV MVG OTB P7 PAL PALM PBM PCD PCDS PCL PCT PCX PDB PGM PICON PICT PIX PLASMA PNG PNM PPM PSD PTIF RAS RGB RGBA RLA RLE ROSE SGI SUN SVG TGA TIF TIFF UYVY VDA VICAR VID VIFF VST WBMP X XBM XC XCF XPM XV XWD YUV

readonly

Returns true if scalar argument is readonly. (Taken from Scalar::Util.)

like_array

Returns true if the object can behave like an array. (This is just a nicer way to call UNIVERSAL::isa)

like_hash

Returns true if the object can behave like a hash. (This is just a nicer way to call UNIVERSAL::isa)

like_scalar

Returns true if the object can behave like a scalar. (This is just a nicer way to call UNIVERSAL::isa)

:parse - General Interpreters / Parsers

parse_user_agent

 my $hashref = parse_user_agent( $string );
 my %hash    = parse_user_agent( $string );

Given a user-agent string returns a hash containing the following fields. Fields which can not be determined are left undefined.

generic_os

Returns the generic operating system type: Windows, Mac, OS2, Linux, UNIX

os

Returns the specific operating system type: Windows Vista, Windows Server 2003, Windows XP, Windows 2000, Debian, ...

type

One of: browser, textbrowser, bot, downloader, mobile

Note: For this field, we try to make our best guess at which class the agent string fits into.

program

Quasi-canonicalized program name: Internet Explorer, Netscape, Mozilla, Firefox, wget, ...

version

Our best guess at the program version

engine

The Browser's rendering engine: Gecko, KHTML, MSHTML, Presto (opera), WebCore (apple), custom (other custom engines)

engine-version

The version of the rendering engine

user-agent

The unmodified user-agent string

obsolete

If true, the agent appears to be an obsolete web browser

str2hash

Parse a string into a hash using Text::Balanced::extract_delimited. This function recognises perl 5 style hashes as well as the basic perl 6 adverbial form. Items missing a value will set the corresponding hash value to true.

Example:

 str2hash 'foo, bar => "Hmmm, a comma", :baz<23>, :!bip, quxx => Spaces are fine'

Parses to:

 { foo => 1,
   bar => 'Hmmm, a comma',
   baz => 23,
   bip => 0,
   quxx => 'Spaces are fine',
 }

Unfortunately, the adverbial form will behave strangely with embedded commas:

 str2hash ':baz<Well, how odd>'

becomes

 { ':baz<Well' => 1,
   'how odd>'  => 1,
 }

unformat

WARNING: still quite experimental!

 unformat $fmt, @strings
 unformat \%options, @strings

Attempts to reverse the actions of sprintf or other formatted output (for instance date formats or apache logs). The return value is a list of reports (see below) unless there was only a single input string to parse, in which case unformat may be safely called in scalar context.

format

The format string

as

Specify how to return the findings. By default just a list of matched components is returned; however, we can also return the following reports:

hash

A hash mapping conversions (or their corresponding names, if given) to their corresponding strings. BEWARE KEY COLLISION

  { ~conv, str, ~conv, str, ... }

list

The default, the return values are each an array of strings that could have been used to generate one of the input strings.

  [ str, str, ... ]

list_list

Each return value is an array of two arrays the first of which is the list of strings returned by the ``list'' option. The second is the conversion instructions giving each corresponding string.

  [ [ str, str, ... ],  [ conv, conv, ... ] ]

Note, in this case, each list of conversions is an array reference pointing to the same array, so altering one will alter them all.

pairs

Each return value is a flat array of pairs:

  [ conv, str, conv, str, ... ]

regex

Return a regular expression that will match the given pattern. In scalar context just the regex is returned. In list context the conversions will be returned also.

  ( regex, conv, conv, ... )

tuples

Each return value is an array of arrays each with two elements. First the conversion instruction and second the string that it matched.

  [ [conv, str], [conv, str], ... ]

In all cases except for the hash, the conversion instructions are the precise ones given in the format string, including any formatting options. For the hash, however, the conversions are the simplified two-character labels (E.g. ``%s'' instead of ``% 35s'').

Additionally, the escape '%%' is treated as a string literal '%' and will not appear in any of the report types. A ``formatted percent'' (for instance ``%-05%'') will pass through the conversions and will appear in the reports if you define a special conversion for it (since we define no standard conversion for this case).

conversion_aliases

A hash of aliases between conversion types. Use this to map your custom conversion (for instance from the date formatting commands) to standard perl conversions. Conversions of the form ( a => "s" ) will preserve formatting options while aliases that start with '%' ( Y => "%04d" ) will use the formatting options ``04'' rather than any options that may have appeared before the ``Y''. (Which would presumably cause ``0035'' to parse to 35.) Conversion aliases are searched before conversions or special conversions. One can also add aliases that include the conversion options to override other behavior ( '02Y' => '%02d', Y => 's' ).

special_conversions

A hash of conversions as in the conversions option but these conversions will be added to the list of standard conversions and will be consulted first should a standard conversion type appear in this listing.

conversions

A hash of conversions ( type => action ). Each ``type'' is simply the conversion type (E.g. the ``s'' in ``%- 10s'') and each action is a pattern that CAPTURES (preferably non-greedily) the conversion type (for instance (s => '(.*?)')). The action could also be a subroutine which accepts two arguments: first the formatting options and second the conversion type. For instance, a sub action for the ``f'' conversion type might convert its arguments (".1", "f") into the pattern '(\d+\.\d{1})'.

Be sure that all of your conversions produce a pattern that captures exactly one substring.

Specifying this option replaces the built-in conversions which attempt to reverse the standard perl conversions listed in the sprintf documentation.

conversion_map

If defined and a hash then the conversions in the above reports will be transformed by this hash. Conversions will be first searched for in their full form (including formatting options) both with and without their leading '%', then searched for under only the conversion type (both with and without the '%'). Anything not appearing in the conversion map will be treated normally as described above.

conversion_pattern
 Default: '(%([^a-zA-Z%]*)([%a-zA-Z]))'

Should capture three strings. The entire conversion pattern, any formatting options that may be present, and the conversion type. The default pattern captures single character conversions as well as the '%' escape (``%%''). See also the ``Limitations'' below.

Limitations: format conversions are assumed to be one character long. That is, conversions like ``%ld'' will be interpreted as ``%l''. This can be fixed by altering the conversion_pattern but I don't have the need to be careful about it. If you code up a more careful parser and are willing to share, feel free to send it and I will add it in.

Also, no locale information is considered. sprintf considers the ``LC_NUMERIC'' value to affect how numbers are formatted. We do not make such considerations here.

:canonicalize - Canonicalization

rtf2txt

 rtf2txt( file => $filename_or_handle )
 rtf2txt( string => $rtf_text )
 rtf2txt( $existing_file )
 rtf2txt( $rtf_text )

nicef

 nicef( $num, $digits )

Nicely formats sprintf(``%.${digits}f'', $num);

length2pt

Given a string like ``4in'' or ``2ft - 7in'', return the value as a number of points (72 points per inch). undef is returned if we can't parse the string.

Recognized units:

 pt
 in, ft, mi
 km, m, cm, mm, nm

uri_rel2abs

 my $url = uri_rel2abs( $path, $base )

Converts a path into an absolute path based at the given base unless the path is already absolute. Any file part of the base is ignored.

This subroutine should be a proper RFC 3986 URI parser as it simply calls URI->new_abs. However, proper parsing pays a penalty in execution time. Compare the benchmarks between uri_rel2abs and uri_rel2abs_fast:

        Rate   URI  FAST
 URI   208/s    --  -93%
 FAST 3012/s 1350%    --

uri_rel2abs_fast

 my $url = uri_rel2abs_fast( $path, $base )

Converts a path into an absolute path based at the given base unless the path is already absolute. Any file part of the base is ignored.

This subroutine is not and will likely never be a reasonable implementation of a proper rfc3986 uri parser. At the moment, however, it appears to be ``good enough'' for typical web address (http, ftp, mms, ...) handling.

The uri_rel2abs function uses the URI module to properly produce an absolute uri, however at a significant speed cost.

        Rate   URI  FAST
 URI   208/s    --  -93%
 FAST 3012/s 1350%    --

glob2regexp

Constructs a regular expression pattern (string) that matches the same patterns as the given glob. The pattern matches a whole string and is anchored using ^ and $ unless the glob ends with * in which case the trailing .*$ will be removed. Keep this in mind if you wish to capture the pattern matched by the glob.

Current capabilities:

Globby chars

* match many chars; ? match one char

Escaping of globby chars

\** matches '\*Hello', \\\** matches "\\*Hello"

Grouping constructs

[abc] match a character, [^abc] don't match chars, {foo,bar} match options

Current restrictions:

The globby chars '*' and '?' may not appear within grouping constructs ('[]' and '{}').
Can't match grouping chars in groups: '[ab\]]' does not work.

str($)

Returns string form of argument (forces string context) if it is defined, otherwise returns the empty string.

replace_windows_characters

Replaces unsightly Extended Windows characters with reasonable ASCII equivalents.

 See: http://www.cs.tut.fi/~jkorpela/www/windows-chars.html

strip_space

Remove all space from the provided argument. If the argument is undefined, return the empty string.

sign($)

Returns ``+'' or ``-'' depending on the sign of the argument.

nsign($)

Returns ``'' or ``-'' depending on the sign of the argument.

canonicalize_newlines

Replace CRLF, CR, LF with the Perl magic \n. Arguments are modified in-place. If no arguments are provided then $_ is altered instead. Any undefined arguments are ignored (though canonicalize_newlines(undef) will not alter $_).

canonicalize_newlines_copy

Replace CRLF, CR, LF with the Perl magic \n. Arguments are copied before canonicalization. If no arguments are provided then $_ is used instead. Any undefined arguments result in undefined output values.
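
The substitution at the heart of both functions can be sketched in core Perl. The name canonicalize_newlines_copy_sketch is hypothetical, and returning the first copy in scalar context is an assumption made for this illustration; the module's actual code may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in: return canonicalized copies, leaving inputs alone.
sub canonicalize_newlines_copy_sketch {
    my @copy = @_ ? @_ : ($_);
    for my $s (@copy) {
        # CRLF is listed first so it collapses to one \n, not two.
        $s =~ s/\r\n|\r|\n/\n/g if defined $s;
    }
    return wantarray ? @copy : $copy[0];   # scalar behaviour is an assumption
}

my $text  = "one\r\ntwo\rthree\n";
my $clean = canonicalize_newlines_copy_sketch($text);
print $clean eq "one\ntwo\nthree\n" ? "ok\n" : "not ok\n";   # ok
```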

canonicalize_timeword

Transform reasonable (case-insensitive) abbreviations (or plural forms) of ``second'', ``minute'', ``hour'', ``day'', ``week'', ``month'', ``year'' into one of these canonical forms. Whitespace and numerical values are allowed at the beginning of the string and will be ignored (and not included in the return value).

NOTE: minutes are preferred over months, thus ``m'' will return ``minute'' rather than ``month''.

qbash($)

Returns a string quoted for bash-like shells. The string must contain only printable characters or whitespace, otherwise the subroutine will die. The return value is an untainted string wrapped in single quotes ' that is ready (and safe) to pass to a shell.
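
The usual way to produce such a string is to close the quote, emit an escaped single quote, and reopen the quote for every embedded '. The sketch below shows that technique in core Perl. The name qbash_sketch and the error message are hypothetical, untainting is not addressed, and the module's real character check may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for qbash: single-quote a string for the shell.
sub qbash_sketch {
    my ($s) = @_;
    die "unprintable character in string\n"
        unless $s =~ /\A[[:print:]\s]*\z/;
    $s =~ s/'/'\\''/g;      # each ' becomes '\'' (close, escaped quote, reopen)
    return "'$s'";
}

print qbash_sketch("it's here"), "\n";   # 'it'\''s here'
```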

stringify

 stringify( $thing, %options )

Stringifies Perl objects (SCALAR, HASH, or ARRAY based). Stringifies only a single object at a time, and accepts the options below. Note: CODE, GLOB, LVALUE, and Regexp references are not supported.

stringify_underlying_object

By default, overloaded stringification will be respected. Set this option to true to stringify the underlying object rather than use its overload function.

list_type

List which describes how lists are translated.

 DEFAULT: [ "[", ",", "]" ]

hash_type

List which describes how hashes are translated.

 DEFAULT: [ "{", "=>", ",", "}" ]

simple_range2list

 simple_range2list @ranges

Expand ``#,#..#,#-#,a..z,a-z,2:23,2:5:23,a:5:zz'' strings to lists. Beginning and ending blocks may be anything matching [\w\.]+, though I'm not sure how well underscores will behave. Commas may separate multiple range chunks.

A plain value v (numerical or non-numerical) will produce the range 1..v or 'a'..v.

If no step size is given, the standard perl .. is used to expand the range.

Ranges with step sizes are incremented by the step size (may only be decimal valued if both start and end values are numerical) until the value exceeds the right hand value.

canonicalize_filename

 canonicalize_filename $f;
 $new = canonicalize_filename $f;
 canonicalize_filename $f, %options;

Removes anything too exotic from the file name $f. In void context, $f is modified, otherwise, $f is left unaltered and the modified file name is returned. In all cases the canonicalized name will be untainted. The following options will affect the behavior of this subroutine. The default values are shown:

replacement => ``''

If a string value, invalid characters will be replaced with this value. If a hash reference then characters will be replaced by their corresponding values. Any values not present in the replacement hash will be replaced with the value in the 'DEFAULT' key (if present) or the empty string.

allow => 'print'

Must be one of 'print', 'basic', or a pattern matching A SINGLE legal character. The 'print' class will allow just about anything through that is not a control character including unicode characters and punctuation if your perl supports that. The 'basic' class should only allow characters that do not require escaping or quoting in a linux shell (currently allows: \w-+.~%).

allow_subdirs => 1

If true, subdirectory separators will be allowed (uses File::Spec to determine volume and directory separators for your system).

squash_duplicates => 'dwim'

If false, each invalid character will be replaced separately. If the value is 'like' then repeated illegal values are replaced by only a single replacement value. If the value is any true value other than 'like' or 'dwim' then consecutive illegal values (even if they do not match) will be replaced with the replacement value for the first illegal character in the substring. Finally, if the value is 'dwim' then a replacement hash will cause the ``like'' behavior and a replacement string will result in ``true'' behavior.

Example:

 %replace = ( replacement => { ':' => "-", " " => "+" } );
 # 'dwim' default using replacement hash: gives "foo-+bar"
 canonicalize_filename( "foo: bar", allow => 'basic', %replace );
 # 'dwim' default using replacement string: gives "foo-bar"
 canonicalize_filename( "foo: bar", allow => 'basic', replacement => "-" );

trim

Trim leading/trailing whitespace. Trims $_ if no arguments provided. In void context, the arguments are altered, otherwise they are not changed and the trimmed values are returned.

:time - Time Management

now

Simply calls: DateTime->now(time_zone => ``local'');

This exists because I always forget how to properly get a current DateTime object.

ymd

Behaves like localtime in scalar context, but returns the date as ``YYYY-MM-DD''. Returns the components of that string in list context.

ymd_hms

Behaves like localtime in scalar context, but returns the date as ``YYYY-MM-DD HH:MM:SS''. Returns the components of that string in list context. Hours are presented in 24 hour format.

seconds2human

 seconds2human( seconds, start-unit, end-unit )

Convert an arbitrary number of seconds to a ``nice'' human-readable form. The second and third arguments are optional and specify the first and last time units presented (note that specifying a start unit rounds the precision of your result to the given unit). The resulting data are separated by the value of $". Units available are: seconds, minutes, hours, days, months, and years. If the input seconds include a decimal portion, then the seconds value will be rounded to three places using the format "%.3f".

Example:

 seconds2human 99999999, 'd', 'mos.'   # gives: "38 months 17 days"
 local $" = ', ';
 seconds2human 99999999, 'm', 'hour'   # gives: "27777 hours, 46 minutes"

seconds2hms

 seconds2hms $sec
 seconds2hms $sec, $sep

Convert an arbitrary number of seconds to a ``hh:mm:ss'' string. The ``hh'' portion of the string will always be at least two digits long (but may be more if more than 99 hours are represented by the given number of seconds).
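
The arithmetic involved can be sketched with integer division and sprintf. The name seconds2hms_sketch is hypothetical, ``:'' as the default separator is an assumption read off the ``hh:mm:ss'' form, and fractional seconds are simply truncated here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for seconds2hms.
sub seconds2hms_sketch {
    my ($sec, $sep) = @_;
    $sep = ':' unless defined $sep;        # assumed default separator
    my $h = int($sec / 3600);
    my $m = int(($sec % 3600) / 60);
    my $s = $sec % 60;
    # %02d pads to two digits but never truncates, so 100+ hours still fit.
    return sprintf "%02d%s%02d%s%02d", $h, $sep, $m, $sep, $s;
}

print seconds2hms_sketch(3661), "\n";          # 01:01:01
print seconds2hms_sketch(360000), "\n";        # 100:00:00
print seconds2hms_sketch(59, '-'), "\n";       # 00-00-59
```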

seconds2time

 seconds2time $sec
 seconds2time $sec, $pad
 seconds2time $sec, %options

Convert a number of seconds (from 0 to 86400) to a ``h:mm AM/PM'' string. If a second $pad parameter is given, that symbol will be used to force the hour portion to be precisely 2 characters wide (typical values are 0 and `` ''). You may also fully specify ``pad'', ``AM'', ``PM'', and ``sep'' (separator, default ``:'') options. The AM and PM strings should include a leading space if you want one.

human2seconds

Converts a human-written string of a timespan expressed in various abbreviations of seconds, minutes, hours, days, weeks, months, and years into an integer representing the same time span in seconds.

Subroutine dies if it is incapable of parsing the input string.

Examples:

 human2seconds "3 dys. 2hr 15m"   # 267300
 human2seconds "3q 2wk"           # dies: doesn't recognise 3q

%as_month

A hash containing mappings between various months and abbreviations to their full month names (all keys are lowercase):

  month => Month
  mon   => Month
  mon.  => Month
  ##    => Month
  #     => Month

Also includes 4 letter keys for September.

%as_month_number

A hash containing mappings between various months and abbreviations to their two digit month numbers (all keys are lowercase):

  month => ##
  mon   => ##
  mon.  => ##
  #     => ##

Also includes 4 letter keys for September.

:file_comp - File related computations

size_sum

Given a list of sizes (possibly negative) converts each entry to its corresponding number of bytes, sums the values and then converts the result back to a human readable size. Prefixes are computed base 2 (K = 1024, M = 1048576, ...).

Example:

 print size_sum qw/1.5MB -650kB -1253kB/;

size_sum_SI

DEPRECATED: size_sum now uses MB and MiB

Given a list of sizes (possibly negative) converts each entry to its corresponding number of bytes, sums the values and then converts the result back to a human readable size. Prefixes are treated as standard SI prefixes (K = 1000, M = 1000000, ...).

Example:

 print size_sum_SI qw/1.5MB -650kB -1253kB/;

size2bytes

Given a string like ``4MB'' or ``3TiB - 400G'', return the value as a number of bytes. undef is returned if we can't parse the string. Prefixes are computed base 2 (Ki = 1024, Mi = 1048576, ...) or using standard SI prefixes (K = 1000, M = 1000000).

size2bytes_2

Given a string like ``4MB'' or ``3TB - 400G'', return the value as a number of bytes. undef is returned if we can't parse the string. Prefixes are computed base 2 (K = 1024, M = 1048576, ...).

size2bytes_SI

DEPRECATED: size2bytes now uses MB and MiB

Given a string like ``4MB'' or ``3TB - 400G'', return the value as a number of bytes. undef is returned if we can't parse the string. Prefixes are treated as standard SI prefixes (K = 1000, M = 1000000, ...).

bytes2size

Print a human-readable string of the form 20.4MiB from the corresponding number of bytes (an integer). An optional second parameter specifies the minimal digits of accuracy, which is 3 by default (e.g., 1.21 but 12.1). An optional third parameter specifies the minimum number of digits after the decimal place to keep, which is 1 by default. Prefixes are computed using base 2 (Ki = 1024, Mi = 1048576, ...).

bytes2size_SI

DEPRECATED: bytes2size now emits KiB, MiB, ...

Print a human-readable string of the form 20.4MB from the corresponding number of bytes (an integer). An optional second parameter specifies the minimal digits of accuracy, which is 3 by default (e.g., 1.21 but 12.1). An optional third parameter specifies the minimum number of digits after the decimal place to keep, which is 1 by default. Prefixes are treated as standard SI prefixes (K = 1000, M = 1000000, ...).

:file - File Operations

rofh

Read only filehandle

 my $fh = rofh $filename;
 my $fh = rofh \$mode, $filename;

Simply performs an open or croak with an appropriate message. If a string reference $mode is provided as a first argument it will be taken as the file mode (the default is ``<'').

wofh

Write only filehandle

 my $fh = wofh $filename;
 my $fh = wofh \$mode, $filename;

Simply performs an open or croak with an appropriate message. If a string reference $mode is provided as a first argument it will be taken as the file mode (the default is ``>'').

rwfh

Read-write filehandle

 my $fh = rwfh $filename;
 my $fh = rwfh \$mode, $filename;

Simply performs an open or croak with an appropriate message. If a string reference $mode is provided as a first argument it will be taken as the file mode (the default is ``+<'').

rofhz

Read only compressed filehandle

 my $fh = rofhz $filename;
 my $fh = rofhz \$mode, $filename;

Simply performs an open or croak with an appropriate message. Requires perl compiled with PerlIO support (perl 5.8, I believe). The gzip PerlIO layer is loaded with the autopop option so that uncompressed files can be opened using this function. If a string reference $mode is provided as a first argument it will be taken as the file mode (the default is ``<:gzip(autopop)'').

Note: To properly decode UTF-8 files use the mode ``<:gzip(autopop):encoding(UTF-8)''

wofhz

Write only compressed filehandle

 my $fh = wofhz $filename;
 my $fh = wofhz \$mode, $filename;

Simply performs an open or croak with an appropriate message. Requires perl compiled with PerlIO support (perl 5.8, I believe). If a string reference $mode is provided as a first argument it will be taken as the file mode (the default is ``>:gzip:encoding(UTF-8)'').

Note: To properly encode UTF-8 files use the mode ``>:gzip:encoding(UTF-8)''

rwfhz

Read-write compressed filehandle

 my $fh = rwfhz $filename;
 my $fh = rwfhz \$mode, $filename;

Simply performs an open or croak with an appropriate message. Requires perl compiled with PerlIO support (perl 5.8, I believe). The gzip PerlIO layer is loaded with the autopop option so that uncompressed files can be opened using this function. If a string reference $mode is provided as a first argument it will be taken as the file mode (the default is ``+<:gzip(autopop)'').

Note: To properly decode UTF-8 files use the mode ``+<:gzip(autopop):encoding(UTF-8)''

touch

 touch @files;
 touch \MODE, @files;

Create files using optional numeric mode (e.g: touch \0700, ``foo''). If files exist, their atime and mtime will be updated to the current time.

canonpath

Like the canonpath command in File::Spec, but only works on unix filesystems (also cygwin if $^O eq 'cygwin'). However, it will clean up ``/../'' components whereas File::Spec->canonpath will not.

The code has been modified from File::Spec::Unix::canonpath in the PathTools package by Ken Williams.

fmap

 my @foos = fmap { s/^FOO: (.*)/$_Util::fmap::file: '$1' line $./ } @files
 my @foos = fmap { s/^FOO: (.*)/$_Util::fmap::file: '$1' line $./ } \%options, @files

Transform files. Loop through the lines of each file and apply a function. Replace each line with the new value of $_. The current file name will be available in the variable $_Util::fmap::file and will be one of the entries in the file list given to the subroutine. Of course, the standard perl variable $. ($INPUT_LINE_NUMBER when use English; is in effect) will be available for your use as well.

In scalar or list context returns a hashref (or hash) of (filename => [ new contents ]) pairs. The values are arrayrefs containing the modified lines of each file.

In void context, alters files in-place, just like using perl -pi -e from the command line.

if_mode

File mode when reading the file (the default is simply ``<'').

of_mode

File mode when writing the file (the default is simply ``>'').

backup

If this is a single-character string (e.g., '~') or starts with a leading dot (e.g., '.bak'), it is appended to the filename as a backup suffix. Otherwise it is treated as the backup file name (e.g., 'old_foo'). The default is '~'.

fgrep

 my @foos = fgrep { s/^FOO: (.*)/$_Util::fgrep::file: '$1' line $./ } @files
 my @foos = fgrep { s/^FOO: (.*)/$_Util::fgrep::file: '$1' line $./ } \"<:encoding(UTF-8)", @files

Grep files. Loop through the lines of each file and apply a function. If the function returns a true value then $_ (after the function application) will be appended to a list to be returned. The current file name will be available in the variable $_Util::fgrep::file and will be one of the entries in the file list given to the subroutine. Of course, the standard perl variable $. ($INPUT_LINE_NUMBER when use English; is in effect) will be available for your use as well.

In scalar context just the number of matches will be returned.

NOTE: If you want to chomp your lines note that the last line of a file may not contain a newline (or whatever $/ is) so use something like either of the following:

 my @foos = fgrep { chomp; /^FOO/ } @files;
 my @foos = fgrep { /^FOO/ and chomp || 1 } @files;
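
The reason for the ``|| 1'' is that chomp returns the number of characters it removed, which is 0 (false) for a final line with no newline. The effect can be seen with the core grep, used here as a stand-in for fgrep on two in-memory lines; the variable names are illustrative only.

```perl
use strict;
use warnings;

my @lines = ("FOO a\n", "FOO b");   # the last line has no trailing newline

# chomp removes nothing from "FOO b", returns 0, and the line is dropped.
my @copy1 = @lines;
my @naive = grep { /^FOO/ and chomp($_) } @copy1;

# Forcing a true value keeps the unterminated line as well.
my @copy2 = @lines;
my @safe  = grep { /^FOO/ and (chomp($_) || 1) } @copy2;

print scalar(@naive), "\n";   # 1
print scalar(@safe), "\n";    # 2
```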

If a string reference $mode is provided as the first argument after the subroutine block it will be taken as the file mode (the default is simply ``<'').

find

  #XXX: BUGS!
  Currently not entirely correct but getting better. Known bugs:
    * -mindepth available but broken
    * not thoroughly tested given its complexity

 my @files = find [ '/' ], qw/-type f -name *.pm/;

File::Find using find(1) semantics. Currently supported find options are given below (descriptions taken from find(1)). Unlike find, this subroutine defaults to returning the list of matches rather than defaulting to the -print action. Tests are performed in the order specified, so a failure early on will prevent further tests/actions from being performed. Note: this function will never be a full find2perl replacement.

-depth

Process each directory's contents before the directory itself.

-follow

Dereference symbolic links. This is the option that most closely follows find(1)'s behavior but is not a perfect match. In particular, a symbolic link which (if followed) would actually result in a circular reference will be processed by find(1), but not by this function.

NOTE: This option corresponds to the follow_fast option to File::Find

-follow_smart

Dereference symbolic links. Circular references (as well as links that would cause a circular reference) will be automatically removed (symbolic links will only appear if the ``real'' file would not have been found otherwise). Dangling symbolic links will be ignored.

NOTE: This option corresponds to the follow option to File::Find

-maxdepth levels

Descend at most levels (a non-negative integer) levels of directories below the command line arguments. '-maxdepth 0' means only apply the tests and actions to the command line arguments.

-quiet

Disable ``Permission denied'' warnings for unreadable directories.

Tests

-iname pattern

Like -name, but the match is case insensitive. For example, the patterns 'fo*' and 'F??' match the file names 'Foo', 'FOO', 'foo', 'fOo', etc.

-iregex pattern

Like -regex, but the match is case insensitive.

-name pattern

Base of file name (the path with the leading directories removed) matches shell pattern pattern. The metacharacters ('*', '?', and '[]') do not match a '.' at the start of the base name.

-regex pattern

File name matches regular expression pattern. This is a match on the whole path, not a search. For example, to match a file named './fubar3', you can use the regular expression '.*bar.' or '.*b.*3', but not 'b.*r3'.

-type char

File is of type ``char'':

  b      block (buffered) special
  c      character (unbuffered) special
  d      directory
  p      named pipe (FIFO)
  f      regular file
  l      symbolic link
  s      socket
  D      door (Solaris)

Actions

-exec subroutine

Execute subroutine. The subroutine is executed in the directory containing the file and is passed three parameters: the file's name, the current directory (relative to the starting directory), and the file's full path (relative to the starting directory). If the ``-follow'' option is provided then the ``true'' filename (all symbolic links resolved) will be provided as a fourth argument.

-print0

Print the full file name on the standard output, followed by a null character. This allows file names that contain newlines to be correctly interpreted by programs that process the find output.

-print

Print the full file name on the standard output, followed by a newline.

-prune_all_failures

Discard and prune any files for which any test fails.

-prune_hidden

Discard and prune any hidden files. At the moment this means anything starting with '.' since I don't know how to detect ``hidden'' files on any systems other than Linux.

-prune_iname pattern

Like -prune_name, but the match is case insensitive. For example, the patterns 'fo*' and 'F??' match the file names 'Foo', 'FOO', 'foo', 'fOo', etc.

-prune_name pattern

Discard and prune any files where base of file name (the path with the leading directories removed) matches shell pattern pattern. The metacharacters ('*', '?', and '[]') do not match a '.' at the start of the base name.

-prune_on_false

Discard and prune any files for which an -exec clause returns false.

-prune_rcs

Discard and prune any files or directories that look like they belong to a revision control system. At the moment this means any directories named: ``.svn'', ``CVS'', ``blib'', ``{arch}'', ``.bzr'', ``_darcs'', ``RCS'', ``SCCS'', ``.git'', ``.pc''

-prune_backup

Discard and prune any files or directories that look like backups. This includes anything ending in ``~'' or ``.bak'', matching ``#*#'', or ending in ``.tmp'' or matching ``.tmp-[_a-zA-Z0-9]+''

-prune_regex pattern

Discard and prune any names matching the regular expression pattern. This is a match on the whole path, not a search. For example, to match a file named './fubar3', you can use the regular expression '.*bar.' or '.*b.*3', but not 'b.*r3'.
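The "match on the whole path, not a search" rule used by -regex, -iregex and -prune_regex amounts to anchoring the pattern at both ends. A minimal sketch, assuming nothing about how Dean::Util compiles its patterns (the helper name matches_whole_path is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: true only when $pattern matches the entire $path,
# mimicking find(1)'s -regex rather than Perl's default substring search.
sub matches_whole_path {
    my ($path, $pattern, $ignore_case) = @_;
    my $re = $ignore_case ? qr/\A(?:$pattern)\z/i : qr/\A(?:$pattern)\z/;
    return $path =~ $re ? 1 : 0;
}

# The three patterns from the description, tried against './fubar3'.
my $path    = './fubar3';
my @results = map { matches_whole_path($path, $_) } '.*bar.', '.*b.*3', 'b.*r3';
```

The first two patterns match and the third does not, because 'b.*r3' cannot account for the leading './fu'.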

Main Limitations:

No grouping via (), no -or.
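For readers who want the pruning behaviour without this function, the idea behind -prune_rcs can be sketched directly on top of File::Find. The helper name files_without_rcs is hypothetical and this is not the code Dean::Util uses; the directory names are the ones listed under -prune_rcs.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Find ();
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Directory names taken from the -prune_rcs description.
my %is_rcs_dir = map { $_ => 1 }
    qw(.svn CVS blib {arch} .bzr _darcs RCS SCCS .git .pc);

# Hypothetical helper: list regular files under $root, skipping whole
# revision-control subtrees by setting $File::Find::prune.
sub files_without_rcs {
    my ($root) = @_;
    my @found;
    File::Find::find({
        no_chdir => 1,    # $_ holds the full path inside "wanted"
        wanted   => sub {
            if (-d $_ && $is_rcs_dir{ basename($_) }) {
                $File::Find::prune = 1;    # do not descend into this directory
                return;
            }
            push @found, $_ if -f $_;
        },
    }, $root);
    my @sorted = sort @found;
    return @sorted;
}

# Usage on a scratch tree containing one source file and one .git file.
my $root = tempdir(CLEANUP => 1);
make_path("$root/lib", "$root/.git");
for my $file ("$root/lib/Foo.pm", "$root/.git/config") {
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh or die "Cannot close $file: $!";
}
my @files = files_without_rcs($root);
```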

newer

Returns true if first file is newer than second file. Also returns true if first file exists but second does not.
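A minimal sketch of such a comparison using modification times from stat. The helper name is_newer is hypothetical, and returning false when the first file is missing is an assumption, since the description above does not cover that case.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper comparing modification times (element 9 of stat).
# Assumption: a missing first file is never "newer".
sub is_newer {
    my ($first, $second) = @_;
    return 0 unless -e $first;
    return 1 unless -e $second;
    my $first_mtime  = (stat $first)[9];
    my $second_mtime = (stat $second)[9];
    return $first_mtime > $second_mtime ? 1 : 0;
}

# Usage: two scratch files whose timestamps are set one hour apart.
my $dir = tempdir(CLEANUP => 1);
my ($old, $new) = ("$dir/old", "$dir/new");
for my $file ($old, $new) {
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh or die "Cannot close $file: $!";
}
my $now = time;
utime $now - 3600, $now - 3600, $old or die "Cannot set times on $old: $!";
utime $now, $now, $new or die "Cannot set times on $new: $!";
```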

lastline

 my $line = lastline $file;
 my $line = lastline \"<:encoding(UTF-8)", $file;

Returns the last line of a file. Currently this iterates through each line of the file since I don't think that there is a better way to do it.

By default the input will not be decoded. Either provide an initial scalar reference containing the file mode (with proper encoding, for example \"<:encoding(UTF-8)") or decode the string before using it.
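The line-by-line approach described above can be sketched in a few lines of core Perl. The helper name last_line_of is hypothetical, and for simplicity the mode is taken as a plain optional second argument rather than a leading scalar reference.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read every line and remember only the most recent one.
# Returns undef for an empty file.
sub last_line_of {
    my ($file, $mode) = @_;
    $mode = '<' unless defined $mode;
    open my $fh, $mode, $file or die "Cannot read $file: $!";
    my $last;
    while (defined(my $line = <$fh>)) {
        $last = $line;
    }
    close $fh;
    return $last;
}

# Usage: the final line of this scratch file has no trailing newline.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/log.txt";
open my $out, '>', $file or die "Cannot create $file: $!";
print {$out} "first\n", "second\n", "third";
close $out or die "Cannot close $file: $!";

my $line = last_line_of($file);
```

Memory use stays constant because only one line is held at a time, at the cost of reading the whole file.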

fprint

 fprint $filename, @stuff
 fprint \$mode, $filename, @stuff

Prints stuff to the indicated filename. If a mode is provided (for example, \">:encoding(UTF-8)") then it will be used instead of the default mode (``>'').

fprint_bu

 fprint_bu $filename, @stuff
 fprint_bu \$mode, $filename, @stuff

Prints stuff to the indicated filename, but backup filename (by appending a ~) first. If a mode is provided (for example, \">:encoding(UTF-8)") then it will be used instead of the default mode (``>'').

fappend

 fappend $filename, @stuff
 fappend \$mode, $filename, @stuff

Append stuff to the indicated filename. If a mode is provided (for example, \">>:encoding(UTF-8)") then it will be used instead of the default mode (``>>'').
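The optional leading \$mode argument shared by fprint, fprint_bu and fappend can be recognised by testing whether the first argument is a scalar reference. The following is a sketch of that calling convention only; print_to_file is a hypothetical name and the real functions may parse their arguments differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: an optional leading SCALAR reference supplies the
# open mode; otherwise the default mode '>' (truncate and write) is used.
sub print_to_file {
    my $mode = ref($_[0]) eq 'SCALAR' ? ${ shift @_ } : '>';
    my ($filename, @stuff) = @_;
    open my $fh, $mode, $filename or die "Cannot open $filename: $!";
    print {$fh} @stuff;
    close $fh or die "Cannot close $filename: $!";
    return 1;
}

# Usage: write a file, then append to it by passing the mode explicitly.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/out.txt";
print_to_file($file, "first\n");
print_to_file(\'>>', $file, "second\n");
```

Because a file name is always a plain string, a reference in the first position is unambiguous, which is what makes this convention workable.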

fincrement

 fincrement $filename
 fincrement $filename, $amount
 fincrement $filename, pre => $pre, post => $post, layers => $perlio_layers
 fincrement $filename, $amount, pre => $pre, post => $post

Increments the number contained in $filename. On success, the new value is returned (Note: may be zero if $filename contained ``-1''). On failure, undef is returned.

The amount to add to the file's value may be provided. If it is missing, then a value of one is assumed. The optional parameters $pre and $post specify strings to print to the file before and after the number. These strings default to the empty string and a single newline respectively.

Note: $filename must contain only a number (with possible whitespace), or must exactly contain the concatenation of $pre, number, and $post.

If $filename does not exist, then it will be initialized to ``0''.

The ``layers'' option can be used to set the PerlIO layers for the opened files (for example layers => ":encoding(UTF-8)"). By default, no layers are applied.
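The basic read, add, write cycle can be sketched as follows. This is an illustration only: increment_file is a hypothetical name, the pre, post and layers options are left out, no file locking is attempted, and a missing file is assumed to count as 0 before the increment is applied.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: add $amount (default 1) to the integer stored in
# $filename and return the new value, or undef on any failure.
sub increment_file {
    my ($filename, $amount) = @_;
    $amount = 1 unless defined $amount;

    my $value = 0;    # assumption: a missing file counts as 0
    if (-e $filename) {
        open my $in, '<', $filename or return undef;
        my $text = do { local $/; <$in> };
        close $in;
        # Accept only a number with optional surrounding whitespace.
        return undef unless defined $text && $text =~ /\A\s*(-?\d+)\s*\z/;
        $value = $1;
    }

    $value += $amount;
    open my $out, '>', $filename or return undef;
    print {$out} "$value\n";
    close $out or return undef;
    return $value;
}

# Usage: callers must test definedness, since 0 is a valid result.
my $dir     = tempdir(CLEANUP => 1);
my $counter = "$dir/counter";
my $first   = increment_file($counter);
my $second  = increment_file($counter, 5);
```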

cat

 my $stuff = cat $file;
 my $stuff = cat \$mode, $file;

Read in the entirety of a file. If requested in list context, the lines are returned. In scalar context, the file is returned as one large string. If a string reference $mode is provided as a first argument it will be taken as the file mode (the default is ``<'').
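The context-dependent return value is the kind of thing Perl's wantarray is meant for. A minimal sketch, with slurp as a hypothetical stand-in name rather than the module's own code:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the lines in list context, or the whole
# file as a single string in scalar context.
sub slurp {
    my $mode = ref($_[0]) eq 'SCALAR' ? ${ shift @_ } : '<';
    my ($file) = @_;
    open my $fh, $mode, $file or die "Cannot read $file: $!";
    my @lines = <$fh>;
    close $fh;
    return wantarray ? @lines : join('', @lines);
}

# Usage: the same call yields different shapes depending on context.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/data.txt";
open my $out, '>', $file or die "Cannot create $file: $!";
print {$out} "one\n", "two\n";
close $out or die "Cannot close $file: $!";

my @lines = slurp($file);                          # list context
my $text  = slurp(\'<:encoding(UTF-8)', $file);    # scalar context, decoded
```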

bcat

Read in the entirety of a binary file. If requested in list context, the lines are returned. In scalar context, the file is returned as one large string.

bu_open

 bu_open $file
 bu_open $fh, $file
 bu_open $fh, $file, "$file.bak"
 bu_open \$mode, $file
 bu_open \$mode, $fh, $file
 bu_open \$mode, $fh, $file, "$file.bak"
 ($writer, $reader) = bu_open \$mode, $file

Backup and open. The general idea is, if the file exists, rename it by appending a ``~'' to its name, then open the original name in write mode. This sub croaks if any operation fails. The backup file is created new so that the inode of the original file does not change.

If only a single string variable argument is given and the function is called in void context, then the requested file is backed up and opened, ``upgrading'' the given argument to a filehandle. Example:

 $file = "foo";
 bu_open $file;         # Note: bu_open "foo"; would be a fatal error
 print $file "Bar\n";

In scalar context, $file is unchanged and a write-only filehandle is returned.

In list context, a filehandle for both the new file (write only) and the backup (read only) are returned.

If a mode is provided as a SCALAR reference (for example, \">:encoding(UTF-8)") then it will be used instead of the default mode (``>'').

If two arguments are given, the first will be used to store the newly opened filehandle, and the second should hold the file name.

Finally, the last argument (if provided) will be used as the backup file name (rather than the $file argument with a ``~'' appended).
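A simplified sketch of the scalar- and list-context forms follows. The helper name backup_and_open is hypothetical, the void-context "upgrade" and the \$mode argument are not covered, and using a copy (instead of a rename) is one reading of the remark that the original file's inode does not change.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Hypothetical helper: copy $file to a backup, then truncate and reopen the
# original for writing. Returns the writer, plus a reader on the backup in
# list context (the reader is undef if there was nothing to back up).
sub backup_and_open {
    my ($file, $backup) = @_;
    $backup = "$file~" unless defined $backup;

    my $reader;
    if (-e $file) {
        copy($file, $backup) or die "Cannot back up $file: $!";
        open $reader, '<', $backup or die "Cannot read $backup: $!";
    }
    open my $writer, '>', $file or die "Cannot write $file: $!";
    return wantarray ? ($writer, $reader) : $writer;
}

# Usage: rewrite a file while still being able to read its old contents.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/notes.txt";
open my $seed, '>', $file or die "Cannot create $file: $!";
print {$seed} "old\n";
close $seed or die "Cannot close $file: $!";

my ($writer, $reader) = backup_and_open($file);
my $previous = <$reader>;
print {$writer} "new\n";
close $writer or die "Cannot close $file: $!";
close $reader;
```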

catfile

Calls the File::Spec catfile and canonpath methods.

realfile

Unnecessary! Use Cwd::realpath instead.

:shell - Shell Operations

safe_pipe

 my $results = safe_pipe [ 'command', 'arg' ], @input;
 my @results = safe_pipe [ 'command', 'arg' ], @input;

Pipe data to a shell command safely (without touching a command line) and retrieve the results. Notably, this is the situation that the IPC::Open2 documentation warns is dangerous (it may block forever) when using open2.

Code from merlyn:

 http://www.perlmonks.org/index.pl?node_id=339092

Note: Input and output will not be encoded/decoded thus should be octets.

:color - Color

NOCOLOR

 NOCOLOR(__PACKAGE__) if !$opt{color};
 NOCOLOR()            if !$opt{color};

Replaces subroutines and package variables whose name matches one of the names in the :color_subs or :color_strings export tags with inert versions which do not insert any color sequences. Subroutines are replaced by the identity function and strings are replaced with the empty string. The default package is the caller's current package.

WARNING: This subroutine has no good way of knowing that the subroutines and variables that it finds are really color subroutines and variables. It does however check that subroutines have a '$' prototype and it only has access to package variables (those not declared by my). This, combined with the fact that there are only so many things that a function called ``BLUE'' could reasonably do, means that this should not generally be a problem.

SUBS affected:

 BOLD UNDERLINE DARK BLINK REVERSE CONCEALED STRIKE
 BLACK RED GREEN YELLOW BLUE MAGENTA CYAN WHITE
 GREY GRAY BRIGHT_RED BRIGHT_GREEN BRIGHT_YELLOW BRIGHT_BLUE BRIGHT_MAGENTA BRIGHT_CYAN
 ON_BLACK ON_RED ON_GREEN ON_YELLOW ON_BLUE ON_MAGENTA ON_CYAN ON_WHITE
 ON_GREY ON_GRAY ON_BRIGHT_RED ON_BRIGHT_GREEN ON_BRIGHT_YELLOW ON_BRIGHT_BLUE ON_BRIGHT_MAGENTA ON_BRIGHT_CYAN

SCALARS affected:

 $BOLD $BOLD_OFF $UNDERLINE $UNDERLINE_OFF $DARK $DARK_OFF $BLINK $BLINK_OFF $REVERSE $REVERSE_OFF
 $CONCEALED $CONCEALED_OFF $STRIKE $STRIKE_OFF $NORMAL $DEFAULT_FG $DEFAULT_BG
 $BLACK $RED $GREEN $YELLOW $BLUE $MAGENTA $CYAN $WHITE
 $GREY $GRAY $BRIGHT_RED $BRIGHT_GREEN $BRIGHT_YELLOW $BRIGHT_BLUE $BRIGHT_MAGENTA $BRIGHT_CYAN
 $ON_BLACK $ON_RED $ON_GREEN $ON_YELLOW $ON_BLUE $ON_MAGENTA $ON_CYAN $ON_WHITE
 $ON_GREY $ON_GRAY $ON_BRIGHT_RED $ON_BRIGHT_GREEN $ON_BRIGHT_YELLOW $ON_BRIGHT_BLUE $ON_BRIGHT_MAGENTA $ON_BRIGHT_CYAN

hsl2rgb

 my $rgb    = hsl2rgb( $H, $S, $L );
 my @colors = hsl2rgb( @hsl_colors );

Convert HSL colors (triples from 0 to 1) to RGB colors (triples from 0 to 255).
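For reference, the standard HSL to RGB conversion can be written as below. This is the textbook formula, not necessarily the code in Dean::Util; the name hsl_to_rgb and the round-to-nearest step are assumptions, and only the single-triple, list-returning case is shown.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one HSL triple (each component in 0..1)
# to an RGB triple of integers in 0..255 using the standard formula.
sub hsl_to_rgb {
    my ($h, $s, $l) = @_;
    my $q = $l < 0.5 ? $l * (1 + $s) : $l + $s - $l * $s;
    my $p = 2 * $l - $q;

    # Map a hue offset to one channel value between $p and $q.
    my $channel = sub {
        my ($t) = @_;
        $t += 1 if $t < 0;
        $t -= 1 if $t > 1;
        return $p + ($q - $p) * 6 * $t           if $t < 1 / 6;
        return $q                                if $t < 1 / 2;
        return $p + ($q - $p) * (2 / 3 - $t) * 6 if $t < 2 / 3;
        return $p;
    };

    # Red, green and blue sit a third of the hue circle apart.
    return map { int(255 * $channel->($_) + 0.5) } ($h + 1 / 3, $h, $h - 1 / 3);
}

my @red  = hsl_to_rgb(0, 1, 0.5);    # pure red
my @grey = hsl_to_rgb(0, 0, 0.5);    # zero saturation gives a grey
```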

rainbow

 rainbow( $n );
 rainbow( $n, %colors_options);

Return a list of $n rainbow colors (ROYGBIV).

Any options supported by colors can be provided and will be passed along, including the n and colors options, so you probably don't want to include those options.

wavelength2rgb

Convert a wavelength (a number between 380 nm and 780 nm) to a RGB triplet. Returns undef if given an out-of-range wavelength.
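The piecewise-linear approximation commonly attributed to Dan Bruton can be sketched as follows. The breakpoints, the intensity fall-off near the ends of the range, and the gamma of 0.8 follow the widely reproduced version of that algorithm; the helper name wavelength_to_rgb, the 0 to 255 output scale and the rounding are assumptions and may differ from what Dean::Util returns.

```perl
use strict;
use warnings;

# Hypothetical helper: approximate RGB (0..255) for a wavelength in nm.
# Returns undef outside the 380..780 nm range.
sub wavelength_to_rgb {
    my ($nm) = @_;
    return undef if $nm < 380 || $nm > 780;

    # Piecewise-linear hue across the visible spectrum.
    my ($red, $green, $blue);
    if    ($nm < 440) { ($red, $green, $blue) = ((440 - $nm) / 60, 0, 1) }
    elsif ($nm < 490) { ($red, $green, $blue) = (0, ($nm - 440) / 50, 1) }
    elsif ($nm < 510) { ($red, $green, $blue) = (0, 1, (510 - $nm) / 20) }
    elsif ($nm < 580) { ($red, $green, $blue) = (($nm - 510) / 70, 1, 0) }
    elsif ($nm < 645) { ($red, $green, $blue) = (1, (645 - $nm) / 65, 0) }
    else              { ($red, $green, $blue) = (1, 0, 0) }

    # Intensity falls off towards both limits of vision.
    my $factor = $nm > 700 ? 0.3 + 0.7 * (780 - $nm) / 80
               : $nm < 420 ? 0.3 + 0.7 * ($nm - 380) / 40
               :             1;

    my $gamma = 0.8;
    return map { int(255 * ($_ * $factor) ** $gamma + 0.5) } ($red, $green, $blue);
}

my @yellow = wavelength_to_rgb(580);
my $none   = wavelength_to_rgb(379);    # out of range
```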

Formulas taken from Dan Bruton's color science page (http://members.cox.net/astro7/color.html).

$_re_color_escape

A precompiled regular expression that matches any of the colors or font manipulations provided in this package.

strip_color

Remove the color tags from a list of strings. The uncolored strings are returned. Does not modify the input strings and can be used on constant strings.

strip_color_violently

Remove the color tags from a list of strings. The uncolored strings are returned. Modifies the input strings and therefore may not be used on constant strings.

clength

Compute the length of a possibly colored string. The standard perl length function gets confused about how long a colored or decorated string is. This function fixes that so that you can center or align data.
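Why length over-counts, and how stripping fixes it, can be shown with a short sketch. The regular expression below covers only ANSI SGR sequences of the form ESC [ ... m and is an assumption; it is not the module's $_re_color_escape. The helper names strip_sgr and visible_length are likewise hypothetical.

```perl
use strict;
use warnings;

# Assumption: colors and text styles are plain ANSI SGR escape sequences.
my $sgr = qr/\e\[[0-9;]*m/;

# Hypothetical helper: return uncolored copies, leaving the inputs untouched.
sub strip_sgr {
    my @copies = @_;
    s/$sgr//g for @copies;
    return @copies;
}

# Hypothetical helper: length of the text as it appears on screen.
sub visible_length {
    my ($string) = @_;
    my ($plain) = strip_sgr($string);
    return length $plain;
}

# Usage: bold red "hi" followed by a reset.
my $colored = "\e[1m\e[31mhi\e[0m";
my $raw     = length $colored;             # counts the escape bytes too
my $visible = visible_length($colored);    # counts only "hi"

# Centering must pad using the visible length, not the raw one.
my $width  = 10;
my $padded = (' ' x int(($width - $visible) / 2)) . $colored;
```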

%color

A hash of color names => escape sequences. Included are text style sequences,

  BOLD UNDERLINE DARK BLINK REVERSE CONCEALED

Also, the following colors:

  BLACK GREY GRAY WHITE
  RED GREEN YELLOW BLUE MAGENTA CYAN
  BRIGHT_RED BRIGHT_GREEN BRIGHT_YELLOW BRIGHT_BLUE BRIGHT_MAGENTA BRIGHT_CYAN

And their corresponding backgrounds:

  ON_BLACK ON_GREY ON_GRAY ON_WHITE
  ON_RED ON_GREEN ON_YELLOW ON_BLUE ON_MAGENTA ON_CYAN
  ON_BRIGHT_RED ON_BRIGHT_GREEN ON_BRIGHT_YELLOW ON_BRIGHT_BLUE
  ON_BRIGHT_MAGENTA ON_BRIGHT_CYAN

colors

At the most basic level, converts colors to different formats, however this subroutine is capable of quite a bit more than that.

Examples:

 colors [qw/red green blue/], format => "ps";
 colors [qw/red green blue/], format => "ps", n => 2;

colors

A list of colors. Each entry can be an X11 color name or any of the other formats recognised by Color::Calc.

n

Only return n colors.

interpolate

If false, requesting more colors than are available in the colors list will throw a fatal error. The default is to create new colors between the given colors if insufficient colors are provided. The interpolate option will also cause colors to be interpolated if the distribute option is set.

distribute

By default, if fewer colors are requested than are contained in the colors list, this subroutine will select the first n colors. Providing a true value for distribute will cause the subroutine to evenly spread out the choice of colors over the range of colors provided (if n > 2 then the first and last colors are guaranteed to be included).

format

Specify the style of the returned colors. Can be anything supported by Color::Calc which is currently (Color::Calc::VERSION == 1.0): ``tuple'', ``hex'', ``html'', ``object'' (a Graphics::ColorObject object), ``pdf''. The default format is ``object''.

The following formats are also accepted and are handled by this subroutine directly: ``ps'' | ``postscript''.

background

Try to make the colors appear on the given background color. Colors will be altered if this option is provided.
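The difference between the default selection (the first n colors) and the distribute option can be pictured with plain index arithmetic. The sketch below is one way to spread n picks evenly over a list so that both ends are included; it is not taken from Dean::Util, pick_distributed is a hypothetical name, and it assumes n does not exceed the number of items.

```perl
use strict;
use warnings;

# Hypothetical helper: choose $n items spread evenly across @items,
# always including the first and last item when $n >= 2.
# Assumption: $n is no larger than the number of items.
sub pick_distributed {
    my ($n, @items) = @_;
    return () if $n <= 0 || !@items;
    return ($items[0]) if $n == 1;
    # Round each evenly spaced position to the nearest valid index.
    return map { $items[ int($_ * $#items / ($n - 1) + 0.5) ] } 0 .. $n - 1;
}

my @palette = qw(red orange yellow green blue);
my @first   = @palette[0 .. 2];                # default: first n colors
my @spread  = pick_distributed(3, @palette);   # distributed: ends plus middle
```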

:color_subs - Color Subroutines

BOLD($)

Make text bold

DARK($)

Make text dark

UNDERLINE($)

Make text underline

BLINK($)

Make text blink

REVERSE($)

Make text reverse

CONCEALED($)

Make text concealed

STRIKE($)

Strikethrough text (rarely implemented)

BLACK($)

Make text black

RED($)

Make text red

GREEN($)

Make text green

YELLOW($)

Make text yellow

BLUE($)

Make text blue

MAGENTA($)

Make text magenta

CYAN($)

Make text cyan

WHITE($)

Make text white

GREY($)

Make text grey

GRAY($)

Make text gray

BRIGHT_RED($)

Make text bright_red

BRIGHT_GREEN($)

Make text bright_green

BRIGHT_YELLOW($)

Make text bright_yellow

BRIGHT_BLUE($)

Make text bright_blue

BRIGHT_MAGENTA($)

Make text bright_magenta

BRIGHT_CYAN($)

Make text bright_cyan

ON_BLACK($)

Make text on_black

ON_RED($)

Make text on_red

ON_GREEN($)

Make text on_green

ON_YELLOW($)

Make text on_yellow

ON_BLUE($)

Make text on_blue

ON_MAGENTA($)

Make text on_magenta

ON_CYAN($)

Make text on_cyan

ON_WHITE($)

Make text on_white

ON_GREY($)

Make text on_grey

ON_GRAY($)

Make text on_gray

ON_BRIGHT_RED($)

Make text on_bright_red

ON_BRIGHT_GREEN($)

Make text on_bright_green

ON_BRIGHT_YELLOW($)

Make text on_bright_yellow

ON_BRIGHT_BLUE($)

Make text on_bright_blue

ON_BRIGHT_MAGENTA($)

Make text on_bright_magenta

ON_BRIGHT_CYAN($)

Make text on_bright_cyan

:color_strings - Color Strings

$NORMAL

Undo all color modifications

$DEFAULT_FG

Remove foreground coloring

$DEFAULT_BG

Remove background coloring

$BOLD

Make text bold

$BOLD_OFF

Undo make text bold

$DARK

Make text dark

$DARK_OFF

Undo make text dark

$UNDERLINE

Make text underline

$UNDERLINE_OFF

Undo make text underline

$BLINK

Make text blink

$BLINK_OFF

Undo make text blink

$REVERSE

Make text reverse

$REVERSE_OFF

Undo make text reverse

$CONCEALED

Make text concealed

$CONCEALED_OFF

Undo make text concealed

$STRIKE

Make text strikethrough

$STRIKE_OFF

Undo make text strikethrough

$BLACK

Make text black

$RED

Make text red

$GREEN

Make text green

$YELLOW

Make text yellow

$BLUE

Make text blue

$MAGENTA

Make text magenta

$CYAN

Make text cyan

$WHITE

Make text white

$GREY

Make text grey

$GRAY

Make text gray

$BRIGHT_RED

Make text bright_red

$BRIGHT_GREEN

Make text bright_green

$BRIGHT_YELLOW

Make text bright_yellow

$BRIGHT_BLUE

Make text bright_blue

$BRIGHT_MAGENTA

Make text bright_magenta

$BRIGHT_CYAN

Make text bright_cyan