Linear Algebra Overview

Linear Algebra Overview#

%load ../../rapaio-bootstrap.ipynb

Adding dependency io.github.padreati:rapaio-lib:7.0.1
Solving dependencies
Resolved artifacts count: 1
Add to classpath: /home/ati/work/rapaio-jupyter-kernel/target/mima_cache/io/github/padreati/rapaio-lib/7.0.1/rapaio-lib-7.0.1.jar

DArrays are multi-dimensional arrays which stores data elements of the same type indexed by dimensions. Current implementation offers only stride dense vector arrays, but the design allows future implementations like sparse arrays.

DArrays offers many operations, besides standard manipulation data tools, there are implemented also some non trivial operations and also matrix decompositions and linear solvers. There are implementations for four numerical data types: byte, int, float and double.

DArrayManager#

Working with darrays starts with having a DArrayManager. A DArrayManager handles all the internals which are required for an implementation of DArrays. Even if there is a single implementation of DArrays, which works with standard Java arrays and uses stride algebra and simd vectorization, multiple implementations are possible. One example is to use MemorySegments when simd instructions will be able to work with indexed operations.

DArray manager describes which storage factory to be used, which implementation of the operations, and other various parameters.

One can create darrays with this tensor manager using also DArrays class. However it is advisable to use a DArrayManager every time since this will maintain code compatbility in the future when new DArray implementations will be available. Switching to other implementations will require a single line of code change, where you choose the DArrayManager implementation.

DArrayManager dm = DArrayManager.base();
DType dt = DType.DOUBLE;

Shape of a DArray#

DArray elements are indexed by dimensions. You can think of a darray as a hypercube of data elements with one element in each position of the hypercube. A darray can have 0, 1, 2 or more dimensions. Each dimension have a dimension size. The dimensions and the size of each dimension is described by a Shape object.

The total number of elements is given by the product of all dimension sizes, or 1 if there is no dimension.

// a shape with no dimensions
println(Shape.of());

// a shape with 1 dimension / a vector of size 1
println(Shape.of(1));

// Both shapes are of size 1, but they are different shapes
println(Shape.of().size());
println(Shape.of(1).size());

Shape: []
Shape: [1]
1
1

DArray creation#

A darray with no dimensions is a scalar and can be created with:

var scalar = dm.scalar(dt, 1.);
println(scalar.shape());
println(scalar)

Shape: []
BaseStride{DOUBLE,[],0,[]}
 1  

A DArray with a single dimension is a vector, with two dimensions it is called a matrix and in general with more than two dimensions it is called a tensor.

dm.seq(DType.FLOAT, Shape.of(100)).printFullContent()

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 ] 

Since the default darray manager uses Java arrays as data storage, it is possible to wrap an array into a darray of appropriate type and use darray operations to change values in the array. For example we will create an array of double values and a wrapper darray around it. Using darray methods we can change the values from the array.

double[] array = new double[] {1., 2., 3.};
// the default tensor manager wraps a double array if we request a Tensor<Double>
var t = dm.stride(dt, array);
t.log1p_(); // call the in-place log1p operation
Arrays.stream(array).forEach(System.out::println);

6931471805599453
0986122886681096
3862943611198906

The same thing happens for other value types (from the supperted ones which are: double,float,int and byte)

float[] floatArray = new float[] {1.f, 2, 3, 4};
var floatTensor = dm.stride(DType.FLOAT, floatArray);
floatTensor.sqr_();
for(float f : floatArray) {
    println(f);
}

Using a different type will not wrap the array, but will copy the data into a new darray.

var doubleArray = dm.stride(dt, floatArray);
doubleArray.sqrt_();
for(int i=0; i<floatArray.length; i++) {
    printf("original value at index[%d]: %f, corresponding darray value: %f%n", 
                      i, floatArray[i], doubleArray.get(i));
}
// we can see that the values remain unchanged

original value at index[0]: 1.000000, corresponding darray value: 1.000000
original value at index[1]: 4.000000, corresponding darray value: 2.000000
original value at index[2]: 9.000000, corresponding darray value: 3.000000
original value at index[3]: 16.000000, corresponding darray value: 4.000000

The are also other constructors.

var m = dm.eye(dt, 3);
println("m: ", m);

Random random = new Random(42);
var u = dm.random(dt, Shape.of(3), random);
println("u: ", u);
println("vector inner product between u and third row of m, which basically is the third element of u:");
println(u.inner(m.selsq(0, 2)));
println(u.get(2))

m: BaseStride{DOUBLE,[3, 3],0,[3, 1]}
[[ 1 0 0 ]  
 [ 0 1 0 ]  
 [ 0 0 1 ]] 

u: BaseStride{DOUBLE,[3],0,[1]}
[ 0.6054611363173034 0.47673188281383294 -0.49948335406718436 ] 

vector inner product between u and third row of m, which basically is the third element of u:
-0.49948335406718436
-0.49948335406718436

Since m is the identity matrix, it’s transpose is also identity, matrix multiplication between the two produces also identity and the determinant of m should be 1:

var mtm = m.t_().mm(m);
display("text/markdown", "$m^T m = I$");
mtm.printContent();
println(mtm.lu().det());

$m^T m = I$

[[ 1 0 0 ]  
 [ 0 1 0 ]  
 [ 0 0 1 ]] 
1.0

Stride arrays#

DArray base implementation uses stride array algebra. In short, all the elements of the darrays are stored in a storage which is a contiguous array. A darray is a view over the storage which defines a stride arithmetic. If operations allows to use the same storage (it is possible to perform the transformation manipulating only shape, offset and stride information), than it will create a new view. This allows fast operations if possible.

Let’s build an array which contains a sequence of numbers.

var x = dm.seq(dt, Shape.of(2,3,4));
x

BaseStride{DOUBLE,[2, 3, 4],0,[12, 4, 1]}
[[[  0  1  2  3 ]   
  [  4  5  6  7 ]   
  [  8  9 10 11 ]]  
 [[ 12 13 14 15 ]   
  [ 16 17 18 19 ]   
  [ 20 21 22 23 ]]] 

Let’s dive a little bit into the description of the darray. BaseStride is the class which implements the darray, which means it uses stride arithmetic to index the values from storage. FLOAT is the data type of the elements, in this case being float value type from Java.

Next we have some numeric information. [2,3,4] is the shape. This is to be expected, since we specified that shape when we created the darray. This means we have a darray with three dimensions, first axis of size 2, second axis of size 3 and the third axis of size 4.

The next information is the number 0. This is the offset. The offset for a strided array is a position in the contiguous storage where the first element of the darray is stored. Sometimes this offset is not zero, and we will provide and example soon.

The last numeric information is [12,4,1]. These are the strides. We have a stride for each dimension in the darray. The meaning of stride is the following: how many elements have to be skipped to see the next element in the given dimension. The stride and offset gives a way to compute the position of an element in the storage, given an index for that element.

\[ptr[i,j,k] = offset + i*stride[0] + j*stride[1] + k * stride[2]\]

The above formula is valid for three dimensions, of course. In general for D dimensions we have:

\[ptr[i_0,i_1,..,i_{D-1}] = offset + \sum_{j=0}^{D-1} i_j * stride[j]\]

x is a 3 dimensional array. The first dimension has size 2. This allows the interpretation of x as being an array of 2 matrices of shape [3,4]. We can select an element based on an axis and index using sel. Furthermore, selsq, which is an abbreviation of select and squeeze removes the given axis from the darray.

// first matrix
println(x.selsq(0,0));
println(x.selsq(0,1));

BaseStride{DOUBLE,[3, 4],0,[4, 1]}
[[ 0 1  2  3 ]  
 [ 4 5  6  7 ]  
 [ 8 9 10 11 ]] 

BaseStride{DOUBLE,[3, 4],12,[4, 1]}
[[ 12 13 14 15 ]  
 [ 16 17 18 19 ]  
 [ 20 21 22 23 ]] 

Notice the offset of the second darray, which is 12. The two darrays have the same shapes and strides, which means their layout in memory is the same, but their offset is different. This is because x is stored in C order (the default order in storage, but that can be changed also), which means it first stores the elements of the first matrix at offset 0 and after that it stores the elements of the second matrix at offset 12, which is the same value as the number of elements of the first darray.

Multiple other operations on the layout creates different views on the same storage. General transposition, for example, creates the following darray.

x.t_()

BaseStride{DOUBLE,[4, 3, 2],0,[1, 4, 12]}
[[[  0 12 ]   
  [  4 16 ]   
  [  8 20 ]]  
 [[  1 13 ]   
  [  5 17 ]   
  [  9 21 ]]  
 [[  2 14 ]   
  [  6 18 ]   
  [ 10 22 ]]  
 [[  3 15 ]   
  [  7 19 ]   
  [ 11 23 ]]] 

Notice how the shape dimensions are reversed, the same offset, but strides are different, since there are different dimension sizes. To check if two darrays have the same storage one have to compare the storage instances.

println(x.storage());
println(x.t_().storage());
println(x.storage() == x.t_().storage());

rapaio.darray.storage.array.DoubleArrayStorage@2f4e40f7
rapaio.darray.storage.array.DoubleArrayStorage@2f4e40f7
true

Note on syntax: Some methods have two variants, the first one named operation and the second one operation_, which is the same name ended with underscore. When the method name ends with underscore the operation works on the same storage as the original darray, when there is no ending underscore a new copy is created. Not all operations allows this syntax, since some are not able to work on the same storage and some others works only on the same storage.

When this syntax is used, both scenarios are implemented. This is valid for the transpose operation also. Notice the storage instance:

println(x.storage());
println(x.t_().storage());
println(x.t().storage());

rapaio.darray.storage.array.DoubleArrayStorage@2f4e40f7
rapaio.darray.storage.array.DoubleArrayStorage@2f4e40f7
rapaio.darray.storage.array.DoubleArrayStorage@403db5f7

Data iterators#

The elements of a darray can be accessed in multiple ways. One way is based on indexes. The index of an element is an array of semi-positive integer numbers, each less than its corresponding dimension. Thus the index of an element describes the order of that element in each dimension.

dt = DType.FLOAT;
var x = dm.random(dt, Shape.of(4,5), random);
x

BaseStride{FLOAT,[4, 5],0,[5, 1]}
[[ -0.5915424823760986   0.42765530943870544 1.301008701324463   -0.33507856726646423 -0.5955197215080261  ]  
 [ -0.09123338013887405  0.7820308804512024  1.400564193725586   -0.15987195074558258  0.6741945743560791  ]  
 [ -0.2882782518863678  -0.9254017472267151  0.23874905705451965 -0.8072279691696167   0.938342809677124   ]  
 [ -0.9454368948936462   0.22093218564987183 0.6785244941711426   0.17902342975139618  0.20195713639259338 ]] 

We have a matrix of 4 rows and 5 columns. To access the element from second row and third column we can use get methods:

println(x.get(1,2));
// if we want directly the value type
println(x.getFloat(1,2));
// we can de a cast, also, if it is useful, notice that 0 is the rounded value of the original one
println(x.getInt(1,2));

1.4005642
1.4005642
1

To iterate on the elements using indexes one can use the following iterator:

var it = x.iterator();
while(it.hasNext()) {
    printf("%.4f, ", it.next());
}

-0.5915, 0.4277, 1.3010, -0.3351, -0.5955, -0.0912, 0.7820, 1.4006, -0.1599, 0.6742, -0.2883, -0.9254, 0.2387, -0.8072, 0.9383, -0.9454, 0.2209, 0.6785, 0.1790, 0.2020, 

The default order is Order.C C-style order, which means last dimensions loops faster. In the case of the matrix we iterate first on rows, and after that, in the inner loop, we iterate over columns. So, in C order, columns loops faster than rows.

One can use also other orders, like Order.F Fortran-style order, which means first dimensions loops faster. There is also Order.S which is the storage order, the best possible order in which elements can be iterated so that accessing them is cache friendly. Also, Order.A means automatic order, which can be determined by an algorithm. We also have Order.defaultOrder() which is the configured default order, in our case being C.

var it = x.iterator(Order.F);
while(it.hasNext()) {
    printf("%.4f,", it.next());
}

-0.5915,-0.0912,-0.2883,-0.9454,0.4277,0.7820,-0.9254,0.2209,1.3010,1.4006,0.2387,0.6785,-0.3351,-0.1599,-0.8072,0.1790,-0.5955,0.6742,0.9383,0.2020,

Another kind of iterator is to iterate by pointer. The pointer of an element is basically the position of that element in the underlying storage. The advantage of using pointer elements is that it can be faster since some optimizations are realized if possible. To access elements based on their pointer value you can used ptrGet methods.

var pit = x.ptrIterator();
while(pit.hasNext()) {
    printf("%.4f, ", x.ptrGet(pit.next()));
}

-0.5915, 0.4277, 1.3010, -0.3351, -0.5955, -0.0912, 0.7820, 1.4006, -0.1599, 0.6742, -0.2883, -0.9254, 0.2387, -0.8072, 0.9383, -0.9454, 0.2209, 0.6785, 0.1790, 0.2020, 

Operations on shapes#

In general we can consider that a darray is an organized indexed view over some data elements which are lies into a storage. Many operations on those objects changes only this organization of data. Some of the operations are able to create a new view over the same data and for some other this is impossible, and a new storage with copied data is necessary.

The idea of multidimensional array of elements can be seen in a recursive fashion. For example a darray in three dimensions can be understood as an indexed array of darrays of 2 dimensions, which in turn can be seen as arrays of unidimensional objects, which in a final turn can be seen as an indexed array of individual values, or scalars. As we can see this is a matter of organization, and this kind of operations helps us to give a proper form more usefull for understanding, for computation or for other reasons.

var x = dm.seq(dt, Shape.of(2,3,2));
x

BaseStride{FLOAT,[2, 3, 2],0,[6, 2, 1]}
[[[  0  1 ]   
  [  2  3 ]   
  [  4  5 ]]  
 [[  6  7 ]   
  [  8  9 ]   
  [ 10 11 ]]] 

// split into pieces of the same size on the first dimension
x.chunk(0, false, 1);

[BaseStride{FLOAT,[3, 2],0,[2, 1]}
[[ 0 1 ]  
 [ 2 3 ]  
 [ 4 5 ]] 
, BaseStride{FLOAT,[3, 2],6,[2, 1]}
[[  6  7 ]  
 [  8  9 ]  
 [ 10 11 ]] 
]

// split into pieces of the same size on first and last dimension
List<DArray<Float>> list = x.chunkAll(false,new int[]{1,3,1});
list.forEach(da->println(da.squeeze(0, 2)));

BaseStride{FLOAT,[3],0,[2]}
[ 0 2 4 ] 

BaseStride{FLOAT,[3],1,[2]}
[ 1 3 5 ] 

BaseStride{FLOAT,[3],6,[2]}
[ 6 8 10 ] 

BaseStride{FLOAT,[3],7,[2]}
[ 7 9 11 ] 

Darrays can also be concatenated or stacked, operations which will produce new copies of data.

// define some vectors
var a = dm.random(dt, Shape.of(3), random);
var b = dm.random(dt, Shape.of(3), random);
var c = dm.random(dt, Shape.of(3), random);
println(a);
println(b);
println(c);

BaseStride{FLOAT,[3],0,[1]}
[ 0.682409405708313 -1.860346794128418 -0.3640243411064148 ] 

BaseStride{FLOAT,[3],0,[1]}
[ 0.907001256942749 -0.2078128606081009 1.9437248706817627 ] 

BaseStride{FLOAT,[3],0,[1]}
[ 0.5633633732795715 -0.04871172457933426 -0.5485519170761108 ] 

// stack all of them as rows in a matrix. 
dm.stack(0, List.<DArray<Float>>of(a,b,c))

BaseStride{FLOAT,[3, 3],0,[3, 1]}
[[ 0.682409405708313  -1.860346794128418   -0.3640243411064148 ]  
 [ 0.907001256942749  -0.2078128606081009   1.9437248706817627 ]  
 [ 0.5633633732795715 -0.04871172457933426 -0.5485519170761108 ]] 

// stack all of them as columns in a matrix
dm.stack(1, List.<DArray<Float>>of(a,b,c))

BaseStride{FLOAT,[3, 3],0,[3, 1]}
[[  0.682409405708313   0.907001256942749   0.5633633732795715  ]  
 [ -1.860346794128418  -0.2078128606081009 -0.04871172457933426 ]  
 [ -0.3640243411064148  1.9437248706817627 -0.5485519170761108  ]] 

// concatenate all elements along the single available dimension of the elements
dm.cat(0, List.<DArray<Float>>of(a,b,c))

BaseStride{FLOAT,[9],0,[1]}
[ 0.682409405708313 -1.860346794128418 -0.3640243411064148 0.907001256942749 -0.2078128606081009 1.9437248706817627 0.5633633732795715 -0.04871172457933426 -0.5485519170761108 ] 

Most of the operations which manipulates shapes are present, like repeat, select or remove by axis indices.

var x = dm.random(dt, Shape.of(4,3), random);
println("x:", x);
println("first and last col of x:", x.sel(1, 0, 2));
println("remove first two rows of x:", x.rem(0, 0, 1));
println("select first col of x and repeat it 3 times:", x.sel(1, 0).repeat(1, 3, false));

x:BaseStride{FLOAT,[4, 3],0,[3, 1]}
[[  1.6434990167617798   0.9172410368919373   0.3495040833950043  ]  
 [ -0.3341809809207916  -0.35777759552001953 -0.1645195335149765  ]  
 [ -0.10719511657953262 -0.06851354241371155 -0.07772722095251083 ]  
 [  0.9436370730400085  -1.0320192575454712   0.9695566296577454  ]] 

first and last col of x:BaseStride{FLOAT,[4, 2],0,[3, 2]}
[[  1.6434990167617798   0.3495040833950043  ]  
 [ -0.3341809809207916  -0.1645195335149765  ]  
 [ -0.10719511657953262 -0.07772722095251083 ]  
 [  0.9436370730400085   0.9695566296577454  ]] 

remove first two rows of x:BaseStride{FLOAT,[2, 3],6,[3, 1]}
[[ -0.10719511657953262 -0.06851354241371155 -0.07772722095251083 ]  
 [  0.9436370730400085  -1.0320192575454712   0.9695566296577454  ]] 

select first col of x and repeat it 3 times:BaseStride{FLOAT,[4, 3],0,[3, 1]}
[[  1.6434990167617798   1.6434990167617798   1.6434990167617798  ]  
 [ -0.3341809809207916  -0.3341809809207916  -0.3341809809207916  ]  
 [ -0.10719511657953262 -0.10719511657953262 -0.10719511657953262 ]  
 [  0.9436370730400085   0.9436370730400085   0.9436370730400085  ]] 

The interesting case of stride 0#

It is possible that sometimes you can see a 0 strided dimension. This is ok, and it means that values on that dimensions are repeated (read from the same place) by a number of times equal with the dimension.

var s = dm.scalar(dt, 2);
println(s);

BaseStride{FLOAT,[],0,[]}
 2  

println(s.strexp(0, 3).strexp(1, 4));

BaseStride{FLOAT,[3, 4],0,[0, 0]}
[[ 2 2 2 2 ]  
 [ 2 2 2 2 ]  
 [ 2 2 2 2 ]] 

In the example below we started from a scalar, which is a darray with no dimensions, and we added new dimensions (stretch) and expanded that dimensions by a number of times. In this case we added first dimension repeated three times and second dimension repeated four times. Notice the strides are all 0.

Note:

During implementation of stride arithmetics I found that also negative strides are possible. Having a negative stride would mean that instead of adding a number of positions you would have to go back a number of positions in that dimension. However I had chosen to not allow this kind of flexibility. I could not foresee all the implications for all sorts of operations (the first reason), and the second reason was that many layout operations on strides become overly complicated (at least to my mind). It is a design decision and for now I see it as a fair price to lose some wild flexibility and buy some computation and stability.

Unary operations#

Unary operations are operations which are applied on each element and the order of processing elements is does not change the result. In most cases the processing is independent for each element, so a single pass over the indexed elements is enough. If the operation is not independent, additional passes over data might be required, but the order of precessing them is still not relevant.

This characteristic leaves some room for optimization since if we are free to process the elements in whatever order we want, we can work out the best order which is cache friendly and potentially vectorizable.

There are implemented plenty of unary operations, we illustrare here just with one example:

dt = DType.DOUBLE;
var x = dm.random(dt, Shape.of(3,4,2), random);
println("x:", x);
var y = x.swapAxis(0,2).swapAxis(1,2);
println("y:", y);

x:BaseStride{DOUBLE,[3, 4, 2],0,[8, 2, 1]}
[[[ -0.09966190499890398 -0.5811431434994926  ]   
  [ -0.8561254101627737  -0.9181375052182392  ]   
  [  1.1062313258059315  -0.03361855779440348 ]   
  [ -0.19940824264650267  0.3394637281444628  ]]  
 [[  0.5239940045886521  -0.4808024622381947  ]   
  [  0.18050084772155725 -0.32917346344484205 ]   
  [  1.135010764731621    0.8622715316732382  ]   
  [  0.295551483102627   -0.32032930267467846 ]]  
 [[  0.5165038632024493   1.3322719542251784  ]   
  [ -0.8554643292889054   0.8746757350332696  ]   
  [  0.32638451857295453 -0.09187350537981684 ]   
  [ -0.5084207394085699   0.10020556064469907 ]]] 

y:BaseStride{DOUBLE,[2, 3, 4],0,[1, 8, 2]}
[[[ -0.09966190499890398 -0.8561254101627737   1.1062313258059315  -0.19940824264650267 ]   
  [  0.5239940045886521   0.18050084772155725  1.135010764731621    0.295551483102627   ]   
  [  0.5165038632024493  -0.8554643292889054   0.32638451857295453 -0.5084207394085699  ]]  
 [[ -0.5811431434994926  -0.9181375052182392  -0.03361855779440348  0.3394637281444628  ]   
  [ -0.4808024622381947  -0.32917346344484205  0.8622715316732382  -0.32032930267467846 ]   
  [  1.3322719542251784   0.8746757350332696  -0.09187350537981684  0.10020556064469907 ]]] 

Notice how strides are not well aligned anymore. However appling unary operation is still fast since the elements are dense in memory.

y.add(2).log1p()

BaseStride{DOUBLE,[2, 3, 4],0,[12, 4, 1]}
[[[ 1.0648273146801222 0.7626147479516716 1.4125056554915065 1.0298307367635657 ]   
  [ 1.2595950067639308 1.157038683673884  1.4194899317930934 1.1925735236318604 ]   
  [ 1.2574672753583793 0.7629230584158948 1.2018859840021214 0.9129167506320462 ]]  
 [[ 0.8832950552182608 0.7332629232383152 1.0873428403711773 1.2058102336616128 ]   
  [ 0.9239404134152693 0.9823879887598992 1.3512554900985507 0.985693912840778  ]   
  [ 1.4660921051519407 1.3544619780220997 1.0675090575629782 1.1314684191780842 ]]] 

y.softmax1d(2)

BaseStride{DOUBLE,[2, 3, 4],0,[12, 4, 1]}
[[[ 0.17500473424725813 0.08213379149438037 0.5844704696385434  0.15839100461981817 ]   
  [ 0.2300244470890973  0.16315365705801513 0.42377487333569713 0.1830470225171904  ]   
  [ 0.4099551398262636  0.1039676243306248  0.3389756900440856  0.14710154579902585 ]]  
 [[ 0.16796293915289073 0.11991090312678258 0.2904025141609272  0.4217236435593994  ]   
  [ 0.1394973305941254  0.1623370069670688  0.5343865531909564  0.16377910924784933 ]   
  [ 0.46184977173043223 0.2922598926089767  0.11117374070414876 0.13471659495644225 ]]] 

y.softmax1d(2).sum1d(2) // not really 1 due to numerical precision, but very close

BaseStride{DOUBLE,[2, 3],0,[3, 1]}
[[ 1.0000000000000002 0.9999999999999999 1 ]  
 [ 1                  0.9999999999999999 1 ]] 

Reduce operations#

A reduce operation is an operation which computes an aggregate over one or more dimensions. As an illustrative example consider the sum over all elements of a matrix. This is a reduce operation since from a 2-dimensional darray you get a scalar, which is a 0-dimensional darray. Some reduction does not necessary eliminate dimensions, but can make them shorter.

var x = dm.random(dt, Shape.of(2,3), random);
print("x: ", x);
print("sum of all elements: ", x.sum());

x: BaseStride{DOUBLE,[2, 3],0,[3, 1]}
[[  0.34542089466377   -1.1442604343149465 -0.006700537039870284 ]  
 [ -1.9218206218596519 -1.7930243327862754 -0.04050852593549706  ]] 
sum of all elements: -4.560893557272471

The same sum can be also aplied along one dimension. This can be done using methods which ends with 1d. If you have trouble imagining the effect, you can this that you choose an axis and the vectors along that axis collapses into a value and the dimension disapears.

print("sum along rows:", x.sum1d(0));
print("sum along columns:", x.sum1d(1));

sum along rows:BaseStride{DOUBLE,[3],0,[1]}
[ -1.5763997271958818 -2.9372847671012217 -0.04720906297536734 ] 
sum along columns:BaseStride{DOUBLE,[2],0,[1]}
[ -0.8055400766910468 -3.755353480581424 ] 

For the next operations we will take a more complex darray, a 3 dimensional one.

var x = dm.random(dt, Shape.of(2,3,4), random);
x

BaseStride{DOUBLE,[2, 3, 4],0,[12, 4, 1]}
[[[  0.39899995451422293 -0.2643210253452166  1.5562230224800353  -0.2933931145093284  ]   
  [  0.3751673073136748   0.740381197550476   1.2483185916349693   0.2503380599085345  ]   
  [  1.9779791683701187  -1.4249549431782167  0.7558175246705047  -0.5863436478398663  ]]  
 [[  1.2770365589979307  -0.3217101506550075 -0.17304111461681013 -0.43350414318466146 ]   
  [ -0.4813117564170963  -0.1440362000426429 -1.3020425753781046   0.6593151939713588  ]   
  [ -0.9507732870948371   0.8610353749743883 -1.0812651372305313  -1.3228248420801145  ]]] 

Aggregate on a shape is an aggregation operation which takes all pieces from the darray of a given shape and reduces them in one value. This is basically a generalization of one dimensional reduction. There is a catch, however, the shape must match the last dimensions of the original shape. The on shape do not do broadcast, so you cannot use dimension size 1 (maybe this is an idea of improvement if there will be a need). Since the last dimensions collapses into a single value, the result will have the shape described by the first dimensions which are not subject to reduction. This operation allows a keepDim parameter which when set to true will keep the collapsed dimensions.

// sum on the last dimension
print(x.sumOn(Shape.of(4), false));
// sum over last two dimensions
print(x.sumOn(Shape.of(3,4), false));

BaseStride{DOUBLE,[2, 3],0,[3, 1]}
[[ 1.397508837139713    2.614205156407655  0.7224981020225403 ]  
 [ 0.34878115054145153 -1.268075337866485 -2.4938278914310947 ]] 
BaseStride{DOUBLE,[2],0,[1]}
[ 4.734212095569908 -3.4131220787561283 ] 

Aggregate to a shape does, somehow the opposite of aggregate on a shape. Using aggregate to a shape you specify actually the shape of the result, all other dimensions will collapse by reduction function into scalar values, placed in the requested shape. This operation allows broadcast. The specified shape must match the last dimensions of the original darray, but also can be 1, which means that the corresponding dimension will collapse.

x.sumTo(Shape.of(3,4), false)

BaseStride{DOUBLE,[3, 4],0,[4, 1]}
[[  1.6760365135121535  -0.5860311760002241  1.3831819078632253  -0.7268972576939898 ]  
 [ -0.10614444910342147  0.5963449975078332 -0.05372398374313536  0.9096532538798934 ]  
 [  1.0272058812752816  -0.5639195682038284 -0.32544761256002663 -1.9091684899199808 ]] 

In the previous example, we get a reduced shape as requested. The reduced shape is a matrix and in each cell it contains the sum of the two corresponding elements. In this case you can imagine the darray as an array of two matrices. Since we collapse to the matrix shape, this is esentially like summing the two matrices.

But also some wild things can be done, like collapsing into the seond dimension.

x.sumTo(Shape.of(3,1), false)

BaseStride{DOUBLE,[3, 1],0,[1, 1]}
[[  1.7462899876811644 ]  
 [  1.3461298185411699 ]  
 [ -1.7713297894085542 ]] 

Aggregate to a shape can perform the same thing as aggregate on a shape, you just have to specify the complementary indinces.

print(x.sumOn(Shape.of(3,4),false));
print(x.sumTo(Shape.of(2,1,1), false).squeeze(1,2));

BaseStride{DOUBLE,[2],0,[1]}
[ 4.734212095569908 -3.4131220787561283 ] 
BaseStride{DOUBLE,[2],0,[1]}
[ 4.734212095569908 -3.4131220787561283 ] 

Gather and scatter#

Gather and scatter operations are somehow related to aggregation, but instead of values it operates with indices and acts only on a single dimension. If you read the documentation about gather and scatter you might have a hard time. I did, at least.

Instead of explaining how they works, my starting point of understanding those operations is to have an example. Let’s start with a matrix, since it has a simpler structure.

var x = dm.random(dt, Shape.of(4,3), random);
print(x)

BaseStride{DOUBLE,[4, 3],0,[3, 1]}
[[ -1.9263576324867564 0.12145504927669352 -0.5763234546075162   ]  
 [  1.8451937732003756 1.3076329303091292  -0.4017278949455708   ]  
 [ -2.195724940612832  0.8764015971541241   0.053904322747972445 ]  
 [  0.9747407176263607 0.21075724479804384  0.49088489464751806  ]] 

Now think that you need to take the maximum value from each row. That can be easily done with a reduce operation.

// get the maximum value along columns, which is the maximum value from each row
x.amax1d(1)

BaseStride{DOUBLE,[4],0,[1]}
[ 0.12145504927669352 1.8451937732003756 0.8764015971541241 0.9747407176263607 ] 

This is simple, but what if you want to do something more complicated? Like for example to take the maximum value from each row, to multiply each maximum value by 100, and then to put those changed values back in their place?

We will do that in steps. First we take the index from each row of the maximum value.

var index = x.argmax1d(1, true);
index

BaseStride{INTEGER,[4, 1],0,[1, 1]}
[[ 1 ]  
 [ 0 ]  
 [ 1 ]  
 [ 0 ]] 

Now use the gather magic, which basically collects the values along the given axis from the corresponding indices.

var maxValues = x.gather(1, index);
maxValues

BaseStride{DOUBLE,[4, 1],0,[1, 1]}
[[ 0.12145504927669352 ]  
 [ 1.8451937732003756  ]  
 [ 0.8764015971541241  ]  
 [ 0.9747407176263607  ]] 

You can see that actually we obtained the same values as before. Why bother than? Because the index can be used also to put back the values. But first let’s do the processing, thus multiplication by 10.

maxValues.mul_(100);
maxValues

BaseStride{DOUBLE,[4, 1],0,[1, 1]}
[[  12.145504927669352 ]  
 [ 184.51937732003756  ]  
 [  87.64015971541241  ]  
 [  97.47407176263607  ]] 

We used inplace version of the operation, no need to copy the data. Now we apply the scatter magic, which basically do the opposite of what gather does. Instead of taking values from a given indinces, the scatter operation puts values back at the given indices.

maxValues.scatter(1, index, x);
print(x)

BaseStride{DOUBLE,[4, 3],0,[3, 1]}
[[  -1.9263576324867564 12.145504927669352   -0.5763234546075162   ]  
 [ 184.51937732003756    1.3076329303091292  -0.4017278949455708   ]  
 [  -2.195724940612832  87.64015971541241     0.053904322747972445 ]  
 [  97.47407176263607    0.21075724479804384  0.49088489464751806  ]] 

This is it. To me this example was enough to understand what scatter and gather does, hope it is the same for you.

Binary operations#

Binary operations are operations which takes one input and produces a result. The binary operations for darray are as normal ones, with the addition of the fact that one of the operands can be also a scalar. Thus, instead of adding a value to all elements by expanding the scalar to a matrix of the same shape, you can also provide a value typed parameters. Some examples, are provided, but since this is not a complex topic it should be illustrative enough.

var x = dm.random(dt, Shape.of(3,4), random);
print("x: ", x);
var y = dm.random(dt, Shape.of(3,4), random);
print("y: ", y);

x: BaseStride{DOUBLE,[3, 4],0,[4, 1]}
[[  0.14180302151996524 -1.1965032080235163  0.22371030592334323 1.2543483166020575  ]  
 [  0.3225320393535859  -0.2987091402816581  1.3024874350161564  0.9688885522075621  ]  
 [ -0.6427485657940173   1.7772473044775188 -0.25972966022766314 0.18538055027877662 ]] 
y: BaseStride{DOUBLE,[3, 4],0,[4, 1]}
[[ -0.6490129557843826    0.9220377521124845 0.41023570499390005 0.2900388839498221   ]  
 [ -0.029589111955443545 -0.3545306667771934 0.6371352715631713  1.0597089759064198   ]  
 [ -0.4131534140954149    0.6999792377480535 0.6009945892697238  0.021463987999379392 ]] 

// elementwise multiplication
print("x*y: ", x.mul(y));

x*y: BaseStride{DOUBLE,[3, 4],0,[4, 1]}
[[ -0.09203199813582905  -1.1032211283213793   0.09177395506486377 0.36380978583159884 ]  
 [ -0.009543436621650777  0.10590155067649847  0.8298606856166372  1.0267398954273292  ]  
 [  0.26555376436272965   1.2440362134779563  -0.15609612046968932 0.00397900590650201 ]] 

// add first row of x to each row of y, no squeeze for broadcasting purposes
y.add(x.sel(0,0))

BaseStride{DOUBLE,[3, 4],0,[4, 1]}
[[ -0.5072099342644173 -0.2744654559110318 0.6339460109172432 1.5443872005518795 ]  
 [  0.1122139095645217 -1.5510338748007098 0.8608455774865145 2.314057292508477  ]  
 [ -0.2713503925754497 -0.4965239702754628 0.8247048951930671 1.275812304601437  ]] 

// add first columns of x to all columns of y
y.add(x.sel(1,0)) 

BaseStride{DOUBLE,[3, 4],0,[4, 1]}
[[ -0.5072099342644173  1.0638407736324498    0.5520387265138653   0.4318419054697873 ]  
 [  0.2929429273981423 -0.03199862742360754   0.9596673109167572   1.3822410152600058 ]  
 [ -1.0559019798894322  0.057230671954036194 -0.04175397652429347 -0.6212845777946379 ]] 

// multiply x by 10
x.mul(10)

BaseStride{DOUBLE,[3, 4],0,[4, 1]}
[[  1.4180302151996524 -11.965032080235163   2.2371030592334322 12.543483166020575  ]  
 [  3.225320393535859   -2.9870914028165814 13.024874350161564   9.688885522075621  ]  
 [ -6.4274856579401725  17.77247304477519   -2.5972966022766313  1.8538055027877662 ]] 

Broadcasting#

Broadcasting is a useful this which is permitted by many operations. In general two or more darrays have shapes which allows broadcast if they can be expandedn into the same shape.

For that the shape dimensions are considered from the last to the first and are broadcastable if one of the following rules applies:

dimension sizes are the same
if dimensions are different, than the different ones has to be equal with 1
if the dimension does not exist in some darrays

For example those two shapes are compatible from a broadcast perspective: [3,1] and [4,3,2]. The final shape is [4,3,2]. Starting from the last dimensions of each shape we have:

2 and 1 is fine since one of them is 1
3 and 3 is fine since they are equal
4 and nothing is fine since in the first shape the dimension does not exists.

Another example is [3,1] and [4,3]. The last dimension is fine since one of them is 1, but the second dimension is not compatible since we have two non unit values which are not the same.

Matrix decompositions#

Some fundamental matrix decompositions are available. Usually those can be found as operations with an abbreviated name. Some examples includes singluar value decomposition (svd), eigenvalue decomposition (eig), or QR decomposition qr.

var x = dm.random(dt, Shape.of(4,2), random);
x

BaseStride{DOUBLE,[4, 2],0,[2, 1]}
[[  0.7567825267669753 -0.3157733126508792 ]  
 [  1.0930959308769808  0.516058757107416  ]  
 [ -0.9939206326524546  0.8418386973583344 ]  
 [  0.5991079617033903  0.8234527934893987 ]] 

// singular value decomposition
var u = x.svd().u();
var s = x.svd().s();
var v = x.svd().v();

display(u);
display(s);
display(v);

// reconstruction of x
println(u.mm(s).mm(v.t()));

println("rank of x: ", x.svd().rank());

BaseStride{DOUBLE,[4, 2],0,[2, 1]}
[[  0.43117234644755625 -0.23084525752274776 ]  
 [  0.6154020794511615   0.40085115798856896 ]  
 [ -0.5695240082955842   0.6257872289535087  ]  
 [  0.3331862725879112   0.6280279932432696  ]] 

BaseStride{DOUBLE,[2, 2],0,[2, 1]}
[[ 1.764831932326689 0                  ]  
 [ 0                 1.3238397382589575 ]] 

BaseStride{DOUBLE,[2, 2],0,[2, 1]}
[[  0.9999101816256504   0.013402562492250613 ]  
 [ -0.013402562492250613 0.9999101816256504   ]] 

BaseStride{DOUBLE,[4, 2],0,[2, 1]}
[[  0.7567825267669752 -0.31577331265087893 ]  
 [  1.0930959308769805  0.5160587571074158  ]  
 [ -0.9939206326524544  0.8418386973583342  ]  
 [  0.5991079617033903  0.8234527934893986  ]] 

rank of x: 2

// QR decomposition
x.qr().q()

BaseStride{DOUBLE,[4, 2],0,[2, 1]}
[[ -0.42882964605868845  0.23516857766932903 ]  
 [ -0.6194011153358777  -0.39464367291806557 ]  
 [  0.5632035862820868  -0.6314815692839941  ]  
 [ -0.3394835981028341  -0.6246464113059065  ]] 

// q is orthogonal
var q = x.qr().q();
println("norm of the first vector column: ", q.selsq(1,0).inner(q.selsq(1,0)));
println("inner product of the two column vectors: ", q.selsq(1,0).inner(q.selsq(1,1)));

norm of the first vector column: 1.0
inner product of the two column vectors: 0.0

var r = x.qr().r();
// basic reconstruction
println(x);
println(q.mm(r));

BaseStride{DOUBLE,[4, 2],0,[2, 1]}
[[  0.7567825267669753 -0.3157733126508792 ]  
 [  1.0930959308769808  0.516058757107416  ]  
 [ -0.9939206326524546  0.8418386973583344 ]  
 [  0.5991079617033903  0.8234527934893987 ]] 

BaseStride{DOUBLE,[4, 2],0,[2, 1]}
[[  0.7567825267669753 -0.31577331265087905 ]  
 [  1.0930959308769808  0.5160587571074158  ]  
 [ -0.9939206326524547  0.8418386973583344  ]  
 [  0.5991079617033903  0.8234527934893987  ]] 

// compute inverse of a matrix using QR decomposition
var xinv = x.qr().inv();
println(xinv);
println(xinv.mm(x));

BaseStride{DOUBLE,[2, 4],0,[4, 1]}
[[  0.24195452261565423 0.35272983767004973 -0.3163427617435499  0.19513327203437   ]  
 [ -0.17763429658819532 0.2980936141066997   0.47698882852409963 0.4718259003321623 ]] 

BaseStride{DOUBLE,[2, 2],0,[2, 1]}
[[ 0.9999999999999998               0 ]  
 [ 0.000000000000000111022302462516 1 ]] 

var x = dm.random(dt, Shape.of(4, 4), random);
x

BaseStride{DOUBLE,[4, 4],0,[4, 1]}
[[  0.9277868070616492   1.0376498525251583   2.1695416044546936 -1.130529030341333   ]  
 [  0.29000769383326563  0.10767860994046376 -1.0248722154647125  0.08677799357528221 ]  
 [  0.12311941873490145 -0.44389302327682645  0.5832119178935503 -0.28506764696738307 ]  
 [ -2.1636236765285006  -1.2652081154320538  -1.2598768406437806 -0.9414949106234748  ]] 

var y = dm.random(dt, Shape.of(4), random);
y

BaseStride{DOUBLE,[4],0,[1]}
[ 1.268183710166665 -0.20481478094651417 1.0255981782999506 -0.42200590908296 ] 

x.mul(y)

BaseStride{DOUBLE,[4, 4],0,[4, 1]}
[[  1.1766041152231261  -0.21252602724412303   2.2250779172746857  0.47708993119387144 ]  
 [  0.3677830331423491  -0.022054170907581227 -1.0511070771708435 -0.03662082606713223 ]  
 [  0.15613804124479055  0.09091585232612913   0.5981410805544456  0.1203002315086108  ]  
 [ -2.7438723015043545   0.25913332301396813  -1.2921273926465586  0.39731641565463965 ]] 

x.mul(y.stretch(1)) 

BaseStride{DOUBLE,[4, 4],0,[4, 1]}
[[  1.1766041152231261   1.315930639829248    2.7513773212982926  -1.433718500149394   ]  
 [ -0.05939786228526405 -0.022054170907581227 0.20990897830857377 -0.01777341574509944 ]  
 [  0.12627105156786375 -0.4552558760327708   0.5981410805544456  -0.2923648594220015  ]  
 [  0.9130619765268261   0.5339253009320425   0.5316754714684462   0.39731641565463965 ]] 

x.mul(y.stretch(0))

BaseStride{DOUBLE,[4, 4],0,[4, 1]}
[[  1.1766041152231261  -0.21252602724412303   2.2250779172746857  0.47708993119387144 ]  
 [  0.3677830331423491  -0.022054170907581227 -1.0511070771708435 -0.03662082606713223 ]  
 [  0.15613804124479055  0.09091585232612913   0.5981410805544456  0.1203002315086108  ]  
 [ -2.7438723015043545   0.25913332301396813  -1.2921273926465586  0.39731641565463965 ]]