Saturday, April 23, 2016

kdb+/q - Joins

You're probably familiar with SQL joins, which are used to combine data from more than one table. This post shows how you can do the same in kdb.

I'll use the following two tables to demonstrate joins in kdb:

q) show age:([] name:`alice`charlie`dave`frank; age:25 26 31 33)
name    age
-----------
alice   25
charlie 26
dave    31
frank   33

q) show work:([] name:`alice`bob`charlie`eve; dept:`ops`dev`hr`eng)
name    dept
------------
alice   ops
bob     dev
charlie hr
eve     dev

Now let's join these tables on the name field using the different kinds of kdb joins:

Left Join (lj):

In a left join, all the rows of the left table are preserved, and those from the right table are only combined if the values in the key columns match those in the left table. If there are values in the left table that don't exist in the right, the joined columns will contain nulls.

q) age lj `name xkey work
name    age dept
----------------
alice   25  ops
charlie 26  hr
dave    31
frank   33

Union Join (uj):

A union join (referred to as a full outer join in SQL) returns rows from both tables, regardless of whether there is a match. If there is a match, it returns combined rows, and if there isn't, the missing side will contain nulls.

q) (`name xkey age) uj `name xkey work
name   | age dept
-------| --------
alice  | 25  ops
charlie| 26  hr
dave   | 31
frank  | 33
bob    |     dev
eve    |     eng

Inner Join (ij):

An inner join returns only those rows where there is a match in both tables.

q) age ij `name xkey work
name    age dept
----------------
alice   25  ops
charlie 26  hr

Equi Join (ej):

This is an inner join in which you can specify the list of columns to join on.

q) ej[`name;age;work]
name    age dept
----------------
alice   25  ops
charlie 26  hr

Plus Join (pj):

This is a left join, which also sums the values of common columns (other than key columns). This type of join doesn't exist in SQL.

q) t1:([] name:`alice`charlie`dave`frank; x:1 2 3 4)
q) t2:([] name:`alice`bob`charlie`eve; x:10 10 10 10)
q) t1 pj `name xkey t2
name    x
----------
alice   11
charlie 12
dave    3
frank   4

Cross Join (cross):

This is a "cartesian product". It joins everything to everything and isn't normally the kind of join people are looking for. Each row is a combination of the rows of the first and second table, so can be a dangerous join to run against large tables.

q) `name xasc work cross age
name    dept age
----------------
alice   ops  25
alice   dev  25
alice   hr   25
alice   eng  25
charlie ops  26
charlie dev  26
charlie hr   26
charlie eng  26
dave    ops  31
dave    dev  31
dave    hr   31
dave    eng  31
frank   ops  33
frank   dev  33
frank   hr   33
frank   eng  33

Sunday, April 17, 2016

Java 8 Streams API: Immutable List Collector

Instead of first collecting a stream into a list and then making it unmodifiable, as shown below:

mutableList = list.stream()
                  // perform some stream operations...
                  .filter(myFilterPredicate)
                  .map(myMapperFunction)
                  // collect into a list
                  .collect(toList());

// now make the list unmodifiable
return Collections.unmodifiableList(list);

you can using collectingAndThen as follows:

return list.stream()
           // perform some stream operations...
           .filter(myFilterPredicate)
           .map(myMapperFunction)
           // collect into an unmodifiable list
           .collect(collectingAndThen(toList(), Collections::unmodifiableList));

Saturday, April 16, 2016

kdb+/q - Concatenating columns

This post shows how you can concatenate values in two or more columns of a table in kdb, into a single value. Consider the following table:

q) person:([] firstName:`Alice`Bob`Charles;lastName:`Smith`Jones`Brown)
q) person
firstName lastName
------------------
Alice     Smith
Bob       Jones
Charles   Brown

In SQL, it's quite easy to concatenate columns, like this:

SQL> select firstName || ' ' || lastName as fullName from person;
fullName
--------
Alice Smith
Bob Jones
Charles Brown

In q, the same thing can be achieved by flipping the firstName and lastName columns and then calling sv to convert the resulting vector into a string, using a space separator. This is shown below:

q) select fullName:`$" "sv'string flip(firstName;lastName) from person

// in functional form:
q) colsToJoin:`firstName`lastName;
q) ?[person;();0b;enlist[`fullName]!enlist(`$sv';" ";(string;(flip;(enlist,colsToJoin))))]

// if there are many repeated names, you can use .Q.fu to improve performance:
q) select fullName:.Q.fu[{`$" "sv'string x};flip(firstName;lastName)] from person

// in functional form, with .Q.fu:
q) ?[person;();0b;enlist[`fullName]!enlist(.Q.fu;{`$" "sv'string x};(flip;(enlist,colsToJoin)))]

Saturday, April 02, 2016

kdb+/q - Reading and Writing a CSV File

This post shows how you can load a CSV file into kdb and write a table out from kdb to a CSV file.

Reading a CSV file:

Let's say that you'd like to load a file containing comma-separated values into an in-memory table in kdb. For example, here's my CSV file, which contains country populations (source: Worldometers):

$ head -5 worldPopulation.csv
region,country,population
Africa,Algeria,40375954
Africa,Angola,25830958
Africa,Benin,11166658
Africa,Botswana,2303820

My file has got two string columns and one integer column.

I can load it into kdb using the Zero Colon function: (types; delimiter) 0: filehandle

q) data:("SSI";enlist",") 0: `$"/path/to/worldPopulation.csv"
// "SSI" creates 2 symbol columns and one integer column

// let's look at the column types
q) meta data
c         | t f a
----------| -----
region    | s
country   | s
population| i

// check the data
q) 5#data
region country                  population
------------------------------------------
Africa Algeria                  40375954
Africa Angola                   25830958
Africa Benin                    11166658
Africa Botswana                 2303820
Africa Burkina Faso             18633725

// you can use * if you want string columns, instead of symbol ones
q) data:("**I";enlist",") 0: `$"/path/to/worldPopulation.csv"

q) meta data
c         | t f a
----------| -----
region    | C
country   | C
population| i

q) 5#data
region   country        population
----------------------------------
"Africa" "Algeria"      40375954
"Africa" "Angola"       25830958
"Africa" "Benin"        11166658
"Africa" "Botswana"     2303820
"Africa" "Burkina Faso" 18633725

Writing a CSV file:

To save a table to a CSV file, you can use the command: filehandle 0: delimiter 0: table

// first, let's try printing out the csv data to console
q) "," 0: data
"region,country,population"
"Africa,Algeria,40375954"
"Africa,Angola,25830958"
"Africa,Benin,11166658"
"Africa,Botswana,2303820"

// save it
q)(`$"/tmp/out.csv") 0: "," 0: data
`/tmp/out.csv