Learning-based Distributed Spatio-Temporal K Nearest Neighbors Join
Efficient Spatial Temporal K Nearest Neighbors Join Algorithm

Brief Introduction

To support efficiently ST-KNN join over huge amount of Spatial-temporal data with any geometry type, we propose a novel distributed solution based on Apache Spark. Specifically, our method adopts a two-round join framework. Furthermore, we propose an automatic knob tuning framework based on Bayesian optimization, which determines good values of introduced parameters. The experimental results show the efficiency and effectiveness of our proposed method.
Framework of ST-kNN Join Executor
Framework of Automatic Knob Tuning

Key Features

(1) Scalability

Our algorithm supports large number of spatio-temporal data, but requires little for the clusters.

(2) Efficiency

In our experimental settings, our algorithm shows a competitive performance and is much more scalable than two state-of-the-art distributed spatio-temporal data management systems (i.e., LocationSpark and Simba).

(3) Various spatial and temporal types supported

Users are able to execute st-kNN join in different spatio-temporal situations.

(4) Easy of Use

Our algorithm has been integrated into JUST (JD Urban Spatio-Temporal Data Engine), which incorporates a complete SQL engine, and presets plenty of out-of-the-box spatio-temporal analysis functions. ST-KNN join can be used in this system with JustQL.

Download

Source Code: https://github.com/Spatio-Temporal-Lab/stknnjoin
Datasets: https://pan.baidu.com/s/1UYwObzuDv1QoRo2gNXL_Ng, Code: nx0z

System Demonstration

Execution Step

Users can input a JustQL in the JustQL Panel, then click the button 运行 (means Run). The execution result will display in the Result Panel.

SQL Syntax for ST-kNN join

We can perform ST-KNN join with a SQL-like statement:

SELECT * FROM R, S WHERE st_knnjoin(R.geom, S.geom, R.t_min, R.t_max, S.t_min, S.t_max, k, delta)
        
where R and S are the names of two spatio-temporal tables; R.geom and S.geom are the spatial field names of these two tables; and R.t_min, R.t_max, S.t_min, S.t_max are the temporal field names of the two tables.

Notice!!!

This demonstration is based on a dsitributed environment. JUST SQL engine can change to the distributed execution engine intelligently for ST-kNN join. You can also change the execution engine to 分布式(means distributed) manually, as shown in the following picture.

Follow Me!

Now, let's experimence ST-kNN join by executing the following statements one by one.

Data Preparing

1. Create tables:

CREATE TABLE point01 (
  pt01 point,
  pt_st01 timestamp,
  pt_et01 timestamp
);
        

CREATE TABLE point02 (
  pt02 point,
  pt_st02 timestamp,
  pt_et02 timestamp
);
        

CREATE TABLE linestring01 (
  ls01 linestring,
  ls_st01 timestamp,
  ls_et01 timestamp
);
        

CREATE TABLE linestring02 (
  ls02 linestring,
  ls_st02 timestamp,
  ls_et02 timestamp
);
        

CREATE TABLE polygon01 (
  pg01 polygon,
  pg_st01 timestamp,
  pg_et01 timestamp
);
        

CREATE TABLE polygon02 (
  pg02 polygon,
  pg_st02 timestamp,
  pg_et02 timestamp
);
        
2. Load Data:

LOAD HDFS:"/sigspatial_data/point01.txt" TO JUST: point01 (
  pt01 st_geomfromwkt(0),
  pt_st01 to_timestamp(1),
  pt_et01 to_timestamp(2)
) WITH (
  "just.separator" = "|",
  "just.header" = "false"
)
        

LOAD HDFS:"/sigspatial_data/point02.txt" TO JUST: point02 (
  pt02 st_geomfromwkt(0),
  pt_st02 to_timestamp(1),
  pt_et02 to_timestamp(2)
) WITH (
  "just.separator" = "|",
  "just.header" = "false"
)
        

LOAD HDFS:"/sigspatial_data/linestring01.txt" TO JUST: linestring01 (
  ls01 st_geomfromwkt(0),
  ls_st01 to_timestamp(1),
  ls_et01 to_timestamp(2)
) WITH (
  "just.separator" = "|",
  "just.header" = "false"
)
        

LOAD HDFS:"/sigspatial_data/linestring02.txt" TO JUST: linestring02 (
  ls02 st_geomfromwkt(0),
  ls_st02 to_timestamp(1),
  ls_et02 to_timestamp(2)
) WITH (
  "just.separator" = "|",
  "just.header" = "false"
)
        

LOAD HDFS:"/sigspatial_data/polygon01.txt" TO JUST: polygon01 (
  pg01 st_geomfromwkt(0),
  pg_st01 to_timestamp(1),
  pg_et01 to_timestamp(2)
) WITH (
  "just.separator" = "|",
  "just.header" = "false"
)
        

LOAD HDFS:"/sigspatial_data/polygon02.txt" TO JUST: polygon02 (
  pg02 st_geomfromwkt(0),
  pg_st02 to_timestamp(1),
  pg_et02 to_timestamp(2)
) WITH (
  "just.separator" = "|",
  "just.header" = "false"
)
        

Point-Point ST-KNN Join

ST-KNN join has been supported in JUST (JD Urban Spatio-Temporal Data Engine). You can join any two tables created before, including join themselves.
In this section, the algorithm extracts the k Nearest Cars for each pedestrain location.

SELECT
  *
FROM
  point01 t1,
  point02 t2
WHERE
  st_knnjoin(
    t1.pt01,
    t2.pt02,
    t1.pt_st01,
    t1.pt_et01,
    t2.pt_st02,
    t2.pt_et02,
    2,
    100
  )
LIMIT
  200
        

Linestring-Linestring ST-KNN Join

In this section, the algorithm extracts the k Nearest Car trajecotries for each pedestrain trajectory.

SELECT
  *
FROM
  linestring01 t1,
  linestring02 t2
WHERE
  st_knnjoin(
    t1.ls01,
    t2.ls02,
    t1.ls_st01,
    t1.ls_et01,
    t2.ls_st02,
    t2.ls_et02,
    2,
    100
  )
LIMIT
  200
        

Polygon-Polygon ST-KNN Join

In this section, the algorithm extracts the k Nearest Car stay point area (polygon) for each pedestrain stay point area (polygon).

SELECT
  *
FROM
  point01 t1,
  polygon01 t2
WHERE
  st_knnjoin(
    t1.pt01,
    t2.pg01,
    t1.pt_st01,
    t1.pt_et01,
    t2.pg_st01,
    t2.pg_et01,
    2,
    100
  )
LIMIT
  200
        

Polygon-Linestring ST-KNN Join

In this section, the algorithm extracts the k Nearest Car stay point area (polygon) for each pedestrain trajectory.

SELECT
  *
FROM
  polygon01 t1,
  linestring01 t2
WHERE
  st_knnjoin(
    t1.pg01,
    t2.ls01,
    t1.pg_st01,
    t1.pg_et01,
    t2.ls_st01,
    t2.ls_et01,
    2,
    100
  )
LIMIT
  200
        

Data Elimination

Drop tables:

DROP table point01;
        

DROP table point02;
        

DROP table linestring01;
        

DROP table linestring02;
        

DROP table polygon01;
        

DROP table polygon02;