2020年12月20日星期日

Optimize a query for calcilating datetime difference

I have a SQL table:

+---------+----------+---------------------+---------------------+---------+  | id      | party_id | begintime           | endtime             | to_meas |  +---------+----------+---------------------+---------------------+---------+  | 1395035 |     9255 | 2010-09-26 00:34:02 | 2010-09-26 03:56:20 |       0 |  | 1395036 |     8974 | 2009-07-10 11:00:00 | 2009-07-10 21:30:00 |       0 |  | 1395037 |     8974 | 2009-07-10 23:14:00 | 2009-07-11 08:48:00 |       0 |  | 1395038 |     8975 | 2009-07-10 11:00:00 | 2009-07-10 21:30:00 |       0 |  | 1395039 |     8975 | 2009-07-10 23:14:00 | 2009-07-11 08:48:00 |       0 |  | 1395040 |     8974 | 2009-07-11 10:08:31 | 2009-07-12 18:49:51 |       0 |  | 1395041 |     8975 | 2009-07-11 10:08:31 | 2009-07-12 18:49:51 |       0 |  | 1395042 |     8974 | 2009-07-12 20:38:27 | 2009-07-13 20:33:21 |       0 |  | 1395043 |     8975 | 2009-07-12 20:38:27 | 2009-07-13 20:33:21 |       0 |  | 1395044 |     8974 | 2009-07-13 21:57:37 | 2009-07-15 08:25:45 |       0 |  | 1395045 |     8975 | 2009-07-13 21:57:37 | 2009-07-15 08:25:45 |       0 |  | 1395046 |     8974 | 2009-07-15 08:51:25 | 2009-07-16 10:29:13 |       0 |  | 1395047 |     8975 | 2009-07-15 08:51:25 | 2009-07-16 10:29:13 |       0 |  | 1395048 |     8974 | 2009-07-16 12:22:22 | 2009-07-17 14:39:10 |       0 |  | 1395049 |     8975 | 2009-07-16 12:22:22 | 2009-07-17 14:39:10 |       0 |  | 1395050 |     8976 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 |       0 |  | 1395051 |     8977 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 |       0 |  | 1395052 |     8978 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 |       0 |  | 1395053 |     8979 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 |       0 |  | 1395054 |     8976 | 2009-07-25 10:47:14 | 2009-07-26 09:41:44 |       0 |  +---------+----------+---------------------+---------------------+---------+  ...  

I need to calculate time between begintime and previous endtime and set to_meas to 1 if this difference is > 30 minutes. Here is my attempt to do it in MySQL:

update doses d set to_meas=1 where d.id in     (select a.id from party join (select * from doses) a     on party_id=a.party_id     left join (select * from doses) b     on party.id=b.party_id     and b.begintime=(select min(begintime)     from (select * from doses) c     where c.begintime > a.endtime)     and timestampdiff(minute, a.endtime, b.begintime) > 30     group by party.id);  

This command runs (quasi-) forever. I've tried to do it in python's pandas:

conn = engine.connect()  sql =      '''      select doses.id, party_id, party.ml, begintime, endtime      from doses join party on party.id=doses.party_id  '''  df = pd.read_sql(con=conn, sql=sql,  measure = df.groupby('party_id', as_index=False).apply(      lambda x: x[pd.to_datetime(x['begintime']) -       pd.to_datetime(x.shift()['endtime']) > pd.to_timedelta('30 minutes')])  measure_ids = measure['id'].to_list()  measure_list = ','.join([str(x) for x in measure_ids])  conn.execute(      'update doses set to_meas=true where id in(%s)' % measure_list)  

The last statement runs about 10 seconds. Is there a way to optimize SQL code for running as fast as the pandas` one?

https://stackoverflow.com/questions/65386370/optimize-a-query-for-calcilating-datetime-difference December 21, 2020 at 09:08AM

没有评论:

发表评论