I have a SQL table:
+---------+----------+---------------------+---------------------+---------+ | id | party_id | begintime | endtime | to_meas | +---------+----------+---------------------+---------------------+---------+ | 1395035 | 9255 | 2010-09-26 00:34:02 | 2010-09-26 03:56:20 | 0 | | 1395036 | 8974 | 2009-07-10 11:00:00 | 2009-07-10 21:30:00 | 0 | | 1395037 | 8974 | 2009-07-10 23:14:00 | 2009-07-11 08:48:00 | 0 | | 1395038 | 8975 | 2009-07-10 11:00:00 | 2009-07-10 21:30:00 | 0 | | 1395039 | 8975 | 2009-07-10 23:14:00 | 2009-07-11 08:48:00 | 0 | | 1395040 | 8974 | 2009-07-11 10:08:31 | 2009-07-12 18:49:51 | 0 | | 1395041 | 8975 | 2009-07-11 10:08:31 | 2009-07-12 18:49:51 | 0 | | 1395042 | 8974 | 2009-07-12 20:38:27 | 2009-07-13 20:33:21 | 0 | | 1395043 | 8975 | 2009-07-12 20:38:27 | 2009-07-13 20:33:21 | 0 | | 1395044 | 8974 | 2009-07-13 21:57:37 | 2009-07-15 08:25:45 | 0 | | 1395045 | 8975 | 2009-07-13 21:57:37 | 2009-07-15 08:25:45 | 0 | | 1395046 | 8974 | 2009-07-15 08:51:25 | 2009-07-16 10:29:13 | 0 | | 1395047 | 8975 | 2009-07-15 08:51:25 | 2009-07-16 10:29:13 | 0 | | 1395048 | 8974 | 2009-07-16 12:22:22 | 2009-07-17 14:39:10 | 0 | | 1395049 | 8975 | 2009-07-16 12:22:22 | 2009-07-17 14:39:10 | 0 | | 1395050 | 8976 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 | 0 | | 1395051 | 8977 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 | 0 | | 1395052 | 8978 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 | 0 | | 1395053 | 8979 | 2009-07-24 16:53:48 | 2009-07-25 08:47:29 | 0 | | 1395054 | 8976 | 2009-07-25 10:47:14 | 2009-07-26 09:41:44 | 0 | +---------+----------+---------------------+---------------------+---------+ ...
I need to calculate time between begintime
and previous endtime
and set to_meas
to 1 if this difference is > 30 minutes. Here is my attempt to do it in MySQL:
update doses d set to_meas=1 where d.id in (select a.id from party join (select * from doses) a on party_id=a.party_id left join (select * from doses) b on party.id=b.party_id and b.begintime=(select min(begintime) from (select * from doses) c where c.begintime > a.endtime) and timestampdiff(minute, a.endtime, b.begintime) > 30 group by party.id);
This command runs (quasi-) forever. I've tried to do it in python's pandas
:
conn = engine.connect() sql = ''' select doses.id, party_id, party.ml, begintime, endtime from doses join party on party.id=doses.party_id ''' df = pd.read_sql(con=conn, sql=sql, measure = df.groupby('party_id', as_index=False).apply( lambda x: x[pd.to_datetime(x['begintime']) - pd.to_datetime(x.shift()['endtime']) > pd.to_timedelta('30 minutes')]) measure_ids = measure['id'].to_list() measure_list = ','.join([str(x) for x in measure_ids]) conn.execute( 'update doses set to_meas=true where id in(%s)' % measure_list)
The last statement runs about 10 seconds. Is there a way to optimize SQL code for running as fast as the pandas
` one?
没有评论:
发表评论