I have set up Postgres and Cassandra on my local machine. Each holds about 2.7 crore (27 million) rows, and the data is identical in both (this is just a small experiment).
A little info about the data: 2.7 crore rows and 15 columns (only varchar, text, timestamp and integer types are used).
I tried different queries; one of them is, let's say, "SELECT * from my-table where fav_source='BANGALORE'". I used %%time (I am using a Jupyter notebook) to record the time taken by both Postgres and Cassandra. I am only running simple select queries.
Postgres:- CPU times: user 29.1 s, sys: 3.8 s, total: 32.9 s Wall time: 42.2 s
Cassandra:- CPU times: user 43.5 s, sys: 3.62 s, total: 47.1 s Wall time: 3min 50s
So the wall time for Cassandra is much higher than for Postgres. Can someone help me understand why?
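To make the gap concrete, here is a small sketch of the arithmetic on the figures above. It assumes that the CPU times reported by %%time cover only the notebook process, so the difference from wall time is roughly the time the client spent waiting (on the database, disk or network). The helper name `waiting_seconds` is just made up for illustration.

```python
# Figures copied from the %%time output quoted above, in seconds.
timings = {
    "postgres": {"cpu_total": 32.9, "wall": 42.2},
    "cassandra": {"cpu_total": 47.1, "wall": 3 * 60 + 50},
}


def waiting_seconds(cpu_total, wall):
    """Wall-clock time not accounted for by client-side CPU time."""
    return round(wall - cpu_total, 1)


for name, t in timings.items():
    print(name, waiting_seconds(t["cpu_total"], t["wall"]))
```

By this rough measure, most of the Cassandra wall time is spent waiting rather than in client-side CPU work.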
The Cassandra part of the code is shown below:
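import pandas as pd
from cassandra.cluster import Cluster

# Connect to the local cluster and open a session on the keyspace
cluster = Cluster()
session = cluster.connect('*my-keyspace*', wait_for_all_pools=True)

# fav_source is filtered server-side, hence ALLOW FILTERING
query = "select * from *my-table* where fav_source='BANGALORE' ALLOW FILTERING"

# Fetch every row of the result and load it into a DataFrame
df = pd.DataFrame(list(session.execute(query)))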
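One way I could narrow this down is to time the row fetch separately from the DataFrame construction. The sketch below is only an illustration: `timed` is a made-up helper, and `fetch_rows` plus the list-building lambda are stand-ins for `list(session.execute(query))` and `pd.DataFrame(rows)`, so that the snippet runs without a database.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fetch_rows():
    # Hypothetical stand-in for: list(session.execute(query))
    return [("BANGALORE", i) for i in range(1000)]


# Step 1: time the fetch on its own.
rows, fetch_secs = timed(fetch_rows)

# Step 2: time the conversion on its own (stand-in for pd.DataFrame(rows)).
table, build_secs = timed(lambda: [list(r) for r in rows])

print(f"fetch: {fetch_secs:.3f}s, build: {build_secs:.3f}s")
```

With the real calls swapped in, the two numbers would show whether the time goes into reading from Cassandra or into building the DataFrame.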
The Postgres part is pretty much the same: I have a connect-and-read function that runs the same query as above and returns the data from the DB. I have not attached the Postgres code to keep the question short, but it is standard read-from-DB code (nothing fancy).
So, could someone please explain why Cassandra needs so much wall time, and is there a way I can reduce it?
https://stackoverflow.com/questions/66575881/cassandra-having-more-wall-time-than-postgres March 11, 2021 at 11:05AM