Footer (Showing 0 to 0 of 0 entries) issue showing up in large data sets
We've just crested the 12 million record mark in a table we display via DataTables, and some interesting performance issues have come to light, caused by the footer's need to display the "Showing 0 to 0 of 0 entries" values. I should also note that we use column footer filtering fields.
For each rendering of the table I do 3 queries:
1) find the records matching the criteria, with a MySQL LIMIT set based on the value selected in the "Show N entries" pulldown.
select <fields> from <table> where <criteria> LIMIT 0,10;
So far so good. We're indexed up pretty well; results come back in ~500ms.
2) get the total number of records in the table:
select count(*) from <table>;
About 400ms or so to complete. Still OK; response times to render the table are very reasonable.
3) the problem child: get the count of records matching the criteria.
select count(*) from <table> where <criteria>;
The data is internet phishing/threat data that tends to be URL-centric, so users put substrings of company brands, etc. in the column footers, filtering the data down to only show data that applies to them. To find these substrings I use wildcard searches:
select count(*) from <table> where url like '%google%';
Count() hates this: return times take over 10 minutes at times due to the WHERE, and things get much more expensive in count() when the WHERE uses LIKE with % wildcards. I sniff to see if the user has entered "http://<url" and if so strip off the beginning % wildcard and use only a trailing %.
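For anyone curious, here's a minimal sketch of that sniff-and-strip idea (function and names are hypothetical, not from my actual code): when the filter value starts with a URL scheme, the leading % is unnecessary, and a trailing-only wildcard lets MySQL do an index range scan on the url column instead of a full scan.

```javascript
// Hypothetical helper: build the LIKE pattern for the url column filter.
// If the user typed a full URL (starts with "http://" or "https://"),
// drop the leading % so MySQL can use an index on `url`; otherwise
// fall back to the slow double-wildcard substring match.
function buildLikePattern(filterValue) {
  const v = filterValue.trim();
  if (/^https?:\/\//i.test(v)) {
    return v + '%';       // index-friendly prefix match
  }
  return '%' + v + '%';   // full substring scan, slow on big tables
}
```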
So a two-part question comes out of this:
1) my current strategy is to turn the footer display off and return a hardcoded 0 in the JSON data for total records and matching records. Any other suggestions? I also need to figure out how to disable the footer display.
2) are there any MySQL tuning experts out there? My DBA/network guy has limited availability over the holidays, and I'm looking for pointers that might help me better tune the db - page and swap file configs, additional memory, etc. - on my end while waiting for him to come back.
Any other users with large DBs facing these issues?
tia
Mike
Answers
Well, that direction won't work. By setting recordsFiltered to zero I lose my pagination - the widget appears, but because it believes it has zero records I am unable to move past the first page of N results.
and of course, the recordsFiltered value is the one in #3 above...
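For reference, the server-side response DataTables expects looks roughly like the sketch below. One workaround (an assumption on my part, not something I've tested at this scale) is to put an estimated or capped count in recordsFiltered - e.g. from a count(*) over a LIMITed subquery - so the pager stays alive without paying for an exact count over the filtered set:

```javascript
// Sketch of a DataTables server-side processing response.
// `recordsFiltered` drives the pager, so it must be nonzero for
// paging to work; a capped/estimated value keeps the pager usable
// without an exact count(*) over the filtered rows.
const estimatedMatches = 10000; // hypothetical capped/estimated count
const response = {
  draw: 1,                      // echo of the client's draw counter
  recordsTotal: 12000000,       // cheap: count(*) with no WHERE
  recordsFiltered: estimatedMatches,
  data: []                      // the current page of rows
};
```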
Can you use phpMyAdmin or something similar? If so, what performance/result does it give you for your "count(*)" query?
I use HeidiSQL to profile the queries, so I know exactly what the problem is and where the times are slow. What do you see phpMyAdmin providing me?
In that case you would find more help on a MySQL site. Posting here suggests you think DataTables is the problem.
Fair enough point; it's not a DT "problem", but I was hopeful that someone here might have faced this before. I was also reaching out for info on whether there were DT options I could use that would remove the need to run query 3 above while still allowing DT to run properly. So far it looks like there isn't anything.
I'm digging in on the MySQL side, trying to understand how to watch paging and performance in real time on the console to help see/identify what is thrashing.
I guess it's worth a try. I'm curious about this one myself; I haven't got near twelve million records myself, but I'd like to know everything will work if/when I do. I might have a look at building some test data if I get the time.
As an update, the current solution strategy involves using the Sphinx search engine with real-time indices - I'm working on learning the engine so that we can figure out a way to return the needed values to support our site-wide DT implementation. http://sphinxsearch.com
The Sphinx search engine itself is amazingly fast. What takes upwards of 25-30 minutes (select count(id) from <table> where field like '%<substring value>%') in a 27G table of 114M records in MySQL takes a bit over 1 second in Sphinx. And I think there is still room to further optimize the Sphinx transaction to get the results back in low ms timeframes. Our current single index that we have started from in Sphinx takes about 3 min to build against the table. Very impressive so far.
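For anyone following along: Sphinx speaks a MySQL-compatible protocol (SphinxQL, on port 9306 by default), so you can query it from existing MySQL tooling. The usual pattern for a match count is to run a SELECT with MATCH() and then read total_found from SHOW META. A sketch that just builds the query strings (the index name `threats_rt` is hypothetical):

```javascript
// Hypothetical SphinxQL equivalent of the slow MySQL substring count.
// `threats_rt` is an assumed real-time index name; MATCH() hits
// Sphinx's inverted index instead of LIKE-scanning millions of rows.
const indexName = 'threats_rt';
const term = 'google';
const searchQuery = `SELECT id FROM ${indexName} WHERE MATCH('${term}') LIMIT 1`;
const metaQuery = 'SHOW META'; // the total_found row holds the full match count
// Run these with any MySQL client pointed at the Sphinx port, e.g.:
//   mysql -h 127.0.0.1 -P 9306
```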
Solr might be an option too
More when I know it