Fast Client-Side Filtering

george9eg
edited April 2012 in Plug-ins
I am attempting to employ DataTables in a browser-based web application that automatically updates the display in real time with data that arrives on a socket (custom ActiveX control in IE today; hope to use HTML5 WebSockets in future). Multiple updates to individual cells (changes only, not entire rows) arrive at a nominal rate of around 10 every second. For all intents and purposes, the total number of rows in the table is fixed and is in the range of 800 to 1,000.

I want to use DataTables for advanced table functionality, especially client-side sorting and filtering. It's acceptable to have sorting happen only "on demand" (user request), but filtering needs to be dynamic. In other words, when filtering to show only those rows whose "Status" column has a certain value, individual rows should automatically be hidden/shown based on their current "Status" value.

At first, I thought DataTables was going to be a great fit and make this really easy, but I quickly discovered that, even if I suppress bSort (and bSortClasses!), there's a lot of overhead in the way DT implements filtering. So I embarked on writing a plug-in to optimize the processing for this specific purpose, only to discover that updating row visibility ... involves destroying/recreating tr and td elements?!? Waaaaaay too much overhead for real-time updates! I had pictured the filtering implementation as simply hiding/showing rows (à la CSS display:none) that stayed put in the DOM.

I had started going down the path of reorganizing the filtering calls to do the whole three-tier shebang on a single row, and to re-filter only the rows that had changed, when I discovered that what filtering actually does is vastly different from what I thought. Do you think it would still be feasible to co-opt the filtering support into doing what I'm looking for? Perhaps there could be a configuration switch that controls whether client-side filtering is accomplished by manipulating the DOM or by using jQuery's .hide()/.show() (possibly even with animation). Would that be a huge deal, or could it be accomplished in a fairly straightforward manner with a plug-in?
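
A minimal TypeScript sketch of the hide/show idea, under stated assumptions: this is not DataTables code, and `StatusRow`, `applyStatusUpdates`, the `selected` set and the "OK"/"FAIL" values are hypothetical. Rows are modelled with only an id, a Status value and a `style.display` property so the logic runs outside a browser; in a page, `style` would be the real TR element's style object. Only rows whose Status actually changed are re-evaluated, and every row stays attached to the table.

```typescript
// Hypothetical row model: an id, the current Status value, and the
// style object whose display property controls visibility.
interface StatusRow {
  id: string;
  statusValue: string;
  style: { display: string };
}

// Applies Status changes and re-filters only the rows that changed.
// Rows are never detached; hidden rows get display "none", visible rows "".
// Returns the number of style writes made.
function applyStatusUpdates(
  rowsById: Map<string, StatusRow>,
  updates: Map<string, string>, // row id -> new Status value
  selected: Set<string>         // Status values the user is filtering on
): number {
  let writes = 0;
  updates.forEach((newStatus, id) => {
    const row = rowsById.get(id);
    if (row === undefined || row.statusValue === newStatus) {
      return; // unknown row, or no real change: skip it
    }
    row.statusValue = newStatus;
    const display = selected.has(newStatus) ? "" : "none";
    if (row.style.display !== display) {
      row.style.display = display;
      writes++;
    }
  });
  return writes;
}

// Three rows, all "OK" and visible; the filter shows only "OK".
const rows = new Map<string, StatusRow>();
["a", "b", "c"].forEach(id => rows.set(id, { id, statusValue: "OK", style: { display: "" } }));
const writes = applyStatusUpdates(rows, new Map([["a", "FAIL"], ["b", "OK"]]), new Set(["OK"]));
console.log(writes); // row "a" is hidden; "b" did not change, so it is skipped
```

This keeps all rows in the DOM, which is workable for a fixed set of roughly 1,000 rows but would not suit very large paged tables.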

Replies

  • allan (Site admin)
    Hi,

    > [filtering] involves destroying/recreating tr and td elements?!?

    That's not the case when using client-side processing (it was in DataTables 1.0-1.3, but never since). If you are using server-side processing, then yes, this has to be the case, since the table doesn't have any knowledge of anything other than what is on the current page. With client-side processing, though, the nodes are retained.

    > simply hiding/showing rows (ala CSS display:none) that stayed put in the DOM

    It would be interesting to see what the speed difference is between inserting and removing child nodes compared to setting the CSS property, since either way a reflow and repaint is going to be required. If nodes were being created, I could see there being a significant difference, but as it is I would imagine the difference isn't that great. Removing nodes is actually quite important from a performance point of view for larger tables: say you have 100,000 rows, that's a lot of TR elements in the DOM if you only have a page size of 10! Also, rows must be removed and re-added for sorting, so it makes sense to use the same mechanism for filtering rather than taking a double hit.
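
To put rough numbers on this trade-off, here is a back-of-the-envelope TypeScript model, purely illustrative: the function names and the row counts are hypothetical, it counts only DOM calls and attached TR elements, and it ignores the reflow and repaint cost that both approaches pay. Detach/reattach is modelled as a brute-force draw (remove every displayed row, then append the rows for the current page); toggling `display` is modelled as one style write per row whose visibility flips.

```typescript
// Hypothetical cost model: DOM calls per redraw and TR elements left in the TBODY.
interface DrawCost {
  domCalls: number;
  attachedRows: number;
}

// Detach/reattach: remove every currently displayed row, then append
// the rows that match the filter, up to one page.
function detachReattachCost(previouslyShown: number, matching: number, pageSize: number): DrawCost {
  const shown = Math.min(matching, pageSize);
  return { domCalls: previouslyShown + shown, attachedRows: shown };
}

// Toggle display: every row stays attached; only rows whose
// visibility flips get a style write.
function toggleDisplayCost(totalRows: number, flipped: number): DrawCost {
  return { domCalls: flipped, attachedRows: totalRows };
}

// Large paged table: 100,000 rows, page size 10, moving to the next page.
const pagedDetach = detachReattachCost(10, 100000, 10); // 20 calls, 10 rows attached
const pagedToggle = toggleDisplayCost(100000, 20);      // 20 calls, 100,000 rows attached

// Small unpaged table: 1,000 rows, 500 visible, 10 rows change
// visibility (7 appear, 3 disappear), leaving 504 visible.
const unpagedDetach = detachReattachCost(500, 504, Infinity); // 1,004 calls
const unpagedToggle = toggleDisplayCost(1000, 10);            // 10 calls

console.log(pagedDetach, pagedToggle, unpagedDetach, unpagedToggle);
```

With paging, detaching keeps the DOM down to one page of rows; without paging and with only a few changed rows, toggling touches far fewer nodes.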

    So, moving towards a solution, can you give me more of an idea of what you are looking to do? Are you using client-side processing, for example? What does your current update code look like? How fast are you getting data for cells (you say 10 a second; is that per cell, per row, or individual data points)?

    Allan
  • george9eg
    I am using client-side processing exclusively. I receive updates to individual values that are pushed to a socket on the client and delivered to the JavaScript on the page through an ActiveX event. Each update arrives as an XML document, which the JavaScript transforms into JSON and then uses to update fields on the screen. Update documents are pushed to the client at rates of up to 1 per second, and each one contains values only for the individual fields that have changed. On the particular page where I'm currently trying to use DataTables, only one column ("Status") has values that change dynamically. Each update document typically contains a new Status value for about 10 rows or fewer, with occasional peaks containing a new Status value for every row. Typical usage would include filtering the table on the Status column and expecting rows to appear and disappear dynamically based on which Status value(s) have been selected.
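
A minimal sketch of this update step, assuming the XML has already been turned into JSON. The `StatusUpdateDoc` shape, the row ids, the "Idle"/"Busy" values and `applyUpdateDoc` are all hypothetical, since the thread does not show the real document format. It illustrates that, because a document names only changed fields, the script can collect the few affected row ids and re-filter just those rows rather than the whole table; it also drops entries that repeat a value the table already holds.

```typescript
// Hypothetical JSON shape of one update document after the XML conversion.
interface StatusUpdateDoc {
  changes: { rowId: string; status: string }[];
}

// Writes the new Status values into the in-memory table data and returns the
// ids of rows whose value really changed; only those rows need re-filtering.
function applyUpdateDoc(statusByRow: Map<string, string>, doc: StatusUpdateDoc): string[] {
  const changedIds: string[] = [];
  for (const change of doc.changes) {
    const current = statusByRow.get(change.rowId);
    if (current === undefined || current === change.status) {
      continue; // unknown row, or the same value the table already holds
    }
    statusByRow.set(change.rowId, change.status);
    changedIds.push(change.rowId);
  }
  return changedIds;
}

const statusByRow = new Map<string, string>([["r1", "Idle"], ["r2", "Idle"], ["r3", "Busy"]]);
const changed = applyUpdateDoc(statusByRow, {
  changes: [
    { rowId: "r1", status: "Busy" }, // real change
    { rowId: "r3", status: "Busy" }, // repeats the current value
    { rowId: "r9", status: "Idle" }, // row not in this table
  ],
});
console.log(changed); // only "r1" is reported
```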

    I set table-layout:fixed, turn off bPaginate, bAutoWidth, bSort, and bSortClasses, and call fnDraw() using setTimeout() to yield processing to the browser first. It still bogs down and locks up the browser (32-bit IE9 on 64-bit Windows 7). IE9's profiler tells me that almost 90% of the time spent in _fnDraw() goes to calls to appendChild() and removeChild(). Since only the values in one column are changing, dropping and re-creating the rest of each row is pure overhead in my scenario, not to mention walking rows that haven't changed at all. The set of rows in the data is fixed, and the number of rows is small enough that it is practical to load them all on the client. Given all this, I am quite sure that hiding/showing individual rows would involve substantially fewer DOM calls and perform much better.
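
One possible refinement of the `setTimeout()` approach, offered as an assumption rather than something the thread reports trying: coalesce bursts of updates so that several arriving before the timer fires trigger only one redraw. `makeCoalescedRedraw` is a hypothetical helper, and `redraw` is any callback; in the setup described here it would call the table's `fnDraw()` (for example on a hypothetical `oTable` variable).

```typescript
// Returns a function that requests a redraw. Requests made while one is
// already queued are absorbed, so a burst of updates produces one draw.
function makeCoalescedRedraw(redraw: () => void, delayMs = 0): () => void {
  let pending = false;
  return () => {
    if (pending) {
      return; // the queued redraw will pick up this update's data too
    }
    pending = true;
    setTimeout(() => {
      pending = false; // clear first so updates during redraw queue a new one
      redraw();
    }, delayMs);
  };
}

// Stand-in for the real draw call (hypothetically oTable.fnDraw()).
let drawCount = 0;
const requestRedraw = makeCoalescedRedraw(() => {
  drawCount++;
});

for (let i = 0; i < 10; i++) {
  requestRedraw(); // ten updates arrive in the same burst
}
setTimeout(() => console.log(`draws after burst: ${drawCount}`), 20);
```

This reduces how often a draw happens, not what each draw costs, so it does not remove the appendChild()/removeChild() time the profiler reports; it only stops that cost being paid several times per burst.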
  • allan (Site admin)
    > dropping and re-creating the rest of each row is pure overhead in my scenario

    It isn't recreating DOM elements. The elements are removed, but not destroyed. They are held onto in JavaScript variables so the same nodes can be reinserted into the DOM.

    Your updates might affect filtering and/or sorting (although you've got sorting disabled, so not in this case), so DataTables needs to do a full draw, and that draw will always remove the TR elements in the TBODY and then re-add them in the order needed.
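
The brute-force draw described here can be sketched in TypeScript roughly as follows. This is a simplified model, not DataTables source: `BodyLike`, `ArrayBody` and `bruteForceDraw` are hypothetical names, the TBODY is reduced to `removeChild()`/`appendChild()` on an array so it runs without a browser, and `display` is a list of indices into a cache of row objects. Because rows come from the cache, the second draw moves the same objects back in rather than creating new ones.

```typescript
// The two TBODY operations the draw needs, so the sketch runs without a browser.
interface BodyLike<T> {
  children: T[];
  removeChild(node: T): void;
  appendChild(node: T): void;
}

// Brute-force draw: detach every row currently in the body, then append the
// rows listed in `display` (indices into rowCache), in order. Returns DOM calls made.
function bruteForceDraw<T>(tbody: BodyLike<T>, rowCache: T[], display: number[]): number {
  let calls = 0;
  while (tbody.children.length > 0) {
    tbody.removeChild(tbody.children[0]);
    calls++;
  }
  for (const index of display) {
    tbody.appendChild(rowCache[index]);
    calls++;
  }
  return calls;
}

// Array-backed stand-in for a TBODY element.
class ArrayBody<T> implements BodyLike<T> {
  children: T[] = [];
  removeChild(node: T): void {
    const i = this.children.indexOf(node);
    if (i >= 0) {
      this.children.splice(i, 1);
    }
  }
  appendChild(node: T): void {
    this.removeChild(node); // as in the DOM, appending an attached node moves it
    this.children.push(node);
  }
}

// Row objects are created once and kept, like the retained TR nodes.
const rowCache = [{ id: "r0" }, { id: "r1" }, { id: "r2" }];
const tbody = new ArrayBody<{ id: string }>();
bruteForceDraw(tbody, rowCache, [0, 1, 2]);
const secondDrawCalls = bruteForceDraw(tbody, rowCache, [2, 0]); // r1 filtered out, r2 now first
console.log(secondDrawCalls, tbody.children.map(r => r.id));
```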

    Where this could be optimised is that the order and filtering might not have changed, so a draw would not be needed at all, or a single node might just need to be replaced with a different one, etc. That is certainly an area where DataTables could be optimised considerably, and I will look at it for future versions.
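
A sketch of the first optimisation mentioned here, under an explicit assumption: sorting is off, so every display list is a subsequence of the same master row order and the surviving rows never change relative position. `planIncrementalDraw` is a hypothetical helper, not a DataTables API. It reports whether a draw can be skipped entirely and otherwise lists only the rows to detach and attach; in a browser, each added row would go in with the DOM's `insertBefore()` ahead of the next row that is already attached.

```typescript
// Result of comparing the previous and next display lists (row indices).
interface DrawPlan {
  skip: boolean;     // identical display list: no DOM work needed at all
  removed: number[]; // rows to detach
  added: number[];   // rows to attach, in display order
}

// Assumes sorting is off: prev and next are subsequences of one fixed
// master order, so only membership can change, never relative order.
function planIncrementalDraw(prev: number[], next: number[]): DrawPlan {
  const prevSet = new Set(prev);
  const nextSet = new Set(next);
  const skip = prev.length === next.length && prev.every((row, i) => row === next[i]);
  return {
    skip,
    removed: prev.filter(row => !nextSet.has(row)),
    added: next.filter(row => !prevSet.has(row)),
  };
}

// Row 1 is filtered out and row 5 comes in; rows 0, 2, 3, 4 stay put.
const before = [0, 1, 2, 3, 4];
const after = [0, 2, 3, 4, 5];
const plan = planIncrementalDraw(before, after);
// Brute force: 5 removals + 5 appends = 10 DOM calls.
// Incremental: plan.removed is [1] and plan.added is [5], so 2 DOM calls.
console.log(plan);
```

With sorting enabled, a reorder of the same set of rows would leave `removed` and `added` empty while `skip` is false, so a real plug-in would need to fall back to a full draw in that case.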

    So yes, without doubt the draw method could be optimised significantly for this use case, whereas at the moment it is very general so that it can cope with everything that gets thrown at DataTables.

    Allan
  • george9eg
    Ah yes, you're right. I saw elements being removed and added back, and my brain thought "destroy and recreate." Silly brain. :o) (...or a bit scary, maybe.)
  • allan (Site admin)
    I'll have a look at how I could implement an algorithm that is better than the brute-force one currently employed, in a future version of DataTables. It sounds like it might be an ideal candidate for a plug-in (pick either a more complex method with extra code that is traded off for speed, or just the generic, straightforward method).

    Added to the list :-)

    Thanks,
    Allan