AdrDar87 (OP) • 2mo ago

Altair performance questions

Hi team, thanks for your work on this package. Apologies for the dumb question. I have been working with large datasets and using Altair for dynamic plotting with slider filters. In Jupyter, with alt.data_transformers.enable("vegafusion") and alt.renderers.enable("jupyter", offline=True), this is very fast (less than 1 second per update). I tried to achieve the same filtering with marimo sliders applied directly to my polars DataFrame before plotting. This works, but it is slower (2 to 5 seconds), since polars has to do the filtering and two cells have to re-run. For this use case the JupyterChart (jupyter renderer) is faster. Are there any plans to support it in marimo? (The other renderers are too slow for large datasets.)
10 Replies
Myles Scolnick • 2mo ago
Can you share an example with us? You can use alt.data_transformers.enable("vegafusion") in marimo as well.
AdrDar87 (OP) • 2mo ago
Let me send something tomorrow
AdrDar87 (OP) • 2mo ago
marimo | Notebook Oct 16, 2024, 7:06 AM
AdrDar87 (OP) • 2mo ago
With a heavier dataset, the difference becomes noticeable. https://altair-viz.github.io/user_guide/jupyter_chart.html I'm basically asking whether JupyterChart will be supported at some point. Thank you for looking into this 🙂
Myles Scolnick • 2mo ago
I can't see any slowness: your example takes about 83 ms to re-render for me (although it's not a large dataset). Happy to debug an example if you can send one with a larger dataset. Also, I was able to render a JupyterChart as shown in the Altair docs:
import altair as alt
import pandas as pd
source = pd.DataFrame({
'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
chart = alt.Chart(source).mark_bar().encode(
x='a',
y='b'
)

jchart = alt.JupyterChart(chart)
jchart
Myles Scolnick • 2mo ago
[Image attachment]
Myles Scolnick • 2mo ago
We also have a doc on improving performance with data transformers: https://docs.marimo.io/api/plotting.html#performance-and-data-transformers
AdrDar87 (OP) • 2mo ago
Thank you for looking into this! I'll have another look over the coming days 🙂
"The JupyterChart widget and the "jupyter" renderer are designed to work with the VegaFusion data transformer to evaluate data transformations interactively in response to selection events. This avoids the need to transfer the full dataset to the browser, and so supports interactive exploration of aggregated datasets on the order of millions of rows."
I'm curious, is marimo_csv doing the same thing under the hood?
https://marimo.io/p/dev/notebook-dn5hsm-vfmcetat8211da33ux796r Are you able to run this notebook with either renderer? The jupyter renderer (which works instantly in Jupyter, FYI) runs into an output limit for me, and the marimo_csv plot is empty for some reason.
I still need to get you an example of a heavy dataset where the re-rendering time is long.
Myles Scolnick • 2mo ago
marimo_csv won't work like vegafusion, but you can also just use vegafusion, which should be performant. I can look at the jupyter data_transformer as well to see if we can bridge some capabilities.
AdrDar87 (OP) • 2mo ago
That would be nice! 🙂 Thank you for your help!