foat, 4mo ago

Programmatically interact with marimo

Is it possible to interact with the kernel directly, in a programmatic way, without going through the UI? I want my Python backend server to be able to add, remove, and edit cells, get state, and get the dependency/DAG structure of a notebook. TY
9 Replies
Akshay, 4mo ago
No, that's not possible today. All those operations are part of an internal API that is subject to change drastically, and it's unfortunately too early to standardize. Out of curiosity, what is your use case?
foat (OP), 4mo ago
Thanks for the prompt reply. This is for a small project I'm working on that involves running Python code in a reproducible fashion. It's for the same kind of use cases people use notebooks for (explore -> pipeline). The crux of what I'm interested in is playing around with the UI/frontend, basically not a notebook, so I was hoping there was a clean way to manipulate code cells and get info about the relationships between them.
I really like what you're doing with marimo. I also checked out ipyflow and hex.tech. I think ipyflow might give me some ability to do this if I interact with it through the REST(-ish?) API of the Jupyter server. I've looked at the internal (marimo/_ast) modules and see what you mean. I'm a bit scared of touching that; there's a lot going on that I don't understand.
In any case, last night this got me thinking about how these systems even parse the Python code to figure this out, and I've made a simple static analysis thingy. I'm excited about adding more features and playing around with it. I think I might keep doing this from scratch, which would be educational. I'm very grateful for the ipyflow papers, which give a lot of insight into how this works.
Actually, I'm curious: are all three of you players in the space (marimo, hex, ipyflow) doing the DAG parsing in fundamentally the same way? It seems only marimo and ipyflow are open source, and I wonder how you see the comparison.
Akshay, 4mo ago
Yes, I think that would definitely be very educational! The parsing is in marimo/_ast/visitor.py.
marimo and ipyflow take very different approaches. ipyflow uses runtime analysis, trying to react to mutations, whereas marimo relies exclusively on static analysis, meaning the DAG only takes into consideration variable definitions and variable references. In particular, we don't track mutations to objects at all, and we don't do any runtime tracing of your code.
This is intentional, because tracing and trying to detect mutations is a losing battle: it's a fundamentally impossible task in Python, and there will always be edge cases you can't cover, leading to a poor development experience. I used to work on TensorFlow, and there was a team that tried to do runtime tracing of Python code to detect mutations, and it just didn't really work. In contrast, in marimo, it's easy for the user to understand how their DAG will be formed.
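To make the "definitions and references only" idea concrete, here is a minimal sketch using nothing but Python's standard `ast` module. It is not marimo's code from marimo/_ast/visitor.py; `defs_and_refs`, `build_dag`, and the cell ids are hypothetical names invented for illustration. Scoping is heavily simplified (every name is treated as global, so function parameters and comprehension variables are not handled), and rejecting a name defined in two cells is a choice made by this sketch.

```python
import ast
import builtins


def defs_and_refs(source: str) -> tuple[set[str], set[str]]:
    """Return (names defined, names referenced) by one cell's source.

    Simplification: every name is treated as a global, so function
    parameters and comprehension variables are not scoped correctly.
    """
    defs: set[str] = set()
    refs: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                defs.add((alias.asname or alias.name).split(".")[0])
    # Names the cell defines itself, and builtins, are not dependencies
    # on other cells.
    refs -= defs
    refs -= set(dir(builtins))
    return defs, refs


def build_dag(cells: dict[str, str]) -> dict[str, set[str]]:
    """Map each cell id to the set of cell ids it depends on."""
    analyzed = {cid: defs_and_refs(src) for cid, src in cells.items()}
    definer: dict[str, str] = {}
    for cid, (defs, _) in analyzed.items():
        for name in defs:
            if name in definer:
                # Sketch-level choice: one defining cell per name.
                raise ValueError(
                    f"{name!r} is defined in both {definer[name]!r} and {cid!r}"
                )
            definer[name] = cid
    return {
        cid: {definer[name] for name in refs if name in definer}
        for cid, (_, refs) in analyzed.items()
    }


cells = {
    "a": "xs = [1, 2, 3]",
    "b": "total = sum(xs)",
    "c": "xs.append(4)",  # mutates xs, but defines nothing
    "d": "print(total)",
}
dag = build_dag(cells)
for cid, deps in dag.items():
    print(cid, "depends on", sorted(deps))
```

In this sketch "b" and "c" both depend on "a" because they reference `xs`, and "d" depends on "b" through `total`. There is no edge from "c" to "b": the `append` in "c" is a mutation, not a definition, so a purely static DAG never reruns "b" because of it.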
Akshay, 4mo ago
One day we'll write a paper about it. I've written a small amount about how all this works in this blog post: https://marimo.io/blog/lessons-learned
Lessons learned reinventing the Python notebook
Designing a notebook that can be shared as an app, run as a script, versioned with git, and more
foat (OP), 4mo ago
Wow, this is so helpful, thank you so much! I did watch a JupyterCon presentation about ipyflow in which the presenter mentioned runtime analysis, and I also intuitively thought that would be really hard. So if it is a losing battle, how would you evaluate what the state of ipyflow is? Are they missing edge cases? Also on your point about not being able to abstract out the jupyter-server-style api (what I asked originally). What do you feel is complicated about it? Isn't that abstraction naturally separate from everything else and would be neatly packageable?
Akshay, 4mo ago
> So if it is a losing battle, how would you evaluate what the state of ipyflow is? Are they missing edge cases?
Missing edge cases, yes. There's also a runtime overhead, somewhere between 2x and 4x, I believe. I talked with Stephen Macke somewhat recently, and he believes the static approach we've taken is better for both users and developers. I spoke with Chris Lattner and he also endorses the static approach; that's as good an endorsement as you can get in my book 🙂
> Also on your point about not being able to abstract out the jupyter-server-style api (what I asked originally). What do you feel is complicated about it? Isn't that abstraction naturally separate from everything else and would be neatly packageable?
In theory it's of course doable. But Myles and I are developing really rapidly, and our internal abstractions change rapidly, though of course our public API is relatively stable. We can't abstract out the internals into a public API until we're ready to commit to not changing it, because we don't want to break our users' code. So it's not that it's hard to expose our APIs; it's just too early.
Akshay, 4mo ago
If you like, you can also check out a talk I gave on marimo at North Bay Python: https://www.youtube.com/watch?v=9R2cQygaoxQ&t=1s
North Bay Python (YouTube): "marimo: an open-source reactive notebook for Python"
Akshay Agrawal https://pretalx.northbaypython.org/nbpy-2024/talk/LSLE9A
We introduce marimo, an open-source reactive notebook for Python that addresses several common complaints about first-generation notebooks. marimo notebooks are reproducible, with a...
foat (OP), 4mo ago
Ty this is very helpful!
Akshay, 4mo ago
No problem!