avaer 15 hours ago
If you haven't tried AI modeling pipelines in the last year you'll be surprised.
The star of the show here is https://platform.worldlabs.ai/ (author works there, I don't) which is really good. There's also Meshy.ai (which this repo doesn't seem to use?) for non-scene stuff that's right up there in quality. There's texturing, auto-rigging, etc.
The latest VLLM models have true pixel image grounding which means you can totally ask your AI about pixel coordinates of things, so you get 3d perception for edits and anything else you need.
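A pixel coordinate from a grounded VLM is still 2D. One common way to lift it into 3D is pinhole back-projection, which also needs a depth value for that pixel and the camera intrinsics. A minimal sketch, assuming you already have a metric depth estimate (e.g. from a separate depth model) and known or estimated intrinsics; all numbers below are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length along x, in pixels
    fy: float  # focal length along y, in pixels
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> tuple:
    """Lift pixel (u, v) at the given depth into camera-space (x, y, z), pinhole model."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


if __name__ == "__main__":
    # Hypothetical: the VLM grounds "the chair" at pixel (420, 240),
    # and a depth estimator reports 2.0 m at that pixel.
    k = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(backproject(420.0, 240.0, 2.0, k))  # (0.4, 0.0, 2.0)
```

The result is in the camera's coordinate frame; placing it in a world or scene frame would additionally need the camera pose.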
I'm actually surprised I don't see this stuff being used more; I think it's because most pipelines are hard-baked with the assumption that your 3D assets are files you get from an artist, not something you can imagine up in minutes in a script. The technology is moving faster than the industry can keep up with.
NitpickLawyer 13 hours ago
> I'm actually surprised I don't see this stuff being used more;
There's very little incentive to publicly admit you're using this tech. In fact there are a lot of reasons not to.
spookie 2 hours ago
Outside trained images or objects, Hunyuan falls apart pretty hard.
Out of 30 objects I tried, only 4 had relative success. And even then, the topo ain't great.
pawelduda 14 hours ago
What's the best option for converting house blueprints or 3D rendered images back to models?
totalview 10 hours ago
There aren't good workflows for doing this for a whole scene, especially where accuracy needs to be preserved. One-shot models seem to be very good at producing 3D from a single angle, but multi-shot is very shoddy at this point, and without seeing behind things you have no clue what is actually there.
Also, try Meshy and look at how many polygons or triangles you get from any of the model objects. Hundreds of thousands, and even after you retopologize it's still in the high tens of thousands.
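If you want to check those counts yourself, and assuming you export the model as Wavefront OBJ, a few lines are enough. An n-sided face is counted as n - 2 triangles (a fan triangulation), and trailing `#` comments are ignored:

```python
def count_triangles_obj(text: str) -> int:
    """Count triangles in Wavefront OBJ text; an n-sided face counts as n - 2 triangles."""
    total = 0
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if parts and parts[0] == "f":
            corners = len(parts) - 1  # vertex references on this face
            if corners >= 3:
                total += corners - 2
    return total


if __name__ == "__main__":
    sample = "v 0 0 0\nv 1 0 0\nv 1 1 0\nv 0 1 0\nf 1 2 3\nf 1 2 3 4\n"
    print(count_triangles_obj(sample))  # 3
```

For a real export, something like `with open("model.obj") as fh: print(count_triangles_obj(fh.read()))` would do; the file name is hypothetical.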
tombert 15 hours ago
This is cool as hell.
I remember like seventeen years ago, Microsoft had "Photosynth", which would make 3D environments based on a bunch of images, and seventeen-year-old-tombert thought it was one of the most amazing things to ever be done on a computer.
Doing this with just one image makes this at least an order of magnitude cooler. I will be playing with this over the weekend.
agentifysh 12 hours ago
yeah I remember that, it was pretty neat. Each generation, 3d stuff gets wilder and wilder.
I used to spend all day on Bryce3D creating 3d landscapes, leaving the computer on all night to render like 10 seconds of video of a flyover sunset.
bit of a rant here, but we are definitely speedrunning 3d and it's just going to get wilder once we get glasses-free bounded AR... projecting 3d video streams and objects in front of our phones (this one I know Samsung is already working on) and rooms.
taffydavid 15 hours ago
Photosynth was awesome, I really miss it, but it was more of a panorama tool than a 3d environment.
My Pixel 6 has a photo sphere mode on the camera, which is the same thing.
tombert 15 hours ago
You could actually make it have a rough 3D environment as well. Their demo had a model of Piazza San Marco with dots to estimate the actual buildings and the like.
taffydavid 15 hours ago
Oh yes, I remember that now!
andrew_kwak 1 hours ago
That's pretty wild. How does it handle different lighting conditions in the source image? Curious if the results look natural or if they need a lot of tweaking.
toisanji 15 hours ago
I see it uses World Labs. I've tested that quite a bit and the results were not really that usable; it hallucinated so many parts outside of the walls that made no sense. It would be fine if it hallucinated and the result made sense, but if it doesn't make sense, I'm not sure what the point of inputting a single image is. I've actually had better luck using gpt image 2 instead.
vunderba 10 hours ago
Yeah, even the latest version, Marble 1.1, which this repo uses, can make a royal mess of things, especially outdoor environments.
agentifysh 12 hours ago
I wonder if there is something similar but for creating isometric sprites? I burned through $30 yesterday realizing that I can't just get image gen to give me isometric static/animated sprites with consistency... even the best image gen cannot do this, and I'm just baffled how difficult isometric sprites are compared to 3d mesh gen.
I'm at a crossroads: do I opt for 3d mesh isometrics with higher hardware requirements on mobile phones, or stick to isometric sprites, which nobody seems to be generating via AI reliably (happy to be corrected here if anybody does find a way)?
ShinyLeftPad 2 hours ago
> I'm at a crossroad , do I opt for 3d mesh isometrics with more hardware requirements for mobile phones or stick to isometric sprite which nobody seems to be generating via AI
Just find an artist or learn to draw
taylorfinley 4 hours ago
I had a similar experience recently while helping my 5 year old daughter vibe code a sandcastle-themed tower defense game (https://sandcastles.finley.lol).
I ended up thinking it might be easier to generate rigged models, animate them, and capture from an iso perspective, then do some kind of pixel art style transfer on the masked sprite sheet. Eventually I realized my kid didn't really care too much about the visuals so I didn't get too far with it.
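For the "capture from an iso perspective" step, a renderer would typically use an orthographic camera. The screen mapping for the common 2:1 pixel-art convention (strictly dimetric rather than true isometric) can also be written directly, which is handy for lining sprites up on a tile grid. A minimal sketch; the tile size and height scale are hypothetical parameters you would match to your own sprite grid:

```python
def iso_project(x: float, y: float, z: float,
                tile_w: float = 64.0, tile_h: float = 32.0,
                z_scale: float = 32.0) -> tuple:
    """Map a world point (x, y on the ground grid, z up) to 2:1 isometric screen pixels.

    Screen x grows to the right and screen y grows downward, as in most 2D engines.
    """
    sx = (x - y) * tile_w / 2
    sy = (x + y) * tile_h / 2 - z * z_scale
    return (sx, sy)


if __name__ == "__main__":
    for p in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]:
        print(p, "->", iso_project(*p))
```

One step along a ground axis moves half a tile sideways and half a tile-height down, while height moves straight up the screen.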
I've been trying to use this to generate 3d character models from images. I am enjoying 3d printing these models to mess with my kids.
Not much of what I've found runs on local models but I'm always on the lookout. Meshy.ai (mentioned here) offers really nice generation but the cost adds up quickly.
washadjeffmad 11 hours ago
There are quite a few now, and more coming out regularly to surprisingly little fanfare.
Tencent's Hunyuan3D (https://github.com/Tencent-Hunyuan) is a single/multi view photogrammetry replacement, which image-blaster is based on.
The workflows to make meshes watertight for 3D printing are all pretty effective.
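A quick sanity check before printing is whether the mesh is closed. In a watertight, edge-manifold mesh every undirected edge is shared by exactly two faces. A minimal sketch of that test on an indexed face list. It is a necessary condition only and doesn't detect self-intersections or flipped normals. Libraries such as trimesh expose a similar `is_watertight` property:

```python
from collections import Counter
from typing import Iterable, Sequence


def is_watertight(faces: Iterable[Sequence[int]]) -> bool:
    """Return True if every undirected edge is used by exactly two faces."""
    edge_uses = Counter()
    for face in faces:
        n = len(face)
        for i in range(n):
            a, b = face[i], face[(i + 1) % n]
            edge_uses[(min(a, b), max(a, b))] += 1  # ignore edge direction
    return bool(edge_uses) and all(count == 2 for count in edge_uses.values())


if __name__ == "__main__":
    tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
    print(is_watertight(tetra))       # True: closed tetrahedron
    print(is_watertight(tetra[:-1]))  # False: one face removed leaves a hole
```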
vunderba 10 hours ago
This is more like a Claude-based skill set that orchestrates a bunch of different, separate systems. The closest equivalent to Trellis would probably be its use of Hunyuan3D to create some of the 3D object models.
From what I can tell, it takes an image and first segments it into objects versus environment, then sends the environment to Marble 1.1 to generate a Gaussian splat and sends all the isolated individual objects to Hunyuan to generate GLB model files.
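Going only by that description, the orchestration reduces to a small fan-out. A minimal sketch; `segment`, `env_to_splat` and `object_to_glb` are hypothetical stand-ins passed in as callables, not the repo's actual functions or the Marble/Hunyuan APIs, and placing the objects back into the scene is left out because the description doesn't cover it:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Segmentation:
    environment: bytes                                  # image with the objects removed
    objects: List[bytes] = field(default_factory=list)  # one isolated crop per object


@dataclass
class Scene:
    splat: bytes         # Gaussian splat for the environment
    meshes: List[bytes]  # one GLB file per isolated object


def build_scene(
    image: bytes,
    segment: Callable[[bytes], Segmentation],  # hypothetical: objects vs. environment
    env_to_splat: Callable[[bytes], bytes],    # hypothetical: e.g. a call out to Marble
    object_to_glb: Callable[[bytes], bytes],   # hypothetical: e.g. a call out to Hunyuan3D
) -> Scene:
    seg = segment(image)
    splat = env_to_splat(seg.environment)
    meshes = [object_to_glb(obj) for obj in seg.objects]
    return Scene(splat=splat, meshes=meshes)


if __name__ == "__main__":
    # Fake back ends so the sketch runs without any external service.
    fake_segment = lambda img: Segmentation(environment=b"room", objects=[b"chair", b"lamp"])
    scene = build_scene(b"photo", fake_segment, lambda e: b"splat:" + e, lambda o: b"glb:" + o)
    print(scene.splat, scene.meshes)
```

Injecting the back ends as callables keeps the sketch testable and makes it easy to swap one generator for another.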
ZiiS 15 hours ago
So Blade Runner's Esper photo analysis went from ruining the suspension of disbelief to reality quicker than most magic.
taffydavid 15 hours ago
Well, in Blade Runner he looks around a corner and zooms in to microscopic detail on something not visible in the photo.
But the Esper interface is all voice-activated and doesn't talk back, which I think is very prescient and more likely the way things will go. I'd much rather voice assistants just did the thing I want them to do than talk back to me.
janfoeh 14 hours ago
I've never forgotten this SIGGRAPH demo from almost twenty years ago, in which the authors effectively switch camera and light source computationally (... in a static scene) [1]
Ever since then, I have viewed scenes such as the "lingerie store scene" from Enemy of the State [2] with a little bit less eye rolling...
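Assuming the demo is the "dual photography" technique, the trick rests on Helmholtz reciprocity. If a light transport matrix T maps a projector pattern l to a camera image c = T l, then the transpose gives the scene as seen from the projector and lit from the camera's position: c' = T^T l'. A toy sketch with a made-up 2-pixel camera and 3-pixel projector; measuring T for a real scene is the hard part and isn't shown:

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


# Hypothetical transport matrix: rows = camera pixels, columns = projector pixels.
# T[i][j] is how much light from projector pixel j reaches camera pixel i.
T = [
    [0.5, 0.0, 0.25],
    [0.0, 0.75, 0.125],
]

if __name__ == "__main__":
    primal = matvec(T, [1.0, 1.0, 1.0])      # camera image, every projector pixel on
    dual = matvec(transpose(T), [1.0, 1.0])  # "projector's view", lit from the camera side
    print(primal, dual)
```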
Thanks for posting that demo, kinda sad I've never seen it till now.
MattCruikshank 13 hours ago
I went to high school with the sales clerk, Ivana Miličević.
It's always weird to see her in stuff.
taffydavid 12 hours ago
Oh, like Banshee! Which is also weird to see because of Homelander.
nomadar 12 hours ago
Curious about the actual architecture. From the outside it looks like Gaussian splatting anchored to roughly one viewpoint, since the moment you wander outside the original frame or behind an object, it becomes messy.
But Ben Mildenhall is one of the co-founders and a NeRF co-author (https://arxiv.org/abs/2003.08934), so I'm betting that whatever they're doing is more interesting than naive splatting.
Curious if OP can share anything about the pipeline.
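For reference, the core of a standard 3D Gaussian splatting renderer shades each pixel by sorting the projected Gaussians by depth and alpha-compositing them front to back: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j). A minimal per-pixel sketch of that standard step. It makes no claim about World Labs' actual architecture, and the 0.99 clamp and early-exit threshold are illustrative:

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Splat2D:
    mean: Tuple[float, float]          # projected center, in pixels
    cov: Tuple[float, float, float]    # 2D covariance [[a, b], [b, c]] stored as (a, b, c)
    color: Tuple[float, float, float]  # RGB in [0, 1]
    opacity: float                     # learned opacity in [0, 1]
    depth: float                       # view-space depth, used only for sorting


def splat_alpha(s: Splat2D, px: float, py: float) -> float:
    """Opacity of one projected Gaussian at pixel (px, py)."""
    a, b, c = s.cov
    det = a * c - b * b
    dx, dy = px - s.mean[0], py - s.mean[1]
    # d^T * inverse(cov) * d, using inverse([[a, b], [b, c]]) = [[c, -b], [-b, a]] / det
    mahalanobis_sq = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return s.opacity * math.exp(-0.5 * mahalanobis_sq)


def shade_pixel(splats: List[Splat2D], px: float, py: float) -> Tuple[float, float, float]:
    """Front-to-back alpha compositing of all splats at one pixel."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for s in sorted(splats, key=lambda g: g.depth):  # nearest first
        alpha = min(0.99, splat_alpha(s, px, py))
        for k in range(3):
            color[k] += transmittance * alpha * s.color[k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque; stop early
            break
    return (color[0], color[1], color[2])


if __name__ == "__main__":
    red_front = Splat2D((0.0, 0.0), (4.0, 0.0, 4.0), (1.0, 0.0, 0.0), 0.5, depth=1.0)
    blue_back = Splat2D((0.0, 0.0), (4.0, 0.0, 4.0), (0.0, 0.0, 1.0), 0.5, depth=2.0)
    print(shade_pixel([blue_back, red_front], 0.0, 0.0))  # (0.5, 0.0, 0.25)
```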
mattbillenstein 12 hours ago
My team is working in the character animation space which might complement this: https://uthana.com/
What about creating 3d meshes from multiple photos of the same object?
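Multi-photo reconstruction (photogrammetry / structure from motion) ends with a triangulation step. Once the camera poses are known and a feature has been matched across two photos, each match defines a ray, and the 3D point is where the rays (nearly) meet. A minimal sketch of the simple midpoint method, assuming the camera centers and ray directions are already known; estimating those poses and matching features is the harder part and isn't shown:

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def add(p: Vec3, q: Vec3) -> Vec3:
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])


def sub(p: Vec3, q: Vec3) -> Vec3:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def scale(p: Vec3, k: float) -> Vec3:
    return (p[0] * k, p[1] * k, p[2] * k)


def dot(p: Vec3, q: Vec3) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def triangulate_midpoint(o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays o1 + s*d1 and o2 + t*d2."""
    w0 = sub(o1, o2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no stable intersection")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = add(o1, scale(d1, s))  # closest point on ray 1
    p2 = add(o2, scale(d2, t))  # closest point on ray 2
    return scale(add(p1, p2), 0.5)


if __name__ == "__main__":
    # Two hypothetical cameras 2 units apart, both seeing the same feature.
    print(triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 0.0, 1.0),
                               (2.0, 0.0, 0.0), (-1.0, 0.0, 1.0)))  # (1.0, 0.0, 1.0)
```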
PacificSpecific 7 hours ago
I've had some success with that. It's pretty neat!
Haven't used it professionally mainly because the titles I've worked on lately aren't realistic so you can't really procure the materials to scan.
SilentM68 10 hours ago
Very cool.
May I ask if Claude is the only option to use the tool?
Sol Roth
par 14 hours ago
I'm ready to make a game with this, or something similar. Open to suggestions or guides on tooling and asset pipelines that use AI, if anyone has any.
https://github.com/Microsoft/TRELLIS
Facebook Research has extended SAM to 3D (https://github.com/facebookresearch/sam-3d-objects), separating as 'Objects' and 'Body'.
[1] - https://www.youtube.com/watch?v=p5_tpq5ejFQ
[2] - https://youtu.be/3EwZQddc3kY?t=6
Example: https://uthana.com/app/preview/cXi2eAP19XwQ/mH7opbcqZE4P