I think you can try SAM3 #97

@Ying0524

I collected some videos and ran them through the project, but the results were not particularly good. The YOLO model mislocalizes objects in certain frames, and there doesn't seem to be a way to correct those detections manually (or perhaps I just haven't found the option). Recently I noticed an open-source model called SAM3 that does an excellent job at image segmentation. However, its inference is fairly slow and it can't run in real time: on an RTX 4070 Ti it takes about 0.2 seconds per image, which is roughly 5 frames per second. We could work around that by extracting keyframes and running prediction only on those. Bro, do you think this would be feasible?
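To make the keyframe idea a bit more concrete, here is a rough sketch of what I have in mind. It is not tied to this project's code or to SAM3's real API: `segment` is a hypothetical placeholder for whatever model call ends up being used, the function names are made up for illustration, and the 30 FPS figure is just an assumed video frame rate. Only the 0.2 seconds per image comes from my measurement above.

```python
import math
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def keyframe_stride(video_fps: float, seconds_per_inference: float) -> int:
    """Smallest frame stride at which inference can keep pace with playback."""
    # Round first so floating-point noise in the product cannot bump the ceiling.
    frames_per_inference = round(video_fps * seconds_per_inference, 6)
    return max(1, math.ceil(frames_per_inference))


def segment_keyframes(
    frames: Sequence[Frame],
    segment: Callable[[Frame], Result],  # hypothetical stand-in for the slow model
    stride: int,
) -> List[Optional[Result]]:
    """Run `segment` on every `stride`-th frame and reuse the latest result in between."""
    results: List[Optional[Result]] = []
    last: Optional[Result] = None
    for index, frame in enumerate(frames):
        if index % stride == 0:
            # Keyframe: pay the full inference cost here.
            last = segment(frame)
        # Non-keyframes simply inherit the most recent keyframe's result.
        results.append(last)
    return results


# Assumed 30 FPS video, measured 0.2 s per image on an RTX 4070 Ti.
stride = keyframe_stride(video_fps=30, seconds_per_inference=0.2)
```

Reusing the last keyframe's result on the frames in between is the most naive option. If objects move quickly, something like a lightweight tracker or interpolation between keyframes would probably be needed, and picking keyframes by scene change rather than a fixed stride might work better.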
