also, if you want to learn computer vision, I maintain a whole repo with notebooks showing how to use and fine-tune different CV models like SAM, YOLOv8, GroundingDINO and of course PaliGemma
this image is 3840x2160; running the model, even with an increased input resolution (1280) and a lowered confidence threshold (0.2), doesn't yield many detections
InferenceSlicer processes high-resolution images by dividing them into smaller segments, detecting objects within each, and aggregating the results.
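Here is a minimal, stdlib-only sketch of the slice, detect, and aggregate idea. It is not supervision's implementation. Every name in it is hypothetical: `tile_offsets`, `sliced_inference`, the `detect` callback, and the fake objects. The real InferenceSlicer hands pixel slices to a user callback. To stay self-contained, this sketch passes tile coordinates instead. It merges the per-tile results with a simple greedy NMS, and the library's default merge strategy may differ.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score


def tile_offsets(length: int, tile: int, overlap: int) -> List[int]:
    """Start coordinates of overlapping tiles that cover [0, length) along one axis."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]
    starts = list(range(0, length - tile + 1, tile - overlap))
    # Add a final tile flush with the border so no pixels are skipped.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes in x1, y1, x2, y2 format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float) -> List[Box]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap a kept one."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return kept


def sliced_inference(
    width: int,
    height: int,
    detect: Callable[[int, int, int, int], List[Box]],
    tile: int = 640,
    overlap: int = 128,
    iou_threshold: float = 0.5,
) -> List[Box]:
    """Run `detect` on every tile, shift boxes to image coordinates, then merge."""
    all_boxes: List[Box] = []
    for y0 in tile_offsets(height, tile, overlap):
        for x0 in tile_offsets(width, tile, overlap):
            x1, y1 = min(x0 + tile, width), min(y0 + tile, height)
            # `detect` returns boxes in tile-local coordinates.
            for bx1, by1, bx2, by2, score in detect(x0, y0, x1, y1):
                all_boxes.append((bx1 + x0, by1 + y0, bx2 + x0, by2 + y0, score))
    # Objects inside overlap regions are found by several tiles, so deduplicate.
    return nms(all_boxes, iou_threshold)


# Hypothetical ground truth standing in for a real model: two small objects.
# The second object lies in the overlap of four tiles.
objects: List[Box] = [(100, 100, 120, 115, 0.9), (600, 600, 630, 620, 0.8)]


def fake_detect(x0: int, y0: int, x1: int, y1: int) -> List[Box]:
    """Return the objects fully inside the tile, in tile-local coordinates."""
    return [
        (ox1 - x0, oy1 - y0, ox2 - x0, oy2 - y0, s)
        for ox1, oy1, ox2, oy2, s in objects
        if ox1 >= x0 and oy1 >= y0 and ox2 <= x1 and oy2 <= y1
    ]


detections = sliced_inference(3840, 2160, fake_detect)
print(detections)  # [(100, 100, 120, 115, 0.9), (600, 600, 630, 620, 0.8)]
```

With a real detector, `detect` would crop `image[y0:y1, x0:x1]` and run the model on that crop. Each object then covers a larger share of the model's input than in the full 3840x2160 frame. The overlap between tiles keeps objects that straddle a tile boundary from being cut in half. The NMS step removes the duplicates this overlap creates.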