The main difference is that binwalk goes through a file linearly, searching for ...

The main difference is that binwalk goes through a file linearly, searching for patterns like magic bytes, and tries to extract everything it finds.

The problems with this:

- very noisy, finding a lot of false positives (license code, format inside another format, etc).

- very slow, trying to extract irrelevant things

- imprecise, because it finds patterns in the middle of a file, where it's actually not relevant on the first level of extraction

unblob solves these problems by being smarter about the file formats, recognizing them by their specification, for example unpacks format header structs and carves out files based the information in the header (size, offset). See a simple example for NTFS [1].

We also went to great lengths preventing unnecessary work by skipping formats inside another [2]. We are using hyperscan [3] instead of grepping byte sequences with Python, which is orders of magnitudes faster. It can also handle 4Gb+ files because of this which binwalk cannot.

It's used for a year now in production and it's way more precise and faster than binwalk. We are getting less false-positives too, and even if unblob fails to extract everything, we still get meaningful information out of firmwares, where binwalk just failed with no output previously.

[1]: https://github.com/onekey-sec/unblob/blob/main/unblob/handle...

[2]: https://github.com/onekey-sec/unblob/blob/main/unblob/proces...

[3]: https://github.com/intel/hyperscan