Understanding Component Scanning

Black Duck Component Scanning provides an automated way to determine the set of open source software components that make up a software project. It helps organizations manage their use of open source by identifying and cataloging components and by providing additional metadata for those components, such as license, vulnerability, and project health information. Users run the scanner on software artifacts on their local computers, and the scan automatically generates a bill of materials (BOM) that can be linked to a specific project in Black Duck.

Black Duck Component Scanning can extract the following archive types:

  • AR

  • ARJ

  • CPIO

  • DUMP

  • TAR

  • RPM

  • ZIP

  • 7z

Archives may optionally be compressed using any of the following compression algorithms:

  • Bzip2

  • Gzip

  • Pack200

  • XZ

  • LZMA

  • Snappy

  • Z (compress)

  • DEFLATE
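The two lists above describe two layers: an archive format and an optional compression wrapper around it. The following minimal sketch shows that layering by guessing both from a file name. The extension tables are assumptions based on common naming conventions, not the scanner's actual detection logic, and they leave out formats without a widely used extension (for example DUMP, Pack200, Snappy, and DEFLATE). The name `describe_archive` is hypothetical.

```python
# Illustrative only: these extension tables are assumptions based on common
# naming conventions, not the scanner's actual detection logic.
ARCHIVE_EXTENSIONS = {
    ".ar": "AR", ".arj": "ARJ", ".cpio": "CPIO", ".tar": "TAR",
    ".rpm": "RPM", ".zip": "ZIP", ".7z": "7z",
}
COMPRESSION_EXTENSIONS = {
    ".bz2": "Bzip2", ".gz": "Gzip", ".xz": "XZ", ".lzma": "LZMA",
    ".Z": "Z (compress)",
}


def describe_archive(filename):
    """Return (archive_type, compression) guessed from the file name.

    Either element is None when that layer is not recognized.
    """
    name = filename
    compression = None
    for ext, label in COMPRESSION_EXTENSIONS.items():
        if name.endswith(ext):
            compression = label
            name = name[: -len(ext)]  # strip the compression suffix
            break
    archive = None
    for ext, label in ARCHIVE_EXTENSIONS.items():
        if name.lower().endswith(ext):
            archive = label
            break
    return archive, compression
```

For example, `describe_archive("project-1.0.tar.gz")` reports a TAR archive wrapped in Gzip, while a plain `lib.zip` has no compression layer.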

During a scan, Component Scanning examines similarities and differences between large clusters of files and can find:

  • Exact matches to unmodified archives and directories of open source.

  • Fuzzy matches to modified archives and directories of open source.

Component Scanning scans an arbitrary file system directory or archive and matches its contents to known components in the Black Duck KnowledgeBase (KB).

The core concept behind component scanning and discovery is the ability to compare the signatures of artifacts in the repository with the signatures of all open source software (OSS) components in the Black Duck KB and quickly recognize a match. The recognition can be fuzzy: an artifact does not need to be an exact match to be recognized. When there are multiple possible matches, Component Scanning determines the preferred match.
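The idea of comparing signatures and choosing a preferred match can be sketched as follows. This is not Black Duck's algorithm, which is not described here. It is a minimal illustration that assumes a signature is a set of file checksums, uses Jaccard similarity as a stand-in for the fuzzy comparison, and treats the highest-scoring candidate above a threshold as the preferred match. The names `jaccard`, `preferred_match`, and `knowledge_base` are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def preferred_match(scanned, knowledge_base, threshold=0.5):
    """Return (component_name, score) for the best candidate, or None.

    `scanned` is a set of file checksums. `knowledge_base` maps a component
    name to its set of file checksums. Both are hypothetical stand-ins for
    real signatures.
    """
    best = None
    for name, signature in knowledge_base.items():
        score = jaccard(scanned, signature)
        if score >= threshold and (best is None or score > best[1]):
            best = (name, score)
    return best
```

With this sketch, a scanned set that shares three of four checksums with one component still matches that component, while a set with no overlap returns `None`.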

Component Scanning can discover and identify code that is:

  • Unmodified: A collection of files that have not changed since they were released by the open source project.

  • Renamed: A collection of files that have been renamed without other modification.

  • Compressed and/or recompiled: Jars that have been compressed and/or recompiled after they were released by the open source project.

  • Modified or rebundled: For example, with a jar:
    • Class files from more than one component jar combined into a single jar

    • Class files added to or deleted from a component jar

    • Nested component jars with jar files added or deleted
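As one illustration of the "Renamed" case above, the sketch below checks whether two file trees hold the same contents under different paths. It assumes each tree is represented as a mapping from relative path to content checksum. That representation and the name `renamed_only` are assumptions made for the example, not a description of the scanner's internals.

```python
from collections import Counter


def renamed_only(scanned, reference):
    """True when both trees hold the same file contents under different paths.

    Each argument maps a relative path to a content checksum (hypothetical
    inputs; how the checksums are computed is left open).
    """
    same_contents = Counter(scanned.values()) == Counter(reference.values())
    same_paths = set(scanned) == set(reference)
    return same_contents and not same_paths
```

Comparing checksums as a multiset ignores file names entirely, which is why a pure rename still lines up with the released component.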

Component Scanning classifies each match based on how it was made:

  • Exact: Component Scanning identified the set of files as an exact match to a component in Black Duck KB.

  • File Dependency: Component Scanning identified a match via a file dependency.

  • Files Modified: Component Scanning identified a fuzzy match to a component in Black Duck KB, where some of the files were modified. Sometimes this is a match to a previous or subsequent version of the component, which may have been missing from Black Duck KB at the time that the match was made.

  • Files Added/Deleted & Modified: Component Scanning identified a fuzzy match to a component in Black Duck KB. This can happen when:
    • An OSS component is matched, but some of the files associated with the component have been added, deleted, or modified. This can be a match to a previous or a subsequent version of the component, which may have been missing from Black Duck KB at the time of the match.

    • A component is only matched against a common directory structure (structure-only), but because a significant number of components share this structure, Black Duck KB may propose a match that has very little similarity to the scanned component.

    • A component is only matched against a common directory structure, but because proprietary or third-party code can share a common directory structure with components, Black Duck KB may propose a match that has very little similarity to the scanned code.

    The Black Duck KB contains a 'blacklist' of very common, non-unique, directory tree structures. For example, many components include a directory that contains three subdirectories: 'css', 'img', and 'js'. This structure has been blacklisted, so that Black Duck KB will not propose irrelevant matches.
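The match types and the structure blacklist described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: file trees are mappings from relative path to content checksum, the "File Dependency" type is not modeled, and the blacklist holds only the one example given in the text. The names `classify_match`, `STRUCTURE_BLACKLIST`, and `is_blacklisted` are hypothetical.

```python
def classify_match(scanned, component):
    """Label a match using the match-type names from the list above.

    Both arguments map relative paths to content checksums.
    """
    if scanned == component:
        return "Exact"
    if set(scanned) == set(component):
        # Same set of files, but at least one checksum differs.
        return "Files Modified"
    # The sets of paths differ: files were added or deleted, and the
    # remaining files may also have been modified.
    return "Files Added/Deleted & Modified"


# Hypothetical blacklist: each entry is a set of sibling directory names
# considered too common to identify a component.
STRUCTURE_BLACKLIST = {frozenset({"css", "img", "js"})}


def is_blacklisted(subdirectories):
    """True when a directory's subdirectory names form a blacklisted structure."""
    return frozenset(subdirectories) in STRUCTURE_BLACKLIST
```

In this sketch, a structure-only candidate would be dropped when `is_blacklisted` returns true, which mirrors the stated purpose of the blacklist: avoiding irrelevant matches on very common layouts.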

Supported languages

For the current list of supported languages, refer to the Black Duck Detect documentation.

Individual file matching

Individual file matching is the identification of a component based purely upon the checksum information of a single file. Prior to Black Duck 2020.2.0, for a small set of file extensions (.js, .apklib, .bin, .dll, .exe, .o, and .so), regular signature scanning matched files to components based upon a checksum match of that single file. Unfortunately, this matching was not always accurate and produced a fair number of false positives, which required additional effort to review and adjust the BOM. Therefore, as of the Black Duck 2020.2.0 release, individual file matching is no longer the default behavior and is instead an optional capability.

Disabling individual file matching by default may cause some components to drop off your BOM, which may or may not be desired. Therefore, as of the Black Duck 2020.2.0 release, the scanning tools provide parameters that let you re-enable individual file matching. Refer to the command line parameters in the Signature Scanner CLI and Black Duck Detect documentation for more information.
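To make the mechanism concrete, here is a minimal sketch of matching a single file by checksum, gated by an opt-in flag. The checksum algorithm (SHA-1), the `checksum_index` lookup table, and the `enabled` parameter are assumptions made for illustration. They do not represent the actual scanner parameters, which are listed in the product documentation.

```python
import hashlib
from pathlib import Path

# File extensions named in the text above.
INDIVIDUAL_MATCH_EXTENSIONS = {".js", ".apklib", ".bin", ".dll", ".exe", ".o", ".so"}


def file_checksum(path):
    """Return the SHA-1 hex digest of a file (the algorithm is an assumption)."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_individual_file(path, checksum_index, enabled=False):
    """Look up one file by checksum; return a component name or None.

    `checksum_index` is a hypothetical mapping of checksum to component name.
    Matching is off unless `enabled` is set, mirroring the opt-in default.
    """
    if not enabled:
        return None
    if Path(path).suffix.lower() not in INDIVIDUAL_MATCH_EXTENSIONS:
        return None
    return checksum_index.get(file_checksum(path))
```

Because the lookup relies on one checksum with no surrounding context, any unrelated file with identical contents produces the same match, which is one way false positives can arise.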

ISO files

The Signature Scanner cannot scan an ISO file directly. You must first mount the file on your local file system and then scan the mounted file system.
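The mount-then-scan workflow might look like the following sketch on a Linux host. It assumes the standard `mount` and `umount` commands are available and that the caller has the privileges to use them. Other operating systems use different tools. The helper names `mount_command` and `mounted_iso` are hypothetical, and the scan step is deliberately left to the caller rather than guessing at scanner options.

```python
import subprocess
from contextlib import contextmanager


def mount_command(iso_path, mount_point):
    """Build the Linux command that loop-mounts an ISO image read-only."""
    return ["mount", "-o", "loop,ro", str(iso_path), str(mount_point)]


@contextmanager
def mounted_iso(iso_path, mount_point):
    """Mount the image, yield the mount point, then unmount it."""
    subprocess.run(mount_command(iso_path, mount_point), check=True)
    try:
        yield mount_point
    finally:
        # Always unmount, even if the scan step raised an exception.
        subprocess.run(["umount", str(mount_point)], check=True)
```

A caller would wrap the scan in `with mounted_iso("image.iso", "/mnt/iso") as root:` and point the scanning tool at `root` inside the block.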

Supported package managers

Refer to the Black Duck Detect documentation for a list of supported package managers.

Scanning tools

Download, install, and scan using one of the following tools:

Tip: Review the Scanning Best Practices Guide for information on the best practices for scanning.