With the new architecture that the victi.ms project is adopting, we’ve added a number of new repositories. Here is a quick rundown:
victims-cve-db is adding three new fields: package_urls, hash, and file_hashes. package_urls will initially accept only web addresses pointing to download locations. Eventually it will be expanded to also accept a valid package-url.
The hash and file_hashes fields will be filled out by the hashing service(s) after the entry is merged into master. You can see some of the changes in this pull request.
Note that the new hash fields are not required as part of a pull request, and they don’t need to be added as empty fields either. If they are filled out, the hashing service will replace their values.
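To make the submission rules concrete, here is a minimal Python sketch of a pre-merge check. It assumes an entry has already been parsed from YAML into a dict and that package_urls is a list of strings. The names check_submission and strip_generated are hypothetical and not part of victims-cve-db.

```python
from urllib.parse import urlparse

# Fields written by the hashing services after merge; submitters may omit them.
GENERATED_FIELDS = ("hash", "file_hashes")


def is_web_address(value: str) -> bool:
    """Return True if value looks like an http(s) download location."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_submission(entry: dict) -> list[str]:
    """Return a list of problems found in a proposed entry (empty means OK)."""
    urls = entry.get("package_urls", [])
    if not isinstance(urls, list):
        return ["package_urls should be a list of download URLs"]
    problems = []
    for url in urls:
        # For now only web addresses are accepted; package-url support comes later.
        if not isinstance(url, str) or not is_web_address(url):
            problems.append(f"not a web address: {url!r}")
    return problems


def strip_generated(entry: dict) -> dict:
    """Drop hash fields, since the hashing service will overwrite them anyway."""
    return {key: value for key, value in entry.items() if key not in GENERATED_FIELDS}
```

Dropping hash fields rather than rejecting the pull request mirrors the rule that submitted values are simply replaced.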
Reference Sync Client
There is a repo for a reference sync client that shows one way to sync content from the victims-cve-db to a local store. While this client may be expanded upon, or even reused by other sync clients or scanners, its primary purpose is to show a common way to keep a local store in sync with the remote data.
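One way a sync client could keep a local store current is sketched below. It assumes the local store is simply a dict mapping file paths to file contents, and that entries in a local clone of victims-cve-db end in .yaml. load_snapshot and sync are hypothetical names; the actual reference client may work quite differently.

```python
from pathlib import Path


def load_snapshot(root: Path) -> dict[str, bytes]:
    """Read every .yaml file under a local clone, keyed by POSIX-style relative path."""
    return {
        path.relative_to(root).as_posix(): path.read_bytes()
        for path in sorted(root.rglob("*.yaml"))
    }


def sync(snapshot: dict[str, bytes], store: dict[str, bytes]) -> dict[str, list[str]]:
    """Make store match snapshot in place and report what changed."""
    added = sorted(p for p in snapshot if p not in store)
    removed = sorted(p for p in store if p not in snapshot)
    changed = sorted(p for p in snapshot if p in store and store[p] != snapshot[p])
    for path in removed:
        del store[path]
    for path in added + changed:
        store[path] = snapshot[path]
    return {"added": added, "changed": changed, "removed": removed}
```

Reporting added, changed, and removed paths separately lets a scanner re-index only what actually moved.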
Victims Java Service
Jason Sheperd, the co-lead of the project, has done a wonderful job working on the Java hashing service. This service is intended to accept Java archives, hash their contents, and return the results to the caller. In our new architecture it will be used behind the scenes to update victims-cve-db entries with the hash and file_hashes fields.
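For a feel of what hashing the internals of a Java archive involves, the sketch below opens a JAR with Python's zipfile module and hashes each .class file inside it. SHA-512, the restriction to .class files, and the way the combined hash is derived are all assumptions for illustration; this is not the victims-java-service algorithm, and hash_jar is a hypothetical name.

```python
import hashlib
import io
import zipfile


def hash_jar(data: bytes) -> dict:
    """Hash each .class member of a JAR and derive one combined hash."""
    file_hashes = {}
    with zipfile.ZipFile(io.BytesIO(data)) as jar:
        for name in sorted(jar.namelist()):
            if name.endswith(".class"):
                file_hashes[name] = hashlib.sha512(jar.read(name)).hexdigest()
    # Sorting the member hashes makes the combined hash independent of archive order.
    combined = hashlib.sha512("".join(sorted(file_hashes.values())).encode()).hexdigest()
    return {"hash": combined, "file_hashes": file_hashes}
```

Hashing individual class files, not just the archive bytes, means a repackaged JAR with the same classes can still be matched.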
Victims Python Service
The victims-python-service implements the same interface as the victims-java-service but produces hashes for Python wheels and eggs.
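Because both services share one interface, a caller only needs to know which artifact types each service accepts. The sketch below models that with a hypothetical ArchiveHasher class and a small registry. It assumes wheels and eggs are zip files (true for .whl and for zipped .egg files) and that .py files are the members worth hashing; in the real architecture each service runs as a separate microservice.

```python
import hashlib
import io
import zipfile
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveHasher:
    """One hashing service: which artifacts it accepts and which members it hashes."""
    name: str
    artifact_suffixes: tuple[str, ...]
    member_suffixes: tuple[str, ...]

    def accepts(self, filename: str) -> bool:
        return filename.lower().endswith(self.artifact_suffixes)

    def hash_artifact(self, data: bytes) -> dict[str, str]:
        """Return SHA-512 hex digests for the relevant archive members."""
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return {
                member: hashlib.sha512(archive.read(member)).hexdigest()
                for member in sorted(archive.namelist())
                if member.endswith(self.member_suffixes)
            }


# Hypothetical registry; both entries expose the same accepts/hash_artifact interface.
SERVICES = (
    ArchiveHasher("java", (".jar",), (".class",)),
    ArchiveHasher("python", (".whl", ".egg"), (".py",)),
)


def pick_service(filename: str) -> ArchiveHasher:
    """Route an artifact to the first service that accepts its file type."""
    for service in SERVICES:
        if service.accepts(filename):
            return service
    raise ValueError(f"no hashing service for {filename}")
```

Adding a Ruby or Go service in this model would only mean adding another registry entry.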
victims-bot is the maintainer of the victims-cve-db workflow. When a pull request is merged into the master branch, the bot is notified. The bot then grabs the package and submits it to the right hashing service. Once the file is hashed, the data is added to the entry and pushed directly back to master, where clients can sync it at their leisure.
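The bot’s first job is deciding whether an incoming webhook is a merge into master. A minimal sketch follows, using fields from GitHub’s pull_request event payload (action, pull_request.merged, pull_request.base.ref, pull_request.merge_commit_sha) and the X-Hub-Signature-256 header GitHub sends when a webhook secret is configured. The function names are hypothetical, and the HTTP server that receives the request is left out.

```python
import hashlib
import hmac
import json
from typing import Optional


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header against an HMAC of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def merged_into_master(event_name: str, body: bytes) -> Optional[str]:
    """Return the merge commit SHA if the event is a PR merged into master."""
    if event_name != "pull_request":
        return None
    payload = json.loads(body)
    pr = payload.get("pull_request", {})
    if (
        payload.get("action") == "closed"
        and pr.get("merged")
        and pr.get("base", {}).get("ref") == "master"
    ):
        return pr.get("merge_commit_sha")
    return None
```

A closed pull request is not necessarily a merged one, which is why the merged flag is checked as well as the action.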
Older repos, such as victims-web, will not be deleted. They will continue to live under the victims GitHub namespace for the time being. A note will be added to each repository’s README file stating that development on it has halted.
As noted in previous posts, victi.ms is undergoing architectural changes to meet the updated needs of its users. In this post I hope to give a quick overview of what to expect over the next few months.
In victims v1 and v2 the API was front and center: it was how clients retrieved definitions. This worked pretty well with v1. With v2 things started to get a little hairy as more fields were added that didn’t always match the needs of different package types. It also became apparent that an API was not an ideal way to deliver content whose shape may differ from one data stream to another.
This led us to a new idea of how to structure the victims data. Many people are aware of the victims-cve-db. This repository houses information in a standard YAML format which can be used by clients who are scanning against versions. Instead of housing hash data in a REST API and YAML data in a git repository, we’ve decided to consolidate everything in the victims-cve-db. This means we will be adding some more fields to hold hash information. It also gets the victi.ms project out of the external API business. Clients will now be able to clone the git repository and use or import the results in whatever form makes sense.
Basic Flow and Architecture
Instead of pushing everything through a central hosted database with API and web front ends we will be using microservices to update and release the victims-cve-db.
Adding to the database
- User submits a PR to the victims-cve-db filled out with everything but hash information
- The PR is reviewed. Once verified, the PR is merged into master.
- A GitHub webhook is caught by the victims-bot, which:
  - Clones the repo
  - Looks at the merged item
  - Downloads the relevant artifacts
  - Submits the artifacts to the proper hashing system (example)
    - The hashing system opens up the artifact and hashes the internals
    - The hashing system returns the results back to the caller with any relevant metadata
  - Updates the merged items with hashes
  - Commits and pushes the results back to master
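The bot steps above can be sketched as a function that receives the download, hashing, and git operations as parameters, which keeps the flow testable without network access. All names here are hypothetical, and the layout of the hash and file_hashes fields (both keyed by download URL) is an assumption rather than the actual victims-cve-db schema. Existing hash values are overwritten, matching the rule that filled-out fields will be replaced.

```python
from typing import Callable, Dict, List

# Hypothetical hooks standing in for HTTP downloads, the hashing services, and git.
Downloader = Callable[[str], bytes]
Hasher = Callable[[str, bytes], dict]  # (filename, data) -> {"hash": ..., "file_hashes": ...}
Publisher = Callable[[List[str], str], None]  # (changed paths, commit message)


def add_hashes(entry: dict, download: Downloader, hash_artifact: Hasher) -> dict:
    """Return a copy of entry with hash data for every URL in package_urls."""
    updated = dict(entry)
    # Any submitted values are replaced, never merged.
    updated["hash"] = {}
    updated["file_hashes"] = {}
    for url in entry.get("package_urls", []):
        filename = url.rsplit("/", 1)[-1]
        result = hash_artifact(filename, download(url))
        updated["hash"][url] = result["hash"]
        updated["file_hashes"][url] = result["file_hashes"]
    return updated


def handle_merge(changed: Dict[str, dict], download: Downloader,
                 hash_artifact: Hasher, publish: Publisher) -> Dict[str, dict]:
    """Hash every merged entry, then publish all updated files in one commit."""
    updated = {path: add_hashes(entry, download, hash_artifact)
               for path, entry in changed.items()}
    if updated:
        publish(sorted(updated), "Add hashes for merged entries")
    return updated
```

Batching all updated entries into a single commit keeps master’s history readable when one merge touches several files.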
Clients can do one of the following to get content to scan with:
- Download the tarball for current master as one large data set
- Use git to clone/update content and use the new content
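For the tarball option, a client can read entries straight out of the downloaded archive without unpacking it to disk. This sketch assumes a gzip-compressed tarball whose contents sit under a single top-level directory (as repository archive tarballs from GitHub usually do) and that entries end in .yaml; entries_from_tarball is a hypothetical helper. Clients that prefer git can instead clone once and run `git pull` to pick up new content.

```python
import io
import tarfile


def entries_from_tarball(data: bytes, suffix: str = ".yaml") -> dict[str, bytes]:
    """Read matching files from a gzipped tarball held in memory."""
    entries = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            # Drop the single top-level directory that wraps the repository contents.
            _, _, relative = member.name.partition("/")
            entries[relative or member.name] = tar.extractfile(member).read()
    return entries
```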
What About Old Clients?
We will be hosting the current output of the v2 API in a static fashion. This means clients will be able to snag the entire database if they need to. However, filtering by time frame or database version will not be supported. After 6 months we will decommission the static database download.
Some future additions we are considering include:
- Automatic submissions from trusted public sources
- Python/Ruby/Go hashing services
- Notification of content updates through some method
We believe this change in architecture will create a more stable project and let the community grow the content more easily. If you’d like to help, don’t hesitate to jump in and lend a hand!