The Be operating system file system, known simply as BFS, is the file system for the Haiku, BeOS, and SkyOS operating systems. When it was created in the late '90s as part of the ill-fated BeOS project, BFS's ahead-of-its-time feature set immediately struck the fancy of OS geeks. That feature set includes:
- A 64-bit address space
- Use of journaling
- Highly multithreaded reading
- Support of database-like extended file attributes
- Optimization for streaming file access
A dozen years later, the legendary BFS still merits exploration, so we're diving in today, starting with some file system basics and moving on to a discussion of the above features. We also chatted with two people intimately familiar with the OS: the person who developed BFS for Be and the developer behind the open-source version of BFS.
A little history
BFS was created in 1997 by Dominic Giampaolo and Cyril Meurillon, both of whom worked at Be. It was designed to be multithreaded and lightweight, and to support high-volume, streaming multimedia. It was also designed to support the database features of the old Be file system. Even though it was written at a time when systems typically had only 8MB of RAM and a mere 9GB of disk storage, many of the forward-thinking design decisions made then are still valid today.
BFS did not quite end when Be shut its doors after failing to get bought by Apple. In 2002, Axel Dörfler re-implemented BFS for Haiku as an open-source project. The last part of this article features an interview with Axel.
Before we can talk about what made BFS so special, we first need to cover some file system basics.
File system basics
At the most basic level, a file system exists to manage the data on permanent storage devices. Features common to most file systems include:
- Creating files and directories
- Opening, reading, writing, deleting, and renaming files
- Reading, writing, and updating file metadata or attributes
Additional features include symbolic links, access control lists, and memory mapping.
For a high-level overview of numerous file systems, refer to the article From BFS to ZFS: Past, Present, and Future of File Systems. For a more in-depth look at HPFS, NTFS, EXT2, and XFS circa 2000, refer to chapter three of Practical File System Design.
The following terms are common in file system discussions, so be sure to look through the list and familiarize yourself with any you don't already know:
Disk: A permanent storage medium of a certain size. A disk also has a sector or block size, which is the minimum unit that the disk can read or write. The typical block size for disks has been 512 bytes, but it now approaches 4096 bytes for newer disks.
Block: The smallest unit writable by a disk or file system. Everything a file system does is composed of operations done on blocks. A file system block is always the same size as or larger (in integer multiples) than the disk block size.
Bitmap: Not an image, but a data structure that records which blocks on a disk are free or used.
Partition: A subset of all of the blocks on a disk. A disk can have several partitions.
Volume: The name given to a collection of blocks on some storage medium (i.e., a disk). That is, a volume may be all of the blocks on a single disk, some portion of the total number of blocks on a disk, or it may even span multiple disks and be all the blocks on several disks. The term “volume” is used to refer to a disk or partition that has been initialized with a file system.
Superblock: The area of a volume where a file system stores its critical, volume-wide information. A superblock usually contains information such as how large a volume is, the name of a volume, and so on.
Metadata: A general term referring to information that is about something but not directly part of it. For example, the size of a file is very important information, but it is not part of the data stored in the file itself. So file metadata is data about a file, and not in a file.
Journaling: A method of ensuring the correctness of file system metadata even in the presence of power failures or unexpected reboots.
I-node: The place where a file system stores all the necessary metadata about a file. The i-node also provides the connection to the contents of the file and any other data associated with the file. The term “i-node” (which we will use in this article) is historical and originated in Unix. An i-node is also known as a file control block (FCB) or file record.
Attribute: A name (as a text string) and a value associated with the name. The value may have a defined type (string, integer, etc.), or it may just be arbitrary data. (1)
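To make the bitmap entry in the list above concrete, here is a minimal sketch in C of a block bitmap with one bit per block. The block_bitmap type and the bitmap_* function names are hypothetical, invented for this illustration; this is not the BFS on-disk bitmap format, only the general idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory block bitmap: bit i is 1 when block i is in use. */
typedef struct {
    uint8_t *bits;     /* one bit per block, packed 8 per byte */
    size_t   nblocks;  /* number of blocks the bitmap describes */
} block_bitmap;

/* Return 1 if the block is marked used, 0 if it is free. */
int bitmap_is_used(const block_bitmap *bm, size_t block)
{
    assert(block < bm->nblocks);
    return (bm->bits[block / 8] >> (block % 8)) & 1;
}

/* Mark a block as allocated. */
void bitmap_mark_used(block_bitmap *bm, size_t block)
{
    assert(block < bm->nblocks);
    bm->bits[block / 8] |= (uint8_t)(1u << (block % 8));
}

/* Mark a block as free again. */
void bitmap_mark_free(block_bitmap *bm, size_t block)
{
    assert(block < bm->nblocks);
    bm->bits[block / 8] &= (uint8_t)~(1u << (block % 8));
}

/* Linear scan for the first free block; returns nblocks if none is free. */
size_t bitmap_first_free(const block_bitmap *bm)
{
    for (size_t i = 0; i < bm->nblocks; i++) {
        if (!bitmap_is_used(bm, i))
            return i;
    }
    return bm->nblocks;
}
```

A real allocator would search for runs of free blocks rather than a single bit, but the bookkeeping is the same: allocation sets bits, deletion clears them.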
Now that we've got our file system basics down, let's look at some of the features that make BFS unique.
First off, BFS's 64-bit addressing means that no matter how large disks get in the future, you should be able to format the entire disk with BFS. You can create partitions in excess of eight exabytes and, depending on the block size used, you can create files that are larger than 30 GB in size.
One of BFS's most important and widely touted features is its support for extended attributes. The importance of attributes is best illustrated with the example of MP3 files. Information fields relevant to an MP3 file would be: song title, band, album, release date, encoding rate, length, and number of times played. If you want to associate this information with each MP3 file using a traditional file system, you might have to create your own database to support searching, creating, updating, or deleting these attributes as your music collection grows and changes. With BFS, in contrast, these attributes, or any other attributes, can be added to the file system itself. This means that a program for editing or playing MP3s doesn't need to create or maintain a database, because the file system will handle these functions for you. BFS supports associating attributes with a file, either under program control or from the command line. Attributes can also be searched and sorted by the file system, as an extension of any application. How this is done will be discussed in detail later.
BFS supports the ability to create a persistent or 'live' query that watches for file changes. This is a query that hooks into the file system, checking for files that match search criteria. Under Haiku, these queries are easy to create and remarkably light on system resources.
BFS is journaled, which means that it keeps track of file system consistency on the go and doesn't need file system consistency tools like fsck or chkdsk. Journaling also contributes to faster system booting after an unexpected shutdown.
Internally, BFS uses UTF-8 characters for directory and file names. This means that you can use virtually any language, natively, in Haiku. With no extra effort you can localize file names to Chinese, German characters with umlauts, or cursive Arabic.
BFS gives special performance consideration to large file access. Creating and reading large video, audio, or image files are optimized operations under BFS.
BFS structure: the Superblock
The superblock is typically the highest-level data structure for a file system. The BFS superblock describes the physical disk, the journal area, and indexing. For obvious performance reasons, the superblock is kept in RAM after the system boots.
typedef struct disk_super_block {
    char        name[B_OS_NAME_LENGTH];
    int32       magic1;
    int32       fs_byte_order;
    uint32      block_size;
    uint32      block_shift;
    off_t       num_blocks;
    off_t       used_blocks;
    int32       inode_size;
    int32       magic2;
    int32       blocks_per_ag;
    int32       ag_shift;
    int32       num_ags;
    int32       flags;
    block_run   log_blocks;
    off_t       log_start;
    off_t       log_end;
    int32       magic3;
    inode_addr  root_dir;
    inode_addr  indices;
    int32       pad;
} disk_super_block;
The name field holds the file system name. Three magic numbers are used for consistency checking as well as version numbering. fs_byte_order holds the byte ordering, and block_size holds the explicit byte count; block_shift, used as an exponent of 2, also yields the block size. This is a purposeful redundancy used for consistency checking of a file system. num_blocks holds the number of usable blocks for the file system, and used_blocks holds the number currently in use. The flags field indicates whether the state of the superblock is clean or dirty. root_dir points to the root of all files and directories. indices points to the beginning of the index portion of extended attributes. The bfsinfo utility can be used to dump a device's superblock.
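The redundancy between block_size and block_shift described above lends itself to a simple sanity check. The following is a minimal sketch, assuming a reduced, hypothetical mini_super_block that copies only four of the fields from the listing; the function names are invented for illustration and are not taken from the BeOS or Haiku sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the superblock fields discussed in the text. */
typedef struct {
    uint32_t block_size;   /* explicit byte count of one block */
    uint32_t block_shift;  /* log2 of block_size */
    int64_t  num_blocks;   /* usable blocks on the volume */
    int64_t  used_blocks;  /* blocks currently in use */
} mini_super_block;

/* Check that the redundant fields agree with each other. */
bool super_block_geometry_ok(const mini_super_block *sb)
{
    assert(sb != NULL);
    if (sb->block_shift >= 32)
        return false;  /* shift would overflow a 32-bit block size */
    if (sb->block_size != (UINT32_C(1) << sb->block_shift))
        return false;  /* block_size and block_shift disagree */
    if (sb->num_blocks < 0 || sb->used_blocks < 0)
        return false;
    return sb->used_blocks <= sb->num_blocks;
}

/* Total volume size in bytes (assumes the geometry check has passed). */
int64_t volume_bytes(const mini_super_block *sb)
{
    assert(sb != NULL);
    return sb->num_blocks * (int64_t)sb->block_size;
}
```

A mismatch between the two fields is a cheap early signal that the superblock was read from the wrong place or has been damaged, before any of the block addresses in it are trusted.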
Like most UNIX-derived file systems, BFS uses an i-node structure to keep track of disk allocations. The bfs_inode structure is described below. The i-node handles basic file metadata including file creation time, owner, size of file, group ownership, where the file data is stored on the disk, and so on. BFS does not update the file size until a file is closed. In testing it was found that this gave a considerable performance gain.
typedef struct bfs_inode {
    int32        magic1;
    inode_addr   inode_num;
    int32        uid;
    int32        gid;
    int32        mode;
    int32        flags;
    bigtime_t    create_time;
    bigtime_t    last_modified_time;
    inode_addr   parent;
    inode_addr   attributes;
    uint32       type;
    int32        inode_size;
    binode_etc  *etc;
    data_stream  data;
    int32        pad;
    int32        small_data;
} bfs_inode;
A magic number is used once again in bfs_inode for sanity checking and error recovery. The magic numbers for BeOS and Haiku are the same for compatibility, but different for the SkyOS implementation. BFS uses the file's disk sectors as the i-node value, making sector mappings a straight lookup. UID, GID, and mode are used to maintain POSIX compliance.
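Because the i-node value is itself a disk address, turning it into a block number is plain arithmetic. The sketch below assumes the block_run layout described in Practical File System Design (an allocation group number, a start block within that group, and a length) and uses the ag_shift and block_shift fields from the superblock; the field widths and function names here are assumptions for illustration, not a copy of the real headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a block_run; inode_addr is treated as the same shape. */
typedef struct {
    int32_t  allocation_group;  /* which allocation group the run is in */
    uint16_t start;             /* first block, relative to the group */
    uint16_t len;               /* number of blocks in the run */
} block_run;

/* Absolute block number: allocation_group * 2^ag_shift + start. */
int64_t block_run_to_block(block_run run, int32_t ag_shift)
{
    assert(ag_shift >= 0 && ag_shift < 32);
    assert(run.allocation_group >= 0);
    return (int64_t)run.allocation_group * ((int64_t)1 << ag_shift)
           + run.start;
}

/* Byte offset of a block on the volume: block * 2^block_shift. */
int64_t block_to_byte_offset(int64_t block, uint32_t block_shift)
{
    assert(block >= 0 && block_shift < 32);
    return block * ((int64_t)1 << block_shift);
}
```

With ag_shift = 13 and 2048-byte blocks, for example, an i-node address of group 2, start 5 lands at block 16389, which is byte offset 33564672 on the volume. No table lookup is needed to find the i-node, which is the point of the design.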
The i-node structure holds the basic attributes of a file but not the actual file data itself. That is handled through the data member. The data member is defined by the data_stream struct:
typedef struct data_stream {
    block_run  direct[NUM_DIRECT_BLOCKS];
    off_t      max_direct_range;
    block_run  indirect;
    off_t      max_indirect_range;
    block_run  double_indirect;
    off_t      max_double_indirect_range;
    off_t      size;
} data_stream;
The data_stream struct maps data from the physical disk to the file stream API. Access using data_streams is optimized for throughput, bypassing the system cache and using DMA into and out of user memory. In benchmark tests, BFS is able to achieve high throughput that approaches peak disk transfer rates.
Database capabilities using extended attributes
As mentioned earlier, handling attributes is an important file system function. The Mac HFS was the first file system to extensively use file attributes in support of GUIs. Consider that a windowed OS needs to persist and manage many GUI attributes such as frame size, position, coloring, text, etc., and needs to optimize access for a quick response time.
BFS supports extended attributes in the form of key/value associations with files. Keys have a fixed type and can be added at any time. Valid types are string, time, double, float, int, boolean, raw, and image. If a key is indexed, basic searching on that key is highly optimized. The following are command line tools for managing extended file attributes.
- addattr key value filename: adds the key/value pair to the specified file
- catattr key filename: displays the specified fieldname value.
- listattr filename: lists a file’s associated attributes, their types, and their sizes, in bytes.
- rmattr key filename: removes an attribute from the specified file.
New fields are created globally with mkindex. For example,
- mkindex --type=indextype index: creates a new index of the given type (for example, long) on the current volume.
- reindex sourcefile filename: adds a file's key to an index if the index is created after the file's attributes.
- rmindex index: removes an attribute index from the current volume, making it unavailable for use.
- lsindex: lists all attributes
In Haiku, access to file attributes is supported graphically through the Tracker, as well as with keyboard shortcuts. Any object in Haiku that has a graphical representation has the _trk/pinfo_le attribute set with its file icon position. The BEOS:TYPE attribute holds the application association for a file. More detail on attribute usage can be found here.
Additional information and examples of the use of file attributes are provided here and here.
Searching with query
query is the command line tool used to search for matching attributes. It is easier to use than the UNIX “find” command line utility.
Here are some examples of the query syntax:
query "((name=="**")&&(BEOS:TYPE=="audio/x-wav"))": finds all files with a MIME type of “wav”. Useful if you have wav files that are missing the .wav extension.
query "(last_modified >= %yesterday%)": finds files modified since yesterday
The output from query can be used with scripting tools from the command line as follows:
touch `query '((last_modified < %yesterday%)&&(BEOS:TYPE=="audio/mpeg"))'`
This will update the last modified time on all MP3 files with a last modified time older than yesterday.
More information on scripting with query is provided here.
Its Haiku GUI counterpart is “Find,” found in the Deskbar and documented here. All Find queries are cached for 7 days and appear in the drop-down list.
Live queries
An especially nice feature of BFS is the live query feature. When using Find, drag a query name from the selection list to the Desktop or to Tracker. This hooks the query into the file system and saves it. Any time a file matching the query criteria is created or deleted, the query list is updated. Live queries are supported natively in BFS and use remarkably few resources.
Applications use attributes
The Haiku Mail system is an example of an application that makes extensive use of attributes. Haiku mail doesn't have its own database for storing and managing email data. Instead, it stores each email directly in the BFS file system and uses more than 15 email-specific attributes for searching and sorting. Can you imagine having 8 exabytes worth of email? Haiku makes this theoretically possible.
This is a good example of abstracting functionality out of individual applications and locating it in the operating system, or in this case the file system. Because BFS supports extended attributes, any application can use the powerful database capability of the file system without having to reinvent the wheel.
Optimizing attribute lookups
Since each file on BFS could potentially have many extended attributes, and there could potentially be hundreds of thousands of files, there's a great need to optimize the access of attributes. In BFS, every index is implemented as a B+tree data structure. The B+tree is a balanced, sorted tree data structure that provides very fast lookup and scales up very well. Not surprisingly, it's also used to manage directory structures and it is commonly used in other file systems, such as NTFS. BFS indices are highly optimized for lookups of the type:
query "(name > "111")"
BFS indices are not optimized for regular expression searches of the form:
These searches degrade from a binary lookup to a sequential lookup and can potentially take much longer in a large file system.
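The difference between the two kinds of lookup can be sketched with a sorted array standing in for the index. This is only an illustration under that assumption: a real BFS index is a B+tree, and the function names below are hypothetical. A comparison such as name > "111" can use the sort order and finish in a logarithmic number of steps, while a pattern that only constrains the end of the name has to examine every key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Ordered lookup: index of the first key strictly greater than `bound`
 * in a sorted array. Binary search, O(log n) comparisons.
 */
size_t first_greater(const char *const *keys, size_t n, const char *bound)
{
    assert(keys != NULL && bound != NULL);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(keys[mid], bound) <= 0)
            lo = mid + 1;   /* keys[mid] is not greater; look right */
        else
            hi = mid;       /* keys[mid] qualifies; look left for earlier */
    }
    return lo;
}

/*
 * Pattern-style lookup: count keys ending in `suffix`. The sort order
 * does not help, so every key is visited, O(n) comparisons.
 */
size_t count_with_suffix(const char *const *keys, size_t n, const char *suffix)
{
    assert(keys != NULL && suffix != NULL);
    size_t slen = strlen(suffix), hits = 0;
    for (size_t i = 0; i < n; i++) {
        size_t klen = strlen(keys[i]);
        if (klen >= slen && strcmp(keys[i] + klen - slen, suffix) == 0)
            hits++;
    }
    return hits;
}
```

On a handful of keys the two are indistinguishable; on an index covering hundreds of thousands of files, the first touches a few dozen entries and the second touches all of them.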
Queries that use a regular expression but start with a fixed expression are optimized in Haiku, e.g.,
query "(name==temp)" will run faster than
Computers with disk storage can experience many kinds of failure modes. The magnetic material of a disk sector can go bad, the servo mechanism that moves the disk head can fail, the electronics that interface with the computer can die, or the user can reboot the computer in the middle of a disk write operation. If you run a disk long enough, eventually the electronics or mechanics of a disk will fail. You usually don't have to wait very long for an impatient user to reboot during a file write operation. This is unfortunately a fairly common occurrence and it can have devastating effects on a file system.
Consider the following scenario. A user has created a file and is in the process of saving it to disk. Maybe it's a developer working on an overdue project. The file system must make several updates before this file is fully saved. It must first save the contents of the file. It must save the metadata for the file: creation date, owner, file size, etc. It must also update the superblock. Consider what happens when a system powers down before these operations complete. Besides losing your valuable work, the data structures can become corrupted and point to i-nodes and blocks that don't exist. Besides your project being even more late, your file system may subsequently fail as well, losing all of your other files.
Journaling, or file system logging, alleviates some of these problems. While journaling doesn't ensure that a premature reboot won't lose your file, it does ensure that it won't corrupt your file system.
Consider the following example of how journaling works. Let's say a user creates a new file and saves it. The data must be saved to disk, and a new directory entry with proper metadata must be saved as well. Before these disk operations take place, BFS locks the unwritten blocks into RAM, and makes log entries in the file system journal for each block about to be written. After each block is successfully flushed to disk, the journal entries are marked as completed. If the system shuts down before the blocks are successfully flushed, the journal entries will record the incomplete operations. When the system restarts, it inspects the journal for entries that aren't completed. The remaining entries can be “replayed” when the system reboots to successfully complete the write operation, or the whole thing can be “rolled back,” and the operation aborted. Either way, the file system is left in a consistent state where the directory structure and metadata exactly match the file data.
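The log, flush, and replay sequence described above can be modeled in a few lines. The following is a toy sketch, not the BFS log format: the toy_volume type, its fixed sizes, and the toy_* functions are all hypothetical, and each “block” is reduced to a single integer so the control flow stays visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_DISK_BLOCKS = 8, TOY_LOG_SLOTS = 4 };

typedef struct {
    size_t   block;      /* which block the pending write targets */
    uint32_t new_value;  /* stand-in for the block's new contents */
    bool     completed;  /* set once the block has reached the disk */
} toy_log_entry;

typedef struct {
    uint32_t      disk[TOY_DISK_BLOCKS];  /* the "on-disk" blocks */
    toy_log_entry log[TOY_LOG_SLOTS];     /* the journal */
    size_t        log_len;
} toy_volume;

/* Step 1: record the intended write in the journal before touching disk. */
bool toy_journal_log(toy_volume *v, size_t block, uint32_t new_value)
{
    assert(block < TOY_DISK_BLOCKS);
    if (v->log_len == TOY_LOG_SLOTS)
        return false;  /* journal full */
    v->log[v->log_len++] = (toy_log_entry){ block, new_value, false };
    return true;
}

/*
 * Step 2: flush at most `limit` pending entries to disk and mark them
 * completed. A small limit simulates a shutdown part-way through.
 */
size_t toy_flush(toy_volume *v, size_t limit)
{
    size_t flushed = 0;
    for (size_t i = 0; i < v->log_len && flushed < limit; i++) {
        toy_log_entry *e = &v->log[i];
        if (e->completed)
            continue;
        v->disk[e->block] = e->new_value;
        e->completed = true;
        flushed++;
    }
    return flushed;
}

/* Step 3: after a restart, replay every entry that never completed. */
size_t toy_replay(toy_volume *v)
{
    return toy_flush(v, v->log_len);
}
```

Logging two writes, flushing only one, and then calling toy_replay leaves both blocks updated, which is the “replay” outcome described in the text. The “roll back” alternative would instead discard the uncompleted entries and leave the old block contents in place.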
In BFS, the journal logs any changes made to the directory, the bitmap block, i-nodes, and extended attributes. It doesn't journal the actual user data. In this way, journaling protects file system consistency but doesn't provide data recovery features like a redundant disk array, or RAID.
While not immune to all disk errors, BFS is resistant to the common failure mode of a premature system shutdown. File systems without journaling, such as Linux EXT2, are susceptible to file system inconsistencies and depend on lengthy scanning procedures for recovery. BFS doesn't need disk scanning, and Haiku can boot immediately after a premature shutdown.
Improvements of Haiku BFS over BeOS
Haiku's version of BFS has several improvements over the original BeOS BFS implementation. The B+tree is more robust. Haiku BFS uses a file cache for file data in addition to a block cache. This resulted in a factor of 10 speed improvement. Haiku's BFS implements status modified time for files, and also has more fine-grained file status capability. The POSIX atime field was left out of BeOS BFS for performance's sake. Haiku BFS includes a query optimized for hybrid regex that allows mixing of a static string with a regular expression. New inspection tools bfsinfo, bfswhich, chkindex, and recover have been added for Haiku BFS. The reindex command was added to improve indexing of extended attributes.
A short interview with a Senior Developer at BeOS, who worked on BFS
(He asked that his name not be used, to comply with the wishes of his current employer)
The book Practical File System Design describes the BeOS development environment as being short on time and scarce on developers. What were some of the positive aspects of that environment?
It was just a fun place to work at the time. We all got along really well, loved what we were doing and all fed off of each other's energy. Everyone was top-notch and it was just non-stop. Usually you get one or two of those features in a company, but when you have all of them it's a really intoxicating environment: change is rapid, progress is good, you feel like you're really doing something important and it's just a lot of fun.
From concept to release, how long did it take to code and debug BFS?
Ten or eleven months. I had help from another engineer to write the code that dealt with writing to the indirect blocks in a file.
What tools did you use to develop and debug BFS?
Emacs, make, and gcc. Initially I did some prototyping in user space and thus was able to use gdb, but once it went into the kernel it was all “Welcome to Kernel Debugging Land” from there on out.
What was the biggest challenge in developing BFS?
Trying to get it done in such a short timeframe while supporting all the features we wanted.
Did BFS influence the design of any subsequent file systems?
What are some hot topics of file systems these days?
Data integrity and data de-duplication are probably the biggest areas of interest right now. People are also spending a lot of time trying to figure out how to provide reliable storage in the face of unreliable components. RAID-5/6 are OK, but as the size of drives goes up, a lot of people are concerned that when a failure occurs, they are at risk of a total failure if another drive fails before they are done with reconstruction.
What new features or functions will the next generation of file systems support?
I'm not sure. I think it will be a little while yet before we sort out which layer is the right place for all the different parts of the problem of “storage.” ZFS has gone down the path of putting everything in the FS. Other folks are putting in a different set of things. It's pretty clear that some of the functionality that existed in BFS does *not* belong in the FS.
In your opinion, what science fiction film has the best use of a computer?
Hmmm, not sure. Star Trek? Their computers do everything and have a multitouch interface and they don't spend a lot of time futzing around with them.
A short interview with Axel Dörfler, developer of open-source BFS
Before BFS, did you have previous experience working with file systems?
I had to write an application to recover some data from a BFS partitioned hard disk. That gave me all the knowledge I needed to write BFS.
What was the hardest part of rewriting BFS?
Making sure the B+tree implementation behaved correctly, since the one used in the original BFS (as mentioned in Dominic's book) was rather unstable, and inefficient.
What was the easiest part?
The actual BFS read-only implementation, as I could reuse most of the code I had written for recovery.
Did you find any surprises when rewriting BFS?
Yes, there are quite a few illogical things, like the log using block_runs but requiring them to have a length of 1, or the practically superfluous double indirect block implementation.
Did you ever have any discussions with Dominic about BFS?
Yes, but they didn't really concern its flaws, rather my implementation.
What would you change in BFS 2.0?
A lot.
How long did the coding take?
I honestly don't remember. I think the read-only part took only two weeks or so, while the write part took a lot longer, and it took still longer to make it fully compliant with Be's BFS.
What was the biggest bug and how did you solve it?
The most annoying bug was the “vnode ID already exists” bug. There were like a dozen or so reasons why this one could happen, and that kind of minimized the joy of having found another problem with it (only part of them were in BFS itself, though).
It's still there in a new incarnation as bug #5262, BTW, although this may have completely unrelated causes (like memory corruption).
What lessons did you learn from rewriting BFS?
That it's much easier to have a working VFS when writing a file system.
Finally, what is your current development computer?
Depending on where I am it's either a ThinkPad T60, or a two-year-old Core 2 Quad with onboard Intel graphics.
This article relied heavily on the book Practical File System Design with the Be File System. It's seldom that a file system is documented in such detail and in such a readable fashion. Thanks also to Axel Dörfler for fact checking.
- Practical File System Design with the Be File System, (1)
- Haiku File System Modules
- BFS File System Development Kit
- From BFS to ZFS
- Haiku Attributes
- The BeOS Bible
- Scot Hacker
- File systems in Linux
- EXT2 File System
About the author
Andrew Hudson is a freelance technical project manager living in Florida, USA. In his copious spare time he enjoys exploring caves and restoring antique vehicles.