C Programming Reading Binary File Converting It to Hex Output
Understanding Binary Information
Let's talk some Hex.
In this post I'grand explaining data as it truly exists in our computers. If you've e'er wondered how a sequence of 1 and 0 results in meaningful information, this article is for y'all.
Bits and Bytes
All information in a computer is represented as a sequence of ones and zeros. Depending on where the information lives — RAM, SSD, HDD, DVD etc. — ones and zeros are encoded differently physically, but conceptually it's two distinguishable states — and that's all at that place really is.
One such slice o f data — a cypher or one — is called a flake . Eight bits make a byte . Bytes are relevant as a unit of measurement, because — by and large speaking — they are the smallest addressable units of retentiveness.
Suppose you have 40 bits of data, and then 5 bytes:
01001000
01100101 01101100
01101100
01101111
You can asking to read or write data in one of those bytes by specifying an offset from the starting point. Yous e'er read or write one whole byte at a time, non individual bits. Therefore nearly information — barring pinch mechanisms — is encoded in bytes as basic units, not $.25.
Past the way, at the terminate of the article you lot'll exist able to become back to the 40 bits above, and figure out what they hateful. When you get there, effort coming back and expect at it. You'll figure it out. I promise.
Why is a byte viii $.25, and not 10?
A mixture of historical and practical reasons.
I'll try and hint at underlying practical concerns. Allow'due south take a step back for a second: for united states, multiplying or dividing by powers of x is trivial by just appending zeros or shifting the decimal point, right?
3.1415 ✕ 100 = 314.fifteen
That's true in our decimal system — a numeric system based on powers of 10.
Computers do binary arithmetic internally. Binary arithmetic is based on powers of 2. In binary arithmetic, powers of 2 like 8 = 2³ are equally convenient.
Let's look at an instance in binary arithmetic:
13 ✕ 8 = 104
corresponds to 00001101 ✕ thousand = 01101000
Information technology is just shifting the input by three bits to the left. Don't try to sympathise the encoding yet. Just observe that the issue was trivially computed by shifting digits to the left.
In summary: if the size of data units is a power of two, such every bit 8 = 2³, many internal calculations a figurer needs to exercise are much simpler than they would be otherwise.
How many bytes to a kilobyte?
Practiced question. In that location was no official standardization regarding kilobytes, megabytes, gigabytes, etc. until 1998. Nonetheless, these units were widely used among tech people and consumers alike. Y'all bought a difficult drive, the size of information technology was given in gigabytes.
People in tech were counting 2¹⁰ = 1024 bytes per kilobyte, 1024 kilobytes per megabyte, 1024 megabytes per gigabyte and so on. As nosotros learned, powers of 2 is the applied affair to use when yous're working with computer memory.
Famously, marketing people selling difficult drives had the ingenious idea that lacking an official standardization, they could only claim that 'their' units are based on 1000, and therefore 'their' gigabyte is thousand³ = 1,000,000,000 bytes — instead of 1024³ = one,073,741,824, so they could shortchange you on infinite. The bigger the drive, the more than of a difference.
You bought your 500GB hard drive and kept wondering why your computer kept claiming it was only 465.66 GB.
Well, standardization eventually happened in 1998. Unfortunately, corporate greed became the official standard, and the original units based on 2¹⁰=1024 became 'bi' units: kibibytes, mebibytes, gibibytes, etc. To make matters worse, the usual short forms, KB, MB, GB, etc. got absorbed besides. The original base of operations-two units are now KiB, MiB, and GiB.
Outside academia almost tech folks nonetheless understand a kilobyte to exist 1024 bytes. The software world got split, and some companies started adopting this 'standard', some did not. Some allow you choose.
But I digress.
Anatomy of a Byte
Let'southward expect at a sample byte value: 01101101
Read the value out loud and echo information technology from memory. Not super easy, is it?
When communicating binary data to humans, ones and zeros are terribly inefficient. Let'south divide a byte value into 2 groups of 4 bits each— these groups are called nibbles — and assign a symbol to all bit patterns a nibble tin take.
Instead of the fleck pattern, we can start using the symbols. Our byte value 01101101
becomes 6D
.
This is chosen hexadecimal annotation — hex for brusk — considering there's 16 different symbols. Likewise, since at that place is the potential of ambiguity with decimal numbers, hex values are often prefixed with 0x to remove that ambiguity. So we're looking at 0x6D
.
That'due south non an interpretation yet, just a more meaty way of writing — and maxim — the value.
Interpreting Binary Data
Any the ones and zeros stand for depends entirely on context.
"So what is the information content of 0x6D?"
"It'southward a blueprint of ones and zeros"
"Yeah, merely what does information technology mean?"
"It's only ones and zeros"
"I see"
If you lot were to interpret 0x6D every bit an ASCII graphic symbol, it would be 1000
. If yous were to interpret it as a two'due south complement number, it would be 109
. If you were to interpret it as a fixed point number with 3 fractional digits, it would be 13.625
.
When given a piece of binary information, you generally take to be given the context to brand any sense of it. For text files, it's the encoding, which these days is mostly UTF-8 — an ASCII-compatible encoding for text characters. For other types of files, say a PNG file, you have to look upward the meaning of the information in a corresponding file format specification.
Binary Integers
The starting time and default interpretation of a byte — or a sequence of bytes — is equally a positive integer. When interpreting byte values, the documentation will ofttimes imply that y'all know how to read bytes as integers.
From the PNG file format spec:
"The first 8 bytes of a PNG datastream always contain the post-obit (decimal) values:
137 80 78 71 13 10 26 10"
Then … what is the bit pattern for these decimal values? Permit's stick to our 0x6D
value for a while. We'll detect out. The hex value 0x6D
is the number 109
, just how do I know this?
We're used to our base ten decimal organisation. Simply the choice of 10
as base and 0–9
as digits is totally arbitrary.
We're but animals with 10 fingers on our easily, and that's making the quantity both familiar and mechanically user-friendly when counting — say livestock or sacks of rice. The number 10 is non inherently special, it's but applied to us as man beings.
The decimal organization
When we write numbers down, we write a digit in range of 0-9
for each power of our base 10
, starting with ten⁰ on the right. We're adding digits to the left for additional powers of x as long as we need them to express our number.
The octal system
We could go with a smaller base of operations of eight
instead using digits 0–7
. It's called the octal system. It'due south not a totally dreamed up example. Information technology's actually used. Perhaps you've seen it in the context of file permissions on Linux systems. Just as hex numbers are often prefixed with 0x to remove ambiguity, octal numbers are oftentimes prefixed by 0.
The hexadecimal system
So we went from x down to base 8. We can also go up, to say base 16
. Nosotros'll run into trouble with the digits, though.
Nosotros demand digits for 0–15
and by ix we don't accept established conventions. Let'south use the hex digits: A=x, B=11, C=12, D=13, E=14, F=15
The binary system
The pick of base is entirely arbitrary as long every bit nosotros take agreed-upon digits to paw. Let's go full minimalist and use base 2
, with digits 0–i
.
Integer summary
Now y'all see how 01101101
, 0x6D
, and 109 are
all unlike notations for the same integer. And at present it should make sense how looking at an ASCII table similar this, they are indexed using various formats. It'south but a courtesy so you can look upward the graphic symbol using an index in whatever system yous happen to accept at hand.
Binary Text
Open a text edit editor, create a new file, put "Hullo Globe" in it and salve it equally a elementary text file.
Now get a hex editor. A hex editor is just a file editor that does non interpret the contents of a file for you —information technology just shows you the raw binary content. Any basic, free hex editor is fine. You lot can utilise an online hex editor also.
Open your text file in your hex editor. Information technology will show you the binary content of the file. Information technology should await something similar this.
A hex editor typically shows y'all iii columns: the offset from the starting time of the file (ordinarily in hex), the binary contents of the file (as well in hex), and a third cavalcade shows the file's bytes interpreted as ASCII characters.
If your file contents have some additional bytes at the beginning, it's likely that your text editor saved a BOM in your file. Don't worry about information technology.
Call back our xl bits from earlier? You now know everything yon need to figure out what that says.
01001000
01100101 01101100
01101100
01101111
Going Deeper
We have the basics downwards. The following topics each deserve an article of their own. But we're already running curt on space, time, and reader'southward attention, so I'm pointing you lot towards resources to peruse at a later time.
I know you nonetheless have questions! 🤓
Proper Integers
Then far nosotros've talked about positive integers only, limiting ourselves to one byte at a fourth dimension. Just…
"How about encoding integers bigger than 255? There is not enough $.25 in a byte to store higher values. And what nearly negative numbers?"
Skilful questions. You use consecutive bytes of a fixed length — ii bytes, 4 bytes, and eight bytes are customary — and that extends the powers of 2 you have available. You use two's complement encoding for signed integers.
About non-toy calculators come up with normal, scientific and programmer views .Turn on the "programmer'due south view" on your calculator when looking into this stuff.
Once you start using more than than one byte to represent a value, endianness becomes important, especially when reading bytes from files or the network, so keep that in heed.
Fractional Numbers
So far nosotros've been talking about whole numbers only. What almost:
Ï€=3.1415926535…
In that location'south a standardized, hardware supported representation of floating point fractional numbers. The most commonly used versions are 4-bytes unmarried precision, ofttimes called 'bladder', and the 8-bytes version which is commonly called 'double'.
The general thought is as follows:
The sign is stored in the leftmost flake. 0 stands for positive, 1 indicates a negative number. Our example number is positive.
Shaded red, a fractional number is given equally i
plus a sum of selected negative powers of 2 — then halves, quarters, eights, etc.… In the example above we accept a bit set only in the quarters position. So the fractional number is 1.25
Annotation that the fractional number will always exist betwixt one.0
inclusive and 2.0
exclusive.
In the side by side step yous calculate an exponent given by the bits shaded dark-green. Our exponent is 124
. The standard dictates that we multiply our fractional number by two^(exp-127)
, and so in our instance we multiply by two^-three = 1/8
.
Finally he number encoded by our bit design is:
1.25 × 1/8 = 0,15625
You might be wondering how to effectively convert decimal numbers to floating signal numbers with this encoding. In practice, you never take to do it manually, and the ecosystem — like your programming language standard library — takes care of it for you.
Oh, and regarding π, endeavor interpreting this pattern:
0 10000000 10010010000111111011011
Yous might also be wondering just how precise the approximation to decimal fractions is, at present that we're using halfves instead of tenths. Many people do, even the folks who make Excel.
Decision
Our information is just ones and zeros. The style it is grouped and interpreted makes it meaningful. In this If you're ever asked to work with binary data, yous should now have enough information to properly understand the documentation that comes with it.
To examination your understanding, try looking at the shape file format spec and come across if it makes whatsoever sense.
C Programming Reading Binary File Converting It to Hex Output
Source: https://towardsdatascience.com/understanding-binary-data-fc4c78c9e677