File names and interacting with local file systems

From All n One's bxp software Wixi

1 Overview

This supporting article explains how files, folders and networks interact for file storage. bxp relies on these approaches and conventions, which have been established by corporations and international standards.

First we need to understand some history.

2 History

2.1 Filing

Terminology is the most important thing to understand. It must be remembered that when the Disk Operating System (DOS) came out in 1981, as PC DOS on the IBM PC, it was intended to replace the corporate paper filing systems of the time.

Files were something into which pieces of paper were put. The paper that was written on was the data and the file was the container which held the data.

When you had lots of files, they had to be grouped up. These were usually kept in what are called Hanging Folders.

So essentially files go into folders.

There is a physical limit to how many files fit into a folder, just as there is a limit to the number of folders that will fit into a filing cabinet's drawer.

3 Computer representation

Computers can only store 0 and 1. So, how do we store files, folders, etc.?

The data is stored magnetically on the surface of a hard drive as 0s and 1s. These "bits" are grouped into collections of 8, called bytes.

On a hard drive a collection of 512 bytes is called a Sector.
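As a rough illustration of the arithmetic, the sketch below works out how many 512-byte sectors a file of a given size would occupy. It is written in Python purely for illustration, the function name is my own, and it ignores the larger allocation units that real file systems use, so treat it as a simplification.

```python
SECTOR_SIZE = 512  # bytes per sector, as described above


def sectors_needed(file_size_bytes: int) -> int:
    """Return how many whole sectors are needed to hold a file (illustrative helper)."""
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    # A partly filled sector still takes up a whole sector on disk,
    # so round up using integer arithmetic.
    return (file_size_bytes + SECTOR_SIZE - 1) // SECTOR_SIZE


for size in (512, 513, 8 * 1024):
    print(size, "bytes ->", sectors_needed(size), "sector(s)")
```

The point to notice is that a file one byte over a sector boundary costs a whole extra sector.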

3.1 File Storage

Therefore, with all of the sectors on the hard drive, there is a lot of filing to be done. This is managed by the computer using a filing system.

For example, when you walk into a library, an index of all the books is kept at the entrance. If you're looking for a specific book, you don't go through every book in the library. You go to the index, which gives you a reference number telling you which section to look in.

The hard drive works in much the same way. It has a File Allocation Table (FAT) which points to sectors. If data is longer than one sector (i.e. bigger than 512 bytes), then the table entry for the first sector points to the next sector, in a daisy chain effect. [1]
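The daisy chain described above can be sketched as a simple lookup table. This is a toy model only: the sector numbers, the end-of-file marker and the function name are all made up for illustration, and a real FAT has a more involved on-disk layout.

```python
END_OF_FILE = -1  # hypothetical marker meaning "no more sectors in this file"

# Toy allocation table: key = sector number, value = next sector in the chain.
fat = {2: 5, 5: 6, 6: 9, 9: END_OF_FILE}


def follow_chain(table: dict, first_sector: int) -> list:
    """Walk the chain from the first sector and return every sector visited."""
    chain = []
    current = first_sector
    while current != END_OF_FILE:
        if current in chain:
            # A damaged table could loop forever, so stop if we come back around.
            raise ValueError(f"loop detected at sector {current}")
        chain.append(current)
        current = table[current]
    return chain


print(follow_chain(fat, 2))
```

Starting at sector 2, the walk visits each sector of the file in order until it reaches the end marker.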

As the folders in a system are structured like a tree, there is a Root. This is the base of the filing system. There can be many branches, with the leaves being the files.

Over the years, as storage got bigger and cheaper and files got larger, newer and better indexing systems were required. On most modern Windows computers, FAT, which was the default for almost 30 years, has been replaced by NTFS (New Technology File System). [2]

There are other filing systems, with the next most popular being ext, on Linux systems. Linux can use both FAT and ext, just as Windows can use both FAT and NTFS.

The table below gives an example of some of the differences.

Title                                      | FAT16       | FAT32       | NTFS
Files in Root                              | 512 max     | Unlimited   | Unlimited
Maximum disk size                          | 4 gigabytes | 2 terabytes | 256 terabytes
Maximum file size                          | 2 gigabytes | 4 gigabytes | 256 terabytes
Maximum number of files on disk            | 65,534      | 268,435,437 | 4,294,967,295
Maximum number of files in a single folder | 512         | 65,534      | 4,294,967,295

[3] [4]
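To show how limits like these might be checked before storing files, here is a small sketch. The numbers are copied from the table above. The function name, the dictionary layout and the use of binary units (1 gigabyte = 1024^3 bytes) are my own assumptions.

```python
GIB = 1024 ** 3  # assumption: binary gigabytes
TIB = 1024 ** 4  # assumption: binary terabytes

# Limits taken from the table above.
LIMITS = {
    "FAT16": {"max_file_size": 2 * GIB, "max_files_per_folder": 512},
    "FAT32": {"max_file_size": 4 * GIB, "max_files_per_folder": 65_534},
    "NTFS": {"max_file_size": 256 * TIB, "max_files_per_folder": 4_294_967_295},
}


def check_limits(file_system: str, file_size_bytes: int, files_in_folder: int) -> list:
    """Return a list of problems; an empty list means no limit is exceeded."""
    limits = LIMITS[file_system]
    problems = []
    if file_size_bytes > limits["max_file_size"]:
        problems.append(f"file is too big for {file_system}")
    if files_in_folder > limits["max_files_per_folder"]:
        problems.append(f"too many files in one folder for {file_system}")
    return problems


print(check_limits("FAT16", 3 * GIB, 100_000))
print(check_limits("NTFS", 3 * GIB, 100_000))
```

The same file and folder that break both FAT16 limits pass without complaint on NTFS.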

3.2 Drives

All of these files, folders and the filing system itself were used on persistent storage, i.e. when you turn the computer off and back on again, the data will still be there. The original computers didn't have much storage; it was all done on "floppy disks". The first 5 1/4 inch disks had a file system, just as the smaller 3 1/2 inch disks did.

The disks went into a disk drive. The first disk drive in a computer was given the name A:. As computers advanced, they needed more storage space, so computers came with 2 drives instead of 1, and the second was called B:. So in computer terminology A: and B: are reserved for floppy disks.

After a while, larger permanent storage was required, and this led to a hard drive also being shipped with the computer. So users had A:, B: and C:, with C: being the hard drive. As this was the standard for so many years, the naming convention stuck. If you add more disks and storage, they now get consecutive names: D:, E:, etc.

4 File and Folder Operation

4.1 File Naming

The next part outlines how these filing systems work. PC-DOS introduced a very clever naming system for files: the 8.3 system. Each file would have a name of up to 8 characters, a full stop, then up to three characters to denote the type of file it is.

Imagine a file with no extension. How do you know which program to use? The computer would have to open the file and read the data to figure out what type it is. Other systems did work without extensions: the original Macs kept a type code alongside each file, and some tools read the first few bytes inside the file to figure out which program to use. 8.3 saved a ton of time.
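To make the difference concrete, the sketch below compares the two approaches: reading the type from the extension versus looking at the first few bytes of the data. The signatures listed are well-known ones, but the table is deliberately tiny and the function names are my own.

```python
from pathlib import PurePath

# A few well-known file signatures found at the very start of the data.
SIGNATURES = {
    b"%PDF": "PDF document",
    b"\x89PNG": "PNG image",
    b"GIF8": "GIF image",
    b"PK\x03\x04": "ZIP archive",
}


def type_from_extension(file_name: str) -> str:
    """The 8.3 approach: the name alone tells us the type, no reading needed."""
    return PurePath(file_name).suffix.lower()


def type_from_content(data: bytes) -> str:
    """The alternative: inspect the first few bytes of the file's data."""
    for signature, description in SIGNATURES.items():
        if data.startswith(signature):
            return description
    return "unknown"


print(type_from_extension("REPORT.DOC"))
print(type_from_content(b"%PDF-1.4 rest of the file"))
print(type_from_content(b"just some text"))
```

The extension lookup never touches the data, which is where the time saving comes from.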

8.3 also meant limitations and restrictions. When using DOS, the parts of a command are separated by spaces, because that is how humans read:

cd \
dir *

The first command, "change directory", brings the user back to the root of the filing system.

The second command, "directory listing", puts on screen a complete listing of all the files in the directory. The * means list everything that matches.

So we can see that there are now some reserved characters: * meaning all, space separating the parts of a command, and the first appearance of \. We'll come to this last one a bit later.

We also have the limitation of file names being rather short. 8 characters doesn't leave much room for description. So the system handles long file names by cutting them back to six characters and then adding ~ (called a tilde) and a number. So if we had:

  • My Really Interesting File.doc
  • My Really Interesting Other file.doc

They would become

  • MyReal~1.doc
  • MyReal~2.doc
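The shortening above can be sketched in a few lines. This is a simplified model and not the exact routine Windows uses: it keeps the original letter case to match the example above and only handles spaces, dots and name clashes. The function name is my own.

```python
def short_name(long_name: str, taken: set) -> str:
    """Build a simplified 8.3 style name that is not already in `taken`."""
    stem, dot, ext = long_name.rpartition(".")
    if not dot:  # no full stop at all, so there is no extension
        stem, ext = long_name, ""
    base = stem.replace(" ", "").replace(".", "")
    ext = ext.replace(" ", "")[:3]  # at most three characters of extension
    number = 1
    while True:
        suffix = f"~{number}"
        # Keep the name part to 8 characters in total, including ~ and the number.
        candidate = base[: 8 - len(suffix)] + suffix
        if ext:
            candidate += "." + ext
        if candidate.upper() not in taken:
            taken.add(candidate.upper())
            return candidate
        number += 1


used = set()
print(short_name("My Really Interesting File.doc", used))
print(short_name("My Really Interesting Other file.doc", used))
```

The second file gets the next number because the first short name is already taken.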

As time passed, the filing system moved on. Spaces in names are commonplace, extensions are sometimes four or five letters, etc. This is because the newer filing systems can support them. Beneath it all, however, the filing system still supports 8.3 limitations and has "workarounds" to hide these from the user.

4.2 Navigating the filing structure

We mentioned that \ was to bring us to the root of our tree. Now we need to explore that a lot more.

There is one command we need to traverse a tree: cd. "Change Directory" was used to move up and down directories before mice were ever popular.

The first folder, the drawer in the filing cabinet so to speak, is called the root: the base of the tree.

Let's use my computer as an example. Here is what the C: drive looks like in Windows and in DOS.

filesystems 001.png filesystems 002.png

There are some interesting things to note.

  • Not all the folders and files in the Windows version appear in the DOS version.

4.3 Traversing the filing structure

filesystems 003.png

In the DOS window, note that I'm running Windows 10. You'll see the first command of "cd \", which brings me back to the root of the drive.

The prompt becomes "C:\>", which shows me which folder I'm in, in DOS.

I then issue "dir *" and get a listing of the files. Directories have <DIR> beside their name. For files, you can see the size in bytes. You'll see file names and extensions as well. However, you'll note the folder name Program Files. How can it have a space in it? Let's try some commands.

If you try "cd Program Files" in other operating systems, it will fail, but Windows compensates for the space, so it works. We go back to root and then we try "cd Progra~1". It works equally well. This shows that Windows supports the older 8.3 notation.

I then go back to root and try cd "Program Files", with the folder name in double inverted commas, and that works. Double inverted commas have been used in Windows XP and other versions of Windows to facilitate spaces in folder names.

4.4 DOS Batch files

There are common operations for file systems, such as deleting temporary files and moving documents around. Instead of being typed separately, line by line, every time, these instructions can be grouped into a text file.

cd \
del *.tmp

You save the file as tempdel.bat and when the computer sees the .bat extension it knows this is a batch of instructions to be executed.
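For comparison, here is a sketch of the same clean-up idea written as a small script. It is not part of bxp, and the function name and the dry-run default are my own choices. The demonstration runs in a throwaway folder so that nothing real is touched.

```python
import tempfile
from pathlib import Path


def delete_temp_files(folder: str, dry_run: bool = True) -> list:
    """List the .tmp files in one folder and, unless dry_run is set, delete them."""
    matched = []
    for path in sorted(Path(folder).glob("*.tmp")):
        if path.is_file():
            if not dry_run:
                path.unlink()  # the actual delete, like "del *.tmp"
            matched.append(path.name)
    return matched


# Demonstration in a temporary folder that is removed afterwards.
with tempfile.TemporaryDirectory() as demo:
    (Path(demo) / "old.tmp").write_text("scratch data")
    (Path(demo) / "keep.doc").write_text("important data")
    print(delete_temp_files(demo))                 # dry run: only reports
    print(delete_temp_files(demo, dry_run=False))  # really deletes
    print(sorted(p.name for p in Path(demo).iterdir()))
```

Defaulting to a dry run means the script reports what it would delete before anything is removed.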

Batch files written by other people can be very dangerous, as they can carry out damaging operations. For example, it would be very dangerous to run a batch file with the following commands in it:

@echo off
echo Please wait... downloading your images
cd \
deltree * /y

  1. The first line hides the commands being executed
  2. echo gives a prompt to the user and encourages them to wait
  3. The next command goes to the root
  4. The last line deletes every folder and file off the computer, including every sub directory, and doesn't ask the user for permission. It just says yes to everything.

This batch file will destroy all of the files on your computer.

4.5 Reserved Characters

As we have discovered, some characters are not allowed in file names because they have special meaning for DOS.

Character | Name          | Use
"         | Double Quotes | Encloses a name with spaces
*         | Asterisk      | Wildcard, meaning any number of characters at all
+         | Plus          | Concatenation and joining things together
,         | Comma         | Separating values
/         | Forward Slash | Allows parameters to be passed into the program
:         | Colon         | Double colon at the start of a line in DOS is a comment
;         | Semi-Colon    | Used for separating variables in things like the PATH command
<         | Less than     | Less than
=         | Equals        | Comparison
>         | Greater than  | Greater than
?         | Question mark | Wildcard, for a single character only

In short, space, ', / and \ cause the biggest problems with file names.
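A file name can be checked against this list before it is used. The sketch below follows the table in this article (plus the backslash mentioned in the text) and not the rules of any particular Windows version, and the function name is my own.

```python
# Characters from the table above, plus the backslash mentioned in the text.
RESERVED = set('"*+,/:;<=>?\\')


def find_reserved(file_name: str) -> list:
    """Return the reserved characters found in the name, in order of appearance."""
    found = [ch for ch in file_name if ch in RESERVED]
    # dict.fromkeys removes repeats while keeping the original order.
    return list(dict.fromkeys(found))


print(find_reserved("report.doc"))
print(find_reserved("what?*.txt"))
print(find_reserved("a:b:c.txt"))
```

An empty list means the name contains none of the characters in the table.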

Linux uses / instead of \ for its basic folder structure. This can make things rather confusing when swapping between Windows and Linux.
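The difference between the two separators can be seen with Python's pathlib module, which models both styles without needing either operating system. The two example paths are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical example paths, one in each style.
windows_path = PureWindowsPath(r"C:\Users\Philip Lacey\mypage.html")
linux_path = PurePosixPath("/home/philip/mypage.html")

# Each style splits into the same kind of pieces: a root, folders, then the file.
print(windows_path.parts)
print(linux_path.parts)

# The Windows path rewritten with forward slashes.
print(windows_path.as_posix())
```

Only the separator changes. The structure of root, folders and file is the same in both.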

5 Security

5.1 Permissions

Now we have files, folders and a system to navigate through those folders. We have seen that some characters can't be used in names, and that there are inconsistencies in approach due to historical inheritance.

The next piece of the puzzle is giving people access to files. In the original FAT there was no place for file security. With the introduction of NTFS, the index entry for each file could store security permissions for who could read, write, create and delete files. These permissions made the index a lot bigger, but meant very fine control could be put on every file and folder. Management of these permissions is challenging.

If you right click on a file and choose Properties, then go to the second tab, you can see the security of the file, taken from the NTFS entry.

filesystems 004.png

Every file can have many different permissions. Without help, this would make your computer interaction experience a constant battle of adding and removing permissions, so the modern operating system attempts to manage these permissions in groups. Windows provides an "inherit" capability, so that a new file in a directory will pick up the permissions of the folder that it is in. Folders can also inherit permissions from their parent folders, and so on.

5.2 Networks

The next consideration of these files is networks. We set up all the permissions of our files on our computer. We know which users have access to the machine, so we can control them. However, if we decide we are going to share these files on a network, then we have an added security challenge of "who" can look at them.

For this reason, systems like "Active Directory" on Windows and LDAP on Linux allow permissions to be set up for "network" users. These permissions are completely separate from the permissions on the local file.

So just because we can access the file on the machine doesn't mean that we can access the file across the network.

Individual sharing permissions on each file were considered too cumbersome. Because of this, permissions are allocated on a folder basis and inherited by the files in that folder. So when you right click on a folder you get two sets of permissions: our usual "Security" tab for permissions on the machine, and a "Sharing" tab, which is security for users on the network.

filesystems 005.png

"Active Directory" makes these complicated permission systems easier to manage.

5.3 Network Access

Now we introduce some of the challenges of networks.

We know that C: is the usual starting place to look for files. But what if the file you want is on a different computer? In Windows it is possible to access another computer as if it were a drive, using its computer name.

To find your computer name, right click My Computer and there you will see your computer name. In my case it's called AllnOne-PL-2. So on my network, to access my machine you would use that name.

filesystems 006.png

The challenge is that DOS can't use the \\ prefix, as these are reserved characters. Instead you have to trick DOS into thinking that the remote computer's folder is actually a drive in the computer. The command is "net use" and it allows you to "map a drive", i.e. you take a network folder and trick your computer into thinking it is actually a hard drive in your own computer.

net use f: \\AllnOne-PL-2\Users\

This makes your network folder accessible as if it were the F: drive in your computer.

Mapped drives are dangerous for a number of reasons.

  1. They may have different names / mappings for different users on different machines
  2. If the remote server is down, then the map won't work
  3. Security permissions for the drive are controlled by the remote computer not you

5.4 The bad people

Earlier, in the section on batch files, we saw that damage can be done to a computer through a batch file. If you're connected to the network and your computer is compromised, then you put the network at risk.

For example, using a batch file you could copy all of the files off the server and then transfer them to a computer outside your network. This is a serious security hole, so giving access is extremely dangerous.

Even something as simple as a file listing would be dangerous.

6 The Internet and Local File access

6.1 Sample page

JavaScript in HTML pages is a very powerful thing. It allows you to do many wonderful things that are very useful for users and administrators alike. It is equally dangerous as there is potential for misuse. For this reason a LOT of security is required around what can and can't be done.

JavaScript in a browser is BANNED from accessing files on your computer. This is security implemented by the browser to ensure that malicious JavaScript cannot do damage to your machine. However, if you run a JavaScript file locally, then you can access files. What's the difference?

The first is running on the internet. The second is a file on my machine.

How much access the file:// version has also depends on how the browser implements security. Internet Explorer, as it closely integrates with the operating system, provides the most flexibility security-wise for local file access.

7 Accessing Network materials

How can local, network and internet based resources work together on a single page?

If you start with an internet based page, all local access will be banned (C:\ type links all will fail). This is the security discussed in the last section. So when bxp works, it will not access your local files.

If you want to have your bxp solution work with your local drives, then you set the links in bxp, and bxp doesn't need to go anywhere near your files. Two examples are:

If you put a link into your HTML which provides a local resource, the link will work fine, e.g. <a href="">Title</a> won't be an issue.

If you need to link to a locally stored file, then you cannot use the mapped drive path, as this is considered local file access, i.e. the computer sees C: and F: as local drives and prevents access. Instead you must use the machine's network path in the URL: <a href="\\AllnOne-PL-2\Users\Philip Lacey\mypage.html">Title</a>. Then, when the link is clicked, your browser will either allow or refuse access. Again, start with Internet Explorer, due to how closely its security permissions are integrated with the filing system.

The permissions of the browser, the permissions of the network and the remote permissions of the file will all need to be configured to facilitate access. This can be time consuming and difficult to do. So here is a checklist for putting in a link to a network based file.
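If a browser will not accept the raw network path, one thing that may be worth trying is the same path written as a file:// URL. The sketch below shows the conversion, including what happens to the space. Whether a given browser then opens the link is a separate matter decided by its security settings, and the function name is my own.

```python
from urllib.parse import quote


def unc_to_file_url(unc_path: str) -> str:
    """Convert a path like \\\\computer\\folder\\file into a file:// URL."""
    if not unc_path.startswith("\\\\"):
        raise ValueError("expected a network path starting with two backslashes")
    # Swap every backslash for a forward slash: //computer/folder/file
    forward = unc_path.replace("\\", "/")
    # quote() leaves the slashes alone and turns each space into %20.
    return "file:" + quote(forward)


print(unc_to_file_url(r"\\AllnOne-PL-2\Users\Philip Lacey\mypage.html"))
```

Note how the space in the folder name becomes %20, which is the translation the browser would otherwise do for you.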

  1. What is the network name of the remote computer? e.g. \\AllnOne-PL-2
  2. What is the full path to the folder where the file is kept? e.g. "\\AllnOne-PL-2\Users\Philip Lacey\mypage.html"
  3. Can you open that file by double clicking on it?
  4. If you put that link in your browser will the file open? If it doesn't open, we are looking for security settings in the browser.
  5. If you test with simple html does it work? (i.e. check the browser can open the format type)

Given the diverse combinations of issues, here are the common troubleshooting elements to watch out for:

  1. Security messages from the browser are easy to miss at the bottom in yellow blocks.
  2. Make sure your files and paths don't contain spaces, as translation from space to %20 by the browser causes massive issues
  3. Try your file as an 8.3 file name instead, as it ensures all the file systems between Linux and Windows will be accommodated
  4. Try your paths as 8.3 paths to remove spaces and rule out long folder names as an issue
  5. Check what file system is being used and that you're not exceeding any limits, e.g. the storage is FAT16 and you're putting hundreds of thousands of files in a folder
  6. Don't use the root folder of any drive for storage; operating systems protect these heavily, which causes numerous security errors
  7. Every user will need their access tested locally to their machine, as the security permissions are not set by bxp but by your local infrastructure
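Some of the checks above can be done on the path text alone, before any browser or network is involved. The sketch below flags spaces, mapped drive letters and names that don't fit 8.3. It is only an illustration, the function names are my own, and it cannot test permissions or whether the file actually exists.

```python
import re


def is_8_3(name: str) -> bool:
    """True if a single file or folder name fits the 8.3 pattern."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no full stop, so the whole name is the stem
        stem, ext = name, ""
    return " " not in name and 1 <= len(stem) <= 8 and len(ext) <= 3


def link_warnings(path: str) -> list:
    """Return a list of warnings for a path that is about to be used in a link."""
    warnings = []
    parts = [p for p in re.split(r"[\\/]+", path) if p]
    if path.startswith("\\\\") or path.startswith("//"):
        parts = parts[1:]  # skip the computer name, it is not a file or folder
    elif re.match(r"^[A-Za-z]:", path):
        warnings.append("uses a drive letter, which is treated as local file access")
        parts = parts[1:]  # skip the drive itself
    if " " in path:
        warnings.append("contains spaces, which the browser turns into %20")
    long_names = [p for p in parts if not is_8_3(p)]
    if long_names:
        warnings.append("not 8.3 names: " + ", ".join(long_names))
    return warnings


for warning in link_warnings(r"\\AllnOne-PL-2\Users\Philip Lacey\mypage.html"):
    print(warning)
```

A path that returns no warnings can still fail for permission reasons, so each user's access still needs testing on their own machine.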

8 Use case scenarios

bxp has a number of instances of locally stored files being listed / made accessible through bxp:

  • Call recordings
  • Maps to uploaded docs, injected by CLL into the contact history of a record through the bxp API