19 October, 2017

Commonly Asked Computer Networks Questions

What are Unicasting, Anycasting, Multicasting and Broadcasting?

If the message is sent from a source to a single destination node, it is called Unicasting. This is the most common form of communication in networks.
If the message is sent from a source to any one of a given group of destination nodes, it is called Anycasting. This is used a lot in content delivery systems, where we want to get content from any available server.
If the message is sent to some subset of the other nodes, it is called Multicasting. It is used when there are multiple receivers of the same data, as in video conferencing or updating CDN servers that hold replicas of the same data.
If the message is sent to all the nodes in a network, it is called Broadcasting. This is typically used in local networks; for example, DHCP and ARP use broadcasting.
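As a rough illustration of the distinction, the sketch below classifies an IPv4 destination using Python's standard ipaddress module. The function name classify_destination and the sample addresses are assumptions made for this example. Anycast is deliberately absent: an anycast address looks like an ordinary unicast address, and the "any one of several servers" behaviour comes from routing, not from the address itself.

```python
import ipaddress


def classify_destination(address, network):
    """Classify an IPv4 destination as seen from the given local network.

    classify_destination is a hypothetical helper written for this example.
    """
    ip = ipaddress.ip_address(address)
    net = ipaddress.ip_network(network)
    if ip.is_multicast:                                   # 224.0.0.0/4
        return "multicast"
    limited_broadcast = ipaddress.ip_address("255.255.255.255")
    if ip == net.broadcast_address or ip == limited_broadcast:
        return "broadcast"
    return "unicast"


print(classify_destination("192.168.1.10", "192.168.1.0/24"))
print(classify_destination("224.0.0.251", "192.168.1.0/24"))
print(classify_destination("192.168.1.255", "192.168.1.0/24"))
```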
What are layers in OSI model?
There are 7 layers in total:
1. Physical Layer
2. Data Link Layer
3. Network Layer
4. Transport Layer
5. Session Layer
6. Presentation Layer
7. Application Layer
What is Stop-and-Wait Protocol?
In the Stop-and-Wait protocol, a sender waits for an acknowledgement after sending a frame and sends the next frame only when the acknowledgement of the previous frame has been received.
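A minimal simulation of this behaviour is sketched below. It assumes a simple channel model in which each transmission attempt either succeeds or is lost (the ack_lost argument), and it adds a 1-bit alternating sequence number, which is a common companion to stop-and-wait but is not described above. The function name and arguments are made up for this example.

```python
def stop_and_wait(frames, ack_lost):
    """Simulate a stop-and-wait sender.

    frames   -- payloads to deliver, in order
    ack_lost -- one boolean per transmission attempt; True means the frame or
                its acknowledgement was lost (hypothetical channel model)
    Returns the transmission log as (sequence_bit, payload) tuples.
    """
    losses = iter(ack_lost)
    log = []
    seq = 0
    for payload in frames:
        while True:
            log.append((seq, payload))      # send the frame, then wait
            if not next(losses, False):     # acknowledgement arrived in time
                break                       # otherwise loop again: retransmit
        seq ^= 1                            # alternate the sequence bit
    return log


# Frame "B" is lost once, so it appears twice in the log.
print(stop_and_wait(["A", "B"], [False, True, False]))
```

The sender never has more than one unacknowledged frame outstanding, which is what makes the protocol simple but slow on links with long round-trip times.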
What is Piggybacking?
Piggybacking is used in bi-directional data transmission in the data link layer (OSI model). The idea is to improve efficiency by piggybacking the acknowledgement (of the received data) on the data frame (to be sent) instead of sending a separate acknowledgement frame.
Differences between Hub, Switch and Router?

Hub	Switch	Router
Physical Layer Device	Data Link Layer Device	Network Layer Device
Simply repeats signal to all ports	Doesn’t simply repeat, but forwards frames selectively by MAC (LAN) address	Routes data based on IP address
Connects devices within a single LAN	Can connect multiple sub-LANs within a single LAN	Connects multiple LANs and WANs together
Collision domain of all hosts connected through a hub remains one, i.e., signals sent by any two devices can collide.	Switch divides the collision domain, but the broadcast domain of connected devices remains the same.	Router divides both collision and broadcast domains.
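The difference between the first two columns can be sketched in a few lines. The classes below are a toy model written for this example, not a real device API: ports are plain numbers and MAC addresses are plain strings. The hub repeats every frame to all other ports, while the switch learns which MAC address sits behind which port and floods only when the destination is still unknown.

```python
class Hub:
    """Physical-layer device: repeats every frame out of all other ports."""

    def __init__(self, ports):
        self.ports = list(ports)

    def forward(self, in_port, src_mac, dst_mac):
        return [p for p in self.ports if p != in_port]


class Switch(Hub):
    """Data-link-layer device: learns which MAC address is behind which port."""

    def __init__(self, ports):
        super().__init__(ports)
        self.mac_table = {}

    def forward(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port              # learn the sender's port
        if dst_mac in self.mac_table:                  # known destination
            return [self.mac_table[dst_mac]]
        return super().forward(in_port, src_mac, dst_mac)  # unknown: flood


switch = Switch([1, 2, 3])
print(switch.forward(1, "aa", "bb"))   # destination unknown, so flood
print(switch.forward(2, "bb", "aa"))   # "aa" was learned on port 1
```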

What happens when you type a URL in web browser?
A URL may point to an HTML page, an image file or any other type of resource.

If the content of the typed URL is in the cache and fresh, the browser displays the cached content.
Otherwise, the browser needs the IP address of the domain so that it can set up a TCP connection, so it does a DNS lookup. It first looks for the name-to-IP mapping in the browser cache, then in the OS cache. If neither cache has the mapping, it makes a recursive query to the local DNS server, which provides the IP address.
Browser sets up a TCP connection using a three-way handshake.
Browser sends an HTTP request.
The server runs a web server such as Apache or IIS, which handles the incoming HTTP request and sends an HTTP response.
Browser receives the HTTP response and renders the content.
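The steps above can be sketched with Python's standard socket module. This is only an approximation of what a browser does: it assumes plain HTTP on port 80 (no TLS, no caching, no redirects), and the function names build_request and fetch are made up for this example. Only build_request is exercised when the file runs; fetch needs network access.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, raw HTTP/1.1 GET request) for a plain-HTTP URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, request


def fetch(url):
    """DNS lookup, TCP connection, HTTP request, then read the response."""
    host, port, request = build_request(url)
    # DNS lookup: the OS resolver checks its cache, then asks the DNS server.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(sockaddr)      # the TCP three-way handshake happens here
        sock.sendall(request)       # send the HTTP request
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:           # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)         # raw HTTP response: headers and body


print(build_request("http://example.com/index.html")[2].decode())
```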

What is DHCP, how does it work?

The idea of DHCP (Dynamic Host Configuration Protocol) is to enable devices to get an IP address without any manual configuration.
The device sends a broadcast message saying “I am new here”.
The DHCP server sees the message, responds to the device and typically allocates an IP address. All other devices on the network ignore the new device's message, as they are not DHCP servers.

In Wi-Fi networks, access points generally work as the DHCP server.

What is ARP, how does it work?
ARP stands for Address Resolution Protocol. ARP is used to find the LAN (MAC) address that corresponds to a network (IP) address. A node typically has the destination IP address of a packet, but it needs the link-layer address to send a frame over the local link. The ARP protocol helps here.

The node sends a broadcast message to all nodes asking for the MAC address that belongs to this IP address.
The node with the given IP address replies with its MAC address.
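A toy model of this request and reply exchange is sketched below. The Host class, the resolve function and the cache are assumptions made for this example; real ARP runs in the operating system's network stack, and the caching shown here mirrors what hosts commonly do but is not described above.

```python
class Host:
    """A node on the LAN with one IP address and one MAC address."""

    def __init__(self, ip, mac):
        self.ip = ip
        self.mac = mac
        self.arp_cache = {}                 # IP address -> MAC address

    def handle_arp_request(self, target_ip):
        # Only the owner of the requested IP address answers.
        return self.mac if target_ip == self.ip else None


def resolve(sender, target_ip, lan):
    """Return the MAC address for target_ip, broadcasting only on a cache miss."""
    if target_ip in sender.arp_cache:
        return sender.arp_cache[target_ip]
    for host in lan:                        # broadcast: every host sees it
        if host is sender:
            continue
        mac = host.handle_arp_request(target_ip)
        if mac is not None:
            sender.arp_cache[target_ip] = mac
            return mac
    return None                             # nobody owns that address


a = Host("192.168.1.2", "aa:aa:aa:aa:aa:aa")
b = Host("192.168.1.3", "bb:bb:bb:bb:bb:bb")
print(resolve(a, "192.168.1.3", [a, b]))
```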

Network Devices (Hub, Repeater, Bridge, Switch, Router and Gateways)

1. Repeater – A repeater operates at the physical layer. Its job is to regenerate the signal over the same network before it becomes too weak or corrupted, so as to extend the length over which the signal can be transmitted. An important point about repeaters is that they do not amplify the signal. When the signal becomes weak, they copy it bit by bit and regenerate it at the original strength. It is a 2-port device.

2. Hub – A hub is basically a multiport repeater. A hub connects multiple wires coming from different branches, for example, the connector in a star topology which connects different stations. Hubs cannot filter data, so data packets are sent to all connected devices. In other words, the collision domain of all hosts connected through a hub remains one. Also, they do not have the intelligence to find the best path for data packets, which leads to inefficiencies and wastage.

3. Bridge – A bridge operates at the data link layer. A bridge is a repeater with the added functionality of filtering content by reading the MAC addresses of source and destination. It is also used for interconnecting two LANs working on the same protocol. It has a single input and a single output port, thus making it a 2-port device.

4. Switch – A switch is a multiport bridge with a buffer and a design that can boost its efficiency (a large number of ports implies less traffic per port) and performance. A switch is a data link layer device. A switch can perform error checking before forwarding data, which makes it very efficient: it does not forward packets that have errors, and it forwards good packets selectively to the correct port only. In other words, a switch divides the collision domain of hosts, but the broadcast domain remains the same.

5. Routers – A router is a device like a switch that routes data packets based on their IP addresses. A router is mainly a network layer device. Routers normally connect LANs and WANs together and have a dynamically updating routing table on which they base their routing decisions. Routers divide the broadcast domains of the hosts connected through them.

6. Gateway – A gateway, as the name suggests, is a passage that connects two networks which may work upon different networking models. Gateways work as messenger agents that take data from one system, interpret it, and transfer it to another system. Gateways are also called protocol converters and can operate at any layer of the network model. Gateways are generally more complex than a switch or a router.
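To make the router's "routing table" a little more concrete, here is a small sketch of a table lookup using Python's standard ipaddress module. The Router class, the route names and the prefixes are invented for this example. It uses longest-prefix matching, which is how IP routers usually choose among overlapping routes, although the text above does not go into that detail.

```python
import ipaddress


class Router:
    """Toy routing table: destination prefix -> next hop."""

    def __init__(self):
        self.routes = []                    # list of (network, next_hop)

    def add_route(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def next_hop(self, destination):
        ip = ipaddress.ip_address(destination)
        matches = [(net, hop) for net, hop in self.routes if ip in net]
        if not matches:
            return None                     # no route to the destination
        # The most specific (longest) matching prefix wins.
        return max(matches, key=lambda match: match[0].prefixlen)[1]


router = Router()
router.add_route("0.0.0.0/0", "isp")        # default route
router.add_route("10.0.0.0/8", "core")
router.add_route("10.1.0.0/16", "branch")
print(router.next_hop("10.1.2.3"))
```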

17 October, 2017

What are the differences between a public cloud and a private cloud?

Private clouds are those that are built exclusively for an individual enterprise. They allow the firm to host applications in the cloud while addressing concerns regarding data security and control, which are often lacking in a public cloud environment. A private cloud is also known as an internal or enterprise cloud and resides on the company's intranet or hosted data center, where all of the company's data is protected behind a firewall.

Public Cloud

Pay for whatever resources you need, for whatever time period you need them.
These are provided commercially.
Supports heavy workloads without disturbing any functionality.
It is usually cheaper for consumers, since the hardware, application and other costs are handled by the providers.
There is no wasted resource because consumers are charged only for what they use.
Capacity can be scaled up or down on demand.

Private Cloud

It is owned by a specific private group for the use of its own employees, partners and customers.
Highly controlled and not accessible to anyone who has not been granted access.
Security, governance and compliance are highly automated.
Apart from security and maintenance, its features are similar to those of a public cloud.
The cost is comparatively high.

A program sorts an array of integers. Write down the code that tests the sorting algorithm written in the program.

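import static org.junit.Assert.assertArrayEquals;
import org.junit.Test;

// Place this method inside your JUnit 4 test class.
@Test
public void testSort() {
  int[] arr = {4, 5, 3};        // unsorted input
  int[] expected = {3, 4, 5};   // the same values in ascending order
  sort(arr); // replace with the name of your own sort method
  assertArrayEquals(expected, arr);
}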

1. Set up an array.
2. Set up another array with the expected result.
3. Call sort on the first array.
4. Use an assertArrayEquals call to compare the two arrays.

ASP.Net 4.5 Garbage Collection Improvement

Fundamentals of Garbage Collection

In the common language runtime (CLR), the garbage collector serves as an automatic memory manager. It provides the following benefits:

Enables you to develop your application without having to free memory.
Allocates objects on the managed heap efficiently.
Reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations. Managed objects automatically get clean content to start with, so their constructors do not have to initialize every data field.
Provides memory safety by making sure that an object cannot use the content of another object.

The .NET Framework's garbage collector manages the allocation and release of memory for your application. The garbage collector's optimizing engine determines the best time to perform a collection, based upon the allocations being made. When the garbage collector performs a collection, it checks for objects in the managed heap that are no longer being used by the application and performs the necessary operations to reclaim their memory.

“Garbage collector is one real heavy task in a .NET application. And it becomes heavier when it is an ASP.NET application. ASP.NET applications run on the server and a lot of clients send requests to the server thus creating loads of objects, making the GC really work hard for cleaning up unwanted objects.”

“To overcome the above problem, server GC was introduced. In server GC there is one more thread created which runs in the background. This thread works in the background and keeps cleaning…objects thus minimizing the load on the main GC thread. Due to double GC threads running, the main application threads are less suspended, thus increasing application throughput. To enable server GC, we need to use the gcServer XML tag and enable it to true.”

<configuration>
   <runtime>
      <gcServer enabled="true"/>
   </runtime>
</configuration>

This is not done by default. The MSDN information page says “There are only two garbage collection options, workstation or server. For single-processor computers, the default workstation garbage collection should be the fastest option. Either workstation or server can be used for two-processor computers. Server garbage collection should be the fastest option for more than two processors. Use the GCSettings.IsServerGC property to determine if server garbage collection is enabled.”

“In the .NET Framework 4 and earlier versions, concurrent garbage collection is not available when server garbage collection is enabled. Starting with the .NET Framework 4.5, server garbage collection is concurrent. To use non-concurrent server garbage collection, set the <gcServer> element to true and the <gcConcurrent> element to false.”

So if you’re using ASP.Net 4.5 and have a multi-core server, you should try turning on the Server Garbage Collection and do some profiling to see if it improves the performance of your site.

05 October, 2017

What is DHCP and How Does it Work?

Dynamic Host Configuration Protocol (DHCP) is the technology that automatically assigns IP addresses to network devices. Most network administrators prefer to use DHCP rather than manually assigning IP addresses.

A user turns on a computer which has a DHCP client.
The client computer sends a broadcast request (called a DISCOVER or DHCPDISCOVER) to every device in the network, looking for a DHCP server to answer.
The router directs the DISCOVER packet to the correct DHCP server, acting as a relay if the server is on another subnet.
The server with DHCP server software receives the DISCOVER packet. Depending on availability and the usage policies set on the server, the server determines an appropriate address to give to the client. The server then reserves that address for the client temporarily and sends back to the client computer an OFFER (or DHCPOFFER) packet containing that address information. The OFFER can also carry the addresses of the client’s WINS servers, DNS servers and NTP servers, and sometimes other settings as well.
The client then sends a REQUEST (or DHCPREQUEST) packet, informing the server that it intends to use the address.
The server then sends an ACK (or DHCPACK) packet, confirming that the client computer has been given a lease on the address for a specific period of time defined by the server.
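The DISCOVER, OFFER, REQUEST and ACK exchange can be modelled roughly as follows. This is a simplified sketch under stated assumptions: the DhcpServer class, its method names and the one-hour default lease are invented for this example, clients are identified by a MAC address string, and lease expiry and renewal are not modelled.

```python
import ipaddress


class DhcpServer:
    """Toy DHCP server that hands out addresses from a pool."""

    def __init__(self, pool_cidr, lease_seconds=3600):
        self.free = [str(ip) for ip in ipaddress.ip_network(pool_cidr).hosts()]
        self.offered = {}   # client MAC -> address reserved temporarily
        self.leases = {}    # client MAC -> (address, lease length in seconds)
        self.lease_seconds = lease_seconds

    def discover(self, mac):
        """Handle a DISCOVER: reserve an address and return it as the OFFER."""
        if mac in self.leases:
            return self.leases[mac][0]
        if mac not in self.offered:
            if not self.free:
                return None                  # pool exhausted, no OFFER
            self.offered[mac] = self.free.pop(0)
        return self.offered[mac]

    def request(self, mac, address):
        """Handle a REQUEST: return the lease (the ACK) or None to refuse."""
        if self.offered.get(mac) != address:
            return None
        del self.offered[mac]
        self.leases[mac] = (address, self.lease_seconds)
        return self.leases[mac]


server = DhcpServer("192.168.1.0/29")
offer = server.discover("aa:aa:aa:aa:aa:aa")          # DISCOVER -> OFFER
print(server.request("aa:aa:aa:aa:aa:aa", offer))     # REQUEST -> ACK
```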

If a computer uses a static IP address, that means it was manually configured to use that specific IP address. The main problem with assigning IP addresses manually is that it is prone to user error, which may lead to two computers being configured with the same IP address. That creates a conflict that results in loss of service. Using DHCP reduces the work, as it dynamically assigns IP addresses, and also reduces the chances of conflicts.


What is RISC architecture? What is CISC architecture? Difference between RISC and CISC

What is RISC architecture?

RISC is an acronym for Reduced Instruction Set Computing. In this architecture a microprocessor is designed to perform a smaller number of types of instructions so that it can operate at a higher speed (perform more millions of instructions per second, or MIPS).

Since each instruction type that a computer must perform requires additional transistors and circuitry, a larger list or set of computer instructions tends to make the microprocessor more complicated and slower in operation. That is why fewer instructions mean faster operation. Pipelining is one of the key features of RISC. It is performed by overlapping the execution of several instructions in a pipeline fashion.

Examples: the ARM processors used in the Apple iPod and the Nintendo DS.

What is CISC architecture?

CISC is an acronym for Complex Instruction Set Computing. CISC architectures are designed to decrease memory cost: large programs need more storage, and large memories are expensive.

To solve this problem, CISC architectures reduce the number of instructions per program by embedding a number of operations in each instruction. As a result, the instructions are more complex.

Examples: IBM 370/168, VAX 11/780, Intel 80486.

#	RISC	CISC
1	Simple instructions taking one cycle.	Complex instructions taking multiple cycles.
2	Very few instructions refer to memory.	Most instructions may refer to memory.
3	Instructions are executed by hardware.	Instructions are executed by micro-program.
4	Fixed format instructions.	Variable format instructions.
5	Few instructions.	Many instructions.
6	Complex addressing modes are synthesized in software.	Supports complex addressing modes.
7	Multiple register sets.	Single register set.
8	Highly pipelined.	Not pipelined or less pipelined.
9	Conditional jump can be based on a bit anywhere in memory.	Conditional jump is usually based on status register bit.
10	Complexity is in the compiler.	Complexity is in the micro-program.
11	Few addressing modes, and most instructions use register-to-register addressing.	Many addressing modes.
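The contrast in rows 1, 2 and 5 can be illustrated with a toy interpreter. The instruction names (LOAD, MUL, STORE, MULT) and the machine itself are invented for this sketch and do not correspond to any real instruction set. The CISC-style program multiplies two memory operands with a single complex instruction, while the RISC-style program needs four simple instructions that touch memory only through loads and stores.

```python
def run(program, memory):
    """Execute a toy program; return (final memory, number of instructions)."""
    mem = dict(memory)
    reg = {}
    for op, *args in program:
        if op == "LOAD":        # RISC style: register <- memory
            reg[args[0]] = mem[args[1]]
        elif op == "MUL":       # RISC style: register-to-register multiply
            reg[args[0]] = reg[args[0]] * reg[args[1]]
        elif op == "STORE":     # RISC style: memory <- register
            mem[args[0]] = reg[args[1]]
        elif op == "MULT":      # CISC style: memory-to-memory multiply
            mem[args[0]] = mem[args[0]] * mem[args[1]]
        else:
            raise ValueError(f"unknown instruction {op}")
    return mem, len(program)


cisc_program = [("MULT", "a", "b")]
risc_program = [
    ("LOAD", "r1", "a"),
    ("LOAD", "r2", "b"),
    ("MUL", "r1", "r2"),
    ("STORE", "a", "r1"),
]

print(run(cisc_program, {"a": 6, "b": 7}))
print(run(risc_program, {"a": 6, "b": 7}))
```

Both programs leave the same result in memory. The instruction count says nothing about speed on its own, since in a real CISC processor the single complex instruction typically takes several cycles.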

04 October, 2017

Introduction to Peer-to-Peer Networks

Peer-to-peer networking is an approach to computer networking in which all computers share equivalent responsibility for processing data. Peer-to-peer networking (also known simply as peer networking) differs from client-server networking, where certain devices have responsibility for providing or "serving" data and other devices consume or otherwise act as "clients" of those servers.

Characteristics of a Peer Network

Peer-to-peer networking is common on small local area networks (LANs), particularly home networks.

Both wired and wireless home networks can be configured as peer-to-peer environments.

Computers in a peer-to-peer network run the same networking protocols and software. Peer network devices are often situated physically near one another, typically in homes, small businesses and schools. Some peer networks, however, utilize the internet and are geographically dispersed worldwide.

Home networks that use broadband routers are hybrid peer-to-peer and client-server environments. The router provides centralized internet connection sharing, but file, printer, and other resource sharing is managed directly between the local computers involved.

Peer-to-Peer and P2P Networks

Internet-based peer-to-peer networks became popular in the late 1990s due to the development of P2P file-sharing networks such as Napster. Technically, many P2P networks are not pure peer networks but rather hybrid designs, as they utilize central servers for some functions such as search.

Peer-to-Peer and Ad Hoc Wi-Fi Networks

Wi-Fi wireless networks support ad hoc connections between devices. Ad hoc Wi-Fi networks are pure peer-to-peer compared to those that use wireless routers as an intermediate device. Devices that form ad hoc networks require no infrastructure to communicate.

Benefits of a Peer-to-Peer Network

P2P networks are robust.

If one attached device goes down, the network continues. Compare this with client-server networks, where a server failure can take the entire network down with it.

You can configure computers in peer-to-peer workgroups to allow sharing of files, printers and other resources across all the devices. Peer networks allow data to be shared easily in both directions, whether for downloads to your computer or uploads from your computer.

On the internet, peer-to-peer networks handle a high volume of file-sharing traffic by distributing the load across many computers. Because they do not rely exclusively on central servers, P2P networks both scale better and are more resilient than client-server networks in case of failures or traffic bottlenecks.

Peer-to-peer networks are relatively easy to expand. As the number of devices in the network increases, the power of the P2P network increases, as each additional computer is available for processing data.

Security Concerns

Like client-server networks, peer-to-peer networks are vulnerable to security attacks.

Because each device participates in routing traffic through the network, hackers can easily launch denial of service attacks.
P2P software acts as server and client, which makes peer-to-peer networks more vulnerable to remote attacks than client-server networks.

Data that is corrupt can be shared on P2P networks by modifying files that are already on the network to introduce malicious code.

Virtual Private Networks (VPNs)

One of the most exciting and predominant technology areas in modern networking has been the Virtual Private Network. Thanks to these technologies, companies can save vast sums of money using the public Internet to safely and securely replace the need for a dedicated WAN. This article teaches readers about the core principles that make VPNs such an important and growing area of networking.

Virtual Private Networks (VPNs) are tools that allow network users to connect through the public internet to an organization’s internal network. Where many companies rely on dedicated leased lines to connect remote physical sites, it simply is not feasible to rely on the same technology to allow dozens or even hundreds of remote workers to connect from home or from temporary field locations. VPNs offer a secure method of connectivity that encrypts data, and they are well suited to individual remote users or very small remote sites. In a nutshell, VPNs, when properly implemented, can provide wide area security, reduce the costs associated with traditional leased lines, and provide effective support for telecommuters and road warriors. Both the organization and the remote users can save money: the company can forgo the cost of leased lines, and the remote users escape long-distance costs by using a local service provider to communicate with their headquarters office. All these advantages are made possible by the concept of a virtual tunnel that re-encapsulates a data packet inside another data packet and transmits it over the public internet.
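The tunnelling idea, one packet wrapped inside another, can be sketched as below. Everything here is illustrative: the Packet class, the function names and the addresses are assumptions, and the single-byte XOR in _toy_cipher is only a placeholder marking where a real VPN would apply proper encryption. It provides no security whatsoever.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes


def _toy_cipher(data, key):
    """Placeholder for real encryption: XOR with one byte. NOT secure."""
    return bytes(b ^ key for b in data)


def encapsulate(inner, gateway_src, gateway_dst, key):
    """Wrap the whole inner packet as the payload of an outer packet."""
    serialized = f"{inner.src}|{inner.dst}|".encode() + inner.payload
    return Packet(gateway_src, gateway_dst, _toy_cipher(serialized, key))


def decapsulate(outer, key):
    """Recover the inner packet at the far end of the tunnel."""
    src, dst, payload = _toy_cipher(outer.payload, key).split(b"|", 2)
    return Packet(src.decode(), dst.decode(), payload)


inner = Packet("10.0.0.5", "10.1.0.9", b"hello")
outer = encapsulate(inner, "203.0.113.1", "198.51.100.7", key=0x5A)
print(outer.src, outer.dst)            # only the gateway addresses are visible
print(decapsulate(outer, key=0x5A))
```

On the public internet only the outer header is visible, so routers along the path see traffic between the two VPN endpoints rather than between the private hosts.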

Types of VPNs

Site-to-Site

In a site-to-site VPN, data is encrypted from one VPN gateway to the other, providing a secure link between two sites over the internet. This enables both sites to share resources such as documents and other types of data over the VPN link.

Remote Access VPN

In a Remote Access VPN deployment, which is also known as a mobile VPN, a secure connection is made from an individual computer to a VPN router. This enables users to access their e-mail, files and other resources at work from outside the network, provided they have an internet connection. There are a number of common forms of technology used in remote access VPNs and VPN tunnels.

VPN Networking Protocols

VPN tunnels rely on one of four major networking protocols, each of which provides a different level of security.

PPTP (Point-to-Point Tunneling Protocol)

PPTP supports the use of VPNs. Using PPTP allows remote users to access their business networks in a secure fashion while using Microsoft Windows systems and other PPP (Point-to-Point Protocol) capable platforms. Remote users leverage their local internet providers to connect securely to their networks via the internet.

PPTP brings with it its own problems and is a weak security protocol compared to other options; however, it should be pointed out that Microsoft has enhanced the operation of PPTP to correct protocol instabilities. This protocol is easier to deploy than a solution like IPSec which we will discuss later.

L2TP (Layer 2 Tunneling Protocol)

L2TP is an enhancement of PPTP (Point-to-Point Tunneling Protocol) and is widely used by internet service providers to offer VPN services over the internet. L2TP is actually a hybrid of two protocol types: PPTP and L2F (Layer 2 Forwarding Protocol), with some other functionality borrowed from IPSec. It should be noted that L2TP can be deployed in unison with IPSec to satisfy virtually any encryption, authentication, or data integrity requirements.

IPSec (IP Security)

IPSec works at the third layer of the OSI model and as such can protect any protocol that runs on the IP stack. IPSec is actually a suite of protocols and associated algorithms that can be expanded on in a modular fashion. IPSec is a strong, flexible, and scalable security protocol and well suited to securing VPNs. IPSec requires a significant amount of setup on the network as well as on the client. This makes the protocol a complicated solution to work with, and it puts much more strain on the processors of all the devices that run it than its lighter-weight counterparts do. The added bonus is that IPSec can be used for both site-to-site and Remote Access VPNs.

SSL VPN (Secure Sockets Layer)

SSL VPN provides the best of both worlds when it comes to protection and ease of use. Another benefit is that it has been a tried and true solution that has been used heavily on the internet for years. Most commonly this protocol is employed by online stores and online banking. When you see https: in your browser URL bar, you know immediately that you are being protected by SSL.

The reduction in complexity offered by SSL VPNs comes from the fact that the client no longer requires client software to be installed and running. This lessens the burden of the protocol and reduces the overhead needed to maintain and troubleshoot it in a working environment. The absence of client software means that a user needs to rely on a secure portal. A secure portal is a graphical interface served up to a web browser that provides tools and access to applications running on the network. Today the most common applications served up in this fashion are email and thin-client tools like RDP. SSL can also approximate the way IPSec works with additional lightweight software that can be installed with very little effort via the browser. This can simplify the processes involved in securely accessing the corporate network.

SSL VPNs can support thousands of end users who need access to the headquarters network while requiring far less administrator configuration and troubleshooting than IPSec.

VPNs on a Linksys Router

If you have a Linksys router, you can set it up so that you can form a VPN through the router itself, giving you a way to securely access a computer on your network. Reflecting both the value of VPNs and their popularity, the majority of Cisco’s Linksys devices allow these protocols to pass through the router’s firewall by default. This behavior can be changed on a Linksys router via the web-based setup page, where we can find the Security tab. Once this tab has been selected, we will be presented with the VPN Passthrough sub-tab. After clicking the sub-tab we will see three categories of VPN Passthrough options and radio buttons that allow us to disable or enable pass-through behavior:

IPSec Passthrough—IPSec Pass-Through is enabled by default. To disable IPSec Passthrough, select Disabled.
L2TP Passthrough—L2TP Pass-Through is enabled by default. To disable L2TP Passthrough, select Disabled.
PPTP Passthrough —PPTP Pass-Through is enabled by default. To disable PPTP Passthrough, select Disabled.

Any changes can be made permanent by clicking Save Settings and the Continue button on the next page.

Conclusion

VPNs are renowned for eliminating the need for expensive leased connectivity. T1 lines or frame circuits have traditionally been employed to connect multiple office locations in a secure fashion. If the office locations are very far apart, the cost of renting leased lines can be exorbitant. A VPN, however, only requires a broadband internet connection. Plus they eliminate the monthly cost of dedicated lines. This means that VPNs offer an excellent and cost effective solution for companies with several branch offices, partners, and/or remote users to share data and connect to a corporate network in a secure and private manner.

Differences between type 1 hypervisors and type 2 hypervisors.

In virtualization, the hypervisor (also called a virtual machine monitor) is the low-level program that allows multiple operating systems to run concurrently on a single host computer. Hypervisors use a thin layer of code in software or firmware to allocate resources in real-time. You can think of the hypervisor as the traffic cop that controls I/O and memory management.
There are two types of hypervisors: Type 1 and Type 2.
Type 1 hypervisors run directly on the system hardware. They are often referred to as "native", "bare metal" or "embedded" hypervisors in vendor literature.
Type 2 hypervisors run on a host operating system. When the virtualization movement first began to take off, Type 2 hypervisors were most popular. Administrators could buy the software and install it on a server they already had.
Type 1 hypervisors are gaining popularity because building the hypervisor into the firmware is proving to be more efficient. According to IBM, Type 1 hypervisors provide higher performance, availability, and security than Type 2 hypervisors. (IBM recommends that Type 2 hypervisors be used mainly on client systems where efficiency is less critical or on systems where support for a broad range of I/O devices is important and can be provided by the host operating system.)

Experts predict that shipping hypervisors on bare metal will impact how organizations purchase servers in the future. Instead of selecting an OS, they will simply have to order a server with an embedded hypervisor and run whatever OS they want.

Fragmentation In Database

Horizontal fragmentation
It refers to the division of a relation into subsets (fragments) of tuples (rows). Each fragment is stored at a different node, and each fragment has unique rows. However, the unique rows all have the same attributes (columns). In short, each fragment represents the equivalent of a SELECT statement, with the WHERE clause on a single attribute.
Vertical fragmentation
It refers to the division of a relation into attribute (column) subsets. Each subset (fragment) is stored at a different node, and each fragment has unique columns—with the exception of the key column, which is common to all fragments.
Types of horizontal fragmentation
i. Primary Horizontal Fragmentation
It is the fragmentation of a primary relation,
e.g. the Employee table is fragmented by Department No.
ii. Derived Horizontal Fragmentation

It is the fragmentation of secondary relations that depend on the primary relation through foreign keys.
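Both kinds of fragmentation can be shown on a small in-memory table. The employees data, the column names and the helper functions below are made up for this sketch; a real distributed database would place each fragment on a different node. Horizontal fragments are recombined with a union of rows, and vertical fragments with a join on the shared key column.

```python
employees = [
    {"emp_id": 1, "name": "Asha", "dept_no": 10, "salary": 5000},
    {"emp_id": 2, "name": "Ben", "dept_no": 20, "salary": 6000},
    {"emp_id": 3, "name": "Chen", "dept_no": 10, "salary": 5500},
]


def horizontal_fragment(rows, predicate):
    """Subset of rows, like a SELECT with a WHERE clause; all columns kept."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """Subset of columns; the key column is repeated in every fragment."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


def join_on_key(left, right, key):
    """Rebuild full rows from two vertical fragments."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Primary horizontal fragmentation of Employee by department number.
dept_10 = horizontal_fragment(employees, lambda r: r["dept_no"] == 10)
dept_20 = horizontal_fragment(employees, lambda r: r["dept_no"] == 20)

# Vertical fragmentation: personal data on one node, pay data on another.
personal = vertical_fragment(employees, "emp_id", ["name"])
pay = vertical_fragment(employees, "emp_id", ["dept_no", "salary"])

print(dept_10)
print(join_on_key(personal, pay, "emp_id"))
```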

Inode data structure in Linux (Unix)

We are used to thinking about a directory containing files. This is really an illusion. Directories do not contain files. The data of the files is not stored in the directory.

A directory is really just a file. It's a special file with special rules (you can't just type "cp /dev/null directory" to erase it). It's got special bits to make sure a mere mortal can't mess it up, because if a file system gets corrupted, then you can say goodbye to your data. On older UNIX systems, you actually could "read" the contents of a directory, using 'cat .'. But let me get back to that in a second...

A Unix file is "stored" in two different parts of the disk - the data blocks and the inodes. (I won't get into superblocks and other esoteric information.) The data blocks contain the "contents" of the file. The information about the file is stored elsewhere - in the inode.

Both the inodes and data blocks are stored in a "filesystem" which is how a disk partition is organized. But these inodes are strange and confusing. Let me give you an introduction.

"ls -i" lists the inode of a file

Normal Unix/Linux/MacOS users aren't even aware that inodes exist. But there's an easy way to discover them - using the "ls -i" command. Let's look at the root file system:

% cd /

% ls -i

2637825 bin 983041 etc 1572865 lib 2981889 media 2531329 root 106497 selinux 81921 usr

196609 boot 2 home 1761281 lib64 2129921 mnt 6416 run 2457601 srv 425985 var

The "-i" option lists the inode number before the filename. The numbers look like large numbers, except for "home." Now let's get more information, and list some more files by adding the "-a" and "-l" options:

% ls -lai | head -8

total 132

2 drwxr-xr-x 24 root root 4096 Feb 26 13:31 .

2 drwxr-xr-x 24 root root 4096 Feb 26 13:31 ..

2637825 drwxr-xr-x 2 root root 4096 Jan 14 19:02 bin

196609 drwxr-xr-x 3 root root 4096 Feb 24 10:41 boot

3 drwxr-xr-x 16 root root 4460 Mar 5 09:35 dev

983041 drwxr-xr-x 206 root root 12288 Mar 5 07:45 etc

2 drwxr-xr-x 14 root root 4096 Dec 29 09:24 home

That's interesting - three of the files have the inode value of "2". But as you shall see, this makes perfect sense.

Unix systems can support many different types of file systems, but in the "classic" filesystem, inode #2 is always the root directory of the file system. If you want to look for a file, you start with inode #2 and work down into the directory structure. Normally the ".." directory points to the parent directory, but since "/" is the top of the tree, the parent of "/" is "/".

The "dev" directory has the inode "3". I suspect that when the filesystem was created, the "/dev" directory was the first file to be created.

But, you may wonder, why does "home" have the inode of "2"? You have sharp eyes.

The reason is simple. It happens to be a different partition, and "/home" is the root of that partition.

Inodes are always unique, but unique per partition. To uniquely identify a file, you need the inode and the device (the disk partition).

What is in an inode?

Earlier I said the data blocks contain the contents of the file. The inode contains the following pieces of information:

Mode/permission (protection)
Owner ID
Group ID
Size of file
Number of hard links to the file
Time last accessed
Time last modified
Time inode last modified
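Most of these fields can be read from Python with os.stat, which is shown here as one convenient way to look at them. The inode_info helper, the dictionary keys and the temporary file are assumptions made for this example; the field meanings given in the comments are the Unix ones.

```python
import os
import stat
import tempfile


def inode_info(path):
    """Collect the inode fields listed above for one path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "mode": stat.filemode(st.st_mode),   # e.g. "-rw-------"
        "owner_id": st.st_uid,
        "group_id": st.st_gid,
        "size": st.st_size,
        "hard_links": st.st_nlink,
        "accessed": st.st_atime,
        "modified": st.st_mtime,
        "inode_changed": st.st_ctime,        # on Unix: last inode change
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    info = inode_info(path)

print(info)
```

The file's name is the one thing os.stat does not return; it has to be supplied by the caller.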

As I said, a file system is divided into two parts - the inodes and data blocks. Once created, the number of blocks of each type is fixed. You can't increase the number of inodes on a partition, or increase the number of disk blocks. (See the manual pages on making and tuning file systems - mkfs.ext2).

Notice something missing? Where is the NAME of the file. Or the Path? It's NOT in the inode. It's NOT in the data blocks. It's _in_ the directory. That's right. A "file" is really in three (or more) places on the disk.

You see, the directory is just a table that contains the filenames in the directory, and the matching inode. Think of it as a table, and the first two entries are always "." and ".." The first points to the inode of the current directory, and the second points to the inode of the parent directory. By Definition. As spoken by the Gods of Unix. Verily.

This inode-magic is how you can create a "hard link" - having two or more names for the same file. Think of a directory as a table, which contains the name and the inode of each file in the directory. This is an important point - the name of the file is only used in directory. You can have another directory "containing" the same file, but it can have a different name.

When you create a hard link, it just creates a new name in the table, along with the inode, without moving the file. When you move a file (or rename it), you don't copy the data. That would be Slow. You just create the (name,inode) entry in a new directory, and delete the old entry in the table inside the old directory. In other words, moving a gigabyte file takes very little time. In the same way, you can move/rename directories very easily. That's why "mv /usr /Old_usr" is so fast, even though "/usr" may contain (for example) 57981 files.
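Both claims are easy to check from Python on a Unix-like system. The sketch below works in a temporary directory with made-up file names: it creates a hard link with os.link and then moves the original name into a subdirectory with os.rename, recording the inode number at each step.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.txt")
    with open(original, "w") as f:
        f.write("data")
    inode_before = os.stat(original).st_ino

    # Hard link: a second directory entry that points at the same inode.
    alias = os.path.join(tmp, "alias.txt")
    os.link(original, alias)
    same_inode = os.stat(alias).st_ino == inode_before
    link_count = os.stat(original).st_nlink

    # Move within the same file system: only directory entries change.
    subdir = os.path.join(tmp, "sub")
    os.mkdir(subdir)
    moved = os.path.join(subdir, "renamed.txt")
    os.rename(original, moved)
    inode_after_move = os.stat(moved).st_ino

print(same_inode, link_count, inode_after_move == inode_before)
```

A move between different partitions is another matter: the data really does have to be copied, because inode numbers are only meaningful within one file system.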

You can see this "inode" stuff if you use the "ls -i" option. It lists the inode number. find(1) can use it as well. Let's also use the "-d" option to list information about the directory, rather than the contents of the directory.

First - let's make a new directory using

cd /tmp

mkdir junk

cd junk

If you do a

ls -id ..

cd ..

ls -id .

You will get results that look like this

/tmp/junk$ ls -id ..

327681 ..

/tmp/junk$ cd ..

/tmp$ ls -id .

327681 .

You will see that these two "files" point to the same inode, which has the number 327681.
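The same experiment can be repeated programmatically. In this sketch the directory name junk follows the shell session above, while the temporary parent directory and the variable names are assumptions of the example. os.scandir is used to read the parent directory's table of (name, inode) pairs directly.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    junk = os.path.join(tmp, "junk")
    os.mkdir(junk)

    # ".." inside junk and the parent directory itself are the same inode.
    parent_seen_from_child = os.stat(os.path.join(junk, "..")).st_ino
    parent_itself = os.stat(tmp).st_ino

    # The parent directory is a table of names and inode numbers.
    entries = {entry.name: entry.inode() for entry in os.scandir(tmp)}
    junk_inode = os.stat(junk).st_ino

print(parent_seen_from_child == parent_itself)
print(entries)
```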

Inode Structure of a Directory:

inode pointer structure
