Hi all,

I'm really new to Linux, and especially to driver development. I'm trying to write a ramdisk-style driver (similar to sbull from LDD3, brd, and RapidDisk). However, it is meant to simulate a real block device.

The following questions are real problems for me right now.

----------------------------1--------------------
I use the "no queue" mode, i.e., I implement the make_request function directly. The simulated device supports concurrent IOs and has specific latencies for READ/WRITE operations, which should be simulated (e.g., 20 us). The concurrency follows from the device structure: e.g., the device consists of two separate parts that can process IOs independently (and in parallel), while within one part requests are handled synchronously/sequentially. For the delay simulation I use udelay, because usleep_range is too inaccurate for delays of this size (25 us, 100 us, ...). Now the question: can the atomic udelay be used in such a concurrent context? Here is the code I have in mind (simplified):

Code:
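void make_request(struct request_queue *q, struct bio *bio)
{
    struct timespec start, end;
    spinlock_t *lock;
    s64 elapsed_ns;

    getnstimeofday(&start);
    /* find_part_lock() is a placeholder: pick the lock of the device "part" this bio targets */
    lock = find_part_lock(bio);
    spin_lock(lock);
    /* simulate IO ... */
    getnstimeofday(&end);
    elapsed_ns = timespec_to_ns(&end) - timespec_to_ns(&start);
    /* needed_ns: target latency of this request; skip udelay if it is already exceeded */
    if (elapsed_ns < needed_ns)
        udelay((needed_ns - elapsed_ns) / NSEC_PER_USEC);
    spin_unlock(lock);
}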

Without udelay, this seems (to my mind) to be a make_request function that supports parallel requests across device "parts" (one lock per part).

Would udelay break this? I mean: can udelay run in PARALLEL for two (3, 4, ...) IO requests? Suppose my device consists of two separate, independent parts that can process requests in parallel. I maintain two separate locks, lock1 and lock2. When I receive a bio (in the make_request function), I first figure out which part of the device the request belongs to and acquire either lock1 or lock2 accordingly. E.g., if lock1 is held when a bio for the second part arrives, that process can acquire lock2 and proceed with delay2 (in parallel to delay1). Is that right?

Can I achieve parallelism by acquiring different locks? Would the udelays run in parallel in such a scenario?

How many udelays can run in parallel? (#CPUs?)

----------------------------2--------------------
When is the transferred data available to user space (i.e., to the application that issued the IO):
- after I've copied the data to the buffer (e.g., with memcpy);
- or after the call to bio_endio(bio, status)?
This affects where I should put the delay function.

----------------------------3--------------------
As I understand it, allocating the driver memory (in my case, the device memory) with vmalloc is not a good idea (because of the mapping overhead), especially if I need a huge amount of it (up to 50 GB). Would alloc_page (indexed with a RADIX_TREE) work fine for 50 GB (assuming, of course, that this much RAM is available)? Are there other options?

Thanks a lot for your help!

Best,
Tim