
3D camera speculations: what new possibilities?

Discussion in 'Android Devices' started by CanDMan, May 18, 2011.

  1. CanDMan

    CanDMan Well-Known Member
    Thread Starter

    Aside from the obvious video and photo tasks, what new uses can you conjure up for a device that can record and display in 3D, or for the hardware that gives it that ability?

    For example:

    Using both cameras to improve a 2D photo. (Merging the data from each camera into one picture: would this be enough to boost the quality?)


    Using the cameras to try to calculate the distance of an object (range finder). Could it be done? Would it be outrageously inaccurate?

    What are your ideas?



  2. Jensen

    Jensen Member

    I hate to be a bad news bear, but Android does not have any native APIs in Gingerbread to access the second rear camera. As a developer, it's either FRONT or BACK. If you call for BACK on the 3vo, it's going to give you the top image sensor of the two.

    HTC may have some custom APIs to access the additional hardware, though.
  3. CanDMan

    CanDMan Well-Known Member
    Thread Starter

    Never say never, so to speak... while direct access may indeed be limited to the cameras as a set, one could always code to work with the captured images. Less elegant, but the data is attainable.
  4. Jensen

    Jensen Member

    That's true, I guess you could pull the stereoscopic image apart. Is there any PC software that does this now? If so, is this software able to do anything interesting with the images when they are separated?
  5. EarlyMon

    EarlyMon The PearlyMon
    VIP Member

    Yes, the software exists, I've posted several links here for various forms of it.

    The only thing the software I found did was convert 3D to 2D, that was it. I'll look for the links; they're probably in the thread on using 3D pics on Facebook or the like...

    I looked everywhere I could think of for 3D image processing that fuses the two views into a higher-quality 2D image, but while looking, someone pointed out the obvious flaws in that plan (I can't recall exactly, so I'll paraphrase and probably inadvertently exaggerate (at least I'm honest)): with different perspectives for the two cameras, which one are you trying to enhance? How much additional detail is really there, as opposed to additional detail showing different views (perspectives) of the same objects? How much of a processing budget is there, and how much is required?

    These may be solvable issues, but the solutions seem non-trivial.

    PS - Here's a link with a link - the post shows the file type terms to search to find this as PC software -

  6. ArmageddonX

    ArmageddonX Android Expert

    All the ideas that come to mind for me would be more of a range-finding system: distance to an object, etc.

    An app that you could use to measure the square footage of a room would be cool. Imagine spinning 360 degrees in an empty room while it range-finds the walls, then having it give you the square footage.

    Or what about taking a picture of a person or item and having the app tell you its height and/or width, using the distance and the size in the image to work out the dimensions?
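    Once the wall distances exist, the spin-in-place idea is just polygon geometry. A minimal sketch, assuming the phone stays on one spot, takes evenly spaced readings over a full turn, and can see every wall point from there (a convex or "star-shaped" room). The ranging itself, the hard part with a 3D camera, is not shown, and the function names are hypothetical:

```python
import math


def room_area_from_ranges(ranges_m):
    """Estimate floor area (m^2) from wall distances sampled at equal headings.

    Hypothetical helper: assumes len(ranges_m) readings taken at evenly spaced
    angles over one full 360-degree turn from a single standing point.
    """
    n = len(ranges_m)
    if n < 3:
        raise ValueError("need at least three readings")
    # Convert each (heading, distance) reading into an (x, y) wall point.
    points = [
        (r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
        for i, r in enumerate(ranges_m)
    ]
    # Shoelace formula: half the absolute sum of cross products of consecutive vertices.
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def to_square_feet(area_m2):
    """1 m^2 is about 10.7639 ft^2."""
    return area_m2 * 10.7639
```

    With four 1 m readings the wall points form a diamond of 2 m²; with thousands of equal readings the result approaches πr², as a circular room should.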
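    Measuring an object's size from its distance follows from the pinhole-camera model: real size divided by distance equals pixel extent divided by focal length in pixels. A minimal sketch; the calibration numbers in the usage note are made up, not Evo 3D specs:

```python
def object_size_m(distance_m, extent_px, focal_length_px):
    """Pinhole-camera estimate of an object's real height or width.

    distance_m: distance to the object (e.g. from a stereo range finder).
    extent_px: how many pixels the object spans in the photo.
    focal_length_px: focal length expressed in pixels (a calibration value).
    """
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    # Similar triangles: size / distance == extent_px / focal_length_px
    return distance_m * extent_px / focal_length_px


def focal_length_px(focal_length_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in millimetres to pixels for a given sensor."""
    return focal_length_mm * image_width_px / sensor_width_mm
```

    For example, an object 3 m away that spans 600 px with a (hypothetical) 1000 px focal length comes out at 1.8 m. The result is only as good as the distance estimate feeding it.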

    That's all the ideas I have, for now. ;)
  7. CanDMan

    CanDMan Well-Known Member
    Thread Starter

    Interesting... how about being able to scan your home into a 3D FPS game? Would it freak you out to see zombies coming down your stairs? :D Sounds way too ambitious to me, and yet... it should be reasonably doable, no? :D
  8. yourfriendmat

    yourfriendmat Well-Known Member

    A range finder should be doable. It would just boil down to basic trigonometry (triangulation), since we know the separation of the cameras. What I'm envisioning is something that separates the stereoscopic image and then lays the two views over each other as ghost images. You could then pinch in and out to line up the two images at the point you would like to measure.

    If any of you skilled android devs use this idea, just make sure you give me some credit ;-)
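    The pinch-to-align range finder amounts to standard stereo triangulation: once the two ghost images line up, the pixel offset you dialled in is the disparity, and distance follows from similar triangles. A minimal sketch, assuming two identical, parallel, rectified cameras; the baseline and focal length in the usage note are illustrative guesses, not measured Evo 3D values:

```python
def distance_from_disparity(disparity_px, baseline_m, focal_length_px):
    """Triangulate distance from the horizontal shift between the two views.

    disparity_px: pixel offset needed to line up the left and right images
        at the point of interest (e.g. the amount the user pinched).
    baseline_m: separation between the two lenses.
    focal_length_px: focal length in pixels (both cameras assumed identical).
    """
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: effectively at infinity
    # Similar triangles: Z / B == f / d  =>  Z = f * B / d
    return focal_length_px * baseline_m / disparity_px
```

    With a hypothetical 3.5 cm baseline and 2000 px focal length, a 20 px offset gives 3.5 m. Doubling the offset halves the distance.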
  9. CanDMan

    CanDMan Well-Known Member
    Thread Starter

    Read this article today ...about a CS student that created a 3D scanner app for the iPhone 4 ...so definitely possible even without a 3D camera setup

    Trimensional app turns the iPhone into a 3D scanner

    ...interesting strategy he used also ...hold phone near face while it takes 4 photos ..in the dark ...screen lights your face from different angles for each photo ...neat stuff ...How long before an android version?
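    Lighting a face from different known directions and recovering shape from the brightness changes is the idea behind classic photometric stereo. The article doesn't say exactly what Trimensional does, so this is only a sketch of the textbook version: assuming a matte (Lambertian) surface and three known light directions, each pixel's brightness is albedo times the dot product of light direction and surface normal, which is a 3x3 linear system (solved here with Cramer's rule):

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def surface_normal(light_dirs, intensities):
    """Recover (albedo, unit normal) for one pixel under a Lambertian model.

    light_dirs: three unit vectors (x, y, z) pointing toward each light
        (here: the bright region of the screen for each shot).
    intensities: the pixel's brightness in each of the three photos.
    Solves intensities = light_dirs @ g, where g = albedo * normal.
    """
    det = _det3(light_dirs)
    if abs(det) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    g = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the intensity vector.
        m = [list(row) for row in light_dirs]
        for r in range(3):
            m[r][col] = intensities[r]
        g.append(_det3(m) / det)
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0.0:
        return 0.0, (0.0, 0.0, 0.0)  # pixel is black in every shot
    return albedo, tuple(v / albedo for v in g)
```

    With a fourth photo (as in the article) the system becomes over-determined and would normally be solved by least squares instead; shadows and shiny skin also break the matte assumption.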
  10. GaryColeman

    GaryColeman Well-Known Member

    I'd equip the two back cameras with the capability to be used as projectors to show movies on a wall or screen. You could either show one 3D movie using both lenses or, if you and a friend wanted to watch different movies, project two separate 2D movies. And then I'd add a small pan on the back to pop popcorn as the processor heated up. Possibly a butter-melting option as well.

    Now sit back and enjoy the show....

    {patent pending}
  11. fattank

    fattank Member

    Here's some old info from a "full disclosure" about the Evo 3D posted some time ago.
    Granted, this was originally posted by someone else, but some of the ideas are genuinely clever. I've added them to my own in the list below of what could possibly result from the dual-camera setup. Tell me what you think!

    • Greater detail. Using both cameras at once to capture two 5mp images and "overlaying" them is interesting. Maybe adaptively "adding" the details in one to the other? Maybe using some kind of "interlaced" trick, or maybe the tilt is a bit different so both can take pictures of 2 different [but adjacent] visual fields for "instant panorama"?
    • Greater depth of focus. Each camera can shoot simultaneously with a different focus, and the result easily combined into a picture with incredible depth of focus.
    • Sport mode (shutter speed enhancement) seems like a wonderful idea. Both cameras fire, one a few ms after the other, and the result motion compensated. The perspective difference may also be used to identify and correct for motion blur. As an instantaneous "anti-shake" this would work wonderfully, too. All in all, this would allow the camera to capture faster motion with less blur!
    • Double speed burst-mode. Burst mode could theoretically be twice as fast, since the other lens could fire during the recovery time of the first.
    • Dark mode / night owl / de-graining. A fantastic idea would be to use both cameras to capture a picture simultaneously at high ISO and use a simple perspective-corrected denoiser to drastically improve quality of night scenes, which suffer most from grain issues. Each picture is bound to have a different amount of "random noise" or "camera noise" from the two cameras, and after perspective correction (and/or motion compensation, if one fires after the other), the randomness is detected and removed from the combined image.
    • HDR (High Dynamic Range) can be achieved with no additional delay by having both cameras snap at different exposures and combining the result, solving the "low dynamic range" problem with the Sensation and other phone cameras.
    • Apply all of these to video. If both capture video simultaneously, there could be some benefit in 2D mode -- timestamps of the recorded frames can allow them to be combined to produce a single video of higher effective average framerate -- again, with some simple HW perspective correction filter. It could also drastically improve the quality of night video in the same way as the above "night mode" camera use with the additional potential advantage of temporal denoising, resulting in the cleanest night video you can imagine. Not to mention the first ever high dynamic range (HDR) video on a mobile device.
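    The night-mode idea rests on a standard statistical fact: averaging two frames with independent random noise cuts the noise standard deviation by about √2. A minimal simulation with made-up pixel values and noise level, assuming the two frames are already perfectly aligned (which is precisely the step the rest of the thread argues about):

```python
import random
import statistics


def average_frames(frame_a, frame_b):
    """Average two already-aligned frames pixel by pixel."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must be the same size")
    return [(a + b) / 2.0 for a, b in zip(frame_a, frame_b)]


def simulate(noise_sigma=10.0, n_pixels=20000, seed=1):
    """Return (single-frame noise, merged-frame noise) on synthetic data."""
    rng = random.Random(seed)
    scene = [rng.uniform(50, 200) for _ in range(n_pixels)]  # the "true" image
    # Each sensor adds its own independent Gaussian noise.
    a = [p + rng.gauss(0, noise_sigma) for p in scene]
    b = [p + rng.gauss(0, noise_sigma) for p in scene]
    merged = average_frames(a, b)
    err_single = statistics.pstdev([x - p for x, p in zip(a, scene)])
    err_merged = statistics.pstdev([x - p for x, p in zip(merged, scene)])
    return err_single, err_merged
```

    The ratio of the two returned values should land near 1.41. Any misalignment between the frames shows up as blur or ghosting instead of noise reduction.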

    Even if these aren't included in the official ROM, I'm sure there can be "an app for [some of] that." Exciting prospect, is it not? The only one of these that seems a bit hard to chew is of course the 10mp nonsense. At most the combined detail would just produce a "very nice" 5mp shot.

    Cross-posted from XDA with explicit permission of original poster, myself (is there a reason it looks better on here?). ;)
  12. lordofthereef

    lordofthereef Android Expert

    This all sounds great, but everything you just suggested here is only on ultra high-end cameras (if even those). We are talking about a smartphone camera.

    My bet? It will do just what they say it does.
    1. A single 3D picture using both cameras.
    2. A single 2D picture using only one of the cameras.
  13. fattank

    fattank Member

    High-end? Exactly. The point I'm making is that these "high-end" capabilities are certainly feasible on such a dual-camera setup by the means I describe.

    But my bet on the out-of-box device? Exactly the same as yours. ;) But don't forget, there are always software updates, and now that the bootloader isn't fried, developers and enthusiasts will probably find a way to access both cameras to implement some of these ideas.

    And that, my friend, is the very first step on the road to any of these ideas becoming realities shortly thereafter.
  14. lordofthereef

    lordofthereef Android Expert

    I wasn't trying to argue (if that's how you took it). Just saying that I don't really see many (or any) of these things happening officially ever. Look at the camera on the EVO 4G. Never really got touched. That said, they are touting the 3D camera a lot on this EVO 3D, so you never know. I still place my bets on being happy with what you get at launch, because I don't feel it will change.
  15. swnic

    swnic Well-Known Member

    I hope there are some "cool" things we will be able to do with this phone's camera, but the post you're referring to comes from BSOD, and he has a questionable rep here in the forums :eek:
  16. novox77

    novox77 Leeeroy Jennnkinnns!

    All of the features originally suggested by BlueScreen make the assumption that you can combine the stereo images into one, or some other sort of merge algorithm that auto-corrects for the perspective difference.

    I can tell you right now that the success of such a merge is quite low. The closer your subject is to the phone, the more different the two images will appear, and the harder it is to merge. You also have to hold the camera perfectly steady. Any change in level, pitch, roll, etc, is going to drastically increase the difference between frames.

    If you were to alternate (burst) L and R images as suggested by BlueScreen for the sports shots, you would get a subject that appears to vibrate back and forth (again due to the different perspective of the cameras). This would look incredibly weird, especially in the golf ball example given.

    Same with the HDR use-case. The images won't line up, so your resulting HDR would be like a traditionally obtained HDR where wind is moving tree branches. The merge will look all blurry and ghosty. You'll end up with the same double-image look of a 3D movie in a theater when you remove your glasses.

    The only way these techniques have a shot at working is if your entire subject is at infinite-distance focus... that is, landscape shots with nothing in the foreground. At infinite distance, the perspectives of the two cameras become effectively identical. Only then can you attempt any of what BlueScreen proposes. And even then, you may run into lens distortions (barrel/pincushion/chromatic aberration/flare) that will prevent a good merge.
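    How far away counts as "infinite" can be estimated: the shift between the two views falls off as one over distance, and once it drops below about a pixel the two perspectives are indistinguishable. A minimal sketch; the 3.5 cm baseline and 2000 px focal length are illustrative assumptions, not measured Evo 3D values:

```python
def disparity_px(distance_m, baseline_m, focal_length_px):
    """Horizontal shift (pixels) of a point between left and right images."""
    return focal_length_px * baseline_m / distance_m


def effectively_infinite_m(baseline_m, focal_length_px, tolerance_px=1.0):
    """Distance beyond which the two views differ by less than tolerance_px."""
    return focal_length_px * baseline_m / tolerance_px
```

    Under those assumed numbers, a subject 0.5 m away shifts by 140 px between the views, while the shift only drops below one pixel at about 70 m. That is consistent with the point that only distant landscapes merge cleanly.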

    If all this is hard to visualize, try the following scenario:

    Person 1 records a subject with videocam 1. Person 2 records same subject with videocam 2. Since person 2 cannot occupy the same space as person 1, s/he has to stand next to person 1. This means the perspective of each camera is different. The hand shake due to hand-holding the cameras also makes each frame different between cameras.

    Now, how can you possibly combine the frames from both cameras' footage and expect the resulting animation to be smooth? Your subject is going to vibrate violently back and forth at the framerate of your video. Left and right images don't line up, due to perspective and hand shake, and this fundamental difference prevents ANY of the use-cases presented in the thread.

    Creative, but not going to happen in practice.
  17. lordofthereef

    lordofthereef Android Expert

    Thanks for that. Said way more elegantly than I could have.
  18. fattank

    fattank Member

    I... really do hope that was a genuine technical question, because otherwise I'd probably feel like I just wasted a lot of words. :)

    There are some well-documented methods, borrowing from techniques ranging from virtual motion-vector (mv) dispersion to fixed-point perspective correction. For any two images captured at the same instant, with a perspective difference created by a fixed-distance lens system, it is possible to resolve the two images with a high degree of perceptual accuracy. 3D space consists of vectors of uniform, linearized convergence into their respective vanishing points, with a perspective offset proportional to their distance from the first lens, a virtual point between the two lenses, and the second lens. Mathematically treating the distance to the virtual point between both lenses as an absolute reference of interpolated convergence, a virtual field of "displacement vectors" can be propagated to map the correlation between the two images (sort of a mathematical analog of the interocular incident-waveform displacement mapping the human visual system (HVS) employs in the occipital lobe to perceive '3D' depth). Much like the human brain can use the bi-referential spatial displacement between two views to resolve a single mid-point position as well as proximity (a capability that likewise degrades as objects are moved very close to our eyes), the length of the distance vector between a detail and its counterpart in the other image (parsed by pixel/macroblock, with successive approximation after a fast hexagonal "motion search" or a SATD-adjusted refinement search a la MPEG) yields the vectors in question. Summing the SSD within a cohesive vector field (with an arbitrary lambda depending on vector accuracy/SAD threshold), weighted by its boundaries, yields your perspective ("object-adjusted vanishing point") vectors from your correlation vectors. A convolution (and/or further signal processing) can then produce the resulting interpolated reference image. Texture and most discrete detail will be preserved.

    This same process (and the faster variants below) can be used directly for two of the 'features' I mentioned. A low SAD threshold preserves more detail (potentially much more than a normal 5mp image). A high SAD threshold emphasizes only the best vectors, so de-noising takes place (potentially the same detail as a normal 5mp image, but with substantially less grain).
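    The core building block here is SAD block matching: take a small patch from one image and slide it along the other until the sum of absolute differences is smallest. A minimal sketch of the plain exhaustive version on a single scanline, assuming rectified images so matches differ only horizontally. The faster hexagonal search and SATD refinement mentioned above are encoder optimizations not shown, and the function names are hypothetical:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-length pixel runs."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def match_block(left_row, right_row, x, block=5, max_disparity=16):
    """Find the disparity of the block starting at x in left_row.

    Exhaustive search along the same scanline. A point at column x in the
    left image appears at column x - d in the right image.
    Returns (best_disparity, best_sad).
    """
    ref = left_row[x:x + block]
    if len(ref) < block:
        raise ValueError("block runs off the end of the row")
    best = None
    for d in range(0, max_disparity + 1):
        start = x - d
        if start < 0:
            break
        cost = sad(ref, right_row[start:start + block])
        if best is None or cost < best[1]:
            best = (d, cost)
    return best
```

    A SAD threshold, as in the post, would then decide which of these matches are trusted enough to merge.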

    There are "simpler" ways to go about it, such as establishing one of the pictures as a reference and performing a cursory analysis of the domain-independent regions most similar to the other picture (and the residual falloff from those points) to calculate a global vanishing point as the second reference. All irregularities not accounted for (objects closer to the foreground) are simply ignored, and only those regions with calculable, consistent positional similarity (and a derivative along a simple straight line) are merged into one of the images. Even if no "vanishing point" can be identified, an approximation based on transformed differences (Hadamard) can be used (or vanishing point(s) can be arbitrarily created with k-means cluster analysis of the resulting vectors). That way, while all parts of the reference image that don't benefit from the other are left alone, all the parts mappable to the other are merged in accordance with the falloff transform. You'd get "some" added detail or grain removal in the most conserved regions of the picture, that is, the regions most easily mapped to the other along a simple vector.
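    The "simpler" method can be sketched as two steps: find the single global offset that best aligns the second image onto the reference, then merge only pixels that agree within a threshold and leave everything else (such as a foreground object) untouched. This one-dimensional sketch swaps the vanishing-point estimate for a plain global horizontal shift, which is an assumption on my part, and all names are hypothetical:

```python
def mean_abs_diff(pairs):
    """Mean absolute difference over (a, b) pixel pairs."""
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def global_shift(ref_row, other_row, max_shift=16):
    """Single horizontal offset s such that other_row[x - s] best matches ref_row[x]."""
    best_shift, best_cost = 0, float("inf")
    for s in range(0, min(max_shift, len(ref_row) - 1) + 1):
        pairs = list(zip(ref_row[s:], other_row))  # ref[x] vs other[x - s]
        cost = mean_abs_diff(pairs)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def selective_merge(ref_row, other_row, shift, threshold):
    """Average only pixels that agree within threshold; keep the reference elsewhere."""
    out = list(ref_row)
    for x in range(shift, len(ref_row)):
        a, b = ref_row[x], other_row[x - shift]
        if abs(a - b) <= threshold:
            out[x] = (a + b) / 2.0
    return out
```

    Pixels that disagree (parallax, moving objects) simply fall back to the reference image, which is why this approach helps only in the "most conserved" regions.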

    The simplest method would of course be to read the lens focus to establish a working distance plane, over which a simple 2D planar image skew would "warp" one of the two pictures against the other until an artificial convergence is attained along the planar average. The process then uses the fast selective merging from the previous method to quickly pad the initial image with the most conserved detail or remove the least conserved grain. This process must employ strict limits to be useful; obviously, an auto-focus threshold can turn off this functionality when the focus is too close. This method would presumably produce lower quality than the other two. Post-processing could later be used to refine the quality of the merge (maybe during processor idle time XD) or to process other regions outside the primary focal radius.
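    For flat, parallel lenses the focus-plane warp reduces to one horizontal shift predicted from the focus distance, followed by a crop to the overlapping columns. A minimal one-row sketch; the minimum focus cutoff, baseline and focal length are illustrative assumptions, and plain averaging stands in for the selective merge:

```python
def planar_merge(left_row, right_row, focus_distance_m, baseline_m,
                 focal_length_px, min_focus_m=1.0):
    """Align two rows using only the lens focus distance, then average the overlap.

    Returns None when focus is closer than min_focus_m (merge disabled) or
    when the predicted shift leaves no overlap; otherwise the cropped,
    averaged overlap region.
    """
    if focus_distance_m < min_focus_m:
        return None
    # Disparity expected for objects on the focus plane (Z = f * B / d).
    shift = round(focal_length_px * baseline_m / focus_distance_m)
    n = len(left_row)
    if shift >= n:
        return None
    # left[x] corresponds to right[x - shift]; keep only the overlapping columns.
    return [(left_row[x] + right_row[x - shift]) / 2.0 for x in range(shift, n)]
```

    Anything off the focus plane gets the wrong shift, which is why this is expected to be the lowest-quality of the three methods.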

    HDR effects are simply an extension of any of these methods, but with a luminance-weighted vector search (or simple picture warp) prior to the merge, due to the different exposure times. It is also highly beneficial for the lens set to the shorter exposure to fire half the exposure-time difference after the longer exposure starts, so the two shots of different lengths share the same temporal midpoint and have the highest degree of symmetric temporal cohesion.
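    The timing rule is easy to check numerically: delaying the short exposure by half the difference in exposure times makes both exposures centred on the same instant. A minimal sketch with hypothetical function names and example shutter speeds:

```python
def short_exposure_start(long_exposure_s, short_exposure_s):
    """Delay after the long exposure starts at which the short one should begin,
    so both exposures share the same temporal midpoint."""
    if short_exposure_s > long_exposure_s:
        raise ValueError("short exposure must not exceed long exposure")
    return (long_exposure_s - short_exposure_s) / 2.0


def midpoint(start_s, exposure_s):
    """Temporal centre of an exposure."""
    return start_s + exposure_s / 2.0
```

    For a 1/30 s exposure paired with a 1/240 s exposure, the short one starts 7/480 s (about 14.6 ms) in, and both midpoints fall at 1/60 s.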

    Depth of focus enhancement again uses the same analyze-process-merge strategy, but with applied Gaussian blur proportional to the square of the difference in focus between the two cameras at the pre-processing step to maximize the correlation between the motion vectors in the respective out-of-focus areas of the two images. However, this applies only during perspective/correlation processing, as once the approximate vector fields have been calculated, the combined image will simply merge together the gently (approximately) perspective-corrected in-focus backgrounds.
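    The final merge step of depth-of-focus enhancement is usually done by keeping, at each pixel, whichever image is locally sharper. A minimal sketch using the absolute discrete Laplacian as the sharpness measure. This is a common focus-stacking heuristic swapped in for illustration, not the blur-matched vector pipeline described above, and it assumes the two rows are already aligned:

```python
def local_contrast(row, x):
    """Absolute discrete Laplacian at x: large where the image is sharp."""
    left = row[x - 1] if x > 0 else row[x]
    right = row[x + 1] if x < len(row) - 1 else row[x]
    return abs(left - 2 * row[x] + right)


def focus_stack(near_row, far_row):
    """Per-pixel pick from whichever (already aligned) image is locally sharper."""
    out = []
    for x in range(len(near_row)):
        if local_contrast(near_row, x) >= local_contrast(far_row, x):
            out.append(near_row[x])
        else:
            out.append(far_row[x])
    return out
```

    In practice the sharpness map is smoothed before choosing, otherwise sensor noise makes the selection flicker between sources.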

    Sport mode / anti-jitter is also possible -- take 3 shots (burst) with both cameras simultaneously. Use vector distance analysis to find which shots from a single camera have the shortest average vector magnitude of motion-compensated difference between one another (2D only, no analysis of perspective or vanishing point is used), and compare to the difference between the vectors generated from perspective difference alone. The picture that minimizes the ratio of Adjacent vector length / perspective vector length is the picture that is selected as the "best" or least "blurred" of the set. The process can end here in the simplest case, or post-processing can be performed similar to that in strategy 3 of the added detail/noise removal routine based on surrounding pictures. All pictures except the least blurred and/or post-processed are purged from memory.
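    The selection rule can be expressed as an argmin over per-shot ratios. A minimal sketch, assuming the motion and parallax vector lengths have already been measured for each burst shot (both inputs are hypothetical; measuring them is the hard part):

```python
def pick_sharpest(adjacent_motion_px, perspective_motion_px):
    """Return the index of the burst shot judged least blurred.

    adjacent_motion_px[i]: average motion-compensated vector length between
        shot i and its neighbours from the same camera.
    perspective_motion_px[i]: average vector length explained by parallax
        alone between the left and right images of shot i.
    """
    if len(adjacent_motion_px) != len(perspective_motion_px):
        raise ValueError("need one perspective measurement per shot")
    if any(p <= 0 for p in perspective_motion_px):
        raise ValueError("perspective vector lengths must be positive")
    ratios = [m / p for m, p in zip(adjacent_motion_px, perspective_motion_px)]
    # Smallest ratio = least motion relative to the shot's own parallax shift.
    return min(range(len(ratios)), key=ratios.__getitem__)
```

    Note the ratio can pick a different shot than raw motion alone: with motions [3, 1, 2] and parallax [10, 2, 10], shot 2 wins even though shot 1 moved least in absolute terms.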

    I agree that rapid-burst mode (or framerate increase of any sort) is perhaps the most difficult to reconcile into a single perspective due to the alternation of the cameras. That said, a burst mode employing this very technique as in the previous situation can still produce two independent streams of pictures with a temporal offset approaching the distance between the two individual burst rates. This is useful in that when you're aiming for the perfect shot (or jitter correction), you'll take either perspective.

    Video modes (except for framerate increase) are possible via the same methods, provided the GPU (and/or CPU) are capable of processing the perspective correction in tandem with the encoding. This doesn't seem like it would be too much of an issue, since hardware encoding frees up the mighty MSM8660 for all sorts of FPU + NEON accelerated threaded operations of these sorts.

    It is technically possible (and relatively straightforward) but a bit processor intensive to actually achieve this framerate increase in video or burst speed. Two videos (or picture streams) would have to be captured and encoded simultaneously (1080p is likely a no-go for video) and then in another process, de-shaking post-processing (or calculation) must occur between frames of neighboring timestamps if these frames were captured from different cameras, in addition to motion-compensated perspective correction (likely similar to that of the second method I outlined above for perspective correction) with temporal de-flicker for any misplaced macroblock noise. Long story short, it's a matter of correcting for perspective in an inherently less accurate way than if the cameras recorded instantaneously.
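    The bookkeeping side of the framerate increase is straightforward: merge both streams by timestamp and flag every transition between cameras, since those are the frame pairs that need perspective correction. A minimal sketch with made-up timestamps; the correction itself is not shown:

```python
def interleave(left_ts_ms, right_ts_ms):
    """Merge two frame-timestamp streams into one time-ordered sequence.

    Returns (merged, cross_camera): merged is a list of (timestamp, camera)
    tuples; cross_camera[i] is True when frame i+1 comes from a different
    camera than frame i (those neighbours need perspective correction).
    """
    merged = sorted([(t, "L") for t in left_ts_ms] + [(t, "R") for t in right_ts_ms])
    cross_camera = [merged[i][1] != merged[i - 1][1] for i in range(1, len(merged))]
    return merged, cross_camera


def effective_fps(merged):
    """Average frame rate of the merged stream (timestamps in milliseconds)."""
    if len(merged) < 2:
        raise ValueError("need at least two frames")
    span_ms = merged[-1][0] - merged[0][0]
    return (len(merged) - 1) * 1000 / span_ms
```

    Two 25 fps streams offset by half a frame (20 ms) combine into a 50 fps sequence in which every neighbouring pair comes from different cameras, i.e. every single frame transition needs correcting.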

    Note: this entire process is simplified tremendously if both lenses are completely flat and not tilted with respect to one another as I originally thought. If the two lenses are flat, the entirety of the calculation reduces to a convolution about a singular Z axis with orientation indexed by the slope between any two distance fields in the XY plane. That is, it becomes a relatively simple calculation thanks to absolute (fixed) lens binocular displacement, and perhaps facilitated by an (adjusted) accelerometer reading for a shortcut to relative polar coordinate processing for the two lenses. From that point all it takes is an overlapped crop and a non-linear merge of the two images from the determined transverse axis (as a function of focus distance).

    Anyway, the point is it's possible to do in hardware and software with existing, straightforward DSP techniques. Most of it could actually be handled driver-side (or by camera hardware itself) if a decent driver had been written to take these into account. But due to Android kernelspace restrictions on interacting directly with other hardware components, that's obviously not possible here. Still, implementing the bulk of the work via software isn't too difficult, and since programmable vertex shaders in the GLES 2.0 pipeline are practically made for this kind of processing, I'm fairly confident it can be hardware accelerated with the Adreno 220. I'll have to get an Evo 3D myself before I can experiment (it'd be a godsend if both cameras shared a 'coronal plane,' that's for sure :p).

    Apologies for any inaccuracies/oversights; I'm a bit tired. Cheers!
  19. PyroSporker

    PyroSporker Android Expert

    :eek: My brain just melted.

    I was going to suggest that maybe it would be easier to accomplish some of these feats when using the camera in portrait mode. Got to remember that there is more than one choice for the camera configuration. I imagine merging the 'left' and 'right' channel perspectives is more difficult than piecing together the 'top' and 'bottom' channel perspectives? I'm probably way off base; maybe it's all the same and it makes no difference which plane the 2 perspectives lie on. I can clearly see I am completely outclassed here. I'll just go hide in the corner again. :D
  20. SamuraiBigEd

    SamuraiBigEd Under paid Sasquatch!

    Novox, there are software programs that compensate for parallax quite well. The limiting factor here (aside from keeping the lenses level) would be storage space to load and run said programs, but the possibilities are interesting to contemplate.

    Your observations on keeping the lenses perfectly level are, however, the biggest limiting factor in any of these possibilities other than a quick burst mode. I don't think it would affect the pictures much unless you were taking close-ups.
  21. yourfriendmat

    yourfriendmat Well-Known Member

    This is basically the thought I had about combining the images. The cameras take two photos, select one as the "main" photo and then enhance it with data from the other photo by finding similar parts of both images. I feel like this might be something better done by post-processing with a computer, though.

    I was thinking there could also be an app that would determine the distances of objects by lining up different parts of the stereo pictures captured. We know the distance between the cameras, so with some simple algebra, we should be able to determine the distance of an object. It wouldn't be very accurate since the distance between cameras is so small, but it would still be neat :) Also, you could do it with a live preview and just pinch to line up different parts of the image. I think it would be cool anyway.
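    The "wouldn't be very accurate" intuition can be quantified. Since distance is focal length times baseline divided by disparity, a fixed pixel error in lining up the images produces a distance error that grows with the square of the distance. A minimal first-order sketch; the 3.5 cm baseline and 2000 px focal length are illustrative assumptions, not Evo 3D specs:

```python
def depth_error_m(distance_m, baseline_m, focal_length_px, disparity_error_px=1.0):
    """First-order depth uncertainty for a stereo rig.

    From Z = f * B / d, a disparity error of delta_d pixels gives
    delta_Z ~= Z**2 / (f * B) * delta_d.
    """
    return distance_m ** 2 / (focal_length_px * baseline_m) * disparity_error_px
```

    Under those assumptions a one-pixel alignment error costs about 1.4 cm at 1 m, about 36 cm at 5 m, and about 1.4 m at 10 m: fine across a room, poor across a field.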
  22. novox77

    novox77 Leeeroy Jennnkinnns!

    Edit: the thread got merged, so wherever I mention the "OP" below, I'm referring to this post: http://androidforums.com/htc-evo-3d...tions-what-new-possibilities.html#post2742949
    -end edit-

    I'm more than happy to stand corrected, and I'll admit that I have no intimate knowledge of the math involved to merge stereo images. If it is indeed possible to merge images of differing perspectives and show them sequentially in a video without the viewer seeing obvious artifacts of parallax, then a lot of the OP's suggestions do become real possibilities.

    I have personally seen my HDR post-processing software fail miserably at handling a merge when photos in the exposure-bracketed sequence are offset in view (as the result of bumping the tripod while the camera does the bracket burst shots). Hence my suggestion that the Evo3D would have to be held perfectly steady for highest chance of success.

    I still have a problem conceptualizing the solution for video, especially with the golf ball rolling on the green. Camera is not necessarily in a fixed position (it may be panning to follow the ball, and hand shake effects fully apply); the ball is not in a fixed position in the frame (unless you had a rig to pan the camera at exactly the angular velocity required to keep the ball stationary); the background behind the ball is constantly changing... So you have temporal displacement of golf ball, background, along with a non-stationary camera. Between left and right frames, the ball will be obscuring different parts of the background due to their apparent motion in the frame and the passage of time. Quite the complex math problem to solve.

    This would HAVE to be done post-process, not on the phone. Not that this is a problem; even if the phone doesn't allow such control with the stereo camera, it's conceivable that a modified camera app, along with possibly a kernel mod, could provide the raw footage for such a post-process.

    Still, I think that only a very controlled photo/video shoot could ensure a high degree of success. Gotta reduce those variables, right? So... camera on tripod to eliminate hand shake. Subject stationary or slow-moving. Subject further away to reduce parallax.

    Anyone have sample video which is the result of a mathematical merge of stereoscopic frames? I'm not looking for proof-of-concept; I can take the OP's word for it that it's possible. In particular, I'd be interested in the recording conditions (how controlled was the shot) and how those conditions can affect the final output. I'd love to be proven wrong on this one :)
  23. fattank

    fattank Member

    This is why two pictures taken at the exact same time (but from calculably different perspectives) solve a lot of the 'random motion' problem in traditional HDR, where it's admittedly tough to keep your camera exactly still (and cross your fingers the scene won't move before your next shot).

    A motion vector approach (this is just avisynth, but the underlying analysis code is identical) followed by a trivial mask can be implemented in the GPU or in software. An even faster method is an analysis-mode 3D Fourier transform (spatiotemporal analysis to gauge changes/'motion' throughout the frame based on surrounding frames), which is trivial to implement in the GPU (as it already has been, in various incarnations, on the desktop). In any case, as I hinted at in my post above, this step would logically be merged into the perspective correction: instead of just a bifocal spatial analysis, it incorporates basic temporal motion data from surrounding frames.

    Again, some simplifications can be made. A simpler vector-field analysis can be performed with a SAD threshold to filter out the components of the image that are "most unlikely" to be accounted for by their representative untransformed vector [-field]. In those regions, the single-reference merge of the faster method would ignore those details in one of the streams, and the reference would be interpolated (with mv or even the simplest motion blur) with respect to the unaccounted-for components.

    This is just a matter of software implementation, however. Adaptive thresholds based on focus, ISO, discrepancies in average vector field magnitudes, etc. can be used for an even more accurate effect.

  24. EarlyMon

    EarlyMon The PearlyMon
    VIP Member

    It seems avisynth is down at present.

    You seem to be describing an attempt to take frame rate processing (per your use of the phrase, spatiotemporal analysis, something I've noted here and there) (aka 120/240 Hz processing on better LCD HDTVs) and apply it to blend stereoscopic images as opposed to successive frames. In either case, the processing steps are the same - except - frame rate processing assumes that the motion components are displaced by a maximum of 1/24th second - and that more than 2 successive frames are used to resolve the images in time.

    For what you propose, let's assume the two lenses are at least an inch apart. Even with the lowest-frame-rate algorithm you could use (1/24th second per frame), you're asking the known algorithms to reconcile something with an apparent motion of 2 ft/sec, or over 1.3 mph, moving laterally.
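    The 2 ft/sec figure comes from treating the lens separation as motion that happens within one frame period: 1 inch every 1/24 s is 24 in/s. A small check of the unit conversion, with a hypothetical helper name:

```python
INCHES_PER_FOOT = 12
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def apparent_speed(offset_in, frame_rate_hz):
    """Treat a fixed stereo offset as lateral motion within one frame period.

    Returns (feet per second, miles per hour).
    """
    inches_per_s = offset_in * frame_rate_hz
    feet_per_s = inches_per_s / INCHES_PER_FOOT
    mph = feet_per_s * SECONDS_PER_HOUR / FEET_PER_MILE
    return feet_per_s, mph
```

    For a 1 inch offset at 24 fps this gives 2 ft/s and 15/11 ≈ 1.36 mph, matching the "over 1.3 mph" estimate; a wider real baseline only makes the apparent motion larger.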

    Your contention is that the existing algorithms can deal with that level of spatial uncertainty?

    And if so, you find that trivial? On this processor?

    Not with two images only - the uncertainty for frame rate algorithms with that little data is rather large.

    And at root in this issue, despite whether you de-multiplex space or time, is that the images have significant differences to solve in order to attempt to resolve greater detail.

    And absolutely level is not optional: you've traded time for space, so it's a hard requirement.

    Yes or no?
  25. CanDMan

    CanDMan Well-Known Member
    Thread Starter

    Why is it that you have a camera capable of capturing approx 25-30 frames per second for video but not able to take individual pictures at that rate? Is it just that the write time for new files for each photo is slow?

HTC EVO 3D Forum

The HTC EVO 3D release date was July 2011. Features and specs include a 4.3-inch screen, 5MP camera, 1GB RAM, Snapdragon S3 processor, and 1730mAh battery.
